Xq22 deletions and correlation with distinct neurological disease traits in females: Further evidence for a contiguous gene syndrome
Abstract
Xq22 deletions that encompass PLP1 (Xq22-PLP1-DEL) are notable for variable expressivity of neurological disease traits in females ranging from a mild late-onset form of spastic paraplegia type 2 (MIM# 312920), sometimes associated with skewed X-inactivation, to an early-onset neurological disease trait (EONDT) of severe developmental delay, intellectual disability, and behavioral abnormalities. Size and gene content of Xq22-PLP1-DEL vary and were proposed as potential molecular etiologies underlying variable expressivity in carrier females where two smallest regions of overlap (SROs) were suggested to influence disease. We ascertained a cohort of eight unrelated patients harboring Xq22-PLP1-DEL and performed high-density array comparative genomic hybridization and breakpoint-junction sequencing. Molecular characterization of Xq22-PLP1-DEL from 17 cases (eight herein and nine published) revealed an overrepresentation of breakpoints that reside within repeats (11/17, ~65%) and the clustering of ~47% of proximal breakpoints in a genomic instability hotspot with characteristic non-B DNA density. These findings implicate a potential role for genomic architecture in stimulating the formation of Xq22-PLP1-DEL. The correlation of Xq22-PLP1-DEL gene content with neurological disease trait in female cases enabled refinement of the associated SROs to a single genomic interval containing six genes. Our data support the hypothesis that genes contiguous to PLP1 contribute to EONDT.
1 INTRODUCTION
Xq22 encompasses an approximately 10 Mb genomic interval on the X chromosome in which ~100 annotated genes map; PLP1 is one of the 14 disease-associated genes mapping to this interval. It is a dosage-sensitive gene as evidenced by both copy-number gains and copy-number losses resulting in disease traits (Inoue, 2005, 2017; Torisu et al., 2012; Woodward et al., 2005). In addition to copy-number variants (CNVs), rare simple nucleotide variants in PLP1, including nucleotide substitutions, small insertions or deletions (indels), and regulatory region alterations, have also been associated with neurological disease (Inoue, 2005, 2017; Lee, Madrid et al., 2006; Osorio & Goldman, 2018). Duplication of PLP1 is the most commonly observed variant allele in clinically ascertained cases and it is usually associated with Pelizaeus–Merzbacher disease (PMD; MIM# 312080), a dysmyelinating leukodystrophy affecting the central nervous system with characteristic clinical features including nystagmus, spastic quadriplegia, ataxia, and developmental delay, in males. In contrast to duplications, the complete or partial deletion of PLP1 is the rarest mutational event, being observed in less than 5% of clinically ascertained cases with PLP1 variants.
Hemizygous Xq22 deletions encompassing PLP1 are typically small in size (<600 kb) relative to other PLP1 CNVs, such as heterozygous deletions and duplications, and they can be associated with PMD or with spastic paraplegia type 2 (SPG2; MIM# 312920), a progressive lower extremity spasticity disease. Hemizygous deletion, in addition to central nervous system disease, can also involve peripheral neuropathy in some cases (Garbern et al., 1997; Inoue, 2005, 2017). In the heterozygous state, deletions are notable for extensive variability in their size (range: 33 kb–5 Mb) and the associated phenotypes are characterized by variable expressivity of neurological disease ranging from a mild age-dependent penetrant form of SPG2 to a severe early-onset neurological disease trait (defined here as EONDT; Inoue et al., 2002; Matsufuji et al., 2013; Raskind, Williams, Hudson, & Bird, 1991; Torisu et al., 2012; Yamamoto et al., 2014). The associated EONDT is characterized by severe infancy-onset developmental delay/intellectual disability (DD/ID), behavioral abnormalities, hypotonia, strabismus, and potentially an emerging recognizable pattern of craniofacial dysmorphology (Yamamoto et al., 2014).
To date, nine nonrecurrent Xq22 deletions encompassing PLP1 (Xq22-PLP1 deletions) have been reported, of which five were identified in hemizygous male probands and their less severely affected heterozygous female relatives, while three were identified in heterozygous female probands with EONDT and one was identified in a heterozygous female proband with an unclassified severe DD/ID phenotype (Inoue et al., 2002; Matsufuji et al., 2013; Raskind et al., 1991; Torisu et al., 2012; Yamamoto et al., 2014). Notably, none of the published families with EONDT affected females have reported male carriers of the same Xq22-PLP1 deletion. The increased severity and distinctive disease nature observed in EONDT female cases, as opposed to the mild SPG2 manifestation in affected female relatives of male probands, were attributed to the unique gene content of large Xq22 deletions found exclusively in female cases. Two smallest regions of overlap (SRO) within large deletions were suggested to underlie the disease manifestation, hg19.g.chrX:102,233,526–102,957,289 (~730 kb) and hg19.g.chrX:102,993,719–103,982,269 (~990 kb; Yamamoto et al., 2014). The study of additional rare Xq22 deletion CNV in unrelated families and EONDT cases in females may facilitate further understanding of this disease-gene locus and disease-trait manifestation as well as provide insight into sex influenced neurodevelopmental disease expression.
Mechanistic studies of SV mutagenesis in Xq22 duplication and triplication CNVs suggest a role for low-copy repeats (LCRs) in stimulating rearrangement formation via replicative repair mechanisms (Beck et al., 2015; Carvalho et al., 2011; Carvalho et al., 2012; Lee, Carvalho, & Lupski, 2007; Zhang et al., 2017). Genomic studies of five Xq22-PLP1 deletions did not reveal evidence for the potential involvement of LCRs or other paralogous genomic structures that might render genomic instability (Inoue et al., 2002; Matsufuji et al., 2013; Torisu et al., 2012). From breakpoint-junction studies, microhomology was the only persistent finding at Xq22-PLP1 deletion junctions, allowing one to surmise a potential nonhomologous recombination mechanism as contributing to deletion formation (Inoue et al., 2002). Although previous molecular studies of Xq22-PLP1 gains, duplications and triplications, have provided data revealing that PLP1 is a dosage-sensitive gene and some insight into genomic instability and rearrangement mechanisms, more molecular and clinical data from rare Xq22-PLP1 deletion cases in both males and females are warranted to infer potential upstream mechanisms for Xq22 deletion CNV formation and downstream genic mechanism(s) mediating trait manifestation.
We investigated eight unrelated patients carrying Xq22 deletions that partially or fully encompass PLP1 (five female probands and three male probands). We describe the clinical details of affected females manifesting EONDT and examine potential genotype/phenotype correlation by mapping the breakpoint-junction and studying the size, extent, and gene content of their deletions. At the molecular level, we show the results of an analysis of the genomic features from 17 individuals, including nine published cases, carrying Xq22-PLP1 deletions. We computationally and experimentally define a novel class of intrachromosomal repeats and provide evidence for intrinsic genome architecture fomenting genomic instability. These data implicate the involvement of large inverted repeat (IR) structures and short tandem repeats (STRs) in the formation of Xq22-PLP1 deletions. Our genotype/phenotype correlation findings support the contention that deletion gene content may provide prognostic information in female cases.
2 MATERIALS AND METHODS
2.1 Editorial policies and ethical considerations
The clinical information for families HOU1008, HOU1025, and HOU2971 was ascertained and studied using protocol H-16213 approved by the Institutional Review Board for research involving human subjects at Baylor College of Medicine, while the information for families HOU3003, HOU3004, and HOU4615 was ascertained using protocol IRB#2002-142 approved by the Institutional Review Board for research involving human subjects at Nemours/Alfred I. duPont Hospital for Children. Cases BAB2615 and BAB2614 were ascertained through studies of anonymized data from clinical chromosomal microarray analysis (CMA) screening at Baylor Genetics (Institutional Review Board approved protocol H-36612). Informed consent for inclusion in the study was obtained from all adult participants and from parents/guardians of minors. This study was conducted in accordance with the Declaration of Helsinki.
2.2 Subjects
Eight cases were investigated, five of which were unrelated affected female probands, BAB2595/FamilyHOU1008, BAB2614, BAB2650/FamilyHOU1025, BAB8120/FamilyHOU2971, BAB12522/FamilyHOU4615 while three were unrelated affected male probands, BAB2615, BAB8201/FamilyHOU3003, and BAB8204/FamilyHOU3004 (Figure 1). Clinical information was available for six of the eight cases (four females and two males), BAB2595, BAB2650, BAB8120, BAB8201, BAB8204, and BAB12522. Genomic DNA for all experiments was isolated from blood according to standard procedures.

2.3 Array comparative genomic hybridization (aCGH)
Two tiling-path high-density custom oligonucleotide microarrays were designed. The first interrogates the copy-number of a 15.5 Mb segment encompassing PLP1 on chromosome Xq22. A total of 40,208 60-mer oligonucleotide probes spanning chrX:97,915,511–113,400,000 (NCBI build 36) were selected with an average distribution of one probe per 386 bp (format 4 × 44 K) using the Agilent eArray website (https://earray.chem.agilent.com/suredesign/; Carvalho et al., 2012). The second interrogates the copy-number of an ~10 Mb segment encompassing PLP1 on chromosome Xq22 with a total of 24,643 60-mer probes spanning chrX:97,835,000–107,855,000 (NCBI build 35; format 4 × 44 K; Lee et al., 2007). The experimental procedures were performed according to the manufacturer's protocol (Agilent Oligonucleotide Array-Based CGH for Genomic DNA Analysis, Version 7.2; Agilent Technologies) with some modifications as described (Carvalho et al., 2009, 2011). Sex-matched controls (GM10851 as male control and GM15510 as female control, both from Coriell Institute) were used for all samples. Genomic copy number at a given locus, or map position as evidenced by interrogating oligonucleotide probe, was defined by analysis of the normalized log2 (Cy5/Cy3) ratio average of the CGH signal. Agilent Feature Extraction software (version 10) and Agilent Genomic Workbench (version 7.0.4.0) were used to process scanned array images and analyze extracted files, respectively. Average log2 ratio thresholds of −1 and −4 were taken to indicate heterozygous female and hemizygous male deletions, respectively.
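The thresholds above follow from simple arithmetic on the log2 ratio: a heterozygous deletion in a female (one copy versus two in the control) is expected at log2(1/2) = −1, whereas a hemizygous deletion in a male leaves no target sequence, so the observed ratio is bounded only by background signal. A minimal Python sketch of such a call is shown here; the function names and the `tolerance` margin are illustrative assumptions, not parameters reported for this study or settings of the Agilent software.

```python
import math
from statistics import mean

# Average log2 ratio thresholds quoted in the text for a deleted segment.
DELETION_THRESHOLD = {"F": -1.0, "M": -4.0}


def expected_log2_ratio(test_copies: int, control_copies: int) -> float:
    """Theoretical log2(test/control) ratio for a given copy number."""
    return math.log2(test_copies / control_copies)


def is_deletion(log2_ratios, sex: str, tolerance: float = 0.35) -> bool:
    """Call a deletion when the segment mean reaches the sex-specific threshold.

    `tolerance` is a hypothetical noise margin chosen only for illustration.
    """
    return mean(log2_ratios) <= DELETION_THRESHOLD[sex] + tolerance


# Toy probe values (invented, not study data).
print(expected_log2_ratio(1, 2))
print(is_deletion([-0.9, -1.1, -1.0], "F"))
print(is_deletion([0.05, -0.02, 0.01], "F"))
```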
2.4 PLP1 sequencing
Coding exons and intron 3 of PLP1 (NM_000533.4) were amplified via PCR in all five female cases harboring heterozygous PLP1 deletion. PCR products were purified and analyzed by Sanger di-deoxy DNA sequencing (primer sequences are provided in Table S1). Variants with minor allele frequency (MAF) of ≤0.001 in the 1000 Genomes Project database and the genome aggregation database (gnomAD) were considered rare (Genomes Project et al., 2015; Karczewski et al., 2019).
2.5 Mapping breakpoints and characterizing microhomology/microhomeology
Deletion breakpoint-junctions, or “join-points” where two discontinuous sequences (i.e., rearrangement substrate sequences) in the human reference genome are joined during the rearrangement process, were amplified for all eight cases, using inward-facing primers for PCR amplification, and Sanger sequenced. Primers were designed at the apparent boundaries of each deleted segment as defined by transitions from copy-number neutral to copy-number loss by aCGH analysis (primer sequences are provided in Table S1). Junction sequences were screened for single nucleotide variants (SNVs) flanking the junction. The identified Xq22-PLP1 deletions and SNVs detected in their junction sequences were deposited in the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/) with accession numbers, SCV000995898 - SCV000995911.
The two reference genome loci to which the proximal and distal breakpoints mapped were extended and aligned to examine for microhomology/microhomeology at the join-points and sequence similarity flanking the join-points. Nucleotide identity of 100% between the proximal and distal reference strands leading to the junction was considered microhomology when found in a subtractive state in the patient's DNA sequence (i.e., identical sequence found once only in the patient's DNA at the deletion junction; Carvalho & Lupski, 2016). Microhomeology was defined as imperfect matches leading to the junction (cutoff of 70% identity with a maximum two-nucleotide gap followed by at least two perfectly matched nucleotides; adapted from Beck et al., 2019; Liu et al., 2017). Microhomeology at the breakpoint-junctions was recently reported as a feature associated with the replication-based mechanism of SV formation (Bahrambeigi et al., 2019; Liu et al., 2017). We also analyzed the similarity of DNA sequences surrounding breakpoint-junctions by alignment and similarity computation using the R Biostrings package in samples that had <220 bp of microhomeology on either side of the junction (adapted from Gu et al., 2015); 150 nucleotides of the reference sequence were obtained on either end of a breakpoint-junction and the two 300-nucleotide reference sequences were then centralized at the junction-microhomology or blunt end and aligned together using the Needleman–Wunsch algorithm implemented in the Biostrings package. Sequence similarity was calculated within a 20 bp moving window as the percentage of aligned bases over the total count of nongap sequences, for which orientation relies on the alignment with sample sequence. A heatmap was plotted for each event.
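The two junction measures defined above can be illustrated with a short Python sketch. It is not the R Biostrings workflow used in the study: the function names, the toy sequences, and the interpretation of the window score (matches divided by ungapped alignment columns) are assumptions made for illustration, and the alignment itself is taken as given rather than computed.

```python
def microhomology_length(ref: str, prox_end: int, dist_start: int) -> int:
    """Perfect microhomology at a deletion of ref[prox_end:dist_start].

    The junction reads ref[:prox_end] + ref[dist_start:]. Bases shared by the
    start of the deleted segment and the start of the retained distal segment
    (forward), or by the end of the retained proximal segment and the end of
    the deleted segment (backward), appear only once at the junction.
    """
    forward = 0
    while (prox_end + forward < dist_start
           and dist_start + forward < len(ref)
           and ref[prox_end + forward] == ref[dist_start + forward]):
        forward += 1
    backward = 0
    while (prox_end - 1 - backward >= 0
           and dist_start - 1 - backward >= prox_end
           and ref[prox_end - 1 - backward] == ref[dist_start - 1 - backward]):
        backward += 1
    return forward + backward


def window_identity(aln_a: str, aln_b: str, window: int = 20):
    """Percent identity per sliding window of an existing pairwise alignment.

    Columns with a gap ('-') in either sequence are left out of the
    denominator; a window with no ungapped column yields None.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for i in range(len(aln_a) - window + 1):
        cols = [(a, b) for a, b in zip(aln_a[i:i + window], aln_b[i:i + window])
                if a != "-" and b != "-"]
        if not cols:
            scores.append(None)
            continue
        matches = sum(a == b for a, b in cols)
        scores.append(100.0 * matches / len(cols))
    return scores


# Toy reference: flank "CCATG", deleted "GTACAGAT", retained "GTCCA".
toy_ref = "CCATGGTACAGATGTCCA"
print(microhomology_length(toy_ref, 5, 13))
print(window_identity("ACGTACGT", "ACGTACCT", window=4))
```

The simple two-direction count is adequate for unique sequence; inside tandem repeats the placement of the breakpoint is ambiguous and the count should be interpreted with care.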
2.6 Identifying and analyzing highly-similar intrachromosomal repeats (HSIRs)
Computational analyses of the haploid reference human genome were performed to identify highly-similar repeated sequences across the human genome that may contribute to local genomic instability. Our analyses aimed to identify all >700 bp intrachromosomal repeat sequences that are highly similar (95–100% identical) in the human genome reference assembly (GRCh37/hg19). We used the Local Alignment Search Tool, blastZ-like (LASTZ) algorithm (release 1.02.00; http://www.bx.psu.edu/~rsharris/lastz/), which performs pairwise alignment and sequence comparisons of large DNA sequences, to identify highly similar intrachromosomal repeats (HSIRs). This approach is distinct from other methods of repeat structure detection such as utilized for PMD-LCRs (Lee, Inoue et al., 2006), Segmental Dups (Bailey et al., 2002; Bailey, Yavor, Massa, Trask, & Eichler, 2001), Self Chain (Chiaromonte, Yap, & Miller, 2002; Kent, Baertsch, Hinrichs, Miller, & Haussler, 2003; Schwartz et al., 2003), and Repeat Masker (Jurka, 2000) in that it: (a) detects repeats by intrachromosomal pairwise alignment (i.e., self-comparison of each chromosome in the reference assembly); (b) imposes no size restriction on detected repeat fragments; (c) allows 95–100% identity levels between pairs; (d) involves no filtering or masking of repetitive elements and common repeats; and (e) imposes no distance restrictions between matching pairs. Using this approach, we identified a total of 58,469 HSIR pairs with a median size of 6 kb (minimal = 793 bp; maximal = 406,322 bp) and an average identity of 96.8%; of these pairs, 29,503 are in direct orientation and 28,966 are in inverted orientation in the hg19 reference assembly. The HSIR dataset is deposited as Supporting Additional Files S1 and S2, direct and inverted pairs, respectively.
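The selection criteria stated above (length >700 bp, identity of 95–100%, split by orientation) can be sketched as a post-filter over self-alignment hits. The record layout, class name, and toy hits below are hypothetical and do not reflect LASTZ output fields or options; the sketch only shows the filtering logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfAlignment:
    """One pairwise hit from a chromosome-versus-itself alignment (hypothetical schema)."""
    chrom: str
    start1: int
    end1: int
    start2: int
    end2: int
    identity: float    # percent identity of the aligned pair
    same_strand: bool  # True = direct orientation, False = inverted


def is_hsir(hit, min_length=700, min_identity=95.0):
    """Keep pairs longer than `min_length` bp with at least `min_identity` % identity."""
    # A segment aligned to itself on the same strand is the trivial diagonal.
    trivial = hit.same_strand and (hit.start1, hit.end1) == (hit.start2, hit.end2)
    length = min(hit.end1 - hit.start1, hit.end2 - hit.start2)
    return not trivial and length > min_length and hit.identity >= min_identity


def split_by_orientation(hits):
    """Return (direct pairs, inverted pairs) among hits passing the filter."""
    kept = [h for h in hits if is_hsir(h)]
    direct = [h for h in kept if h.same_strand]
    inverted = [h for h in kept if not h.same_strand]
    return direct, inverted


# Invented toy hits, not entries from the HSIR dataset.
hits = [
    SelfAlignment("chrX", 1_000, 9_000, 1_000, 9_000, 100.0, True),      # self-match
    SelfAlignment("chrX", 10_000, 16_000, 80_000, 86_000, 96.8, True),   # direct pair
    SelfAlignment("chrX", 20_000, 21_500, 95_000, 96_500, 97.2, False),  # inverted pair
    SelfAlignment("chrX", 30_000, 30_500, 99_000, 99_500, 99.0, False),  # too short
    SelfAlignment("chrX", 40_000, 45_000, 70_000, 75_000, 90.0, True),   # identity too low
]
direct, inverted = split_by_orientation(hits)
print(len(direct), len(inverted))
```

The reported totals are internally consistent under this split: 29,503 direct plus 28,966 inverted pairs gives 58,469.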
We then further investigated HSIRs for coinciding deletion breakpoints (base pair-level resolution of breakpoint coordinates was used when available or aCGH-derived approximated coordinates depending on availability of the junction sequence); a total of six HSIRs were identified coincident with deletion breakpoints and these HSIRs were annotated with respect to four repeat datasets: (a) the PMD-LCRs dataset for LCRs at the PLP1 locus (chrX:98,551,855–106,551,854, coordinates lifted to the GRCh37/hg19 build); (b) the Segmental Dups dataset for genome-wide identified paralogous repeats; (c) the SelfChain dataset for gapped genome self-alignment; and (d) the RepeatMasker dataset for SINEs, LINEs, Alu elements, and other repetitive elements in the genome.
Breakpoints were also examined for repeat structures that were not part of the HSIR dataset but were found in any one of the other four repeat datasets, namely the PMD-LCRs dataset, the Segmental Dups dataset, the SelfChain dataset, and the RepeatMasker dataset. Details of breakpoint-coinciding HSIRs and other breakpoint-coinciding repeat structures are provided in Table S2.
2.7 Non-B DNA motifs
Three test/control groups of genomic intervals near the cluster of Xq22-PLP1 deletion breakpoints were examined for the presence of seven non-B DNA motifs via the publicly available non-B database (non-B DB) and its associated tools (Cer et al., 2013; https://nonb-abcc.ncifcrf.gov/apps/nBMST/default/). The three tested groups were: (a) the reference sequence embedded within the proximal and distal halves of the HSIR RepX-i1010 (chrX:102,874,647–102,944,647 and chrX:102,944,648–103,014,647); (b) the reference sequence embedded within 140 kb of RepX-i1010 (chrX:102,874,647–103,014,647) as well as two similarly sized control fragments flanking the start and end of this repeat structure region (chrX:102,734,646–102,874,646 and chrX:103,014,648–103,154,648); and (c) the reference sequence embedded within the identified 90 kb genomic instability hotspot (chrX:102,954,647–103,044,647), as well as two similarly-sized (90 kb) control fragments flanking the start and end of this hotspot region. The distribution of a motif in any one of the investigated regions was defined as the number of motif start sites found in 10 kb moving frames within the region; the seven-frame distribution of the distal half of RepX-i1010 was compared to the seven-frame distribution of the proximal half, the 14-frame distribution of RepX-i1010 was compared to the two sets of 14-frame flanking control distributions, and the nine-frame distribution of the hotspot was compared to the two sets of nine-frame flanking control distributions. Statistical conclusions were based on the results obtained from the Wilcoxon Rank Sum double-sided or one-sided test considering nonnormal distributions. A heatmap plot was generated via the heatmap.2 function from the CRAN gplots R package for quantitative visualization of the three distributions.
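The frame-based motif distribution and the rank-based comparison described above can be sketched in Python as follows. The frame counts (7, 14, and 9) imply consecutive, non-overlapping 10 kb frames, which is the reading assumed here; the function names and the motif positions are invented for illustration, and only the rank-sum statistic is computed (a p-value would normally come from a statistics package).

```python
def frame_counts(motif_starts, region_start, region_end, frame=10_000):
    """Count motif start sites in consecutive frames of `frame` bp."""
    n_frames = (region_end - region_start) // frame
    counts = [0] * n_frames
    for pos in motif_starts:
        idx = (pos - region_start) // frame
        if 0 <= idx < n_frames:
            counts[idx] += 1
    return counts


def rank_sum_u(x, y):
    """Mann-Whitney U for sample x versus y, using midranks for ties."""
    values = sorted(list(x) + list(y))
    midrank = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        midrank[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w = sum(midrank[v] for v in x)            # Wilcoxon rank sum of x
    return w - len(x) * (len(x) + 1) / 2


# Proximal half of RepX-i1010 as given in the text; motif positions are invented.
start, end = 102_874_647, 102_944_647
toy_starts = [102_874_647, 102_884_646, 102_884_647, 102_944_646, 102_944_647]
print(frame_counts(toy_starts, start, end))
print(rank_sum_u([5, 6, 7], [1, 2, 3]))
```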
2.8 X-chromosome inactivation (XCI)
Genomic DNA isolated from blood was used for implementing XCI studies according to an established protocol with modification (Allen, Zoghbi, Moseley, Rosenblatt, & Belmont, 1992). This protocol is designed to detect methylated HpaII and HhaI sites near the polymorphic CAG repeat in the human androgen receptor gene, AR (MIM# 313700). Sequences were analyzed using GeneMapper v3.7 (Applied Biosystems) software.
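The text states only that an established protocol was followed with modification, so the calculation below is an assumption: it is the normalization commonly applied to this kind of assay, in which each allele's peak area after methylation-sensitive digestion is divided by its undigested peak area before the ratio is formed. The 80:20 cutoff is the one given in the footnotes to Table 3; the function names and peak areas are invented for illustration.

```python
def xci_ratio(undigested, digested):
    """Return the inactivation ratio as (percent allele 1, percent allele 2).

    Each argument is a pair of peak areas (allele 1, allele 2). Digested peaks
    are normalized to undigested peaks to correct for unequal amplification.
    """
    norm1 = digested[0] / undigested[0]
    norm2 = digested[1] / undigested[1]
    pct1 = 100.0 * norm1 / (norm1 + norm2)
    return pct1, 100.0 - pct1


def is_skewed(ratio, cutoff=80.0):
    """Skewed when the predominant allele reaches the cutoff (assumed inclusive)."""
    return max(ratio) >= cutoff


# Invented peak areas, not study data.
ratio = xci_ratio(undigested=(1000.0, 800.0), digested=(370.0, 504.0))
print(round(ratio[0]), round(ratio[1]), is_skewed(ratio))
```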
A glossary of abbreviations is provided in Table 1.
Acronym | Expansion |
---|---|
aCGH | Array-comparative genomic hybridization |
AAMR | Alu-Alu mediated rearrangement |
APR | A-phased repeat |
BIR | Break-induced replication |
BAEP | Brain auditory evoked potential |
CC | Corpus callosum |
CNV | Copy-number variant |
CP | Cerebral palsy |
DSB | Double-stranded break |
DD/ID | Developmental delay/intellectual disability |
DR | Direct repeat |
EEG | Electroencephalogram |
EONDT | Early-onset neurological disease trait |
FoSTeS | Fork stalling and template switching |
GERD | Gastroesophageal reflux disease |
GI | Genomic instability |
GQ | G-quadruplex |
HSIR | Highly similar intrachromosomal repeat |
IR | Inverted repeat |
LCR | Low-copy repeat |
LINE | Long interspersed nuclear element |
LoF | Loss of function |
MAF | Minor allele frequency |
MIM | Mendelian inheritance in man |
MMBIR | Microhomology-mediated break-induced replication |
MMEJ | Microhomology-mediated end joining |
MR | Mirror repeat |
MRI | Magnetic resonance imaging |
NADE | p75NTR-associated cell death executor |
NHEJ | Nonhomologous end joining |
PMD | Pelizaeus–Merzbacher disease |
PVL | Periventricular leukomalacia |
seDSB | Single-ended double-stranded break |
SINE | Short interspersed nuclear element |
SNV | Single nucleotide variant |
SPG2 | Spastic paraplegia type 2 |
SRO | Smallest region of overlap |
SRS | Serial replication slippage |
STR | Short tandem repeat |
SV | Structural variation |
WM | White matter |
XCI | X-chromosome inactivation |
Xq22-PLP1-DEL | Xq22 deletion that encompasses any part of PLP1 |
3 RESULTS
3.1 Clinical findings in the investigated subjects
Four female (BAB2595, BAB2650, BAB8120, and BAB12522) and two male (BAB8201 and BAB8204) probands were examined clinically; pedigrees are shown in Figure 1. Clinical features are summarized in Table 2 and further detailed information is provided in Supporting Additional File S3. All female patients had clinical features consistent with the early-onset neurological disease trait reported by Yamamoto et al. (2014) and defined here as EONDT, including hypotonia at birth, severe DD/ID, and neurobehavioral abnormalities; BAB2595, BAB8120, and BAB12522 had autistic features and BAB2650 had anxiety. A facial photograph of one subject (BAB2650) has been published (Brender, Wallerstein, Sum, & Wallerstein, 2015) and shows dysmorphic features concordant with what has been observed in three female patients reported by Yamamoto et al. (2014), including a triangular face, broad forehead, strabismus, and prominent jaw; however, another subject (BAB12522) did not have readily apparent dysmorphic facial features. Brain MRIs revealed different degrees of increased T2 signal in the white matter of all four female subjects; BAB2650 and BAB12522 were noted to have a thin corpus callosum.
Feature | BAB2595 | BAB2650 | BAB8120 | BAB12522 | BAB8201 | BAB8204 |
---|---|---|---|---|---|---|
Sex | F | F | F | F | M | M |
DEL size | 2.5 Mb | 693 kb | 5.6 Mb | 3.3 Mb | 6.7 kb | 71 kb |
Gestation | 32–33 weeks | 34–35 weeks | 41 weeks | 39 weeks | Full term^a | 41 weeks |
Age^b | 13 years | 9 years | 3.5 years | 8 years | 15 years | 16 years |
DD/ID | Severe | Severe | Severe | Severe | DD | DD |
Behavioral abnormality | ASD | Anxiety | ASD | ASD | Perseverative disorder | None |
Eyes | Strabismus | Strabismus, nystagmus | Strabismus | Strabismus, left amblyopia | Strabismus | Strabismus |
Brain MRI/MRS | Posterior WM signal | Delayed myelination, thin CC, cerebral atrophy | Delayed myelination in parietal/periventricular regions | Diffuse hypomyelination, mild progressive myelination, thin CC, WM atrophy, decreased NAA | Diffuse hypomyelination, thin CC, brain atrophy | Periventricular WM change |
Spasticity | Hypotonia followed by spasticity | Hypotonia mixed with spasticity | Hypotonia followed by dystonia | Hypotonia then spasticity | Spasticity, dystonia | Hypotonia, spasticity |
GI | GERD, poor weight gain, constipation | GERD | GERD | GERD, constipation | Poor appetite, G-tube | GERD, poor weight gain |
Other | NR | Facial dysmorphic features | Abnormal BAEP, dysmorphic features | Seizure, ventricular septal defect, decreased bone mineral density, hypothyroidism | Seizure, prolonged latency in BAEP, peripheral neuropathy | Abnormal EEG, initially diagnosed as CP with PVL |
- Abbreviations: ASD, autism spectrum disorder; BAEP, brain auditory evoked potential; CC, corpus callosum; CP, cerebral palsy; DD, developmental delay; EEG, electroencephalogram; GERD, gastroesophageal reflux disease; GI, gastrointestinal; ID, intellectual disability; NR, not reported; PVL, periventricular leukomalacia; WM, white matter.
- a Specific length not reported.
- b Age at most recently reported clinical evaluation.
The two male patients (BAB8201 and BAB8204) presented with DD and spasticity consistent with SPG2. In addition, BAB8201 had seizures responsive to carbamazepine while BAB8204 had an abnormal electroencephalogram (EEG). Brain MRI was available on BAB8201 in which diffuse hyper-intensity in the white matter on T2-weighted images suggestive of hypomyelination, thin corpus callosum and reduced brain volume were detected. Clinical information from the family of BAB8201 with positive family history was published separately (Hisama et al., 2001).
All six patients had strabismus while BAB2650 was also noted to have had nystagmus. Five out of the six patients (BAB2650, BAB8120, BAB8201, BAB8204, and BAB12522) had gastrointestinal issues including gastroesophageal reflux disease (GERD) and poor weight gain. Abnormal brain auditory evoked potential (BAEP) was reported in one female case, BAB8120, and in one male case, BAB8201. Information on BAEP testing in the second male patient, BAB8204, was lacking.
3.2 Size, genomic extent, and gene content of Xq22 deletions
Deletions were detected by aCGH and the exact coordinates for breakpoints were determined (Figure 2 and Table 3). The size range in males was determined to be from 6.7 to 854 kb (three male probands) and in females was from 693 kb to 5.6 Mb (five female probands; Table 3). Deletion alignment to the haploid reference genome and mapping of selected genes within the deletion intervals are shown in Figure 3a. Six of the deletions include the entire PLP1 gene (NM_000533.4) while two (in unrelated males BAB8201 and BAB8204) encompass the first exon and part of intron 1 only (Figure 3b). BAB8201 harbors the smallest of all PLP1 deletions to date (6.7 kb) involving no consensus gene transcripts other than part of PLP1, while BAB2615 harbors the largest hemizygous deletion (854 kb) in which 23 genes in addition to PLP1 are deleted; nine of the deleted protein-coding genes appear to be highly expressed in the brain as found on the GTEx portal (https://gtexportal.org/home/) and as shown in Figure S1. Female subject BAB8120 harbors the largest heterozygous deletion (5.6 Mb) with 63 genes deleted in addition to PLP1, one of which is a gene associated with primary ciliary dyskinesia, PIH1D3 (MIM# 300933; Figure 3a). Physical examination of this patient at 3.5 years of age revealed no clinical evidence for a ciliopathy phenotype that could potentially result from haploinsufficiency of PIH1D3. All five heterozygous deletions (BAB2595, BAB2650, BAB8120, BAB2614, and BAB12522) include GLRA4, which has been previously associated with intellectual disability (Labonne et al., 2016); notably, they all extend further centromeric and include BEX3 (MIM# 300361; also known as nerve growth factor receptor-associated protein 1 [NGFRAP1]), a gene that is differentially expressed in the brain and that has been suggested to contribute to disease (Yamamoto et al., 2014; Figure 3a).

Case ID | Sex | Inheritance | Coordinates (GRCh37/hg19) | Size (Mb) | XCI (blood) | Junction features |
---|---|---|---|---|---|---|
BAB2595 | F | de novo^a | chrX:100,866,604–103,411,980 | 2.5 | Random 37:63 | 2 bp Microhomology |
BAB2650 | F | de novo | chrX:102,615,641–103,309,503 | 0.693 | Random 22:78 | 3 bp Microhomology |
BAB8120 (D: 268030) | F | de novo | chrX:101,029,649–106,702,784 | 5.6 | Skewed >90 | 22 bp Insertion; 5 bp direct-repeat |
BAB12522 | F | de novo | chrX:102,066,350–105,409,822 | 3.3 | Random 67:33 | 127 bp Microhomology; LINE-LINE |
BAB2614 | F | Unknown | chrX:102,436,725–105,520,605 | 3 | Random 60:40 | 2 bp Microhomology; Alu-Alu |
BAB8201 | M | Maternal | chrX:103,029,773–103,036,548 | 0.0067 | NA | None |
BAB8204 | M | Unknown | chrX:102,967,297–103,038,606 | 0.071 | NA | 1 bp Microhomology |
BAB2615 | M | Unknown | chrX:102,543,473–103,398,234 | 0.854 | NA | 1 bp Microhomology |
Patient 1‡ | F | de novo | chrX:100,659,116–105,523,589^b | 4.8 | Skewed^c 79:21 | NA |
Patient 2‡ | F | de novo | chrX:101,365,862–105,847,036^b | 4.4 | Uninformative | NA |
Patient 3‡ | F | de novo | chrX:100,907,884–103,982,269^b | 3 | Skewed^d | NA |
Patient 5‡ | F | de novo | chrX:102,959,459–103,044,544^b | 0.085 | Skewed 95:5 | NA |
BAB1379§ | M | Maternal | chrX:102,993,735–103,510,087 | 0.516 | NA | 18 bp Microhomology; Alu-Alu |
BAB1684§ | M | Maternal | chrX:102,957,288–103,314,255 | 0.356 | NA | 12 bp Insertion; 3 bp Direct-repeat |
H152§,∴ | M | Maternal | chrX:103,009,829–103,214,881^e | 0.205 | NA | 32 bp Insertion; 2 bp Microhomology |
Patient« | M | Maternal | chrX:103,018,951–103,092,038 | 0.073 | NA | 1 bp Microhomology |
Pt. 1 (II-1)¤ | M | Maternal | chrX:103,033,333–103,066,901 | 0.033 | Random (II-2: 67:33; I-1: 64:36) | 2 bp Microhomology |
Pt. 2 (II-2)¤ | F | | | | | |
Pt. 3 (I-1)¤ | F | | | | | |
- Note: ‡Yamamoto et al. (2014), §Inoue et al. (2002), ∴Lee et al. (2007), «Torisu et al. (2012), ¤Matsufuji et al. (2013).
- Abbreviations: aCGH, array comparative genomic hybridization; D, DECIPHER ID; NA, not applicable; XCI, X-chromosome inactivation.
- a Paternal sample not available, but no clinical indication of the disease reported for the father.
- b Coordinates estimated using aCGH probes.
- c Reported as skewed in the original publication, but does not meet the skewed cutoff in this study (80:20).
- d Raw data unavailable.
- e Not all junctions resolved.

3.3 Genotype-phenotype correlations in females
We searched for unique genotypic aberrations in EONDT females that could potentially contribute to the distinctive nature and increased severity of their disease as opposed to females deleted for PLP1 that manifest a neurological phenotype consistent with SPG2. We initially screened for rare (i.e., MAF of ≤0.001 in the 1000 Genomes Project database and the gnomAD database), likely damaging SNVs in the nondeleted PLP1 allele in all five females via PCR and Sanger sequencing of coding exons and of intron 3. We aimed to explore the possibility of biallelic PLP1 variation; however, no evidence was found for rare and likely damaging variants in the examined females with heterozygous Xq22-PLP1-DEL (data not shown).
Next, we examined gene content among all 14 known heterozygous-PLP1-deletion cases with clinical phenotypes available and attempted to elucidate an SRO that is explicitly shared between EONDT cases, three published (Yamamoto et al., 2014) and four herein, while excluded from the deletions of all known female SPG2 cases. Of note, Yamamoto et al. (2014) reported four Xq22-PLP1 deletion cases; the fourth case, Patient 5, harbored a small (85 kb) deletion that completely overlapped with the deletion of an SPG2 family (family of BAB1684 from Inoue et al., 2002) whose affected individuals (male and female) did not show features of EONDT. The original authors proposed that the distinct phenotype in Patient 5 cannot be explained by the Xq22 deletion and that it is likely due to a pathogenic variant that cannot be detected by aCGH, such as copy-number neutral genomic rearrangements or indels/SNVs (Yamamoto et al., 2014). Hence the Xq22 deletion of Patient 5 was excluded from our SRO analysis, as it is not expected to encompass genes that could potentially contribute to EONDT features.
We found an SRO upstream of PLP1 within a larger region that was previously suggested to contribute to disease in EONDT females (Yamamoto et al., 2014) and we referred to this region as the EONDT-SRO, chrX:102,615,641–102,957,288 (Figure 3a, green magnification). This region is ~342 kb in size and contains six genes: BEX3, RAB40A, TCEAL4, TCEAL3, TCEAL1, and MORF4L2, all of which are expressed in the brain albeit at different levels (Figure 3c; data sourced from the GTEx portal). These results parsimoniously enabled the possible exclusion of the second previously proposed SRO (downstream of PLP1 and containing IL1RAPL2; Figure 3a, grey box) from contributing to disease as it was not encompassed by two of the EONDT deletions in the current cohort, BAB2650 and BAB2595.
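The interval logic behind the EONDT-SRO can be reproduced from the coordinates in Table 3: intersect the deletions of the seven EONDT females, then subtract the deletions segregating in SPG2 families. The sketch below is illustrative only; the grouping of cases into the two sets reflects a reading of the text and Table 3 (Patient 5 excluded as described, published male-proband families taken as the SPG2 set), and the function names are hypothetical.

```python
def intersect(intervals):
    """Region shared by all intervals, or None if they do not all overlap."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None


def subtract(region, excluded):
    """Parts of `region` not covered by any interval in `excluded`."""
    pieces = [region]
    for ex_start, ex_end in excluded:
        remaining = []
        for s, e in pieces:
            if ex_end <= s or ex_start >= e:   # no overlap with this piece
                remaining.append((s, e))
                continue
            if ex_start > s:                   # part left of the exclusion
                remaining.append((s, ex_start))
            if ex_end < e:                     # part right of the exclusion
                remaining.append((ex_end, e))
        pieces = remaining
    return pieces


# chrX coordinates (GRCh37/hg19) copied from Table 3.
eondt = {
    "BAB2595": (100_866_604, 103_411_980),
    "BAB2650": (102_615_641, 103_309_503),
    "BAB8120": (101_029_649, 106_702_784),
    "BAB12522": (102_066_350, 105_409_822),
    "Patient 1": (100_659_116, 105_523_589),
    "Patient 2": (101_365_862, 105_847_036),
    "Patient 3": (100_907_884, 103_982_269),
}
spg2_families = {
    "BAB1379": (102_993_735, 103_510_087),
    "BAB1684": (102_957_288, 103_314_255),
    "H152": (103_009_829, 103_214_881),
    "Torisu patient": (103_018_951, 103_092_038),
    "Matsufuji family": (103_033_333, 103_066_901),
}

shared = intersect(list(eondt.values()))
sro = subtract(shared, list(spg2_families.values()))
for s, e in sro:
    print(f"chrX:{s:,}-{e:,} ({(e - s) / 1000:.0f} kb)")
```

Under these assumptions the shared region is bounded on the centromeric side by the BAB2650 deletion and on the telomeric side by the start of the BAB1684 deletion, which matches the interval reported in the text.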
3.4 Breakpoint-Junction sequence features
Potential mechanisms of SV formation, that is upstream rearrangement mechanisms, can be surmised or inferred from the study of breakpoint-junction sequence features and mutational signatures such as insertional complexities, SNVs, microhomology, microhomeology (highly similar but imperfect matches), and flanking sequence similarity (Beck et al., 2019; Conrad et al., 2010; Drier et al., 2013; Liu et al., 2017; Ottaviani, LeCain, & Sheer, 2014; Yang et al., 2013). We aligned the eight deletion junction sequences to the reference haploid genome and examined them for the aforementioned features.
Seven of the eight junctions yielded simple alignments, that is, they were resolved by a single join-point between two noncontiguous sequences, while one, the junction observed in BAB8120, was complex with at least two join-points linking a 22-bp insertion to the deletion breakpoints. The insertion in BAB8120 does not appear to have a perfect match in the reference haploid genome, consistent with a nontemplated insertion or a by-product of iterative template switching. Notably, due to the presence of the same 5-bp sequence unit twice in the patient's junction sequence flanking the 22-bp insertion (Figure 2f), two alignments to the reference haploid genome are feasible in this case, each exploiting different features and implying distinct potential mechanism(s) of formation (Figure S2); thus, the status of microhomology cannot be conclusively determined in BAB8120.
Most deletion junctions (7/8) were apparently simple DNA end-joining events, with six of those seven (~86%) revealing microhomology (≥1 bp, range: 1–127 bp) and two (~29%), BAB2614 and BAB12522, revealing microhomeology (Table 3). Increased flanking sequence similarity was observed in BAB2614 and in one of the two alignments of BAB8120 (Figure S3).
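As an illustration of how microhomology at a simple junction can be scored, the sketch below assumes that the junction read and both reference flanks have already been placed in the same coordinate frame (equal-length, position-aligned strings). The function name and the toy sequences are hypothetical; this is not the alignment procedure used in the study, and it detects perfect microhomology only (microhomeology would require tolerating mismatches).

```python
def microhomology(junction: str, prox_ref: str, dist_ref: str) -> str:
    """Return the bases at the join-point that match both reference flanks.

    All three strings must be equal length and position-aligned:
    prox_ref matches the junction from its left end up to the breakpoint,
    dist_ref matches it from the breakpoint to its right end.
    An empty string means a blunt join or an insertion at the junction.
    """
    n = len(junction)
    if not (len(prox_ref) == len(dist_ref) == n):
        raise ValueError("sequences must be position-aligned and equal length")

    # Extend the proximal match rightwards from the left end.
    prox_end = 0
    while prox_end < n and junction[prox_end] == prox_ref[prox_end]:
        prox_end += 1

    # Extend the distal match leftwards from the right end.
    dist_start = n
    while dist_start > 0 and junction[dist_start - 1] == dist_ref[dist_start - 1]:
        dist_start -= 1

    # Bases covered by both matches could derive from either side.
    return junction[dist_start:prox_end] if dist_start < prox_end else ""


# Toy sequences (hypothetical).
shared = microhomology("ACGTTGCAGGCT", "ACGTTGCATTAC", "TTAAAGCAGGCT")
blunt = microhomology("AAAACCCC", "AAAAGGGG", "TTTTCCCC")
```

In the toy example the junction bases `GCA` are present in both flanks, so the breakpoint can be placed anywhere within those three bases; this ambiguity is what is reported as microhomology.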
Junction sequences (range: 112–730 bp) were examined for SNVs. Variants were found in five instances involving four cases, BAB2614, BAB8201, BAB8204, and BAB12522 (Table S3 and Figure 2); three of the identified SNVs (in BAB2614, BAB8201, and BAB12522) were rare (1000 Genomes MAF: 0, 0, and 0.001, respectively), while two (in BAB2614 and BAB8201) were polymorphic (1000 Genomes MAF: 0.083 and 1, respectively). Notably, the variant identified in BAB8201 was also present in the junction sequences of his mother and grandmother (BAB8202 and BAB8203, respectively), who also carry the Xq22-PLP1 deletion, indicating inheritance in cis with the deletion (pedigree shown in Figure 1). We were unable to assess whether the identified rare SNVs arose de novo because parental/grandparental samples were unavailable. Examined junction sequences are provided in Table S3.
Overall, junction sequences studied herein reveal some common features with the five previously published Xq22-PLP1 deletion junction sequences (Inoue et al., 2002; Lee et al., 2007; Matsufuji et al., 2013; Torisu et al., 2012) including microhomology (≥1, range: 1–127 bp) and insertional complexities; out of all 13 sequenced junctions to date, explicit microhomology was observed in 10/13 (~77%) and insertional complexities were observed in 3/13 (~23%; Table 3).
3.5 HSIR intrachromosomal repeats and other large repeat structures at deletion breakpoints
Genome architectural features at CNV breakpoints can potentially reflect intrinsic structures rendering the region susceptible to genomic instability, DNA single and/or double-stranded breaks (DSBs), and to being a potential rearrangement substrate for DNA break repair (Carvalho & Lupski, 2016; Chen, Cooper, Ferec, Kehrer-Sawatzki, & Patrinos, 2010). We explored the genome architecture at deletion breakpoints in the haploid reference computationally via our in-house developed method for detecting HSIRs as described in Section 2. Breakpoints were also investigated for coincidence with repeat structures that were not part of the HSIR dataset, but were found on any one of the PMD-LCRs dataset (Lee, Inoue et al., 2006), the Segmental Dups dataset (Bailey, 2001, 2002), or the RepeatMasker dataset (Jurka, 2000). This analysis was applied to a compiled cohort of 17 Xq22-PLP1 deletions (eight new and nine from the published literature).
Nine deletions (out of 17, ~53%) had breakpoints embedded within HSIRs, with the greater part of the nine (5/9, ~56%) involving a single HSIR, RepX-i1010. This HSIR is 142 kb in size and pair-aligns in an inverted orientation (>95% identity) to an equally sized fragment on the p arm of chromosome X (chrX:52,103,260–52,245,416). In addition to those found within RepX-i1010, six other breakpoints were found near this HSIR at telomeric distances of ~2, ~13, ~17, 20, ~22, and ~28 kb (Figure 4a). Taken together, ~32% (11/34) of all Xq22-PLP1 deletion breakpoints are clustered within or near the telomeric half of RepX-i1010 in a region that is <90 kb in size. Such dense clustering of breakpoints is not seen anywhere else in the genomic vicinity implicated in Xq22-PLP1 deletions. We, therefore, propose this region to be a genomic instability hotspot (GI-hotspot) for DNA breakage (Figure 4a).
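One simple way to look for such clustering, given base-pair-resolution breakpoint positions, is to slide a fixed-size window along the sorted positions and record where it captures the most breakpoints. The sketch below is illustrative only: the function name and the toy positions are hypothetical, and this is not presented as the procedure used to delineate the GI-hotspot.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def densest_window(breakpoints: List[int],
                   window: int) -> Tuple[Optional[int], int]:
    """Return (start, count) of the window holding the most breakpoints.

    Candidate windows start at each breakpoint and cover `window` bases,
    i.e. positions start .. start + window - 1. Ties keep the first window.
    """
    pts = sorted(breakpoints)
    best_start, best_count = None, 0
    for i, start in enumerate(pts):
        # Index just past the last breakpoint inside this window.
        j = bisect_right(pts, start + window - 1)
        if j - i > best_count:
            best_start, best_count = start, j - i
    return best_start, best_count


# Toy positions (hypothetical), searched with a 100-bp window.
toy_start, toy_count = densest_window([10, 500, 520, 560, 590, 2000, 5000], 100)

# Proportion reported in the text: 11 of 34 breakpoints in the hotspot.
hotspot_fraction = 11 / 34
```

For 17 deletions there are 34 breakpoints in total, so 11 clustered breakpoints correspond to the ~32% quoted above.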

Other HSIRs were also identified at deletion breakpoints; however, they were not as recurrently implicated as RepX-i1010. The distribution of HSIRs and PMD-LCRs in the interval covering PLP1 deletions (chrX:100,650,000–106,703,000) and their coincidence with deletion breakpoints are illustrated in Figure S4. Notably, HSIRs that are part of inverted-repeat-pairs are more enriched in this locus than direct-repeat-pair HSIRs (Figure S4). Furthermore, all HSIRs that were identified at deletion breakpoints were part of IR pairs while no evidence was found for the involvement of any of the directly-oriented HSIR pairs in this region.
Three of the 17 Xq22-PLP1 deletion cases had breakpoints embedded within repeat-element pairs, including two from this cohort, BAB2614 and BAB12522. The deletion breakpoint-junction in BAB2614 merges a pair of directly oriented AluY elements, yielding a chimeric Alu hybrid (Figure 2g) and thus defining an Alu-Alu recombination (Song et al., 2018); the implicated AluY pair shares 86.8% identity with 0.6% gaps as detected by the Smith–Waterman pair-alignment algorithm. By contrast, the deletion junction in BAB12522 merges a pair of directly oriented LINEs (Figure 2h).
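For readers unfamiliar with how percent identity and percent gaps are derived from a Smith–Waterman local alignment, a compact sketch follows. The scoring values (match 2, mismatch -1, linear gap -2) and the toy sequences are assumptions chosen for illustration; the source does not state the parameters or the implementation behind the 86.8% identity figure.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[str, str]:
    """Return the best local alignment of a and b as two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; scores are floored at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def identity_and_gaps(top: str, bottom: str) -> Tuple[float, float]:
    """Percent identical columns and percent gap columns of an alignment."""
    length = len(top)
    matches = sum(x == y for x, y in zip(top, bottom))
    gaps = sum(x == "-" or y == "-" for x, y in zip(top, bottom))
    return 100 * matches / length, 100 * gaps / length


# Toy sequences (hypothetical) differing at a single position.
aln_top, aln_bottom = smith_waterman("ACGTTGCA", "ACGTAGCA")
pct_identity, pct_gaps = identity_and_gaps(aln_top, aln_bottom)
```

Both percentages are computed over alignment columns, which is one common convention; tools differ on whether gap columns are included in the denominator, so figures from different programs are not always directly comparable.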
Overall, out of the 17 cases of Xq22-PLP1 deletions known to date, the majority (12/17, ~71%) have at least one breakpoint residing in or near (maximum separation = 16.5 kb) an HSIR, four have breakpoints mapping to L1 LINE elements and two have breakpoints mapping to a pair of Alu elements (Table S2).
3.6 Non-B DNA motifs in RepX-i1010 and the GI-Hotspot
Non-B DNA motifs are short genomic DNA sequences, that is <100 nucleotides in size, that have the potential to adopt non-B DNA structure. Such motifs have been associated with genomic instability in several previous studies (Bacolla & Wells, 2009; Boyer, Grgurevic, Cazaux, & Hoffmann, 2013; Chen et al., 2013; Madireddy & Gerhardt, 2017; Oren et al., 2016; Wang & Vasquez, 2014; Wells, 2007; Wojciechowska, Napierala, Larson, & Wells, 2006; Zhao, Bacolla, Wang, & Vasquez, 2010). In particular, STRs, and IRs (including palindromic or potential cruciform forming sequences) have been found to be enriched near recurrent and nonrecurrent deletion CNV breakpoints (Akgun et al., 1997; Chen et al., 2013; Oren et al., 2016). We investigated RepX-i1010 and the GI-hotspot for seven non-B DNA motifs including G-Quadruplex forming sequences, STRs, A-phased repeats, mirror repeats (MR), IRs, direct repeats (DRs), and Z-DNA forming sequences (Cer et al., 2013; Wells, 2007). Non-B DNA motifs, specifically STRs, within large repeat structures (e.g., LCR) have been previously proposed to incite genomic instability and the formation of nonrecurrent-but-clustered CNV breakpoints (Li, Yen, & Shapiro, 1992; Liu et al., 2011). To understand the distribution and enrichment of non-B DNA motifs throughout the RepX-i1010 locus and the GI-hotspot region, we compared the density of these motifs in 10-kb intervals among three different reference genome sequence groups: (a) the distal half (70 kb) of RepX-i1010 where breakpoints cluster against the proximal half (70 kb) of it (Figure 4a); (b) RepX-i1010 against two flanking similarly sized (140 kb) control regions (Figure 4b); (c) the GI-hotspot region (90 kb in total, the distal 60 kb from RepX-i1010 + 30 kb telomeric to that) against two flanking similarly-sized (90 kb) control regions (Figure 4c). Heatmap visualization of the density of motifs in the three tested groups is illustrated in Figure 4.
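The density comparison rests on counting motifs per 10-kb interval in each region. A minimal sketch of that binning step is shown below; the function name, the use of motif start positions, and the toy coordinates are assumptions for illustration and are not taken from the study's pipeline.

```python
from typing import List


def window_counts(motif_starts: List[int], region_start: int,
                  region_end: int, window: int = 10_000) -> List[int]:
    """Count motifs per fixed-size window across a region.

    Coordinates are 1-based inclusive. A motif is assigned to the window
    containing its start position; motifs outside the region are ignored.
    The last window may be shorter than `window`.
    """
    length = region_end - region_start + 1
    n_windows = (length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in motif_starts:
        if region_start <= pos <= region_end:
            counts[(pos - region_start) // window] += 1
    return counts


# Toy motif positions (hypothetical) in a 30-kb region.
toy_counts = window_counts([1, 10_000, 10_001, 25_000, 30_000, 40_000],
                           region_start=1, region_end=30_000)
```

Each region then contributes one vector of per-window counts (for example, seven values for a 70-kb half of RepX-i1010), and pairs of vectors are what the rank-based comparisons operate on.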
Analysis of the first group indicated no significant enrichment in the distal half of RepX-i1010 as compared to the proximal half (one-sided Wilcoxon rank-sum test); however, the distribution of IR motifs differed significantly between the two halves (two-sided Wilcoxon rank-sum test; p = .017). Analysis of the second group revealed a significant enrichment of STR motifs within RepX-i1010 as compared to both flanking control regions (one-sided Wilcoxon rank-sum test; p = .021 with the proximal control and .009 with the distal control). MR and DR motifs were also enriched within RepX-i1010, but only in comparison to the distal control region (one-sided Wilcoxon rank-sum test; p = .043 and .021, respectively). The third group analysis yielded enrichment of STR motifs in the GI-hotspot as compared to the downstream (distal) control region (one-sided Wilcoxon rank-sum test; p = .023) and enrichment of Z-DNA motifs in the GI-hotspot as compared to the upstream (proximal) control region (one-sided Wilcoxon rank-sum test; p = .028). The distribution of IR motifs differed significantly between the GI-hotspot and the proximal control region (two-sided Wilcoxon rank-sum test; p = .015). The p values of other tests are indicated in Table 4.
Group | Comparison | Test (p value) | IR | STR | MR | DR | GQ | ZDNA | APR
---|---|---|---|---|---|---|---|---|---
Group 1 | RepX-within: distal vs. proximal | Two-sided | .01746* | 1 | .8973 | .6752 | .3031 | .1762 | .726
Group 1 | RepX-within: distal vs. proximal | One-sided | .9939 | .5516 | .6018 | .7456 | .8784 | .08808 | .363
Group 2 | RepX vs. proximal flank | Two-sided | .2309 | .04161* | .4357 | .3676 | .5506 | .4886 | .2696
Group 2 | RepX vs. proximal flank | One-sided | .8933 | .02081* | .8715 | .1838 | .2753 | .7709 | .8756
Group 2 | RepX vs. distal flank | Two-sided | 1 | .01854* | .08597 | .04195* | .2896 | .5658 | .0606
Group 2 | RepX vs. distal flank | One-sided | .5 | .00927* | .04298* | .02098* | .1448 | .7331 | .973
Group 3 | Hotspot vs. proximal flank | Two-sided | .01465* | .5006 | .9292 | .9285 | .3295 | .05537 | 1
Group 3 | Hotspot vs. proximal flank | One-sided | .9943 | .2503 | .5706 | .5712 | .8572 | .02769* | .5375
Group 3 | Hotspot vs. distal flank | Two-sided | .06151 | .04525* | .5049 | .5047 | .2892 | .3643 | .1811
Group 3 | Hotspot vs. distal flank | One-sided | .9749 | .02262* | .2525 | .2524 | .1446 | .1821 | .9236
- Note: The three tested groups were: (a) the distal half of RepX-i1010 (70 kb) tested against the proximal half of RepX-i1010 (70 kb); (b) 140 kb of RepX-i1010 tested against 140 kb of proximal sequence and 140 kb of distal sequence; and (c) a 90 kb genomic instability hotspot tested against 90 kb of proximal sequence and 90 kb of distal sequence.
- Abbreviations: APR, A-phased repeats; DR, direct repeats; GQ, G-quadruplexes; IR, inverted repeats; MR, mirror repeats; STR, short tandem repeats; ZDNA, Z-DNA.
- * p < .05.
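The comparisons in the table are Wilcoxon rank-sum tests on per-window motif counts. The sketch below implements the test with a normal approximation, tie correction, and continuity correction using only the standard library. It is an illustration under assumptions: the source does not state whether exact or approximate p values were used, the function name is hypothetical, and the example counts are invented.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float],
                  alternative: str = "two-sided") -> Tuple[float, float]:
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, p value). `alternative` is "two-sided" or
    "greater" (x tends to be larger than y, i.e. enrichment in x).
    """
    values = sorted(list(x) + list(y))
    ranks, tie_term, i = {}, 0, 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2   # midrank of tied positions i+1..j
        tie_term += (j - i) ** 3 - (j - i)   # tie correction term
        i = j

    n1, n2 = len(x), len(y)
    n = n1 + n2
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:                             # all observations identical
        return u, 1.0

    sd = math.sqrt(var)
    if alternative == "greater":
        z = (u - mean - 0.5) / sd            # continuity correction
        return u, 0.5 * math.erfc(z / math.sqrt(2))
    if alternative == "two-sided":
        diff = abs(u - mean)
        z = (diff - 0.5) / sd if diff > 0 else 0.0
        return u, min(1.0, math.erfc(z / math.sqrt(2)))
    raise ValueError("alternative must be 'two-sided' or 'greater'")


# Invented per-window counts: a test region versus a flanking control.
region = [4, 5, 6]
control = [1, 2, 3]
u_stat, p_two_sided = rank_sum_test(region, control)
_, p_enriched = rank_sum_test(region, control, alternative="greater")
```

The one-sided form asks specifically whether counts are higher in the test region (enrichment), whereas the two-sided form only asks whether the two distributions differ, which is why a motif class such as IR can be significant in the two-sided test while showing no enrichment in the one-sided test.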
In summary, we found evidence for the potential role of STR and Z-DNA motifs in inciting instability and leading to Xq22-PLP1-DEL formation as they were significantly enriched at the proposed genomic instability hotspot where ~47% of proximal breakpoints appeared to be clustered.
3.7 XCI status in blood tissue of female probands
XCI testing on peripheral blood-derived DNA using microsatellite trinucleotide polymorphic alleles of the androgen receptor gene, AR, revealed heterogeneous results among the five studied female cases. Four of them (BAB2614, BAB2595, BAB2650, and BAB12522) showed random patterns of XCI in peripheral blood, while the fifth (BAB8120) revealed XCI skewing, with >90% inactivation of the deletion-bearing X chromosome (the paternal chromosome as revealed by SNP-phasing, data not shown). Data are summarized in Table 3.
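An XCI ratio from an AR-based assay is commonly derived from allele peak areas measured with and without methylation-sensitive digestion, where the digested signal reflects the methylated (inactive) copy. The sketch below uses that widely applied normalisation as an assumption; the source does not describe the calculation, and the function names and peak areas are hypothetical. Only the >90% skewing threshold comes from the text.

```python
def inactive_fraction(d1: float, u1: float, d2: float, u2: float) -> float:
    """Fraction of cells in which allele 1 lies on the inactive X.

    d1, d2: peak areas of alleles 1 and 2 after digestion.
    u1, u2: peak areas of the same alleles without digestion.
    Each digested peak is normalised to its undigested peak to correct
    for unequal amplification of the two alleles.
    """
    if u1 <= 0 or u2 <= 0:
        raise ValueError("undigested peak areas must be positive")
    r1, r2 = d1 / u1, d2 / u2
    if r1 + r2 == 0:
        raise ValueError("no signal after digestion")
    return r1 / (r1 + r2)


def is_skewed(fraction: float, threshold: float = 0.90) -> bool:
    """Skewed if either X is inactive in more than `threshold` of cells."""
    return max(fraction, 1 - fraction) > threshold


# Hypothetical peak areas.
skewed_case = inactive_fraction(d1=950, u1=1000, d2=50, u2=1000)
random_case = inactive_fraction(d1=500, u1=1000, d2=480, u2=1000)
```

Because the measurement is made on blood-derived DNA, a ratio obtained this way describes XCI in peripheral blood only and need not reflect the pattern in other tissues.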
4 DISCUSSION
We investigated the molecular etiology and genotype-phenotype correlation in unrelated subjects from a cohort of Xq22-PLP1 deletions consisting of eight cases (five females and three males; Figure 1). Deletions of PLP1 are rare, and this study perhaps represents the largest investigated cohort of Xq22 deletions extending over PLP1 to date. Female cases presented with clinical findings that were manifest as an entity defined here as EONDT, consistent with clinical findings associated with large heterozygous Xq22-PLP1 deletions in the literature (Yamamoto et al., 2014; Table 2). We found evidence for the potential contribution to disease severity and EONDT in female cases of a genomic interval upstream of PLP1, a region previously implicated (Figure 3a). Our genomic analysis of deletion junction sequences further revealed that microhomology (≥1 bp) is a frequently observed feature among Xq22-PLP1 deletion rearrangements (Figure 2) and that large IR structures as well as non-B DNA motifs, specifically STRs, have a potential role in inciting genomic instability and stimulating Xq22 deletion formation near the PLP1 locus (Figure 4). In addition, the sequencing and alignment of deletion breakpoint junctions provided experimental data implicating an apparent breakpoint hotspot that was not delineated by a computational approach alone (Figure 4).
4.1 Xq22 contiguous-gene deletion syndrome in females
Disease manifestation in the four EONDT female patients that were clinically investigated in this study echoes the characteristic clinical features described by Yamamoto et al. (2014) in their cohort of three large heterozygous Xq22-PLP1 deletion cases (Table 2 and Supporting Additional File S3), hence enabling us to examine for genotype/phenotype associations among both cohorts. Individuals from both cohorts were found to present with core clinical characteristics including neonatal hypotonia, severe DD/ID, behavioral abnormalities and strabismus. A photograph of the face is available in a published clinical report on one patient from our cohort (Brender et al., 2015) and it revealed dysmorphic features common with what was found in the other three published cases (Yamamoto et al., 2014) including a triangular face, large forehead, strabismus, and prominent jaw. This suggests that facial dysmorphism may also emerge as a possible core recognizable feature of EONDT, although one of our patients (BAB12522) did not have dysmorphism.
Brain MRI was obtained at different ages in the two cohorts thus it is challenging to directly compare findings; however, there are common features in both studies including nonspecific increased white matter signal intensities, delayed myelination, and thin corpus callosum. Increased white matter signal intensity on T2-weighted images of MRI and spasticity are features common to male subjects with SPG2 and subtle but similar features can be present in females with PLP1 deletion (Inoue, 2005; Inoue et al., 2002; Matsufuji et al., 2013; Torisu et al., 2012; Table 2). However, the other core features of EONDT are distinct in nature and are not parsimoniously explained by PLP1 on its own. Moreover, EONDT can be distinguished from PMD, an allelic disorder of SPG2, by the distinctive severity and early onset of disease in EONDT females as compared to mostly nonsymptomatic or late-onset-disease females from PMD families.
Variable expressivity in females carrying heterozygous Xq22-PLP1 deletions is proposed to be related to interindividual differences in XCI patterns as well as deletion size and gene content. Skewed XCI has been one molecular explanation for manifesting female carriers of Xq22-PLP1 deletion CNV, and indeed skewed XCI patterns are seen in the peripheral blood of some EONDT patients (Table 3); however, this skewing is not necessarily reflected in the brain due to possible intraindividual XCI variation. Hence, although it may have a role, we are unable to provide any evidence for consistent XCI skewing potentially contributing to neurological disease severity.
Yamamoto et al. (2014) proposed two SROs flanking PLP1 to potentially contribute to the distinct features of the disease in EONDT females with large Xq22-PLP1 deletions (>3 Mb; Figure 3a, grey boxes). They suggested BEX3, located in an ~700 kb SRO upstream of PLP1 that contains 14 genes in total (chrX:102,233,526–102,957,289), as the most likely contributing candidate gene as it is preferentially expressed in the brain and it encodes the NGFRAP1 protein, also called p75NTR-associated cell death executor, NADE, disruption of which may lead to abnormal signal transduction and aberrant neuronal development (Calvo et al., 2015; Mukai et al., 2000). However, due to the limited number of EONDT female patients, other brain-expressed genes encompassed within the SRO containing BEX3 could not be ruled out from possible contribution to disease either alone or as part of a mutational burden along with BEX3. In addition, they proposed a second SRO (chrX:102,993,719–103,982,269), downstream of PLP1 and containing a single gene, IL1RAPL2, as a possible candidate region underlying EONDT in females with large Xq22 deletions. Here, we investigated the two suggested SROs in similarly affected patients and found further evidence that supports the potential contribution of the SRO upstream of PLP1 to distinct disease features of EONDT in females. We found a refined interval (chrX:102,615,641–102,957,288) of the proposed upstream SRO (Yamamoto et al., 2014) to be shared among deletions of all EONDT cases and absent from deletions of all SPG2 cases (Figure 3a, EONDT-SRO). This further emphasizes a possible role for the EONDT-SRO in contributing to disease potentially via haploinsufficiency of one, some or all of the encompassed genes or by disrupting an enhancer of a disease-causing gene in the region. Importantly, the refined interval contains one of the candidate genes proposed by Yamamoto et al. (2014) to be associated with the neurological components of this disease, BEX3. 
In addition to being encompassed by deletions of EONDT female cases, BEX3 has the highest expression levels in the brain as compared to the other five genes within the EONDT-SRO (all expressed in the brain; Figure 3c). Moreover, BEX3 (among other contiguous centromeric genes and one telomeric gene, RAB40A) has been associated with severe neurological impairment and reduced cerebral volume in a male patient harboring a hemizygous deletion that is not inclusive of PLP1 (Shirai, Higashi, Shimojima, & Yamamoto, 2017). Nevertheless, further studies of patients with BEX3 intragenic deletions or point mutations or studies of a knockout mouse model may be required to support its involvement in neurological disease.
The EONDT-SRO contains MORF4L2 (and its antisense transcript MORF4L2-AS1), a brain-expressed gene that is proposed to escape XCI, making it prone to dosage sensitivity in 46,XX females (Zhang et al., 2013). Moreover, Labonne et al. (2016) reported a female patient with some clinical features that overlap with features of EONDT and a heterozygous Xq22 deletion that extends over GLRA4 and two of the EONDT-SRO genes, MORF4L2 and TCEAL1 (Figure 3a, deletion framed in yellow). They maintained that GLRA4 is likely the disease-causing gene leading to the phenotype of their patient. We found GLRA4 to be outside of the EONDT-SRO and to be encompassed by the deletion of an SPG2 family (affected male and female) from the literature with no evidence of any of the EONDT features in affected individuals (Inoue et al., 2002; Figure 3a); this leads us to the interpretation that GLRA4 does not lead to EONDT features when deleted in either a heterozygous or a hemizygous state. It also suggests that one or both of the EONDT-SRO genes encompassed by the deletion in the patient of Labonne et al. (2016), MORF4L2 and TCEAL1 (Figure 3a, DGDP084; deletion framed in yellow), are potentially contributing as disease-causing candidates.
We cannot exclude the potential contribution of any of the genes encompassed by the EONDT-SRO to some or all of the phenotypic features of EONDT. It is likely that genes within the EONDT-SRO are contributing, in addition to PLP1, to the overall characteristics of an emerging contiguous gene deletion syndrome observed as EONDT in female cases (Campbell et al., 2012; Schmickel, 1986). An intragenic deletion or a rare likely damaging SNV/indel in any one of the EONDT-SRO genes could potentially provide critical evidence to further dissect this contiguous gene syndrome. It is possible that in females with large deletions, PLP1 is leading to features common to SPG2, such as white matter MRI findings and spasticity (an expected outcome of PLP1 LoF), while other genes in the EONDT-SRO are contributing to the additional distinct features, including the severe DD/ID, the behavioral abnormalities, and the distinct facial dysmorphology. The proposed contribution of PLP1 to SPG2 features of the disease in EONDT patients is further evidenced by the lack of these features in the patient described by Labonne et al. (2016), whose Xq22 deletion does not encompass PLP1.
4.2 Mechanisms mediating the formation of Xq22-PLP1 deletions
The rarity of Xq22-PLP1 deletions has been proposed to be due to (a) intolerance to the loss of contiguous genes and (b) possibly different mechanisms leading to their formation as opposed to the more common mutational events in this locus such as Xq22-PLP1 duplications and triplications (Inoue et al., 2002). Understanding genomic features of Xq22-PLP1 deletion breakpoints is, therefore, essential for the inference of potential triggers inciting DNA damage at this locus, including intrinsic genome architecture causing genomic instability, and for inference of the rearrangement mechanisms leading to deletion CNV formation and potentially contributing to the prevalence of the disease. Due to the limited number of cases previously identified and studied, a comprehensive analysis of genomic features of Xq22-PLP1 deletions has not been possible. Earlier studies of five Xq22-PLP1 deletions suggested microhomology as a prominent junction feature and nonhomologous end joining (NHEJ) as a potential mechanism of formation (Inoue et al., 2002; Matsufuji et al., 2013; Torisu et al., 2012).
In line with previous observations and contentions, here we also find microhomology in the majority of junctions (75%) and find features of the end joining mechanisms of DSB repair, NHEJ and microhomology-mediated end joining (MMEJ), to be common among new Xq22-PLP1 deletions. Nevertheless, features of break-induced replication (BIR) mechanisms such as Alu-Alu mediated rearrangement (AAMR) and fork stalling and template switching/microhomology-mediated BIR (FoSTeS/MMBIR) are also apparent (Beck et al., 2019; Carvalho & Lupski, 2016; Carvalho et al., 2011; Carvalho et al., 2013; Gu et al., 2015; Hsiao et al., 2015; Lee et al., 2007; Song et al., 2018; Zhang et al., 2009). In fact, the SNVs, microhomology, microhomeology, and the chimeric Alu hybrid at the breakpoint-junction of BAB2614 (Figure 2g) are all features indicative of AAMR (Song et al., 2018). The breakpoint-junction in BAB12522 as well reveals features of BIR including microhomology, microhomeology, a pair of LINEs, and an SNV. Moreover, the complexities observed in the junction sequence of BAB8120 (5-bp DR flanking a 22-bp insertion as shown in Figure 2f) indicate the probable involvement of serial replication slippage or FoSTeS/MMBIR (Figure S2) although further narrowing of the most parsimonious explanatory mechanism leading to the deletion formation is challenging to discern in this case given the presence of a pair of 5-bp DRs.
While taking microhomology into consideration only when ≥2 bp in size (as 1 bp apparent microhomology can potentially be due to NHEJ and end processing), we propose that NHEJ and MMEJ collectively mediate at least seven out of the 13 sequenced Xq22-PLP1 deletions to date, FoSTeS/MMBIR may play a role in at least three where insertional complexities and iterative template switching are apparent and AAMR is implicated in the two cases where Alu-Alu junctions are observed. The diversity of mechanisms potentially underlying the formation of Xq22-PLP1 deletions argues against a mechanistic limiting factor in their formation and supports the contention that other factors may be contributing to the rarity of these cases such as lethality in cases of large hemizygous deletions or milder disease that does not call for genetic testing in females with small non-EONDT deletions.
4.3 Hotspot of genomic instability
Although the observed Xq22-PLP1 deletions (16 deletions) are highly variable in size, more than 30% of all breakpoints taken together were found clustered in an interval that is less than 90 kb in size, indicating a genomic instability hotspot within this interval. One-sided grouping or clustering of nonrecurrent breakpoints has been observed in other loci, including distal breakpoints of duplications at the MECP2 locus, which were clustered in an LCR-rich 215-kb genomic interval on Xq28, and the grouping of the triplication end site of Xq22-PLP1 DUP-TRP/INV-DUP events near a pair of PMD-LCRs, PMD-LCRA1a and PMD-LCR-A1b (Beck et al., 2015; Carvalho et al., 2009). LCRs in the MECP2 locus and the Xq22-PLP1 DUP-TRP/INV-DUP events were proposed to stimulate replicative repair mechanisms leading to the formation of complex genomic rearrangement events. Here, we identified a GI-hotspot via the experimental approach of mapping deletion breakpoints to base-pair level and computationally found two architectural features overlapping the GI-hotspot of Xq22-PLP1 deletions: an inverted paralogous repeat, RepX-i1010, and significant enrichment of non-B DNA motifs, namely STR and Z-DNA motifs. RepX-i1010 was not detected in any of the publicly available repeat datasets; however, it was functionally delineated in this study due to its frequent coincidence with base-pair-resolution mapped breakpoints and was further confirmed by our computational approach for identifying HSIRs. This illustrates that experimental data, and the mapping of CNV breakpoints from the personal genomes of individual patients, not only supplement computational approaches but also allow the elucidation of breakpoint-junction clustering and the discovery of intrinsic genomic architecture causing instability that could otherwise be missed due to methodological limitations.
IR structures are proposed to incite genomic instability (Dittwald et al., 2013) and to stimulate single-ended DSB repair (Carvalho & Lupski, 2016) while STR and Z-DNA motifs are predicted to form non-B DNA structures rendering genomic instability, DNA fragility and consequently leading to genomic disorders (Bacolla, Tainer, Vasquez, & Cooper, 2016; Chen et al., 2013; Javadekar, Yadav, & Raghavan, 2018; Li et al., 1992; Nambiar, Srivastava, Gopalakrishnan, Sankaran, & Raghavan, 2013; Wojcik et al., 2012). Moreover, Z-DNA motifs have been specifically associated with genomic instability rendering the formation of large deletions (Wang, Christensen, & Vasquez, 2006). The coincidence of both these features in a small interval where nonrecurrent breakpoints cluster parsimoniously explains the heterogeneity of mutational signatures seen in the rearrangement products. We propose that each of these architectural features likely incites a distinct environment of genomic instability (e.g., DSBs, fork stalling and/or DNA nicks) and potentially stimulates the employment and progression of different mechanisms of repair (e.g., NHEJ, MMEJ, DS-MMEJ, MMBIR/FoSTeS, and AAMR).
5 CONCLUSION
This study provides mechanistic insights into Xq22-PLP1 deletions both at the upstream level of rearrangement mechanisms leading to deletion formation and the downstream level of genic mechanisms and gene(s) contributions to trait manifestations. It provides further evidence for an Xq22 contiguous-gene deletion syndrome in females and elaborates on the recognizable clinical features of this syndrome. It also emphasizes that patient studies can enable experimental and functional characterization of mutational events and uncover intrinsic properties of the human genome rendering genomic instability.
ACKNOWLEDGMENTS
We thank all individuals, their families, and the referring physicians who submitted samples for testing. No additional compensation was received for these contributions. The authors would like to thank the Genome Aggregation Database (gnomAD) and the groups that provided exome and genome variant data to this resource. A full list of contributing groups can be found at https://gnomad.broadinstitute.org/about. This work was supported in part by the US National Human Genome Research Institute (NHGRI)/National Heart Lung and Blood Institute (NHLBI) UM1 HG006542 to the Baylor Hopkins Center for Mendelian Genomics (BHCMG), by the US National Institute of Neurological Disorders and Stroke (NINDS) grants R01 NS058529 and R35 NS105078 to J.R. Lupski and R01 NS058978 to G.M. Hobson, by the National Institute of General Medical Sciences (NIGMS) grants, R01 GM106373 to J.R. Lupski and by the PMD Foundation to G.M. Hobson. G.M. Hobson and the Nemours Biomolecular Core Lab are supported by the National Human Genome Research Institute (NHGRI), Centers of Biomedical Research Excellence (COBRE) grant, P30 GM114736. The Nemours Biomolecular Core Lab is also supported by the IDeA Networks of Biomedical Research Excellence (INBRE) grant, P20 GM103446. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS.
CONFLICTS OF INTEREST
Baylor College of Medicine (BCM) and Miraca Holdings have formed a joint venture with shared ownership and governance of Baylor Genetics (BG), which performs clinical microarray analysis and clinical exome sequencing. C.G.-J. is currently a full-time employee of Regeneron Pharmaceuticals Inc. and receives stock options as part of compensation. J.R.L. serves on the Scientific Advisory Board of BG. J.R.L. has stock ownership in 23andMe, is a paid consultant for Regeneron Pharmaceuticals, and is a coinventor on multiple United States and European patents related to molecular diagnostics for inherited neuropathies, eye diseases, and bacterial genomic fingerprinting. The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from molecular genetic testing offered in the Baylor Genetics Laboratories. Nemours Biomedical Research derives revenue from molecular genetic testing offered in the Nemours Molecular Diagnostics Laboratory. The Greenwood Genetic Center receives revenue from diagnostic testing performed in the GGC Molecular Diagnostic Laboratory. The other authors declare no competing financial interests.
AUTHOR CONTRIBUTIONS
H.H., C.M.B.C., G.M.H., and J.R.L. participated in the design of the study. H.H. performed the majority of data analyses and experiments. H.H., C.M.B.C., D.P., and G.M.H. prepared the manuscript with J.R.L. F.S.C. participated in the design of the study and performed some experiments and analyses. C.G.J. developed the HSIR computational method while at Baylor College of Medicine. L.B. referred a patient to this study, provided their clinical information, and performed the XCI studies on the referred case. S.S.M., M.A.M., and A.H.K. referred patients to this study and provided their clinical information. J.A.L. participated in the design of the study, performed some experiments, and together with S.W.C. and P.F. facilitated anonymized sample studies and performed the XCI studies on three of the female samples. S.N. and S.S. referred a patient to this study and provided their clinical information. P.P. and P.W. performed some experiments and analyses. T.A. and K.S. performed molecular studies on one sample. J.R.J. and M.J.F. performed molecular studies and XCI on one sample and participated in data interpretation. J.T. performed in silico analyses. A.P. performed and interpreted MRI on one patient. X.S. performed the microhomeology and breakpoint-junction sequence-similarity analyses. A.D.W. and C.J. performed molecular experiments on two samples. K.I. reviewed the clinical information and helped with clinical interpretation. F.Z. participated in the design of the study and performed some experiments. G.M.H. provided samples and data for this study, participated in data interpretation, and critically revised the manuscript. D.P. reviewed and analyzed the clinical information, prepared the clinical table, and critically revised the manuscript. C.M.B.C. performed experiments, participated in data interpretation, and critically revised the manuscript. J.R.L. participated in data analysis and interpretation, oversaw the manuscript preparation, and critically revised the manuscript. All authors approved the final submitted version of the manuscript and are accountable for the accuracy and integrity of the work.
Open Research
DATA AVAILABILITY STATEMENT
Data generated during this study and the datasets supporting the conclusions of this article are included within the article and its supporting additional files. The GTEx data used for the analyses described in this manuscript were obtained from Gene expression, GTEx Batch Gene Query, the GTEx Portal (“GTEx Portal - The Broad Institute of MIT and Harvard,” n.d.) on 11/11/18.