Identification of copy number variants with genome sequencing: Clinical experiences from the NYCKidSeq program
Co-first authors: Katherine E. Bonini and Amanda Thomas-Wilson.
Abstract
Copy number variants (CNVs) play a significant role in human disease. While chromosomal microarray has traditionally been the first-tier test for CNV detection, use of genome sequencing (GS) is increasing. We report the frequency of CNVs detected with GS in a diverse pediatric cohort from the NYCKidSeq program and highlight specific examples of its clinical impact. A total of 1052 children (0–21 years) with neurodevelopmental, cardiac, and/or immunodeficiency phenotypes received GS. Phenotype-driven analysis yielded a diagnostic result for 183 (17.4%) participants. A CNV was diagnostic in 20.2% of these participants (37/183), and diagnostic CNVs ranged in size from 0.5 kb to 16 Mb. Among diagnosed participants with phenotypes in more than one category, 5/17 (29.4%) were solved by a CNV finding, suggesting a high prevalence of diagnostic CNVs in participants with complex phenotypes. Thirteen participants with a diagnostic CNV (35.1%) had previously uninformative genetic testing, and for nine of them this testing included a chromosomal microarray. This study demonstrates the benefits of GS for reliable detection of CNVs in a pediatric cohort with variable phenotypes.
1 INTRODUCTION
Copy number variants (CNVs), unbalanced gains or losses of genomic DNA, account for a significant proportion of genetic diversity.1 While CNVs can contribute to adaptive traits or have no phenotypic impact, they are also a common mechanism of disease.2 It is estimated that 15%–20% of individuals with global developmental delay (GDD), intellectual disability (ID), and/or autism have a disease-causing CNV detectable by chromosomal microarray (CMA).3, 4 Among monogenic disease genes, intragenic CNVs have an estimated prevalence of approximately 10% and can account for 35% of causal variants for some phenotypes.5 Reliable detection of CNVs is therefore critical.
A variety of cytogenetic and molecular methods are used to detect CNVs. Until recently, array-based comparative genomic hybridization (aCGH) or single nucleotide polymorphism (SNP) chromosomal microarrays (CMA) were considered the gold standard and used as first-tier tests for individuals with neurodevelopmental disorders.6 Advances in next generation sequencing (NGS), however, enable accurate and reliable CNV detection at greater resolution.7-10 The performance of genome sequencing (GS) is at least equivalent to that of CMA for detection of large CNVs11-13, and GS can detect CNVs below the limit of detection of array- or oligo-based methodologies. As costs for DNA sequencing, data analysis, and data storage decline, the utility of GS as a first-tier test in the diagnostic pipeline is increasingly appreciated.14, 15 GS or exome sequencing (ES) is now recommended by the American College of Medical Genetics and Genomics (ACMG) as first- or second-tier testing for individuals with ID, GDD, autism, and/or multiple congenital anomalies, replacing CMA.16
While the technical feasibility of CNV detection via GS has been explored, few studies have reported the frequency of causal CNVs detected via GS in large clinical cohorts. The NYCKidSeq program, a member of the Clinical Sequencing Evidence-Generating Research (CSER) Consortium, seeks to understand the utility of GS in diverse pediatric populations.17 The program consists of two studies: a randomized controlled trial (RCT; NCT03738098) evaluating the utility of GS, targeted gene panels (TGPs), and a digital genetic counseling tool; and TeleKidSeq, a pilot study assessing GS and results communication using telehealth.18, 19 We previously reported the diagnostic yield of GS compared to TGP using a fully paired study design. Among the 642 probands who received both GS and TGP, GS resulted in 106 (16.5%) molecular diagnoses compared to 52 (8.1%) by TGP (p < 0.001). Nineteen of the 106 diagnoses (17.9%) included CNVs, and most diagnostic CNVs (89.5%) were detected by GS alone.20 To further explore the role of CNVs in childhood disorders, we leverage data from all NYCKidSeq program participants who received GS (N = 1052) to characterize the diagnostic yield of CNVs. We also highlight clinically instructive examples of CNV diagnoses made by GS. This work contributes to the evidence underlying the clinical utility of GS, particularly for identifying disease-causing CNVs.
2 MATERIALS AND METHODS
2.1 Sample
Procedures of the RCT and TeleKidSeq studies have been described previously.18, 19 Individuals were eligible if they were 0–21 years of age, had a suspected genetic etiology for a neurologic, immunologic, and/or cardiac disorder, and had at least one parent who was English- or Spanish-speaking. Individuals were ineligible if a known molecular diagnosis was present or if there was an apparent genetic diagnosis for their phenotype; however, individuals with previously uninformative genetic testing could participate. Enrollment of individuals from racial/ethnic minority groups (non-White) and/or medically underserved areas was prioritized. Genomic DNA was extracted from peripheral blood or saliva from the proband and, when available, biological parent(s). Clinical information was shared with the laboratories, including a phenotype checklist completed by the referring provider, a recent chart note, a pedigree, and, if applicable, previous genetic testing reports.
Informed written consent was provided by all participants. Ethics approval was obtained from the Institutional Review Boards at the Icahn School of Medicine at Mount Sinai and the Albert Einstein College of Medicine.
2.2 Genome sequencing and CNV identification
GS was conducted at the New York Genome Center (NYGC) for 972 participants and at the Rady Children's Institute for Genomic Medicine (Rady) for 80 participants (Supplemental A). Methods at NYGC have been described.18, 20 Data analysis was performed with a proprietary analytical pipeline that used the Canvas and Manta algorithms (Illumina) to call CNVs. Variants were assessed in the proband and compared to parents (when applicable) and internal controls. CNVs were filtered to those with an allele frequency <2% in internal databases; a CNV was considered overlapping if it was located on the same chromosome as a previously identified CNV, with start and stop positions within 1 kilobase (kb) of that CNV's start and stop. Only CNVs overlapping genes in the professional version of the Human Gene Mutation Database (HGMD) were interpreted, and only findings in genes with established gene-disease relationships were reported. CNVs were prioritized based on the indication for testing, mode of inheritance, and variant frequency in publicly available databases such as gnomAD SVs and the Database of Genomic Variants (DGV). Reported CNVs were confirmed using SNP microarray (Infinium Omni2.5, Illumina) or quantitative polymerase chain reaction (PCR).
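The NYGC overlap and frequency rules can be made concrete with a short sketch. It assumes, for illustration, that "within 1 kb" applies to the start and the stop breakpoint independently, and that the internal allele frequency is the number of matching internal calls divided by 2 × the number of internal samples (one allele per matching call on a diploid chromosome). `CNVCall`, `same_cnv`, `internal_frequency`, and `passes_frequency_filter` are hypothetical names, not part of the NYGC pipeline. As described, the rule does not require the calls to share a type (deletion or duplication), so the sketch does not either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNVCall:
    chrom: str
    start: int  # breakpoint positions in bp
    end: int


def same_cnv(a, b, tolerance_bp=1000):
    """Same chromosome, with start AND stop each within tolerance_bp."""
    return (
        a.chrom == b.chrom
        and abs(a.start - b.start) <= tolerance_bp
        and abs(a.end - b.end) <= tolerance_bp
    )


def internal_frequency(candidate, internal_calls, n_samples):
    """Assumed allele frequency: matching calls / (2 * n_samples)."""
    matches = sum(1 for call in internal_calls if same_cnv(candidate, call))
    return matches / (2 * n_samples)


def passes_frequency_filter(candidate, internal_calls, n_samples, max_af=0.02):
    """Keep the candidate only if its internal frequency is below max_af."""
    return internal_frequency(candidate, internal_calls, n_samples) < max_af


# Hypothetical calls, for illustration only
candidate = CNVCall("1", 1_000_000, 1_050_000)
internal_calls = [
    CNVCall("1", 1_000_500, 1_050_800),  # both breakpoints within 1 kb: match
    CNVCall("1", 1_002_000, 1_050_000),  # start differs by 2 kb: no match
    CNVCall("2", 1_000_000, 1_050_000),  # different chromosome: no match
]
print(internal_frequency(candidate, internal_calls, n_samples=10))       # 0.05
print(passes_frequency_filter(candidate, internal_calls, n_samples=10))  # False
```

With a larger internal cohort (for example 100 samples), the same single match gives a frequency of 0.005 and the candidate would pass the <2% filter.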
At Rady, libraries were prepared with an Illumina PCR-free library prep kit and sequenced on Illumina HiSeq 2500 (rapid run mode), HiSeq 4000, or NovaSeq 6000 instruments with an average genomic coverage of 35× and/or >10× coverage of the coding bases. The Illumina DRAGEN (Dynamic Read Analysis for GENomics) pipeline (GRCh37/hg19) was used for alignment and variant calling. Sanger sequencing was performed on reported single nucleotide variants (SNVs), and parental samples were tested for variants of interest. Orthogonal confirmation using multiplex ligation-dependent probe amplification (MLPA) was performed only if the variant met internal criteria (Supplemental B). The DRAGEN Bio-IT Platform (v.2.5.1, Illumina), with the integrated Manta structural variant (SV) caller, was used to call CNVs. CNVs were characterized via CNVnator and filtered to those with an allele frequency <2% based on internal data, gnomAD-SV, dbVar, and DGV. Only CNVs in the coding regions of known disease-causing genes, and those that overlapped or resided within 1 kb of coding exons, were interpreted.
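The Rady interpretation rule (CNVs overlapping, or lying within 1 kb of, coding exons of known disease genes) amounts to an interval-distance test. The sketch below is illustrative only: `Region`, `distance_bp`, and `near_coding_exon` are hypothetical names, coordinates are assumed to be 1-based and inclusive, and a real pipeline would take exon coordinates from a gene annotation rather than a hand-written list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def distance_bp(a, b):
    """0 if the regions overlap, else the gap between their nearest ends.

    Returns None when the regions are on different chromosomes.
    """
    if a.chrom != b.chrom:
        return None
    return max(b.start - a.end, a.start - b.end, 0)


def near_coding_exon(cnv, coding_exons, window_bp=1000):
    """True if the CNV overlaps, or lies within window_bp of, any coding exon."""
    for exon in coding_exons:
        d = distance_bp(cnv, exon)
        if d is not None and d <= window_bp:
            return True
    return False


# Hypothetical coding exon and CNV calls, for illustration only
exons = [Region("5", 2000, 2100)]
print(near_coding_exon(Region("5", 500, 1500), exons))  # True: 500 bp away
print(near_coding_exon(Region("5", 100, 900), exons))   # False: 1100 bp away
```

In practice this check would be applied together with the <2% allele frequency filter, so that only rare CNVs near coding sequence reach manual interpretation.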
Variants were classified using ACMG and ClinGen guidelines at both laboratories.21, 22 A CNV was defined as an unbalanced genomic gain or loss greater than 50 base pairs (bp).1, 23
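The Riggs et al. framework cited in Table 2 scores each piece of CNV evidence in points, sums them, and maps the total to a class: 0.99 or more is Pathogenic, 0.90 to 0.98 Likely Pathogenic, −0.89 to 0.89 Uncertain Significance, −0.90 to −0.98 Likely Benign, and −0.99 or less Benign. The minimal sketch below covers only that final summation step, using point values as listed in Table 2; `classify_cnv` is a hypothetical helper, and deciding which evidence codes apply remains a curator's judgment.

```python
def classify_cnv(points):
    """Map summed ACMG/ClinGen CNV evidence points to a classification.

    `points` maps evidence codes (e.g. "2A") to the points a curator assigned.
    The total is rounded to two decimals so that floating-point error in sums
    of decimal point values cannot push a borderline total across a threshold.
    """
    total = round(sum(points.values()), 2)
    if total >= 0.99:
        return "Pathogenic"
    if total >= 0.90:
        return "Likely Pathogenic"
    if total > -0.90:
        return "Uncertain Significance"
    if total > -0.99:
        return "Likely Benign"
    return "Benign"


# Point assignments as listed for two cases in Table 2
case_2 = {"1A": 0, "2A": 1, "3C": 0}    # 1q21.1 duplication
case_5 = {"1A": 0, "2A": 0, "3C": 0.9}  # 4p16.3p16.1 deletion
print(classify_cnv(case_2))  # Pathogenic
print(classify_cnv(case_5))  # Likely Pathogenic
```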
2.3 Targeted gene panels and CNV identification
Three custom NGS panels were developed at Sema4 using an exome backbone. Participant samples were analyzed with one or more of the panels based on phenotype, as previously described.20 An exome hidden Markov model algorithm was used to detect deletions/duplications of two or more exons per probed region. CNVs were confirmed by aCGH on a customized Oxford Gene Technology exon array, quantitative PCR, and/or MLPA (Supplemental C).
2.4 Clinical interpretation and diagnostic classification
Clinical reports were sent to study genetic counselors, who assigned a clinical interpretation to each case. These processes are described in Abul-Husn et al.20 Diagnostic results are defined as variants consistent with a participant's primary indication that are (1) classified as Pathogenic or Likely Pathogenic (P/LP) and consistent with the condition's inheritance pattern or (2) determined to be diagnostic by the study's interpretation committee of four physicians. Variants of Uncertain Significance (VUS) consistent with the participant's phenotype were reviewed by the committee to determine clinical relevance.
2.5 Data analyses
We report on participants with a diagnostic result by GS from the RCT and TeleKidSeq studies. Collectively, 1052 probands received GS (643 from the RCT and 409 from TeleKidSeq). A subset of these data (n = 642) was previously used to compare the diagnostic yield of GS versus TGP.20 Here we add GS data from TeleKidSeq to create a combined dataset (N = 1052) for analysis.
Demographics (sex, age, phenotype, self-reported race/ethnicity, and genetic testing history) were analyzed using descriptive statistics. Participants with more than one phenotype category (cardiac, immunologic, and neurologic) were designated as having “multiple phenotypes.” The frequency of diagnostic CNVs was calculated for each demographic group, and a chi-squared test was used to determine significant differences in CNV yield among groups. Logistic regression was performed to evaluate the odds of reporting a CNV while controlling for proband characteristics.
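As a sketch of the group comparison, assume each test in Table 1 is a Pearson chi-square test on a 2 × k table that splits the diagnosed participants in each group (column B) into those whose diagnosis was a CNV (column C) and those diagnosed otherwise. For three groups the test has 2 degrees of freedom, and the chi-square survival function then has the closed form exp(−x/2), so no statistics library is needed. Under these assumptions the code gives p values matching those reported for age (0.03), sequencing configuration (0.49), and the three largest race/ethnicity groups (0.31). Two-group comparisons (1 degree of freedom) need a different survival function and possibly a continuity correction, which this sketch does not implement. `pearson_chi2`, `chi2_sf_df2`, and `cnv_vs_other` are hypothetical helper names.

```python
import math


def pearson_chi2(cnv_counts, other_counts):
    """Pearson chi-square statistic (no continuity correction), 2 x k table.

    cnv_counts[i]   -- diagnosed participants in group i whose diagnosis was a CNV
    other_counts[i] -- diagnosed participants in group i diagnosed otherwise
    Returns (statistic, degrees_of_freedom).
    """
    rows = (cnv_counts, other_counts)
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(cnv_counts, other_counts)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(cnv_counts) - 1


def chi2_sf_df2(stat):
    """Upper-tail p value for 2 degrees of freedom: exp(-x / 2)."""
    return math.exp(-stat / 2)


def cnv_vs_other(cnv, diagnosed):
    """Compare CNV diagnoses with all other diagnoses across three groups."""
    other = [d - c for d, c in zip(diagnosed, cnv)]
    stat, df = pearson_chi2(cnv, other)
    if df != 2:
        raise ValueError("this sketch only computes p values for three groups")
    return stat, chi2_sf_df2(stat)


# Age groups (<3, 3-12, >12 years): Table 1, columns C and B
stat, p = cnv_vs_other(cnv=[14, 12, 11], diagnosed=[41, 86, 56])
print(f"chi2 = {stat:.2f}, p = {p:.2f}")  # chi2 = 7.03, p = 0.03
```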
3 RESULTS
3.1 Sample overview
Of those who received GS (N = 1052), 61.6% were male and 51.1% identified as Hispanic/Latino(a). Most (86.2%) had a neurologic phenotype, and 5.8% had multiple phenotype classifications. Prior to enrollment, 29.0% had uninformative clinical genetic testing, which included a microarray for 56% of these participants (Table 1A). Overall, 17.4% of participants (183/1052) received a diagnostic result via GS (Table 1B).
Participant characteristic N (%) | A. All participants who received GS (N = 1052) (%) | B. All participants with a diagnostic result by GS (n = 183)a (%) | C. Participants with diagnostic CNV by GS (n = 37) (%) | D. Diagnostic CNV (n = 37) / Total diagnosed cases (n = 183) (%, 95% CI)* | p value** |
---|---|---|---|---|---|
Sex | |||||
Male | 648 (61.6) | 107 (58.5) | 23 (62.2) | 21.5 (14.8–30.2) | 0.75 |
Female | 404 (38.4) | 76 (41.5) | 14 (37.8) | 18.4 (11.3–28.6) | |
Age | |||||
<3 years (infants/toddlers) | 146 (13.9) | 41 (22.4) | 14 (37.8) | 34.1 (21.6–49.5) | 0.03 |
3–12 years (preschool/school age children) | 534 (50.8) | 86 (47.0) | 12 (32.4) | 14.0 (8.2–22.8) | |
>12 years (adolescents/young adults) | 372 (35.4) | 56 (30.6) | 11 (29.7) | 19.6 (11.3–31.8) | |
Self-reported race/ethnicity | |||||
American Indian, Native American, or Alaska Native | 1 (0.1) | 0 (0.0) | 0 (0.0) | - | 0.31d |
Asian | 56 (5.3) | 9 (4.9) | 1 (2.7) | 11.1 (2.0–43.5) | |
Black or African American | 168 (16.0) | 25 (13.7) | 5 (13.5) | 20.0 (8.9–39.1) | |
Hispanic/Latino(a) | 538 (51.1) | 101 (55.2) | 18 (48.6) | 17.8 (11.6–26.4) | |
Middle Eastern or North African/Mediterranean | 8 (0.8) | 1 (0.5) | 0 (0.0) | - | |
White or European American | 212 (20.2) | 37 (20.2) | 11 (29.7) | 29.7 (17.5–45.8) | |
More than one population selected | 40 (3.8) | 6 (3.3) | 1 (2.7) | 16.7 (3.0–56.4) | |
Other | 8 (0.8) | 1 (0.5) | 0 (0.0) | - | |
Prefer not to answer | 13 (1.2) | 2 (1.1) | 1 (2.7) | 50.0 (9.5–90.6) | |
Unknown/“none of these fully describe my child” | 8 (0.8) | 1 (0.5) | 0 (0.0) | - | |
Phenotype | |||||
Multipleb | 61 (5.8) | 17 (9.3) | 5 (13.5) | 29.4 (13.3–53.1) | - |
Cardiac | 45 (4.3) | 8 (4.4) | 1 (2.7) | 12.5 (2.2–47.1) | |
Immunologic | 39 (3.7) | 4 (2.2) | 0 (0.0) | - | |
Neurologic | 907 (86.2) | 154 (84.2) | 31 (83.8) | 20.1 (14.6–27.2) | |
Epilepsy | 279 (30.8) | 28 (18.2) | 2 (6.5) | 7.1 (2.0–22.7) | - |
Intellectual developmental disability | 336 (37.0) | 67 (43.5) | 16 (51.6) | 23.9 (15.3–35.3) | |
Epilepsy and intellectual developmental disability | 292 (32.2) | 59 (38.3) | 13 (41.9) | 22.0 (13.4–34.1) | |
Previous genetic testing | |||||
No | 747 (71.0) | 108 (59.0) | 24 (64.9) | 22.2 (15.4–30.9) | 0.42 |
Yes | 305 (29.0) | 75 (41.0) | 13 (35.1) | 17.3 (10.4–27.4) | |
Including microarray | 141 (46.2) | 42 (56.0) | 5 (38.5) | 11.9 (5.2–25.0) | - |
Including exome (no microarray) | 11 (3.6) | 4 (5.3) | 0 (0.0) | - | |
Including microarray and exome | 30 (9.8) | 6 (8.0) | 4 (30.8) | 66.7 (30.0–90.3) | |
Otherc (no microarray or exome) | 115 (37.7) | 20 (26.7) | 4 (30.8) | 20.0 (8.1–41.6) | |
Unknown | 8 (2.6) | 3 (4.0) | 0 (0.0) | - | |
Sequenced | |||||
Singleton | 286 (27.2) | 76 (41.5) | 18 (48.6) | 23.7 (15.5–34.4) | 0.49 |
Duo | 236 (22.4) | 27 (14.8) | 6 (16.2) | 22.2 (10.6–40.8) | |
Trio | 530 (50.4) | 80 (43.7) | 13 (35.1) | 16.3 (9.8–25.8) | |
- Note: A total of 1052 participants received GS through the NYCKidSeq program (Column A). Out of these, 183 obtained a diagnostic result (Column B). Thirty-seven participants received a diagnostic result by GS due to the presence of a CNV (Column C).
- a Includes the 37 diagnostic CNVs.
- b Refers to participants who had more than one of the three phenotype categories defined by the study (cardiac, immunologic, and neurologic).
- c Refers to one or more of the following genetic test modalities: Fragile X, karyotype, panel testing, other.
- d p value is for the three largest subpopulations (Black or African American, White or European American, and Hispanic/Latino(a)).
- * 95% confidence interval for the fraction of cases in the relevant group, calculated with the Wilson score interval for a binomial proportion.
- ** p values were determined by a chi-square test of values in column C compared with values in column B unless otherwise indicated. The test was not performed when any cell count was less than 5.
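The Wilson score interval named in the Table 1 footnotes can be computed directly: with p = x/n and z the standard normal quantile (about 1.96 for 95%), the interval is centred at (p + z²/2n)/(1 + z²/n) with half-width z·√(p(1 − p)/n + z²/4n²)/(1 + z²/n). The sketch below uses only the Python standard library; `wilson_interval` is a hypothetical helper. Applied to the male row of Table 1 (23 diagnostic CNVs among 107 diagnosed cases), it gives the reported 21.5% (14.8–30.2).

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for the binomial proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Male participants: 23 diagnostic CNVs among 107 diagnosed cases
low, high = wilson_interval(23, 107)
print(f"{100 * 23 / 107:.1f}% ({100 * low:.1f}-{100 * high:.1f})")  # 21.5% (14.8-30.2)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves reasonably for the small group sizes in Table 1, such as 5/17 for participants with multiple phenotypes.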
3.2 Identification of CNVs
Of all participants, 69 had one or more CNVs reported by GS (73 CNVs total, Supplemental D), and 37 had at least one diagnostic CNV. A diagnostic CNV was therefore identified by GS in 20.2% (37/183) of all solved cases. Characteristics of participants with a diagnostic CNV are summarized in Table 1C, and the frequency of diagnostic CNVs among all diagnosed cases by demographic group is shown in Table 1D. The frequency of diagnostic CNVs among diagnosed cases varied by phenotype: 20.1% (31/154) of diagnosed participants with a neurologic phenotype alone had a diagnostic CNV, compared with 12.5% (1/8) of those with a cardiac phenotype alone. Almost 30% (5/17) of those with multiple phenotypes who received a diagnostic result had a causative CNV. Despite these differences in frequency, differences in CNV detection between phenotypes were not statistically significant (Table 1). However, participants with multiple phenotypes were more likely to have a CNV (diagnostic or uncertain) than those with a neurologic phenotype alone (OR = 2.33, 95% CI 1.05–5.19, p = 0.038) (Supplemental E). CNV identification in our cohort differed significantly by age (p = 0.03; Table 1), and participants younger than 3 years were more likely to have a CNV (diagnostic or uncertain) than children 3–12 years of age (OR = 2.38, 95% CI 1.32–4.31, p = 0.004) (Supplemental E).
Thirteen (35.1%) participants with a diagnostic CNV had received genetic testing prior to enrollment (Table 1C). Three had a CMA alone and six had a CMA in addition to other genetic testing. Four participants had ES, two of which included CNV calling. All previous genetic testing was considered uninformative or did not explain the participant's full clinical phenotype at that time. In three cases, GS confirmed previously identified CNVs as the sole molecular diagnostic finding, after which the referring provider deemed the results likely causal (Supplemental F).
Eighteen (48.6%) participants with a diagnostic CNV identified via GS also had a TGP (Supplemental G).20 For ten participants, at least one gene in the CNV detected by GS was included on the panel; however, the full CNV was detected by both modalities in only one case. In four cases, TGP identified part of the CNV detected via GS, and in five cases the CNV was not found by TGP. The remaining eight cases had a CNV reported that did not include a gene on the TGP.
Of the diagnostic CNVs, 16 (43.2%) were identified de novo, 11 (29.7%) were identified in proband-only or duo cases with unknown inheritance, and 2 (5.4%) were inherited from an affected parent. The remaining diagnostic CNVs were inherited from parents with uncertain clinical status, were recurrent CNVs known to be associated with reduced penetrance and variable expressivity, or were found in cases in which a SNV was also identified in a gene within the CNV and associated with an autosomal recessive inheritance pattern.
Six participants (16.2%) with a diagnostic CNV were found to have an additional SNV or indel via GS that contributed to the participant's phenotype. One participant had aneuploidy (47,XXX). Two CNVs were mosaic in the proband, and one was mosaic in an unaffected parent (Table 2).24
Case ID (CSER ID) | Test modality | Age at enrollment (years) | Sex | Disease category | Phenotype | Sequenced | CNV coordinates (hg38) (ClinVar ID) | Approximate Size (kb) | Locus | Inheritance | OMIM morbid genes | Disease association [inheritance pattern] | ACMG classification [criteria] | Additional diagnostic findings on GS | Previous genetic testing |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 (61704563CSER) | TGP + GS | 14.4 | F | Neurologic (Epilepsy, IDD) | Epilepsy, Intellectual Disability, Congenital Pulmonary Stenosis | Singleton | Chr1:4421211_20203749del (2446820) | 16000.0 | 1p36 | Unknown (paternal DNA unavailable) | 42 genesa | 1p36 deletion syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 3C (0.9pt)] | No | Yes |
2 (71404778CSER) | GS | 3.0 | F | Neurologic (IDD) | IDD, Autism | Trio | Chr1:147050137_148438699dup (1679693) | 1380.0 | 1q21.1 | Paternal (unknown clinical status) | GJA5, GJA8 | 1q21.1 duplication syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 3C (0pt)] | No | No |
3 (61302338CSER) | TGP + GS | 16.6 | F | Neurologic (Epilepsy, IDD) | Intellectual Disability, Epilepsy, Clinical suspicion of mitochondrial defect, Cataracts, Ataxia, Panic Attacks and mitochondrial myopathy | Duo | Chr1:(1475764_1482998)_(1517413_1518921)del (1213715) | 39.0 | 1p36.33 | Unknown (paternal DNA unavailable) | ATAD3B (partial), ATAD3A (partial) | Harel-Yoon syndrome (HAYOS) [AD, AR] | Likely Pathogenic [Riggs, 2019 1A (0pt), 3A (0pt), 4B (0.15 × 2 = 0.30), 4F (0.45), 5G (0.15) + in trans with SNV] | ATAD3A c.1577C>G, p.(Ser526Trp) Uncertainb; Maternal inheritance | Yes |
4 (71947014CSER) | GS | 8.1 | M | Neurologic (IDD) | Moderate to severe IDD, Speech delay, Autism, ADHD, Microcephaly, Dysgenesis of Corpus Callosum, pes planovalgus, Bladder incontinence, Constipation | Trio | Chr3:(195832526_195833012)_(197340883_197342279)del (1679598) | 1500.0 | 3q29 | De novo | TFRC, PCYT1A, RNF168, NRROS, CEP19 | 3q29 microdeletion syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 5A (0.30)] | No | No |
5 (61712402CSER) | TGP + GS | 17.2 | M | Neurologic (IDD) | IDD, Microcephaly, Cataracts, Central auditory processing disorder, Congenital hypothyroidism | Singleton | Chr4:3101589-7589518del (1679802) | 4500.0 | 4p16.3p16.1 | Unknown (paternal and maternal DNA unavailable) | HTT, DOK7, LRPAP1, ADRA2C, MSX1, EVC, EVC2, WFS1 | N/A | Likely Pathogenic [Riggs, 2019 1A (0pt), 2A (0pt), 3C (0.9pt)] | GJA3 c.559C>G, p.(Pro187Ser) Likely Pathogenic variant; Inheritance unknown | No |
6 (61489324CSER) | TGP + GS | 18.9 | F | Neurologic (IDD), Immunologic | Primary Immunodeficiency, seizures, cerebral palsy | Duo | Chr5:141572725_141615700del (homozygous) (1679561) | 42.9 | 5q31.3 | Unknown (heterozygous in asymptomatic mother) | DIAPH1 (exons 2–16) | Seizures, cortical blindness, microcephaly syndrome [AR, MIM 616632]; Deafness, autosomal dominant 1, with or without thrombocytopenia [AD, MIM 124900] | Likely Pathogenic [Richards et al, 2015 PVS1, PM2_Supp] | No | Yes |
7 (71012573CSER) | GS | 7.8 | F | Neurologic (Epilepsy, IDD) | Seizures, Intellectual Disability | Trio | Chr5:177181572_177225407del (1696474) | 43.8 | 5q35.3 | De novo | NSD1 (exon 3–5) | Sotos syndrome 1 [AD] | Likely Pathogenic [Richards, 2015 PVS1, PS2_Supp, PM2_Supp] | No | Yes |
8 (61875918CSER) | TGP + GS | 5.8 | M | Neurologic (Epilepsy, IDD) | Epilepsy, Intellectual Disability, Autism, tremors | Singleton | Chr6:116413396_123326959del (978186) | 6900.0 | 6q22.1 | De novo | 19 genes, including NUS1 | Intellectual disability syndrome with seizures [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 3C (0.9pt), 4C (0.3pt), 5A (0.15pt)] | No | No |
9 (71213436CSER) | GS | 15.1 | M | Neurologic (IDD), Cardiac | IDD, GDD, Hypertonia, Neurogenic bladder, Cryptorchidism, Hypospadias, Abnormality of the eye, Autistic behavior, Hypothyroidism, Hypertension, Ataxia, Cardiomyopathy, Achilles tendon contracture, Gastroesophageal reflux, Myelopathy, Sleep disturbance, Paraparesis, Gastrointestinal atresia, Bowel incontinence, Scoliosis, Lower limb muscle weakness, Oppositional defiant disorder, Stage 1 kidney disease, Overweight, Low levels of vitamin D | Singleton | Chr7:73304197-74727473del (N/A) | 1423.617 | 7q11.23 | Unknown (paternal DNA unavailable) | ELN, DNAJC30 | Williams syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 3C (0.45pt), 4C (0.9pt)] | No | Yes |
10 (61705948CSER) | TGP + GS | 19.6 | M | Neurologic (IDD) | Epilepsy, Learning Disability, Intellectual Disability, Autism, ADD | Trio | Chr7:(75058300_75420000)_79083658del (1342322) | 3700.0 | 7q11.23-q21.11 | Paternal (affected parent) | POR, MDH2, HSPB1, ZP3, MAGI2, YWHAG | 7q11.23 deletion syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt)] | No | No |
11 (61155736CSER) | TGP + GS | 6.9 | M | Cardiac | Long QT Syndrome | Singleton | Chr7:150947163-150948277del (992792) | 1.1 | 7q36.1 | Maternal (affected parent) | KCNH2 (exon 12–13) | Long QT syndrome 2 [AD] | Likely Pathogenic [Richards, 2015 PVS1, PM2_Supp] | No | No |
12 (71288407CSER) | GS | 10.5 | M | Neurologic (Epilepsy, IDD) | Seizure (generalized onset, motor); Mild to severe IDD, syndromic with Autism Spectrum Disorder | Trio | Chr7:152052676-152295696del (1342507) | 243 | - | De novo | KMT2C (exon 7–59) | Kleefstra syndrome 2 [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2D-4 (1pt), 4B (0.3pt)] | No | No |
13 (61162273CSER) | TGP + GS | 1.1 | F | Neurologic (IDD) | Prematurity, Developmental Delay, Microcephaly, Short stature | Singleton | Chr9:137661384_137714409del (1341714) | 12.3 | 9q34.3 | Paternal (unknown clinical status) | EHMT1 (exon 2) | Kleefstra syndrome 1 [AD] | Likely Pathogenic [Richards, 2015 PVS1, PM2_Supp] | ASXL3 c.4826G>A, p.(Trp1609Ter) Likely Pathogenic variant; De novo | No |
14 (71885569CSER) | GS | 12.8 | M | Neurologic (Epilepsy, IDD) | Seizures, IDD, Headaches | Duo | Chr10:45704708_(49974954_50015268)del (2498203) | 4300.0 | 10q11.22q11.23 | Unknown (paternal DNA unavailable) | RBP3, GDF2, CHAT, ERCC6, SLC18A3 | N/A | Likely Pathogenic [Riggs, 2019 1A (0pt), 2 (0pt), 3 (0.9pt), 4C (−0.15pt), 4K (0.15pt)] | No | No |
15 (61154477CSER) | TGP + GS | 20.0 | M | Neurologic (IDD), Cardiac | Congenital musculoskeletal anomalies, Developmental Delay, Cerebellar ataxia, Cerebellar atrophy, Dysarthria, Dysmetria, Congenital absence of the right pulmonary artery with hypoplasia of right lung, Asthma | Singleton | Chr10:100242158_100249954del (996893) | 7.8 | 10q24.31 | Paternal (unaffected parent) | CWF19L1 (exon 8–10) | Spinocerebellar ataxia, autosomal recessive 17 [AR] | Pathogenic [Richards, 2015 PVS1, PM3, PM2_Supp] | CWF19L1 c.942del, p.(Pro315Ter) Likely Pathogenic variant; Maternally inherited | Yes |
16 (61814200CSER) | TGP + GS | 9.0 | M | Neurologic (IDD) | Intellectual Disability, Autism, Optic nerve atrophy, Hypotonia, ADHD, Apraxia, Anxiety | Singleton | Chr12:23541822_23554123del (978115) | 12.3 | 12p12.1 | De novo | SOX5 (exon 12–13) | Lamb-Shaffer syndrome [AD] | Likely Pathogenic [Richards, 2015 PVS1, PS2, PM2_Supp] | No | Yes |
17 (61543781CSER) | TGP + GS | 2.0 | M | Neurologic (IDD) | Developmental Delay, Protruding tongue, Hypotonia, Poor language | Singleton | Chr12:23845446_23848974del (978097) | 3.5 | 12p12.1 | De novo | SOX5 (exon 3) | Lamb-Shaffer syndrome [AD] | Likely Pathogenic [Richards, 2015 PVS1, PS2, PM2_Supp] | No | No |
18 (71185024CSER) | GS | 7.3 | M | Neurologic (IDD) | Abnormal palate morphology, Hearing impairment, Ptosis, Behavioral abnormality, Abnormal skull morphology, Eczema, Muscular hypotonia, Global developmental delay, Abnormality of the voice, Nasal obstruction, Dysphagia, Respiratory distress, Neurological speech impairment, Sleep disturbance, Short stature, Abnormality of the palmar creases, Feeding difficulties, Toe walking | Singleton | Chr12:45857439_45861903del (N/A) | 4.46 | 12q12 | Unknown (paternal DNA unavailable) | ARID2 (exon 16) | Coffin-Siris syndrome 6 [AD] | Likely Pathogenic [Richards, 2015 PVS1, PS2, PM2_Supp] | No | Yes |
19 (61025567CSER) | TGP + GS | 1.3 | M | Neurologic (IDD) | Developmental Delay, Intellectual Disability, Hypotonia, Failure to Thrive | Trio | Chr13:101235731_101238846del (1098701) | 3.1 | 13q32.3-q33.1 | Maternal (unaffected parent) | NALCN (exon 12) | Hypotonia, infantile, with psychomotor retardation and characteristic facies 1 [AD, AR] | Likely Pathogenic [Richards, 2015 PVS1_Strong, PM3, PM2_Supp] | NALCN c.3022C>T, p.(Arg1008Ter) Pathogenic variant; Paternally inherited | Yes |
20 (71131851CSER) | GS | 2.9 | M | Neurologic (Epilepsy) | Seizures | Trio | Chr15:22698177_(23120963_23380983)del (2498204) | 423 | 15q11.2 | Paternal (unknown clinical status) | NIPA1, NIPA2, CYFIP1, TUBGCP5 | 15q11.2 deletion syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt)] | No | No |
21 (61221090CSER) | TGP + GS | 1.5 | F | Neurologic (IDD) | Moderate to severe Global developmental delay, Autism, Speech delay | Trio | Chr15:22804175_30375696dup (1098669) | 7600.0 | 15q11.2q13.2 | De novo, mosaicc | NIPA1, MKRN3, MAGEL2, NDN, SNRPN, UBE3A, GABRB3, GABRA5, OCA2, HERC2, NSMCE3 | 15q11-q13 duplication syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 3A (0pt)] | No | No |
22 (71591378CSER) | GS | 1.9 | F | Neurologic (IDD) | Global Developmental Delay, decreased eye contact, Mild Diffuse Hypotonia, Micrognathia, Low set ears | Trio | Chr15:23370759_30529376del (1342511) | 7400.0 | 15q11.2-q13.2 | De novo | MKRN3, MAGEL2, NDN, SNRPN, UBE3A, GABRB3, GABRA5, OCA2, HERC2, NSMCE3 | Angelman syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 5A (0.30pt)] | No | No |
23 (71937347CSER) | GS | 1.8 | F | Neurologic (Epilepsy, IDD) | Seizure, Developmental delay, Epilepsy, Increased blood pressure, Infantile spasms | Trio | Chr16:(14683149_14692101)_(16527136_16536956)del (2498201) | 1500.0 | 16p13.11 | Maternal (unknown clinical status) | MYH11, ABCC6, NDE1, ABCC1 | 16p13.11 microduplication syndrome [AD] | Likely Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 5C (−0.1pt)] | IRF2BPL c.1436C>T, p.(Pro479Leu) Likely Pathogenic variant; De novo. Trisomy X (47,XXX) | No |
24 (71997110CSER) | GS | 0.6 | M | Neurologic (IDD) | Hand clenching, Hypotonia, Global developmental delay, Patent foramen ovale, Atrial flutter, Feeding difficulties | Singleton | Chr16:15395619_18151025del (N/A) | 2750.0 | 16p13.11 | De novo | NDE1, MYH11, ABCC1, ABCC6, XYLT1 | 16p13.11 microdeletion syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt)] | No | No |
25 (71353825CSER) | GS | 9.3 | M | Neurologic (IDD) | IDD, autism, overweight | Trio | Chr16:(28606186_28814284)_(29032129_29113100)del (1679691) | 220 | 16p11.2 | De novo | TUFM, CD19, LAT | 16p11.2 deletion syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt)] | No | No |
26 (71750014CSER) | GS | 1.6 | F | Neurologic (IDD) | Delayed speech and language development, Global developmental delay, Motor delay, Generalized hypotonia, Constipation | Singleton | Chr16:29640739_30188023del (N/A) | 547 | 16p11.2 | De novo | KIF22, PRRT2, TLDC3B, ALDOA, TBX6, CORO1A | 16p11.2 deletion syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt)] | No | No |
27 (61828944CSER) | TGP + GS | 0.9 | M | Neurologic (IDD) | Developmental Delay, Dysmorphic Features, Ectopic Kidney | Trio | Chr16:(38502679_46385317)_61223349dup (1184380) | 14800.0 | 16q11.2q21 | De novo | 94 genesd | 16q11.2q21 duplication [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2B (0pt), 3C (0.90), 4C (0.3), 5A (0.15)] | No | No |
28 (71712092CSER) | GS | 12.2 | F | Neurologic (Epilepsy, IDD) | Seizures, developmental delay, IDD, dysmorphic features | Duo | Chr16:53818483_57631312del (1696636) | 3800.0 | 16q12.2q21 | Unknown (paternal DNA unavailable) | IRX5, MMP2, SLC6A2, CES1, GNAO1, BBS2, NUP93, SLC12A3, CETP, RSPRY1, ARL2BP, COQ9, ADGRG1 | N/A | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 3C (0.90), 4C (0.3)] | No | Yes |
29 (71366107CSER) | GS | 3.8 | M | Neurologic (Epilepsy, IDD) | Epilepsy, Speech delay, Café au lait spots | Duo | Chr16:9703465_9888417del (1696577) | 185 | 16p13.2 | Maternal (unknown clinical status) | GRIN2A | Epilepsy, focal, with speech disorder and with or without impaired intellectual development [AD] | Likely Pathogenic [Richards, 2015 PVS1_Strong, PM2_Supp] | No | No |
30 (71734377CSER) | GS | 16.0 | M | Cardiac, Immunologic | Truncus arteriosus, interrupted aortic arch, Ventricular septal defect, Atrial septal defect, abnormal facial features, T cell lymphoma | Duo | Chr17:44846894_44962103del (1342375) | 115 | 17q21.31 | Unknown (paternal DNA unavailable) | EFTUD2, GFAP, CCDC103 | Koolen-De Vries syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt)] | No | No |
31 (71883879CSER) | GS | 10.7 | M | Neurologic (IDD) | Microcephaly, Macrotia, Protruding ear, Hypotelorism, Growth delay, Short stature, Developmental delay, Weakness of facial musculature | Singleton | Chr17:44859379_44861949del (N/A) | 2.5 | 17q21.31 | De novo | EFTUD2 (exon 17–18) | Mandibulofacial dysostosis with microcephaly [AD] | Pathogenic [Richards, 2015 PVS1_Strong, PS2_Mod, PM2_Supp] | No | Yes |
32 (61259726CSER) | TGP + GS | 1.0 | F | Neurologic (Epilepsy, IDD) | Epilepsy, Hypotonia, Developmental Delay | Singleton | Chr18:1_15400036del (2446821) | 15400.0 | 18p11.32p11.21 | Unknown (parental balanced translocation suspected) | TYMS, SMCHD1, LPIN2, TGIF1, LAMA1, NDUFV2, APCDD1, PIEZO2, GNAL, TUBB6, AFG3L2, PSMG2, MC2R | 18p deletion syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt)] | No | No |
32 (second CNV) | | | | | | | Chr17:1_410133dup (2446822) | 450 | 17p13.3 | | VPS53 (partial) | Pontocerebellar hypoplasia, type 2E [AR] | N/A | | |
33 (61563603CSER) | TGP + GS | 0.4 | F | Neurologic (Epilepsy) | Neonatal Seizures | Singleton | Chr20:63255263_63498365del (1341868) | 243 | 20q13.33 | Maternal (asymptomatic, mosaic in parent)c | ARFGAP1 (partial), CHRNA4, KCNQ2, EEF1A2 (partial) | 20q13.33 microdeletion syndrome [AD] | Likely Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 3A (0pt), mother mosaic (−0.10)] | No | No |
34 (61595454CSER) | TGP + GS | 1.9 | M | Neurologic (Epilepsy, IDD) | Motor Delay, Developmental Delay, Hypotonia, Seizures | Singleton | Chr21:33548991_33549505delinsGTTG (996849) |
0.5 | 21q22.11 | De novo | SON (spanning intron 2 and part of exon 3) | ZTTK syndrome [AD] | Likely Pathogenic [Richards, 2015 PVS1_Strong, PS2_Supp, PM2_Supp] |
No | No |
35 (71171804CSER) | GS | 19.5 | M | Neurologic (Epilepsy, IDD), Cardiac, Immunologic | Tetralogy of Fallot, Epilepsy, Intellectual Disability, Autism, Recurrent infections, Failure to thrive, Autoimmune disease, cleft palate, Velopharyngeal insufficiency, Redundant colon, Lethargy and weakness, Thrombocytopenia | Singleton | Chr22:(18206749_18274663)_(21110254_21234326)del (2498202) |
3000.0 | 22q11.2 | Unknown (parental samples not provided) | PRODH, DGCR2, SLC25A1, CDC45, GP1BB, TBX1, TXNRD2, COMT, TANGO2, RTN4R, SCARF2, PI4KA, SERPIND1, SNAP29, LZTR1 | 22q11 deletion syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt), 3C (0.9pt)] |
No | Yes |
36 (71000275CSER) | GS | 6.3 | M | Neurologic (Epilepsy, IDD) | Seizures, IDD, Autism | Trio | ChrX:123221813_124917630dup (1701783) |
1700.0 | Xq25 | De novo | SH2D1A, STAG2, XIAP, THOC2, GRIA3 | Xq25 duplication syndrome [AD] | Pathogenic [Riggs, 2019 1A (0pt), 2A (1pt)] |
No | No |
37 (61743903CSER) | TGP + GS | 2.3 | F | Neurologic (Epilepsy, IDD) | Intractable epilepsy, Developmental Delay | Singleton | ChrX:18592341_18596203del (1325527) |
3.8 | Xp22.13 | De novo, mosaicᶜ | CDKL5 (exon 5–22) | Developmental and epileptic encephalopathy 2 [AD] | Likely Pathogenic [Richards, 2015 PVS1_Strong, PS2_Supp, PM2_Supp] |
No | Yes |
- Abbreviations: AD, autosomal dominant; ADD, attention deficit disorder; ADHD, attention-deficit/hyperactivity disorder; AR, autosomal recessive; F, female; GDD, global developmental delay; GS, genome sequencing; IDD, intellectual and developmental disabilities; kb, kilobase; M, male; Mb, megabase; TGP, targeted gene panel.
- Note: Variants were not submitted to ClinVar for five cases (#9, 18, 24, 26, 31) because the clinical laboratory that performed GS for these participants does not, as standard practice, report structural variants to ClinVar.
- a PLA2G5, PLA2G2A, EMC1, ALDH4A1, PAX7, PADI6, PADI4, PADI3, SDHB, ATP13A2, EPHA2, CLCNKB, CLCNKA, SPEN, CELA2A, CTRC, VPS13D, MFN2, PLOD1, NPPA, CLCN6, MTHFR, MAD2L2, UBIAD1, MTOR, MASP2, TARDBP, PEX14, KIF1B, NMNAT1, PIK3CD, H6PD, ENO1, RERE, SLC45A1, PARK7, PER3, CAMTA1, PLEKHG5, ESPN, CHD5, NPHP4
- b Despite being classified as a variant of uncertain significance by the laboratory, the clinical team determined this variant to be contributory to the participant's phenotype in conjunction with the CNV.
- c Two CNVs were identified as mosaic in the proband (cases 21 and 37), and one CNV was identified as mosaic in the proband's asymptomatic mother (case 33). Minimum read depth coverage was 30× ± 3×. Further case details can be found in Odgis et al., 2023 (PMID: 36563179). Case 21: mosaic gain with 3–4 copies in the proband's blood. Case 33: estimated 20%–30% in the mother's blood (based on array data). Case 37: 21.6% variant allele fraction (11/51) for the 3.8-kb deletion at the 5′ breakpoint and 23.8% variant allele fraction (10/42) at the 3′ breakpoint in the proband's blood.
- d Some OMIM morbid genes include SALL1, ZNF423, NOD2, CYLD, GNAO1, CETP, CNOT1, ORC6, PHKB, SLC12A3, ADGRG1, CNGB1.
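Footnote c reports mosaicism for case 37 as a variant allele fraction (VAF), the number of variant-supporting reads divided by the total informative reads at each breakpoint. The Python sketch below reproduces that arithmetic. The Wilson score interval is an illustrative addition, not a method described by the reporting laboratories, included to show how wide the uncertainty is at roughly 50 reads; the helper names are hypothetical. How a read-level VAF maps to the fraction of affected cells depends on the variant type and the caller, so the sketch reports only the read-level proportion.

```python
import math

def vaf(alt_reads: int, total_reads: int) -> float:
    """Read-level variant allele fraction: variant-supporting reads / informative reads."""
    if total_reads <= 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("need 0 <= alt_reads <= total_reads and total_reads > 0")
    return alt_reads / total_reads

def wilson_interval(alt_reads: int, total_reads: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the VAF (illustrative addition)."""
    p, n = vaf(alt_reads, total_reads), total_reads
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Read counts for case 37 taken from footnote c
for label, alt, total in [("5' breakpoint", 11, 51), ("3' breakpoint", 10, 42)]:
    low, high = wilson_interval(alt, total)
    print(f"{label}: VAF {vaf(alt, total):.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
```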
3.3 GS for detection of CNVs: Examples from the NYCKidSeq program
Here we describe several cases that illustrate the ability of GS to detect clinically relevant CNVs, including diagnostic CNVs in individuals with a previously uninformative CMA, co-occurring SNVs and CNVs that together cause an autosomal recessive disorder, the full extent of CNVs that may be only partially detected by other methodologies, and intragenic CNVs that may be missed by other methodologies.
3.3.1 GS can detect CNVs in individuals with previously uninformative CMA
In nine cases, GS detected a diagnostic CNV in a participant who had previously received an uninformative CMA. This highlights the ability of GS to detect smaller CNVs, especially those under 50 kb, which often go undetected by standard CMA. An example is case #31, a 10-year-old male of Salvadoran and Costa Rican descent with developmental delay, learning disability, dysmorphic features (microcephaly, metopic prominence, cupped prominent ears, and a carp-shaped mouth), cranial nerve VII palsy, constitutional growth delay, and a seizure at age 1. Family history was unremarkable. A previous oligonucleotide-SNP CMA was negative. Through the study, GS identified a de novo 2.5-kb deletion at 17q21.31 encompassing exons 17–18 of EFTUD2, consistent with a diagnosis of mandibulofacial dysostosis, Guion-Almeida type (MFDGA), an autosomal dominant condition characterized by craniofacial malformations and intellectual disability. Additional features can include esophageal atresia, tracheoesophageal fistula, cardiac anomalies, short stature, spine anomalies, and epilepsy.25-27 Detection of this CNV provided a molecular diagnosis for the participant and led to appropriate follow-up care.
3.3.2 GS can detect CNVs, SNVs, and indels, which together may contribute to a molecular diagnosis
There were six cases in which GS detected a diagnostic CNV and an SNV or indel that together contributed to a molecular diagnosis. In three of these, an SNV and a CNV affecting the same gene provided a diagnosis of an autosomal recessive disease. In traditional clinical testing workflows, CNVs and SNVs may not be detected by a single test; GS, however, can detect multiple variant types simultaneously. An example is case #19, a 15-month-old male of Ecuadorian descent with a history of GDD, hypotonia, failure to thrive, hand stereotypies, café au lait spots, and dysmorphic features (right posterior plagiocephaly, strabismus, and large, posteriorly rotated ears). Previous testing included aCGH and targeted MECP2 testing, both of which were negative. Russell-Silver methylation studies were also negative, and a chromosomal SNP microarray reported a 153-kb interstitial deletion of the long arm of chromosome 2, classified as a VUS. The neurodevelopmental TGP performed through the RCT was negative. However, GS reported two variants in trans in NALCN on chromosome 13: a paternally inherited pathogenic nonsense SNV (c.3022C>T, p.(Arg1008Ter)) and a maternally inherited, likely pathogenic 3.1-kb deletion (c.1267-924_1434+2024del) that includes exon 12. Homozygous or compound heterozygous loss-of-function NALCN variants are associated with autosomal recessive infantile hypotonia with psychomotor retardation and characteristic facies (IHPRF1), characterized by hypotonia, motor delay, speech delay or absent speech, and dysmorphic facial features. Some children also have seizures, autism, and/or microcephaly.28-32 GS not only provided a diagnosis for the participant but also established the recurrence risk for his parents, who otherwise had an unremarkable family history.
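The inheritance reasoning in case #19 (one NALCN variant from each parent, so the two variants are in trans, and each parent is a heterozygous carrier with a 1 in 4 chance per pregnancy of an affected child) can be sketched as follows. The `Variant` class and the `phase` and `ar_recurrence_risk` helpers are hypothetical; the sketch assumes both parents were tested, that neither variant arose de novo, and that two variants inherited from the same parent lie in cis.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    """Hypothetical record for one heterozygous variant in a proband."""
    description: str
    gene: str
    parent_of_origin: Optional[str]  # "maternal", "paternal", or None if not established

def phase(v1: Variant, v2: Variant) -> str:
    """Infer the phase of two variants in the same gene from parental testing.
    Assumes neither variant is de novo; variants from the same parent are treated as cis."""
    if v1.gene != v2.gene:
        raise ValueError("variants must be in the same gene")
    if v1.parent_of_origin is None or v2.parent_of_origin is None:
        return "unknown"
    return "trans" if v1.parent_of_origin != v2.parent_of_origin else "cis"

def ar_recurrence_risk(phase_call: str) -> Optional[float]:
    """Per-pregnancy risk of an affected child for an autosomal recessive disorder
    when each parent is a heterozygous carrier: 1/2 x 1/2 = 1/4.
    Returns None when the variants are not shown to be in trans."""
    return 0.5 * 0.5 if phase_call == "trans" else None

# Case #19 (NALCN)
snv = Variant("c.3022C>T, p.(Arg1008Ter)", "NALCN", "paternal")
cnv = Variant("c.1267-924_1434+2024del", "NALCN", "maternal")
call = phase(snv, cnv)
print(call, ar_recurrence_risk(call))  # trans 0.25
```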
3.3.3 GS can identify the full extent of a CNV, while TGP may not
Of the 18 cases from the RCT with a diagnostic CNV identified by GS, there were four in which the CNV was only partially identified by TGP.20 An example is case #27, a 10-month-old male of Sephardic and Ashkenazi Jewish ancestry with GDD, hypotonia, ectopic kidney, and dysmorphic facies. Family history was unremarkable, and no previous genetic testing had been performed. The neurodevelopmental TGP identified a de novo ~4.4-Mb duplication on chromosome 16 (chr16:53601960–58020315), classified as uncertain. The CNV contained full duplications of GNAO1 and ADGRG1, both of which were included on the panel. GNAO1 is associated with an autosomal dominant disorder with a loss-of-function mechanism.33 Gain-of-function variation in GNAO1 is associated with autosomal dominant neurodevelopmental disorder with involuntary movements (NEDIM), characterized by delayed psychomotor development and hyperkinetic involuntary movements.34 In the absence of other testing, the TGP report recommended CMA to delineate the full size of the CNV. GS, however, determined the de novo CNV to be a 14.8-Mb duplication (Chr16:((38502679_46385317)_61223349)) containing 94 OMIM genes. Individuals with similar duplications (dup(16q11.2q21)) have been described with GDD, ID, dysmorphic facial features, speech delay, and variable features including recurrent infections, behavioral abnormalities, hyper- or hypotonia, and MRI abnormalities.35, 36 The 14.8-Mb CNV was reported as pathogenic, leading to a diagnosis of 16q11.2 duplication syndrome. While TGP provided information that could eventually have led to a diagnosis, GS provided one in a single test. By delineating the CNV breakpoints, GS also allowed immediate clinical evaluation for additional features associated with the full CNV.
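When a breakpoint is known only to lie within an interval, as in the notation Chr16:((38502679_46385317)_61223349), the CNV has a minimum and a maximum possible size. The sketch below, using a hypothetical `cnv_size_bounds` helper, computes both as end minus start (ignoring the one-base inclusive offset). Under this reading, the 14.8-Mb size reported for case #27 corresponds to the minimal extent, measured from the innermost possible start.

```python
Interval = tuple[int, int]  # (lowest, highest) possible coordinate of a breakpoint

def cnv_size_bounds(start: Interval, end: Interval) -> tuple[int, int]:
    """Minimum and maximum CNV length (bp) when each breakpoint is known only to lie
    within an interval, e.g. (a_b)_(c_d). A precisely known breakpoint is given as (x, x).
    Lengths are end - start, ignoring the inclusive +1."""
    min_len = end[0] - start[1]  # innermost possible breakpoints
    max_len = end[1] - start[0]  # outermost possible breakpoints
    if min_len <= 0:
        raise ValueError("breakpoint intervals overlap")
    return min_len, max_len

# Case #27: Chr16:((38502679_46385317)_61223349), start uncertain, end precise
print(cnv_size_bounds((38502679, 46385317), (61223349, 61223349)))  # (14838032, 22720670)
# Case #35: Chr22:(18206749_18274663)_(21110254_21234326), both breakpoints uncertain
print(cnv_size_bounds((18206749, 18274663), (21110254, 21234326)))  # (2835591, 3027577)
```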
3.3.4 GS can detect intragenic CNVs that may be missed by TGP, despite the gene being included on the test
Owing to limitations of testing methodologies, intragenic CNVs may not be detected by TGP. Of the 18 RCT cases with a diagnostic CNV identified via GS, five had a CNV containing a gene included on the TGP, yet the TGP did not identify the CNV.20 This occurred in case #17, a 2-year-old male of Ashkenazi Jewish descent with a history of GDD, autism, hypotonia, poor language skills, large tongue, and microphallus, and no previous genetic testing. Family history was unremarkable. The neurodevelopmental TGP identified a maternally inherited, heterozygous VUS in SHANK2 (c.5200_5202del). GS, however, identified a heterozygous, de novo, LP deletion in SOX5 (c.271-2781_481+537del, exon 3), consistent with a diagnosis of autosomal dominant Lamb-Shaffer syndrome, characterized by absent or delayed speech, developmental delay, and variable features including skeletal abnormalities, optic nerve atrophy, and epilepsy.37-39 Although SOX5 was included on the TGP, the assay was unable to detect a CNV of this size. This finding provided a diagnosis for the participant and prompted clinical evaluation for additional associated features.
4 DISCUSSION
Recurrent deletion and/or duplication syndromes as well as rare CNVs are known causes of a variety of pediatric phenotypes;40, 41 however, few studies have systematically assessed CNVs in large cohorts using GS.42, 43 We describe our experience reporting CNVs by GS in 1052 pediatric participants in the NYCKidSeq program who received GS, in some cases alongside TGP. GS detected a diagnostic variant in 17.4% (183/1052) of participants, and in 20.2% (37/183) of these the diagnostic finding was a CNV. An additional 32 participants had one or more CNVs classified as uncertain by GS, for a total of 69 participants with a reported CNV (69/1052; 6.6% of all participants).
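The cohort-level proportions in the preceding paragraph follow directly from the reported counts. The small helper below (hypothetical name `pct`) reproduces them, rounding to one decimal place as in the text.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * numerator / denominator, 1)

participants = 1052
diagnosed = 183            # participants with a diagnostic result
diagnostic_cnv = 37        # diagnosed participants whose diagnostic finding was a CNV
uncertain_cnv_only = 32    # additional participants with only uncertain CNVs reported

print(pct(diagnosed, participants))      # 17.4
print(pct(diagnostic_cnv, diagnosed))    # 20.2
with_cnv = diagnostic_cnv + uncertain_cnv_only
print(with_cnv, pct(with_cnv, participants))  # 69 6.6
```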
In various neurodevelopmental cohorts, a causative CNV is detected in ~10%–20% of cases using microarray and/or karyotype.44-46 Previous studies also suggest that the rate of diagnostic CNVs may vary by phenotype4, 47-49 and that more complex phenotypes carry a higher burden of clinically significant CNVs.4, 48 Consistent with this, we observed diagnostic CNVs more frequently in individuals with multisystem phenotypes: among individuals with multiple phenotype categories and a GS diagnosis, 29.4% of diagnoses were due to a CNV, compared with 20.1% of diagnosed cases with a neurologic phenotype alone and 12.5% with a cardiac phenotype alone.
Previous studies have suggested that CNV detection by GS is at least as sensitive as microarray.9 The limit of detection for CMA depends on the methodology and on the number, size, and distribution of probes on each array (Supplemental H). Laboratories may also have differing internal guidelines for the size of a region, or the number of probes within it, required for reporting. Given this technical variability in CNV detection and reporting between laboratories, it is difficult to say with certainty how many of the CNVs identified by GS would have been missed by CMA. Taking <200 kb for duplications and <50 kb for deletions as approximate limits of array detection, GS identified five duplications and 16 deletions below these thresholds. Of these 21 CNVs, 13 were reported as LP/P and eight as VUS. This suggests that 28.8% of reported CNVs (21/73) may have been missed by CMA.
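A minimal sketch of applying the size thresholds above to CNV calls written in the table's simple `ChrN:start_end(del|dup)` notation. The regular expression and helper names are hypothetical, the thresholds are the approximate guideline used in this paragraph rather than any specific array's validated limits, and uncertain breakpoints or complex events such as the delins in case 34 are out of scope.

```python
import re

# Approximate array resolution used in the text (an assumption, not a specific
# array's validated limit): duplications < 200 kb and deletions < 50 kb.
DUP_LIMIT_BP = 200_000
DEL_LIMIT_BP = 50_000

# Simple "ChrN:start_end(del|dup)" notation only.
CNV_PATTERN = re.compile(r"^Chr(\w+):(\d+)_(\d+)(del|dup)$")

def parse_cnv(notation: str) -> tuple[str, int, str]:
    """Return (chromosome, length in bp as end - start, 'del' or 'dup')."""
    match = CNV_PATTERN.match(notation)
    if match is None:
        raise ValueError(f"unsupported CNV notation: {notation}")
    chrom, start, end, kind = match.groups()
    return chrom, int(end) - int(start), kind

def below_cma_limit(notation: str) -> bool:
    """True if the CNV is smaller than the assumed array limit for its type."""
    _, length, kind = parse_cnv(notation)
    return length < (DEL_LIMIT_BP if kind == "del" else DUP_LIMIT_BP)

# A few CNVs from the table above
for cnv in ["Chr16:9703465_9888417del", "Chr17:44859379_44861949del",
            "ChrX:18592341_18596203del", "ChrX:123221813_124917630dup"]:
    _, length, kind = parse_cnv(cnv)
    print(f"{cnv}: {length:,} bp {kind}, below assumed CMA limit: {below_cma_limit(cnv)}")
```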
In 13 cases, GS detected a diagnostic CNV that was not reported by a concurrent NGS-based TGP. TGPs may use a variety of methodologies to call CNVs and can detect only CNVs that include genes on the panel. TGP and ES may also detect CNVs encompassing multiple exons; however, the limit of detection varies and depends on multiple factors intrinsic to the specific laboratory methods and bioinformatics pipelines used. For example, CNV callers for ES typically require at least three consecutive exons to be altered. This applied to case #16, in which the two-exon deletion in SOX5 was not detected by previous ES. It also suggests that the eight cases with a deletion or duplication spanning fewer than three exons may not have been identified by ES. Exome-based testing methodologies, including ES and some TGPs, typically rely on read-depth-based strategies for CNV calling, which can be confounded by biases introduced during capture and PCR amplification. GS, in contrast, can use read depth that is less affected by capture bias, as well as split-read and/or read-pair methods, for CNV detection.
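To illustrate why exon-level read-depth callers can miss small intragenic events, the Python sketch below treats each exon as one bin, computes median-normalized log2 depth ratios against a reference, and calls runs of consecutive bins that cross a cutoff. It is a generic toy, not the pipeline of any laboratory in this study: the cutoffs, the pseudocount, and the simulated depths are assumptions, and split-read and read-pair evidence are not modeled. With `min_bins=3`, mimicking the typical three-exon requirement, a two-exon heterozygous deletion is missed even though its depth ratio is close to the expected log2(1/2) = -1.

```python
import math
from statistics import median

def log2_ratios(sample: list[float], reference: list[float],
                pseudocount: float = 0.5) -> list[float]:
    """Per-bin log2(sample/reference) depth ratio after median-normalizing each profile.
    The pseudocount (assumed value) avoids log(0) for homozygous deletions."""
    s_med, r_med = median(sample), median(reference)
    return [
        math.log2(((s + pseudocount) / s_med) / ((r + pseudocount) / r_med))
        for s, r in zip(sample, reference)
    ]

def call_segments(ratios: list[float], min_bins: int,
                  del_cut: float = -0.6, dup_cut: float = 0.45) -> list[tuple[int, int, str]]:
    """Return (first_bin, last_bin, kind) for runs of at least min_bins consecutive bins
    passing the same cutoff. Ideal single-copy values are log2(1/2) = -1 (loss) and
    log2(3/2) ~ 0.58 (gain); the cutoffs are assumed, noise-tolerant values."""
    def label(r: float):
        if r <= del_cut:
            return "del"
        if r >= dup_cut:
            return "dup"
        return None

    segments, start, current = [], 0, None
    for i, r in enumerate(ratios + [0.0]):  # trailing neutral sentinel closes the last run
        kind = label(r)
        if kind != current:
            if current is not None and i - start >= min_bins:
                segments.append((start, i - 1, current))
            start, current = i, kind
    return segments

# Simulated exon-level depths (illustrative only): 12 exons at 100x, with exons 5-6
# (0-based) at half depth, as expected for a heterozygous two-exon deletion.
reference = [100.0] * 12
sample = [100.0] * 12
sample[5] = sample[6] = 50.0

ratios = log2_ratios(sample, reference)
print(call_segments(ratios, min_bins=3))  # three-exon requirement: []
print(call_segments(ratios, min_bins=1))  # single-bin calling: [(5, 6, 'del')]
```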
In eight of the 13 cases in which only GS detected the CNV, this was because no gene within the CNV was included on the TGP. In the remaining five, a gene within the CNV was included on the panel but the CNV was not reported, likely owing to technical limitations of CNV detection by TGP. Further details regarding the differences between TGP and GS testing in our cohort are described by Abul-Husn et al.20
Three individuals had a diagnostic CNV as well as an SNV/indel in a gene within the CNV. In two (CWF19L1, 7.8-kb deletion; NALCN, 3.1-kb deletion), the variants were confirmed to be in trans; in the third (ATAD3A, 39-kb deletion), they were presumed to be in trans based on inferred inheritance. In each case, the CNV was below standard limits of CMA detection. Additionally, the ATAD3A deletion lies within a complex region with low-copy repeats, which makes probe design and detection difficult with CMA. Without comprehensive testing such as GS, these cases might have remained unsolved, depending on the availability and feasibility of additional single-gene testing.
While we demonstrate that GS can identify both small and large CNVs, short-read GS has limitations, mostly owing to lower coverage than TGP or ES and to short read lengths, which make mapping difficult in highly repetitive or complex regions.50 Other caveats include limited knowledge of the effects of noncoding variants (compared with RNA-seq) and, because of its lower fold coverage, reduced power to detect mosaic variants with low (<10%–15%) allele fractions. Although some mosaic variants were detected in our dataset, the sensitivity of GS for mosaic CNVs remains unknown. GS pipelines can identify balanced translocations and other balanced SVs, but functional annotation and filtering criteria for these variants are less robust than those for other variant types, and calling balanced SVs from short-read sequencing is not standard in clinical testing. Because this study relied on the clinically validated pipelines at each laboratory, we did not analyze all types of SVs. Although not yet broadly available, long-read sequencing is an emerging technology with the potential to better capture balanced SVs and variants in difficult-to-map regions.
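The reduced power to detect low-level mosaicism at genome-scale coverage can be illustrated with a simple binomial model: if a fraction f of reads is expected to support a variant at depth d, the probability of observing at least k supporting reads is 1 - P(X < k), with X ~ Binomial(d, f). The sketch below assumes a caller that needs k = 5 supporting reads (a hypothetical threshold) and compares 30×, the minimum depth reported in footnote c, with an illustrative 500× for a targeted panel (an assumed value). It ignores mapping and sequencing errors and CNV-specific evidence such as read depth, so it indicates a trend rather than a validated sensitivity.

```python
from math import comb

def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, allele_fraction)."""
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below

MIN_ALT_READS = 5  # hypothetical caller requirement
for depth in (30, 500):  # 30x: minimum GS depth in footnote c; 500x: assumed panel depth
    for af in (0.05, 0.10, 0.15):
        p = detection_probability(depth, af, MIN_ALT_READS)
        print(f"depth {depth:>3}x, allele fraction {af:.0%}: P(detect) = {p:.2f}")
```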
This study has limitations. Although we highlight the utility of GS for detecting CNVs compared with other testing modalities, this was not the primary goal of the project. The number of participants with a diagnostic CNV was small, limiting our ability to make meaningful statistical comparisons; systematic evaluation of GS versus CMA or ES for CNV detection is warranted. In addition, most participants had a neurologic phenotype, so we were not sufficiently powered to generalize the frequency of diagnostic CNVs across pediatric phenotypes. Finally, because two laboratories generated GS results, there may be unknown variability in the detection and classification of variants.
GS is a comprehensive test that can detect multiple variant types at once. Our study demonstrated the ability of GS to detect CNVs of various sizes, including those below the standard limit of detection of conventional technologies. Of the individuals found by GS to have a diagnostic CNV, 35.1% (13/37) had previously undergone uninformative genetic testing. Apart from GS, current methods are limited in their ability to detect CNVs that fall below the resolution of CMA and outside the capabilities of ES or single-gene testing. This study highlights specific examples of the utility of GS for CNV identification in diverse pediatric patients with suspected genetic etiologies, supporting the potential of GS as a first-tier clinical genetic test.
AUTHOR CONTRIBUTIONS
Conceptualization: Katherine E. Bonini, Amanda Thomas-Wilson, Bruce D. Gelb, Carol R. Horowitz, Vaidehi Jobanputra, Eimear E. Kenny, Melissa Wasserstein. Data curation: Katherine E. Bonini, Amanda Thomas-Wilson, Nicole R. Kelly, Michelle A. Ramos, Beverly J. Insel, Laura Scarimbolo, Priya N. Marathe. Formal analysis: Katherine E. Bonini, Amanda Thomas-Wilson, Nicole R. Kelly, Beverly J. Insel, Laura Scarimbolo, Noura S. Abul-Husn, Priya N. Marathe, Monisha Sebastin, Jacqueline A. Odgis, Avinash Abhyankar, Miranda Di Biase, Katie M. Gallagher, Saurav Guha, Volkan Okur, Atteeq U. Rehman, Shruti Phadke, Caroline Nava, Michelle A. Ramos, Lama Elkhoury, Lisa Edelmann, George A. Diaz, John M. Greally, Vaidehi Jobanputra, Sabrina A. Suckiel, Carol R. Horowitz, Melissa Wasserstein, Eimear E. Kenny, Bruce D. Gelb. Funding acquisition: Bruce D. Gelb, Carol R. Horowitz, Eimear E. Kenny, Melissa Wasserstein. Investigation: Katherine E. Bonini, Amanda Thomas-Wilson, Beverly J. Insel, Laura Scarimbolo. Methodology: Katherine E. Bonini, Amanda Thomas-Wilson, Vaidehi Jobanputra, Beverly J. Insel, Eimear E. Kenny. Project administration: Nicole R. Kelly, Michelle A. Ramos. Resources: Bruce D. Gelb, Carol R. Horowitz, Eimear E. Kenny, Melissa Wasserstein, Vaidehi Jobanputra. Supervision: Bruce D. Gelb, Carol R. Horowitz, Vaidehi Jobanputra, Eimear E. Kenny, Melissa Wasserstein. Visualization: Katherine E. Bonini, Amanda Thomas-Wilson, Beverly J. Insel, Priya N. Marathe. Writing-original draft: Katherine E. Bonini, Amanda Thomas-Wilson. Writing-review & editing: Katherine E. Bonini, Amanda Thomas-Wilson, Priya N. Marathe, Monisha Sebastin, Jacqueline A. Odgis, Miranda Di Biase, Nicole R. Kelly, Michelle A. Ramos, Beverly J. Insel, Atteeq U. Rehman, Saurav Guha, Volkan Okur, Avinash Abhyankar, Shruti Phadke, Caroline Nava, Katie M. Gallagher, Lama Elkhoury, Lisa Edelmann, Randi E. Zinberg, Noura S. Abul-Husn, George A. Diaz, John M. Greally, Sabrina A. Suckiel, Carol R. Horowitz, Eimear E. Kenny, Melissa Wasserstein, Bruce D. Gelb, Vaidehi Jobanputra.
ACKNOWLEDGMENTS
This reported research was supported by the National Human Genome Research Institute and the National Institute on Minority Health and Health Disparities of the National Institutes of Health (NIH) under Award Number U01HG009610. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors thank all children, parents, and families who participated in this study; the New York Genome Center, Rady Children's Institute for Genomic Medicine, and Sema4 laboratory; the members of the Mount Sinai Hospital Genomics Stakeholder Board; referring physicians in the Mount Sinai and Montefiore Health Systems.
CONFLICT OF INTEREST STATEMENT
Dr. Abul-Husn is currently employed by 23andMe, was previously employed by Regeneron Pharmaceuticals, received personal fees from Genentech, Allelica, and 23andMe, received research funding from Akcea, and serves as a scientific advisory board member for Allelica. Dr. Kenny received personal fees from Illumina, 23andMe, Allelica, and Regeneron Pharmaceuticals, received research funding from Allelica, and serves as a scientific advisory board member for Encompass Bio, Overtone, and Galateo Bio. All other authors declare they have no conflicts of interest to report.
ETHICS STATEMENT
Written informed consent was provided by participants. Ethics approval was obtained from the Institutional Review Boards of the Icahn School of Medicine at Mount Sinai and the Albert Einstein College of Medicine.
Open Research
DATA AVAILABILITY STATEMENT
The de-identified data from this study are available upon reasonable request. Genomic sequencing data from the NYCKidSeq program will be available through the NHGRI's Analysis Visualization and Informatics Lab-space (AnVIL). The study is registered through the database of Genotypes and Phenotypes (dbGaP, accession number phs002337.v1.p1). Variant interpretations will be submitted to the ClinVar database.