Translating Muscle RNAseq Into the Clinic for the Diagnosis of Muscle Diseases
Funding: This study was supported by Instituto de Salud Carlos III and FEDER, ‘Una manera de hacer Europa’ with grants FIS PI18/01585 (L.G.-Q. and P.G.) and PI22/01859 (L.G.-Q.), and SGR-Cat 2021 (AGAUR-Generalitat de Catalunya) through the project “Genomic medicine and rare diseases group” (2021-SGR-00835, 2023–2025, A.S.-C., M.J.R., B.R.-S., J.S., P.G., L.G.-Q.). A.S.-C. was supported by the Ministerio de Universidades (Spain) with grant FPU20/06692. B.E.-A. was supported by the predoctoral program “Joan Oró” of the Secretary of Universities and Research of the Department of Research and Universities of the Government of Catalonia with code 2024 FI-1 00075, co-financed by the European Union.
ABSTRACT
Objective
Approximately half of patients with hereditary myopathies remain without a definitive genetic diagnosis after DNA next-generation sequencing (NGS). Here, we implemented transcriptome analysis of muscle biopsies as a complementary diagnostic tool for patients with muscle disease but no definitive genetic diagnosis after exome sequencing.
Methods
In total, 70 undiagnosed cases with suspected genetic muscular dystrophies or congenital myopathies were included in the study. Muscle RNAseq comprised the analysis of aberrant splicing, aberrant expression, and monoallelic expression. In addition, existing NGS data or variant calling from RNAseq were reanalyzed, and genome sequencing was performed in selected cases. Four aberrant splicing open-source tools were compared and assessed.
Results
RNAseq established a diagnosis in 10/70 patients (14.3%) by identifying aberrant transcripts produced by single nucleotide variants (7/10) or copy number variants (3/10). Reanalysis of NGS data allowed the diagnosis in 9/70 individuals (12.9%). Based on this cohort, FRASER was the tool that reported the most splicing outlier events per sample while showing the highest accuracy (81.25%).
Conclusions
We demonstrate the utility of RNAseq in identifying causative variants in muscle diseases. Evaluation of four aberrant splicing tools allowed efficient identification of most pathogenic splicing events, obtaining a manageable number of candidate events for manual inspection, demonstrating feasibility for translation into a clinical setting. We also show how the integration of omic technologies reduces the turnaround time to identify causative variants.
Abbreviations
- ACMG: American College of Medical Genetics and Genomics
- CK: creatine kinase
- CNV: copy number variant
- ES: exome sequencing
- GS: genome sequencing
- LP: likely pathogenic
- MRI: magnetic resonance imaging
- NGS: next-generation sequencing
- NMD: neuromuscular disorder
- PSI: percentage spliced in
- RIN: RNA integrity number
- RNAseq: RNA sequencing
- SNV: single nucleotide variant
- SV: structural variant
- TPM: transcripts per million
- VC: variant calling
- VUS: variant of uncertain significance
1 Introduction
Muscular dystrophies and congenital myopathies are a wide group of rare neuromuscular disorders (NMD) associated with primary alterations in the muscle fiber. While muscle weakness is present in most patients, other manifestations of these diseases include cardiomyopathy, cognitive impairment, respiratory insufficiency, arthrogryposis, or contractures. Onset may occur from the neonatal period to adulthood, with genetic and clinical overlap, where the same gene may sometimes cause both neonatal-onset severe congenital myopathy and adult-onset myopathy. For these patients, a molecular diagnosis allows appropriate disease management, access to clinical trials, approved mutation-specific therapies, and genetic and reproductive counseling.
Next-generation sequencing (NGS) has led to the discovery of an increasing number of genes, and to date more than 680 genes have been linked to NMDs [1]. However, after exome sequencing (ES), around half of patients with muscle diseases remain without a definitive genetic diagnosis [2-5], probably due to difficulties in interpreting and reclassifying variants of uncertain significance (VUS), failure to detect pathogenic variants not covered by ES (variants in regulatory regions, structural variants (SVs) or deep intronic variants), or variants in as yet undiscovered disease-associated genes. Genome sequencing (GS) overcomes one limitation of ES by also covering non-coding regions. However, the diagnostic uplift of GS compared to ES remains controversial [5-8]. While the number of variants detected by GS is increasing, the ability to properly interpret these variants at the mRNA or protein level is limited, and, if no additional functional studies are available, most rare variants are classified as VUS according to American College of Medical Genetics and Genomics (ACMG) guidelines [9, 10]. Therefore, efficient functional tests need to be implemented to validate these variants routinely.
RNA analysis is a promising approach to increasing diagnostic yield after genomic studies, particularly when strong clinical and anatomopathological suspicion points to a single gene, e.g., Duchenne and Becker muscular dystrophy and the DMD gene [11-13]. In 2017, Cummings et al. [14] implemented a pipeline for RNA sequencing (RNAseq) analysis to screen for expression outliers and aberrant splicing events in a subset of 50 undiagnosed NMD cases, reporting a diagnostic rate of 35%. Since then, similar studies in cohorts with suspected Mendelian disorders have reported variable increased diagnostic yields (7.5%–36%) [15-22]. Given these promising results, several bioinformatic open-source workflows and tools have been developed in recent years to identify transcriptome-wide aberrant splicing and gene expression outliers more efficiently [23-29].
We evaluated muscle transcriptome analysis as a complementary diagnostic tool for a cohort of 70 participants with clinical features of muscle disease and no genetic diagnosis after ES. Sequential analysis was implemented to detect clinically relevant aberrant splicing transcripts, aberrant expression, monoallelic expression (MAE), and rare single nucleotide variants (SNVs) in known NMD genes. Then, if no candidate variant was found, analysis also focused on muscle-specific expressed genes to identify potentially novel disease-related genes. Here, we show an efficient workflow to prioritize the most pathogenic aberrant splicing in a clinical setting, and emphasize the significance of integrating complementary genetic and non-genetic tests (i.e., GS, muscle biopsy, MRI) to interpret RNAseq findings.
2 Materials and Methods
2.1 Patients
Participants with a suspected NMD, recruited from seven Spanish neuromuscular referral centres between 2018 and 2023, were included if they met the following criteria: (a) suspected genetic muscular dystrophy or myopathy, (b) previous negative or inconclusive gene panel or ES study, and (c) availability of muscle biopsy, previously obtained as part of the diagnostic process. Of the 72 participants initially enrolled, two were excluded due to excessive fibrotic or adipose tissue replacement in the muscle biopsy.
Relevant clinical data were collected from all individuals, including clinical presentation, age of onset, histopathological findings, muscle magnetic resonance imaging (MRI), and serum creatine kinase (CK) levels.
Twenty-eight additional samples were obtained as controls: 17 muscle samples from healthy individuals, three positive samples with an identified DMD splicing defect, and eight samples from patients already diagnosed with LGMDR1 or LGMDD4. Healthy controls were either patients who underwent orthopedic surgery or patients with an initial suspicion of non-specific myopathy who finally presented strictly normal muscle biopsies.
Written informed consent was obtained from all participants. The study was approved by the Ethics Committee of the Hospital de la Santa Creu i Sant Pau (Barcelona, Spain) (Ref: IIBSP-MIO-2022-120), and all the procedures were performed in accordance with the Declaration of Helsinki.
2.2 RNA Extraction, Library Preparation, and Sequencing
Total RNA was extracted from 30 mg of frozen muscle biopsy using the Animal Tissue RNA Purification Kit (Norgen Biotek, Canada). Samples with an RNA integrity number (RIN) < 6 (measured with Agilent 2100 Bioanalyzer) were discarded. Libraries were prepared using the Illumina Stranded Total RNA Prep with Ribo-Zero Plus, and sequencing was performed on a NovaSeq 6000 sequencer (NIMGenetics, Spain) generating 150-bp paired-end reads. Samples were sequenced in three batches (batch 1, batch 2, and batch 3) as detailed in Table S1.
2.3 RNAseq Data Analysis
Raw RNA reads were trimmed with Fastp and aligned to the human genome reference GRCh37/hg19 (Ensembl version 87) with HISAT2 [30]. Figure 1A summarizes the bioinformatic pipeline, filtering, and prioritization strategies used.

2.4 Aberrant Splicing
Four open-source tools were used to assess aberrant splicing: FRASER (v1.8.1) [23], FRASER2 (v1.99.4) [24], rMATS-turbo (v4.1.2) [27, 28], and LeafCutterMD (v0.2.7) [25, 31]. Bioinformatic details are specified in File S1:Methods.
2.5 Gene Expression
Transcript assembly was performed using StringTie (-B option) to obtain the count tables, fragments per kilobase of transcript per million (FPKM), and transcripts per million (TPM) for transcripts and genes. The gene count table was the input to the OUTRIDER package (v1.20.1) [26].
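For reference, the two normalized units mentioned above relate to raw fragment counts as sketched below. This is a minimal illustration of the standard FPKM and TPM definitions, not StringTie's own estimation procedure; the function names and the toy counts and lengths are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r * 1e6 / scale for r in rates]


# Toy example (hypothetical values): three transcripts.
counts = [100, 300, 600]
lengths = [1000, 3000, 2000]
fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

Unlike FPKM, TPM values always sum to one million within a sample, which makes proportions comparable across samples.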
2.6 Variant Calling (VC)
For VC, STAR (v2.7.10b) alignment was performed with --twopassMode Basic [32]. VC was performed from STAR RNAseq BAM files following GATK Best Practices for RNA-seq short variant discovery (GATK v4.2.5.0) [33]. Variant annotation and filtering details are specified in File S1:Methods.
2.7 Monoallelic Expression (MAE)
MAE was assessed with the MAE module from DROP (v1.4.0) [34], which is based on the approach of Kremer et al. [20, 35]. Default parameters were used to call a gene as MAE: adjusted p-value cutoff of 0.05 and allelic ratio cutoff of 0.8. Because a DNAseq VCF file was available for only 20 cases, the MAE analysis was restricted to those cases.
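The two cutoffs can be illustrated with a small filter over heterozygous sites. This is only a sketch under stated assumptions: the adjusted p-value is taken as given from an upstream allelic-imbalance test (not implemented here), imbalance toward either allele is treated as MAE, and the `HetSite` type, function names, and example counts are hypothetical rather than part of DROP.

```python
from dataclasses import dataclass


@dataclass
class HetSite:
    gene: str
    ref_count: int   # RNA reads supporting the reference allele
    alt_count: int   # RNA reads supporting the alternative allele
    padj: float      # adjusted p-value from an upstream test (assumed given)


def alt_ratio(site):
    """Fraction of RNA reads carrying the alternative allele."""
    total = site.ref_count + site.alt_count
    return site.alt_count / total if total else None


def is_mae(site, padj_cutoff=0.05, ratio_cutoff=0.8):
    """Flag a site as monoallelically expressed toward either allele."""
    ratio = alt_ratio(site)
    if ratio is None:  # no coverage, nothing to call
        return False
    dominant = max(ratio, 1 - ratio)
    return site.padj < padj_cutoff and dominant >= ratio_cutoff


# Hypothetical sites for illustration only.
sites = [
    HetSite("GENE_A", ref_count=3, alt_count=47, padj=0.001),
    HetSite("GENE_B", ref_count=24, alt_count=26, padj=0.9),
    HetSite("GENE_C", ref_count=10, alt_count=30, padj=0.01),
]
mae_genes = [s.gene for s in sites if is_mae(s)]
```

In this toy input only `GENE_A` passes both cutoffs; `GENE_C` is significant but its dominant allele accounts for 75% of reads, below the 0.8 ratio cutoff.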
2.8 Prioritization, Interpretation, and Validation
An initial splicing analysis was focused on cases with clear candidate variants or candidate genes (Table S1). If no RNA alteration was found or no candidate gene was indicated by the clinician, a stepwise analysis strategy was followed as detailed in Figure 1A.
SNVs located near aberrant splicing events (± 50 bp) were annotated from DNAseq data when available. Splicing outliers were prioritized by the presence of nearby genomic variants or altered gene expression. Based on benchmarking results, splicing outliers reported by FRASER2 were assessed first, followed by the FRASER |Δ percentage spliced in (PSI)| > 0.3 and |Δ PSI| > 0.1 cutoffs.
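The prioritization logic described above can be sketched as follows. The interpretation of "± 50 bp" as distance to either junction boundary, the tuple layout of events and variants, and both function names are assumptions made for illustration; the tiers simply mirror the review order stated in the text and assume the FRASER effect size is only supplied for events that FRASER already reported.

```python
def variants_near_event(event, variants, window=50):
    """Return variants within `window` bp of either boundary of a splicing event.

    event: (chrom, start, end); variants: iterable of (chrom, pos).
    """
    chrom, start, end = event
    return [
        (v_chrom, v_pos)
        for v_chrom, v_pos in variants
        if v_chrom == chrom
        and min(abs(v_pos - start), abs(v_pos - end)) <= window
    ]


def review_tier(called_by_fraser2, fraser_abs_delta_psi):
    """Order of manual review: FRASER2 first, then FRASER at 0.3, then at 0.1."""
    if called_by_fraser2:
        return 1
    if fraser_abs_delta_psi is not None and fraser_abs_delta_psi > 0.3:
        return 2
    if fraser_abs_delta_psi is not None and fraser_abs_delta_psi > 0.1:
        return 3
    return None  # not reported at any of the cutoffs considered


# Hypothetical coordinates for illustration only.
event = ("chr1", 10_000, 12_500)
variants = [("chr1", 9_960), ("chr1", 11_200), ("chr2", 10_010)]
nearby = variants_near_event(event, variants)
```

Here only the variant at position 9,960 lies within 50 bp of a junction boundary, so it would be the one annotated alongside the event.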
For genes with MAE, candidate aberrant splicing events, SNVs, and indels were extracted (Figure 1A). For the gene expression analysis, only a few NMD genes were found to be statistically significant outliers. Genes were therefore sorted by ascending z-score, and candidate aberrant splicing events and variants were extracted for the five lowest NMD genes, OMIM genes, highly muscle-expressed genes (according to the Human Protein Atlas [36], Table S2), and all genes.
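A minimal sketch of the ranking step, assuming per-gene z-scores for one sample are already available (for example, exported from an expression outlier analysis). The function name, gene labels, and z-scores below are hypothetical.

```python
def lowest_expressed(zscores, gene_panel=None, n=5):
    """Return the n genes with the lowest z-scores, optionally restricted to a panel."""
    items = [
        (gene, z)
        for gene, z in zscores.items()
        if gene_panel is None or gene in gene_panel
    ]
    return sorted(items, key=lambda gene_z: gene_z[1])[:n]


# Hypothetical z-scores for one sample.
sample_z = {
    "GENE1": -0.4, "GENE2": -3.2, "GENE3": 1.1, "GENE4": -2.1,
    "GENE5": -1.7, "GENE6": 0.3, "GENE7": -2.9, "GENE8": -0.9,
}
nmd_panel = {"GENE2", "GENE4", "GENE5", "GENE6", "GENE8"}

top_nmd = lowest_expressed(sample_z, gene_panel=nmd_panel, n=3)
top_all = lowest_expressed(sample_z, n=2)
```

Running the same function with progressively broader gene sets (NMD genes, OMIM genes, muscle-expressed genes, all genes) reproduces the stepwise widening described above.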
SNVs, aberrant splicing events, gene expression outliers, and genes with MAE were prioritized considering in silico pathogenicity predictions, protein function, and participant phenotypes.
In selected cases (detailed in Table S1), GS was performed as described in File S1:Methods. Whenever possible, Sanger sequencing was used to validate splicing events and genomic variants (File S1:Methods). In Case 10, skin biopsy fibroblasts were cultured and stained with Collagen-VI antibody as previously described [37].
3 Results
3.1 Muscle Transcriptome Analysis in a Cohort of 70 Genetically Unsolved Cases
Ninety-eight muscle samples were analyzed, including 28 control samples and 70 undiagnosed individuals (39 males and 31 females) with a suspected genetic muscle disorder (Table S1). Of these, 12 had a positive family history of relatives with a similar condition, and 58 participants had a sporadic presentation. Sufficient expression of NMD genes was confirmed in all samples (File S1:Results; Table S3; Figure S1).
RNAseq identified an RNA alteration and allowed a definitive diagnosis for 10/70 individuals (14.3%) (Figure 1B, Table 1). Seven cases had an aberrant transcript due to an SNV, and three showed either MAE or an aberrant transcript due to the presence of a copy number variant (CNV); two of them (Cases 10 and 45) required GS in addition to RNAseq to identify the causative CNV.
Case no. | Gene | Transcript | Causative variant(s) | RNA alteration |
---|---|---|---|---|
Solved cases: RNAseq | | | | |
10 | COL6A1 (AD) | NM_001848.3 | c.(804+1_805-1)_(858+1_859-1)del, p.(Gly269_Pro286del) | In-frame exon skipping |
22 | RYR1 (AR) | NM_000540.3 | c.1250T>C, p.(Leu417Pro) | |
 | | | c.7833C>T, p.(Cys2611=) | Exon truncation (4-bp deletion) |
28 | TTN (AR) | NM_001267550.2 | c.38829del, p.(Val12944Cysfs*3) | |
 | | | c.57262G>C, p.(Val19088Leu) | Intron retention |
29 | GNE (AR) | NM_005476.7 | c.2086G>A, p.(Val696Met) | |
 | | | promoter GNE deletion | Monoallelic expression |
37 | CAPN3 (AR) | NM_000070.3 | c.140_142del, p.(Ile47del) | |
 | | | c.1746-20C>G, p.? | Multiple aberrant transcripts |
45 | Multiple | | NC_000007.13(GRCh37/hg19):g.5669468_6368115del | Down-expression of 8 genes |
49 | DMD (XLR) | NM_004006.2 | c.93+5590T>A | In-frame pseudoexon |
63 | RYR1 (AR) | NM_000540.3 | c.165+78_538-116del, p.(Asn56_Leu179del) | In-frame exon skipping |
 | | | c.7027G>A, p.(Gly2343Ser) | Multiple aberrant transcripts |
69 | CAPN3 (AR) | NM_000070.3 | c.2362_2363delinsTCATCT, p.(Arg788Serfs*14) | |
 | | | c.1746-20C>G, p.? | Multiple aberrant transcripts |
70 | CAPN3 (AR) | NM_000070.3 | c.2362_2363delinsTCATCT, p.(Arg788Serfs*14) | |
 | | | c.1782+1072G>C, p.? | Out-of-frame pseudoexon |
Solved cases: reanalysis of NGS or, when unavailable, variant calling from RNAseq | | | | |
1 | SMCHD1+DUX4 (DI) | NM_015295.3 | c.5602C>T, p.(Arg1868*) | |
 | | | DUX4 hypomethylation | |
3 | RYR1 (AD) | NM_000540.3 | c.6856C>G, p.(Leu2286Val) | |
6 | TNNT1 (AR) | NM_003283.6 | c.551_552delinsCA, p.(Arg184Pro) | |
8 | SQSTM1 / TIA1 (DI) | NM_003900.5 | c.1175C>T, p.(Pro392Leu) | |
 | | NM_022173.4 | c.1070A>G, p.(Asn357Ser) | |
11 | SMCHD1+DUX4 (DI) | NM_015295.3 | c.182_183dup, p.(Gln62Valfs*48) | |
 | | | DUX4 hypomethylation | |
26 | GNE (AR) | NM_005476.7 | c.388delA, p.(Ile130Serfs*21) | |
 | | | c.1519A>C, p.(Thr507Pro) | |
30 | ACTA1 (AD) | NM_001100.4 | c.82_83delinsTG, p.(Ala28Cys) | |
32 | MAMDC2 (AD) | NM_153267.5 | c.2047G>T, p.(Glu683*) | |
34 | RYR1 (AD) | NM_000540.3 | c.6856C>G, p.(Leu2286Val) | |
- Note: Case 29 promoter GNE deletion not validated. Some patients required GS to complete the diagnosis.
- Abbreviations: AD, autosomal dominant; AR, autosomal recessive; DI, digenic inheritance; GS, genome sequencing; NGS, Next-Generation Sequencing; XLR, X-linked recessive.
Reanalysis of existing NGS data (or, when unavailable, VC from RNAseq) enabled diagnosis for 9/70 cases (12.9%) (Figure 1B, Table 1). No genetic diagnosis was achieved for these patients in the first NGS analysis mainly due to a lack of evidence of pathogenicity and VUS reclassification (5/9); other reasons were that the causative gene was not included in the analysis (2/9), the exclusion of a variant due to high allele frequency (> 0.5%) (1/9) [38, 39], and low ES coverage (1/9) (Figure 1B).
Seven participants are still under study (10%) as we have identified candidate variants in neuromuscular genes (3/7), OMIM genes with a possible novel phenotype–genotype correlation (1/7), or non-OMIM genes highly expressed in skeletal muscle (3/7) (Figure 1B). Additional functional studies are being performed to confirm the pathogenicity of the findings.
3.2 Aberrant Splicing Detection
We first compared the sensitivity of four open-source tools (FRASER, FRASER2, LeafCutterMD, and rMATS-turbo) to detect splicing alterations in NMD genes from samples sequenced in batch 1 (n = 34; 29 cases and 5 controls, detailed in Table S1). The evaluated tools reported a large and variable number of aberrant splicing events (Figure 2A). FRASER was the algorithm that reported the most splicing outliers per sample, even when the effect size cutoff was set to |Δ PSI| > 0.3, as recommended by the authors. FRASER2 reported a significantly reduced number of outliers per sample: an average of 2.7 outlier events.

On intersecting outlier junctions between different algorithms, a high degree of consistency was found between FRASER and FRASER2 (92% and 81% of FRASER2 outliers were also identified by FRASER |Δ PSI| > 0.1 and FRASER |Δ PSI| > 0.3, respectively) (Figure 2B). However, FRASER agreement was lower with rMATS-turbo and LeafCutterMD (64% and 57% for |Δ PSI| > 0.1 and 61% and 51% for |Δ PSI| > 0.3, respectively) (Figure 2B).
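The agreement figures above are directional: they give the share of one tool's outlier junctions that another tool also reports. A minimal sketch of that calculation, assuming each call is keyed by sample and junction coordinates (the key layout, function name, and toy calls are hypothetical):

```python
def fraction_recovered(reference_calls, other_calls):
    """Fraction of the reference tool's outlier junctions also reported by another tool."""
    reference, other = set(reference_calls), set(other_calls)
    if not reference:
        return float("nan")
    return len(reference & other) / len(reference)


# Hypothetical outlier calls keyed by (sample, chrom, start, end).
tool_a = {
    ("S1", "chr1", 100, 200),
    ("S1", "chr2", 500, 900),
    ("S2", "chr7", 300, 450),
}
tool_b = {
    ("S1", "chr1", 100, 200),
    ("S2", "chr7", 300, 450),
    ("S2", "chr9", 10, 80),
    ("S3", "chr3", 40, 95),
}

a_in_b = fraction_recovered(tool_a, tool_b)  # share of tool A calls found by tool B
b_in_a = fraction_recovered(tool_b, tool_a)  # share of tool B calls found by tool A
```

Because the denominator changes with the reference tool, the two directions generally differ, which matters when one tool reports far more outliers than the other.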
A total of 16 pathogenic splicing alterations were identified across all 98 samples, including six pseudoexons, four exon skipping events, one intron retention, one exon truncation, and four events with multiple aberrant transcripts (Figure 2C,D, Table 2). FRASER |Δ PSI| > 0.1 showed the highest accuracy, identifying 81.25% (13/16) of relevant events (Figure 2C); FRASER2, reporting fewer outliers per sample (Figure 2A), identified 68.75% (11/16) of events (Figure 2C); and LeafCutterMD recovered only 25% of pathogenic events. For FRASER2, one splicing alteration (SPG7 pseudoexon in Case 2) was missed (adjusted p-value > 0.05) when only the batch 1 samples were analyzed (n = 34), but it was correctly identified with the entire cohort (n = 98).
Case no. | Variant (cDNA) | Splicing alteration | FRASER \|ΔPSI\| > 0.1 | FRASER \|ΔPSI\| > 0.3 | FRASER2 | LeafCutterMD | rMATS-turbo | Aberrant expression (OUTRIDER z-score) |
---|---|---|---|---|---|---|---|---|
DMD1 | DMD(NM_004006.3):c.10554-2996T>G | Out-of-frame pseudoexon | Yes | Yes | Yes | No | Yes | −2.68 |
DMD2 | DMD(NM_004006.3):c.9225-647A>G | Out-of-frame pseudoexon | Yes | Yes | Yes | Yes | Yes | −4.98 |
DMD3 | DMD(NM_004006.3):c.94-5078_94-5071delins[NC_000008.10:g.16369356_16369419inv;TGG] | Out-of-frame cryptic exon | Yes | Yes | No | No | No | −2.95 |
2* | SPG7(NM_003119.4):c.286+853A>G | Out-of-frame pseudoexon | Yes | No | *Only identified with all cohort samples | No | No | −3.89 |
7* | COL6A1(NM_001848.3):c.1398+2T>C | In-frame exon skipping | Yes | Yes | Yes | Yes | Yes | 0.6 |
10 | COL6A1(NM_001848.3):c.(804+1_805-1)_(858+1_859-1)del | In-frame exon skipping | Yes | Yes | Yes | No | Yes | −0.2 |
22 | RYR1(NM_000540.3):c.7833C>T | Exon truncation | Yes | No | No | No | Yes | −2.8 |
28 | TTN(NM_001267550.2):c.57262G>C | Intron retention | Yes | No | Yes | No | No | −1.02 |
37 | CAPN3(NM_000070.3):c.1746-20C>G | Multiple aberrant transcripts | No | No | No | No | No | 0 |
43* | DMD(NM_004006.3):c.(649+1_650-1)_(2268+1_2269-1)del | Out-of-frame exon skipping | Yes | Yes | Yes | No | No | −2.89 |
49 | DMD(NM_004006.3):c.93+5590T>A | In-frame pseudoexon | Yes | Yes | Yes | Yes | No | 0.2 |
61* | COL6A1(NM_001848.3):c.859-10T>A | Multiple aberrant transcripts | No | No | Yes | No | No | −1.26 |
63 | RYR1(NM_000540.3):c.165+78_538-116del | In-frame exon skipping | Yes | Yes | Yes | Yes | Yes | −0.1 |
63 | RYR1(NM_000540.3):c.7027G>A | Multiple aberrant transcripts | Yes | No | Yes | No | Yes | −0.1 |
69 | CAPN3(NM_000070.3):c.1746-20C>G | Multiple aberrant transcripts | No | No | No | No | No | −0.3 |
70 | CAPN3(NM_000070.3):c.1782+1072G>C | Out-of-frame pseudoexon | Yes | Yes | No | No | No | −2.14 |
- Note: Asterisks (*) indicate cases not diagnosed due to incompatible segregation in the family (case 7), heterozygous alteration in a recessive gene (cases 2, 43) and multiple transcripts identified that require orthogonal validation (case 61). The category “multiple aberrant transcripts” is applied when more than one aberrant transcript was identified. Bold formatting indicates the correct identification of the event as aberrant outlier.
- Abbreviation: PSI, percentage spliced in.
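The per-tool detection rates can be recomputed directly from the Yes/No columns of Table 2. In the sketch below, each row string lists the five tool columns in table order; the FRASER2 entry for Case 2 is counted as detected because the event was identified with the full cohort. The rate computed here is the fraction of the 16 known pathogenic events that a tool flagged (what the text refers to as accuracy); the variable names are illustrative.

```python
TOOLS = ["FRASER>0.1", "FRASER>0.3", "FRASER2", "LeafCutterMD", "rMATS-turbo"]

# One string per Table 2 row, one character per tool (Y = detected, N = missed).
TABLE2 = [
    ("DMD1", "YYYNY"),
    ("DMD2", "YYYYY"),
    ("DMD3", "YYNNN"),
    ("2", "YNYNN"),    # FRASER2: detected only with all cohort samples
    ("7", "YYYYY"),
    ("10", "YYYNY"),
    ("22", "YNNNY"),
    ("28", "YNYNN"),
    ("37", "NNNNN"),
    ("43", "YYYNN"),
    ("49", "YYYYN"),
    ("61", "NNYNN"),
    ("63 (c.165+78_538-116del)", "YYYYY"),
    ("63 (c.7027G>A)", "YNYNY"),
    ("69", "NNNNN"),
    ("70", "YYNNN"),
]

detected = {
    tool: sum(1 for _, calls in TABLE2 if calls[i] == "Y")
    for i, tool in enumerate(TOOLS)
}
rates = {tool: 100 * n / len(TABLE2) for tool, n in detected.items()}

for tool in TOOLS:
    print(f"{tool}: {detected[tool]}/{len(TABLE2)} = {rates[tool]:.2f}%")
```

The same tally also gives the rates for the two columns not quoted in the text: 9/16 for FRASER at |ΔPSI| > 0.3 and 7/16 for rMATS-turbo.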
3.3 VUS Reclassification
3.3.1 Aberrant Splicing Caused by an Apparent Synonymous RYR1 Variant
Case 22 is a 38-year-old female who presented with hypotonia at birth. At the time of consultation, she walked without assistance but occasionally tripped. Neurological examination showed myopathic facies, high-arched palate, bilateral ptosis, severe ophthalmoparesis, and cervical and proximal weakness predominantly in the lower limbs. Muscle biopsy showed increased variability in fiber size, increased endomysial connective tissue and prominent nuclear internalization (Figure 3A). Gomori Trichrome staining showed irregular reddish-purple areas that were devoid of ATPase activity and with decreased or increased enzymatic activity at oxidative stains, corresponding to well-defined cores (single arrows in Figure 3E) and areas with sarcomeric disorganization (double arrow in Figure 3E) (Figure 3B–F). Moreover, there was uniformity of type 1 fibers (Figure 3D) and a few fibers contained collections of small nemaline bodies (arrows in Figure 3C), corresponding to a core-rod myopathy.
A neuromuscular gene panel showed the presence of a missense likely pathogenic (LP) variant in the RYR1 gene (NM_000540.3:c.1250T>C, p.(Leu417Pro)), previously reported in individuals with central core disease and congenital myopathy. This variant, found in a heterozygous state in the healthy sister of the proband, was presumed to be paternally inherited (no parental DNA was available). Two VUS that could potentially alter canonical splicing were maternally inherited and were not present in the healthy sister (NM_000540.3:c.6548+11G>A and NM_000540.3:c.7833C>T, p.(Cys2611=)). RNAseq revealed an alternative transcript between exons 48 and 49 in 15% of the reads (Figure 3G). The aberrant transcript caused by the synonymous p.(Cys2611=) variant leads to a 4-bp exon deletion, thus incorporating a premature termination codon in the transcript (p.(Ala2612Thrfs*132)), confirming the pathogenicity of the synonymous variant p.(Cys2611=). This aberrant event was only reported by FRASER (|Δ PSI| > 0.1) and rMATS-turbo (Table 2), probably due to the low PSI and the activation of nonsense-mediated decay. The RYR1 gene showed MAE and the RYR1 expression was 38% lower than the mean expression of the cohort, although not to a statistically significant degree (Figure 3H). On manual inspection of the variant NM_000540.3: c.6548+11G>A, no alteration of canonical splicing was observed in the region.

3.3.2 Intron Retention due to a Missense TTN Variant
Case 28 is a 26-year-old male who presented with hypotonia at birth. He was clumsy at sports and manifested proximal upper limb muscle weakness. Neurological examination evidenced difficulty in standing on his heels and a positive Gowers' sign. Muscle biopsy at the age of 26 years showed increased variability of fiber size, endomysial connective tissue, central nuclei (often multiple) and necrotic fibers. Few lobulated and ring fibers were observed with oxidative stainings. A previous muscle biopsy performed in infancy showed core-like lesions and fiber type I predominance.
In a neuromuscular gene panel, a truncating variant in the TTN gene (NM_001267550.2:c.38829del, p.(Val12944Cysfs*3)) was identified in compound heterozygosity with a missense variant classified as a VUS (NM_001267550.2:c.57262G>C, p.(Val19088Leu)). The c.57262G>C variant, located in the last nucleotide of exon 293, was predicted to alter splicing. In RNAseq, the TTN gene showed MAE, and complete retention of intron 293 was observed in 10% of the transcripts due to the apparent missense variant p.(Val19088Leu) (Figure 2D, middle panel). This retention led to a truncated transcript incorporating a premature termination codon in the mRNA (p.(Val19088Argfs*24)). This variant was confirmed in trans with the truncating p.(Val12944Cysfs*3) LP variant, and both were present in the proband's affected brother.
3.4 RNAseq Accelerates Prioritization of GS Results
3.4.1 Exon Skipping due to an Exon Deletion Missed by CNV Callers
Case 10 is the father of a child with neonatal hypotonia, congenital dislocation of hips, knees, and feet, distal hyperlaxity, and contractures in shoulders and elbows. On clinical examination, the father showed left hemicorporal hypotrophy and finger retractions, and whole-body MRI showed a typical sandwich sign with peripheral involvement and central sparing of the left vastus lateralis. Collagen-VI immunofluorescence from cultured skin fibroblasts showed decreased collagen-VI intensity and a diffuse microfibrillar pattern in the proband, and in the father, a normal collagen-VI expression and microfibrillar pattern similar to the control (Figure 4A, upper panel). In permeabilized cells, there was marked collagen-VI intracellular retention in the child's fibroblasts, suggesting a defect in collagen-VI secretion (Figure 4A, lower panel).
A muscle biopsy was obtained from the father's left side of the body, and in RNAseq, FRASER and FRASER2 identified partial exon 9 skipping in COL6A1 in 15% of the transcripts (Figure 4B, Table 2). Exon 9 is an in-frame exon in the critical triple-helix region, and exon skipping of exon 9 is thus expected to be pathogenic. This event was validated in both the father's and proband's fibroblasts, where the aberrant transcript was present in 12.3% and 47% of the transcripts in the father's and proband's fibroblasts, respectively, suggesting the presence of mosaicism in the father (Figure 4C). GS from the proband's and father's peripheral blood did not detect any intronic COL6A1 SNV. However, an exon 9 deletion in 10% of the father's and 50% of the proband's reads was observed (Figure 4D). This single exon deletion was not identified in either the gene panel or in the GS CNV analysis, probably because split-reads encompassing the deletion are mapped to a low-quality region with a class II transposable element (TE) (Figure 4D). Retrospective manual analysis of the gene panel data and custom multiplex ligation-dependent probe amplification (MLPA) confirmed the exon deletion, in mosaicism in the father and in heterozygosity in the proband.

3.4.2 Identification of Eight Down-Expressed Genes Underlying a De Novo Deletion
Case 45, a 14-year-old female, presented with very mild proximal muscle weakness (MRC grade 4/5 in deltoids and psoas muscles), non-specific findings in the muscle biopsy, including some rounded fibers and occasional internal nuclei, and a borderline intellectual quotient (IQ: 81). Several genes located in chromosome 7 were identified either as a “statistically significant outlier” or a “candidate outlier, not significant” by OUTRIDER (Figure S2). Trio GS previously performed within the Solve-RD consortium [8] was reanalyzed, and RNAseq findings enabled the prioritization of a de novo deletion involving 11 genes (NC_000007.13(GRCh37/hg19):g.5669468_6368115del), validated by array CGH. Similar deletions have been reported in patients with intellectual disability.
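The reasoning in this case, where several neighbouring genes with reduced expression point to a shared deletion, can be sketched as a scan for runs of adjacent down-expressed genes. This is an illustrative heuristic, not the procedure used in the study: the z-score cutoff, minimum run length, function name, and all gene labels and values are hypothetical.

```python
def downregulated_runs(genes, z_cutoff=-1.5, min_genes=3):
    """Find runs of positionally adjacent genes whose z-scores are all <= z_cutoff.

    genes: iterable of (chrom, start, name, zscore).
    Returns a list of runs, each a list of gene names in genomic order.
    """
    runs, current, current_chrom = [], [], None
    for chrom, _start, name, z in sorted(genes):
        if chrom == current_chrom and z <= z_cutoff:
            current.append(name)
        else:
            # Run broken by a chromosome change or a non-outlier gene.
            if len(current) >= min_genes:
                runs.append(current)
            current = [name] if z <= z_cutoff else []
        current_chrom = chrom
    if len(current) >= min_genes:
        runs.append(current)
    return runs


# Hypothetical per-gene z-scores for one sample.
toy_genes = [
    ("7", 100, "G1", 0.2),
    ("7", 200, "G2", -2.5),
    ("7", 300, "G3", -3.1),
    ("7", 400, "G4", -2.0),
    ("7", 500, "G5", 0.1),
    ("8", 100, "G6", -2.2),
]
candidate_regions = downregulated_runs(toy_genes)
```

A run flagged this way is only a pointer for where to look in the genome data; genes inside a deletion that are not expressed in muscle would not appear in such a run.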
3.4.3 Detection of Monoallelic Expression in an Autosomal Recessive Gene Probably due to a 5'UTR Deletion
In Case 29, a patient of Indian origin, gene panel sequencing identified a heterozygous SNV in GNE (NM_005476.7:c.2086G>A, p.(Val696Met)) compatible with the patient's clinical presentation and results of muscle MRI and muscle biopsy. RNAseq showed clear MAE of the mutated allele based on manual inspection of the BAM file (Figure S3), but no splicing alteration was identified. These findings suggest the presence of an upstream deletion in trans with the missense variant that prevents transcription of the wild-type allele. However, no DNA was available to confirm this hypothesis. GNE promoter deletions in trans with heterozygous pathogenic variants have been described before [40].
4 Discussion
In this study, we performed RNAseq from muscle biopsies of 70 patients who remained unresolved after DNA NGS testing. We applied a stepwise analysis involving gene expression, aberrant splicing, MAE, reanalysis of genomic data (or, when unavailable, VC from RNAseq), and GS in specific cases. A molecular diagnosis was established in 27.1% of the cases (19/70), either through RNAseq (10/70, 14.3%) or reanalysis/RNAseq VC (9/70, 12.9%) (Figure 1B, Table 1). Here, we demonstrate the versatility of transcriptome studies in detecting a wide variety of pathogenic variants often missed in the clinical setting due to difficulties in variant detection or prioritization. Although this study primarily focuses on muscular dystrophies and myopathies, its findings are applicable to other NMDs caused by genes with muscle expression. In the case of NMD caused by genes with low muscle expression (detailed in Table S3), RNAseq of fibroblasts or fibroblasts transdifferentiated into neurons could be employed [41]. To our knowledge, this is the largest cohort systematically using muscle RNAseq to improve genetic diagnosis of muscle diseases.
Phenotype-driven analysis was crucial for most of the cases diagnosed via RNAseq, as they had clear phenotypic presentations linked to a small group of genes (1/10, Case 10) or were carriers of a single heterozygous variant in a recessive gene (7/10) (Table 1). Based on our experience and according to similar works, RNA studies offer greater diagnostic success in cases with strong candidate disease genes [14, 16, 17, 42]. Interdisciplinary collaboration and integrating RNAseq findings with complementary techniques, including muscle MRI, muscle biopsy evaluation, and protein studies, play a crucial role in the diagnosis of individuals with rare muscle diseases. This integrative approach significantly improved the diagnostic yield in a substantial proportion of cases, enabling better clinical management and personalized care. When no clinical suspicion guides the analysis, annotating nearby rare genomic variants reduces the number of splicing outliers to analyze. However, without a concordant genotype–phenotype correlation, interpretation of rare aberrant transcripts is sometimes difficult and uncertain.
In patients with inconclusive results in the first-tier NGS analysis (usually ES), performing a combined analysis of both GS and RNAseq (if possible) maximizes the success rate [5, 43, 44]. Case 45 illustrates how the down-expression of neighboring genes allowed the prioritization of a genomic deletion (Figure S2). Similarly, in the integrated GS and transcriptomic analysis of an individual still under investigation, we identified an SNV in compound heterozygosity with a deep intronic variant that generated a pseudoexon in a gene functionally related to the patient's phenotype. However, it is important to note that short-read ES/GS pose technical limitations, specifically in repetitive regions of the genome, such as tandem repeats or pseudogenes, where coverage is usually low due to low-quality mapped reads; hence, detection of variants can be challenging [45], as exemplified in Case 10 (Figure 4). Therefore, even when no alteration has been identified in GS, RNAseq can detect changes in gene expression, MAE, or splicing that pinpoint an underlying SNV, SV, or CNV.
One-third of disease-causing variants affect RNA processing or expression [46]. However, variants outside the canonical splice sites are often overlooked. Recent in silico tools show improved sensitivity and specificity [46, 47], and PP3 ACMG criteria can be applied for SpliceAI Δ score ≥ 0.2. Nevertheless, functional validation is still required to meet the PVS1 criterion according to recent ClinGen SVI Splicing Subgroup recommendations [10]. In addition, in silico splicing tools may not accurately predict if more than one aberrant transcript is produced by the genomic variant, as observed in Cases 37, 61, 63, and 69 (Table 2).
When we compared four computational tools to detect aberrant splicing from RNAseq data, FRASER2 reported fewer outliers per sample (Figure 2A) and identified 68.7% (11/16) of the pathogenic splicing events detected in our cohort. In five cases (5/16), one of the aberrant transcripts had a retained intron (Cases 28, 37, 61, 63, and 69; Table 2), a class of event that LeafCutterMD is not designed to identify [25, 31]. FRASER reported more outlier candidate events per sample (Figure 2A) and had the highest accuracy (81.26%, 13/16).
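The percentages above are the fraction of the 16 pathogenic splicing events that each tool flagged. The sketch below reproduces that arithmetic from the counts given in the text; the function name and the dictionary layout are hypothetical.

```python
TOTAL_PATHOGENIC_EVENTS = 16  # pathogenic splicing events in the cohort

# Events detected per tool, as stated in the text.
detected_events = {"FRASER": 13, "FRASER2": 11}


def detection_rate(detected: int, total: int) -> float:
    """Percentage of known pathogenic events that a tool reported as outliers."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= detected <= total:
        raise ValueError("detected must lie between 0 and total")
    return 100.0 * detected / total


rates = {
    tool: detection_rate(count, TOTAL_PATHOGENIC_EVENTS)
    for tool, count in detected_events.items()
}

for tool, rate in rates.items():
    print(f"{tool}: {rate:.2f}% of pathogenic events detected")
```

Because the denominator contains only known pathogenic events, this measure describes sensitivity on this cohort; the number of outliers per sample (Figure 2A) has to be considered separately when judging the manual review burden.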
It is important to note that all splicing tools identified a considerable number of aberrant splicing events in genes with complex splicing patterns and with triplicated (TTN, NEB) or highly homologous (MYH6, MYH7) regions. The increased occurrence of false positives in these genes highlights the challenges associated with interpreting aberrant splicing in complex genes, which requires additional functional studies. It also points to an important limitation of transcriptome studies: pathogenic alterations in genes with complex splicing patterns could be missed during RNAseq analysis. In addition, the fact that we identified novel unannotated exons in muscle transcriptome data (not shown) emphasizes the need to consider tissue-specific data when interpreting non-coding variants. An important limitation of available in silico splicing prediction tools is that they do not integrate tissue-specific splicing and regulatory elements, although recent studies and novel approaches are emerging to further advance knowledge in this evolving field [47-49].
Alterations observed in RNAseq are often the consequence of genomic variants missed in ES. Detecting those alterations at the RNA level facilitates the delineation of the specific genomic causative variant in the patient (e.g., Cases 29 and 45). However, other techniques are usually required to complete genetic diagnosis after RNAseq, such as GS or Sanger sequencing to confirm the causative variant.
An important limitation in detecting cases with expression or splicing outliers is the limited availability of a large number of samples from the same tissue sequenced under the same conditions (e.g., RNA extraction, library preparation, read length) [20, 26]. The accuracy of bioinformatic pipelines such as OUTRIDER and FRASER decreases significantly with reduced sample sizes, so pathogenic events can be missed [23, 24, 26]. However, here we showed that only one pathogenic event was missed by FRASER2 with a reduced sample size (n = 34, batch 1) (Table S1, Table 2). Conversely, large cohort sizes impose a significant computational demand for whole-transcriptome splice junction screening, which may currently exceed the capabilities of most clinical laboratories. To overcome this limitation, we first applied a targeted analysis to known NMD genes to reduce computational time and make it feasible to run on commodity hardware (98 samples were processed within 30 h with FRASER and 6 h with OUTRIDER). This demonstrates the feasibility of screening gene expression and aberrant splicing in clinically relevant genes. Any deeper analysis would require research-based transcriptome-wide splicing analysis and greater computational resources. Finally, our findings underscore the importance of periodically reanalyzing existing genomic data to allow VUS reclassification. Reanalysis resulted in diagnoses for five cases due to the reclassification of a VUS identified in the first NGS analysis conducted three to five years earlier (Figure 1B, Table 1). This finding, in line with similar studies [50-54], emphasizes the utility of periodic genetic reanalysis and reassessment of clinical data, ideally biannually, to avoid unnecessary testing.
Over the last few years, many novel genomic tools have emerged. As documented here, bulk RNAseq is a powerful complementary technique for the diagnosis of rare diseases, allowing the identification of the molecular defects underlying the pathology. In addition, numerous studies using cutting-edge technologies such as single-cell RNAseq, single-nuclei RNAseq, or spatial transcriptomics have enabled the identification of disease-progression signatures, opening the way to the development of novel therapeutic strategies [55-57]. As we deepen our knowledge of the different omics, we hope to be able to integrate these approaches for the benefit of patients. For instance, such integration could make it possible to identify the molecular basis of the disease, monitor disease progression, provide an accurate prognosis, and evaluate the effect of the different therapeutic options.
Overall, our study demonstrates the utility of muscle RNAseq on a clinical basis to establish a definitive molecular diagnosis in patients with muscular dystrophy and congenital myopathy. Benchmarking of aberrant splicing pipelines showed that FRASER2 reports an average of 2.7 NMD outliers per sample and detects 68.7% of pathogenic events (Figure 2A,C), a number of candidates that is manageable for manual inspection on a clinical basis. The workflow implemented here (Figure 1A) efficiently identifies aberrant splicing, aberrant expression, MAE, and rare genomic variants in NMD genes and enables the identification of potentially novel disease genes. GS is progressively becoming available in clinical settings, and we show that, by integrating genomics and transcriptomics, success rates are increased and turnaround times for identifying causative variants are shortened, thus ending the patient's diagnostic odyssey.
Author Contributions
Conceptualization: A.S.-C., P.G., L.G.-Q.; data curation: A.S.-C., C.D.-G., D.N.B., C.O., A.N., B.E.-A., L.L., I.M.-C., S.K., L.G.-M., V.N., R.F.-T., A.L.M., R.J.-M., M.O., P.G., L.G.-Q.; methodology: A.S.-C., A.H.-L., C.J., A.L.M., M.J.R., C.J.-M., B.E.-A., B.R.-S., E.G., P.G., L.G.-Q.; formal analysis: A.S.-C., A.H.-L., B.E.-A., C.J., A.L.M., C.J.-M., B.R.-S., L.G.-Q.; resources: C.D.-G., D.N.B., C.O., A.N., L.L., I.M.-C., S.K., L.G.-M., V.N., R.F.-T., A.L.M., R.J.-M., E.G., M.O.; funding acquisition: J.S., P.G., L.G.-Q.; writing – original draft: A.S.-C., L.G.-Q.; writing – review and editing: A.S.-C., C.D.-G., D.N.B., C.O., B.E.-A., A.N., L.L., A.H.-L., C.J., I.M.-C., A.L.M., S.K., M.J.R., L.G.-M., V.N., R.F.-T., A.L.M., R.J.-M., C.J.-M., J.S., B.R.-S., E.G., M.O., P.G., L.G.-Q. All authors have read and approved the final version of the manuscript.
Acknowledgments
We would like to thank the patients and their families for their participation in the study. The authors C.D.-G., D.N.B., S.K., A.H.-L., B.E.-A., L.L., C.O., C.J., I.M.-C., A.L.M., L.G.-M., V.N., R.F.-T., C.J.-M., R.J.-M., A.L.M., A.N., E.G., and M.O. are members of the European Reference Network for Rare Neuromuscular Diseases (ERN-NMD). We are indebted to the “Biobanc de l'Hospital Infantil Sant Joan de Déu per a la Investigació”, integrated in the Spanish Biobank Network of ISCIII, for sample and data procurement. We also thank Ailish Maher for proofreading the text.
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research
Data Availability Statement
Raw data are not publicly available as ethics approval does not allow us to share patient data. Datasets generated during the study are available from the corresponding author on reasonable request. rMATS-turbo (https://github.com/Xinglab/rmats-turbo), LeafCutterMD (https://github.com/davidaknowles/leafcutter), FRASER (https://github.com/gagneurlab/FRASER), FRASER2 (https://github.com/gagneurlab/FRASER/releases/tag/1.99.4), OUTRIDER (https://github.com/gagneurlab/OUTRIDER), DROP (https://github.com/gagneurlab/drop), and GATK (https://github.com/broadinstitute/gatk/releases/tag/4.2.5.0) are open-source tools available on GitHub.