The functional consequences of intron retention: Alternative splicing coupled to NMD as a regulator of gene expression
Abstract
The explosion in sequencing technologies has provided us with an instrument to describe mammalian transcriptomes at unprecedented depths. This has revealed that alternative splicing is used extensively not only to generate protein diversity, but also as a means to regulate gene expression post-transcriptionally. Intron retention (IR) is overwhelmingly perceived as an aberrant splicing event with little or no functional consequence. However, recent work has now shown that IR is used to regulate a specific differentiation event within the haematopoietic system by coupling it to nonsense-mediated mRNA decay (NMD). Here, we highlight how IR and, more broadly, alternative splicing coupled to NMD (AS-NMD) can be used to regulate gene expression and how this is deregulated in disease. We suggest that the importance of AS-NMD is not restricted to the haematopoietic system but that it plays a prominent role in other normal and aberrant biological settings.
Abbreviations
- AS: alternative splicing
- AS-NMD: alternative splicing coupled to NMD
- EJC: exon junction complex
- hnRNPs: heterogeneous nuclear ribonucleoproteins
- IR: intron retention
- NMD: nonsense-mediated decay
- Pre-mRNA: precursor messenger RNA
- PTC: premature termination codon
- RNA Pol II: RNA Polymerase II
- snRNP: small nuclear ribonucleoprotein
- SR: serine-arginine proteins
Introduction
Most eukaryotic genes are interrupted by noncoding interspersed sequences termed introns. These sequences are removed from the precursor messenger RNA (Pre-mRNA; i.e. the sequence generated by RNA polymerase mediated transcription) in order to facilitate the assembly of protein coding exons into mature mRNA in a process termed splicing 1. Splicing occurs co-transcriptionally 2 and is catalysed by the spliceosome, a dynamic RNA-protein complex comprising five small nuclear ribonucleoproteins (snRNPs) and an assembly of auxiliary proteins 3, 4. The U1 and U2 snRNPs recognise the 5′ splice site and the branch site, respectively, resulting in the formation of the pre-spliceosomal A complex, which is essential for exon/intron definition 5, 6. Following exon definition and binding of the U4/U6.U5 tri-snRNP, the complex rearranges and the splicing process proceeds, ultimately resulting in the ligation of two exons and disposal of the intervening intronic sequence (Fig. 1A).

Importantly, the splicing process can be modulated by numerous so-called splicing factors, of which serine-arginine (SR) proteins and heterogeneous nuclear ribonucleoproteins (hnRNPs) are major players 7, 8. These proteins are recruited dynamically to the spliceosome either by direct interaction or through binding to RNA sequence elements termed intronic enhancers/silencers and exonic enhancers/silencers. In this manner, distinct steps of the splicing process along the pre-mRNA molecule can either be enhanced or inhibited through the binding of splicing regulators, thereby favouring the selective use of potential 5′ and 3′ splice sites. Overall, the modular set-up of the pre-mRNA splicing facilitates the potential generation of multiple mature mRNAs from an individual pre-mRNA molecule in a process termed alternative splicing (AS).
Until very recently, our insight into AS was mostly limited to the analysis of individual genes; however, with the explosion in sequencing technologies and associated downstream computational processing, the way in which we can characterise the transcriptome has undergone a revolution. From the initial reports it became clear that AS affects essentially all multi-exon human genes 9, 10, in line with previous suggestions that AS is one of the main drivers of protein diversity in mammals 11, 12. Several modes of AS have been reported, ranging from common events such as exon skipping and alternative use of 5′ and 3′ splice sites, to rarer events such as intron retention (IR) and the inclusion of mutually exclusive exons. IR constitutes a class of AS that is often neglected, because apparent IR events may simply reflect contamination of the sample with unprocessed pre-mRNA molecules.
Eukaryotic cells have developed a number of mRNA surveillance pathways that deal with transcripts that could encode potentially deleterious proteins 13. One of these pathways is the nonsense-mediated mRNA decay (NMD) system, which recognises transcripts with so-called premature termination codons (PTCs) and subjects them to decay if they adhere to distinct molecular rules 14-16 (Fig. 2). This efficiently prevents the expression of C-terminal truncated proteins that could potentially have dominant negative properties. The importance of the NMD pathway for mammalian biology has been analysed using mouse genetics 17-20. Deletion of either of the two core NMD components, Upf1 or Upf2, leads to very early embryonic lethality at the blastocyst stage 19, 20. Similarly, conditional deletion of Upf2 in the haematopoietic system leads to a loss of the entire stem and progenitor compartment, whereas more mature compartments, such as monocytes and granulocytes, are less affected 20.
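The molecular rules referred to above are not spelled out in this section. As a hedged illustration, the sketch below applies the widely cited "50-nucleotide rule", under which a stop codon lying more than roughly 50-55 nt upstream of the last exon-exon junction is predicted to trigger NMD. The function name, the coordinate convention and the exact threshold are assumptions made for illustration; they are not taken from the text or from any particular tool.

```python
def is_predicted_nmd_target(stop_codon_end, exon_lengths, min_distance=50):
    """Return True if a stop codon is predicted to elicit NMD.

    Assumptions (illustrative, not from the article):
    - coordinates are 0-based positions on the spliced transcript;
    - stop_codon_end is the index just past the last base of the stop codon;
    - exon_lengths lists the exon lengths in 5' to 3' order;
    - min_distance=50 encodes the commonly used 50-nt rule.
    """
    if len(exon_lengths) < 2:
        # No exon-exon junction, so no downstream EJC can mark the stop as premature.
        return False
    last_junction = sum(exon_lengths[:-1])  # position of the final exon-exon junction
    return last_junction - stop_codon_end >= min_distance


# Hypothetical three-exon transcript with junctions at positions 100 and 300.
exons = [100, 200, 150]
print(is_predicted_nmd_target(120, exons))  # stop far upstream of the last junction
print(is_predicted_nmd_target(280, exons))  # stop within 50 nt of the last junction
print(is_predicted_nmd_target(400, exons))  # stop located in the last exon
```

Real NMD targeting has known exceptions to this rule, so a check of this kind is best read as a first-pass filter rather than a definitive classifier.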

The NMD pathway was originally perceived mainly as an mRNA scavenger system 21, and indeed transcriptome analysis from NMD deficient tissues revealed a marked increase in PTC-containing low-abundant transcripts 22, 23. These transcripts presumably originate from erroneous splicing events and are selectively stabilised when NMD is lost. However, the NMD pathway is also used in post-transcriptional control in order to regulate the levels of specific mRNA isoforms. Specifically, the NMD pathway is harnessed to degrade PTC-containing AS transcripts in a process termed alternative splicing coupled to nonsense-mediated mRNA decay (AS-NMD) 22, 24.
Here, we describe the recent work of Wong et al. published in Cell 25, which uncovers how IR combined with the NMD pathway is utilised to drive a specific cellular differentiation event. Focussing exclusively on work done in mammalian systems, we then move on to discuss the underlying mechanisms regulating IR, how similar mechanisms could be used in other systems and how they may be perturbed in disease.
Intron retention regulates granulocytic differentiation
Myeloid differentiation from haematopoietic stem cell to mature neutrophilic granulocyte is one of the best-studied differentiation pathways in mammals, and a number of distinct progenitor type cells have been identified 26, 27. Wong et al. used flow cytometry to prospectively isolate promyelocytes, myelocytes (two myeloid progenitors), as well as mature neutrophilic granulocytes, from mouse bone marrow; next they subjected them to sequencing-based transcriptome analysis. In order to identify introns that are differentially retained during myeloid differentiation, Wong et al. developed the computational pipeline IRFinder, and used this to identify 121 retained introns in 86 genes. Importantly, the IR transcripts were poly-adenylated, present mainly in cytoplasmic fractions, and most of them could be validated by other experimental approaches, hence suggesting that they are not merely partially processed pre-mRNAs. Further support for the functional relevance of the differentially retained introns came from the demonstration that a high fraction of their corresponding human host genes also expressed transcripts harbouring retained introns in human granulocytes.
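To make the notion of a differentially retained intron concrete, the following sketch computes a simple retention ratio from read counts and compares two cell populations. It is an illustration under stated assumptions and not the IRFinder implementation: the function names, the inputs (`intron_depth`, `spliced_reads`) and the example counts are all hypothetical.

```python
def ir_ratio(intron_depth, spliced_reads):
    """Fraction of transcripts that appear to retain the intron.

    intron_depth  -- assumed summary of read coverage inside the intron
    spliced_reads -- assumed count of reads spanning the exon-exon junction
    """
    total = intron_depth + spliced_reads
    if total == 0:
        # No evidence either way; report 0.0 rather than dividing by zero.
        return 0.0
    return intron_depth / total


def delta_ir(early, late):
    """Change in retention ratio between two (intron_depth, spliced_reads) pairs."""
    return ir_ratio(*late) - ir_ratio(*early)


# Hypothetical counts for one intron in two populations.
promyelocyte = (5, 95)
granulocyte = (40, 60)
print(ir_ratio(*promyelocyte))
print(ir_ratio(*granulocyte))
print(delta_ir(promyelocyte, granulocyte))
```

A positive `delta_ir` would correspond to an intron that becomes more retained as differentiation proceeds.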
Strikingly, the degree of IR increased almost uniformly in the differentiated granulocytes, and most of the IR transcripts harboured PTCs, suggesting that they may be targeted by NMD. Indeed, inhibition of the NMD pathway, using either broad-action agents such as caffeine or the more specific knockdown of the core NMD component UPF1, led to stabilisation of IR transcripts. Furthermore, as the IR transcripts were not associated with protein synthesis, they provided a means for the concerted downregulation of genes during myeloid differentiation.
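Why a retained intron so often introduces a PTC can be shown with a small example: intronic sequence is not under selection to maintain the reading frame, so an in-frame stop codon tends to appear soon after the exon ends. The sketch below scans for the first in-frame stop codon; the function name and the toy sequences are hypothetical and chosen only for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(sequence, cds_start=0):
    """Return the 0-based index of the first in-frame stop codon, or None.

    cds_start is the index of the first base of the start codon.
    RNA input is accepted by converting U to T.
    """
    seq = sequence.upper().replace("U", "T")
    for i in range(cds_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy transcript: two exons separated by a short intron (GT...AG).
exon1 = "ATGGCCAAAGGG"
intron = "GTGAGTTAGCAG"
exon2 = "TTTTAA"

spliced = exon1 + exon2
retained = exon1 + intron + exon2

print(first_in_frame_stop(spliced))   # normal stop codon at the end of exon 2
print(first_in_frame_stop(retained))  # earlier stop codon inside the retained intron
```

In the retained isoform the first stop codon lies inside the intron, upstream of the downstream exon, which is the configuration expected to make a transcript NMD-sensitive.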
The genes harbouring retained introns were enriched in pathways involved in leukocyte function. In particular, genes involved in nuclear periphery and nuclear lamina were strikingly overrepresented among the IR transcripts, suggesting that the pronounced changes in nuclear morphology during granulopoiesis could be driven by NMD-mediated targeting of these transcripts. Indeed, the level of LMNB1 protein (encoded by an IR transcript) was markedly downregulated during myeloid differentiation, and its overexpression in vivo was associated with abnormal granulocytic morphology. Collectively, the findings reported by Wong et al. 25 clearly demonstrate that NMD-mediated downregulation of IR transcripts is of functional importance during granulocytic differentiation.
Intron retention regulates key cellular decisions
Although the study by Wong et al. represents one of the first insights into the biological function of IR during a specific developmental process, a number of other studies have previously demonstrated that IR in selected mRNAs is not merely the result of incomplete splicing.
Whereas the Wong et al. study highlighted the use of IR coupled to NMD in order to facilitate rapid mRNA decay, IR may also lead to the formation of translationally competent mRNAs. One example is the selective inclusion of intron 1 of the transcription factor ID3 in smooth muscle cells following vascular injury, which leads to the expression of an Id3 variant with a distinct C-terminal sequence 28. Although the extent to which the IR Id3 transcript was sensitive to NMD was not tested, these findings demonstrate that functionally distinct protein variants may be derived from IR transcripts. Along the same lines, but slightly more complicated, the retention of a distinct intron in the KCNMA1 pre-mRNA correlates with the inclusion of a neighbouring exon encoding the STREX domain 29. Interestingly, because the retained intron is removed by what appears to be a cytoplasmic splicing event, the purpose of its retention is presumably to facilitate the inclusion of the neighbouring STREX-encoding exon. The resultant KCNMA1 isoform is incorporated into calcium-activated big potassium channels, where it alters key properties of these channels, including calcium influx 30.
However, one of the most stunning examples of a functional role for an IR transcript is a recent paper published by Colak et al. in Cell 31. During foetal life, commissural neurons are attracted to the ventral midline of the spinal cord by signals deriving from the floor plate 32-34. Axon guidance of these neurons is regulated by the expression of two alternative isoforms of ROBO3, a receptor for the Slit family of guidance cues. Specifically, Robo3.2 contains a retained intron near the 3′-end of the pre-mRNA, which is predicted to elicit NMD 35. Interestingly, Robo3.1 is both expressed and translated prior to midline crossing and is sharply downregulated at this stage (presumably through a transcriptional mechanism), whereas the Robo3.2 transcript behaves completely differently. Prior to crossing, Robo3.2 is expressed but translationally repressed, which in turn protects it from NMD because this pathway requires translation 14-16. Upon crossing of the midline, floor plate signals relieve the translational repression in the post-crossing neurons, and the Robo3.2 transcript is subjected to NMD. This ensures that only limited amounts of ROBO3.2 protein are expressed (presumably one copy per mRNA). Since ROBO3.2 facilitates midline repulsion induced by floor plate signals, its level is critical, and indeed deletion of Upf2 in commissural neurons leads to aberrant post-crossing trajectories associated with increased migratory distances. A final twist is that NMD components are concentrated in the growth cones of the post-crossing axons, demonstrating a role for both localised NMD and translation in the migratory behaviour of commissural neurons. Collectively, these studies highlight how IR, either alone or coupled to NMD, can affect the translational properties of mRNAs in order to regulate cellular properties and key developmental events.
Different mechanisms govern intron retention
Which features, then, define whether an intron is to be spliced or retained? A major driving force is most likely the extent to which the spliceosome machinery uses an exon-definition or intron-definition mode to select the unit to undergo splicing. Intron definition is considered the ancestral mode of splicing 5, 6, and introns following this model are generally short and are characterised by a high GC content that does not differ from that found in neighbouring exonic regions 36. Since these introns are the defining unit for the splicing machinery, they are believed to be under selective pressure to remain short. In species having a prevalence of longer introns, the splicing machinery adapted and changed to an exonic mode of splice unit selection. This led to a prominent increase in the GC content of such exons, probably in order to facilitate their recognition. Retained introns are generally short, and this feature – along with their high GC content – suggests that they are recognised through the intron definition mode 5, 6, 36, 37. Indeed the IR transcripts identified by Wong et al. were characterised by a high GC content 25.
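The two sequence features discussed above, intron length and GC content relative to the neighbouring exon, are straightforward to compute. The sketch below is a minimal illustration; the function names and example sequences are hypothetical, and no cut-off values are implied because the text does not give any.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence (0.0 for empty input)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def intron_features(intron_seq, flanking_exon_seq):
    """Summarise length and GC features of an intron and a flanking exon."""
    intron_gc = gc_content(intron_seq)
    exon_gc = gc_content(flanking_exon_seq)
    return {
        "length": len(intron_seq),
        "intron_gc": intron_gc,
        "exon_gc": exon_gc,
        # Near zero or negative: the intron is about as GC-rich as the exon,
        # the pattern described for intron-defined (and retained) introns.
        "gc_difference": exon_gc - intron_gc,
    }


# Hypothetical short sequences, for illustration only.
print(intron_features("GCGCGCAT", "GCGCATAT"))
```

A short intron with a small or negative `gc_difference` would match the profile described for introns recognised through the intron definition mode.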
Sequencing analysis demonstrated that retained human introns generally have weaker splice sites, thus rendering them more likely to escape recognition by the spliceosome 37. Additionally, retained introns are characterised by weaker polypyrimidine tracts; furthermore, motifs associated with intronic and exonic enhancers also score lower in the comparative sequencing analyses. Collectively, the general reduction in splice signal strength suggests that retained introns should be more sensitive to changes in the protein levels of spliceosome components and splicing regulators (Fig. 1B). Indeed, Wong et al. 25 found the increase in IR during myeloid differentiation to correlate well with a reduction in the levels of splicing components. In particular, factors associated with the early steps of splicing, i.e. exon and intron definition, were found to be selectively downregulated. An example is SF3B1, which was markedly downregulated, and has previously been shown to be essential for intron definition 38. Collectively, the accumulating evidence suggests that retained introns are normally spliced using the intron definition mode, display weaker splice features, and are highly sensitive to fluctuations in the level of splicing factors (Fig. 1B).
In addition to events that act directly on the mRNA, there is also increasing evidence that chromatin structure and transcription play important roles in the regulation of splicing and thus IR. Splicing occurs co-transcriptionally 2, and is influenced both by the elongation rate of RNA polymerase II (RNA Pol II) and by chromatin accessibility, two properties that are likely to be connected 39. Thus exon inclusion has been shown to be promoted by a slow-transcribing RNA Pol II mutant 40 as well as by drugs that affect transcription rates 41, 42. This has led to the formulation of the kinetic model of AS, which requires a reduction in RNA Pol II elongation rate, or even pausing, in order for the splicing machinery to assemble 39, 42. Along the same lines, nucleosomes, which are known to affect RNA Pol II elongation rates 43, are generally enriched across exons, and even more so across those that are alternatively spliced 44-47. Interestingly, introns with high GC content, which are preferentially retained, also display a high nucleosome density 36, suggesting that their processing also depends on nucleosome density. Finally, distinct histone modification patterns are observed across exons 44, 45, 47, and modulation of specific epigenetic marks has been shown to affect AS 48. Collectively, these findings suggest that the chromatin state is a major regulator of AS (Fig. 1B). Differentiation processes, such as that described by Wong et al., are characterised by major alterations in chromatin structure. Therefore, it is likely that the prominent differences in splicing patterns seen between tissues may be a result of these epigenetic changes.
AS-NMD regulates splicing patterns
The concept of coupling AS to NMD is not restricted to the downregulation of IR transcripts. Indeed the term AS-NMD has been coined to describe splicing events that lead to the generation of NMD-sensitive transcripts 22, 24 (see Fig. 3A,B). In particular, splicing regulators are enriched among transcripts undergoing AS-NMD, and these factors have been shown to use AS-NMD as a means to maintain homeostatic levels 24, 49 (Fig. 3C). An early example is the SR protein SFRS2, which, by promoting the inclusion of a PTC-containing exon, reduces its own mRNA level 50. Global RNA-seq transcriptome studies have recently addressed the extent to which the NMD pathway influences splicing patterns by analysing mouse tissues 23 or cells 51 deficient in key NMD components. Strikingly, approximately 30% of expressed genes were found to upregulate at least one PTC-containing transcript following loss of NMD; most of these originated from AS events, demonstrating that AS-NMD is a major regulator of the eukaryotic transcriptome 23. Interestingly, transcripts devoid of any discernible NMD-eliciting signals were also found to be upregulated upon loss of NMD, which is compatible with a model in which the breakdown of splicing regulator homeostasis has profound consequences for splicing patterns in general 23. Hence, the NMD pathway controls mammalian splicing patterns through both direct and indirect means.
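The homeostatic feedback by which a splicing regulator limits its own level can be illustrated with a deliberately simple toy model. Everything in it is an assumption made for illustration and none of it comes from the cited studies: the regulator at level P diverts a fraction P/(K + P) of its own pre-mRNA into an NMD-sensitive isoform, so the productive fraction is K/(K + P), and the steady state satisfies P = s·K/(K + P), where s is the level that would be reached without feedback. Solving the resulting quadratic gives the closed form used below.

```python
import math


def steady_state_level(s, K=1.0):
    """Steady-state regulator level under the toy AS-NMD feedback model.

    s -- hypothetical level reached in the absence of feedback
    K -- hypothetical level at which half of the pre-mRNA is diverted to NMD

    Solves P**2 + K*P - s*K = 0 for the positive root.
    """
    return (-K + math.sqrt(K * K + 4.0 * s * K)) / 2.0


# Compare the response to increased synthesis with and without feedback.
for s in (2.0, 6.0):
    print(f"no feedback: {s:.1f}  with AS-NMD feedback: {steady_state_level(s):.1f}")
```

In this toy setting, tripling the synthesis term only doubles the steady-state level, which is the buffering behaviour that the term homeostasis refers to. Removing NMD would correspond to removing the feedback, so the regulator level would rise towards s.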

Defects in splicing and NMD are common in human disease
The study by Wong et al. 25, and the study of AS-NMD more broadly, is also relevant in the context of human disease. Pre-mRNA splicing has been shown to play a major role in the development of human diseases, and current estimates suggest that around 1/3 of all disease-causing mutations affect regulatory elements such as 5′ and 3′ splice sites 52, 53. Many of these mutations lead to generation of NMD-sensitive transcripts through aberrant splicing and to a subsequent reduction in the levels of the corresponding proteins. Similarly, about 1/3 of all hereditary diseases are caused by either nonsense mutations or frame-shift mutations that result in the introduction of a downstream PTC, thereby rendering the affected transcript sensitive to NMD 54-56. Thus, a large fraction of single-gene disease-causing mutations are associated with splicing and/or NMD; this observation has raised hopes that drugs that selectively promote read-through of PTCs can potentially be developed to treat the relevant diseases. One such promising drug has been developed and is currently in clinical trials in patients suffering from cystic fibrosis and Duchenne muscular dystrophy 57, 58.
In addition to affecting single genes, global de-regulation of splicing patterns and NMD targets has also been identified in a number of diseases. Thus spliceosomal components have been found to be mutated in hereditary diseases, such as retinitis pigmentosa 59 and Taybi-Linder syndrome 60, 61; more recently, such mutations have also been identified through cancer genome sequencing efforts in an increasing number of cancers, mainly of haematological origin 62-67. An example of the latter is mutations in SF3B1, which have been observed at high frequencies in several haematological malignancies as well as in breast and prostate cancer 68, 69. The jury is still out as to whether these splicing factor mutations are activating or inactivating 70, 71, but it is tempting to speculate that a reduction in their levels may affect exon/intron definition on selected genes, as described by Wong et al. 25. Finally, mutations in the NMD component UPF3B have been associated with various forms of intellectual disability, a finding that was recently corroborated by the identification of copy number variations in other NMD components 72, 73.
Collectively, changes in splicing patterns and/or NMD sensitivity are frequently associated with human disease. For those diseases in which splicing/NMD patterns are affected in a global manner, the challenge for the future will be to identify the specific transcripts that act as molecular drivers, a strategy that might guide future drug development efforts.
Conclusions and outlook
The study by Wong et al. 25 demonstrates how AS-NMD, and specifically IR coupled to NMD, can be used to drive a specific differentiation process. Given its ubiquity, it is fair to assume that AS-NMD could play a role in similar processes in other tissues, and that its de-regulation could be involved in disease. Thus a major task for the future will be to chart the importance of AS-NMD using a combination of transcriptome analysis, mouse genetics (knock-outs of NMD components and/or splicing regulators) and functional analyses.
It is equally important to understand how these processes could potentially be regulated. In the Wong et al. study 25 the stabilisation of IR transcripts correlated with the downregulation of splicing factors, but what are their relative contributions to IR, or more generally, AS-NMD? Similarly, what drives the downregulation of splicing factors, i.e. what are the upstream regulators?
The Wong et al. study 25 also raises the question as to why the IR-transcripts are not completely degraded by NMD. Could this be related to differences in NMD efficiency in different cell types, as has been previously demonstrated 74? If so, this might be regulated by differential expression of NMD components, again raising the question of upstream regulation. Of interest here may be our own unpublished findings from Upf2 deficient mice, which show that loss of NMD leads to upregulation of a number of NMD components, perhaps indicating that the NMD pathway is auto-regulated.
In conclusion, the work by Wong et al. 25 has highlighted the importance of AS-NMD in cellular differentiation, and will hopefully inspire new research to further uncover its extent and importance in normal biology and disease.
Acknowledgments
This work was supported by the Danish Agency for Science and Innovation (Mobility PhD fellowship to Ying Ge) and through a centre grant from the Novo Nordisk Foundation (The Novo Nordisk Foundation Section for Stem Cell Biology in Human Disease).
The authors have declared no conflicts of interest.