uORF-introducing variants in the 5′UTR of the NIPBL gene as a cause of Cornelia de Lange syndrome
Abstract
Cornelia de Lange syndrome (CdLS) is a clinically recognizable rare developmental disorder. About 70% of patients carry a missense or loss-of-function pathogenic variant in the NIPBL gene. We hypothesized that some variants in the 5′-untranslated region (UTR) of NIPBL may create an upstream open reading frame (uORF), putatively leading to a loss of function. We searched for NIPBL 5′-UTR variants potentially introducing a uORF by (i) reannotating NGS data from 102 unsolved CdLS patients and (ii) searching the literature and variant databases. We set up a green fluorescent protein (GFP) reporter assay and studied NIPBL expression in a lymphoblastoid cell line (LCL). We identified two variants introducing a novel ATG codon in the 5′-UTR of NIPBL, both predicted to introduce a uORF: a novel de novo c.-457_-456delinsAT variant in a 15-year-old male with classic CdLS, and a c.-94C>T variant in a published family. Our reporter assay showed a significant decrease in GFP levels with both mutant constructs, with messenger RNA (mRNA) levels similar to those of the wild-type (wt) construct. Assessment of the LCL from one patient gave consistent results, with decreased NIPBL protein and unchanged mRNA levels. 5′-UTR uORF-introducing NIPBL variants may represent a rare source of pathogenic variants in unsolved CdLS patients.
1 INTRODUCTION
Cornelia de Lange syndrome (CdLS; MIM# 122470) is a rare multiorgan disorder characterized by growth restriction, craniofacial abnormalities, upper limb abnormalities, intellectual disability/developmental delay and hirsutism. Five genes have been implicated in CdLS (NIPBL, SMC1A, SMC3, RAD21, and HDAC8) (Deardorff et al., 2012, 2016; Kline et al., 2018; Krantz et al., 2004; Selicorni et al., 2021; Tonkin et al., 2004). These genes encode core components or regulators of the cohesin complex, which controls chromatin conformation and modulates the three-dimensional architecture of the genome, thus regulating gene expression (Boyle et al., 2015; Kline et al., 2018; Piché et al., 2019). Variants located in the coding sequence of NIPBL are the most frequent cause of CdLS, accounting for approximately 70% of all diagnoses (Kline et al., 2018; Selicorni et al., 2021). Pathogenic NIPBL variants cause CdLS through a loss-of-function mechanism, either through haploinsufficiency (frameshifting short insertions or deletions, nonsense, splice site variants, partial deletions, structural variants) or through missense variants. The variants observed in patients, together with the strong intolerance to loss-of-function variation observed in control databases such as gnomAD, show that NIPBL is a dosage-sensitive gene (Karczewski et al., 2020). While the coding sequence of the genome remains the main focus of medical genetics, the study of noncoding regions is of growing interest. It is now generally accepted that a large proportion of the noncoding genome is functionally relevant and that pathogenic variants can occur outside the coding sequence, disrupting a protein-coding gene through alteration of transcriptional regulatory sequences, gene promoters, the 5′- and 3′-untranslated regions (UTRs), or introns.
The 5′-UTR of genes has various regulatory functions, including the fine-tuning of gene expression. One such mechanism involves upstream open reading frames (uORFs), i.e., ORFs encoded within the 5′-UTRs of protein-coding genes. uORFs are found in around half of all known genes (Whiffin et al., 2020). Natural uORFs are considered putative cis-acting regulators of translation initiation, most of them acting by repressing translation of the main protein (Lin et al., 2019). Variants creating novel upstream start codons and variants disrupting the stop codons of existing uORFs are under strong negative selection and represent a putatively important disease mechanism (Labrouche-Colomer et al., 2020; Romanelli Tavares et al., 2019; Wright et al., 2021; Zhou et al., 2018). Several features, such as the Kozak sequence context, the distance to the 5′-cap and the length of the uORF, determine whether the scanning ribosomal complex recognizes an AUG as a translation initiation codon and starts translation. When translation initiates at a uORF, several consequences are possible (Silva et al., 2019). First, if the uORF ends with a stop codon located upstream of the natural start codon, three main outcomes have been described: (i) translation of the uORF followed by dissociation of the ribosomal complex at the stop codon, which reduces the number of ribosomes available to initiate translation at the main start site; (ii) translation of the uORF with ribosome stalling during elongation or termination, which blocks trailing ribosomes or induces messenger RNA (mRNA) degradation; or (iii) translation of the uORF while the ribosome remains associated with the mRNA, resumes scanning, and reinitiates translation further downstream at the main ORF.
Second, some uORFs have no in-frame stop codon upstream of the natural start codon. Such a uORF overlaps the main ORF and may either add amino acids to the N-terminal end of the protein, if it is in frame with the natural ORF, or interfere with translation of the natural protein, if it is out of frame. Overall, most uORF-introducing variants are predicted to result in a loss of function of the natural protein and thereby represent a source of noncoding pathogenic variants in diseases whose main mechanism is loss of function or haploinsufficiency. Of note, variants in the 5′-UTR of protein-coding genes may also have other consequences, such as RNA destabilization. For example, a c.-321_-320delCCinsA NIPBL variant reported in a girl and her father, both affected by CdLS, was associated with decreased mRNA levels in the patients' lymphocytes compared to control samples and in a luciferase reporter assay (Borck et al., 2006), independently of any new start codon.
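Which of these scenarios applies to a given upstream ATG can be read directly from the sequence: it depends on whether the first stop codon in the uATG's reading frame ends before the natural start codon, overlaps or follows it, or is absent because the uATG is in frame with the natural ATG. The Python sketch below illustrates this classification under simple assumptions: `classify_uorf` is a hypothetical helper, positions are 0-based indices into a transcript written in the DNA alphabet, Kozak context and ribosome behavior are ignored, and the example sequences are toy sequences, not NIPBL.

```python
"""Sketch: classify an upstream ATG relative to the main ORF.

Hypothetical helper for illustration only. Positions are 0-based indices
into a transcript written in the DNA alphabet (5'->3'). Kozak context and
ribosome behaviour are deliberately ignored.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_uorf(seq: str, uatg: int, main_atg: int) -> tuple[str, int]:
    """Return (category, codon count).

    For a uORF ending in a stop codon, the count includes the stop codon.
    For an in-frame N-terminal extension, the count is the number of codons
    added upstream of the natural ATG.
    """
    seq = seq.upper()
    if not 0 <= uatg < main_atg or seq[uatg:uatg + 3] != "ATG":
        raise ValueError("expected an ATG upstream of the main start codon")
    if seq[main_atg:main_atg + 3] != "ATG":
        raise ValueError("no ATG at the main start position")
    in_frame = (main_atg - uatg) % 3 == 0
    codons = 0
    for i in range(uatg, len(seq) - 2, 3):
        if in_frame and i == main_atg:
            # Reached the natural ATG without a stop: extra residues are
            # added to the N terminus of the natural protein.
            return "in-frame N-terminal extension", codons
        codons += 1
        if seq[i:i + 3] in STOP_CODONS:
            if i + 3 <= main_atg:
                # The stop codon ends before the natural ATG.
                return "non-overlapping uORF", codons
            # The stop codon overlaps or lies downstream of the natural ATG.
            return "out-of-frame overlapping uORF", codons
    return "no stop codon in the sequence provided", codons


if __name__ == "__main__":
    # Toy sequences only (not NIPBL).
    print(classify_uorf("CCATGAAATAGCCATGGGGTAA", 2, 13))  # ('non-overlapping uORF', 3)
    print(classify_uorf("ATGAAAATGGGGTAA", 0, 6))  # ('in-frame N-terminal extension', 2)
    print(classify_uorf("ATGCATGGGTAA", 0, 4))  # ('out-of-frame overlapping uORF', 4)
```

A real analysis would also need the spliced transcript sequence and would typically weigh the initiation context of the uATG, which this sketch leaves out on purpose.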
Because of its specific phenotype and the frequency of mosaic variants, the genetic investigation of suspected CdLS generally consists of targeted sequencing of a gene panel, ideally including the CdLS genes along with genes associated with related disorders. In our experience and in the literature, the overall diagnostic yield of this analysis is about 50%, with 70% of genetically confirmed patients carrying a pathogenic variant in NIPBL. Thus, up to 50% of individuals remain without a molecular diagnosis after sequencing, including 25% of patients with a classic form in our series. We aimed to assess uORF-introducing 5′-UTR variants of the main CdLS gene, NIPBL, as a putative cause of CdLS. For that purpose, we (i) specifically annotated the 5′-UTRs in NGS data from 102 CdLS patients without a coding pathogenic variant, and (ii) interrogated patient variant databases and searched the literature. We identified two variants with strong predicted translation initiation potential: a novel variant detected in our own sequencing data, and a previously published variant (Selicorni et al., 2007) whose functional consequences had not, to our knowledge, been reported. We further assessed the consequences of both variants in a reporter assay and in our patient's lymphoblastoid cell line, and showed that both variants reduced protein levels, thus likely representing the cause of CdLS in both patients.
2 METHODS
2.1 NGS data of CdLS patients
We included all patients with a clinical diagnosis of CdLS and negative results on targeted sequencing of a 22-gene panel in the genetics laboratory of the Rouen University Hospital, one of the two laboratories in France receiving CdLS patients nationwide for diagnostic purposes. Patients or their legal representatives provided informed written consent for genetic analysis. This retrospective study was approved by the Rouen University Hospital Institutional Review Board (CERDE, Comité d'Ethique pour la Recherche sur Données Existantes et Hors Loi Jardé, Rouen, France) (notification no. 69-2020).
Sequencing was performed on an Illumina MiSeq sequencer with an average depth of coverage of approximately 700×, following capture with a custom Agilent QXT kit. Our gene panel covers the coding sequences of all five CdLS-causing genes (NIPBL, SMC1A, SMC3, RAD21, and HDAC8) as well as 17 genes associated with differential diagnoses (Table S1). Reads were aligned to the human reference genome hg19 using BWA-MEM (v0.7.17). GATK tools (v4.0.6.0) were then used to postprocess the BAM files (base quality score recalibration [BQSR] and deduplication). Single nucleotide variants and short insertions and deletions (indels) were called using GATK HaplotypeCaller, VarScan2 (v2.4.3) and VarDict (v1.5.1), and all VCF files were annotated using SnpEff, SnpSift and AlamutBatch. Copy number variants were called using a CANOES-based workflow (Quenez et al., 2021) and GRIDSS (v2.10.0) (Cameron et al., 2021). Variants of interest were confirmed by Sanger sequencing, which was also used for segregation analysis when parental DNA was available.
The 5′-UTR of NIPBL was captured and sequenced with an average depth of coverage of 284×. Variants in the 5′-UTR were annotated with the 5utr ['suter'] tool (https://github.com/leklab/5utr), which flags variants creating uORFs, in addition to a systematic manual review of rare 5′-UTR variants. Translation initiation predictions were then obtained directly from the NetStart (Pedersen & Nielsen, 1997), ATGpr (Salamov et al., 1998) and DNAFSMiner (Liu et al., 2005) bioinformatics tools (last accessed June 2021).
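The core check behind such an annotation, whether a variant creates an ATG that was not present in the reference 5′-UTR, can be sketched in a few lines. The Python example below is not the implementation or API of the 5utr tool; its helper names are hypothetical, it assumes the spliced 5′-UTR is given as a string ending immediately before the natural ATG (so that HGVS position c.-N maps to index len(utr) − N), and it only reports new ATGs that share at least one base with the edited sequence. The toy sequences mimic the two kinds of change discussed in this study (a GA>AT delins next to a G, and a C>T substitution within an ACG) but are not NIPBL sequence.

```python
"""Sketch: does a 5'-UTR variant create a new upstream ATG?

Hypothetical helpers, not the API of the 5utr tool. `utr` is the spliced
5'-UTR (DNA alphabet) ending right before the natural ATG, so that HGVS
position c.-1 is its last base.
"""


def cdot_to_index(utr_len: int, c_pos: int) -> int:
    """Map a 5'-UTR position c.-N to a 0-based index in the UTR string."""
    if c_pos >= 0 or -c_pos > utr_len:
        raise ValueError("position outside the 5'-UTR")
    return utr_len + c_pos


def new_uatgs(utr: str, start: int, end: int, alt: str) -> list[int]:
    """Apply the delins c.start_end -> alt and return the c. positions (in
    the mutated UTR) of ATGs overlapping the edited bases."""
    utr, alt = utr.upper(), alt.upper()
    s = cdot_to_index(len(utr), start)
    e = cdot_to_index(len(utr), end)
    if e < s:
        raise ValueError("start must not be downstream of end")
    mutated = utr[:s] + alt + utr[e + 1:]
    hits = []
    for i in range(len(mutated) - 2):
        # Keep ATGs sharing a base with the inserted sequence, or spanning
        # the junction left by a pure deletion (empty alt).
        if mutated[i:i + 3] == "ATG" and i < s + len(alt) and i + 3 > s:
            hits.append(i - len(mutated))  # back to c.-N numbering
    return hits


if __name__ == "__main__":
    # Toy delins: GA -> AT immediately upstream of a G creates an ATG.
    print(new_uatgs("CCGAGCC", -5, -4, "AT"))  # [-5]
    # Toy substitution: ACG -> ATG.
    print(new_uatgs("GGACGTT", -4, -4, "T"))  # [-5]
```

For length-changing edits, the returned positions follow the numbering of the mutated sequence; for substitutions and same-length delins they coincide with the reference numbering.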
2.2 Literature and variant databases search
The ClinVar, Human Gene Mutation Database, Leiden Open Variation Database v3.0 and DECIPHER patient variant databases, as well as Medline (through PubMed), were searched in September 2020 for variants in the 5′-UTR of NIPBL. Variants were manually assessed for the introduction of an AUG codon, using the NetStart (Pedersen & Nielsen, 1997), ATGpr (Salamov et al., 1998) and DNAFSMiner (Liu et al., 2005) prediction tools through their respective websites.
2.3 Reporter constructs
To assess the consequences of the variants on gene expression, wild-type and mutant NIPBL 5′-UTR sequences (based on RefSeq) were synthesized and subcloned upstream of the green fluorescent protein (GFP) coding sequence, into the BamHI/EcoRI sites of the pcDNA3.1(+)-C-eGFP vector (GenScript). This vector contains the GFP gene under the control of an SV40 promoter. All constructs were verified by sequencing.
2.4 Cell culture and transfection
HEK293 cells were grown in a mix of Dulbecco's modified Eagle medium and F12 (Gibco, Thermo Fisher Scientific) supplemented with 10% FCS (Eurobio). For transfection, cells were seeded in 12-well plates and transfected with Lipofectamine 3000 (Invitrogen, Thermo Fisher Scientific) according to the manufacturer's instructions. Cells were harvested for extraction 24 h later.
To establish the lymphoblastoid cell line (LCL), B lymphocytes were isolated from whole blood by Ficoll gradient centrifugation and immortalized by Epstein-Barr virus (EBV) transformation in the presence of cyclosporine. LCLs from the patient and controls were then grown in RPMI (Gibco, Thermo Fisher Scientific) supplemented with 10% FCS (Eurobio). Extraction was performed on 1 × 10⁶ cells.
2.5 RNA isolation and reverse-transcription droplet digital PCR analysis
Total RNA was extracted using the Nucleospin® RNA isolation kit (Macherey-Nagel) according to the manufacturer's instructions and quantified by spectrophotometry (Nanodrop; Thermo Scientific). Reverse transcription was performed on 100 ng of RNA using the Verso cDNA kit with oligo(dT) primers (Thermo Scientific). Relative GFP and neomycin expression in HEK293 cells and NIPBL expression in LCLs were then assessed by droplet digital PCR (ddPCR) on a QX200 platform (Bio-Rad Laboratories). RT-ddPCR relative quantification used RPL27 (HEK293 cells) and TOP1 (LCLs) as reference genes. The ddPCR protocol was performed as previously described (Cassinari et al., 2020).
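For readers less familiar with ddPCR, the quantification principle can be illustrated as follows: the fraction of positive droplets is converted into a concentration with a Poisson correction, and the target concentration is divided by that of the reference gene measured in the same well (the FAM/HEX probe pairs described below suggest duplex reactions). This is a minimal Python sketch, not the QX200 analysis software: the function names are hypothetical, the droplet volume of 0.85 nL is an assumed, instrument-dependent value, and the droplet counts are invented toy numbers.

```python
"""Sketch of ddPCR relative quantification (illustrative only)."""
import math
from statistics import mean

DROPLET_VOLUME_NL = 0.85  # assumed droplet volume; instrument-dependent


def copies_per_ul(positive: int, total: int,
                  droplet_nl: float = DROPLET_VOLUME_NL) -> float:
    """Poisson-corrected target concentration (copies per microliter)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    lam = -math.log(1.0 - positive / total)  # mean copies per droplet
    return lam / (droplet_nl * 1e-3)  # 1 nL = 1e-3 uL


def relative_expression(wells: list[tuple[int, int, int]]) -> float:
    """Mean target/reference concentration ratio over technical replicates.

    Each well is (target_positive, reference_positive, accepted_droplets),
    assuming a duplex reaction in which both amplicons share the droplets.
    """
    ratios = []
    for target_pos, ref_pos, total in wells:
        if ref_pos == 0:
            raise ValueError("reference gene not detected in a well")
        ratios.append(copies_per_ul(target_pos, total)
                      / copies_per_ul(ref_pos, total))
    return mean(ratios)


if __name__ == "__main__":
    # Toy droplet counts for three technical replicate wells.
    wells = [(1200, 2400, 15000), (1150, 2350, 14500), (1250, 2500, 15500)]
    print(round(relative_expression(wells), 3))
```

Mutant values would then be expressed relative to the mean of the wt (or control) samples, so the assumed droplet volume cancels out of the final ratios.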
For HEK293 cells, GFP was PCR-amplified using the following primers: Forward (Fw): 5′-GAAGCGCGATCACATGGT-3′, Reverse (Rv): 5′-CCATGCCGAGAGTGATCC-3′, associated with the FAM-labeled hydrolysis probe: 5′-TGCTGGAG-3′ containing locked nucleic acids (LNA, Universal ProbeLibrary; Roche). The reference amplicon, located in the RPL27 gene, was PCR-amplified using the following primers: Fw: 5′-ACCTCAGATCGCCCCTACA-3′, Rv: 5′-ATGGCAGCTGTCACTTTGC-3′, associated with the HEX-labeled hydrolysis probe (IDT DNA): 5′-TGCTCTGGTGGCTGGAATTGAC-3′. For each condition, analyses were performed twice on two samples with three technical replicates.
Neomycin was PCR-amplified, as a control of transfection efficiency, using the following primers: Fw: 5′-ATGCCTGCTTGCCGAATA-3′, Rv: 5′-CCACAGTCGATGAATCCAGA-3′, associated with the FAM-labeled hydrolysis probe, containing LNA (Roche): 5′-TGGTGGAA-3′.
For lymphoblastoid cell lines, NIPBL was PCR-amplified using the following primers: Fw: 5′-GCCCCATGTCCCCATTAC-3′, Rv: 5′-GCAGGTAAAGGAGATGGAAGAG-3′, associated with the FAM-labeled hydrolysis probe: 5′-TGTGAGACTAGCAATCCCCGCAAG-3′. The reference amplicon, located in the TOP1 gene, was PCR-amplified using the following primers: Fw: 5′-AGTCCAAAGAGATGAAAGTCCG-3′, Rv: 5′-CTCCTTTTCATTGCCTGCTC-3′, associated with the HEX-labeled hydrolysis probe: 5′-CTGTAGCCCTGTACTTCATCGACAAGC-3′ (IDT DNA). For each cell line, analyses were performed in two technical replicates.
2.6 Protein extraction and immunoblot analysis
Cells were homogenized in RIPA buffer (50 mM Tris-HCl pH 8, 150 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate, 10% glycerol, 2 mM DTT) supplemented with a cocktail of protease inhibitors (Sigma-Aldrich) and phosphatase inhibitors (Halt phosphatase; Thermo Fisher Scientific). After 10 min on ice, lysates were centrifuged (12,000 × g, 10 min, 4°C) and the supernatant containing soluble proteins was collected. Protein concentrations were measured using the DC Protein Assay Kit (Bio-Rad Laboratories). NIPBL and GFP were resolved on Tris-acetate NOVEX NuPAGE 3%–8% gels (Invitrogen™) and 12% TGX Stain-Free gels (Bio-Rad Laboratories), respectively. Proteins were then transferred onto nitrocellulose membranes using the Trans-Blot Turbo system (Bio-Rad Laboratories). Membranes were blocked in 5% nonfat milk and immunoblotted with the appropriate primary antibodies: monoclonal mouse anti-GFP (1:1000; ref#11814460001; Roche) or polyclonal rabbit anti-NIPBL (1:3000; A301-779A; Bethyl Laboratories). Membranes were then incubated with peroxidase-labeled anti-mouse or anti-rabbit secondary antibodies (1:10,000; Jackson Immunoresearch Laboratories), and signals were detected with chemiluminescence reagents (ECL Clarity; Bio-Rad Laboratories). Signals were acquired with a GBOX (Syngene) controlled by GeneSnap software (Syngene). The signal intensity in each lane was quantified using GeneTools software (Syngene) and normalized to the Stain-Free signal quantified in the corresponding lane (ImageLab™ software; Bio-Rad Laboratories).
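The normalization described above reduces to two divisions: each band is divided by the total-protein (Stain-Free) signal of its lane, and the normalized values are then expressed relative to the mean of the wt condition, from which a percentage decrease follows. A minimal Python sketch, with hypothetical function names and made-up intensities (not data from this study):

```python
"""Sketch: western blot band normalization to total lane protein."""
from statistics import mean


def normalize_to_lane(bands: list[float], lanes: list[float]) -> list[float]:
    """Divide each band intensity by the Stain-Free signal of its lane."""
    if len(bands) != len(lanes):
        raise ValueError("one Stain-Free value is needed per band")
    return [b / s for b, s in zip(bands, lanes)]


def relative_to_control(values: list[float], control: list[float]) -> list[float]:
    """Express normalized values as fractions of the control (wt) mean."""
    ref = mean(control)
    return [v / ref for v in values]


def percent_decrease(mutant: list[float], control: list[float]) -> float:
    """Percent reduction of the mutant mean relative to the control mean."""
    return 100.0 * (1.0 - mean(mutant) / mean(control))


if __name__ == "__main__":
    wt = normalize_to_lane([2000.0, 2200.0], [1000.0, 1100.0])   # [2.0, 2.0]
    mut = normalize_to_lane([1000.0, 1050.0], [1000.0, 1050.0])  # [1.0, 1.0]
    print(relative_to_control(mut, wt))  # [0.5, 0.5]
    print(percent_decrease(mut, wt))     # 50.0
```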
3 RESULTS
3.1 Identification of a CdLS patient with an AUG-introducing variant in the 5′-UTR of NIPBL
We included all patients with a clinical diagnosis of CdLS who had undergone gene panel sequencing in our genetics laboratory and remained without a molecular diagnosis after interpretation of coding sequence variants, representing 102 patients as of September 2020. We hypothesized that some of these patients might carry a pathogenic variant in the noncoding regions of the main CdLS genes. We focused on the 5′-UTR of NIPBL, which was sequenced with good quality in the targeted panel but was not routinely interpreted. Among the 102 patients, one (Patient 1) carried a novel delins in the 5′-UTR of NIPBL, NM_133433.3:c.-457_-456delinsAT (chr5:g.36876903_36876904delinsAT) (ClinVar submission SUB10550850). The variant replaces a 5′-G-A-3′ sequence with 5′-A-T-3′ immediately upstream of a G, thus creating a 5′-A-T-G-3′ sequence. Analysis of the BAM files confirmed that the variant was indeed a delins, with two single-nucleotide substitutions in cis. We sequenced this region in both unaffected parents and showed that the variant had occurred de novo. Parentage was verified using informative microsatellite markers. This variant is absent from gnomAD (v2.1.1). No effect on splicing is predicted by MaxEntScan, NNSPLICE or GeneSplicer. The variant lies in a sequence context with strong predictions in favor of translation initiation according to DNAFSMiner, NetStart and ATGpr (Figure 1), with higher scores than those of the natural initiation site. If this upstream ATG is recognized by the ribosomal complex, a 90-codon uORF would be translated, ending with a stop codon upstream of the natural initiation site and thus not overlapping the natural reading frame (Figures 2 and S1).


Patient 1 is a 15-year-old boy born to healthy nonconsanguineous parents. He has two brothers with learning disabilities. He was born at term with normal growth parameters (weight: 3550 g/45th centile, height: 49 cm/12th centile, head circumference: 33 cm/5th centile). His psychomotor development was delayed, with walking at 18 months and delayed language: 10 words at 5 years and simple sentences at 8 years of age. Dysmorphic facial features were already noted at birth, including arched eyebrows, synophrys, anteverted nostrils, a bulging philtrum and a thin upper lip (Figure 3). At his last examination, at 8 years of age, he showed moderate to severe intellectual disability and growth retardation: weight 19.3 kg (−2.5 SD), height 118 cm (−2 SD) and head circumference 47.5 cm (−4 SD). He had clinodactyly of the fifth fingers, brachymetacarpia of the first ray and hypertrichosis. He also had bilateral undescended testes, managed surgically. Neurological examination was unremarkable. Brain magnetic resonance imaging showed a mildly thin corpus callosum, and cardiac ultrasound was normal. A previous array comparative genomic hybridization did not reveal any pathogenic copy number variant.

3.2 Literature and patient-specific database search for AUG-introducing 5′-UTR NIPBL variants
In September 2020, the database and literature search revealed only one variant in the 5′-UTR of NIPBL predicted to create a novel AUG codon (NM_133433.3:c.-94C>T). It was reported by Selicorni et al. (2007), but no functional analyses were provided in that report. Familial segregation of the variant could not be analyzed because parental samples were unavailable. Thus, the pathogenicity of this variant could not be established at that time. Only NetStart (Pedersen & Nielsen, 1997) predicted that this variant could create a uORF with a higher score than the natural start site of NIPBL (Figure 1). The two other prediction tools, DNAFSMiner and ATGpr, also predicted that this variant could create a uORF, albeit with lower scores than the natural start site of NIPBL (Figure 1 and Figure S2).
The T allele of this chr5:g.36877266C>T (NM_133433.3:c.-94C>T) variant results in a novel upstream ATG predicted to generate a 17-codon uORF that does not overlap the natural ORF (Figure 2 and Figure S1), similar to the variant of Patient 1 although at a different position.
Briefly, the patient reported by Selicorni et al. (2007) (referred to here as Patient 2) was a 1-year-old girl with facial dysmorphism suggestive of CdLS and recurrent otitis. She had no intrauterine or postnatal growth retardation, limb reduction or major malformation. She had moderate intellectual disability. Her phenotype was thus rather mild compared with that of patients with classic CdLS.
3.3 Assessment of two AUG-introducing variants in the 5′-UTR of NIPBL
We hypothesized that the detected variants could cause the phenotype of these two patients by impairing translation of the natural protein. To test this hypothesis, we developed a GFP reporter assay in which the 5′-UTR of NIPBL (wt or mutant) was cloned upstream of the GFP coding sequence. Following transient transfection of HEK293 cells, we first assessed GFP mRNA levels by RT-ddPCR. No difference in GFP mRNA levels was observed between wt and mutant conditions, and neomycin expression analysis indicated similar transfection efficiency in all conditions (Figure 4a and Figure S3). We then measured GFP protein levels by western blotting (WB). Both 5′-UTR variants were associated with a significant decrease in GFP protein levels compared to the wt 5′-UTR (Figure 4b). The decrease was more pronounced for the construct carrying the c.-457_-456delinsAT variant from Patient 1 (81% decrease compared with the wt construct, p = 0.0001, t test) than for the c.-94C>T variant from Patient 2 (41% decrease compared with the wt construct, p = 0.0001; comparison between c.-457_-456delinsAT and c.-94C>T: p < 0.0001).
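The comparisons above rely on a t test, but the text does not state which variant was used. Assuming an unpaired two-sample Student's t test with pooled variance, the statistic can be computed as in the Python sketch below (hypothetical function name, toy values rather than study data). Obtaining the p-value additionally requires the t distribution with n_a + n_b − 2 degrees of freedom, for example through `scipy.stats.ttest_ind`, which is not used here so that the example runs without third-party dependencies.

```python
"""Sketch: unpaired two-sample Student's t statistic (pooled variance)."""
import math
from statistics import mean, variance


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for groups a and b."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    df = na + nb - 2
    # Pooled estimate of the common variance from the sample variances.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("zero variance in both groups")
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    wt = [4.0, 5.0, 6.0]      # toy normalized GFP levels
    mutant = [1.0, 2.0, 3.0]  # toy values, not study data
    print(student_t(wt, mutant))
```

If the group variances were expected to differ, Welch's variant (unpooled variances with adjusted degrees of freedom) would be the usual alternative.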

In Patient 1, we were able to confirm the results of the reporter assay in a patient-derived LCL. We first confirmed by Sanger sequencing the presence of the variant in the patient's LCL and its absence in four independent control LCLs. NIPBL mRNA levels were quantified relative to the TOP1 housekeeping gene. Consistent with the unchanged GFP mRNA levels in our reporter assay, relative NIPBL mRNA levels in the LCL from Patient 1 were comparable to those of the four control LCLs (Figure 4c). We then analyzed NIPBL protein expression by WB (Figure 4d). NIPBL expression in the LCL from Patient 1 was 50% lower than in control LCLs, thus validating the results of our reporter assay and confirming the strong impact of the 5′-UTR variant on NIPBL expression.
4 DISCUSSION
CdLS is a rare neurodevelopmental disorder overwhelmingly linked to pathogenic variants in the NIPBL gene. Most variants described to date are located in the coding sequence of NIPBL. As reduced NIPBL expression could also result from variants in noncoding regions, we screened the 5′-UTR of NIPBL in CdLS cases that remained unsolved after gene panel sequencing. Here, we assessed two variants in the 5′-UTR of NIPBL, one of them novel, that create uORFs and repress NIPBL translation, thus representing a novel mechanism causing CdLS. The number of diseases known to be caused by variants that introduce or disrupt uORFs is increasing, although such reports remain scarce (Barbosa et al., 2013; Labrouche-Colomer et al., 2020; Romanelli Tavares et al., 2019; Wright et al., 2021; Zhou et al., 2018). Variants in the 5′-UTR of NIPBL, in particular, are rarely identified and require further investigation to assess their pathogenicity (Borck et al., 2006; Selicorni et al., 2007). The increasing availability of whole genome sequencing may lead to the discovery of more noncoding variants of interest, including 5′-UTR variants. Despite the development of annotation tools dedicated to noncoding regions, validation by functional assays remains mandatory in such cases before the results can be used for genetic counseling.
In our NGS data, we identified only one patient with a de novo variant in the 5′-UTR of NIPBL generating a novel uORF, and we found one additional such variant in the literature. The likelihood of translation initiation was assessed with three online tools: DNAFSMiner, NetStart and ATGpr. Based on their scores, both upstream AUGs (uAUGs) could potentially compete with the endogenous NIPBL start AUG as translation initiation sites. Of note, the c.-94C>T variant received lower scores than the variant of Patient 1, and our reporter assay showed a stronger decrease in GFP levels with the variant of Patient 1 than with that of Patient 2. The sequence contexts at both positions provided good matches to the Kozak consensus for translation initiation (Kozak, 1986), but the variant of Patient 2 was associated with lower Kozak consensus scores with all prediction tools. These results are consistent with previously published data suggesting that the extent of translation reduction depends on how well the uAUG matches the Kozak consensus sequence (Labrouche-Colomer et al., 2020; Wright et al., 2021). Interestingly, the phenotype of Patient 2 (and her father) described by Selicorni et al. was milder than that of Patient 1, which is consistent with the hypothesis of a milder effect of the c.-94C>T variant.
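The Kozak comparison can be made concrete with a widely used rule of thumb that weighs the two most influential positions around an AUG: a purine (A or G) at position −3 and a G at position +4, numbering the A of the AUG as +1. The Python sketch below classifies a context as strong (both present), adequate (one present) or weak (neither). This simplified rule is an assumption for illustration only; it is not the scoring used by NetStart, ATGpr or DNAFSMiner, and the function name and example sequences are hypothetical.

```python
"""Sketch: three-level Kozak context classification (rule of thumb)."""


def kozak_context(seq: str, atg: int) -> str:
    """Classify the context of the ATG whose A is at 0-based index `atg`.

    Position -3 is seq[atg - 3]; position +4 is seq[atg + 3].
    """
    seq = seq.upper()
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg < 3 or atg + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the ATG")
    purine_minus3 = seq[atg - 3] in "AG"
    g_plus4 = seq[atg + 3] == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


if __name__ == "__main__":
    # Toy contexts; the A of ATG is at index 6 in each.
    for context in ("GCCACCATGG", "GCCACCATGC", "GCCTCCATGC"):
        print(context, kozak_context(context, 6))
    # GCCACCATGG strong / GCCACCATGC adequate / GCCTCCATGC weak
```

Dedicated prediction tools consider a wider sequence window and additional features, so this binary-position rule should be read only as an intuition for why two uAUGs can differ in initiation efficiency.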
uORFs are typically described as repressors of translation initiation at the downstream main ORF through various mechanisms (Silva et al., 2019), and we show here two variants acting through a similar loss-of-function mechanism, reducing normal translation of the NIPBL coding sequence. Functional assays of mRNA and protein levels, first with a GFP reporter assay and then in an LCL derived from the blood of Patient 1, showed that these variants likely reduce translation of the main open reading frame without altering mRNA levels, and thus presumably without affecting transcription. The reduction in expression of the natural ORF was incomplete, particularly for the variant of Patient 2, suggesting either that ribosomes may skip the mutant AUG, as suggested by its lower Kozak scores, or that translation may reinitiate at the natural ORF after translation of the uORF. Reinitiation after uORF scanning appears more likely for short uORFs, such as the one created by the c.-94C>T variant (Barbosa et al., 2013).
In theory, the premature termination codon (PTC) of a uORF should be recognized by the nonsense-mediated decay (NMD) machinery, leading to decreased mRNA levels. In Patient 1, for whom we were able to assess mRNA levels in LCLs, stable mRNA levels suggested that NMD was not induced. It is unclear how such a PTC may escape NMD. It seems that the shorter a uORF, the lower the chance that its PTC is recognized by the NMD machinery. To our knowledge, no clear threshold has been established in mammalian cells, although it seems that 35 codons are required to trigger NMD in plants (Barbosa et al., 2013; Nyikó et al., 2009). However, the predicted uORF would be 90 codons long in the case of the variant of Patient 1. Other uORFs longer than 35 codons are known to resist NMD, including in some stress conditions leading to phosphorylation of the translation initiation factor eIF2α (Barbosa et al., 2013), but it remains unclear why the uORF apparently escapes NMD in our specific case. We hypothesize either (1) ribosome dissociation, with subsequent ribosome recycling, after scanning of the stop codon of the novel uORF upstream of the natural ORF, or (2) ribosome stalling, in which elongating or terminating ribosomes would be blocked by secondary structures in the uORF, a uORF-specific nucleotide context, or interactions with trans-acting factors. Alternatively, uORF-encoded peptides could themselves induce ribosome stalling and dissociation, depending on their amino acid sequence and their interaction with the translational machinery. Further functional studies are needed to understand the specific mechanisms underlying the decreased protein levels in both mutant conditions and the absence of NMD in Patient 1.
Although our results are inconclusive on the mechanistic aspects, we show here, as previously reported for other 5′-UTR variants (Borck et al., 2006; Hornig et al., 2016; Labrouche-Colomer et al., 2020; Romanelli Tavares et al., 2019; Zhou et al., 2018), that a simple reporter assay, possibly combined with the assessment of patient cells when available, may be sufficient to provide evidence of a loss-of-function effect and thus to reclassify variants of unknown significance as likely pathogenic.
Recently, noncoding variants in the 5′-UTR of the haploinsufficient genes MEF2C and STXBP1 have been identified in patients with intellectual disability (Wright et al., 2021), further expanding the range of Mendelian diseases in which such noncoding variants can be found. In contrast to the two NIPBL variants reported here, these variants acted through three distinct loss-of-function mechanisms at different stages of expression regulation: (i) deletions removing the promoter and part of the 5′-UTR, predicted to abolish normal transcription of MEF2C; (ii) out-of-frame uORFs overlapping the coding sequence, reducing normal translation of MEF2C and STXBP1 if translation initiates at the uAUG; and (iii) in-frame uAUGs extending the MEF2C coding sequence and reducing the function of the encoded, N-terminally extended protein by disrupting its DNA-binding ability.
In addition, while this article was under review, we identified another AUG-introducing variant in the 5′-UTR of NIPBL by gene panel sequencing of additional CdLS patients in our genetics laboratory. This NM_133433.3:c.-467C>T, p.? (chr5:g.36876893C>T) variant (ClinVar submission SUB11303438) also creates an ATG with strong in silico predictions of use as an alternative translation initiation site with a putative uORF. It is located 10 bp upstream of the c.-457_-456delinsAT variant assessed here and occurred de novo in a patient with classic CdLS. It was also very recently reported in another CdLS patient in ClinVar, classified as pathogenic by the Cell and Gene Engineering Laboratory, Zhejiang University (no supporting evidence provided in ClinVar; accession number SCV001775539).
In summary, we have shown that two variants in the 5′-UTR of NIPBL likely cause CdLS by reducing NIPBL expression at the translational level. Together with this latest finding, our results suggest that 5′-UTR variants of NIPBL are rare events that could account for a proportion of unsolved CdLS cases.
ACKNOWLEDGMENTS
We thank the patient and his parents as well as the referring physicians for their participation in this study. This work is dedicated to the memory of Professor Lionel Van Maldergem, who passed away during the writing of the paper. This study was co-supported by the European Union and Région Normandie in the context of Recherche Innovation Normandie (RIN2018). Europe gets involved in Normandie with the European Regional Development Fund (ERDF). This work was generated within the European Reference Network for Developmental Anomalies and Intellectual Disability.
CONFLICTS OF INTEREST
The authors declare no conflicts of interest.
WEB RESOURCES
DNA functional site miner (http://dnafsminer.bic.nus.edu.sg/), NetStart (https://services.healthtech.dtu.dk/service.php?NetStart-1.0), and ATGpr (https://atgpr.dbcls.jp/).