Next-generation phenotyping in Nigerian children with Cornelia de Lange syndrome
Abstract
Next-generation phenotyping (NGP) can be used to compute the similarity of dysmorphic patients to known syndromic diseases. So far, the technology has been evaluated in variant prioritization and classification, providing evidence for pathogenicity if the phenotype matched with other patients with a confirmed molecular diagnosis. In a Nigerian cohort of individuals with facial dysmorphism, we used the NGP tool GestaltMatcher to screen portraits prior to genetic testing and subjected individuals with high similarity scores to exome sequencing (ES). Here, we report on two individuals with global developmental delay, pulmonary artery stenosis, and genital and limb malformations for whom GestaltMatcher yielded Cornelia de Lange syndrome (CdLS) as the top hit. ES revealed a known pathogenic nonsense variant, NM_133433.4: c.598C>T; p.(Gln200*), as well as a novel frameshift variant c.7948dup; p.(Ile2650Asnfs*11) in NIPBL. Our results suggest that NGP can be used as a screening tool and thresholds could be defined for achieving high diagnostic yields in ES. Training the artificial intelligence (AI) with additional cases of the same ethnicity might further increase the positive predictive value of GestaltMatcher.
1 INTRODUCTION
GestaltMatcher is an artificial intelligence (AI) that has been developed to analyze phenotypic features in medical imaging data of patients with rare diseases (Hsieh et al., 2022). The technology has been evaluated in a prospective setting in a large study of the national healthcare system in Germany and shown to increase the diagnostic yield of exome sequencing (Schmidt et al., 2023). In that study, a multidisciplinary team (MDT) of clinicians had to assess the clinical features of patients who were suspected of having rare disorders. If the MDT concluded that a genetic cause was likely, the patients were subjected to exome sequencing (ES), and the results of the GestaltMatcher analysis were used in the data analysis (Hsieh et al., 2019). We hypothesized that gestalt scores could be used not only in variant prioritization and classification but also in the decision-making process prior to diagnostic testing. Therefore, we computed the positive likelihood ratio (LR+) for different disorders based on the GestaltMatcher scores and defined thresholds suggesting a high positive predictive value (PPV) (Lesmann, Klinkhammer, & Krawitz, 2023; Lesmann, Lyon, et al., 2023). We identified two Nigerian patients submitted to GestaltMatcher who scored high for Cornelia de Lange syndrome (CdLS) and for whom no genetic testing had yet been performed.
CdLS (OMIM #122470, 300590, 610759, 300882, and 614701) is an autosomal dominant multisystem malformation syndrome with an estimated incidence of 1:10,000 to 1:30,000 live births (Mannini et al., 2013). CdLS can be caused by pathogenic variants in the regulatory subunits of the cohesin complex encoded by NIPBL and HDAC8 or in the structural subunit of SMC1A, SMC3, and RAD21.
NIPBL is the major CdLS disease gene, and heterozygous variants are identified in more than 65% of affected individuals (Musio et al., 2006; Rohatgi et al., 2010). CdLS is recognized by characteristic facial dysmorphism, including low anterior hairline, arched eyebrows, synophrys, anteverted nares, maxillary prognathism, long philtrum, thin lips, and ‘carp’ mouth, in association with prenatal and postnatal growth retardation, developmental delay and upper limb anomalies.
Few publications have focused on syndromes in diverse populations (Dowsett et al., 2019). As a result, many clinicians are trained with clinical genetic resources where only individuals of European descent serve as the standard of reference (Muenke et al., 2016). Here, we compare facial dysmorphism and findings from physical exams in individuals from underrepresented minority groups with CdLS and demonstrate how facial analysis technology can be a useful clinical tool in the diagnosis of individuals from diverse ancestral backgrounds and developing countries. Next-generation phenotyping approaches (NGP), such as GestaltMatcher (Hsieh et al., 2022), have been proven capable of facilitating the diagnosis of rare disorders (Forwood et al., 2023; Schmidt et al., 2023). However, they have not yet been evaluated systematically on ethnically diverse datasets, as many populations are still disproportionately underrepresented, notably Africans. In this study, we show that GestaltMatcher can predict NIPBL, the disease-causing gene, correctly on the first rank for both individuals. The clinical diagnosis was molecularly confirmed by trio exome sequencing analysis. We demonstrated the effectiveness of the NGP approach on two Nigerian patients. Moreover, utilizing this technology is crucial to facilitate the diagnostic process and to reduce the cost to the healthcare system in developing countries with limited or no access to specialist medical geneticists.
2 PATIENTS, MATERIALS, AND METHODS
2.1 Clinical details
2.1.1 Patient 1
A 5 months old boy (Figure 1a) with global developmental delay, microcephaly, growth delay, cryptorchidism, micropenis and pulmonary artery stenosis was seen for syndromal diagnostics. Craniofacial dysmorphic features included synophrys, long eyelashes, thick hair, depressed nasal bridge, and low anterior hairline. He also presents bilateral postaxial hand polydactyly. The boy has had balloon valvuloplasty for the pulmonary stenosis and ochiopexy for cryptorchidism. He is yet to speak at 3 years and 1 month. No further psychological and learning tests were documented and brain MRI was also not executed for the child.

He is the second child born to non-consanguinous healthy parents. Pregnancy history was unremarkable. The older brother has no obvious abnormality.
2.1.2 Patient 2
The 16 months old boy presented with global developmental delay, intellectual disability, growth delay and cryptorchidism. He had dysmorphic facial features typical for CdLS, including synophrys, highly arched eyebrow, long eyelashes, facial hirsutism, hypertelorism, epicanthus, depressed nasal bridge, long and smooth philtrum, low anterior hairline, low-set ears, anteverted nares, thin upper and lower lip vermilion, prominence of the premaxilla and a high palate. He also had split hand deformity (Figure 1b). At 4 years of age, he was still unable to walk and no visual contact was possible. Psychological and learning tests were not documented and brain MRI was not carried out.
The patient was born to non-consanguinous healthy parents after 38 weeks gestation. Pregnancy was uneventful but the baby was small for gestational age with birth weight of 1.8 kg. Family history was unremarkable. The boy is the first of three children. His two healthy sisters are 2 and 3 years old.
3 MATERIALS AND METHODS
DNA was extracted from saliva and/or blood samples, enriched with TWIST Exome CoreRefSeq, and sequenced with a 100 bp paired-end protocol on a NovaSeq 6000. Sequence reads were mapped to the reference genome GRCh37 with BWA-MEM, and variants were called with GATK according to best practice guidelines. 95% of the target region was covered by at least 20x. We utilized GestaltMatcher to analyze the patient's facial image (Hsieh et al., 2023; Hustinx et al., 2022). The facial image is first encoded by the GestaltMatcher model into the facial phenotype descriptor. We then calculated the cosine distance between the test image to 9764 facial images of patients with 528 different rare disorders in the GestaltMatcher Database (GMDB; Lesmann, Klinkhammer, & Krawitz, 2023; Lesmann, Lyon, et al., 2023). The suggested disease-causing gene list was further ranked by sorting the cosine distances in the accessing order. Variant analysis was done using the prioritization of exome data by image analysis (PEDIA) approach (Hsieh et al., 2019) in the VarFish platform (Holtgrewe et al., 2020). First, phenotypic similarities of clinical features to rare disorders were assessed by CADA with the Human Phenotype Ontology (HPO) terminology (Peng et al., 2021). We then quantified the facial gestalt scores by analyzing the facial image with GestaltMatcher. On the molecular level, the deleteriousness of pre-filtered variants was assessed by CADD scores (Kircher et al., 2014), and the highest score per gene was annotated. All scores were used to compute a PEDIA score per gene, and genes were ranked by the PEDIA score in descending order. The final output of the PEDIA score for each variant can be utilized for prioritization. Further evaluation was carried out using VarFish11. The identified sequence variants were assessed on the basis of public databases (Deciper [https://decipher.sanger.ac.uk/], ClinVar [https://www.ncbi.nlm.nih.gov/clinvar/], 1kGP12, Iranome13, GME Variome14 and gnomAD [http://gnomad.broadinstitute.org/]). Recommendations for the classification of variants according to ACMG guidelines were also taken into account.
4 RESULTS
4.1 Facial analysis technology and molecular genetic findings
4.1.1 Patient 1 (A-II-1)
We first analyzed the portrait image of individual 1 with GestaltMatcher, and CdLS was ranked in the top-1 position, indicating the most likely diagnosis based on syndromic similarity. In fact, the 10 most similar patients in GMDB had a pathogenic variant in NIPBL, amongst them five of African ethnicity. Moreover, Gradient-weighted Class Activation Mapping (Grad-CAM) (Selvaraju et al., 2017) also indicated a depressed nasal bridge and typical synophrys, which are relevant facial dysmorphic features to CdLS in Figure 1a.
By trio exome sequencing, we identified a novel pathogenic de novo frameshift variant in NIPBL, NM_133433.3:c.7948dup (p.Ile2650Asnfs*11), establishing the diagnosis of CdLS1 (Figure 1c). The heterozygous duplication was covered with 230 reads, shows an alternate allele balance of 0.36 (ref: 229 reads C; alt: ins A 83 reads), and was only observed in the index while the parents are homozygous for the reference allele. The single base pair duplication c.7948dup leads to an exchange of the highly conserved amino acid isoleucine to asparagine at position 2650 following a frameshift and a premature stop codon after 11 amino acids in NIPBL. Bioinformatic prediction by MutationTaster and UMD Predictor classified this variant as pathogenic. Using PEDIA, the disease-causing variant in NIPBL was prioritized at rank 1.
On the basis of the clinical findings, facial analysis, de novo mutation status, and impact on RNA and protein level (loss of function mutation), we classify the novel variant c.7948dup p.(Ile2650Asnfs*11) in NIPBL as disease-causing in the affected individual.
4.1.2 Patient 2 (B-II-1)
For individual 2 the consent for portrait image publication was not given, but GestaltMatcher also ranked NIPBL in the top-1 position, which was based on the first nine most similar patients in GMDB with this molecular diagnosis. Six of these patients were also of African ethnicity. Trio exome sequencing confirmed CdLS1 for case 2 and revealed a heterozygous de novo nonsense variant NM_133433.4:c.598C>T (p.Gln200*) in NIPBL (Figure 1c). The heterozygous variant was covered with 89 reads, shows an alternate allele balance of 47% (ref: 47 reads C; alt: T 42 reads), was only observed in the index, while the parents are homozygous for the reference allele. The single base exchange c.598C>T leads to an exchange of a highly conserved amino acid glutamine to a stop codon at position 200 in NIPBL. Bioinformatic prediction by MutationTaster and UMD Predictor classified this variant as pathogenic with a CADD score of 37. The variant is listed as pathogenic in ClinVar: Accession ID: VCV000159167.2; Variation ID: 159167. Using PEDIA, NIPBL was prioritized on position 1.
On the basis of the clinical findings, facial analysis, de novo mutation status, and impact on RNA and protein level (loss of function mutation), we also classify the variant c.598C>T p.(Gln200*) in NIPBL as disease-causing in the affected individual.
5 DISCUSSION
CdLS is a syndromic disorder with a highly distinctive facial gestalt that can be learned and recognized by an AI, such as GestaltMatcher (Hsieh et al., 2019). Depending on the measured similarity the gestalt score can provide supporting or even moderate evidence for the pathogenicity of variants that are identified in the genes associated with this phenotype (Lesmann, Klinkhammer, & Krawitz, 2023; Lesmann, Lyon, et al., 2023; Tavtigian et al., 2018). Similarly to the Bayesian reasoning in variant classification, a threshold for the similarity to CdLS could be used prior to genetic testing to adjust the expected diagnostic yield, especially in situations where resources are limited or costs need to be strictly managed. We tested this approach in a cohort of cases with facial dysmorphism that was collected in Nigeria. We identified two individuals with gestalt scores of 0.58 and 0.49 for CdLS and submitted samples for genetic testing. Trio exome analyses established the molecular diagnoses by revealing pathogenic de novo variants in NIPBL. It is noteworthy that the strong phenotypic match could only be achieved due to molecularly solved cases that were already available in the GestaltMatcher training set. Currently, only 4% of all individuals in the GestaltMatcher database are of African descent. If only CdLS patients of European ethnicity had been used for training the AI, the measured phenotypic similarity and therefore the achieved evidence of pathogenicity would have been lower (Figure 2). This underlines the importance of collecting and sharing medical imaging data of patients from ethnicities currently underrepresented in data collections for AI.

6 CONCLUSION
Our findings demonstrate that GestaltMatcher can predict CdLS in two Nigerian children and we suggest that NGP can be used as an effective screening tool in the diagnostic process for rare diseases in developing countries.
AUTHOR CONTRIBUTIONS
Annabelle Arlt: Writing—original draft, Writing—review & editing. Alexej Knaus: Writing—original draft, Writing—review & editing. Tzung-Chien Hsieh: Software, Writing—original draft, Writing—review & editing. Hannah Klinkhammer: Formal analysis, Writing—original draft, Writing—review & editing. Meghna Ahuja Bhasin: Formal analysis. Alexander Hustinx: Software, Formal analysis. Shahida Moosa: Writing—original draft, Writing—review & editing. Peter Krawitz: Writing—original draft, Writing—review & editing. Ekanem Ekure: Writing—original draft, Writing—review & editing.
ACKNOWLEDGMENTS
We acknowledge the NGS Core Facility for their expertise in ES data generation and the CUBA for their proficient data processing, both of which were integral to this study's success. Their contributions have been indispensable in advancing our scientific understanding. And the authors thank the family members who participated in this study.
FUNDING INFORMATION
Annabelle Arlt is supported by the UPS-NDDiag project from European Joint Programme Rare Disease (EJP RD).
CONFLICT OF INTEREST STATEMENT
The authors declare not to have any competing conflict of interest.
ETHICS STATEMENT AND CONSENT FOR PUBLICATION
The study was approved by the ethics commitee of University of Lagos, all participants were informed about the research objectives and approved the data analysis. For depicted photographs written consent was provided by their legal guardians.
Open Research
DATA AVAILABILITY STATEMENT
The variants have been submitted to ClinVar (Accession: VCV001328515.1 and VCV000159167.6).