Application of facial analysis technology in clinical genetics: Considerations for diverse populations
Abstract
Facial analysis technology has the potential to shorten the diagnostic odyssey in rare diseases by providing physicians with a valuable diagnostic tool. Given that most clinical genetic resources focus on populations of European descent, we compare craniofacial features in genetic syndromes across different populations and review how machine learning algorithms perform in diagnosing genetic syndromes in geographically and ethnically diverse populations. We also discuss the value of including populations from ancestrally diverse backgrounds in the training sets of machine learning algorithms. Finally, this review demonstrates that across diverse population groups, machine learning models have outstanding accuracy, as supported by area under the curve values greater than 0.9. Artificial intelligence is only in its infancy in the diagnosis of rare disease in diverse populations and will become more accurate as larger and more diverse training sets, including a wider spectrum of ages, particularly infants, are studied.
There are roughly 7000 rare diseases, most of them genetic in origin (Haendel et al., 2020). It is nearly impossible for a physician to diagnose all of these rare diseases based on history, exam, and pattern recognition. A missed diagnosis starts a patient and their family on a diagnostic odyssey that can last years or may never be resolved, denying patients precision therapies and increasing expenses and stress for families and society at large. Even common conditions such as aneuploidy syndromes are often missed (Kruszka, Porras, Sobering, et al., 2017). Early deployment of exome and genome sequencing has greatly helped shorten the diagnostic odyssey (Claussnitzer et al., 2020); however, this technology is still not universally available.
The medical literature on genetic syndromes has focused on individuals of European descent (Muenke et al., 2016). Paralleling the historical lack of global genetic diversity in reference genomes (Sirugo et al., 2019), there is a paucity of data about phenotype variation in ancestrally diverse populations. In a recent series on genetic syndromes in diverse populations in the American Journal of Medical Genetics Part A (Dowsett et al., 2019; Kruszka, Addissie, McGinn, et al., 2017; Kruszka et al., 2020; Kruszka, Porras, Addissie, et al., 2017; Kruszka, Porras, Sobering, et al., 2017; Tekendo-Ngongang et al., 2020), it was difficult to find individuals from diverse populations for many conditions, even with a large number of co-authors and contributors from around the globe. As an example, in the study of Cornelia de Lange syndrome, only 6% of the study participants were from an African background (Dowsett et al., 2019). In the Down syndrome study, the investigators were only able to enroll 65 individuals with diverse backgrounds from 13 countries (Kruszka, Porras, Addissie, et al., 2017; Kruszka, Porras, Sobering, et al., 2017).
One of the most common applications of artificial intelligence (AI) in clinical genetics is facial analysis technology. A recent study showed that a neural network classifier outperformed the average clinical geneticist in recognizing two common genetic conditions (Duong et al., 2022). As with any AI platform, accuracy depends greatly on the data set used to train the algorithm. For global populations, the accuracy of facial analysis technology is contingent upon training sets that include individuals from ancestrally diverse groups. Multiple studies have established not only that the distinction between facial features considered normal variations and those considered minor anomalies may be subtle depending on an individual’s background (Christianson, 1996; Lumaka et al., 2017), but also that facial features differ between populations in common genetic conditions such as 22q11.2 deletion syndrome (Kruszka, Addissie, McGinn, et al., 2017; Lumaka et al., 2017; McDonald-McGinn et al., 2005; Veerapandiyan et al., 2011). For example, features classified as a ‘minor anomaly’ in white populations, such as a broad nasal bridge or thick vermilion borders of the lips, may be common in some African populations (Lumaka et al., 2017). Lumaka and colleagues demonstrated the detrimental effect of a training set that does not match the tested population. In their study, Face2Gene, an AI classifier based on convolutional neural networks (CNNs), achieved a 38.6% diagnostic rate in Congolese individuals with Down syndrome; when the classifier was retrained with individuals from Africa, the diagnostic yield increased to 94.7% (Lumaka et al., 2017).
In the American Journal of Medical Genetics Part A series on diverse populations noted above, the studies documented both subjective physical exam findings and objective assessments using facial analysis technologies. Initially, the machine learning classifier used in the series was the support vector machine (SVM) (Kruszka, Porras, Sobering, et al., 2017); later studies moved to deep convolutional neural networks (DCNN) (Kruszka et al., 2020; Tekendo-Ngongang et al., 2020). Both SVM and DCNN were accurate in discriminating genetic syndromes in diverse populations from age- and ethnicity-matched controls. Using area under the curve (AUC) as a measure of overall accuracy, with greater than 0.9 as outstanding and 0.8–0.9 as excellent (Mandrekar, 2010), Table 1 shows the accuracy of facial analysis technology in seven different genetic syndromes. All studies achieved an AUC greater than 0.9, demonstrating the effectiveness of this tool. A limitation of these studies is that the comparisons were binary, with the studied conditions usually compared to age-, gender-, and ethnicity-matched controls; the exception was the Turner syndrome study, in which individuals with Turner syndrome were also compared to individuals with Noonan syndrome (Kruszka et al., 2020). A binary classification scheme is not applicable to most clinical scenarios, where there are multiple conditions in the differential diagnosis or the physician only suspects a genetic syndrome without being able to develop a differential diagnosis. One advantage of a large database such as Face2Gene (Gurovich et al., 2019) is the ability to diagnose a patient from among thousands of syndromes. In the Turner syndrome study, the Face2Gene application was used to demonstrate the diagnostic effectiveness of facial analysis technology, with Turner syndrome appearing in the top five diagnoses for 72% of the participants (Kruszka et al., 2020).
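The top-five figure reported for the Turner syndrome study is a top-k rate: the fraction of individuals whose true diagnosis appears among the k highest-ranked suggestions. A minimal Python sketch of that calculation follows. The function name `top_k_rate` and the example rankings are hypothetical illustrations; they do not reflect the Face2Gene interface or any data from the cited study.

```python
from typing import Sequence


def top_k_rate(ranked_suggestions: Sequence[Sequence[str]],
               true_diagnoses: Sequence[str],
               k: int = 5) -> float:
    """Fraction of individuals whose true diagnosis is in the top k suggestions.

    ranked_suggestions[i] is a list of syndrome names for individual i,
    ordered from most to least likely (hypothetical classifier output).
    """
    if len(ranked_suggestions) != len(true_diagnoses):
        raise ValueError("one ranked list is required per individual")
    if not true_diagnoses:
        raise ValueError("at least one individual is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_suggestions, true_diagnoses)
        if truth in ranked[:k]
    )
    return hits / len(true_diagnoses)


# Made-up rankings for four individuals who all have Turner syndrome.
example_rankings = [
    ["Turner syndrome", "Noonan syndrome", "Down syndrome"],
    ["Noonan syndrome", "Turner syndrome", "Down syndrome"],
    ["Down syndrome", "Noonan syndrome", "Williams-Beuren syndrome"],
    ["Noonan syndrome", "Down syndrome", "Turner syndrome"],
]
example_truth = ["Turner syndrome"] * 4

for k in (1, 2, 3):
    print(k, top_k_rate(example_rankings, example_truth, k=k))
```

Unlike a binary comparison against matched controls, this measure rewards a classifier for placing the correct syndrome near the top of a long differential, which is closer to how such a tool would be used in clinic.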
Syndrome | Study | Algorithm | Comparison group | AUC |
---|---|---|---|---|
Down syndrome | Kruszka, Porras, Sobering, et al. (2017) | SVM | Age and ethnicity matched, healthy controls | 0.978 |
22q11.2 deletion syndrome | Kruszka, Addissie, McGinn, et al. (2017) | SVM | Age and ethnicity matched, healthy controls | 0.987 |
Noonan syndrome | Kruszka, Porras, Addissie, et al. (2017) | SVM | Age and ethnicity matched, healthy controls | 0.94 |
Williams-Beuren syndrome | Kruszka et al. (2018) | SVM | Age and ethnicity matched, healthy controls | 0.95 |
Cornelia de Lange syndrome | Dowsett et al. (2019) | SVM | Age and ethnicity matched, healthy controls | 0.98 |
Turner syndrome | Kruszka et al. (2020) | DCNN | Age and ethnicity matched, healthy controls and Noonan syndrome | 0.91 (unaffected); 0.93 (Noonan syndrome) |
Rubinstein-Taybi syndrome | Tekendo-Ngongang et al. (2020) | DCNN | Age and ethnicity matched, healthy controls | 0.99 |
- Abbreviations: AUC, area under the curve; DCNN, deep convolutional neural networks; SVM, support vector machine.
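To make the AUC scale used for Table 1 concrete, the following Python sketch computes AUC as the probability that a randomly chosen affected individual receives a higher classifier score than a randomly chosen control, then applies the two categories quoted in the text (greater than 0.9 outstanding, 0.8–0.9 excellent). This is an illustration under stated assumptions, not the method of the cited studies: the function names are hypothetical, the example scores are made up, and values below 0.8 are left uncategorized because the text does not define them. The AUC values in `TABLE_1_AUC` are copied from Table 1.

```python
from typing import Sequence


def auc_from_scores(affected: Sequence[float],
                    controls: Sequence[float]) -> float:
    """AUC via pairwise comparison of classifier scores (ties count 1/2)."""
    if not affected or not controls:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for a in affected:
        for c in controls:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(affected) * len(controls))


def interpret_auc(auc: float) -> str:
    """Apply the categories quoted in the text (Mandrekar, 2010)."""
    if auc > 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    return "below excellent (not categorized in the text)"


# AUC values as reported in Table 1.
TABLE_1_AUC = {
    "Down syndrome": 0.978,
    "22q11.2 deletion syndrome": 0.987,
    "Noonan syndrome": 0.94,
    "Williams-Beuren syndrome": 0.95,
    "Cornelia de Lange syndrome": 0.98,
    "Turner syndrome (vs. unaffected)": 0.91,
    "Turner syndrome (vs. Noonan syndrome)": 0.93,
    "Rubinstein-Taybi syndrome": 0.99,
}

for syndrome, auc in TABLE_1_AUC.items():
    print(f"{syndrome}: {auc} ({interpret_auc(auc)})")

# Made-up scores, only to show how an AUC is obtained from raw outputs.
toy_auc = auc_from_scores([0.9, 0.8, 0.7], [0.1, 0.75, 0.3])
print(round(toy_auc, 3), interpret_auc(toy_auc))
```

The pairwise form is quadratic in cohort size, which is acceptable for the small cohorts discussed here; larger data sets would normally use a rank-based or library implementation.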
In addition to facial analysis technology, the diverse syndrome series also documented physical exam findings of expert clinicians and showed that the findings in some syndromes differed by population group. In Down syndrome across multiple population groups, only upslanting palpebral fissures and flat facial profile were found in greater than 50% of individuals (Kruszka, Porras, Sobering, et al., 2017). Additionally, the cardinal findings of brachycephaly, ear anomalies, clinodactyly, sandal gaps, and abundant neck skin were observed significantly less often in Africans than in other groups (Kruszka, Porras, Sobering, et al., 2017). In contrast to Down syndrome, we observed some genetic syndromes in which exam findings were consistent across population groups. In Cornelia de Lange syndrome (CdLS), we found three facial features (synophrys, short nose/anteverted nares, and long philtrum) to be consistent across all population groups (Dowsett et al., 2019). Noonan syndrome also had similar exam findings across all populations, with widely spaced eyes and low-set ears found in 80% of the cohort (Kruszka, Porras, Addissie, et al., 2017). Interestingly, these findings conflicted with a small study from South Africa in which only 25% of individuals of African descent with Noonan syndrome had widely spaced eyes (Tekendo-Ngongang et al., 2019).
Regardless of the variation in physical exam findings across population groups and across different studies, facial analysis technology has proven to be accurate (see Table 1). The above studies examining physical exam and machine learning-based facial analysis findings only scratch the surface of what needs to be done in clinical genetics as it pertains to diverse populations. One important group missing from these data sets and facial analysis studies is infants with genetic syndromes, for whom early diagnosis may allow a change in management. The concept of a precise diagnosis is fueling interest in testing newborns for conditions beyond the current limited newborn screening programs (Gold et al., 2023). Recently, Duong et al. showed that their neural network classifier was able to accurately distinguish between Williams syndrome and 22q11.2 deletion syndrome across all ages, including infants (Duong et al., 2022). Research on the accuracy of facial analysis technology in newborns, especially in conditions with actionable findings, would be of value in resource-limited settings where other screening modalities are not always available.
Another limitation of the studies of genetic syndromes in diverse populations was the relatively small size of cohorts. Future studies will need larger training sets to improve accuracy, and the clinical genetics community would benefit from more ethnic diversity in syndrome references. An illustration of a diverse reference source is the “Atlas of Human Malformation Syndromes in Diverse Populations”, a website created by the National Human Genome Research Institute that features individuals with genetic syndromes (Muenke et al., 2016). This online open access resource features individuals from a spectrum of geographically diverse locations and includes facial images and other relevant body areas. Projects such as an atlas in diverse populations that are at the juncture of imagery, medicine, race, and ethnicity bring forward ethical questions. Koretzky et al. discuss in detail these ethical issues including recommendations for equitable use, structure, and maintenance of a diverse morphological atlas (Koretzky et al., 2016). In addition to building larger diverse cohorts, there are technologies that will assist diagnostic accuracy such as using generative adversarial networks (GANs). In a recent study, fake images created by a GAN combined with real images resulted in a modest improvement in classifier accuracy (Duong et al., 2022).
Although we are at the beginning of using AI in clinical genetics, the evidence is mounting that facial analysis technology is accurate and may efficiently complement clinical experts. As larger training sets are assembled across diverse populations and for all ages, we predict a time when facial analysis technology will be as common a tool as the stethoscope for rare disease.
ACKNOWLEDGEMENTS
Paul Kruszka is the chief medical officer of GeneDx LLC and Cedrik Tekendo-Ngongang is a senior director at the n-Lorem Foundation.
Open Research
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.