Prevalence rates for ectodermal dysplasia syndromes
Abstract
Background
Ectodermal dysplasias (EDs) are a heterogeneous group of genetic conditions affecting the development and/or homeostasis of two or more ectodermal derivatives, including hair, teeth, nails, and certain glands. There are currently 49 recognized EDs with molecularly confirmed etiology. The EDs are very rare disorders, individually and in aggregate. Very little is published regarding the prevalence of these rare disorders. As a result of the genomics revolution, rare diseases have emerged as a global health priority. The various disabilities arising from rare disorders, as well as diagnostic and treatment uncertainty, have been demonstrated to have detrimental effects on the health, psychosocial, and economic aspects of families affected by rare disorders. Contemporary research methodologies and databases can address what have been historic challenges encountered when conducting research on rare diseases.
Objective
In this study, we aim to ascertain period prevalence rates for several of the more common ectodermal dysplasia syndromes, by querying a large multicenter database of electronic health records, Oracle Real-World Data.
Methods
For each of the included ectodermal dysplasia syndromes a clinical definition was developed by a committee of international experts with interests in EDs. The clinical definitions were based upon a combination of clinical features and designated by ICD-9 and ICD-10 codes. The January 2023 version of the Oracle Real-World Data database was queried for medical records that coincided with the clinical definitions. For our study, there were 64,523,460 individual medical records queried.
Results
Period prevalence rates were calculated for the following ED disorders: hypohidrotic ectodermal dysplasia, found to be 2.99 per 100,000; ectodermal dysplasia and immunodeficiency 1, 0.23 per 100,000; Clouston syndrome, 0.15 per 100,000; ectrodactyly ectodermal dysplasia and cleft lip/palate syndrome, 0.61 per 100,000; ankyloblepharon-ectodermal defects-cleft lip/palate syndrome, 0.36 per 100,000; focal dermal hypoplasia, 0.10 per 100,000; and incontinentia pigmenti, 0.88 per 100,000.
Conclusion
This study established estimated period prevalence rates for several of the ectodermal dysplasia syndromes and demonstrated the feasibility of using large multicenter databases of electronic health records, such as Oracle Real-World Data, for this purpose.
1 INTRODUCTION
Ectodermal dysplasias (EDs) are a heterogeneous group of genetic conditions affecting the development and/or homeostasis of two or more ectodermal derivatives, including hair, teeth, nails, and certain glands (Wright et al., 2019). Rapid advances in molecular genetics have allowed for further understanding and clarification of this group of conditions, and there are currently 49 known individual ED syndromes with molecularly confirmed etiologies (Peschel et al., 2022). The ectodermal dysplasia syndromes are very rare disorders, with the most common of the phenotypes, hypohidrotic ectodermal dysplasia (HED), previously estimated to have a prevalence in Europe of 6.7 per 100,000 (Orphanet, 2023). Very few of the ED syndromes have established prevalence rates, and many are simply described in review articles by the small number of cases reported in the medical literature. One example is focal dermal hypoplasia, also known as Goltz syndrome, for which approximately 300 cases have been reported worldwide; however, this number has been stated in the medical literature since 1992, with no further updates (Goltz, 1992; Orphanet, 2023; Tadini et al., 2015).
As a result of the genomics revolution, rare diseases (RDs) have emerged as a global public health priority. The various disabilities arising from RDs, as well as diagnostic and treatment uncertainty, have been demonstrated to have detrimental influence on the health, psychosocial, and economic aspects of RD families (Chung et al., 2022). There are 6000–8000 unique RDs identified, and about 80% of these are genetic in origin. RDs are defined as conditions that affect fewer than 200,000 individuals in the United States, and fewer than 50 per 100,000 in Europe. It is increasingly important to establish reliable prevalence rates for these disorders, as emerging treatments demonstrate challenges in availability, accessibility, and affordability. Prevalence rates for these disorders will provide crucial information for affected families, researchers, research funding agencies, private and government health insurance agencies, and biopharmaceutical and other treatment development organizations. Important support for a public health approach to RDs comes through services rendered by patient advocacy organizations (Valdez et al., 2016). There is an increasing awareness of research methodologic approaches that can address the challenges to conducting robust research on RDs (Whicher et al., 2018).
In October 2021, the National Foundation for Ectodermal Dysplasias (NFED) sponsored an international conference focusing on translating discovery into therapy, with the goal of advancing the diagnosis and treatment of conditions affecting ectodermal tissues, with an emphasis on skin, hair, tooth, and eye phenotypes (Wright et al., 2023). The conference participants included researchers and clinicians. The lack of reliable prevalence data was identified as one of the five principal barriers to developing novel therapies for these conditions. Subsequently, the NFED formed a working group to establish accurate prevalence rates for several of the more common ectodermal dysplasia syndromes.
Some countries, such as Denmark, have a comprehensive collection of medical registries (Lynge et al., 2011) which can be helpful for studying rare medical conditions. However, this is not commonplace throughout the world, including in the United States. Previously the primary large national population-based prevalence study on an ED syndrome was performed by Nguyen-Nielsen et al. in the Danish population on the most common ED, HED, and found the prevalence to range between 2.6 and 21.9 per 100,000 (Nguyen-Nielsen et al., 2013). More recently Herlin et al. were able to perform the first validated nationwide population-based cohort study in Denmark on EDs and found a prevalence of 14.5 per 100,000 live births for all EDs combined and 2.8 per 100,000 for X-linked HED (Herlin et al., 2024).
Multicenter databases of electronic health records are, however, becoming more common. Databases such as Oracle EHR Real-World Data (previously Cerner Real-World Data™ (CRWD)) are designed to help answer deep and complex research questions and provide larger sample sizes for RD studies (Ehwerhemuepha et al., 2022). In the current study, we sought to determine an estimated period prevalence for the more well-known EDs, including HED; ectodermal dysplasia with immune deficiency (EDAID1); ectrodactyly ectodermal dysplasia-cleft lip/palate (EEC); ankyloblepharon-ectodermal defects-cleft lip/palate (AEC); ectodermal dysplasia 2 hidrotic (Clouston syndrome); focal dermal hypoplasia (Goltz syndrome); and incontinentia pigmenti (IP). This was accomplished by developing a clinical definition for each of the EDs of interest, based upon a combination of clinical features identified by ICD-9 and/or ICD-10 codes within the January 2023 version of CRWD™, now Oracle EHR Real-World Data (EHR RWD), and comparing the number of unique individuals meeting each definition to the total number of qualifying individuals within EHR RWD to produce an estimated period prevalence.
2 EXPERIMENTAL DESIGN, MATERIAL, AND METHODS
2.1 Editorial policies and ethical consideration
The Institutional Review Board at the University of Missouri Columbia reviewed the study design and determined that it did not constitute human subjects research according to the Department of Health and Human Services regulatory definition since the study used de-identified data and no attempt was made to re-identify or contact subjects.
2.2 CRWD™ (now Oracle EHR real-world data)
EHR RWD is a de-identified big data source of multicenter electronic health records that is compiled from more than 100 health systems throughout the United States. The January 2023 version of EHR RWD includes data from 136 healthcare systems and consists of 100 million patients and 1.5 billion encounters across all care settings. Information regarding demographics, conditions (including ICD-9 and ICD-10 codes), encounters, immunization records, medications, procedures, and results are compiled from all participating health systems on each patient. This data is then merged and deidentified to create the EHR RWD. For our study, we included all individuals within the database who had a clinical encounter for which a diagnostic code was recorded in the January 2023 version of EHR RWD. An individual, for example, would be excluded if they solely entered the health care system because of attending an influenza vaccine clinic. 64,523,460 individuals met these inclusion criteria and served as our study population. The version used for the study included patients enrolled in EHR RWD with encounters from 1985 through January 2023.
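The inclusion rule described above can be pictured as a simple filter. The following is a minimal sketch only: the `Encounter` record layout, the function name, and all sample values are hypothetical and do not reflect the EHR RWD schema. It illustrates keeping individuals who have at least one encounter with a recorded diagnostic code.

```python
from dataclasses import dataclass, field


@dataclass
class Encounter:
    """Hypothetical encounter record; not the actual EHR RWD schema."""
    patient_id: str
    diagnosis_codes: list[str] = field(default_factory=list)


def eligible_patients(encounters: list[Encounter]) -> set[str]:
    """Return IDs of patients with at least one encounter carrying a diagnostic code."""
    return {e.patient_id for e in encounters if e.diagnosis_codes}


# Made-up sample data for illustration only.
sample = [
    Encounter("p1", ["Q82.4"]),  # diagnosis recorded: included
    Encounter("p2", []),         # e.g. a vaccine-clinic visit with no diagnosis: excluded
    Encounter("p3", []),
    Encounter("p3", ["Q82.3"]),  # included because one encounter has a code
]

study_population = eligible_patients(sample)
print(sorted(study_population))
```

Using a set of patient IDs means an individual with many coded encounters still contributes once to the denominator.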
2.3 Clinical definition
For each of the studied EDs, a clinical definition was created based upon the predominant major and/or minor clinical features. The clinical definitions were reviewed and revised to include the most salient features before agreement by a panel of experts. The panel consisted of the NFED prevalence working group established following the 2021 international research conference and included professionals from North America and Europe with a special interest in basic science, genetics, and/or clinical care of ectodermal dysplasias. Once the clinical definitions were agreed upon, the clinical features were translated into ICD-9 and/or ICD-10 codes for identification in EHR RWD. We also considered which combination of clinical features would best identify the ED of interest while decreasing the chance of overlap with another condition.
If an ED had a dedicated ICD-10 code, this was also used. This, however, was only the case for HED and IP. A summary of the clinical diagnosis/definition used to define each of the EDs is included in Tables 1–7. Clinical definitions composed of 2 or more salient features were not required to be exclusive and we would expect unique individuals meeting our clinical definitions to have other associated clinical features of their particular ED. For each of the clinical features used in the tables, corresponding ICD-9 and ICD-10 codes were identified. A list of the corresponding ICD-9 and ICD-10 codes can be found in the supplement.
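One way to picture how such definitions operate is as sets of required features, each feature backed by a set of ICD codes. In the sketch below, only Q82.4 and 757.31 come from this article; the other code strings are placeholders (the study's actual code lists are in the supplement), and the dictionary and function names are illustrative assumptions. The four definitions mirror those reported for HED.

```python
# Feature -> ICD codes. Only "Q82.4" and "757.31" are taken from the article;
# the remaining entries are PLACEHOLDERS, not real ICD codes.
FEATURE_CODES = {
    "ed_code": {"Q82.4", "757.31"},
    "hypohidrosis": {"PLACEHOLDER-HYPOHIDROSIS"},
    "hypodontia": {"PLACEHOLDER-HYPODONTIA"},
    "hypotrichosis": {"PLACEHOLDER-HYPOTRICHOSIS"},
}

# Each HED definition is a set of features that must all be present.
HED_DEFINITIONS = [
    {"ed_code"},
    {"hypohidrosis", "hypodontia"},
    {"hypohidrosis", "hypotrichosis"},
    {"hypodontia", "hypotrichosis"},
]


def features_present(patient_codes: set[str]) -> set[str]:
    """Map a patient's recorded codes to the clinical features they indicate."""
    return {f for f, codes in FEATURE_CODES.items() if codes & patient_codes}


def meets_any_definition(patient_codes: set[str], definitions: list[set[str]]) -> bool:
    """True if the patient satisfies at least one definition (all its features present)."""
    present = features_present(patient_codes)
    return any(required <= present for required in definitions)


# A dedicated ED code alone satisfies the first definition.
print(meets_any_definition({"757.31"}, HED_DEFINITIONS))
# A single clinical feature on its own satisfies none of them.
print(meets_any_definition({"PLACEHOLDER-HYPODONTIA"}, HED_DEFINITIONS))
```

Representing each definition as a set makes the "all features required" rule a subset test, and adding a definition for another ED only requires another list of feature sets.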
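Table 1. Clinical definitions and estimated period prevalence for hypohidrotic ectodermal dysplasia (HED).

Clinical definitions | Total individuals (out of 64,523,460) | Estimated period prevalence (per 100,000) |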
---|---|---|
ICD-10 Q82.4 or ICD-9 757.31 | 1191 | 1.85 |
Hypohidrosis + hypodontia | 55 | 0.09 |
Hypohidrosis + hypotrichosis | 404 | 0.63 |
Hypodontia + hypotrichosis | 350 | 0.54 |
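Total unique individuals | 1929 | 2.99 |

Table 2. Clinical definitions and estimated period prevalence for ectodermal dysplasia and immunodeficiency 1 (EDAID1).

Clinical definitions | Total individuals (out of 64,523,460) | Estimated period prevalence (per 100,000) |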
---|---|---|
ICD-10 Q82.4 or ICD-9 757.31 + immunodeficiency | 39 | 0.06 |
Hypohidrosis + immunodeficiency | 109 | 0.17 |
Total unique individuals | 147 | 0.23 |

Table 3. Clinical definition and estimated period prevalence for Clouston syndrome.

Clinical definitions | Total individuals (out of 64,523,460) | Estimated period prevalence (per 100,000) |
---|---|---|
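Hypotrichosis + nail dystrophy + palmoplantar hyperkeratosis | 98 | 0.15 |

Table 4. Clinical definitions and estimated period prevalence for ectrodactyly ectodermal dysplasia and cleft lip/palate syndrome (EEC).

Clinical definitions | Total individuals (out of 64,523,460) | Estimated period prevalence (per 100,000) |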
---|---|---|
Ectrodactyly + hypohidrosis | 6 | 0.01 |
Ectrodactyly + hypodontia | 50 | 0.08 |
Ectrodactyly + hypotrichosis | 76 | 0.12 |
Ectrodactyly + cleft | 278 | 0.43 |
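Total unique individuals | 391 | 0.61 |

Table 5. Clinical definitions and estimated period prevalence for ankyloblepharon-ectodermal defects-cleft lip/palate syndrome (AEC).

Clinical definitions | Total individuals (out of 64,523,460) | Estimated period prevalence (per 100,000) |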
---|---|---|
Skin erosion + cleft + hypohidrosis | 3 | 0.00 |
Skin erosion + cleft + hypodontia | 27 | 0.04 |
Skin erosion + cleft + hypotrichosis | 21 | 0.03 |
Ankyloblepharon + cleft | 184 | 0.29 |
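Total unique individuals | 231 | 0.36 |

Table 6. Clinical definitions and estimated period prevalence for focal dermal hypoplasia (Goltz syndrome).

Clinical definitions | Total individuals (out of 64,523,460) | Estimated period prevalence (per 100,000) |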
---|---|---|
Congenital skin aplasia + hypo-/hyperpigmentation + limb malformation | 19 | 0.03 |
Congenital skin aplasia + nail dysplasia + limb malformation | 10 | 0.02 |
Congenital skin aplasia + telangiectasias + limb malformation | 11 | 0.02 |
Congenital skin aplasia + coloboma + limb malformation | 29 | 0.04 |
Total unique individuals | 63 | 0.10 |

Table 7. Clinical definition and estimated period prevalence for incontinentia pigmenti (IP).

Clinical definitions | Total individuals (out of 64,523,460) | Estimated period prevalence (per 100,000) |
---|---|---|
IP ICD-10 Q82.3 | 569 | 0.88 |
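Because the clinical definitions are not mutually exclusive, the "Total unique individuals" row for an ED is smaller than the sum of the rows above it. The short check below uses only the counts reported in the tables; the variable and function names are illustrative. It reports, for each ED with more than one definition, how many definition matches exceed one per individual.

```python
# Counts copied from the tables above: (per-definition counts, total unique individuals).
TABLE_COUNTS = {
    "HED": ([1191, 55, 404, 350], 1929),
    "EDAID1": ([39, 109], 147),
    "EEC": ([6, 50, 76, 278], 391),
    "AEC": ([3, 27, 21, 184], 231),
    "Focal dermal hypoplasia": ([19, 10, 11, 29], 63),
}


def excess_matches(per_definition: list[int], unique_total: int) -> int:
    """Definition matches beyond one per individual (sum of rows minus unique total)."""
    return sum(per_definition) - unique_total


for name, (rows, unique_total) in TABLE_COUNTS.items():
    print(f"{name}: {sum(rows)} matches, {unique_total} unique, "
          f"{excess_matches(rows, unique_total)} excess")
```

The excess is a count of surplus matches, not of individuals: one person meeting three definitions contributes two to it.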
2.4 Estimation of period prevalence
An estimated period prevalence was determined by identifying unique individuals whose condition information contained ICD-10 and/or ICD-9 code combinations that met our clinical definition and then dividing this number by the total number of qualifying individuals within the database (64,523,460). If an individual met multiple clinical definitions for the same ED, the analysis counted that individual only once.
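The calculation amounts to taking the union of the individuals matched by each definition and dividing by the study population. The following is a minimal sketch with made-up patient identifiers and hypothetical function names; the final line applies the same formula to the HED count reported in this article.

```python
STUDY_POPULATION = 64_523_460  # qualifying individuals reported in the article


def unique_individuals(matches_by_definition: dict[str, set[str]]) -> set[str]:
    """Union of matched IDs, so anyone meeting several definitions is counted once."""
    unique: set[str] = set()
    for ids in matches_by_definition.values():
        unique |= ids
    return unique


def prevalence_per_100k(cases: int, population: int = STUDY_POPULATION) -> float:
    """Period prevalence expressed per 100,000 individuals."""
    return cases / population * 100_000


# Made-up IDs: "p2" meets two definitions but contributes one case.
toy = {"def_a": {"p1", "p2"}, "def_b": {"p2", "p3"}}
print(len(unique_individuals(toy)))

# HED: 1929 unique individuals reported in the article.
print(round(prevalence_per_100k(1929), 2))
```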
3 RESULTS
3.1 Hypohidrotic ectodermal dysplasia
HED is a phenotype characterized by hypohidrosis, hypotrichosis, and hypodontia. The disorder is due to pathogenic variants in EDA, EDAR, EDARADD, or WNT10A. Clinical definitions for HED included the ICD-9 code for congenital ectodermal dysplasia and the ICD-10 code for ectodermal dysplasia (anhidrotic), along with the various combinations of at least 2 of the main phenotypic characteristics. The estimated period prevalence for each of the definitions, as well as for the total unique individuals meeting at least one of the clinical definitions, is found in Table 1. The overall period prevalence for HED was estimated to be 2.99 per 100,000.
3.2 Ectodermal dysplasia and immunodeficiency 1
EDAID1 is a disorder with phenotypic features similar to HED, but also associated with severe immunodeficiency. It is due to hemizygous pathogenic variants in IKK-gamma (IKBKG, previously known as NEMO) on chromosome Xq28. Clinical definition for EDAID1 included the ICD-9 code for congenital ectodermal dysplasia and the ICD-10 code for ectodermal dysplasia (anhidrotic) combined with immunodeficiency or the combination of hypohidrosis with immunodeficiency. The overall period prevalence for EDAID1 was estimated to be 0.23 per 100,000 (Table 2).
3.3 Clouston syndrome (ectodermal dysplasia 2 hidrotic)
Clouston syndrome is characterized by dystrophic nails, hypotrichosis that can result in total alopecia and severe palmoplantar hyperkeratosis. It is due to heterozygous pathogenic variants in GJB6, which encodes connexin-30, on chromosome 13q12. Clinical definition for Clouston syndrome combined hypotrichosis, nail dystrophy, and palmoplantar hyperkeratosis with a total of 98 individuals identified and an estimated period prevalence of 0.15 per 100,000 (Table 3).
3.4 Ectrodactyly ectodermal dysplasia and cleft lip/palate syndrome (EEC)
EEC is characterized primarily by limb anomalies, hypotrichosis, dental anomalies, and cleft lip with or without cleft palate. It is caused by pathogenic variants in TP63. For EEC, the clinical definitions combined ectrodactyly with at least one other primary characteristic: hypohidrosis, hypodontia, hypotrichosis, or cleft lip and/or palate. The estimated period prevalence for each of the definitions, as well as for the total unique patients meeting at least one of the clinical definitions, is found in Table 4. The overall period prevalence for EEC was estimated to be 0.61 per 100,000.
3.5 Ankyloblepharon-ectodermal defects-cleft lip/palate syndrome (AEC syndrome)
AEC syndrome is characterized by ankyloblepharon filiforme adnatum, skin erosions that can be mild or life-threatening, cleft lip and/or cleft palate, dental anomalies, hypohidrosis, and hypotrichosis. AEC is caused by a pathogenic variant in TP63. For AEC, the clinical definition required the combination of skin erosion, cleft, and either hypohidrosis, hypodontia, or hypotrichosis. In addition, the combination of ankyloblepharon with cleft was also used. The overall period prevalence for AEC was estimated to be 0.36 per 100,000 (Table 5).
3.6 Focal dermal hypoplasia (Goltz syndrome)
Focal dermal hypoplasia (FDH), also known as Goltz syndrome or PORCN-related developmental disorders, is characterized by congenital patchy skin aplasia, congenital skin hypo- or hyperpigmentation, congenital nodular fat herniation, congenital nail dysplasia, and limb malformations. FDH is due to pathogenic variants in PORCN. The clinical definition combined congenital skin aplasia and limb malformation with one of hypo-/hyperpigmentation, nail dysplasia, telangiectasias, or coloboma. Limb malformations included ectrodactyly, syndactyly, oligodactyly, or long bone reduction defects. The estimated period prevalence for each of the definitions, as well as for the total unique patients meeting at least one of the clinical definitions, is found in Table 6. The overall period prevalence for Goltz syndrome was estimated to be 0.10 per 100,000.
3.7 Incontinentia pigmenti
IP is characterized by skin changes that progress through blisters, a wart-like rash, swirling macular hyperpigmentation, and linear hypopigmentation; alopecia; hypodontia; dental anomalies; dystrophic nails; and neurologic findings including seizures and intellectual disability. IP is due to pathogenic variants in IKBKG. Because IP has a distinctive, well-known clinical presentation and its own ICD-10 code, that code alone was used for the clinical definition. The overall period prevalence in our study is, however, underestimated, as there is no specific ICD-9 code for IP and there was no way in the database to separate patients who had only ICD-10 data from those who also had ICD-9 data. IP had an estimated period prevalence of 0.88 per 100,000 (Table 7).
4 DISCUSSION
To our knowledge, this is the largest study, with over 64 million individuals, that seeks to provide an estimated period prevalence for the more common EDs, including AEC and Goltz syndrome, for which prevalence has previously been largely unknown. Having prevalence rates for specific rare conditions is crucial for affected families, researchers, research funding agencies, private and government health insurance agencies, and biopharmaceutical and other treatment development organizations. Table 8 compares our estimated period prevalence to the prevalence rates previously published by Orphanet (2023) for each of the EDs studied. Apart from Clouston syndrome, our estimates were similar. This study demonstrates the feasibility of utilizing large multicenter databases of electronic health records, such as Oracle EHR Real-World Data, to establish prevalence rates for rare disorders.
Table 8. Estimated period prevalence compared with Orphanet listed prevalence.

Condition | Estimated period prevalence (per 100,000) | Orphanet listed prevalence (per 100,000) |
---|---|---|
HED | 2.99 | 6.70 |
EDAID1 | 0.23 | 0.20 |
Clouston syndrome | 0.15 | 1.00 |
EEC | 0.61 | 1.11 |
AEC | 0.36 | Unknown |
Focal dermal hypoplasia | 0.10 | Unknown |
IP | 0.88 | 1.20 |
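The comparison in the table above can be expressed as a ratio of the study estimate to the Orphanet value. The sketch below uses only the figures from the table; the names are illustrative, and the ratio is a descriptive aid rather than a statistic used in the study.

```python
from typing import Optional

# (study estimate, Orphanet listed prevalence), both per 100,000, from the table above.
# None marks an Orphanet value listed as "Unknown".
COMPARISON: dict[str, tuple[float, Optional[float]]] = {
    "HED": (2.99, 6.70),
    "EDAID1": (0.23, 0.20),
    "Clouston syndrome": (0.15, 1.00),
    "EEC": (0.61, 1.11),
    "AEC": (0.36, None),
    "Focal dermal hypoplasia": (0.10, None),
    "IP": (0.88, 1.20),
}


def ratio_to_orphanet(estimate: float, orphanet: Optional[float]) -> Optional[float]:
    """Study estimate divided by the Orphanet value, or None when no value is listed."""
    if orphanet is None:
        return None
    return round(estimate / orphanet, 2)


for name, (estimate, orphanet) in COMPARISON.items():
    ratio = ratio_to_orphanet(estimate, orphanet)
    label = "no Orphanet value" if ratio is None else f"ratio {ratio}"
    print(f"{name}: {label}")
```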
4.1 Strengths and limitations
In this study, we sought to provide insight into the prevalence of several of the more common though rare EDs. The use of a large multicenter database of electronic health records enabling the examination of over 64 million individual health records is one of the primary strengths of this study. Our study is the first large study to examine prevalence rates for AEC and focal dermal hypoplasia. We believe our clinical definitions for separate EDs can provide the framework to identify EDs in other large databases where ICD-9 or ICD-10 disease-specific codes do not exist.
The large multicenter database was chosen because it allowed for the study of a large number of individuals; however, there are limitations to what can be queried and how well patient information is captured within electronic health records. In addition, patients are frequently cared for across multiple healthcare systems that use different EHRs, and therefore not all features of their condition may be fully documented within one EHR system. ICD codes were chosen to create our clinical definitions because they are a standard used across healthcare systems, are readily found within such databases, and are easily queried. Our clinical definitions were designed to optimize the identification of the ED of interest using salient clinical features that we felt would likely be captured as ICD-9 and/or ICD-10 codes, while also taking these limitations into account.
Though care was taken in the creation of each clinical definition as discussed, one main limitation was that the EHR RWD database was deidentified. Therefore, validation through individual chart reviews and/or review of genetic results of our clinical definitions was not possible. However, our estimate for HED closely aligned with the prevalence reported in validated patients found by Herlin et al.
Another limitation of the deidentification process was that individuals' dates of birth were randomly offset. This, along with the database being health system-based rather than individual-based, made accurate determination of age at diagnosis impossible. Age at the time of death also could not be captured, since the database would only record this information if an individual was hospitalized at the time of their death.
For IP, there is only a specific ICD-10 code that was used in the study given its unique and well-described clinical presentation which was difficult to capture through use of clinical features. A specific ICD-9 code does not exist and within our database, we were unable to separate patients with only ICD-10 codes. Therefore, our period prevalence is likely an underestimate of the true prevalence of IP.
4.2 Future direction
Though validation of our clinical definitions was not possible, this study provides an important framework for future studies. Several authors of this study continue to work with similar advanced databases which should in the future allow for more detailed exploration. Such exploration will likely include the ability to look at specific unique individuals' clinical data including genetic testing and results, other labs, radiology reports, surgeries, medications, treatments, etc. This would allow for both further validation as well as optimization of clinical definitions such as those used to identify affected individuals in our study.
Defining prevalence rates for rare conditions is important. Accuracy in determining these prevalence rates would be greatly enhanced by universal genetic testing for these rare conditions along with the creation of specific individual ICD codes for each genetic condition. Until that time, approaches like that used in this study may help to bridge the gap for RDs.
AUTHOR CONTRIBUTIONS
Clayton Butcher: Conceptualization, methodology, formal analysis, writing-original draft, writing–review and editing. Becky Abbott: Conceptualization, methodology, funding acquisition, formal analysis, writing–review and editing. Dorothy Grange: Conceptualization, methodology, formal analysis, writing–review and editing. Mary Fete: Conceptualization, methodology, funding acquisition, formal analysis, writing–review and editing. Beau Meyer: Conceptualization, methodology, formal analysis, writing–review and editing. Christine Spinka: Data curation, formal analysis. Timothy Fete: Conceptualization, methodology, formal analysis, writing-original draft, writing–review and editing.
ACKNOWLEDGMENTS
Greg Petroski, James McClay, and Tamara McMahon: Department of Biomed Informatics, University of Missouri School of Medicine, Columbia, Missouri. Mike Seda: Oracle/Cerner Tiger Institute for Health Innovation, University of Missouri School of Medicine, Columbia, Missouri. Cheryl Akridge: Oracle Life Sciences, Land O’ Lakes, Florida.
FUNDING INFORMATION
Funding for this project was provided by the National Foundation for Ectodermal Dysplasias.
CONFLICT OF INTEREST STATEMENT
Clayton Butcher: Scientific Advisory Council member for the National Foundation for Ectodermal Dysplasias. Becky Abbott: Director, Treatment and Research Advocacy for the National Foundation for Ectodermal Dysplasias. Dorothy Grange: Scientific Advisory Council member for the National Foundation for Ectodermal Dysplasias. Mary Fete: Executive Director for the National Foundation for Ectodermal Dysplasias. Beau Meyer: Scientific Advisory Council member for the National Foundation for Ectodermal Dysplasias. Timothy Fete: Scientific Advisory Council member for the National Foundation for Ectodermal Dysplasias.
Open Research
DATA AVAILABILITY STATEMENT
Data is not currently within a public repository, but can be made available to one.