Utilizing ClinGen gene-disease validity and dosage sensitivity curations to inform variant classification
ClinGen Gene Curation Working Group (https://clinicalgenome.org/working-groups/gene-curation/).
ClinGen Dosage Sensitivity Working Group (https://clinicalgenome.org/working-groups/dosage-sensitivity-curation/).
Abstract
Understanding whether there is enough evidence to implicate a gene's role in a given disease, as well as the mechanisms by which variants in this gene might cause this disease, is essential to determine clinical relevance. The National Institutes of Health-funded Clinical Genome Resource (ClinGen) has developed evaluation frameworks to assess both the strength of evidence supporting a relationship between a gene and disease (gene-disease validity), and whether loss (haploinsufficiency) or gain (triplosensitivity) of individual genes or genomic regions is a mechanism for disease (dosage sensitivity). ClinGen actively applies these frameworks across multiple disease domains, and makes this information publicly available via its website (https://www.clinicalgenome.org/) for use in multiple applications, including clinical variant classification. Here, we describe how the results of these curation processes can be utilized to inform the appropriate application of pathogenicity criteria for both sequence and copy number variants, as well as to guide test development and inform genomic filtering pipelines.
1 INTRODUCTION
The ability to classify a variant (i.e., the ability to determine whether or not a particular variant is causative of disease) is inherently linked to what is known about the gene in which it is observed. Is the relationship between variation in this gene and disease valid? Which disease(s), and through which mode(s) of inheritance? What is the normal function of this gene, which domains are critical for this function, and how does disruption of function or gain of function lead to disease? Determining if an individual variant could alter protein function in a manner that causes disease becomes difficult, if not impossible, without answers to these questions.
Even with significant technological advances over the last few decades, the genomics community still does not understand the biological role of most genes or the causal underpinnings of most genetic diseases. Determining which genes are implicated in disease (and how their altered function contributes to disease) has been a critical, ongoing effort, the importance of which has grown with the increasing use of clinical genetic testing. Historically, the methods by which these questions were explored required large pedigrees, linkage analysis, and laborious sequencing using nonscalable methods, making progress in gene discovery slow. In the last decade, more scalable approaches have been used, but the rigor with which identified variants have been deemed causal has been highly variable, with sometimes nothing more than a few inherited missense variants in another member of a gene family used to implicate a novel gene in disease (Claussnitzer et al., 2020). Any level of information could prompt the development of a sequencing assay for one of these genes, or, later, its inclusion on a clinical genetic testing assay. Each new assay would build on the last; many test offerings would combine gene lists from various laboratories or publications in an effort to have the most comprehensive test available, with little regard to how much evidence supported each individual gene. As new genome-wide assays came into wide clinical use, the genomics community also began to observe novel variants in novel genes, presenting additional challenges for clinical classification, and resulting in the newly defined “gene of uncertain significance” term (Richards et al., 2015).
Over time, as new information became available (such as large general population datasets), the original evidence supporting some of these purported “disease genes” was debunked, making the classifications of variants observed in these genes potentially inaccurate or unclear (Andreasen et al., 2013; Hosseini et al., 2018; Jabbari et al., 2013; Piton et al., 2013).
The National Institutes of Health-funded Clinical Genome Resource (ClinGen) (Rehm et al., 2015) was established in 2013 in part to develop standards and frameworks to aid the community in identifying clinically relevant genes and variants. In the early 2010s when cytogenomic microarrays were first being widely used, laboratories were identifying novel copy number variants, or CNVs, on a regular basis and were unsure how to evaluate them to arrive at a clinical classification. Guidance at the time (Kearney et al., 2011) recommended that laboratories assess individual genes within the CNV and correlate with established clinical literature to determine whether loss (haploinsufficiency [HI]) or gain (triplosensitivity [TS]) of these genes was associated with human disease. However, no specific recommendations for such an evaluation were provided. The dosage sensitivity curation process ultimately adopted by ClinGen was initially developed to facilitate this process for the genomics community and promote classification consistency (Riggs et al., 2012). Later, as next-generation sequencing panels, exomes, and genomes entered the clinical sphere, ClinGen developed a semi-quantitative framework (Strande et al., 2017) by which to assess the level of evidence supporting or refuting gene-disease relationships, or gene-disease validity. Classifications generated by this framework can be used to determine which genes should be included and/or evaluated as part of these assays and in which contexts.
Without a clear understanding of a gene's role in disease, one cannot accurately reflect the pathogenicity of genomic variants. The ClinGen dosage sensitivity and gene-disease validity curation processes provide critical information about known gene-disease relationships, modes of inheritance, and disease mechanisms that can be used to inform both variant classification and clinical test pipeline development for variant filtration and annotation. Here, we will outline the methodologies implemented by ClinGen's Gene Curation Working Group (GCWG) and Dosage Sensitivity Working Group (DSWG) over the past 7 years, the current progress of curation, and how gene and dosage evaluations influence determination of variant pathogenicity.
2 METHODS
2.1 Gene-disease validity
The ClinGen GCWG is composed of biocurators, clinical molecular geneticists, clinicians, and nosologists who meet monthly to review key aspects of the ClinGen gene-disease validity curation process. The group reviews feedback from Gene Curation Expert Panels (GCEPs) and the growing gene-disease validity curations to provide needed updates to processes, scoring, and recommendations culminating in annual updates to the standard operating procedures (SOP) document.
ClinGen gene-disease validity curations are completed under the auspices of multiple independent GCEPs, each focused on specific disease areas and/or clinical domains (https://clinicalgenome.org/affiliation/gcep/). GCEPs generally prioritize for evaluation those genes that appear on clinical genetic testing panels for their disease area(s) of focus, or those genes that are reported in the literature as being involved in specific disorders being covered by the GCEP. Once genes of interest are identified, GCEPs determine the appropriate disease entity for each gene curation using the ClinGen precuration process as defined by the ClinGen Lumping and Splitting Working Group (https://www.clinicalgenome.org/site/assets/files/2099/lumping_and_splitting_guidelines_gene_curation_final-1.pdf). Through this process, GCEP members review mode of inheritance, molecular mechanism, and phenotypic variability (particularly intrafamilial variability) across the various disorders that have been proposed to be associated with a given gene in an effort to determine if multiple disorders should be “lumped” together into a single entity, or “split” and assessed separately. In general, if the disease entities are not distinguishable on a molecular level (i.e., they do not have distinct modes of inheritance or molecular mechanisms, such as loss of function [LOF] vs. gain of function), then the disease entities will be “lumped” into a single, overarching disease entity for curation purposes.
Once a gene and disease pair have been identified to curate, evidence supporting or contradicting the purported relationship is evaluated using a semiquantitative framework that takes into consideration both genetic and experimental evidence resulting in one of seven classifications: Definitive, Strong, Moderate, Limited, No Reported Evidence, Disputed, Refuted (Strande et al., 2017). Since 2017, the GCWG has made several updates to the original scoring guidance; these are all publicly available through the SOP document on the ClinGen website (https://clinicalgenome.org/curation-activities/gene-disease-validity/training-materials/). Significant updates from the originally published scoring guidance include: reducing the number of points (and overall weight) attributable to familial segregation of a variant (SOP version 5); requiring all GCEPs to utilize the disease-identification precuration process as described above (SOP version 6); requiring descriptive evidence summaries for each curation (SOP version 8); and providing more clarity on the assessment of variants included as genetic evidence (SOP version 8). The latter represented a significant change in the genetic evidence scoring metric, and removed the scoring “cap” previously implemented for certain evidence types (e.g., missense variants). These scoring caps inadvertently made it more difficult for those diseases caused predominantly by missense variants (e.g., those caused by a gain of function mechanism) to reach Definitive without exceptional experimental evidence, even if numerous case reports with genetic evidence had accumulated in the literature over time. Removing these caps allows diseases caused by any mechanism to reach Definitive with a sufficient amount of case-level evidence. The Gene-Disease Validity SOP is updated on an annual basis and versioned; previous versions are archived but remain available on the clinicalgenome.org website for review.
All ClinGen gene-disease validity curations are made publicly available through the clinicalgenome.org website shortly after approval by the GCEPs. The final approval date and SOP version for each curation are documented. As evidence supporting or contradicting a gene-disease relationship may evolve over time, ClinGen gene-disease validity curations are periodically re-evaluated based on the level of their most current classification as outlined in the ClinGen recuration policy (https://www.clinicalgenome.org/site/assets/files/2164/clingen_standard_gene-disease_validity_recuration_procedures_v1.pdf). As a founding member, ClinGen also submits all gene-disease validity curations to the Gene Curation Coalition Database (GenCC; https://search.thegencc.org/), a global effort to harmonize gene-level resources.
2.2 Dosage sensitivity
The ClinGen DSWG (https://clinicalgenome.org/working-groups/dosage-sensitivity-curation/) is composed of clinical cytogeneticists, clinical molecular geneticists, researchers, genetic counselors, and biocurators. The group is divided into three subgroups focused on different domains for curation: genes involved in neurodevelopmental disorders, genes involved in hereditary cancer, and genomic regions, such as recurrent CNVs flanked by segmental duplications. Though the subgroups each meet and perform curations independently, all groups come together monthly to discuss topics of mutual concern such as process improvements and curation questions.
In general, genes and genomic regions are prioritized for dosage sensitivity evaluation at the discretion of the subgroups. Historically, these genes and regions were those with targeted coverage on genome-wide microarrays used as part of clinical testing. Over time, additional genes and regions have been prioritized for evaluation, including those that are of interest to the ClinGen GCEPs or variant curation expert panels, genes identified as relevant within overlapping areas of interest by other research efforts (e.g., genes identified by the Deciphering Developmental Disorders study (Wright et al., 2015) for the neurodevelopmental subgroup; cancer-predisposing genes assembled from experts for the hereditary cancer subgroup, etc.), and sets of genes that represent broad clinical interest (e.g., ACMG secondary findings genes (Kalia et al., 2017)).
Single genes and selected genomic regions are evaluated for both HI and TS and are scored on the basis of the quality of evidence supporting or refuting these mechanisms as valid causes of disease in the heterozygous or hemizygous state. As originally described in 2012 (Riggs et al., 2012), the highest score is “3,” which signifies that Sufficient Evidence is available to support HI or TS. Genes/regions with less supporting evidence receive correspondingly lower scores: a score of 2 signifies “Emerging Evidence,” a score of 1 is equivalent to “Little/limited Evidence,” and a score of 0 means that No Evidence supporting dosage sensitivity has been reported. The remaining scores are intentionally out of numerical sequence with these four because they signify different concepts rather than points on the same scale. A score of 40 is equivalent to “Dosage Sensitivity Unlikely,” meaning that there is evidence refuting dosage sensitivity, such as a polymorphic copy number loss or gain involving a particular gene or genomic region observed in the general population. A score of 30 indicates that a gene/region has been associated with an autosomal recessive (AR) condition. Finally, a score of −1 is programmatically applied within the system to annotate genes that will not be evaluated, such as pseudogenes.
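Because the scores 30, 40, and −1 are categorical flags rather than points on the 0 to 3 scale, pipelines that consume these values should avoid numeric comparisons such as "score ≥ 3". The following Python sketch illustrates one way to encode the scheme described above as a lookup table; the dictionary and function names are illustrative assumptions and are not part of any ClinGen software.

```python
# Dosage sensitivity score meanings as described in the text.
# All names here are hypothetical; this is not a ClinGen API.
DOSAGE_SCORE_LABELS = {
    3: "Sufficient Evidence",
    2: "Emerging Evidence",
    1: "Little/Limited Evidence",
    0: "No Evidence",
    30: "Associated with an autosomal recessive condition",
    40: "Dosage Sensitivity Unlikely",
    -1: "Not evaluated (e.g., pseudogene)",
}


def describe_dosage_score(score: int) -> str:
    """Return the label for an HI or TS score; reject unknown values."""
    if score not in DOSAGE_SCORE_LABELS:
        raise ValueError(f"Unrecognized dosage sensitivity score: {score}")
    return DOSAGE_SCORE_LABELS[score]


def has_sufficient_evidence(score: int) -> bool:
    """True only for a score of exactly 3.

    An equality test is used deliberately: 30 and 40 are larger numbers
    than 3 but do not indicate stronger evidence for dosage sensitivity.
    """
    return score == 3


if __name__ == "__main__":
    for s in (3, 30, 40):
        print(s, describe_dosage_score(s), has_sufficient_evidence(s))
```

Treating the score as a key into a table, rather than as a magnitude, keeps a score of 30 or 40 from being mistaken for strong support.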
As with the gene-disease validity process described above, significant updates to the dosage sensitivity evaluation process have been implemented since it was first described in 2012. These changes are discussed in detail in the dosage sensitivity SOP document available on the ClinGen website (https://www.clinicalgenome.org/site/assets/files/6428/dosage_sop-scoring-1.pdf). The most significant update to the dosage sensitivity evaluation process for single genes occurred in February 2019, when the group began utilizing a more stringent scoring process to align more closely with the newly updated ACMG/ClinGen constitutional CNV evaluation standards (Riggs et al., 2020), specifically Section 4, which describes evaluation of literature evidence. In addition to evaluating new genes for dosage sensitivity, ClinGen has been systematically re-evaluating all single genes assessed before February 2019 utilizing this updated scoring system. A separate evaluation procedure for recurrent CNV regions is currently under development; many of these regions are notoriously difficult to assess clinically, as they are often associated with milder phenotypes with reduced penetrance and variable expressivity. The recurrent region evaluation procedure will consider clinical reports from the literature, gene-level dosage sensitivity, gene constraint metrics, protein-coding gene count, phenotypic specificity and variability, segregation and inheritance patterns, general population frequency and subclinical phenotype data, and case-control data.
ClinGen dosage sensitivity evaluations have been publicly available through https://dosage.clinicalgenome.org/ for many years. Recently, this information has also become available through the ClinGen website, https://clinicalgenome.org/; here, users can also access relevant gene-disease validity and clinical actionability curation for a given gene. Both sites are updated with new curations as they are finalized. Users can search by gene symbol or genomic coordinates to identify genes/regions of interest that have been scored by the dosage-sensitivity working group. An FTP site (https://search.clinicalgenome.org/kb/gene-dosage/ftp), updated daily, provides users with files that can be uploaded into their browser of choice for analysis.
3 RESULTS
As of April 2021, 1261 gene-disease pairs have been assessed for clinical validity across 36 GCEPs (range: 1–177 gene-disease evaluations per GCEP). These curations represent 1039 unique genes, as some genes have more than one gene-disease validity evaluation. On average, approximately 40 new or updated gene-disease validity classifications are published to the ClinGen website per month across all GCEPs. As illustrated in Figure 1, while over half (58.3%) of the currently available curations are Definitive (expected given the focus on genes included in clinical genetic testing), a growing proportion (28.6%) have been classified as Limited, No Known Disease Relationship, Disputed, or Refuted—genes that are likely inappropriate for inclusion on disease-targeted testing panels.

In terms of dosage sensitivity, 1461 single genes and 73 genomic regions have been evaluated across the three dosage subgroups. On average, approximately 20 new or updated dosage classifications are published to the ClinGen website per month. Again, given the original focus on genes/regions targeted on clinical microarray or otherwise considered relevant for clinical genetic testing, many of these genes (24.5%) have sufficient evidence for HI (HI score 3) (Figure 2a). Conversely, few genes/regions (1.5%) have sufficient evidence for TS (TS score 3) (Figure 2b); this is not surprising, given that to implicate TS, a gene duplication of only the gene of interest, without other nearby genes, must be observed, which is a rare event.

Although gene-disease validity assessments and single gene dosage sensitivity assessments both evaluate gene-level characteristics, there are important distinctions between the two. Gene-disease validity evaluates the evidence that pathogenic variants in a particular gene cause disease; any mechanism by which the gene causes disease (LOF, gain of function, dominant negative, etc.) may be evaluated through this curation process. Dosage sensitivity evaluates evidence that specific mechanisms (HI and/or TS only) cause disease. As shown in Figure 3, a total of 1039 unique genes have been curated by gene-disease validity and a total of 1463 genes were curated by dosage sensitivity; 512 genes have been evaluated by both groups. In many cases, the results between gene-disease validity assessments and dosage sensitivity assessments are concordant; a gene causing disease by a LOF/HI mechanism may have a gene-disease validity classification of Definitive/Strong, and a dosage sensitivity classification of Sufficient (3). As of April 2021, 178 out of 188 (95%) genes with HI scores of 3 were also found to have a gene-disease validity evaluation of Definitive or Strong. Of the ten genes where the HI score was 3 but the gene-disease validity was less than Definitive or Strong, six were curated for different disease terms, and four were evaluated on incongruent evaluation timelines (new evidence may have been captured during the gene-disease validity evaluation that was not available at the time of the dosage evaluation, or vice versa). When the latter occurs, gene-disease validity and/or dosage sensitivity curations may be revisited and updated with the newer information.
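The concordance comparison described above can be sketched in a few lines of Python. This is an illustration only: the gene names and values are placeholder toy data, the function name is an assumption, and each gene is simplified to a single validity classification even though a gene may in practice have more than one gene-disease curation.

```python
def hi3_concordance(validity_by_gene, hi_score_by_gene):
    """Compare the two curations for genes that have both.

    Returns (concordant_count, total_hi3_count, discordant_genes), where
    "concordant" means HI score 3 together with a Definitive or Strong
    gene-disease validity classification.
    """
    curated_by_both = set(validity_by_gene) & set(hi_score_by_gene)
    hi3_genes = sorted(g for g in curated_by_both if hi_score_by_gene[g] == 3)
    discordant = [
        g for g in hi3_genes
        if validity_by_gene[g] not in {"Definitive", "Strong"}
    ]
    return len(hi3_genes) - len(discordant), len(hi3_genes), discordant


# Hypothetical toy inputs (placeholder gene names, not real curations).
validity = {
    "GENE_A": "Definitive",
    "GENE_B": "Moderate",
    "GENE_C": "Strong",
    "GENE_D": "Definitive",
}
hi_scores = {"GENE_A": 3, "GENE_B": 3, "GENE_C": 3, "GENE_D": 30, "GENE_E": 3}

concordant, total, discordant = hi3_concordance(validity, hi_scores)
print(concordant, total, discordant)

# The proportion reported in the text, from the stated counts.
reported_percent = 100 * 178 / 188
print(f"{reported_percent:.1f}%")
```

Genes flagged as discordant by such a comparison are candidates for the manual review described above (different disease terms, or curations performed at different times).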

However, if a gene is known to cause a disease in an AR manner, or causes disease by another mechanism, such as gain of function or dominant negative, the results of these two curation activities may not appear to align; the gene-disease validity classification may be Definitive, but the dosage sensitivity score is something other than Sufficient (3). For example, as of April 2021, 170 genes classified as having Definitive gene-disease validity have been classified as AR (HI score 30) by dosage sensitivity. The dosage sensitivity score does not negate the gene-disease validity assessment, and indeed the mechanism of disease may be biallelic LOF. This difference reflects the original focus of the dosage sensitivity group on providing information to aid cytogenomic microarray result interpretation, namely those genes/regions that cause disease through hetero-/hemizygous loss or gain. Additionally, there are a total of 70 genes with a Definitive gene-disease validity classification and a dosage score indicating less than Sufficient evidence for HI (i.e., a score other than HI score 3): 38 genes with a score of “No Evidence” (HI score 0), 14 genes with a score of “Little Evidence” (HI score 1), 16 genes with a score of “Emerging Evidence” (HI score 2), and 2 genes with a score of “Dosage Sensitivity Unlikely” (HI score 40). Genes with Definitive gene-disease validity and No Evidence for dosage sensitivity often cause disease by mechanisms other than HI or TS. For example, ALK was classified as having a Definitive relationship to autosomal dominant neuroblastoma by the ClinGen Hereditary Cancer GCEP in 2019; the group noted that the disease mechanism was gain of function.
The same gene was evaluated by the Hereditary Cancer Dosage Sensitivity subgroup later in 2019 and assessed as having No Evidence for either HI or TS; the group referenced the known gain of function disease mechanism, but concluded that there was no evidence to suggest that HI, monoallelic LOF, or TS resulted in neuroblastoma or any other disorder at the time of evaluation. Possible explanations for Definitive gene-disease validity and some (but not sufficient) evidence for dosage sensitivity include unclear disease mechanism (disease is caused by both missense and putative LOF variants, but there are not enough LOF variants to reach dosage sensitivity scoring thresholds) and incongruent evaluation timelines, as described above.
4 POINTS TO CONSIDER WHEN UTILIZING GENE-LEVEL INFORMATION FOR VARIANT CLASSIFICATION
Gene-level characteristics, such as gene-disease validity and dosage sensitivity, directly impact variant classification and reporting. Understanding which diseases are associated with a given gene, and the mechanisms through which pathogenic variants cause those diseases, is essential for properly aggregating evidence, correctly applying variant classification guidance, and writing accurate clinical reports. The following are general recommendations for incorporating gene-level information into clinical variant assessment workflows.
4.1 Defining the disease entity
A critical component of the variant classification process is defining the disease entity for which the claim of pathogenicity is being made. Asserting that a particular disease can be caused by disruptive variation within a particular gene feeds into the process of aggregating evidence and assigning pathogenicity to variants within that gene. Historically, variants were classified as pathogenic or not, and little attention was given to the associated disorder. However, it is now widely recognized that genes are often associated with multiple conditions (Chong et al., 2015) and defining the exact nature of the condition for which pathogenicity is assigned is important for diagnosis, as well as prognosis and management. Statements about the pathogenicity of the variant should always be made in the context of a disease entity (e.g., Variant X is not just “Pathogenic,” it is Pathogenic for Y disorder) and a mode of inheritance; such entities should be chosen from among valid disease relationships for that gene.
When encountering genes that have relationships to multiple conditions, it can be challenging to determine which disease entity to utilize, particularly when it is unclear whether the various disease entities are truly distinct, or represent different ends of the same clinical spectrum. As described above, the ClinGen Lumping and Splitting working group has put forth guidelines to determine when disease entities should be considered in aggregate, or “lumped,” and when they should be considered separately, or “split” (https://clinicalgenome.org/site/assets/files/2099/lumping_and_splitting_guidelines_gene_curation_final-1.pdf). Considering the phenotypic severity, mode of inheritance, and disease mechanism can help laboratories determine how to aggregate or separate evidence in support of particular variants across multiple conditions, as well as to determine the best way to report such variants. In Table 1, we outline a strategy for classifying variants in genes associated with multiple disorders.
Genes associated with… | Evidence aggregation | Classification | Example |
---|---|---|---|
…a single condition but the severity of disease varies based on inheritance and gene dosage | These conditions are considered semi-dominant. Evidence may be aggregated across biallelic and monoallelic observations | Classify variants for the single condition in a semi-dominant manner | LDLR and familial hypercholesterolemia, in which biallelic pathogenic variants result in more severe disease than single heterozygous pathogenic variants |
…two distinct conditions with different patterns of inheritance, mutational mechanism consistent | Since the mechanism of pathogenicity is consistent across the conditions, then evidence can be aggregated across the conditions (taking care not to duplicate any evidence or applied codes) | The variant should receive the same classification for each disorder. To ensure that the role of the variant in each condition is properly recognized when submitting such information to databases such as ClinVar, it is preferred that the variant be submitted for each condition, using the same evidence summary to support each claim. If this is not possible, submit the variant classification for the more well-established condition, but note the relationship to the other condition in the evidence summary. One evidence summary can be generated to describe how pathogenic variants cause both conditions | ATM, associated with both AD breast cancer and AR ataxia telangiectasia |
…a single mutational mechanism of pathogenicity, but for which the phenotype varies along a spectrum | In general, evidence can be aggregated across observations, but rules for counting cases should be based on the frequency of phenotypic observation (e.g., more weight given to syndromic presentations vs. isolated feature presentations) | Classify the variant for a disease entity that encompasses the spectrum of disease. Specify the potential for variable expressivity in the evidence summary | FBN1 and AD Marfan syndrome; pathogenic variants can be found in cases with the full syndrome and individual phenotypic features, such as aortic dissection |
…more than one condition, mutational mechanism distinct, conditions mutually exclusive | Evidence supporting each condition should be considered separately | If a variant is pathogenic for one condition, it cannot, by definition, also be pathogenic for the other. Classify variants ONLY for the condition of pathogenicity. Variants should only be classified as LB or B if the classification is relevant for all conditions. If no evidence exists for or against all conditions, classification of VUS can be made for both conditions OR for a more general disease term. If some evidence exists for one of the conditions, classify as VUS for that condition and explain in the evidence summary | RET and AD Hirschsprung disease (LOF), AD multiple endocrine neoplasia type 2A and type 2B (GOF) |
…more than one condition, conditions not mutually exclusive | Evidence cannot be aggregated across conditions | A claim of pathogenicity for one condition does not rule out a role for the variant in the other condition(s). Classifications should be made separately for each condition; each should have its own evidence summary | RYR1 and AD malignant hyperthermia, AD/AR forms of myopathy |
…more than one condition, but the mechanisms of pathogenicity (and whether the conditions are distinct) are unclear | Use clinical judgement. Aggregating evidence should only be done if the phenotypes are close and the mechanism of disease appears similar (e.g., predicted LOF variants) | Use clinical judgement | ACTG1 and AD Baraitser-Winter syndrome 2 and AD nonsyndromic hearing loss |
4.2 Applying variant classification criteria
Knowing whether or not LOF is a known mechanism for a particular gene-disease relationship is critical for evaluating the pathogenicity of variants within or including that gene. When assessing CNVs, deletions completely including definitive HI genes can be classified as pathogenic without the need for additional evidence; intragenic deletions involving such genes may also be classified as pathogenic if they are predicted to result in complete loss of the protein (Riggs et al., 2020). When assessing sequence variants, the PVS1 criterion (Richards et al., 2015) constitutes a “very strong” piece of evidence within the overall pathogenicity evaluation if a null variant is identified in a gene that causes disease by an LOF mechanism. In line with CNV evaluation guidelines, recommendations by the ClinGen Sequence Variant Interpretation (SVI) working group also suggest that a classification of pathogenic is warranted for whole gene deletions of known HI genes (Abou Tayoun et al., 2018). The ClinGen SVI recommends that the PVS1 criterion can be evaluated (at differing strength levels) if gene-disease relationships are considered Moderate, Strong, or Definitive per clinical validity standards, and at least two LOF variants have been observed across more than one exon, and/or null mouse models recapitulate the phenotype.
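As an illustration of the CNV logic in the preceding paragraph, the following Python sketch distinguishes a deletion that completely includes a gene with an HI score of 3 from one that only partially overlaps it. The `Interval` class, the function names, and the returned strings are assumptions made for this example; the sketch is a triage aid, does not implement the full ACMG/ClinGen CNV scoring standards, and leaves the assessment of intragenic deletions to the analyst.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval with 1-based, inclusive coordinates (assumed)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def fully_contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (
        outer.chrom == inner.chrom
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


def triage_deletion(deletion: Interval, gene: Interval, hi_score: int) -> str:
    """Hypothetical triage of one deletion against one curated gene."""
    if hi_score != 3 or not overlaps(deletion, gene):
        return "no HI-based evidence from this gene"
    if fully_contains(deletion, gene):
        return "whole-gene deletion of an HI gene: supports pathogenic"
    # Partial or intragenic deletions need a prediction of protein impact.
    return "partial overlap: assess predicted effect on the protein"


if __name__ == "__main__":
    gene = Interval("chr1", 200, 300)  # placeholder coordinates
    print(triage_deletion(Interval("chr1", 100, 500), gene, hi_score=3))
    print(triage_deletion(Interval("chr1", 250, 500), gene, hi_score=3))
```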
The ClinGen HI score can also be used to determine whether it is appropriate to apply the PVS1 criterion when evaluating a sequence variant. As described above, the HI score is based on genotype and phenotype evidence from affected probands, and is not an algorithm-based predictor of HI, such as the DECIPHER HI Index (Huang et al., 2010), or a measure of intolerance based on observations in general population individuals, such as the gnomAD pLI score (Lek et al., 2016). In general, per the more stringent scoring guidelines that have been in place since 2019, at least three LOF variants in affected individuals must be documented for a gene to be classified as having sufficient evidence for HI for those genes with highly specific and relatively unique phenotypes. In cases with nonspecific phenotypes (most cases), more individual evidence is required (6 or more probands with LOF variants). This surpasses the minimum threshold of evidence set forth by the ClinGen SVI working group (Abou Tayoun et al., 2018). Null variants in genes with a ClinGen HI score of 3 may be evaluated using the PVS1 criterion; the final strength at which PVS1 should be applied should be determined by the particular variant type (e.g., nonsense, splice site, etc.) and additional characteristics of the variant (e.g., location within the gene, predicted impact, etc.).
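One possible reading of the two routes to PVS1 described in this section can be written as a simple gate. The Python sketch below is hedged accordingly: the function and parameter names are assumptions, the combination of conditions reflects our interpretation of the "and/or" wording above, and the function only answers whether PVS1 may be considered at all. The strength at which PVS1 is ultimately applied depends on variant type and location and is not modeled here.

```python
from typing import Optional

# Validity classifications at which PVS1 may be evaluated, per the text.
PVS1_ELIGIBLE_VALIDITY = {"Moderate", "Strong", "Definitive"}


def pvs1_may_be_evaluated(
    validity: str,
    lof_variant_count: int,
    lof_exon_count: int,
    null_mouse_recapitulates: bool,
    hi_score: Optional[int] = None,
) -> bool:
    """Return True if PVS1 may be considered for a null variant (sketch).

    Route 1: the gene has a ClinGen HI score of 3.
    Route 2: the gene-disease validity is Moderate or higher AND there is
    gene-level LOF support, taken here as at least two LOF variants across
    more than one exon and/or a null mouse model that recapitulates the
    phenotype.
    """
    if hi_score == 3:
        return True
    if validity not in PVS1_ELIGIBLE_VALIDITY:
        return False
    lof_support = lof_variant_count >= 2 and lof_exon_count > 1
    return lof_support or null_mouse_recapitulates


if __name__ == "__main__":
    # Hypothetical inputs for illustration.
    print(pvs1_may_be_evaluated("Moderate", 2, 2, False))
    print(pvs1_may_be_evaluated("Limited", 5, 3, True))
    print(pvs1_may_be_evaluated("Limited", 0, 0, False, hi_score=3))
```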
4.3 Assigning final classifications
Understanding how much evidence supports a gene-disease relationship can directly impact classification of variants within that gene. Since many of the criteria used to evaluate variant pathogenicity rely on a basic understanding of the gene (whether or not LOF is a mechanism, hot spots, well-characterized functional domains, valid functional assays, etc.), it follows that variants in genes without well-understood gene-disease relationships (Limited, Disputed, Refuted) should not be classified higher than variants of uncertain significance (VUS). While gene-disease pairs with classifications of Moderate do have at least a reasonable level of positive supporting evidence, the evidence is typically still emerging, and more information is needed to better understand the gene-disease relationship. Because of this, ACMG has recommended that variants in Moderate genes should not typically be classified higher than Likely Pathogenic (Bean et al., 2020), and that variants in Limited genes not be classified higher than VUS.
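The ceilings described above lend themselves to a small lookup that a classification pipeline could apply as a default check. In the Python sketch below, the names are illustrative, and treating Strong like Definitive (no ceiling) is our assumption, since the text does not address Strong explicitly. Because the recommendation is that variants should not "typically" exceed the ceiling, a capped result is better treated as a prompt for expert review than as a hard rule.

```python
# Five-tier variant classifications, ordered from benign to pathogenic.
VARIANT_TIERS = [
    "Benign",
    "Likely Benign",
    "Uncertain Significance",
    "Likely Pathogenic",
    "Pathogenic",
]

# Highest variant classification typically supported at each validity level.
# "Strong" -> "Pathogenic" is an assumption made for this sketch.
MAX_TIER_BY_VALIDITY = {
    "Definitive": "Pathogenic",
    "Strong": "Pathogenic",
    "Moderate": "Likely Pathogenic",
    "Limited": "Uncertain Significance",
    "Disputed": "Uncertain Significance",
    "Refuted": "Uncertain Significance",
}


def cap_classification(proposed: str, validity: str) -> str:
    """Lower `proposed` to the ceiling for `validity` if it exceeds it."""
    ceiling = MAX_TIER_BY_VALIDITY[validity]
    if VARIANT_TIERS.index(proposed) > VARIANT_TIERS.index(ceiling):
        return ceiling
    return proposed


if __name__ == "__main__":
    print(cap_classification("Pathogenic", "Moderate"))
    print(cap_classification("Likely Pathogenic", "Limited"))
    print(cap_classification("Likely Benign", "Limited"))
```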
5 USING CURATION RESULTS FOR PANEL BUILDING
The ability of a clinical genetic test—whether it is a gene panel, an exome, or a genome—to identify a genetic etiology for the phenotype of the individual being tested relies on its ability to assess the appropriate genes. Determining which genes are appropriate to interrogate related to a given set of phenotypic features is a key consideration for both test design and result interpretation. Although reviewing the literature for evidence of novel gene-disease relationships is a necessary practice when designing assays, clinical testing laboratories have historically taken differing approaches to assessing the quality and quantity of evidence for gene-disease relationships when determining inclusion on a gene panel. Figure 4 shows just one example of this—multigene panel offerings (as registered in the Genetic Testing Registry (Rubinstein et al., 2013) in April 2021) for a single indication (hypertrophic cardiomyopathy) range from 5 genes to 104 genes.

The ClinGen gene-disease validity framework represented one of the first efforts to quantify the level of evidence available for supporting or refuting gene-disease relationships. Since the framework's publication in 2017, ClinGen has classified over 300 gene-disease pairs as Limited, Disputed, or Refuted; in other words, as potentially not having enough evidence to warrant inclusion on a diagnostic testing panel, even if one or more publications have reported the gene-disease relationship. It is important to continuously evaluate gene-disease validity, as new evidence can emerge at any time. As of April 2021, at least 116 of the 328 gene-disease pairs (287 unique genes) classified as either Limited, Disputed, or Refuted have disease associations cataloged in OMIM, and 254 of these genes still appear on at least one testing panel for related indications.
In the clinical context, it is very difficult to interpret variants in genes without well-established roles in disease. The 2015 ACMG/AMP sequence variant interpretation guidelines (Richards et al., 2015) state that they were developed to classify variants in the context of “a gene with a definitive role in a Mendelian disorder.” Many of the pieces of evidence considered in that framework depend upon having an understanding of the properties of the gene itself and the mechanism of disease; if these properties are unknown, these rules cannot be accurately applied. When variants are detected in these “genes of uncertain significance” (GUS), they are often (correctly) interpreted as VUS, which can be frustrating to clinicians and patients. In the worst-case scenario, such variants are incorrectly interpreted as disease-causing when in fact they are not, and medical management decisions are made based on inaccurate or incomplete information (Mahon, 2015). ACMG discourages the broad inclusion of GUS on diagnostic testing panels, while recognizing that there are some scenarios where they might be useful, such as in the case of newly described gene-disease relationships with Limited but emerging evidence (Bean et al., 2020). Preliminary evidence from ClinGen suggests that genes that score on the higher end (5–6 points) of the Limited classification range are more likely to transition to a Moderate or Strong/Definitive classification over time, whereas Limited gene-disease relationships receiving 2 or fewer points often moved to a Disputed classification (McGlaughon et al., 2018). This distinction within the Limited category may prove useful when trying to determine the potential utility of adding a newly described GUS to a clinical testing pipeline.
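The distinction within the Limited category could be encoded as a simple triage helper. The sketch below is hypothetical and reflects only the preliminary trend cited above (McGlaughon et al., 2018): the function name and return labels are invented for illustration, the lower bound of 0 is an assumption, and the output is a rough indicator rather than a prediction for any individual gene.

```python
def limited_outlook(points: float) -> str:
    """Summarize the reported trend for a Limited gene-disease pair by score.

    Assumes the Limited range tops out at 6 points, as implied by the text;
    the lower bound of 0 is an assumption of this sketch.
    """
    if not 0 <= points <= 6:
        raise ValueError("score is outside the assumed Limited range (0-6)")
    if points >= 5:
        return "more likely to strengthen"    # toward Moderate or Strong/Definitive
    if points <= 2:
        return "more likely to be disputed"   # often moved to Disputed
    return "no trend reported"                # the text gives no trend for 2-5
```

A laboratory considering a newly described GUS might use such a flag alongside, not instead of, a review of the underlying curation.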
Setting different thresholds of evidence supporting gene-disease relationships may also be helpful when designing assays for different contexts. Laboratories should consider only the most well-supported, well-understood genes (Strong/Definitive) for use in predictive testing in asymptomatic individuals. The level of acceptable evidence for affected individuals might be broader in genome-wide assays, such as exomes or genomes (Moderate/Strong/Definitive, as recommended by ACMG) (Bean et al., 2020). In the research setting, understanding the standardized evidence thresholds utilized by the clinical community may help identify which types of evidence are necessary to move those genes that appear interesting in the research “discovery” phase into those that are truly appropriate for clinical care.
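Context-specific thresholds of this kind lend themselves to a small filter over curation results. The following is a sketch under stated assumptions: the context labels, function name, and gene names are placeholders invented for illustration, and the input is assumed to be a simple mapping from gene to validity classification rather than any actual ClinGen download format.

```python
# Validity classifications accepted per testing context (labels are hypothetical).
ACCEPTED_VALIDITY = {
    "predictive": {"Strong", "Definitive"},
    "diagnostic_genome_wide": {"Moderate", "Strong", "Definitive"},
}


def select_genes(curations: dict, context: str) -> list:
    """Return the genes whose validity classification is accepted in a context."""
    accepted = ACCEPTED_VALIDITY[context]
    return sorted(gene for gene, validity in curations.items() if validity in accepted)


# Placeholder gene names, not real curation results.
example_curations = {
    "GENE_A": "Definitive",
    "GENE_B": "Moderate",
    "GENE_C": "Limited",
    "GENE_D": "Strong",
}
predictive_genes = select_genes(example_curations, "predictive")
diagnostic_genes = select_genes(example_curations, "diagnostic_genome_wide")
```

Keeping the accepted classifications in a single table makes the laboratory's evidence threshold explicit and easy to revisit as curations are updated.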
6 SUMMARY
The ClinGen gene-disease validity and dosage sensitivity curation processes are transparent, evidence-based methods for evaluating which genes are implicated in which diseases and the mechanism(s) by which variant pathogenicity occurs. The analytic procedures are constantly evolving to incorporate new evidence types and community feedback, and new results are made available to the public on a continuous basis. The results of these curation processes can be utilized to inform the appropriate application of pathogenicity criteria for both sequence and copy number variants, as well as to guide test development and inform genomic filtering pipelines. Overall, these evidence-based curation efforts and data sharing will help to standardize interpretation and reporting carried out as part of clinical genetic testing, ultimately leading to higher quality care for patients.
ACKNOWLEDGEMENTS
ClinGen is supported by the National Human Genome Research Institute (NHGRI) through the following three grants: U41HG006834, U41HG009649, U41HG009650.
CONFLICT OF INTERESTS
The authors declare that there are no conflicts of interest.
Open Research
DATA AVAILABILITY STATEMENT
All ClinGen curation results, including the gene-disease validity and dosage sensitivity results discussed in this manuscript, are publicly available at www.clinicalgenome.org.