Human Mutation special issue on “Variant Effect Prediction”
Abstract
The journal Human Mutation has as its principal focus variants in the human genome, covering the entire spectrum from methods used to detect variants, to ways of answering the ultimate question: “What are the consequences of carrying a variant for the health of the individual?” This comprehensive collection of articles provides an excellent perspective of the advancements in variant effect prediction in recent years, as well as some caveats and cautions in this developing field. We believe that this resource will help to drive further evolution of the variant effect prediction process toward more robust understanding of genotype-phenotype relationships through reliable variant classification.
The journal Human Mutation focuses on variants in the human genome, and covers the entire spectrum from methods to detect variants to ways to answer the ultimate question: “What are the consequences of carrying a variant for the health of the individual?” Sequencing has always been the gold standard to detect and confirm variants, yet it was for a long time a rather expensive technology. Therefore, many methods were designed to zoom in on the most probably affected gene and to pre-screen DNA fragments for those containing variants, thereby reducing sequencing cost. Nowadays, since sequencing costs have dropped dramatically, the method of choice to detect variants has shifted entirely to sequencing. We no longer zoom in on the gene most likely affected; we simply sequence the entire genome (whole genome sequencing or WGS) or all protein coding sequences (whole exome sequencing or WES). As a consequence, we no longer detect one or a few variants which need to be evaluated regarding their consequences for the health of the individual, but tens of thousands (in WES) or millions (in WGS) of variants. Work in the laboratory has therefore shifted from detecting variants to evaluating (classifying) variants, i.e., “Variant Effect Prediction”.
This special issue focuses entirely on the very process of Variant Effect Prediction. The topics selected are based on the Variant Effect Prediction Training Course (VEPTC) organized by Global Variome and the Human Genome Organisation (HUGO), see http://VEPTC.variome.org/. Besides giving some background on the now prevailing technology, next-generation sequencing (NGS), with its current limitations and pitfalls, the articles in this issue cover all aspects of the process used to evaluate the possible consequences of a variant and to draw the ultimate conclusion, i.e., to clinically classify the variant: what are the consequences for the health of the individual, disease-associated (“pathogenic”) or not (“benign”)? The format of the papers is a bit different from the standard in Human Mutation. The direction is more educational, sometimes in the format of a tutorial, and entirely focused on the process of variant effect prediction. For example, a paper does not describe how a genome browser or database was built and what information it contains, but how it can be used to gather the relevant information to answer the most important question in clinical diagnostics: is the variant disease-associated or not?
The first article is from Victor Guryev (“Variant calling: considerations, practices, and developments” https://doi.org/10.1002/humu.24311) and focuses on the computational steps leading from the individual sequence reads, through mapping to a reference genome, to the ultimate step of variant calling. By addressing the critical steps and pointing out the problems, the paper gives the reader a concise overview of the complexity of NGS data creation and processing and of the delicate process of variant calling, where several different computational approaches are still required to reliably call the different variant types (from single nucleotide variants to structural variants, including copy number variants, CNVs).
The next two papers, from Sarah Hunt (“Annotating and prioritizing genomic variants using the Ensembl Variant Effect Predictor -- A tutorial” https://doi.org/10.1002/humu.24298) and Anna Benet-Pages (“Variant Interpretation: UCSC Genome Browser Recommended Track Sets” https://doi.org/10.1002/humu.24335) explain how the Ensembl and UCSC genome browsers can be used to interpret variants for diagnostic purposes. They describe what settings are best in order to display “all” relevant information on your screen, both from previous observations of the variant and from computational predictions of its consequences, to assist you to draw the right conclusion -- in other words: “to see the Ensembl and UCSC genome browsers through an ACMG-AMP guidelines focused looking glass”.
Anne O'Donnell-Luria (“Variant interpretation using population databases: lessons from gnomAD” https://doi.org/10.1002/humu.24309) describes how the gnomAD browser has developed from a pure “variant database” displaying variant frequencies in humans into a versatile tool which answers all kinds of variant interpretation related questions, for example frequency in numerous sub-populations; “filtering allele frequency” used for variant prioritization; expression levels of different gene isoforms in different tissues; scores for the probability of missense and loss-of-function constraint; and integration of classified ClinVar variants.
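To illustrate the kind of frequency-based prioritization mentioned above, the following minimal sketch keeps only variants whose highest sub-population allele frequency does not exceed a chosen cutoff. It is a deliberately simplified stand-in and is not gnomAD's “filtering allele frequency” calculation; the `Variant` class, the population labels, the frequencies, and the cutoff are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Variant:
    """Hypothetical variant record; not a gnomAD data structure."""
    name: str
    # Allele frequency per sub-population (made-up labels and values).
    pop_af: Dict[str, float] = field(default_factory=dict)


def max_pop_af(variant: Variant) -> float:
    """Highest allele frequency seen in any sub-population (0.0 if unobserved)."""
    return max(variant.pop_af.values(), default=0.0)


def keep_rare(variants: List[Variant], max_af: float) -> List[Variant]:
    """Keep variants whose highest sub-population frequency is at most max_af."""
    return [v for v in variants if max_pop_af(v) <= max_af]


# Illustrative input: all names and numbers are invented.
variants = [
    Variant("var1", {"popA": 0.0001, "popB": 0.002}),
    Variant("var2", {"popA": 0.2, "popB": 0.0005}),
    Variant("var3"),  # never observed in the reference populations
]

rare = keep_rare(variants, max_af=0.01)
print([v.name for v in rare])  # ['var1', 'var3']
```

Taking the maximum over sub-populations, rather than the overall frequency, reflects the idea that a variant common in any one population is unlikely to cause a rare disease; an appropriate cutoff depends on the disease in question and is left to the user.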
In another article, Erin Riggs (“Utilizing ClinGen gene-disease validity and dosage sensitivity curations to inform variant classification” https://doi.org/10.1002/humu.24291) describes ClinGen resources which can be used to determine what evidence has been published so far to unequivocally link variants in a gene with a specific phenotype, including the known consequences of deletions or duplications of the gene. These resources become tremendously useful once we leave our “comfort zone” of targeted approaches and move to WGS or WES.
Christian Gilissen contributed a paper (“Clinical exome sequencing -- Mistakes and caveats” https://doi.org/10.1002/humu.24360) describing what can go wrong when you start to apply a new technology. While it is obvious that you will make mistakes when you explore new methods, it is a valuable lesson when scientists share the problems they encountered and the “obvious” mistakes they initially made. We are grateful the authors were willing to share their experiences. It is so instructive to learn from mistakes which, when solved, make you wonder, “How could we have not thought of this beforehand?” In their contribution, the authors present a wide overview of sequencing caveats that cover the complete range of a diagnostic NGS process, from alignment to clinical data interpretation. As such, the paper presents a valuable introductory read for any novice starting their work in interpretation of WES and WGS data.
While most people and prediction programs blindly go from DNA to protein, the team of Holger Prokisch (“Guidelines for clinical interpretation of variant pathogenicity using RNA phenotypes” https://doi.org/10.1002/humu.24416) discusses the molecule in between: RNA. In their article, the authors present a comprehensive overview of outcomes detectable with RNA sequencing studies and propose a systematic approach for assessment of variant consequences on the RNA level. Importantly, the authors propose a standardized framework for assessment of RNA sequencing outcomes in the context of the ACMG variant classification system for clinical sequencing.
WGS or WES clinical assays will regularly confront the interpreter with too many variants left after the standard filter sets have been applied (e.g., high-frequency and reported benign variants filtered out), creating a pressing need for strategies to further prioritize variants. What can be done in such cases? Several tools have been developed to improve variant prioritization in the clinic; they are based on Human Phenotype Ontology (HPO) nomenclature and rank the variants based on the overlap between the phenotype profile of the patient's clinical presentation and the known gene-phenotype associations. In their contribution (“Phenotype-driven approaches to enhance variant prioritization and diagnosis of rare disease” https://doi.org/10.1002/humu.24380), Peter Robinson and his team systematically assessed the available tools and measured their performance against a set of thousands of diagnosed cases in the 100,000 Genomes Project. Through this, they offer guidance on selecting the best-performing tool for finding “the needle in the haystack” variant, where a proper phenotypic description of the patient using the HPO, together with other data (e.g., cross-species phenotypes), can be used to inform an algorithm which will rank variants according to their probability of causing the disease.
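The ranking idea described above can be sketched in a few lines. The example below scores each candidate gene by the Jaccard overlap between the patient's phenotype terms and the terms associated with that gene, then sorts genes by score. This is only a toy illustration under stated assumptions: the tools assessed in the paper use more sophisticated, ontology-aware similarity measures, and the gene names and term identifiers here are placeholders, not real genes or HPO terms.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_genes(
    patient_terms: Set[str],
    gene_to_terms: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Return (gene, score) pairs, best phenotype overlap first.

    Ties are broken alphabetically by gene name so the output is deterministic.
    """
    scored = [
        (gene, jaccard(patient_terms, terms))
        for gene, terms in gene_to_terms.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Placeholder identifiers: neither the term IDs nor the gene names are real.
patient = {"HP:9000001", "HP:9000002", "HP:9000003"}
gene_to_terms = {
    "GENE_A": {"HP:9000001", "HP:9000002", "HP:9000003", "HP:9000004"},
    "GENE_B": {"HP:9000003", "HP:9000005"},
    "GENE_C": {"HP:9000006"},
}

for gene, score in rank_genes(patient, gene_to_terms):
    print(gene, score)
# GENE_A 0.75
# GENE_B 0.25
# GENE_C 0.0
```

A plain set overlap ignores the hierarchy of the ontology, so closely related but non-identical terms count as mismatches; this is one reason dedicated tools tend to outperform such a naive score.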
For rare diseases, new links between phenotypes (diseases) and genes are still discovered regularly, though in most cases the evidence is too limited to establish the robust (i.e., definitive) gene-disease association needed in clinical genetics. Gijs Santen (“Gene-disease relationship evidence: A clinical perspective focusing on ultra-rare diseases” https://doi.org/10.1002/humu.24367) suggests a decision tree workflow and gives guidance on what is necessary before a new link between a disease and a gene can be claimed.
In 2015, Richards et al. published a framework to classify variants into one of five distinct classes, based on different evidence types which are qualitatively and quantitatively weighted. These so-called ACMG-AMP guidelines have become the standard in variant classification and are used by most diagnostic laboratories. The guidelines were originally formulated to classify DNA variants in all genes with a definitive gene-disease association, and the authors were well aware that such broad, general rules have to be modified by experts (so-called Variant Curation Expert Panels, VCEPs) in order to reflect the different nature of genes and diseases. Petros Kountouris (“Adapting the ACMG/AMP variant classification framework: A perspective from the ClinGen Hemoglobinopathy Variant Curation Expert Panel” https://doi.org/10.1002/humu.24280) and Dianalee McKnight (“Recommendations by the ClinGen Rett/Angelman-like expert panel for gene-specific variant interpretation methods” https://doi.org/10.1002/humu.24302) give examples of how the ACMG-AMP guidelines were adapted to better classify variants for hemoglobinopathies and Rett/Angelman-like syndrome, respectively. Finally, Steven Harrison (“Harmonizing variant classification for return of results in the All of Us Research Program” https://doi.org/10.1002/humu.24317) describes how, after sharing the data, differences in variant classification were detected and what was done to resolve the discrepancies.
This comprehensive collection of articles presented by Human Mutation provides an excellent perspective of the advancements in variant effect prediction in recent years, as well as some caveats and cautions in this developing field. We believe that this resource will help to drive further evolution of the variant effect prediction process toward more robust understanding of genotype-phenotype relationships through reliable variant classification. The guest editors are indebted to the authors, peer-reviewers, and editorial and production staff who made this special issue possible.