The Critical Assessment of Genome Interpretation (CAGI): Human Mutation: Vol 38, No 9

Actions

Tools

Follow journal

COVER IMAGE

Free Access

Cover Image, Volume 38, Issue 9

Qifang Xu, Qingling Tang, Panagiotis Katsonis, Olivier Lichtarge, David Jones, Samuele Bovo, Giulia Babbi, Pier L. Martelli, Rita Casadio, Gyu Rie Lee, Chaok Seok, Aron W. Fenton, Roland L. Dunbrack Jr,

Page: i
First Published: 17 August 2017

On the cover: This cover image, by Qifang Xu et al., is based on the Special Article Benchmarking predictions of allostery in liver pyruvate kinase in CAGI4, Pages 1123–1131. DOI: 10.1002/humu.23222.

ISSUE INFORMATION

Free Access

Issue Information

Pages: 1033-1035
First Published: 17 August 2017

IN THIS ISSUE

Free Access

Novel ID gene CSNK2B: The crossover from molecular diagnosis to research continues

Leanne M. Dibbens,

Page: 1037
First Published: 17 August 2017

OVERVIEW

Free Access

Reports from CAGI: The Critical Assessment of Genome Interpretation

Roger A Hoskins, Susanna Repo, Daniel Barsky, Gaia Andreoletti, John Moult, Steven E. Brenner,

Pages: 1039-1041
First Published: 12 July 2017

SPECIAL ARTICLE

Free Access

Performance of in silico tools for the evaluation of p16INK4a (CDKN2A) variants in CAGI

Pages: 1042-1050
First Published: 25 April 2017

The Critical Assessment of Genome Interpretation (CAGI) experiment is aimed to define the state of art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of ten variants for the p16INK4a tumor suppressor, a kinase inhibitor coded by the CDKN2A gene. Twenty-two pathogenicity predictors were validated in terms of accuracy and reliability. Different assessment measures were combined in an overall ranking to provide robust results.

SPECIAL ARTICLE

Free Access

Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I

Jing Zhang, Lisa N. Kinch, Qian Cong, Jochen Weile, Song Sun, Atina G. Cote, Frederick P. Roth, Nick V. Grishin,

Pages: 1051-1063
First Published: 17 August 2017

Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I

Competitive Growth Scores of UBE2I Mutations. A) A histogram of scores. Bars are colored in gradient from red (deleterious mutations) through white (wild type) to blue (advantageous mutations). The same color gradient is applied to corresponding residues in surface representations of the UBE2I structure for B) deleterious and C) advantageous mutations. The consensus substrate tetrapeptide PsiKxE (magenta) and the C-terminal SUMO peptide (green) are shown and select residues near the active site are labeled.

SPECIAL ARTICLE

Free Access

Blind prediction of deleterious amino acid variations with SNPs&GO

Emidio Capriotti, Pier Luigi Martelli, Piero Fariselli, Rita Casadio,

Pages: 1064-1071
First Published: 19 January 2017

Blind prediction of deleterious amino acid variations with SNPs&GO

(A) Representation of the SNPs&GO Support Vector Machine-based algorithm. (B) Main input features in SNPs&GO. (C) Performance of SNPs&GO on 38,460 single amino acid variants (Capriotti et al., BMC Genomics, 2013). Evaluation measures (Q2, TPR, PPV, TNR, NPV, MCC and AUC) are defined in Supplementary Materials.

SPECIAL ARTICLE

Free Access

Objective assessment of the evolutionary action equation for the fitness effect of missense mutations across CAGI-blinded contests

Panagiotis Katsonis, Olivier Lichtarge,

Pages: 1072-1084
First Published: 23 May 2017

Objective assessment of the evolutionary action equation for the fitness effect of missense mutations across CAGI-blinded contests

The Evolutionary Action equation is a first order perturbation form of the genotype-phenotype relationship, which predicts the action of genetic variants on fitness. This analytical approach is general and its terms can be numerically estimated from sequence data. Thus, in contrast to the current state-of-the-art methods, it requires no training over large datasets and numerous features. The objective assessments of the Critical Assessment of Genome Interpretation experiments consistently ranked Evolutionary Action one of the most accurate predictors.

SPECIAL ARTICLE

Free Access

PON-P and PON-P2 predictor performance in CAGI challenges: Lessons learned

Abhishek Niroula, Mauno Vihinen,

Pages: 1085-1091
First Published: 22 February 2017

PON-P and PON-P2 predictor performance in CAGI challenges: Lessons learned

We have participated in various challenges in the four CAGI experiments. We used predictions from PON-P and PON-P2 and additional details to develop models for addressing the challenges. Here, we summarized our approaches and their performances in challenge assessments. We discussed the types of lessons we earned from CAGI, impacts of CAGI on our research and development of predictor with improved performance. We also make some suggestions for future experiments.

SPECIAL ARTICLE

Free Access

Missense variant pathogenicity predictors generalize well across a range of function-specific prediction challenges

Vikas Pejaver, Sean D. Mooney, Predrag Radivojac,

Pages: 1092-1108
First Published: 16 May 2017

Missense variant pathogenicity predictors generalize well across a range of function-specific prediction challenges

By participating in the Critical Assessment of Genome Interpretation, we demonstrate the direct applicability of missense variant pathogenicity predictors in the task of the prediction of real-valued impact on biochemical, molecular and cellular function, as measured in in vitro experiments. Our work suggests that when a large number of structural and functional features are integrated into a learning algorithm that outputs smooth score distributions, pathogenicity predictors can model the biology shared by both of these prediction tasks.

SPECIAL ARTICLE

Free Access

Ensemble variant interpretation methods to predict enzyme activity and assign pathogenicity in the CAGI4 NAGLU (Human N-acetyl-glucosaminidase) and UBE2I (Human SUMO-ligase) challenges

Yizhou Yin, Kunal Kundu, Lipika R. Pal, John Moult,

Pages: 1109-1122
First Published: 24 May 2017

Ensemble variant interpretation methods to predict enzyme activity and assign pathogenicity in the CAGI4 NAGLU (Human N-acetyl-glucosaminidase) and UBE2I (Human SUMO-ligase) challenges

We report results obtained using newly-developed ensemble methods to address two CAGI4 challenges requiring prediction of continuous protein activity: population rare variants in NAGLU that are representative of clinical encounters, and mutations from high throughput mutational scans on UBE2I. The methods are effective, ranking 2^nd for SUMO-ligase and 3^rd for NAGLU. The ensemble approaches also show considerable potential for estimating the reliability of pathogenic assignments, and are already reliable enough for use with a subset of mutations.

SPECIAL ARTICLE

Free Access

Benchmarking predictions of allostery in liver pyruvate kinase in CAGI4

Pages: 1123-1131
First Published: 29 March 2017

Homotetramer of human liver pyruvate kinase (PYK) with bound substrate (phosphoenol pyruvate), inhibiting allosteric effector (alanine), and activating allosteric effector (fructose-1,6-bisphosphate), compiled from multiple crystal structures of PYK.

SPECIAL ARTICLE

Free Access

Whole-protein alanine-scanning mutagenesis of allostery: A large percentage of a protein can contribute to mechanism

Qingling Tang, Aron W. Fenton,

Pages: 1132-1143
First Published: 13 April 2017

Whole-protein alanine-scanning mutagenesis of allostery: A large percentage of a protein can contribute to mechanism

Residue positions at which side chain removal alters allosteric coupling for Fru-1,6-BP activation (Q_ax-Fru-1,6-BP).

SPECIAL ARTICLE

Free Access

Exploring the limits of the usefulness of mutagenesis in studies of allosteric mechanisms

Qingling Tang, Aileen Y. Alontaga, Todd Holyoak, Aron W. Fenton,

Pages: 1144-1154
First Published: 29 April 2017

Exploring the limits of the usefulness of mutagenesis in studies of allosteric mechanisms

A simplistic allosteric mechanism model based on allosterically-relevant plasticity. This model was developed with acknowledged insufficient structural insights, but provided a framework for mutational probings of the allosteric binding site. Although we provide data that is not consistent with this model, the responses of many mutant proteins could be viewed as consistent, raising questions about how mutational probings should be used in studies of allosteric mechanisms.

SPECIAL ARTICLE

Free Access

Lessons from the CAGI-4 Hopkins clinical panel challenge

Pages: 1155-1168
First Published: 11 April 2017

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible classes of disease, as well as causal variants. We discuss the accuracy of the predictions and their implications for further methods development.

SPECIAL ARTICLE

Free Access

CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants

Lipika R. Pal, Kunal Kundu, Yizhou Yin, John Moult,

Pages: 1169-1181
First Published: 16 May 2017

CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants

The CAGI4 SickKids clinical genome challenge provided an opportunity to assess the landscape of genetic variants found in a difficult set of 25 unsolved rare disease cases. We used a three-stage pipeline consisting of careful analysis of data quality, classification of disease relevant gene-specific variants into seven categories, and prioritization of variants by compatibility with the complex phenotype of a patient.We were able to determine the genetic cause of a case of epileptic encephalopathy, a missense mutation in KCNB1.

SPECIAL ARTICLE

Free Access

Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges

Roxana Daneshjou, Yanran Wang, Yana Bromberg, Samuele Bovo, Pier L Martelli, Giulia Babbi, Pietro Di Lena, Rita Casadio, Matthew Edwards, David Gifford, David T Jones, Laksshman Sundaram, Rajendra Rana Bhat, Xiaolin Li, Lipika R. Pal, Kunal Kundu, Yizhou Yin, John Moult, Yuxiang Jiang, Vikas Pejaver, Kymberleigh A. Pagel, Biao Li, Sean D. Mooney, Predrag Radivojac, Sohela Shah, Marco Carraro, Alessandra Gasparini, Emanuela Leonardi, Manuel Giollo, Carlo Ferrari, Silvio C E Tosatto, Eran Bachar, Johnathan R. Azaria, Yanay Ofran, Ron Unger, Abhishek Niroula, Mauno Vihinen, Billy Chang, Maggie H Wang, Andre Franke, Britt-Sabina Petersen, Mehdi Pirooznia, Peter Zandi, Richard McCombie, James B. Potash, Russ B. Altman, Teri E. Klein, Roger A. Hoskins, Susanna Repo, Steven E. Brenner, Alexander A. Morgan,

Pages: 1182-1192
First Published: 21 June 2017

Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges

The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. We discuss the range of techniques used for phenotype prediction, the methods used for assessing predictive models, and the lessons gleaned from the CAGI exomes challenges.

SPECIAL ARTICLE

Free Access

Crohn disease risk prediction—Best practices and pitfalls with exome data

Manuel Giollo, David T. Jones, Marco Carraro, Emanuela Leonardi, Carlo Ferrari, Silvio C.E. Tosatto,

Pages: 1193-1200
First Published: 13 January 2017

SPECIAL ARTICLE

Free Access

Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge

Kunal Kundu, Lipika R. Pal, Yizhou Yin, John Moult,

Pages: 1201-1216
First Published: 12 May 2017

Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge

We describe the design and implementation of a gene panel sequencing data analysis pipeline, VarP. The performance of the pipeline was assessed in the CAGI 4 community experiment. VarP identified the correct disease class and potentially causative variant(s) in 36/106 patients, including 10 patients where the clinical pipeline did not find any causative variants. Post analysis showed that use of three-dimensional structure could have assisted interpretation in a number of cases.

SPECIAL ARTICLE

Free Access

DeepBipolar: Identifying genomic mutations for bipolar disorder via deep learning

Sundaram Laksshman, Rajendra Rana Bhat, Vivek Viswanath, Xiaolin Li,

Pages: 1217-1224
First Published: 10 June 2017

Correction(s) for this article

DeepBipolar: Identifying genomic mutations for bipolar disorder via deep learning

We designed an end-to-end deep learning architecture (called DeepBipolar) to predict bipolar disorder based on limited genomic data. Leveraging Deep Convolutional Neural Network (DCNN), it automatically extracts features from genotype information to predict the bipolar phenotype. DeepBipolar was considered the most successful by the independent assessor in the Critical Assessment of Genome Interpretation (CAGI) bipolar disorder challenge.

SPECIAL ARTICLE

Free Access

CAGI4 Crohn's exome challenge: Marker SNP versus exome variant models for assigning risk of Crohn disease

Lipika R. Pal, Kunal Kundu, Yizhou Yin, John Moult,

Pages: 1225-1234
First Published: 16 May 2017

CAGI4 Crohn's exome challenge: Marker SNP versus exome variant models for assigning risk of Crohn disease

Understanding the basis of complex trait disease is a fundamental problem in human genetics. The three rounds of CAGI Crohn's Exome challenges provide insight into the adequacy of current disease models by requiring participants to identify which of a set of individuals has been diagnosed with the disease, given exome data. For the CAGI4 round, we developed a method that uses machine learning methods to assign disease risk, with only the imputed genotypes of GWAS marker SNPs as input.

SPECIAL ARTICLE

Free Access

Stratified polygenic risk prediction model with application to CAGI bipolar disorder sequencing data

Maggie Haitian Wang, Billy Chang, Rui Sun, Inchi Hu, Xiaoxuan Xia, William Ka Kei Wu, Ka Chun Chong, Benny Chung-Ying Zee,

Pages: 1235-1239
First Published: 17 April 2017

Stratified polygenic risk prediction model with application to CAGI bipolar disorder sequencing data

This study proposed to perform complex trait prediction using a stratified design. The genetic data are divided into strata according to genetic architectures, and feature selection is conducted within each strata through a data adaptive W-test for main effect and pairwise interactions. An ensemble classification algorithm can be applied to integrate the selected features to perform prediction. Application on CAGI data set showed that including interaction effect of common variants improved prediction accuracy.

SPECIAL ARTICLE

Free Access

Predicting gene expression in massively parallel reporter assays: A comparative study

Pages: 1240-1250
First Published: 21 February 2017

Deciphering the functionality of the non-coding genome has been the focus of many recent studies, aiming to annotate regulatory regions and understand their specific role in disease and other phenotypes. The goal of the CAGI eQTL challenge was to predict the activity of candidate genomic regions (experimentally evaluated by massively parallel reporter assays). Our meta-analysis of competing submissions highlights features and models that lead to accurate prediction and points to areas for improvement.

SPECIAL ARTICLE

Free Access

Predicting enhancer activity and variant impact using gkm-SVM

Michael A. Beer,

Pages: 1251-1258
First Published: 25 January 2017

Predicting enhancer activity and variant impact using gkm-SVM

DNA variation in regulatory regions (e.g. enhancers) has been implicated as contributing to many human diseases. We have been developing quantitative computational models of regulatory regions to predict the impact of this variation and to identify the mechanisms of enhancer modulation. In this blind test, our model predictions are in quantitative agreement with direct measurements of enhancer activity.

SPECIAL ARTICLE

Free Access

Accurate eQTL prioritization with an ensemble-based framework

Haoyang Zeng, Matthew D. Edwards, Yuchun Guo, David K. Gifford,

Pages: 1259-1265
First Published: 21 February 2017

Accurate eQTL prioritization with an ensemble-based framework

We present a novel ensemble-based computational framework, EnsembleExpr, that achieved the best performance in the Fourth Critical Assessment of Genome Interpretation expression quantitative trait locus “(eQTL)-causal SNPs” challenge for identifying eQTLs and prioritizing their gene expression effects. We envision EnsembleExpr to be a resource to help interpret noncoding regulatory variants and prioritize disease-associated mutations for downstream validation.

SPECIAL ARTICLE

Free Access

Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges

Pages: 1266-1276
First Published: 23 May 2017

PGP provides unrestricted access to genomes of individuals and their associated phenotypes. The CAGI PGP challenge is to predict whether an individual had a particular trait or phenotype profile based on their whole genome. Assessment results show prediction of individual traits is difficult, relying on a strong knowledge of trait frequency within general population, while matching genomes to trait profiles relies heavily upon a small number of common traits.

Sign up for email alerts

Enter your email to receive alerts when new articles and issues are published.

Email address

Please select your location and accept the terms of use.

Country or location

I consent to Wiley Online Library’s Terms of Use and processing of my personal data as disclosed in Wiley’s Privacy Policy.I consent to my personal information being transferred outside of the People’s Republic of China as laid out in Wiley’s Privacy Policy.*(Optional) I consent to special offers, newsletter & promotions from Wiley. See Privacy Policy for more details.

Tools

Submit an article

Journal Metrics

Human Mutation

Journal list menu

Tools

Follow journal

Special Issue:The Critical Assessment of Genome Interpretation (CAGI)

Cover Image, Volume 38, Issue 9

Issue Information

Novel ID gene CSNK2B: The crossover from molecular diagnosis to research continues

Reports from CAGI: The Critical Assessment of Genome Interpretation

Performance of in silico tools for the evaluation of p16INK4a (CDKN2A) variants in CAGI

Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I

Blind prediction of deleterious amino acid variations with SNPs&GO

Objective assessment of the evolutionary action equation for the fitness effect of missense mutations across CAGI-blinded contests

PON-P and PON-P2 predictor performance in CAGI challenges: Lessons learned

Missense variant pathogenicity predictors generalize well across a range of function-specific prediction challenges

Ensemble variant interpretation methods to predict enzyme activity and assign pathogenicity in the CAGI4 NAGLU (Human N-acetyl-glucosaminidase) and UBE2I (Human SUMO-ligase) challenges

Benchmarking predictions of allostery in liver pyruvate kinase in CAGI4

Whole-protein alanine-scanning mutagenesis of allostery: A large percentage of a protein can contribute to mechanism

Exploring the limits of the usefulness of mutagenesis in studies of allosteric mechanisms

Lessons from the CAGI-4 Hopkins clinical panel challenge

CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants

Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges

Crohn disease risk prediction—Best practices and pitfalls with exome data

Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge

DeepBipolar: Identifying genomic mutations for bipolar disorder via deep learning

Correction(s) for this article

Erratum

CAGI4 Crohn's exome challenge: Marker SNP versus exome variant models for assigning risk of Crohn disease

Stratified polygenic risk prediction model with application to CAGI bipolar disorder sequencing data

Predicting gene expression in massively parallel reporter assays: A comparative study

Predicting enhancer activity and variant impact using gkm-SVM

Accurate eQTL prioritization with an ensemble-based framework

Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges

Tools

More from this journal

About Wiley Online Library

Help & Support

Opportunities

Connect with Wiley

Human Mutation

Journal list menu

Tools

Follow journal

Special Issue:The Critical Assessment of Genome Interpretation (CAGI)

Download PDFs

Cover Image, Volume 38, Issue 9

Issue Information

Novel ID gene CSNK2B: The crossover from molecular diagnosis to research continues

Reports from CAGI: The Critical Assessment of Genome Interpretation

Performance of in silico tools for the evaluation of p16INK4a (CDKN2A) variants in CAGI

Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I

Blind prediction of deleterious amino acid variations with SNPs&GO

Objective assessment of the evolutionary action equation for the fitness effect of missense mutations across CAGI-blinded contests

PON-P and PON-P2 predictor performance in CAGI challenges: Lessons learned

Missense variant pathogenicity predictors generalize well across a range of function-specific prediction challenges

Ensemble variant interpretation methods to predict enzyme activity and assign pathogenicity in the CAGI4 NAGLU (Human N-acetyl-glucosaminidase) and UBE2I (Human SUMO-ligase) challenges

Benchmarking predictions of allostery in liver pyruvate kinase in CAGI4

Whole-protein alanine-scanning mutagenesis of allostery: A large percentage of a protein can contribute to mechanism

Exploring the limits of the usefulness of mutagenesis in studies of allosteric mechanisms

Lessons from the CAGI-4 Hopkins clinical panel challenge

CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants

Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges

Crohn disease risk prediction—Best practices and pitfalls with exome data

Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge

DeepBipolar: Identifying genomic mutations for bipolar disorder via deep learning

Correction(s) for this article

Erratum

CAGI4 Crohn's exome challenge: Marker SNP versus exome variant models for assigning risk of Crohn disease

Stratified polygenic risk prediction model with application to CAGI bipolar disorder sequencing data

Predicting gene expression in massively parallel reporter assays: A comparative study

Predicting enhancer activity and variant impact using gkm-SVM

Accurate eQTL prioritization with an ensemble-based framework

Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges

Sign up for email alerts

Tools

More from this journal