Volume 40, Issue 9 pp. 1486-1494

SPECIAL ARTICLE

Full Access

What went wrong with variant effect predictor performance for the PCM1 challenge

Maximilian Miller,

Corresponding Author

Maximilian Miller

[email protected]

orcid.org/0000-0002-1335-9499

Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey

Correspondence Maximilian Miller and Yana Bromberg, Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873. Email: [email protected] (M.M.) and [email protected] (Y.B.)

Search for more papers by this author

Yanran Wang,

Yanran Wang

Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey

Search for more papers by this author

Yana Bromberg,

Corresponding Author

Yana Bromberg

[email protected]

orcid.org/0000-0002-8351-0844

Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey

Search for more papers by this author

Maximilian Miller,

Corresponding Author

Maximilian Miller

[email protected]

orcid.org/0000-0002-1335-9499

Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey

Search for more papers by this author

Yanran Wang,

Yanran Wang

Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey

Search for more papers by this author

Yana Bromberg,

Corresponding Author

Yana Bromberg

[email protected]

orcid.org/0000-0002-8351-0844

Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey

Search for more papers by this author

First published: 03 July 2019

https://doi.org/10.1002/humu.23832

Citations: 8

Maximilian Miller and Yanran Wang contributed equally in this study.

Share a link

Email
Wechat
Bluesky

Abstract

The recent years have seen a drastic increase in the amount of available genomic sequences. Alongside this explosion, hundreds of computational tools were developed to assess the impact of observed genetic variation. Critical Assessment of Genome Interpretation (CAGI) provides a platform to evaluate the performance of these tools in experimentally relevant contexts. In the CAGI-5 challenge assessing the 38 missense variants affecting the human Pericentriolar material 1 protein (PCM1), our SNAP-based submission was the top performer, although it did worse than expected from other evaluations. Here, we compare the CAGI-5 submissions, and 24 additional commonly used variant effect predictors, to analyze the reasons for this observation. We identified per residue conservation, structural, and functional PCM1 characteristics, which may be responsible. As expected, predictors had a hard time distinguishing effect variants in nonconserved positions. They were also better able to call effect variants in a structurally rich region than in a less-structured one; in the latter, they more often correctly identified benign than effect variants. Curiously, most of the protein was predicted to be functionally robust to mutation—a feature that likely makes it a harder problem for generalized variant effect predictors.

1 INTRODUCTION

The Pericentriolar material 1 (PCM1) gene encodes the PCM1 protein, which is a fundamental component of centriolar satellites (CS) (Tollenaere, Mailand, & Bekker-Jensen, 2015). The CS structures are, in association with PCM1, essential for microtubule- and dynactin-dependent attraction of proteins to the centrosome (Dammermann & Merdes, 2002). PCM1 is involved in the maintenance of microtubule arrays and their anchoring at the centrosome during interphase (Hori & Toda, 2017). A dysfunctional PCM1 protein has been associated with defects in the neural developmental process (Keryer et al., 2011; Zoubovsky et al., 2015). Furthermore, variations in the PCM1 gene have been linked to psychiatric and severe developmental disorders (Datta et al., 2010; Gurling et al., 2006; Hashimoto et al., 2011; Kamiya et al., 2008; Moens et al., 2010; Page et al., 2018). In addition, PCM1 based fusion genes have been linked to myeloid/lymphoid neoplasms (Baer, Muehlbacher, Kern, Haferlach, & Haferlach, 2018), acute myeloid leukemia (Lee et al., 2018), and chronic myeloproliferative neoplasm (Ghazzawi et al., 2017). Even though various studies have attempted to better characterize PCM1 (Dammermann & Merdes, 2002; Hashimoto et al., 2011; Hoang-Minh et al., 2016; Kamiya et al., 2008; Wang, Lee, Malonis, Sanchez, & Dynlacht, 2016), we still lack a comprehensive understanding of its functionality. PCM1 knockouts, as well as heterozygous PCM1 deficiency, result in clear phenotypes in mice, for example, change in overall brain volume, impairment in social interaction, and changed exploratory activity (Zoubovsky et al., 2015).

One challenge of the recent Critical Assessment of Genome Interpretation (CAGI) provided a benchmark data set of 38 amino acid substitutions in PCM1. Participants were tasked with estimating the impact of the provided variants on overall PCM1 functionality. Notably, the experimental evaluation reported the change in brain ventricular volume due to PCM1 mutation. Multiple computational approaches have been designed to predict variant effects (functional change, pathogenicity, stability, etc.; Thusberg, Olatubosun, & Vihinen, 2011), including screening for nonacceptable polymorphisms (SNAP; Bromberg & Rost, 2007), sorting intolerant from tolerant (SIFT; Ng & Henikoff, 2003), PolyPhen-2 (I. A. Adzhubei et al., 2010), and nearly 200 others (Miller, Vitale, Rost, & Bromberg, 2019); these tools vary according to their preferred methods for model building, training data set selection, and descriptive features used for prediction.

For the PCM1 challenge of CAGI-5 described above, six submissions were submitted by five different groups. Our winning submission used SNAP predictions, thresholded to produce the originally requested (by the challenge rules) tri-class categorization of variant effects. Here we also show that SNAP without thresholds outperformed 24 additional, widely used variant effect prediction tools on the same benchmark data set. However, even SNAP performance was lower than expected (Bromberg & Rost, 2007). Following up on the reasons for this performance we suggest that methods fail (a) particularly due to the specifics of PCM1 features, for example, its conservation and structure patterns, and/or (b) due to the discrepancy between the experimental measures (brain ventricle volume change) and the predictor goals (pathogenicity, function or stability changes, etc.). The former indicates that mutation effects in proteins similar to PCM1 are likely to also be mispredicted by existing methods. The latter reinforces the notion that no single predictor can answer all different variant impact questions.

Here, we also further discuss the results of our second submission, where we applied a recently developed functional Neutral Toggle Rheostat predictor (fuNTRp; Miller et al., 2019) method. funtrp categorizes protein positions into three functional types based on the expected range of effect upon variation (no/weak effects = Neutral, severe effects = Toggle, intermediate effects = Rheostat). These types could then be used as a baseline for variant effect identification. Details of the funtrp model training and evaluation are described in (Miller et al., 2019). The predicted distribution of funtrp position types observed in PCM1 is characteristic of proteins involved in cell cycle/mitosis and/or necessary for maintaining cellular structure/integrity. Further, these results suggested a very robust and variation-insensitive protein functionality. This is particularly important for PCM1 due to its involvement in critical and essential processes in the cell.

2 MATERIALS AND METHODS

2.1 Experimentally derived PCM1 benchmark data set

The experimental data were provided by Dr. Maria Kousi (MIT) and Dr. Nicholas Katsanis (Duke University; CAGI-5 PCM1 challenge, manuscript in preparation). The benchmark set consisted of 38 PCM1 variants evaluated in a zebrafish brain development model. Variants referenced in this study are reported by the challenge to affect RefSeq (O'Leary et al., 2016) transcript NM_001315507.1 and protein NP_006188.3. The relative effect due to PCM1 mutations was defined as the volume difference in the brain ventricle area between zebrafish injected with wildtype and mutant mRNA. For every variant, the experimentally measured volume change and the significance of its difference from wildtype was provided (t test, p value). Based on the p value thresholds established by the data providers (Table S1), variants were categorized into benign (same function as wildtype), hypomorphic (partial loss of function), or pathogenic (loss of function). Participants were asked to predict the p value and effect type for all 38 variants. Note, the assumption made here was that the difference in ventricle size directly correlates to functional change, that is same volume = benign and different volume = (partial) loss of function. An in-depth description of the PCM1 challenge is available on the CAGI-5 website.

2.2 PCM1 challenge submission

We submitted the results of two different approaches based on SNAP (Bromberg & Rost, 2007; Bromberg, Yachdav, & Rost, 2008) and funtrp (Miller et al., 2019) predictions. SNAP is a neural-network-based computational method for predicting variant functional effects (Bromberg & Rost, 2007; Bromberg et al., 2008). funtrp, on the other hand, is NOT a variant effect predictor. Instead, it distinguishes protein sequence positions as either Neutral, Rheostat or Toggle on the basis of variant impact that can be expected at each position. Variants at Neutral positions are expected to have mostly minor impact, while variants at Toggle positions are prone to have severe functional consequences. Finally, Rheostat positions show a range of effects and thus allow for tuning of protein functionality (Miller, Bromberg, & Swint-Kruse, 2017).

For our Submission 1, we compiled SNAP prediction reliability scores and assigned those (a) in the [2,9] range as pathogenic; (b) in the [−9,−2] range as benign; and (c) in the [−1,1] range as hypomorphic.

In Submission 2, we applied funtrp and assigned any variants (a) at Rheostat positions as hypomorphic, (b) at Neutral positions as benign, (c) and at Toggle positions as pathogenic.

2.3 Performance evaluation

Consistent with the CAGI evaluation strategy, predictions were transformed into binary classes: effect versus no-effect, where the effect class encompassed the hypomorphic and pathogenic predictions. We computed balanced accuracy (BACC) and Matthew's correlation coefficient (MCC) for evaluation of prediction performance (Equation 1).

$urn:x-wiley:10597794:media:humu23832:humu23832-math-0001$ (1)

2.4 Additional postchallenge prediction analysis

We extracted precomputed predictions for all 38 variants in the PCM1 benchmark data set from the VarCards (Li et al., 2018) database. VarCards contains effect annotations from 23 commonly used computational variant effect predictors (Supporting Information Text S2). For most of these methods, a threshold-based binary classification was available, categorizing variants as either tolerant/benign/neutral or damaging/effect/nonneutral. Note that here we will use effect/no-effect wording. For methods with multiclass classifications, we assigned any prediction that was not labeled tolerant, benign, or neutral to the effect class. The two variants with missing predictions (NA) from MutationAssessor (Reva, Antipin, & Sander, 2011) and M-CAP (Jagadeesh et al., 2016) were labeled incorrect. We also added SNAP predictions as 24th method to this analysis.

We also calculated an agreement-ratio (R_a) per variant (Equation 2). Here, R_a = −1 signifies that none of the methods predicted effect, while with R_a= 1 all methods predict the effect. An R_a= 0 means that predictors disagree, that is 50% of the methods predict the effect. Note that in reality, there was no variant for which all methods agreed (range: −0.92 for no-effect and 0.67 for effect). In this representation, on the far sides of the R_a scale, the majority of methods agree [−0.92,−0.68] and [0.33,0.67]. When methods agreed on the correct experimental annotation, the variant was labeled easy. When their prediction was incorrect, the variant was labeled difficult.

$urn:x-wiley:10597794:media:humu23832:humu23832-math-0002$ (2)

2.5 Extracting protein features

Domain annotations (e.g., for coiled-coil regions) were extracted from UniProt Knowledgebase (UniProtKB; UniProt Consortium, 2018). To better understand the reasons for the observed differences between computational predictions and experimental effect annotations in PCM1, we analyzed SNAP predictions in the context of protein position structural features. We used the PredictProtein pipeline (Yachdav et al., 2014) to predict, using sequence alignments, three protein per-residue features commonly used by variant effect predictors (solvent accessibility and secondary structure from PROFphd (Rost & Sander, 1994) and conservation from ConSurf (Ashkenazy et al., 2016).

The CAGI-5 PCM1 benchmark data set was based on the canonical sequence of PCM1 (isoform-1; NP_006188.3, UniProt Q15154-1) and contains 16 benign, 10 hypomorphs, and 12 loss of function variants. UniProt lists five PCM1 isoforms; isoforms −1, −2, −4, and −5 are approximately 2,000 residues long, while −3 is by far the shortest with 530 residues. Isoform-1 aligns (Smith & Waterman, 1981) without gaps to the positions 1–263 and 303–530 in isoform-3—an overlap of 491 residues (~93% of isoform-3). In this study, we separately evaluated the variants localizing to the overlap region (OR) of isoforms −1 and −3 in comparison to the isoform-1 exclusive (I1) regions. As isoform-3 is observed in other species (≥95.8% sequence identity over the entire sequence) for example, chimpanzee, gibbon, and orangutan, we assumed that it is fully functional as is, that is it does not require additional domains found in isoform-1. Fourteen of the 38 variants benchmarked in the PCM1 challenge were present in both isoforms; that is one of the 16 benign, seven of the 10 hypomorphs, and six of the 12 loss of function variants were OR-specific.

3 RESULTS

3.1 Submission 1 outperforms both the CAGI-5 submissions and other predictors

Our SNAP-based submission (Submission 1) performed best (BACC = 0.66, MCC = 0.35; Equation 1 and Table S3) in the CAGI-5 PCM1 challenge. Our Submission 2 (funtrp-based) performance, however, was roughly average (BACC = 0.49, MCC = −0.04; Table S4), predicting the majority (36 of 38) of positions Neutral and, thus, the variants affecting these positions to have no-effect. Note that as funtrp is not specifically a variant effect predictor we expected a lower performance compared with other predictors.

We further compared the BACC performance of all CAGI-5 submissions to that of other publicly available methods, whose predictions we extracted from the VarCards database (Methods; Figure 1 and Table S5). Our Submission 1 was still the best even in this increased pool of methods. Here, PROVEAN (Choi & Chan, 2015) and GERP++ (Davydov et al., 2010) were next (BACC = ~0.58, MCC = 0.22 for both). Note that these top three methods remained in the lead even when performance was ranked by MCC. Interestingly, conservation-driven methods, GERP++, phyloP (Pollard, Hubisz, Rosenbloom, & Siepel, 2010), and LRT (Chun & Fay, 2009), occupied position 3–5 in a ranking by BACC—ahead of many more advanced methods; phyloP and LRT were worse when evaluated by MCC (positions 7 and 8; Table S5). Note that the SNAP (Bromberg & Rost, 2007), thresholds instituted in Submission 1, had shifted the default SNAP method cutoff to predict fewer no-effect variants. This shift resulted in an improved performance of Submission 1, placing regular SNAP into the 8th place by BACC and 6th by MCC rankings (BACC = 0.54, MCC = 0.12).

Details are in the caption following the image — **Figure 1**
Open in figure viewer PowerPoint

Comparing predictor performance for benchmark set variants. The balanced accuracy (BACC) of the CAGI-5 PCM1 challenge submissions (triangles) was compared with that of 24 commonly used variant effect predictors (squares and rhombi). Predictors in the legend are arranged according to their BACC: lowest for Group 5 and highest for Submission 1 (Group 3). CAGI, Critical Assessment of Genome Interpretation; PCM1, Pericentriolar material 1

3.2 Variant effect predictions correlate with region structural features

To better understand why all predictors performed worse than expected, we further evaluated predictions separately for variants in the isoform-1 exclusive (I1) region compared with those found in the overlap of isoforms −1 and −3 (OR). These two regions differ drastically in their structural compositions: I1 lacks regular secondary structure (48% of all residues predicted to be loops), while the OR region is more ordered (98% predicted to be in a helix). Note, that I1 contained most of the no-effect (15/16) and nine of the 22 effect variants; thus, OR had only one no-effect variant and 13 effect variants.

As for all 38 PCM1 variants, the top three methods (Submission 1, GERP++, and PROVEAN) remained the same for I1 variants and their performance even slightly improved (Table S5). Of the other methods, some performed better for this subset of variants (e.g. SIFT and SiPhy), while others did worse (e.g. phyloP and LRT; Table S5). For the OR variants, the predictor ranking changed, likely due to the imbalance in the data set (only one no-effect variant). Here, a method predicting all variants as the effect would do well but getting the single no-effect variant correct (no false negatives) could drastically improve performance. The six top scoring (Table S5) predictors, had all predicted that one benign variant correctly; MutationAssessor (Reva et al., 2011) in addition annotated six effect variants correctly, while the following five methods (DANN (Quang, Chen, & Xie, 2015), SiPhy (Garber et al., 2009), Submission 1, Group 1, Group 2) identified five of the effect variants. Note that only one of the top three overall predictors, our Submission 1, remained at the top of the OR ranking; PROVEAN also got the benign variant right, but only identified two of the 13 effect variants, while GERP++ had identified 11 effect variants correctly but mispredicted the benign variant.

The above variance in performance suggested a qualitative difference in predictor assessment of variants between the two regions. There were clearly different trends in the agreement of the predictors (R_a ratio, Equation 2) between I1 and OR variants (Figure 2). Effect (Loss of function and hypomorph) OR variants were predicted as an effect by more predictors (R_a ≥ 0) more often (38%) than those in I1 (33%). Notably, OR hypomorphs were more often misclassified as no-effect than OR loss of function variants. This is expected, as most predictors are trained is separating extreme loss of function versus no-effect cases and mild or moderate changes are harder to grasp (Bromberg, Kahn, & Rost, 2013; Miller et al., 2017). Note that it was impossible to compare no-effect predictions across regions, as there was only one benign variant in OR (correctly identified by 13 of 24 predictors, R_a= −0.08). However, two I1 benign variants were correctly identified by more predictors as no-effect (R_a< 0) nearly half (47%) the time—a significant improvement over the effect predictions in this region. This observation reaffirms the finding that effect variants in a structure-rich region and benign variants in a less-structured region are easier for predictors to identify.

3.3 Conservation as a driving factor for predictions

Most (if not all) methods that predict functional effect use conservation as an input. As the best-performing representative, Submission 1 mispredictions were examined for conservation patterns. For comparison, we also evaluated their relationship to the affected residue predicted solvent accessibility. Solvent accessibility was similar for correctly predicted and mispredicted both benign and loss of function/hypomorph variants (Figure 3a). Conservation, on the other hand, was clearly different for loss of function/hypomorph variants, even if it was similar for benign variants (Figure 3b). In fact, the majority of mispredicted loss of function/hypomorph variants show low conservation (high ConSurf scores; median = 0.78) while correctly predicted ones are more conserved (low ConSurf scores; median = −0.44).

We further selected the easiest and most difficult to predict variants (extreme cases) based on the agreement-ratio (R_a; Methods):

(i)
Easy benign: p.Ser804Arg, p.Asn1125Ser, p.Cys1361Tyr
(ii)
Difficult benign: p.Arg883Thr, p.Lys1275Glu, p.Glu1535Lys
(iii)
Easy loss of function/hypomorph: p.Glu311Gln, p.Glu369Gly, p.Gly892Trp
(iv)
Difficult loss of function/hypomorph: p.Glu23Asp, p.Leu472Val, p.Gly1556Asp

By definition, most methods get all the easy cases right and mispredicting the difficult cases. Submission 1 was no exception, correctly classifying all easy cases (i and iii) and making the wrong prediction for four of the five variants difficult loss of function/hypomorph (iv). However, it correctly predicted two of the three difficult benign variants (ii), whereas many other predictors were wrong (R_a p.Lys1275Glu = 0.33 and p.Glu1535Lys = 0.42). As expected, conservation played a role in this result. The correctly predicted difficult benign variants exhibited low levels of conservation, as did the mispredicted difficult loss of function hypomorph variants.

3.4 funtrp predictions indicate that PCM1 is functionally robust

We further explored PCM1 in terms of the mutagenize-ability type of its positions (Miller et al., 2019). funtrp predicted 95% (36) of all 38 variant protein sequence positions as Neutral, that is where variants of only weak or no functional effect were expected. Only two positions (982 and 1,275) were predicted as Rheostat, allowing for a broader range of effects upon variation (Table S4). However, prediction confidence scores for these were rather weak, that is close to Neutral classification. Our Submission 2 made use of funtrp predictions but did not attain better performance. Here, we further analyzed PCM1 funtrp annotations to glimpse the possible reasons for this failure.

We predicted position classes for all 2,024 PCM1 sequence positions and found that the vast majority of these were predicted Neutral (83.9%), some (15.6%) were Rheostats, and only 11 (0.5%) were Toggles, where variants are expected knock out functionality altogether. This distribution is unusual—on average, proteins have 53% Neutrals, 33% Rheostats, and 14% Toggles (Miller et al., 2019).

A high fraction of funtrp Neutral residues suggests robust/variation-insensitive protein functionality and, thus, a possible contribution to poor variant effect predictor performance. This protein characteristic is certainly beneficial for critical processes/pathways. However, it may also indicate that PCM1 has few canonical (e.g., binding, catalytic, etc.) active/functional sites, identified by Toggle and Rheostat predictions. Outside of these sites, variation would then rarely result in a severe functional impact.

We found PCM1 is one among a set of 668 Swiss-Prot (Boutet et al., 2016) proteins, which have a similar fraction of Neutral positions (>80%; Miller et al., 2019). Most proteins of this Robust set (>99%) are nonenzymes and 74% contain at least one coiled-coil domain. Interestingly, most of these proteins appear to be structural components of the cell. Of the 414 (62% of the Robust set) proteins with a Gene Ontology biological process annotation, 240 (58%) are responsible for the cellular component organization. Of the 561 (84% of the Robust set) that have cellular component annotations, 35% map to the cytoskeleton and 27% are in the nucleus. In line with these findings, five proteins of the 131 proteins in the Robust set that could be mapped to a structure in the PDB were histone proteins.

We further queried the ConsensusPathDB-human (Kamburov, Stelzl, Lehrach, & Herwig, 2013) for interaction networks involving the Robust set proteins; 300 (44.4%) were present in at least one pathway, many of which were cell cycle/mitotic pathways, and PCM1 was found in the 16 highest ranked of these pathways (Table 1).

Table 1. Top 16 pathways involving PCM1 and other proteins with high Neutral position content

ID	q-val^a	Pathway details	Size	Overlap
1	5E-35	M phase	340	18.2% (62)
2	8E-31	Cell cycle, mitotic	481	13.9% (67)
3	1E-27	Cell cycle	564	12.1% (68)
4	3E-27	Mitotic prometaphase	186	22.6% (42)
5	2E-22	AURKA activation by TPX2	72	36.1% (26)
6	2E-22	Anchoring of the basal body to the plasma membrane	97	29.9% (29)
7	1E-21	Loss of Nlp from mitotic centrosomes	69	36.2% (25)
8	1E-21	Loss of proteins required for interphase microtubule organization from the centrosome	69	36.2% (25)
9	2E-21	Recruitment of NuMA to mitotic centrosomes	79	32.9% (26)
10	4E-20	Regulation of PLK1 Activity at G2/M Transition	87	29.9% (26)
11	5E-20	Recruitment of mitotic centrosome proteins and complexes	80	31.3% (25)
12	5E-20	Centrosome maturation	80	31.3% (25)
13	4E-19	Cilium assembly	187	18.2% (34)
14	2E-18	Organelle biogenesis and maintenance	240	15.4% (37)
15	5E-17	G2/M transition	137	20.4% (28)
16	6E-17	Mitotic G2-G2/M phases	139	20.1% (28)

Abbreviation: pCM1, Pericentriolar material 1 protein.
^a q-value are p values corrected for multiple testing; Source of all 16 pathways: Reactome.

4 DISCUSSION

4.1 We do not see better performance with more features or more exotic algorithms

Submission 1 was the best predictor in this challenge, but conservation-based predictors did as well or better than more complicated methods. SNAP is a neural-network-based variant effect predictor which uses a variety of biochemical properties and sequence/structure-based features as input for model training. In comparison, the conservation-based GERP++ obtained the 3rd best performance with the only marginal difference to PROVEAN which ranked 2nd. The latter uses a new alignment-based metric which improves the predictions of in-frame insertions and/or deletions. FATHMM-multiple-kernel learning (MKL), which was ranked 6th, applies a based algorithm and uses ten feature groups including amongst others GC content, histone modification, transcription factor binding site as well as conservation. Interestingly, the latter was outperformed by the less complex and conservation centered LRT approach, a likelihood ratio test to identify deleterious mutations that disrupt highly conserved residues. Thus, the increase in number and complexity of features and/or choice of algorithms do not necessarily improve overall prediction performance.

4.2 PCM1 is a hard protein to evaluate

Prediction performance is generally evaluated for a wide range of samples (i.e., genes or proteins from the whole genome). Datasets, such as the PCM1 set, represent the variants of one specific protein, which may or may not be well characterized in training sets and learned by a particular tool. Thus, the choice of a benchmarking data set plays a big role in a given tool's prediction accuracy. According to one study (Mahmood et al., 2017), for example, CADD had an area under the receiver operating characteristics (ROC) curve of AUC_ROC = 0.87 when making predictions for TP53, but only =0.56 when looking at BRCA1; some other methods did poorly for both proteins. Different tools attained different levels of accuracy in evaluating PCM1 (Figure 1), but performance was lower than expected for many of them (I. Adzhubei, Jordan, & Sunyaev, 2013; Carter, Douville, Stenson, Cooper, & Karchin, 2013; Lu et al., 2015; Quang et al., 2015; Rentzsch, Witten, Cooper, Shendure, & Kircher, 2019; Reva et al., 2011). Thus, it appears that PCM1 is a hard protein for variant effect prediction.

One of the reasons for poor performance may be that the PCM1 protein 3D structure is unknown, and secondary structure prediction suggests that, in large part, it may lack structure altogether. For methods that use structural information as a feature, this piece of information is missing for PCM1.

Another reason may be that PCM1 (and other cell structure-relevant proteins) seem to be poorly represented (Kawabata, Ota, & Nishikawa, 1999) in experimental evaluations of functional effects; it is unclear whether this is because (a) the impact on these proteins is harder to quantify than, for example, enzymatic activity, (b) genes encoding enzymes (or their complexes) are more numerous, or simply because (c) enzymes and receptor proteins are currently of more pharmacological interest.

Finally, studies also highlight the overall discordance between experimental functional assays and computational predictions (Gray, Kukurba, & Kumar, 2012; Miosge et al., 2015). Thus, the additional possible reasons for the difficulty of PCM1 may include the following:

(1)
The experiment may not measure directly the same thing that the predictor evaluates, that is protein function change or pathogenicity. In this challenge, the function of PCM1 is represented by the change of zebrafish brain ventricle volume. However, the change of zebrafish ventricular volume after injection of mutant mRNA might not reflect the protein functional deficiency, nor the variant-associated pathogenicity.
(2)
Experimental functional assays could fail to reflect variant pretranscription effects. Injecting mRNA removes the effects of mutation from the organism genomic context, possibly ignoring the effect of variation on the transcript stability and introducing artifacts of overexpression (Gibson, Seiler, & Veitia, 2013).
(3)
Prediction tools measure different types/aspects of variant impact and thus their results hold different meanings. For example, SNAP was trained on PMD variants, specifically using the change in molecular functional activity as the label. CADD, on the other hand, learned from observed and simulated variants, learning to differentiate the likelihood of variant “existence” – a proxy for deleteriousness. Although this deleteriousness may correlate with functional impact, it is not a direct prediction of function. All tools reflect their specific hypotheses and their predictions should be interpreted accordingly.

4.3 What can we do better in variant effect prediction?

Participation in CAGI challenges provides valuable insights, which can be applied to develop improved variant effect prediction approaches. Also, some conclusions of this study may be useful for future CAGI challenge planning and design. As we find that computational methods should be evaluated using more fitting datasets, it may be of interest to ask experimentalists to design particular types of assays–testing precisely what the specific categories of methods predict. As we realize that this is an unlikely scenario, it may be useful to encourage testing strong predictions of variants in the genes of their interest. Finally, of utmost importance is properly evaluating the so-called no-effect/neutral variants, that is the ones that require the most work for the least “profit”.

For the CAGI-5 PCM1 benchmark data set, we observed a large diversity in the agreement of the computational variant effect predictors with the experimental results. As discussed above, there are many reasons for why this may be the case. These include method failure, incompatibility of effect assays, complexity of the affected interaction pathways, specifics of the protein features, and so forth. Thus, interpretation of computational predictions, while valuable in a research setting, still has limited use in actionable, for example, clinical, settings.

ACKNOWLEDGMENTS

We would like to thank the CAGI organizers and data providers for giving us the possibility to test and benchmark our methods and thus gain valuable new insights. We are also grateful to Dr. Chengsheng Zhu, Dr. Kenneth McGuinness, Yannick Mahlich and Zishuo Zeng (all Rutgers) for all creative discussions and to Dr. Sonakshi Bhattacharjee (Columbia) for discussions and help with the manuscript. We would also like to express gratitude to all people who deposit their data into publicly available databases and to those who maintain them. Y.B., Y.W., and M.M. were supported by the NIH U01 GM115486 grant. M.M. was also supported by the NIH R01 MH115958 01 grant. The CAGI experiment coordination is supported by NIH U41 HG007346 and the CAGI conference by NIH R13 HG006650.

Supporting Information

REFERENCES

Adzhubei, I., Jordan, D. M., & Sunyaev, S. R. (2013). Predicting functional effect of human missense mutations using PolyPhen-2. Current Protocols in Human Genetics, 9999B, 20. https://doi.org/10.1002/0471142905.hg0720s76
Google Scholar
Adzhubei, I. A., Schmidt, S., Peshkin, L., Ramensky, V. E., Gerasimova, A., Bork, P., … Sunyaev, S. R. (2010). A method and server for predicting damaging missense mutations. Nature Methods, 7(4), 248–249. https://doi.org/10.1038/nmeth0410-248
10.1038/nmeth0410-248
CAS PubMed Web of Science® Google Scholar
Ashkenazy, H., Abadi, S., Martz, E., Chay, O., Mayrose, I., Pupko, T., & Ben-Tal, N. (2016). ConSurf 2016: An improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Research, 44(W1), W344–W350. https://doi.org/10.1093/nar/gkw408
10.1093/nar/gkw408
CAS PubMed Web of Science® Google Scholar
Baer, C., Muehlbacher, V., Kern, W., Haferlach, C., & Haferlach, T. (2018). Molecular genetic characterization of myeloid/lymphoid neoplasms associated with eosinophilia and rearrangement of PDGFRA, PDGFRB, FGFR1 or PCM1-JAK2. Haematologica, 103(8), e348–e350. https://doi.org/10.3324/haematol.2017.187302
10.3324/haematol.2017.187302
CAS PubMed Web of Science® Google Scholar
Boutet, E., Lieberherr, D., Tognolli, M., Schneider, M., Bansal, P., Bridge, A. J., … Xenarios, I. (2016). UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: How to use the entry view. Methods in Molecular Biology, 1374, 23–54. https://doi.org/10.1007/978-1-4939-3167-5_2
10.1007/978-1-4939-3167-5_2
CAS PubMed Web of Science® Google Scholar
Bromberg, Y., Kahn, P. C., & Rost, B. (2013). Neutral and weakly nonneutral sequence variants may define individuality. Proceedings of the National Academy of Sciences of the USA, 110(35), 14255–14260. https://doi.org/10.1073/pnas.1216613110
10.1073/pnas.1216613110
CAS PubMed Web of Science® Google Scholar
Bromberg, Y., & Rost, B. (2007). SNAP: Predict effect of non-synonymous polymorphisms on function. Nucleic Acids Research, 35(11), 3823–3835. https://doi.org/10.1093/nar/gkm238
10.1093/nar/gkm238
CAS PubMed Web of Science® Google Scholar
Bromberg, Y., Yachdav, G., & Rost, B. (2008). SNAP predicts effect of mutations on protein function. Bioinformatics, 24(20), 2397–2398. https://doi.org/10.1093/bioinformatics/btn435
10.1093/bioinformatics/btn435
CAS PubMed Web of Science® Google Scholar
Carter, H., Douville, C., Stenson, P. D., Cooper, D. N., & Karchin, R. (2013). Identifying Mendelian disease genes with the variant effect scoring tool. BMC Genomics, 14(Suppl 3), S3. https://doi.org/10.1186/1471-2164-14-S3-S3
10.1186/1471-2164-14-S3-S3
PubMed Web of Science® Google Scholar
Choi, Y., & Chan, A. P. (2015). PROVEAN web server: A tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics, 31(16), 2745–2747. https://doi.org/10.1093/bioinformatics/btv195
10.1093/bioinformatics/btv195
CAS PubMed Web of Science® Google Scholar
Chun, S., & Fay, J. C. (2009). Identification of deleterious mutations within three human genomes. Genome Research, 19(9), 1553–1561. https://doi.org/10.1101/gr.092619.109
10.1101/gr.092619.109
CAS PubMed Web of Science® Google Scholar
Dammermann, A., & Merdes, A. (2002). Assembly of centrosomal proteins and microtubule organization depends on PCM-1. Journal of Cell Biology, 159(2), 255–266. https://doi.org/10.1083/jcb.200204023
10.1083/jcb.200204023
CAS PubMed Web of Science® Google Scholar
Datta, S. R., McQuillin, A., Rizig, M., Blaveri, E., Thirumalai, S., Kalsi, G., … Gurling, H. M. (2010). A threonine to isoleucine missense mutation in the Pericentriolar material 1 gene is strongly associated with schizophrenia. Molecular Psychiatry, 15(6), 615–628. https://doi.org/10.1038/mp.2008.128
10.1038/mp.2008.128
CAS PubMed Web of Science® Google Scholar
Davydov, E. V., Goode, D. L., Sirota, M., Cooper, G. M., Sidow, A., & Batzoglou, S. (2010). Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Computational Biology, 6(12), e1001025. https://doi.org/10.1371/journal.pcbi.1001025
10.1371/journal.pcbi.1001025
PubMed Web of Science® Google Scholar
Garber, M., Guttman, M., Clamp, M., Zody, M. C., Friedman, N., & Xie, X. (2009). Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics, 25(12), i54–i62. https://doi.org/10.1093/bioinformatics/btp190
10.1093/bioinformatics/btp190
CAS PubMed Web of Science® Google Scholar
Ghazzawi, M., Mehra, V., Knut, M., Brown, L., Tapper, W., Chase, A., … Cross, N. C. P. (2017). A novel PCM1-PDGFRB fusion in a patient with a chronic myeloproliferative neoplasm and an ins(8;5). Acta Haematologica, 138(4), 198–200. https://doi.org/10.1159/000484077
10.1159/000484077
PubMed Web of Science® Google Scholar
Gibson, T. J., Seiler, M., & Veitia, R. A. (2013). The transience of transient overexpression. Nature Methods, 10(8), 715–721. https://doi.org/10.1038/nmeth.2534
10.1038/nmeth.2534
CAS PubMed Web of Science® Google Scholar
Gray, V. E., Kukurba, K. R., & Kumar, S. (2012). Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations. Bioinformatics, 28(16), 2093–2096. https://doi.org/10.1093/bioinformatics/bts336
10.1093/bioinformatics/bts336
CAS PubMed Web of Science® Google Scholar
Gurling, H. M., Critchley, H., Datta, S. R., McQuillin, A., Blaveri, E., Thirumalai, S., … Dolan, R. J. (2006). Genetic association and brain morphology studies and the chromosome 8p22 Pericentriolar material 1 (PCM1) gene in susceptibility to schizophrenia. Archives of General Psychiatry, 63(8), 844–854. https://doi.org/10.1001/archpsyc.63.8.844
10.1001/archpsyc.63.8.844
PubMed Web of Science® Google Scholar
Hashimoto, R., Ohi, K., Yasuda, Y., Fukumoto, M., Yamamori, H., Kamino, K., … Takeda, M. (2011). No association between the PCM1 gene and schizophrenia: A multi-center case-control study and a meta-analysis. Schizophrenia Research, 129(1), 80–84. https://doi.org/10.1016/j.schres.2011.03.024
10.1016/j.schres.2011.03.024
PubMed Web of Science® Google Scholar
Hoang-Minh, L. B., Deleyrolle, L. P., Nakamura, N. S., Parker, A. K., Martuscello, R. T., Reynolds, B. A., & Sarkisian, M. R. (2016). PCM1 depletion inhibits glioblastoma cell ciliogenesis and increases cell death and sensitivity to temozolomide. Translational Oncology, 9(5), 392–402. https://doi.org/10.1016/j.tranon.2016.08.006
10.1016/j.tranon.2016.08.006
PubMed Web of Science® Google Scholar
Hori, A., & Toda, T. (2017). Regulation of centriolar satellite integrity and its physiology. Cellular and Molecular Life Science, 74(2), 213–229. https://doi.org/10.1007/s00018-016-2315-x
10.1007/s00018-016-2315-x
CAS PubMed Web of Science® Google Scholar
Jagadeesh, K. A., Wenger, A. M., Berger, M. J., Guturu, H., Stenson, P. D., Cooper, D. N., … Bejerano, G. (2016). M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nature Genetics, 48(12), 1581–1586. https://doi.org/10.1038/ng.3703
10.1038/ng.3703
CAS PubMed Web of Science® Google Scholar
Kamburov, A., Stelzl, U., Lehrach, H., & Herwig, R. (2013). The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Research, 41, D793–D800. https://doi.org/10.1093/nar/gks1055
10.1093/nar/gks1055
CAS PubMed Web of Science® Google Scholar
Kamiya, A., Tan, P. L., Kubo, K., Engelhard, C., Ishizuka, K., Kubo, A., … Sawa, A. (2008). Recruitment of PCM1 to the centrosome by the cooperative action of DISC1 and BBS4: A candidate for psychiatric illnesses. Archives of General Psychiatry, 65(9), 996–1006. https://doi.org/10.1001/archpsyc.65.9.996
10.1001/archpsyc.65.9.996
CAS PubMed Web of Science® Google Scholar
Kawabata, T., Ota, M., & Nishikawa, K. (1999). The protein mutant database. Nucleic Acids Research, 27(1), 355–357.
10.1093/nar/27.1.355
CAS PubMed Web of Science® Google Scholar
Keryer, G., Pineda, J. R., Liot, G., Kim, J., Dietrich, P., Benstaali, C., … Saudou, F. (2011). Ciliogenesis is regulated by a huntingtin-HAP1-PCM1 pathway and is altered in Huntington disease. Journal of Clinical Investigation, 121(11), 4372–4382. https://doi.org/10.1172/JCI57552
10.1172/JCI57552
CAS PubMed Web of Science® Google Scholar
Lee, J. M., Lee, J., Han, E., Kim, M., Kim, Y., Han, K., & Kim, H. J. (2018). PCM1-JAK2 fusion in a patient with acute myeloid leukemia. Annals of Laboratory Medicine, 38(5), 492–494. https://doi.org/10.3343/alm.2018.38.5.492
10.3343/alm.2018.38.5.492
PubMed Web of Science® Google Scholar
Li, J., Shi, L., Zhang, K., Zhang, Y., Hu, S., Zhao, T., … Sun, Z. (2018). VarCards: An integrated genetic and clinical database for coding variants in the human genome. Nucleic Acids Research, 46(D1), D1039–D1048. https://doi.org/10.1093/nar/gkx1039
10.1093/nar/gkx1039
CAS PubMed Web of Science® Google Scholar
Lu, Q., Hu, Y., Sun, J., Cheng, Y., Cheung, K. H., & Zhao, H. (2015). A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data. Scientific Reports, 5, 10576. https://doi.org/10.1038/srep10576
10.1038/srep10576
PubMed Web of Science® Google Scholar
Mahmood, K., Jung, C. H., Philip, G., Georgeson, P., Chung, J., Pope, B. J., & Park, D. J. (2017). Variant effect prediction tools assessed using independent, functional assay-based datasets: Implications for discovery and diagnostics. Human genomics, 11(1), 10. https://doi.org/10.1186/s40246-017-0104-8
10.1186/s40246-017-0104-8
PubMed Web of Science® Google Scholar
Miller, M., Bromberg, Y., & Swint-Kruse, L. (2017). Computational predictors fail to identify amino acid substitution effects at rheostat positions. Scientific Reports, 7, 41329. https://doi.org/10.1038/srep41329
10.1038/srep41329
CAS PubMed Web of Science® Google Scholar
Miller, M., Vitale, D., Rost, B., & Bromberg, Y. (2019). fuNTRp: Identifying protein positions for variation driven functional tuning. bioRxiv, 578757. https://doi.org/10.1101/578757
Google Scholar
Miosge, L. A., Field, M. A., Sontani, Y., Cho, V., Johnson, S., Palkova, A., … Andrews, T. D. (2015). Comparison of predicted and actual consequences of missense mutations. Proceedings of the National Academy of Sciences of the USA, 112(37), E5189–E5198. https://doi.org/10.1073/pnas.1511585112
10.1073/pnas.1511585112
CAS PubMed Web of Science® Google Scholar
Moens, L. N., Ceulemans, S., Alaerts, M., VanDen Bossche, M. J., Lenaerts, A. S., DeZutter, S., … Del-Favero, J. (2010). PCM1 and schizophrenia: A replication study in the Northern Swedish population. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 153B(6), 1240–1243. https://doi.org/10.1002/ajmg.b.31088
10.1002/ajmg.b.31088
CAS PubMed Web of Science® Google Scholar
Ng, P. C., & Henikoff, S. (2003). SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Research, 31(13), 3812–3814.
10.1093/nar/gkg509
CAS PubMed Web of Science® Google Scholar
O'Leary, N. A., Wright, M. W., Brister, J. R., Ciufo, S., Haddad, D., McVeigh, R., … Pruitt, K. D. (2016). Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucleic Acids Research, 44(D1), D733–D745. https://doi.org/10.1093/nar/gkv1189
10.1093/nar/gkv1189
PubMed Web of Science® Google Scholar
Page, D. J., Miossec, M. J., Williams, S. G., Monaghan, R. M., Fotiou, E., Cordell, H., … Keavney, B. (2018). Whole exome sequencing reveals the major genetic contributors to non-syndromic tetralogy of Fallot. Circulation Research, 124, 553–563. https://doi.org/10.1161/CIRCRESAHA.118.313250
10.1161/CIRCRESAHA.118.313250
Web of Science® Google Scholar
Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R., & Siepel, A. (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Research, 20(1), 110–121. https://doi.org/10.1101/gr.097857.109
10.1101/gr.097857.109
CAS PubMed Web of Science® Google Scholar
Quang, D., Chen, Y., & Xie, X. (2015). DANN: A deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics, 31(5), 761–763. https://doi.org/10.1093/bioinformatics/btu703
10.1093/bioinformatics/btu703
CAS PubMed Web of Science® Google Scholar
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J., & Kircher, M. (2019). CADD: Predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Research, 47(D1), D886–D894. https://doi.org/10.1093/nar/gky1016
10.1093/nar/gky1016
CAS PubMed Web of Science® Google Scholar
Reva, B., Antipin, Y., & Sander, C. (2011). Predicting the functional impact of protein mutations: Application to cancer genomics. Nucleic Acids Research, 39(17), e118–e118. https://doi.org/10.1093/nar/gkr407
10.1093/nar/gkr407
CAS PubMed Web of Science® Google Scholar
Rost, B., & Sander, C. (1994). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19(1), 55–72. https://doi.org/10.1002/prot.340190108
10.1002/prot.340190108
CAS PubMed Web of Science® Google Scholar
Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147(1), 195–197.
10.1016/0022-2836(81)90087-5
CAS PubMed Web of Science® Google Scholar
Thusberg, J., Olatubosun, A., & Vihinen, M. (2011). Performance of mutation pathogenicity prediction methods on missense variants. Human Mutation, 32(4), 358–368. https://doi.org/10.1002/humu.21445
10.1002/humu.21445
CAS PubMed Web of Science® Google Scholar
Tollenaere, M. A., Mailand, N., & Bekker-Jensen, S. (2015). Centriolar satellites: Key mediators of centrosome functions. Cellular and Molecular Life Science, 72(1), 11–23. https://doi.org/10.1007/s00018-014-1711-3
10.1007/s00018-014-1711-3
CAS PubMed Web of Science® Google Scholar
UniProt Consortium, T. (2018). UniProt: The universal protein knowledgebase. Nucleic Acids Research, 46(5), 2699–2699. https://doi.org/10.1093/nar/gky092
10.1093/nar/gky092
PubMed Web of Science® Google Scholar
Wang, L., Lee, K., Malonis, R., Sanchez, I., & Dynlacht, B. D. (2016). Tethering of an E3 ligase by PCM1 regulates the abundance of centrosomal KIAA0586/Talpid3 and promotes ciliogenesis. eLife, 5, 5. https://doi.org/10.7554/eLife.12950
10.7554/eLife.12950
CAS Web of Science® Google Scholar
Yachdav, G., Kloppmann, E., Kajan, L., Hecht, M., Goldberg, T., Hamp, T., … Rost, B. (2014). PredictProtein--an open resource for online prediction of protein structural and functional features. Nucleic Acids Research, 42(Web Server issue), W337–W343. https://doi.org/10.1093/nar/gku366
10.1093/nar/gku366
CAS PubMed Web of Science® Google Scholar
Zoubovsky, S., Oh, E. C., Cash-Padgett, T., Johnson, A. W., Hou, Z., Mori, S., … Jaaro-Peled, H. (2015). Neuroanatomical and behavioral deficits in mice haploinsufficient for Pericentriolar material 1 (Pcm1). Neuroscience Research, 98, 45–49. https://doi.org/10.1016/j.neures.2015.02.002
10.1016/j.neures.2015.02.002
PubMed Web of Science® Google Scholar

Citing Literature

All articles

What went wrong with variant effect predictor performance for the PCM1 challenge

Abstract

1 INTRODUCTION

2 MATERIALS AND METHODS