Real-world clinical applicability of pathogenicity predictors assessed on SERPINA1 mutations in alpha-1-antitrypsin deficiency
Communicated by David N. Cooper
Abstract
The growth of publicly available data informing upon genetic variations, mechanisms of disease, and disease subphenotypes offers great potential for personalized medicine. Computational approaches are likely required to assess a large number of novel genetic variants. However, the integration of genetic, structural, and pathophysiological data still represents a challenge for computational predictions and their clinical use. We addressed these issues for alpha-1-antitrypsin deficiency, a disease mediated by mutations in the SERPINA1 gene encoding alpha-1-antitrypsin. We compiled a comprehensive database of SERPINA1 coding mutations and assigned them apparent pathological relevance based upon available data. “Benign” and “pathogenic” variations were used to assess performance of 31 pathogenicity predictors. Well-performing algorithms clustered the subset of variants known to be severely pathogenic with high scores. Eight new mutations identified in the ExAC database and achieving high scores were selected for characterization in cell models and showed secretory deficiency and polymer formation, supporting the predictive power of our computational approach. The behavior of the pathogenic new variants and consistent outliers were rationalized by considering the protein structural context and residue conservation. These findings highlight the potential of computational methods to provide meaningful predictions of the pathogenic significance of novel mutations and identify areas for further investigation.
1 INTRODUCTION
Increasing access to genome-sequencing technology supports increased personalization of care in which disease risk can be stratified by genotype. The most straightforward applications of this will predictably occur for monogenic disorders, where many genetic variations are now found during wide screening of individuals with complex disease phenotypes, or in whom no disease is clinically apparent. Each gene is likely to host many such variants, though each one will tend to be rare. In vitro, cellular, and/or in vivo characterization to define pathogenicity for all identified mutations is unlikely to be practical. Deleteriousness prediction scores have been developed to address this challenge and so distinguish benign and pathogenic alleles computationally (Niroula & Vihinen, 2016).
We chose to test how well general prediction algorithms predicted pathogenicity for variants of the SERPINA1 gene (MIM# 107400) that encodes the major circulating antiprotease, alpha-1-antitrypsin (α1AT). This system represents an interesting test. Pathogenic mutations cause the monogenic disorder alpha-1-antitrypsin deficiency (MIM# 613490), a relatively common “rare disease” in individuals of North European descent, whose molecular mechanisms are among the best characterized for any disease (Lomas, Hurst, & Gooptu, 2016). It is characterized by deficiency of circulating α1AT. Common clinical manifestations are emphysema and/or hepatic cirrhosis. Almost all cases involve missense point mutations that cause the α1AT variants to misfold within the endoplasmic reticulum (ER) of hepatocytes, which normally secrete the protein into the circulation. The synthesized mutant polypeptides are therefore prone to ER-associated degradation (ERAD) and/or polymerization within the ER to varying degrees depending on the nature of the mutation. Emphysema and the associated pulmonary syndrome, chronic obstructive pulmonary disease (COPD), arise from loss of function due to reduction of antielastase activity that renders lung tissue vulnerable to dysregulated proteolysis. Hepatic disease ensues from gain-of-toxic-function effects of misfolding and polymerization within hepatocytes. The function and dysfunction of α1AT is intimately related to the three-dimensional (3D) structure of the protein that, due to its metastable native conformation, is prone to dramatic conformational changes (Lomas et al., 2016).
Coding mutations of the SERPINA1 gene are most clearly defined by the amino acid alterations involved. For α1AT, the latter are conventionally numbered according to the residue number in the mature secreted protein, lacking the 24 N-terminal residues of the signal peptide. This conventional system differs from that recommended by the Human Genome Variation Society (HGVS), which includes the initial 24 amino acids and is adopted hereafter as preferred annotation. In addition, classical definition of α1AT variants includes isoelectric focusing (IEF) profiling (Ferrarotti et al., 2007). Wild-type α1AT is designated PI*M (where PI refers to α1-proteinase inhibitor, an alternative name for α1AT), mutants that migrate more cathodally are assigned letters occurring later in the alphabet while more anodal species are assigned earlier letters. Variants may also be named after a geographical location related to their discovery, typically the birthplace of the index case. The PI*M phenotype represents five normal polymorphisms named M1A, M1V, M2, M3, and M4, which have very similar IEF mobility (Ferrarotti et al., 2007; Luisetti & Seersholm, 2004). The coincidence of similar IEF in these five variants with physiological equivalence as “normal variants” is serendipitous. M1A is believed to be the ancestral human allele, whereas M1V contains the p.A237V substitution and it is the most frequent SERPINA1 allele. M3 and M4 are characterized by p.E400D and p.R125H variations, respectively, in M1V background, whereas M2 (p.R125H; E400D) contains both substitutions.
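The relationship between the two numbering systems described above is a fixed offset of 24 residues. A minimal sketch of the conversion follows; the function names and the accepted variant syntax (single-letter amino acid codes, optional `p.` prefix, `del` for deletions) are illustrative assumptions, not part of any published tool.

```python
import re

# Length of the signal peptide removed from the precursor, as stated in the text.
SIGNAL_PEPTIDE_LENGTH = 24

# Hypothetical, simplified syntax: optional "p.", reference residue,
# position, then a substituted residue or "del".
_VARIANT = re.compile(r"^(?:p\.)?([A-Z])(\d+)([A-Z]|del)$")


def _parse(variant: str):
    match = _VARIANT.match(variant)
    if match is None:
        raise ValueError(f"unsupported variant syntax: {variant!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def hgvs_to_conventional(variant: str) -> str:
    """Convert precursor (HGVS) numbering to mature-protein numbering."""
    ref, pos, alt = _parse(variant)
    mature_pos = pos - SIGNAL_PEPTIDE_LENGTH
    if mature_pos < 1:
        raise ValueError("residue lies within the signal peptide")
    return f"{ref}{mature_pos}{alt}"


def conventional_to_hgvs(variant: str) -> str:
    """Convert mature-protein numbering to precursor (HGVS) numbering."""
    ref, pos, alt = _parse(variant)
    return f"p.{ref}{pos + SIGNAL_PEPTIDE_LENGTH}{alt}"


if __name__ == "__main__":
    for name in ("p.E366K", "p.E288V", "p.F75del"):
        print(name, "->", hgvs_to_conventional(name))
```

Applied to the Z allele, the sketch maps p.E366K to E342K, matching the paired annotation used in Table 1.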
The most common deficiency allele associated with clinically severe α1AT deficiency is the Z variant (p.E366K), whereas the common α1AT S variant (p.E288V) is associated with milder α1AT deficiency (de Serres & Blanco, 2012). Although plasma levels of α1AT in SS homozygotes are reduced to approximately 60% compared to healthy subjects, these appear sufficient to provide protection from damage to lung tissue (Hazari et al., 2017). Relatedly, the S variant is less polymerogenic than Z α1AT, with more efficient clearance of the misfolded protein by degradation pathways (Curiel, Chytil, Courtney, & Crystal, 1989). On the other hand, compound heterozygotes with both S and Z alleles of SERPINA1 (SZ) are at increased risk of α1AT deficiency-related lung and liver diseases (American Thoracic Society and European Respiratory Society, 2003; Ferrarotti et al., 2005; Laffranchi, Berardelli, Ronzoni, Lomas, & Fra, 2018; Turino et al., 1996). All the other observed α1AT alleles have frequencies < 0.005, and in many cases have been described in a single case or family. However, cumulatively, “rare” mutations are likely to affect a substantial number of people. The reported frequency of rare α1AT alleles, generally in compound heterozygosity with Z, is higher in Southern Europe (Ferrarotti et al., 2005; Piras et al., 2013). Among rare SERPINA1 variants, the so-called null alleles are characterized by complete absence of protein in the bloodstream and are conventionally designated Q0, followed by a given name. In most cases, these are mutations generated by premature stop codons, splicing site alterations, or large deletions. They are associated with high risk of lung disease, but not with liver disease (Ferrarotti et al., 2014; Luisetti & Seersholm, 2004). Unsurprisingly, the information about the clinical presentation of most rare pathological point mutants of α1AT is limited.
Those studied in more detail are polymerogenic mutants, such as M(malton) (p.F75del) (Graham et al., 1989), S(iiyama) (p.S77F) (Lomas, Finch, Seyama, Nukiwa, & Carrell, 1993), and King's (p.H358D) (Miranda et al., 2010). Others such as the I (p.R63C) (Graham et al., 1989; Ronzoni et al., 2016), Queen's (p.K178N) (Nyon et al., 2012), Baghdad (p.A360P) (Haq et al., 2016), Trento (p.E99V) (Miranda et al., 2017), and P(brescia) (p.G249R) (Medicina et al., 2009) are associated with milder polymerogenicity and minor plasma deficiency. The circulating deficiency of α1AT may be exacerbated by supervening reduced activity due to decreased binding affinity and inhibitory capacity against elastase, as reported for Z, F (p.R247C), Queen's and Baghdad mutations (Cook, Burdon, Brenton, Knight, & Janus, 1996; Haq et al., 2016; Nyon et al., 2012; Okayama, Brantly, Holmes, & Crystal, 1991). A point mutation at the reactive site of α1AT in the Pittsburgh variant (p.M382R) switches specificity from inhibition of elastase to inhibition of the clotting factors thrombin and Factor XIa, resulting in fatal bleeding events (Owen, Brennan, Lewis, & Carrell, 1983). Besides the antiprotease activity, amino acid variations may also potentially affect other anti-inflammatory functions of AAT (Janciauskiene et al., 2018; Jonigk et al., 2013).
Most computational tools used to predict pathogenicity use machine learning algorithms trained on known pathogenic variants to compute a damage score for every possible missense substitution. They integrate various combinations of features including conservation estimates, physicochemical properties of the amino acids, secondary structure, domain information, and substitution matrices. Examples of these approaches include CADD (Kircher et al., 2014) and PolyPhen-2 (Adzhubei et al., 2010). In addition, "meta-predictors" such as REVEL (Ioannidis et al., 2016) integrate results from previous prediction tools to improve classification performance. Successful gene-specific pathogenicity predictors have been developed for genes with large panels of mutations identified such as CFTR (Masica, Sosnay, Cutting, & Karchin, 2012) and BRCA1 (Starita et al., 2015).
Here, we report an updated catalog of missense mutations described in the SERPINA1 gene, together with a detailed characterization of its genetic variability across the human populations represented within large genomic databases. We then compare the performance of 31 prediction tools for their ability to discriminate previously characterized α1AT variants and finally characterize in cellular models a panel of predicted pathogenic variants found in the Exome Aggregation Consortium (ExAC) database.
2 MATERIALS AND METHODS
2.1 Genetic variability of SERPINA1 gene in human population
To compile the catalog of α1AT variants in Supp. Table S1, we integrated data from public databases and literature reports, last accessed in July 2017. We consulted public databases of clinically relevant variants, namely ClinVar (Landrum et al., 2014), HGMD (Stenson et al., 2014), and LOVD (Fokkema et al., 2011), queried for “SERPINA1.” Literature was searched in PubMed using as keywords “SERPINA1,” “alpha α1-AT,” “α1-AT,” “α1 proteinase inhibitor,” combined with “variant,” “allele,” “mutant,” and “rare.”
Genetic variability of the SERPINA1 gene in different human populations was assessed by combining data repositories from ExAC v.0.3.1 (∼63,000 subjects; https://exac.broadinstitute.org), UK10K (∼3,500 subjects; https://www.uk10k.org), and the Greater Middle East (GME) Variome Project (∼2,500 subjects; https://igm.ucsd.edu/gme) (Supp. Table S2). The frequency of carriers of Z and S/T mutations, singly and combined, and of all other pathogenic variants (P/P* rare variants in Table 1 and loss-of-function variants), was calculated from allele counts in ExAC and GME populations. Estimates were not derived from UK10K data since this data set includes data from twins, which could inflate observed allele frequencies for pathogenic variants.
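The text does not state how carrier frequencies were derived from allele counts. One common approach, shown here purely as an assumption, treats genotypes as being in Hardy-Weinberg proportions, so that the fraction of individuals carrying at least one copy of an allele with frequency q is 1 - (1 - q)^2. Function names are hypothetical.

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Allele frequency from the number of observed alleles (AC) and the
    number of chromosomes successfully genotyped (AN)."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def carrier_frequency(q: float) -> float:
    """Fraction of individuals with at least one copy of the allele.

    ASSUMPTION: Hardy-Weinberg proportions; heterozygotes (2q(1-q)) plus
    homozygotes (q^2) equals 1 - (1 - q)^2.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return 1.0 - (1.0 - q) ** 2


def combined_carrier_frequency(frequencies) -> float:
    """Carriers of any allele in a set of distinct alleles of one gene.

    ASSUMPTION: each chromosome carries at most one of the alleles, so
    their frequencies can be summed before applying carrier_frequency.
    """
    return carrier_frequency(sum(frequencies))


if __name__ == "__main__":
    # Global allele frequencies for Z and S as reported in the Results.
    q_z, q_s = 0.0117, 0.0201
    print(f"Z carriers:      {carrier_frequency(q_z):.4f}")
    print(f"S carriers:      {carrier_frequency(q_s):.4f}")
    print(f"Z or S carriers: {combined_carrier_frequency([q_z, q_s]):.4f}")
```

Population substructure or sampling of related individuals would violate the Hardy-Weinberg assumption, which is consistent with the decision to exclude the twin-containing UK10K data from these estimates.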
AA variation (HGVS/conventional) | Allele name | P/P* | Clinical phenotype |
---|---|---|---|
p.F59C/F35C | Brixia | P | Def |
p.R63C/R39C | I | P | Def, Lung |
p.L65P/L41P | M(procida) | P | Def, Lung |
p.F75del/F51del | M(palermo) M(malton)/M(cagliari) | P* | Def, Lung, Liver |
p.S77F/S53F | S(iiyama) | P* | Def, Lung, Liver |
p.G91E/G67E | M(mineral springs) | P | Def, Lung |
p.T92I/T68I | Q0(lisbon) | P | Def, Lung |
p.E99V/E75V | Trento | P | Def, Lung |
p.T109M/T85M | Z(bristol) | P | Lung, Liver |
p.I116D/I92D | Q0(ludwigshafen) | P | Def, Lung |
p.Q129P/Q105P | NA | P | Def |
p.K178N/K154N | Queen's | P | Def |
p.E186G/E162G | P(gaia) | P | Lung |
p.V234E/V210E | M1(pierre-bénite) | P | Def, Lung |
p.R247C/R223C | F | P | Def, Lung, Dysfunction |
p.G249R/G225R | P(brescia) | P | Def, Lung, Liver |
p.D280V/D256V | | P | Def, Lung |
p.K283I/K259I | M(pisa) | P | Def, Lung |
p.L287P/L263P | Q0(gaia) | P | Def, Lung |
p.E288V/E264V | S/T | P | Def, Lung |
p.T292I/T268I | N(hartford city) | P | Def |
p.G344R/G320R | P(salt lake)/P(lyon) | P | Lung |
p.V357M/V333M | NA | P | Def, Liver |
p.H358D/H334D | King's | P* | Def, Lung, Liver |
p.A360P/A336P | Baghdad | P | Def |
p.A360T/A336T | W(bethesda) | P | Def, Lung, Liver |
p.E366K/E342K | Z | P* | Def, Lung, Liver |
p.M382R/M358R | Pittsburgh | P | Bleeding |
p.K392E/K368E | E(taurisano) | P | Def, Lung |
p.P393L/P369L | M(heerlen) | P* | Def, Lung |
p.P393S/P369S | M(wurzburg)/ M(val d'hebron) | P* | Def, Lung, Liver |
p.M409T/M385T | NA | P | Def, Lung |
p.P415H/P391H | Y(orzinuovi) | P* | Def, Liver |
2.2 Evaluation of pathogenicity/conservation predictors
We evaluated 24 deleteriousness and 7 conservation predictors for their ability to correctly classify α1AT variations (Supp. Table S3). We included predictors reported in dbNSFP 3.2 (Liu, Jian, & Boerwinkle, 2011, 2016) and four recently developed predictors: PON-P2 (Niroula, Urolagin, & Vihinen, 2015), REVEL (Ioannidis et al., 2016), iFISH (Wang & Wei, 2016), and M-CAP1.0 (Jagadeesh et al., 2016). Values of each score for all possible missense variants in the SERPINA1 gene were retrieved from dbNSFP 3.2 or from predictor websites (Supp. Table S4). The M-CAP1.0 score was developed only for variants with minor allele frequency (MAF) < 0.01, so values for S/T and Z variants were not reported and were set to missing. We used the SERPINA1 protein sequence (NP_000286.3) as the query in a BLASTP search against a database of primate sequences and selected eight orthologs with ≥ 95% sequence identity. By this method, we identified 40 amino acid substitutions that were used as benign variants in subsequent analysis, as previously described (Riera, Padilla, & de la Cruz, 2016) (Supp. Table S5 and Supp. Figure S1). Our final set of benign variants also included the three M background polymorphisms, for a total of 43 benign (B) variants. Pathogenic variants were the P and P* variants reported in Table 1. Performance of each score in classifying benign and pathogenic variants was evaluated using the overall performance metric (OPM), as defined in Niroula, Urolagin, and Vihinen (2015), and the area under the curve (AUC) of receiver operating characteristic (ROC) curves, calculated using the ROCR package (Sing, Sander, Beerenwinkel, & Lengauer, 2005). For each score, we calculated an optimal threshold, defined as the score value resulting in the maximum OPM value, and then used this value to assess sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and Matthews correlation coefficient.
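The evaluation procedure above can be sketched as follows. This is an illustration rather than the code used in the study (which relied on the ROCR package in R). The OPM expression is written as we understand it from Niroula et al. (2015), OPM = (PPV + NPV)(sensitivity + specificity)(accuracy + nMCC)/8 with nMCC = (1 + MCC)/2, and should be checked against that paper. The sketch also assumes that higher scores indicate greater predicted pathogenicity, which does not hold for every predictor without rescaling.

```python
from math import sqrt


def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when score >= threshold is called pathogenic.
    labels: True for pathogenic (P/P*), False for benign (B)."""
    tp = fp = tn = fn = 0
    for score, is_pathogenic in zip(scores, labels):
        called = score >= threshold
        if called and is_pathogenic:
            tp += 1
        elif called:
            fp += 1
        elif is_pathogenic:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Undefined ratios (empty denominator) are reported as 0.0 by convention here.
    return numerator / denominator if denominator else 0.0


def metrics(tp, fp, tn, fn):
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)
    npv = _ratio(tn, tn + fn)
    acc = _ratio(tp + tn, tp + fp + tn + fn)
    mcc = _ratio(tp * tn - fp * fn,
                 sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    nmcc = (1 + mcc) / 2  # MCC rescaled from [-1, 1] to [0, 1]
    opm = (ppv + npv) * (sens + spec) * (acc + nmcc) / 8
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv,
            "npv": npv, "accuracy": acc, "mcc": mcc, "opm": opm}


def optimal_threshold(scores, labels):
    """Observed score value that maximizes OPM (lowest such value on ties)."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda t: metrics(*confusion(scores, labels, t))["opm"])


def auc(scores, labels):
    """ROC AUC as the probability that a pathogenic variant outscores a
    benign one, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores for illustration only; not values from the study.
    toy_scores = [0.1, 0.2, 0.8, 0.9]
    toy_labels = [False, False, True, True]
    t = optimal_threshold(toy_scores, toy_labels)
    print(t, metrics(*confusion(toy_scores, toy_labels, t)), auc(toy_scores, toy_labels))
```

Because the threshold is tuned on the same variants used to report performance, the resulting metrics are best read as a calibration of each predictor to SERPINA1 rather than as an unbiased estimate of accuracy on unseen variants.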
2.3 Distribution of REVEL scores and clusters definition for SERPINA1 variants
Since REVEL emerged as the best predictor based on OPM value, we analyzed how this score is distributed across all possible missense variants in SERPINA1 gene. First, we evaluated the optimal number of variants subgroups that could be defined based on REVEL score. Analysis of within sum of squares across clusters suggested three groups as the best solution (Supp. Figure S2). These three groups are defined as Cluster 3, high scoring variants with REVEL value > 0.618; Cluster 2, with midrange values between 0.354 and 0.618; Cluster 1, with low scoring variants < 0.354. For P/P* and tested variants, we also calculated the classification concordance among the top 5 and top 10 performing predictors, using the optimal thresholds defined in Supp. Table S3.
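The choice of three groups rests on how the within-cluster sum of squares (WSS) falls as the number of clusters grows. The software used for this step is not stated; the following is a minimal one-dimensional k-means sketch (Lloyd's algorithm with a deterministic quantile-based start, an assumption made for reproducibility) that returns the WSS for a given k. The input values are toy numbers, not REVEL scores from the study.

```python
def _assign(values, centers):
    """Place each value in the group of its nearest center."""
    groups = [[] for _ in centers]
    for x in values:
        nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
        groups[nearest].append(x)
    return groups


def kmeans_1d(values, k, max_iterations=100):
    """Return (centers, groups, wss) for one-dimensional k-means."""
    xs = sorted(values)
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of values")
    # Deterministic start: centers placed at evenly spaced quantiles.
    centers = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    for _ in range(max_iterations):
        groups = _assign(xs, centers)
        updated = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    groups = _assign(xs, centers)  # final assignment matches final centers
    wss = sum((x - centers[j]) ** 2 for j, g in enumerate(groups) for x in g)
    return centers, groups, wss


if __name__ == "__main__":
    toy = [0.05, 0.10, 0.15, 0.45, 0.50, 0.55, 0.85, 0.90, 0.95]
    for k in range(1, 6):
        _, _, wss = kmeans_1d(toy, k)
        print(k, round(wss, 4))
```

Plotting WSS against k and looking for the point where further clusters yield little additional reduction (the "elbow") is one way to arrive at a preferred number of groups; the boundaries between neighbouring groups then give cutoffs of the kind reported above.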
2.4 Expression vectors and cell transfection
Vectors encoding for the α1AT variants were obtained by site-directed mutagenesis of M1V (Medicina et al., 2009) using the QuikChange II Mutagenesis Kit (Agilent) with primers listed in Supp. Table S6. HEK293T/17 (ATCC#CRL-11268) or Hepa1-6 cells (ATCC#CRL-1830) were maintained in DMEM-10% FBS (Sigma) and transfected by PEI "Max" (Polysciences Inc.) or Lipofectamine2000 (Thermo Fisher) as previously described (Miranda et al., 2017; Ronzoni et al., 2016). Twenty-four hours after transfection, we collected the cell media and lysed the cells in 1% NP40/20 mM Tris–HCl pH 7.4/150 mM NaCl/10 mM N-ethylmaleimide/protease inhibitors, then discarding nuclei by 30′ centrifugation at 800g.
2.5 SDS–PAGE, Native-PAGE, and immunoblots
Lysates and media of transfected cells were analyzed either by 7.5% SDS–PAGE or 8% Native-PAGE and immunoblots revealed by anti-α1AT (DAKO) followed by HRP-anti-rabbit antibodies (Thermo Fisher) and ECL (Euroclone) as described previously (Fra et al., 2012; Miranda et al., 2017).
2.6 Sandwich ELISA
Quantification of α1AT in culture media was performed by sandwich ELISA as described (Miranda et al., 2017). Briefly, 96-well plates (Costar 3590) were coated with rabbit polyclonal anti-α1AT (DAKO; 2 μg/mL) and saturated for 1 hr at 37°C with a blocking buffer (PBS, 0.25% BSA). Serial dilutions (1:1.5) in PBS/0.1% BSA of purified α1AT (Merck) and cell media were added to the plates and incubated at 37°C for 1 hr. After washing in PBS/0.05% Tween-20, wells were incubated for 1 hr at 37°C with sheep anti-α1AT-HRP (Abcam) in PBS/0.1% BSA, washed again, and developed with the TMB substrate (Sigma). The reaction was stopped by adding 3 M HCl and the absorbance at 450 nm was measured with an ELISA plate reader (EnSight, PerkinElmer, Milan, Italy).
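For readers setting up a comparable standard curve, the concentrations along a 1:1.5 dilution series can be tabulated as below. The sketch assumes that "1:1.5" means each well is 1.5-fold more dilute than the previous one; the starting concentration and number of steps are placeholders, since neither is given in the text.

```python
def serial_dilution(start_concentration: float, fold: float = 1.5, steps: int = 8):
    """Concentrations along a serial dilution, first well undiluted.

    ASSUMPTION: each step divides the previous concentration by `fold`.
    """
    if start_concentration <= 0 or fold <= 1 or steps < 1:
        raise ValueError("need start_concentration > 0, fold > 1, steps >= 1")
    return [start_concentration / fold ** i for i in range(steps)]


if __name__ == "__main__":
    # Placeholder top standard of 90 (arbitrary units), three wells.
    for well, conc in enumerate(serial_dilution(90.0, fold=1.5, steps=3), start=1):
        print(f"well {well}: {conc:.1f}")
```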
2.7 Antielastase assay
The culture media of cells expressing α1AT variants were diluted in PBS and incubated at 37°C for 30 min with equimolar porcine pancreatic elastase (PPE; Sigma). Samples were then separated by 7.5% SDS–PAGE and detected by immunoblot by a polyclonal antibody anti-α1AT (DAKO).
3 RESULTS
3.1 Updated catalog of missense variants causing α1AT deficiency
We integrated information from the publicly accessible databases ClinVar, HGMD, and LOVD with published literature to compile an updated list of known missense variants in SERPINA1 (Supp. Table S1). The M1V allele (RefSeq: NM_000295.4) is used as the reference sequence. Missense variations are annotated with nucleotide and amino acid substitutions reported with both HGVS and conventional α1AT nomenclature. Whenever available, we reported the background allele and the name given to the α1AT variant, as well as relevant references. Seven cases of double (two amino acids) substitution alleles were identified. In some cases, mutations occur on different haplotype backgrounds, resulting in different allele names, as in the case of S/T or Z/Z (augsburg). Four missense alleles associated with undetectable α1AT plasma levels are annotated as Q0. However, these variants differ from the majority of Q0-type mutations, since they carry missense variations and not premature stop codons, splicing site alterations or large deletions (Ferrarotti et al., 2014).
Clinical significance of known variants was assessed after critical review of published studies. Variants were classified as pathogenic (P; Table 1) if clearly associated with disease based on clinical and experimental evidence according to either of the following criteria: (i) they were reported in at least two unrelated subjects with α1AT deficiency-associated diseases; or (ii) they were reported in single cases with manifestations of disease and have been further characterized by biochemical experiments and/or in cellular models. When a robust classification was not possible, the variants were cataloged as uncertain (U). In most cases, our classification correlates with that of ClinVar, where the variants are annotated in six main categories (P, B, LP, LB, U, or O for pathogenic, benign, likely pathogenic, likely benign, uncertain, and other, respectively; Supp. Table S1). Among the pathogenic variants in Table 1, we further subclassified seven mutations as severe (P*), based on very low plasma levels in vivo and high tendency to form intracellular polymers when assessed in vitro. The distribution of pathogenic variants appears clustered in the tertiary α1AT structure (PDB ID: 3NE4; Figure 1B).
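The curation criteria can be expressed as a simple decision rule. The sketch below is a schematic restatement under stated assumptions: the field names are hypothetical, the qualitative judgments ("very low" plasma levels, "high" polymerization tendency) are reduced to booleans supplied by the curator, and benign classifications are outside its scope.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    # Hypothetical summary of the published evidence for one variant.
    unrelated_affected_subjects: int           # subjects with α1AT deficiency-associated disease
    functionally_characterized: bool = False   # biochemical and/or cellular data available
    very_low_plasma_level: bool = False        # curator judgment, in vivo
    highly_polymerogenic: bool = False         # curator judgment, in vitro


def classify(evidence: Evidence) -> str:
    """Return 'P*', 'P', or 'U' following criteria (i) and (ii)."""
    criterion_i = evidence.unrelated_affected_subjects >= 2
    criterion_ii = (evidence.unrelated_affected_subjects == 1
                    and evidence.functionally_characterized)
    if not (criterion_i or criterion_ii):
        return "U"
    if evidence.very_low_plasma_level and evidence.highly_polymerogenic:
        return "P*"
    return "P"


if __name__ == "__main__":
    print(classify(Evidence(3)))
    print(classify(Evidence(1, functionally_characterized=True)))
    print(classify(Evidence(1)))
```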

3.2 SERPINA1 variations in human population databases
Different public databases have been established to collect genome/exome sequencing data worldwide. Querying for nonsynonymous SERPINA1 variants we retrieved 184 variants from ExAC v.0.3.1 (including seven population groups), 38 from UK10K database (including UK individuals), and 20 from GME repository (including subjects from six regions in GME; Supp. Table S2). Most missense variants reported in ExAC populations are ultrarare (allelic frequencies [AF] < 0.0001), with 77 seen only once in the overall combined data set (Supp. Table S7). Population-specific variants are observed in UK10K (n = 15) and GME (n = 1) that are not reported in ExAC (Supp. Figure S3).
Several of the published variants are found in at least one of the population databases (Supp. Table S2). The polymorphisms defining the background M alleles (p.V237A, p.E400D, p.R125H) have global frequencies of 0.215, 0.276, and 0.169, respectively, with differential distribution in different subpopulations (Figure 2A). The common pathogenic S (p.E288V) and Z (p.E366K) alleles show the largest AF values among pathogenic variants in the global population: 0.0201 and 0.0117, respectively (Figure 2B). Frequency of carriers of Z and S in the different populations is reported in Supp. Table S8. Notably, 11 Z homozygotes and 48 S homozygotes are reported in the ExAC database and at least one homozygous individual is reported for variants F, M(malton), and M(procida). A large proportion of carriers of α1AT variants identified in Asian (GME, EAS, and SAS) and non-Finnish European (NFE) groups carry a rare pathogenic allele, whereas Z and S variants prevail in other populations (Figure 2C). The observed AF and distribution are comparable to previous estimates in general populations of 97 countries worldwide (de Serres & Blanco, 2012).

3.3 Assessment of mutations by pathogenicity predictors and conservation scores
Several bioinformatics tools have been developed to predict the deleteriousness of protein variations, each considering a different set of features to evaluate amino acid and/or nucleotide substitutions. We considered 31 deleteriousness/conservation predictors for their ability to predict pathogenicity of the SERPINA1 variants listed in Table 1 (P and P*). We considered as benign (B) the three common M polymorphisms, together with a panel of substitutions found by alignments with primate α1AT orthologs (Supp. Table S5 and Supp. Figure S1), as performed previously (Riera et al., 2016). Overall, we used 32 P/P* and 43 B substitutions to calibrate the outputs of the predictive algorithms and ranked them according to their OPM at optimal thresholds (Supp. Table S3). The REVEL algorithm had the highest OPM (0.93; Figure 3A), resulting in well-separated score distributions for P/P* and B variants (Figure 3B). Overall, the 10 best-performing predictors reach a consistent classification for all P mutations, except E(taurisano), N(hartford city), Pittsburgh, and p.Q129P (Supp. Table S9). Notably, P* mutations consistently achieved extreme pathogenicity scores. Missense variants reported in ExAC and all other possible missense substitutions (Other) predominantly scored below the optimal REVEL threshold, consistent with the expectation that most possible missense mutations will have little or no functional impact. Score distributions for the 10 best-performing tools are reported in Supp. Table S10.

3.4 Clustering analysis of α1AT missense variants using REVEL
We applied REVEL to all possible missense variants in the SERPINA1 gene (Supp. Table S10) to evaluate their pathogenic/neutral significance. Based on k-means clustering, we defined three groups that optimize classification: Cluster 1 (< 0.354), Cluster 2 (0.354–0.618), and Cluster 3 (> 0.618). Known P mutations all belong to Cluster 2 or 3, above the 0.477 optimal threshold of REVEL; all P* mutations belong to Cluster 3 (Supp. Figure S5B), whereas U variants are distributed equally across the three clusters (Supp. Figure S5B). Looking at the overall distribution of all possible missense variants and that of ExAC and GME, the greater proportion falls in Clusters 1 and 2 (Figure 4A, B). We mapped where variants belonging to Cluster 3 occur in the structure of native α1AT (Figure 4C). The Cluster 3 variants span a network of residues whose structural dynamics were shown to be most affected by known mutations (Fra et al., 2012; Nyon et al., 2012).
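Given the cluster boundaries and the optimal REVEL threshold quoted in the text, assigning a score to a cluster is a two-cutoff lookup. The sketch below shows one way to do this; how scores falling exactly on a boundary are handled is an assumption, and the example scores are arbitrary rather than values for real variants.

```python
from collections import Counter

# Cutoffs quoted in the text.
CLUSTER_1_UPPER = 0.354    # Cluster 1 lies below this value
CLUSTER_2_UPPER = 0.618    # Cluster 3 lies above this value
OPTIMAL_THRESHOLD = 0.477  # OPM-optimal REVEL threshold


def revel_cluster(score: float) -> int:
    """Map a REVEL score in [0, 1] to Cluster 1, 2, or 3.

    ASSUMPTION: scores exactly equal to a cutoff fall in Cluster 2.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("REVEL scores lie in [0, 1]")
    if score < CLUSTER_1_UPPER:
        return 1
    if score <= CLUSTER_2_UPPER:
        return 2
    return 3


def predicted_pathogenic(score: float) -> bool:
    """ASSUMPTION: scores at or above the optimal threshold are called pathogenic."""
    return score >= OPTIMAL_THRESHOLD


if __name__ == "__main__":
    arbitrary_scores = [0.12, 0.30, 0.40, 0.55, 0.70, 0.93]
    print(Counter(revel_cluster(s) for s in arbitrary_scores))
    print([predicted_pathogenic(s) for s in arbitrary_scores])
```

Note that the optimal threshold falls inside Cluster 2, so membership of Cluster 2 alone does not determine whether a variant is called pathogenic; Cluster 3 variants always are and Cluster 1 variants never are.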

3.5 Characterization of new α1AT variants in cellular models
We correlated high REVEL scores against biological readouts in cellular models for a range of variants (highlighted in Figure 4B). Six Cluster 3 variants assessed were previously uncharacterized while p.P279T, previously reported in the M2(pont-eveque) and in the double mutant M(frankfurt) as benign due to normal AAT serum levels and lack of hepatic or pulmonary symptoms (Faber et al., 1994; Joly et al., 2014), achieved the highest REVEL score. Two variants from Cluster 2 (p.A55P, p.G282R) and two from Cluster 1 (p.S71R, p.M409V) were evaluated for comparison (Supp. Table S11).
The variants were first expressed transiently in HEK293T cells. Wild-type M1V and Z α1AT were also expressed as references. Formation of polymers both in cell extracts (Figure 5A) and culture media (Figure 5B) was analyzed by native PAGE. Intracellular and secreted Z α1AT was overwhelmingly polymeric, consistent with previous in vitro and in vivo studies (Fra et al., 2012, 2016; Miranda et al., 2017; Tan et al., 2014). The p.P313S, p.P279T, and p.G282R variants showed polymer/monomer proportions similar to Z α1AT (Figure 5A). Conversely, p.M409V resembled M α1AT, with monomer predominance, and the other variants showed intermediate aggregation profiles. The α1AT variants secreted by transfected HEK293T cells were functional, as they were able to react with PPE (Figure 5C). Z, p.P279T, and p.P313S were less functional due to a lower intrinsic activity, as published for the Z mutant (Ogushi, Fells, Hubbard, Straus, & Crystal, 1987), and/or to the presence of inactive polymers in the media. Similar results were obtained by expression in Hepa1-6 cells, in which we also analyzed the p.A55P and p.S71R variants, classified in Cluster 2 and Cluster 1, respectively (Figure 5D–E).

Secretion was evaluated by quantifying α1AT in the culture media of both HEK293T and Hepa1-6 transfected cells by sandwich ELISA (Figure 5F). Consistent with previous studies (Fra et al., 2012; Laffranchi et al., 2018; Miranda et al., 2017; Ronzoni et al., 2016), these cell lines reproduced the secretion deficiency phenotype observed in vivo for Z variant homozygotes (∼15% relative to M homozygotes). All Cluster 3 variants showed statistically significant reductions in α1AT secretion relative to M α1AT. Notably, the levels of p.P279T in HEK293T and Hepa1-6 cells (6.1 ± 1.8% and 15.0 ± 5.3%, respectively) and p.P313S (3.4 ± 2.0% and 1.5 ± 1.0%, respectively) were severely reduced in the media, similar to levels of Z α1AT. Cluster 3 variants p.L90R, p.G216C, and p.G331R showed more severe secretion deficiency in the Hepa1-6 cell line compared to HEK293T cells. A substantial reduction was also observed for the Cluster 2 variants p.A55P and p.G282R, whereas Cluster 1 variants (p.M409V and p.S71R) were secreted at levels similar to M α1AT. The observed secretion levels of the variants tested in the two cell models were then plotted against their REVEL scores (Figure 5G). Linear regression analysis suggested an inverse correlation between the two parameters in both cell models (secretion decreasing as REVEL score increases), although the data sets are relatively small.
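A regression of secretion level on REVEL score of the kind plotted in Figure 5G can be computed with ordinary least squares. The sketch below uses placeholder numbers chosen only to exercise the code; they are not measurements from this study, and the function name is hypothetical.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, pearson_r).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # PLACEHOLDER data: predictor scores and secretion as % of wild type.
    placeholder_scores = [0.2, 0.4, 0.6, 0.8]
    placeholder_secretion = [100.0, 70.0, 40.0, 10.0]
    slope, intercept, r = linear_fit(placeholder_scores, placeholder_secretion)
    print(f"slope={slope:.1f}, intercept={intercept:.1f}, r={r:.2f}")
```

The sign of the slope gives the direction of the relationship. With roughly a dozen variants per cell model, the confidence interval around any such slope will be wide, which is the caveat the text attaches to this analysis.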
Taken together, these biological findings support the potential of such algorithms to prospectively predict clinical significance of α1AT variants with reasonable accuracy.
4 DISCUSSION
Several in silico tools have been developed to predict the deleterious effect of missense variants. These have become even more important in view of the increasing number of variants identified by high-throughput screenings of healthy individuals as well as of large cohorts of patients. Here we have compared the performance of multiple algorithms for a single gene, SERPINA1, where several point mutations are associated with the α1AT deficiency through loss- and gain-of-function mechanisms.
A comprehensive database of mutations and their various classifications is a vital interface to evaluate the actual performance of computational tools, and, at present, it seems that this has to be accomplished manually by experts in the biological field. We have compiled such a database for SERPINA1 and α1AT deficiency and have identified important issues affecting predictions, which are likely of general relevance to other genes and disorders. First, many mutations identified from literature review were not cross-referenced on publicly accessible databases. Mutations may be described in the literature in terms of nucleic acid changes or simply at the amino acid level. Conventional residue numbering in proteins generated from a precursor polypeptide (e.g., all secreted eukaryotic proteins) can diverge from the numbering of residues recommended by HGVS. Existing assignments of the pathogenicity of variants may be based on varying degrees of evidence. Initiatives within a field can, as here, produce a high-quality database of variants to cross-check with general databases and may be optimized for biomedical relevance. This can be of great use in the short term for those active in the field. However, without dedicated ongoing resources to maintain them, these inevitably become obsolete or lost.
We used well-supported pathogenic (P) and known or inferred benign variants (B) to calibrate and comparatively evaluate the performance of a large number of different algorithms designed for prediction of pathogenicity. The performance varied considerably among different algorithms suggesting that different predictors may be better suited for specific genes and hence diseases. A subset of the pathogenic variants for which the existing evidence strongly supported severe pathogenicity (P*) was used as an internal validator of the predictors’ performance. The distribution of scores for P* mutants tended to cluster for the best performing predictors at the end of the scale, indicating greatest pathogenic potential. None of the assessed algorithms identified as pathogenic the Pittsburgh variant whose mutation at the active site switches the antiprotease specificity from elastase to thrombin and Factor XIa. This is presumably because the mutated amino acid affects reactivity with the target protease rather than the protein structure.
As REVEL performed best in discriminating variants of known pathogenicity from those classified as benign, this tool was used to assess the pathogenic potential of all possible SERPINA1 variants and those observed in the ExAC population database. The majority of these were rare variants of unknown pathogenic potential, since they lack an experimental or clinical characterization. Most possible missense variants were predicted to be benign by the REVEL algorithm. However, several novel variants identified within the ExAC cohort achieved scores above the optimal REVEL threshold. Eight of these variants, along with two new ones predicted as benign and with the wild-type M and mutant Z as controls, were characterized in vitro using two distinct cellular models for α1AT deficiency. We determined the presence of extracellular and intracellular polymers of α1AT, the secreted levels, and antiprotease function. The approach performed well in positively predicting mutations associated with reduced secretion mainly due to intracellular polymerization, with varying degrees of similarity to the Z variant. Although the number of variants characterized in cellular models is limited, our observations show that the secretion defect correlates with distribution among REVEL clusters and support the hypothesis that the secretion levels might correlate with REVEL scores in a continuous manner. The secretion levels observed in our two cell models, though broadly similar, differ for some tested variants, likely reflecting differences among cell types in the proteostatic mechanisms (Fra, Yoboue, & Sitia, 2017; Sala et al., 2017).
The striking effects of the p.P313S, p.P279T, and p.G282R mutations can be rationalized in terms of the structure of native α1AT (Supp. Figure S6). The cluster 3 variations p.P313S and p.P279T both replace highly conserved, hydrophobic proline residues that terminate β-strands with small polar residues that are associated with a less rigid, less kinked backbone. Proline 313 is one of the most highly conserved residues across the serpin superfamily (∼96% conservation; Irving, Pike, Lesk, & Whisstock, 2000). It terminates strand 2 of β-sheet C (s2C) and interacts most directly with other similarly highly conserved residues (Supp. Figure S6B). Structurally, “latch” interactions between these residues constitute a hydrophobic interaction network stabilizing the flank region of the β-barrel formed by β-sheets B and C (Fra et al., 2012), which may nucleate folding of the metastable serpin native state (Tsutsui, Cruz, & Wintrode, 2012). The lateral part of this acts as a steric regulator of serpin conformational change and so is known as the “gate” region. In addition to such latch mutation effects, the p.P313S variation likely has additional effects upon the s2C terminus that may also affect folding and/or conformational stability. The p.P279T mutation affects the C-terminal residue of strand 3 of β-sheet B (s3B). It also arises at a sharp boundary between conserved hydrophobic and polar regions and so may make this boundary less biochemically distinct. The p.G282R variation replaces a small, flexible, uncharged residue with a bulky, polar residue within the turn between strand s3B and helix G. This amino acid substitution changes a DEGK sequence to DERK, in which two acidic residues are immediately followed by two basic residues. Its effects may therefore be mediated by abnormal interactions between these large charged residues and nearby structural elements (s3B and helix G). The DEGK amino acid stretch has low conservation among α1AT orthologs, which likely explains why the p.G282R mutation appeared as an outlier, with greater observed severity than its low REVEL score would suggest when compared with the correlations observed for other variants.
The REVEL algorithm and the clustering analysis therefore seem promising for identifying novel variants of interest for study in vitro and have potential for clinical utility. Caveats remain. Reduced secretion observed in a cell model cannot be directly translated into clinical significance and disease risk. Similarly, defining thresholds of polymer accumulation in cell models to stratify clinically significant risk of liver disease may be challenging. Interestingly, studies on Z variant homozygotes have indicated that genetic modifiers affect disease penetrance (Joly et al., 2017). Though the nature of these modifiers remains to be fully elucidated, it is possible that they might modify the intracellular handling of different SERPINA1 variants to different degrees. Thus, it remains challenging to translate predictions from algorithms like those assessed in this study into a stratified prediction of the risk of lung and/or liver disease.
Overall, the work presented here provides a detailed picture of genetic variability in SERPINA1 and represents a real-world example of the potential and challenges of using genomic data and computational approaches to evaluate the functional and clinical impact of new variants identified in genetic disorders.
ACKNOWLEDGMENTS
The authors thank Marco Lancini, Giovanni Sgrò, and Leonardo Lanfranchi (University of Brescia) for technical assistance.
CONFLICT OF INTEREST
The authors have nothing to declare.