Classification of conformational stability of protein mutants from 3D pseudo-folding graph representation of protein sequences using support vector machines
Corresponding Author
Michael Fernández
Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba
Julio Caballero
Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba
Centro de Bioinformática y Simulación Molecular, Universidad de Talca, 2 Norte 685, Casilla 721, Talca, Chile
Leyden Fernández
Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba
Jose Ignacio Abreu
Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba
Artificial Intelligence Lab, Faculty of Informatics, University of Matanzas, 44740 Matanzas, Cuba
Gianco Acosta
National Bioinformatics Center, 10200, Havana, Cuba
Abstract
This work reports a novel 3D pseudo-folding graph representation of protein sequences for modeling purposes. Amino acid Euclidean distance matrices (EDMs) encode primary structural information. Amino Acid Pseudo-Folding 3D Distances Count (AAp3DC) descriptors, calculated from the EDMs of a large data set of 1363 single protein mutants of 64 proteins, were tested for building a classifier for the sign of the change in thermal unfolding Gibbs free energy (ΔΔG) upon single mutations. An optimized support vector machine (SVM) with a radial basis function (RBF) kernel recognized stable and unstable mutants with accuracies over 70% in cross-validation tests. To the best of our knowledge, this result for stable mutant recognition is the highest ever reported for a sequence-based predictor with more than 1000 mutants. Furthermore, the model adequately classified mutations of human prion protein and human transthyretin that are associated with diseases. Proteins 2008. © 2007 Wiley-Liss, Inc.
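As an illustration only, and not the authors' implementation, the two generic building blocks named in the abstract can be sketched in a few lines: a Euclidean distance matrix computed from 3D points, and an RBF kernel comparing two descriptor vectors. The coordinates below are made-up toy values and the function names are hypothetical. The pseudo-folding rule that places residues in 3D and the exact AAp3DC definition are not given in this abstract, so neither is reproduced here.

```python
import math


def euclidean_distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    `coords` is a sequence of equal-length coordinate tuples, one per node
    (for example, one 3D point per residue of a pseudo-folded sequence).
    """
    return [[math.dist(p, q) for q in coords] for p in coords]


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Toy 3D points (assumed values, not taken from the paper).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
edm = euclidean_distance_matrix(toy_coords)

# Two hypothetical descriptor vectors compared with an assumed gamma.
similarity = rbf_kernel((1.0, 0.0, 2.0), (1.0, 1.0, 2.0), gamma=0.5)
```

The kernel value is 1 for identical vectors and decays toward 0 as they move apart; `gamma` controls how fast, and in practice it is tuned together with the SVM regularization parameter during cross-validation.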
Supporting Information
This article contains supplementary material available via the Internet at http://www.interscience.wiley.com/jpages/0887-3585/suppmat
Filename | Description |
---|---|
jws-prot.21524.pdf (32.4 KB) | Supporting Information file jws-prot.21524.pdf |
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.