Volume 70, Issue 1 pp. 167-175
Research Article

Classification of conformational stability of protein mutants from 3D pseudo-folding graph representation of protein sequences using support vector machines

Michael Fernández

Corresponding Author

Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba

Julio Caballero

Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba

Centro de Bioinformática y Simulación Molecular, Universidad de Talca, 2 Norte 685, Casilla 721, Talca, Chile

Leyden Fernández

Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba

Jose Ignacio Abreu

Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba

Artificial Intelligence Lab, Faculty of Informatics, University of Matanzas, 44740 Matanzas, Cuba

Gianco Acosta

National Bioinformatics Center, 10200, Havana, Cuba

First published: 24 July 2007
Citations: 20

Abstract

This work reports a novel 3D pseudo-folding graph representation of protein sequences for modeling purposes. Amino acid Euclidean distance matrices (EDMs) encode primary structural information. Amino Acid Pseudo-Folding 3D Distances Count (AAp3DC) descriptors, calculated from the EDMs of a large data set of 1363 single mutants of 64 proteins, were tested for building a classifier for the sign of the change in thermal unfolding Gibbs free energy (ΔΔG) upon single mutation. An optimum support vector machine (SVM) with a radial basis function (RBF) kernel recognized stable and unstable mutants with accuracies over 70% in cross-validation tests. To the best of our knowledge, this result for stable mutant recognition is the highest reported for a sequence-based predictor with more than 1000 mutants. Furthermore, the model adequately classified disease-associated mutations of human prion protein and human transthyretin. Proteins 2008. © 2007 Wiley-Liss, Inc.
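To make the idea of a Euclidean distance matrix concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state how residues are placed in the 3D pseudo-folding graph or how the AAp3DC descriptors are counted, so the coordinates below are placeholders and the `distance_counts` binning is purely an illustrative assumption. Both function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean_distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    edm = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = math.dist(coords[i], coords[j])
        edm[i][j] = edm[j][i] = d
    return edm


def distance_counts(edm, n_bins):
    """Count residue pairs (i < j) whose distance d satisfies k <= d < k + 1.

    Assumption: unit-width bins are used only for illustration; this is
    not the AAp3DC definition, which the abstract does not give.
    Pairs with d >= n_bins are ignored.
    """
    counts = [0] * n_bins
    n = len(edm)
    for i, j in combinations(range(n), 2):
        k = int(edm[i][j])  # floor for non-negative distances
        if k < n_bins:
            counts[k] += 1
    return counts


# Placeholder coordinates for a four-residue sequence (hypothetical values).
coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
edm = euclidean_distance_matrix(coords)
counts = distance_counts(edm, n_bins=3)
```

A fixed-length vector such as `counts` is the kind of input that a classifier such as an RBF-kernel SVM could be trained on, with the sign of ΔΔG as the class label.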
