Volume 79, Issue 6 pp. 1952-1963
Research Article

Structure-based identification of catalytic residues

Ran Yahalom

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

Dan Reshef

Department of Life Sciences, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

Ayana Wiener

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

Sagiv Frankel

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

Nir Kalisman

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

Boaz Lerner

Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

Chen Keasar

Corresponding Author

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

Department of Life Sciences, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

Departments of Life Sciences and Computer Science, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 84105, Israel
First published: 01 March 2011
Citations: 10

Author contributions: R.Y. developed the specialized SVM, ran all the performance tests, and drafted the manuscript. D.R. characterized the new structural features, demonstrated their utility, and helped to draft the manuscript. A.W. and S.F. built the web server. N.K. developed the energy functions that underlie the structural features used in this project. B.L. helped in the design of this study, supervised its machine learning aspects, and took an active part in writing the manuscript. C.K. conceived this study, coordinated it, and took an active part in writing the manuscript. All authors read and approved the final manuscript.

Abstract

The identification of catalytic residues is an essential step in the functional characterization of enzymes. We present a purely structural approach to this problem, motivated by the difficulty that evolution-based methods have in annotating structural genomics targets with few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and is comparable to classifiers that use both structural and evolutionary features. In addition to evaluating the performance of catalytic residue identification, we present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, the database we designed, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction. Proteins 2011; © 2011 Wiley-Liss, Inc.
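
As an illustration only, the feature augmentation and two of the class-imbalance measures named in the abstract might be sketched as follows. This is not the authors' implementation: the function names, the neighborhood radius, the 1:1 sampling ratio, and the use of the Matthews correlation coefficient (MCC) as the criterion that weighs both error types are all assumptions, since the abstract does not specify them. Measure (3), class-dependent error penalties during SVM training, requires an SVM library and is not shown.

```python
import math
import random


def spatial_average(coords, values, radius=10.0):
    """Average a per-residue feature over all residues within `radius` (self included).

    The radius is an assumed parameter, not a value taken from the paper.
    """
    averaged = []
    for ci in coords:
        near = [v for cj, v in zip(coords, values) if math.dist(ci, cj) <= radius]
        averaged.append(sum(near) / len(near))
    return averaged


def z_scores(values):
    """Standardize values within one protein: (x - mean) / population std."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def undersample(labels, ratio=1.0, seed=0):
    """Keep every catalytic residue (label 1) and a random subset of the others.

    Returns sorted indices with about `ratio` non-catalytic residues per
    catalytic residue. The ratio is an assumed parameter.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(neg), int(round(ratio * len(pos))))
    return sorted(pos + random.Random(seed).sample(neg, keep))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, which reflects both false positives
    (Type I errors) and false negatives (Type II errors)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data: three residues with hypothetical coordinates and raw feature values.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
raw = [1.0, 3.0, 8.0]
smoothed = spatial_average(coords, raw, radius=5.0)
standardized = z_scores(smoothed)
train_idx = undersample([1, 0, 0, 0, 0, 1, 0, 0], ratio=1.0)
```

In this sketch the Z scoring is done per protein, so that a residue's score expresses how unusual its smoothed feature value is relative to the rest of the same structure; whether the paper normalizes per protein or over the whole dataset is not stated in the abstract.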
