Volume 61, Issue 3 pp. 481-491
Research Article

Prediction and evolutionary information analysis of protein solvent accessibility using multiple linear regression

Jung-Ying Wang

Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan

Department of Multimedia and Game Science, Lunghwa University of Science and Technology, Taoyuan, Taiwan

Hahn-Ming Lee

Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan

Shandar Ahmad

Corresponding Author

Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka-ken, Japan

Department of Bioscience, Jamia Millia Islamia University, New Delhi, India

Kyushu Institute of Technology, Bioscience and Bioinformatics, Iizuka, Fukuoka-ken, 820-8502, Japan
First published: 16 September 2005
Citations: 28

Abstract

A multiple linear regression method was applied to predict real values of solvent accessibility from sequence and evolutionary information. The method yields regression coefficients and correlations that relate the occurrence of an amino acid residue at a target position, and at its neighboring sequence positions, to the solvent accessibility of the target residue. Linear regression models based on sequence information and on evolutionary information predicted residue accessibility with mean absolute errors of 18.9% and 16.2%, respectively, which is better than or comparable to the best available methods. To examine the role of evolutionary information at neighboring positions, a correlation matrix covering several neighbor positions was developed and analyzed. As expected, the effective frequency of hydrophobic residues at target positions shows a strong negative correlation with solvent accessibility, whereas the reverse is true for charged and polar residues. The correlation of solvent accessibility with effective frequencies at neighboring positions falls abruptly with distance from the target residue. Longer protein chains were found to be predicted more accurately than shorter ones. Proteins 2005. © 2005 Wiley-Liss, Inc.
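The general idea of regressing real-valued accessibility on a residue and its sequence neighbors can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the one-hot window encoding, the `half_window` parameter, the function names, and the synthetic sequence and accessibility values. An evolutionary variant would presumably replace the one-hot indicators with per-position profile frequencies, which is not shown.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_windows(sequence, half_window=2):
    """One-hot encode each residue together with its sequence neighbors.

    Hypothetical encoding: one block of 20 indicators per window position,
    plus a trailing constant column for the intercept. Window positions that
    fall outside the chain are left as zeros.
    """
    width = 2 * half_window + 1
    n = len(sequence)
    features = np.zeros((n, width * len(AMINO_ACIDS) + 1))
    features[:, -1] = 1.0  # intercept column
    for pos in range(n):
        for offset in range(-half_window, half_window + 1):
            neighbor = pos + offset
            if 0 <= neighbor < n:
                slot = offset + half_window
                column = slot * len(AMINO_ACIDS) + AA_INDEX[sequence[neighbor]]
                features[pos, column] = 1.0
    return features


def fit_linear_model(features, accessibility):
    """Ordinary least squares fit; returns one coefficient per feature column."""
    coeffs, *_ = np.linalg.lstsq(features, accessibility, rcond=None)
    return coeffs


def mean_absolute_error(predicted, observed):
    """Mean absolute difference between predicted and observed values."""
    return float(np.mean(np.abs(predicted - observed)))


# Synthetic demonstration only: a random sequence and made-up accessibility
# values generated from a known linear model. These are NOT data from the article.
rng = np.random.default_rng(0)
demo_sequence = "".join(rng.choice(list(AMINO_ACIDS), size=300))
demo_features = encode_windows(demo_sequence, half_window=2)
true_coeffs = rng.uniform(-10.0, 10.0, size=demo_features.shape[1])
demo_accessibility = demo_features @ true_coeffs

fitted = fit_linear_model(demo_features, demo_accessibility)
demo_mae = mean_absolute_error(demo_features @ fitted, demo_accessibility)
print(f"MAE on synthetic training data: {demo_mae:.3e}")
```

Because the synthetic targets are generated from a linear model without noise, the fit reproduces them almost exactly. On real data the error would be evaluated on held-out chains, and the fitted coefficients for each window position could be inspected to see how much each neighbor contributes.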
