Fast Fourier Transform-based Support Vector Machine for Prediction of G-protein Coupled Receptor Subfamilies
Yan-Zhi GUO
College of Chemistry, Sichuan University, Chengdu 610064, China
Meng-Long LI (Corresponding Author)
College of Chemistry, Sichuan University, Chengdu 610064, China
*Tel, 86-28-89005151; Fax, 86-28-85412356; E-mail, [email protected]
Ke-Long WANG
College of Chemistry, Sichuan University, Chengdu 610064, China
Zhi-Ning WEN
College of Chemistry, Sichuan University, Chengdu 610064, China
Min-Chun LU
College of Chemistry, Sichuan University, Chengdu 610064, China
Li-Xia LIU
College of Chemistry, Sichuan University, Chengdu 610064, China
Lin JIANG
College of Chemistry, Sichuan University, Chengdu 610064, China
Abstract
Although the sequence information on G-protein coupled receptors (GPCRs) continues to grow, many GPCRs remain orphaned (i.e. their ligand specificity is unknown) or poorly characterized, with little structural information available, so an automated and reliable method is urgently needed to facilitate the identification of novel receptors. In this study, a fast Fourier transform-based support vector machine method was developed for predicting GPCR subfamilies from protein hydrophobicity. In classifying Class B, C, D and F subfamilies, the method achieved an overall Matthews correlation coefficient of 0.95 and an overall accuracy of 93.3% when evaluated using the jackknife test. The method achieved an accuracy of 100% on the Class B independent dataset. The results show that this method can predict both the subfamily and the functional classification of GPCRs with high accuracy. A web server implementing the prediction is available at http://chem.scu.edu.cn/blast/Pred-GPCR.
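The feature-extraction idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which hydrophobicity scale, transform length or normalization was used, so the Kyte-Doolittle scale, the 512-point length, the mean removal and the names `power_spectrum_features` and `n_points` are all assumptions made for illustration. The sketch maps a sequence to a hydrophobicity signal, applies a radix-2 FFT, and returns a fixed-length power spectrum that could be passed to a support vector machine; a small helper computes the Matthews correlation coefficient from confusion-matrix counts.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum_features(seq, n_points=512):
    """Hypothetical helper: sequence -> n_points // 2 power-spectrum values.

    n_points must be a power of two. Residues outside the 20 standard
    amino acids are skipped; longer sequences are truncated (an assumption).
    """
    signal = [KD[aa] for aa in seq.upper() if aa in KD]
    if not signal:
        raise ValueError("sequence contains no standard amino acids")
    mean = sum(signal) / len(signal)
    signal = [v - mean for v in signal]               # remove the mean (DC) level
    signal = (signal + [0.0] * n_points)[:n_points]   # zero-pad or truncate
    spectrum = fft(signal)
    # Keep the non-redundant half of the spectrum of a real-valued signal.
    return [abs(c) ** 2 / n_points for c in spectrum[: n_points // 2]]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Because every sequence yields a vector of the same length regardless of how many residues it has, the vectors can be stacked into a feature matrix and used to train any standard SVM implementation. A jackknife (leave-one-out) test would then hold out each sequence in turn, train on the remainder, and pool the predictions before computing accuracy and the correlation coefficient.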
Edited by Lu-Hua LAI