Proteins: Structure, Function, and Bioinformatics

Research Article

Prediction of RNA-binding residues in proteins from primary sequence using an enriched random forest model with a novel hybrid feature

Xin Ma

State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, People's Republic of China

Department of Elementary Courses, Golden Audit College, Nanjing Audit University, Nanjing 210029, People's Republic of China

Jing Guo

State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, People's Republic of China

Jiansheng Wu

Department of Bioinformatics, School of Geography and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210046, People's Republic of China

Hongde Liu

State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, People's Republic of China

Jiafeng Yu

State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, People's Republic of China

Jianming Xie

State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, People's Republic of China

Corresponding Author

Xiao Sun

[email protected]

State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, People's Republic of China

State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, People's Republic of China

First published: 06 December 2010

https://doi.org/10.1002/prot.22958

Citations: 57

Abstract

The identification of RNA-binding residues in proteins is important in several areas, including protein function, posttranscriptional regulation, and drug design. We have developed PRBR (Prediction of RNA Binding Residues), a novel method for identifying RNA-binding residues from amino acid sequences. Our method combines a hybrid feature with the enriched random forest (ERF) algorithm. The hybrid feature is composed of predicted secondary structure information and three novel features: evolutionary information combined with conservation information of the physicochemical properties of amino acids, and the information about dependency of amino acids with regard to polarity-charge and hydrophobicity in the protein sequences. Our results demonstrate that the PRBR model achieves a Matthews correlation coefficient (MCC) of 0.5637 and an overall accuracy (ACC) of 88.63%, with 53.70% sensitivity (SE) and 96.97% specificity (SP). By comparing the performance of each feature, we found that all three novel features contribute to the improved predictions. Area under the curve (AUC) statistics from receiver operating characteristic curve analysis were compared between the PRBR model and other models. The results show that PRBR achieves the highest AUC value (0.8675), indicating that PRBR attains excellent performance in predicting RNA-binding residues in proteins. The PRBR web-server implementation is freely available at http://www.cbi.seu.edu.cn/PRBR/. Proteins 2011; © 2011 Wiley-Liss, Inc.
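
The abstract reports MCC, ACC, SE, SP, and AUC without spelling out how they are computed. The following is a minimal sketch of the standard definitions of these measures, assuming a binary labelling in which 1 marks an RNA-binding residue and 0 a non-binding residue. The function names and the example counts are hypothetical and are not taken from the paper or from the PRBR implementation.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts.

    Assumes at least one positive and one negative example, so the SE and SP
    denominators are non-zero.
    """
    se = tp / (tp + fn)                      # fraction of binding residues recovered
    sp = tn / (tn + fp)                      # fraction of non-binding residues recovered
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of residues; meant for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    print(binary_metrics(tp=40, fp=5, tn=45, fn=10))
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Because binding residues are typically far fewer than non-binding ones, accuracy alone can look high even when sensitivity is modest, which is why MCC and AUC are reported alongside it.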

Volume 79, Issue 4

April 2011

Pages 1230-1239
