Volume 89, Issue 12 pp. 1734-1751
RESEARCH ARTICLE

Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14

Wei Zheng

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Yang Li

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Chengxin Zhang

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Xiaogen Zhou

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Robin Pearce

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Eric W. Bell

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Xiaoqiang Huang

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Yang Zhang

Corresponding Author

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA

Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA

Correspondence

Yang Zhang, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA.

Email: [email protected]

First published: 30 July 2021
Citations: 28

Wei Zheng, Yang Li, and Chengxin Zhang contributed equally to this work.

Funding information: National Institute of Allergy and Infectious Diseases, Grant/Award Number: AI134678; National Institute of General Medical Sciences, Grant/Award Numbers: GM136422, S10OD026825; National Science Foundation, Grant/Award Numbers: DBI2030790, IIS1901191, MTM2025426, ACI-1548562

Abstract

In this article, we report the 3D structure prediction results of two of our best-performing server groups (“Zhang-Server” and “QUARK”) in CASP14. The two servers were built on the D-I-TASSER and D-QUARK algorithms, which integrate four newly developed components into the classical protein folding pipelines I-TASSER and QUARK, respectively. The new components are: (a) DeepMSA2, a new multiple sequence alignment (MSA) collection tool extended from the DeepMSA program; (b) FUpred, a contact-based algorithm for predicting protein domain boundaries; (c) DeepPotential, a residual convolutional neural network-based method that predicts multiple spatial restraints from co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials that guide the structure assembly simulations. For 37 free-modeling (FM) targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those of the models constructed by I-TASSER and QUARK, respectively. Data analysis indicates noticeable improvements from each of the four new components, especially the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges remain in the current pipelines. These include difficulties in modeling multi-domain proteins, due to the low accuracy of inter-domain distance prediction, and in modeling protein domains from oligomeric complexes, as co-evolutionary analysis cannot distinguish inter-chain from intra-chain distances. Tuning the deep learning-based predictors specifically for multi-domain targets and protein complexes may help address these issues.
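The comparisons above are expressed in TM-score, which the abstract does not define. As a minimal sketch (not code from this study), the functions below evaluate the standard TM-score sum, TM = (1/L) Σ 1/(1 + (d_i/d0)^2) with d0 = 1.24(L - 15)^(1/3) - 1.8, for one fixed superposition only; the real TM-score additionally maximizes this value over superpositions, which is omitted here. The function names (`tm_d0`, `tm_score_fixed`, `relative_improvement`) and the 0.5 Å floor for short chains are illustrative assumptions. The last helper shows how a relative figure such as "96% higher" follows from two averages.

```python
import math
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Distance scale d0 (angstroms) used by TM-score for a target of given length.

    Assumption: chains of 21 residues or fewer use a floor of 0.5 A, and the
    formula result is also floored at 0.5 A, mirroring common TM-score practice.
    """
    if target_length <= 21:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition (hypothetical helper).

    `distances` holds model-to-native distances (angstroms) of aligned residue
    pairs after superposition. The full TM-score maximizes this value over all
    superpositions; that search is not implemented here.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = tm_d0(target_length)
    # Each aligned pair contributes between 0 (far apart) and 1 (d = 0).
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalizing by the target length penalizes unaligned residues.
    return total / target_length


def relative_improvement(new_avg: float, old_avg: float) -> float:
    """Percentage by which new_avg exceeds old_avg."""
    if math.isclose(old_avg, 0.0):
        raise ValueError("old_avg must be non-zero")
    return 100.0 * (new_avg - old_avg) / old_avg


if __name__ == "__main__":
    # Hypothetical inputs, not data from CASP14.
    print(tm_score_fixed([0.0] * 100, 100))  # every residue aligned at d = 0
    print(round(relative_improvement(0.49, 0.25), 1))
```

Averaging per-target TM-scores over the first models of a target set, and passing two such averages to `relative_improvement`, gives the kind of percentage comparison reported above. Dividing by the target length L, rather than by the number of aligned pairs, is what makes partially aligned models score lower.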

DATA AVAILABILITY STATEMENT

Data sharing is not applicable to this article as no new data were created or analyzed in this study.
