REVIEW ARTICLE

Open Access

Deep learning methods for protein structure prediction

Yiming Qin

Faculty of Medicine, Macau University of Science and Technology, Avenida Wai Long Taipa, Macau, China

Contribution: Conceptualization (equal), Data curation (equal), Investigation (equal), Visualization (equal), Writing - original draft (equal), Writing - review & editing (equal)

Zihan Chen

Faculty of Medicine, Macau University of Science and Technology, Avenida Wai Long Taipa, Macau, China

Contribution: Writing - review & editing (equal)

Ye Peng

Faculty of Medicine, Macau University of Science and Technology, Avenida Wai Long Taipa, Macau, China

Contribution: Writing - review & editing (equal)

Ying Xiao

Faculty of Medicine, Macau University of Science and Technology, Avenida Wai Long Taipa, Macau, China

Contribution: Supervision (equal), Writing - review & editing (equal)

Corresponding Author

Tian Zhong

[email protected]

Faculty of Medicine, Macau University of Science and Technology, Avenida Wai Long Taipa, Macau, China

Correspondence Tian Zhong and Xi Yu, Nutrition and Food Science Program, Faculty of Medicine, Macau University of Science and Technology, Avenida Wai Long Taipa, Macau, China.

Email: [email protected] and [email protected]

Contribution: Resources (equal), Supervision (equal), Writing - review & editing (equal)

Corresponding Author

Xi Yu

[email protected]

orcid.org/0000-0001-5726-2392

Faculty of Medicine, Macau University of Science and Technology, Avenida Wai Long Taipa, Macau, China

Contribution: Conceptualization (equal), Funding acquisition (equal), Project administration (equal), Supervision (equal)

First published: 23 September 2024

https://doi.org/10.1002/mef2.96

Citations: 4

Abstract

Protein structure prediction (PSP) has been a prominent topic in bioinformatics and computational biology, aiming to predict protein function and structure from sequence data. The three-dimensional conformation of proteins is pivotal for their intricate biological roles. With the advancement of computational capabilities and the adoption of deep learning (DL) technologies (especially Transformer network architectures), the PSP field has entered a new era of “neuralization.” Here, we review the evolution of PSP from traditional to modern deep learning-based approaches and the characteristics of various structural prediction methods, emphasizing the advantages of deep learning-based hybrid prediction methods over traditional approaches. We also summarize widely used bioinformatics databases and the latest structure prediction models, and we discuss deep learning networks and algorithmic optimization for model training, validation, and evaluation. In addition, we summarize the major advances in deep learning-based protein structure prediction. The release of AlphaFold 3 further extends the boundaries of prediction models, especially in protein-small molecule structure prediction. This marks a key shift toward a holistic approach in biomolecular structure elucidation, aiming at solving almost all sequence-to-structure puzzles in various biological phenomena.

1 INTRODUCTION

Proteins are the cornerstone of cellular function, coordinating the biological processes of all life forms. From enzyme catalysis to immune system responses, proteins play ubiquitous roles. The complexity of protein structures is reflected in their three-dimensional conformations, which determine their functional roles and mechanisms of action. Therefore, accurate prediction of protein structure helps us to gain insight into the function of proteins as well as the basic mechanisms of life and provides an essential foundation for drug design,¹ disease treatment,² antibody design,³ and synthetic biology.⁴

Anfinsen's research⁵ established that the native structure of a protein is determined solely by its amino acid sequence, and understanding this sequence-to-structure relationship has since become a key area of study. Advances in experimental structural biology techniques, such as X-ray crystallography,⁶ nuclear magnetic resonance (NMR),⁷ and cryo-electron microscopy (cryo-EM),⁸ have enabled the determination of high-resolution, high-quality protein structures. Despite these developments, a substantial “structural knowledge gap” remains because these techniques are costly and time-intensive. In addition, interpreting the resulting data requires extensive expertise,⁹ and the steadily growing diversity of protein sequences and structures further increases the complexity of prediction. Therefore, knowledge-based computational methods have emerged to help bridge the gap left by experimental methods and to advance the field of protein structure prediction.

Artificial intelligence technologies, particularly machine learning, have developed rapidly in recent years. Against this backdrop, deep learning methods have been widely employed in PSP because of their high-precision predictions and ability to handle nonhomologous proteins (Figure 1).¹⁰ Using an innovative Transformer network architecture, DeepMind's AlphaFold 2 has revolutionized the accuracy standard of PSP. Moreover, AlphaFold 2 has been used to predict 98.5% of human protein structures,¹¹ making precise predictions of protein functions and RNA structures possible.¹² Recently, AlphaFold 3 further pushed the boundaries of PSP.¹³ It replaces the structure module with a diffusion module that generates raw atomic coordinates directly and streamlines the network trunk into the Pairformer module, enabling structure prediction for biomolecular complexes that include proteins, nucleic acids, small molecules, ions, and modified residues.¹⁴ Deep learning, as a powerful computational tool, is thus changing the research landscape of protein structure prediction. As deep learning technology continues to progress, the accuracy and application scope of protein structure prediction will keep expanding, bringing more possibilities for biological research and drug development.¹⁵

Figure 1. Overview of traditional and modern deep learning methods for protein structure prediction.

Here, we present the cutting-edge methodologies in protein structure prediction facilitated by deep learning technologies and their synergistic integration with other key techniques and resources. We focus on the evolution and distinctive features of both traditional methods and contemporary deep learning models in this field. Our review encompasses the architecture of several pivotal deep neural networks (DNNs), including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and transformer models. We discuss the successful application of deep learning to protein structure prediction, especially highlighting the diversity of model applications and their features following the introduction of AlphaFold2. Additionally, we address the challenges these advanced technologies face in complex biological scenarios and provide a balanced perspective that outlines how deep learning methods have improved the accuracy and efficiency of protein structure prediction, thus aiming to contribute to accelerated progress in the field.

2 BASIS OF PROTEIN STRUCTURE

Structure prediction involves extrapolating complex structural patterns from basic structural units through mathematical and computational methods. Its core goal is to predict higher-level structures by analyzing and modeling the basic components (e.g., atoms, molecules, or amino acid sequences) to understand their physical properties, chemical properties, or biological functions.¹⁶ In biology, protein structure prediction is an important branch of structure prediction and occupies a key position in computational biology. Protein structure prediction methods aim to bridge the sequence-structure gap by inferring the three-dimensional conformation of a protein from its amino acid sequence.¹⁷ These methods aim to provide accurate models of protein folding without relying on expensive and time-consuming experimental techniques.¹⁸ Because of the inherent complexity of the multilevel structure of proteins, an in-depth understanding and systematic analysis of their basic structural units is essential for end-to-end structure prediction.

2.1 Hierarchy of protein structures

Proteins are biomolecules composed of linear sequences of amino acids whose functional diversity and biological activity are determined by their specific three-dimensional structures. These three-dimensional structures are formed from the amino acid residues of the primary sequence through a complex folding process. From the amino acid sequence of the primary structure, to the local folding pattern of the secondary structure, to the overall spatial conformation of the tertiary structure, up to the multisubunit assembly of the quaternary structure, each level of structure has a profound impact on the function and properties of proteins.¹⁹ This hierarchy reflects conservation during evolution and provides a multilevel source of information for structure prediction.

The amino acid sequence constitutes the primary structure, which is the blueprint for protein structure. The genetic code determines the uniqueness of the amino acid sequence, and this sequence determines the basic chemical properties and potential folding ability of the protein. However, the relationship between sequence and structure is not a simple one-to-one correspondence.²⁰ Multiple sequence alignment (MSA) at the primary structure level reveals conserved regions and coevolutionary information in protein sequences, which helps in understanding the evolutionary relationships, functional properties, and structural stability of proteins.²¹

Secondary structure refers to the local folding of a protein's amino acid chain, primarily stabilized by hydrogen bonds. It mainly includes α-helices (H), β-strands (E), β-turns (T), and random coils (C). The secondary structure of proteins is closely related to protein evolution and function.

The tertiary structure represents the overall three-dimensional conformation of the protein, which is stabilized by hydrophobic interactions, ionic bonds, hydrogen bonds, and van der Waals forces. It determines the functional regions of the protein, such as enzyme active sites and ligand-binding sites, and affects the protein's function in the cell.²² For example, cryo-EM analysis reveals that the core DNA-binding domain of the tumor suppressor protein p53 binds specifically, as a tetramer, to p53 binding site (p53BS) DNA containing 20 base pairs; this binding strips the DNA from the histone surface and significantly alters the path of the DNA through the nucleosome, thereby activating the p21 gene. Similarly, cGMP binding induces a conformational transition in PKG, activating its kinase center so that it phosphorylates the substrate.²³

Quaternary structures are protein complexes consisting of multiple polypeptide chains (subunits). Quaternary structure prediction is key to understanding complex biological systems. One study²⁴ predicted the structures of more than 500,000 protein complexes using AlphaFold-Multimer, with 70% of the predicted structures having a root-mean-square deviation (RMSD) of less than 4 Å from the experimental structures. This breakthrough opens new avenues for systems biology and cellular network research. The method was further applied to membrane protein complex prediction, achieving an average TM-score of 0.82 for GPCR family proteins and providing a new perspective on membrane protein drug design.²⁵
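
The RMSD and TM-score values quoted for predicted complexes can be made concrete with a short sketch. It assumes the two structures are already superimposed and that residues correspond one-to-one; real tools also search for the optimal superposition, which is omitted here. The function names are illustrative, and the 0.5 Å floor on d0 for short chains is a simplifying assumption.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def tm_score(coords_model, coords_native):
    """TM-score for one fixed superposition, normalized by the native length."""
    n = len(coords_native)
    if len(coords_model) != n or n == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    # Assumption: clamp d0 to 0.5 for short chains to keep it positive.
    d0 = max(1.24 * (n - 15) ** (1.0 / 3.0) - 1.8, 0.5) if n > 15 else 0.5
    terms = (1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
             for p, q in zip(coords_model, coords_native))
    return sum(terms) / n


if __name__ == "__main__":
    # Hypothetical C-alpha trace: 100 points spaced 3.8 A apart along the x axis.
    native = [(3.8 * i, 0.0, 0.0) for i in range(100)]
    model = [(x + 1.0, y, z) for x, y, z in native]  # uniform 1 A displacement
    print(f"RMSD = {rmsd(model, native):.2f} A, TM-score = {tm_score(model, native):.3f}")
```

Because each residue contributes at most 1 to the sum, the TM-score lies between 0 and 1 and, unlike RMSD, is not dominated by a few badly placed residues.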

2.2 Accurate prediction of structure

Accurate protein structure prediction is critical for decoding the functional mechanisms of biomolecules; it embodies the core “structure-function” paradigm of molecular biology and enhances our understanding of life processes.¹⁹ High-precision structure prediction not only elucidates the molecular basis of protein function but also provides key insights in areas such as drug design, disease mechanism studies, and protein engineering. Previous work²⁶ shows that high-precision structure prediction can significantly accelerate new drug development. For example, the full-length spike protein structure of the SARS-CoV-2 virus was predicted using the CI-TASSER model with an average template modeling score (TM-score) of 0.9 or more,²⁷ and the high-precision predicted structures were used for virtual screening to rapidly identify potential inhibitors and accelerate the development of antiviral drugs. The accuracy of protein structure prediction is affected by several factors, including the richness of sequence information, the availability of homologous protein structures, the performance of the prediction algorithm, and computational resources.²⁸ A notable limiting factor is that homologous proteins, shaped by evolution, can vary considerably in sequence while maintaining similar structures.²⁹ With the rapid development of genome sequencing technology, the growing number of available homologous sequences has significantly improved structure prediction accuracy.

2.3 Traditional methods of protein structure prediction

Traditional protein structure prediction methods combine known structural templates, sequence homology analyses, and biophysical principles to infer the 3D structure of a target protein using fine-grained computational models and algorithms (Figure 2). Depending on whether known protein structures are used as templates, PSP methods fall into two categories: template-based modeling (TBM)³⁰ and template-free modeling (TFM).³¹ Both rely on computational modeling to predict the structure of proteins. TBM, which includes homology modeling,³² fold recognition,³³ and comparative modeling,³¹ uses known homologous proteins to guide the structure prediction of target proteins. For example, one study³⁴ successfully predicted the three-dimensional structures of fungal effector proteins using a template-based modeling approach.

Homology modeling, or homology-based modeling, builds a 3D structural model of the target protein by aligning its sequence with the amino acid sequence of a template protein, using tools such as SWISS-MODEL³⁵ and HHpred.³⁶ The underlying assumption is that proteins with similar sequences are evolutionarily related and therefore share similar structures. After template proteins with sequences similar to the target are found with sequence comparison tools such as BLAST,³⁷ the target sequence is mapped onto the template structure through alignment adjustment and molecular modeling, and the final 3D structural model is obtained through structural optimization and energy minimization. However, the effectiveness of this method is limited by the sequence identity to the known structural template: the target usually needs more than 30% sequence identity with the template for the predicted structure to approach experimental accuracy.³⁸ In addition, its accuracy is limited by factors such as the size of the template protein database³⁹ and the difficulty of aligning distantly related sequences.⁴⁰

Fold recognition, or template matching, is suitable for structure prediction when sequence similarity between the target and the template is low.⁴¹ Unlike homology modeling, fold recognition operates at the level of 3D structure: it evaluates how well the target sequence fits known structures in the database and thereby identifies folds that the target sequence may adopt. This approach uses physical and statistical energy functions to assess the fitness of the target sequence on each known template structure, considering factors such as the spatial positions of the amino acid side chains and the conservation of the core structure.⁴² Although it has significant advantages over homology modeling when sequence similarity is low, it is still limited by the quality and size of the template library and by the applicability of the predicted structure.

Comparative modeling, as a comprehensive prediction method, applies a wider range of template selection criteria. It generates a 3D structural model of the target protein by comparing the target sequence with multiple templates and synthesizing the structural information from them.⁴³ By integrating information from various templates, comparative modeling makes up for the shortcomings of single-template modeling and thus improves structure prediction accuracy.
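
The sequence identity criterion mentioned for homology modeling can be illustrated with a minimal sketch. It assumes a pairwise alignment has already been produced by an external tool, that gaps are written as "-", and that identity is counted over aligned (non-gap) columns, which is only one of several conventions in use. The function names and the default 30% cutoff argument are illustrative, not part of any specific tool.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip gapped columns
        aligned_columns += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned_columns if aligned_columns else 0.0


def passes_identity_threshold(identity, threshold=30.0):
    """Rule-of-thumb check: is the template above the identity cutoff?"""
    return identity > threshold


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    target = "ACDE-FG"
    template = "ACDQWFG"
    identity = percent_identity(target, template)
    print(f"{identity:.1f}% identity, usable template: {passes_identity_threshold(identity)}")
```

Templates that fall below the cutoff are the cases where fold recognition or template-free methods become the more appropriate choice.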

TBM relies on known homologous proteins as templates, allowing relatively accurate prediction of target protein structures. However, its primary limitation is the inability to predict the structures of newly discovered proteins that lack homologous templates. TFM (ab initio modeling), on the other hand, can predict protein structures without any template structure.⁴⁴ In principle this makes it applicable to any protein, but usually with lower accuracy. Its main implementation techniques include fragment-based assembly³¹ and ab initio or de novo folding methods.⁴⁵ In CASP7, I-TASSER, using a TFM approach, generated correct topologies (TM-score >0.5) for 7 of 19 free modeling targets (about one third).⁴⁶ However, these sequences were limited to 155 residues in length, and the average RMSD in the aligned region was 13.5 Å, significantly greater than the average RMSD of 5.0 Å achieved by the template-based approach. The two TFM strategies, fragment-based assembly⁴⁷ and ab initio or de novo folding,⁴⁸ differ in how they explore conformations.

The fragment-based assembly approach exploits the conservation of local protein structure: it segments the target sequence into many short fragments, searches a library of known structural fragments for similar pieces, and assembles them into a 3D structure. This method relies on a rich fragment library, and the structural model with the lowest energy is finally selected after multiple rounds of simulation and reassembly of the short fragments. A representative algorithm, Rosetta, assembles and optimizes structures by conformational sampling and energy minimization using a Monte Carlo strategy.⁴⁹ However, this approach still needs improvement in the efficiency and accuracy of conformational sampling.⁵⁰ As protein size increases beyond about 100 amino acid residues, the conformational space grows exponentially, and long-range interactions become difficult to capture accurately with short fragments.⁵¹

Ab initio folding represents an important branch of protein structure prediction. The core idea is to predict the three-dimensional conformation of a protein based entirely on physicochemical principles, without relying on known structural templates.⁵² The method explores the energy landscape through molecular dynamics (MD) simulations, which integrate Newton's equations of motion to simulate the motion of atoms over time, or through Monte Carlo simulations, which use random sampling to search for the lowest-energy state among all possible conformations of the target sequence; the resulting conformation is taken as the predicted 3D structure.⁵³ This process involves complex energy function calculations that account for van der Waals forces, hydrogen bonding, and electrostatic and hydrophobic interactions.
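
The Monte Carlo search for a low-energy conformation can be sketched with the Metropolis acceptance rule. This is a toy illustration, not Rosetta's protocol: the conformation is reduced to a list of torsion angles, and `toy_energy` is a made-up function in which every angle prefers a single value. All names and parameter values are assumptions chosen for readability.

```python
import math
import random

PREFERRED_ANGLE = -math.pi / 3  # hypothetical preferred torsion angle (radians)


def toy_energy(angles):
    """Made-up energy: zero when every angle equals PREFERRED_ANGLE (not a force field)."""
    return sum(1.0 - math.cos(a - PREFERRED_ANGLE) for a in angles)


def metropolis(energy_fn, start, steps=5000, step_size=0.3, temperature=0.5, seed=0):
    """Metropolis Monte Carlo: perturb one angle per step, accept by the Boltzmann rule."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy_fn(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        e_trial = energy_fn(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


if __name__ == "__main__":
    start = [math.pi] * 5
    best, e_best = metropolis(toy_energy, start)
    print(f"start energy = {toy_energy(start):.3f}, best energy found = {e_best:.3f}")
```

Occasionally accepting uphill moves is what lets the search escape local minima; with a realistic energy function and thousands of degrees of freedom, the same loop becomes far more expensive, which is the sampling bottleneck described above.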

Traditional protein structure prediction methods, developed over decades, have resulted in a mature technical system that integrates sequence analysis, biophysical modeling, and computational algorithms. Despite their success in several respects, they show limitations in predicting the structures of new nonhomologous proteins or proteins with no obvious similar templates. With the significant increase in computational power and the continuous evolution of algorithms, especially the incorporation of deep learning techniques, traditional methods are gradually being transformed into a more integrated, data-driven approach to structure prediction at the levels of functionality, dynamics, and interoperability, bringing an unprecedented level of accuracy and range of applications to protein structure prediction.

3 MODERN DEEP-LEARNING TECHNIQUES

With the advancement of high-throughput sequencing technologies, the boundaries between traditional methods have blurred, and a new paradigm for predicting protein structures is emerging at the forefront of research: hybrid methods.⁵⁴ This emerging mode of PSP transcends the rigid categorization of template-based and template-free predictions. Instead, it adopts a flexible strategy that draws on the advantages of both for structure prediction.⁵⁵ While traditional methods have limitations such as dependency on homology,⁴⁹ sequence identity thresholds,³⁸ and the lower accuracy of free-modeling methods,⁴⁶ the latest hybrid methods alleviate these limitations to some extent by integrating a variety of deep learning network architectures.⁵⁶ DeepMind's AlphaFold2 model, released in 2020, is a pioneering model for protein structure prediction and a successful case study of modern hybrid methods for PSP.

Deep learning is an important branch of machine learning.⁵⁷ In computational biology, deep learning architectures, built on the complex hierarchical structure of artificial neural networks, are forging innovative pathways for PSP (Table 1), and the introduction of various deep learning systems has changed the traditional means of protein structure prediction. Evolving from feedforward neural networks, these systems emphasize feature extraction using multiple connected layers to achieve complex mappings from input to output. Diverse connection patterns yield DNNs that can be customized for different classes of data.⁵⁷ Within the field of protein structure prediction, CNNs, RNNs, long short-term memory networks (LSTMs), graph neural networks (GNNs), Transformers, and GANs are among the most prominent and widely used DNNs. The application of deep learning to protein structure prediction has grown from a challenging research area into an active field whose accuracy and application scope keep improving through continuous optimization. This section introduces the core principles of deep learning, the main network architectures and their characteristics, and strategies for model optimization.

Table 1. Protein structure prediction models based on deep learning architecture.

Model	Year of release	Open source (Y/N)	Applicability	Innovations/features	References
MetaPSICOV	2015	N	Contact relationships of residues within protein molecules	Contact map prediction using coevolution	[58]
DeepCov	2018	Y	Protein‒protein complex contact-map and interactions	Improved prediction accuracy for nonhomologous protein complex contacts	[59]
DeepFragLib	2019	Y	Amino acid residues (fragments)	Efficiently build protein-specific fragment libraries	[60]
AlphaFold2	2021	Y	Protein monomer and complex structures	Highly accurate protein structure prediction, scoring the accuracy of predicted structures	[11]
TrRosetta	2021	Y	Protein monomer structure	Rapid and accurate de novo structure prediction	[61]
CopulaNet	2021	N	Protein complex structure	Estimate residue coevolution directly from multiple sequence alignment (MSA)	[62]
RGN2	2022	N	Single-sequence protein structure prediction	Protein language models, faster predictions	[63]
SPOT-Contact-LM	2022	Y	The protein contact-map	Narrowed the gap between homologous and nonhomologous complex prediction	[64]
ColabFold	2022	Y	Large protein complexes	Fast and memory-efficient	[65]
Umol	2023	Y	Protein‒ligand complexes	Evoformer accepts protein sequences and the ligand small molecule SMILES as inputs	[66]
ESMFold	2023	Y	Atomic-level protein structure	Protein language modeling replaces MSA as input, which leads to faster predictions	[67]
IgFold	2023	Y	Antibody variable region 3D structure	Deep learning-based end-to-end antibody structure prediction	[68]
RoseTTAFoldNA	2024	Y	Protein-nucleic acid complex	Based on the three-track architecture of RoseTTAFold, end-to-end protein–NA structure prediction network	[69]
AF-Cluster	2024	Y	Protein multi-conformation prediction	Clustering of MSA by sequence similarity	[70]
DeepFusion	2024	N	Protein‒RNA complexes	Dissect the patterns of RNA‒protein interactions	[71]
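AlphaFold 3	2024	N	Biomolecular complexes (proteins, nucleic acids, small molecules, ions, modified residues)	Diffusion-based prediction of atomic coordinates for multimodal complexes	[72]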

3.1 Deep learning and neural networks

Neural networks, fundamental to deep learning, comprise artificial neurons that simulate biological neurons by receiving inputs, computing weighted sums, and generating outputs through activation functions.⁷³ By stacking these neurons in layers, DNNs excel at learning complex nonlinear relationships between protein sequences and structures, which enhances the model's ability to process intricate data. One study⁷⁴ used the ReLU activation function, which reduces the risk of overfitting by inducing sparsity in the network and reducing the interdependence between parameters. The loss function is pivotal because it quantifies discrepancies between model predictions and actual data, influencing both the learning trajectory and the network's generalization capabilities. The mean squared error (MSE) is commonly used to assess the accuracy of predicted protein structures against experimental data,⁷⁵ whereas cross-entropy loss evaluates how well the model's output probability distribution matches the target distribution. One model⁷⁶ enables masked prediction across four levels of protein structure by computing cross-entropy losses over residue types, distances, angles, and dihedrals.
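
The activation and loss functions named in this section have simple definitions, sketched below in plain Python. The inputs are assumed to be flat lists of numbers (for example, predicted and reference inter-residue distances for MSE, or class probabilities for cross-entropy); the function names are illustrative and do not correspond to any cited model.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def mean_squared_error(predicted, target):
    """Average squared difference between predictions and reference values."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def cross_entropy(probabilities, true_index, eps=1e-12):
    """Negative log-probability assigned to the true class (one-hot target)."""
    return -math.log(max(probabilities[true_index], eps))


if __name__ == "__main__":
    # Hypothetical distances in angstroms and a three-class probability vector.
    print(mean_squared_error([5.1, 7.9, 10.2], [5.0, 8.0, 10.0]))
    print(cross_entropy([0.25, 0.5, 0.25], true_index=1))
```

MSE suits continuous targets such as distances or coordinates, while cross-entropy suits discrete targets such as residue types or binned distances, which is why structure prediction models often combine both.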

In addition, supervised learning methods such as logistic regression, support vector machines, decision trees, and random forests are often used for binary classification problems.⁷⁷ One study⁷⁸ proposed the IDEGBM prediction mechanism, based on the extremely randomized trees (ERT) model, which extracts hybrid features from evolutionary information, secondary structures, chemical properties, and global descriptors to reflect the diversity of amino acid arrangements. For multiclass classification problems, such as protein functional domains and protein–protein interaction interface types, softmax regression or a one-versus-rest (OvR) strategy is often used: the model is trained on protein datasets with known structures and then predicts the structural features and functional properties of new proteins. DeepMC-iNABP⁷⁹ employs the “one-versus-all” multiclass classification technique and applies one-hot encoding to the categorical (label) variables of the data instances to identify nucleic acid binding proteins (NABPs). In contrast, unsupervised learning methods operate independently of labeled data, enabling the discovery of inherent patterns and structures directly from the data set.⁸⁰ Techniques such as k-means and hierarchical clustering are frequently employed to elucidate intrinsic patterns and natural groupings within protein sequences or structures.⁸¹ These methods facilitate a deeper understanding of the similarities and differences among protein families. Additionally, self-supervised learning approaches, which leverage contrastive learning, are increasingly utilized to explore the latent structures within proteins, further enhancing our comprehension of molecular biology.⁸²
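
The encoding and softmax steps mentioned for multiclass classification can be sketched as follows. This is a generic illustration and not the DeepMC-iNABP implementation: the residue alphabet ordering, the class labels, and the function names are all assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues


def one_hot(residue):
    """Encode one residue as a 20-dimensional indicator vector."""
    vector = [0] * len(AMINO_ACIDS)
    vector[AMINO_ACIDS.index(residue)] = 1  # raises ValueError for unknown residues
    return vector


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels):
    """Return the label whose softmax probability is highest."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


if __name__ == "__main__":
    print(one_hot("C"))
    # Hypothetical three-class problem with made-up scores.
    print(predict_label([0.1, 2.0, -1.0], ("helix", "strand", "coil")))
```

A one-versus-rest scheme would instead train one binary classifier per class and pick the class with the highest score; softmax couples the classes so that the probabilities sum to one.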

3.2 Deep learning architectures

3.2.1 CNNs

Deep learning methods excel at handling large and complex datasets, revealing intricate relationships between protein sequences and their structures. These techniques are particularly adept at capturing and exploiting nonlinear and high-dimensional features challenging in traditional template-based or free modeling approaches. CNN use convolutional filters to extract local features from the image and then use a pooling layer to reduce the data dimensionality.⁸³ This makes it suitable for processing lattice-like topological data, especially for protein secondary structure prediction.⁸⁴ While standard 1D CNNs can handle linear sequence data, 2D CNNs can also be used in protein contact maps to generate 2D matrices by analyzing the distance relationships between amino acid residues in proteins, allowing 2D CNNs to efficiently identify patterns in these images to infer the tertiary structure of proteins. In addition, 3D CNNs can learn key information about the spatial conformation of proteins directly from the original 3D structure. CNNs also show great power when applied to predicted distance maps or co-evolutionary signals extracted from MSA by direct coupled analysis (DCA) models. DeepCov uses fully convolutional neural networks (FCNNs) to achieve highly accurate structure prediction even when few homologous sequences are available. DeepCov achieves highly accurate structure prediction even with fewer homologous sequences available through FCNNs.⁵⁹ With cascading convolutional and pooling layers, CNNs can capture the complex spatial hierarchies of proteins and thus accurately predict the multilevel structure of proteins.⁸⁵ Ju et al. proposed CopulaNet,⁶² which employs a CNN architecture. By utilizing convolution and pooling layers to extract local features of protein sequences and integrating coevolutionary information between residues, CopulaNet achieves high-precision predictions of protein structures. 
PSSP-MVIRT⁸⁶ constructs a hybrid network architecture of CNN and bidirectional gated recurrent units to extract global and local features of peptides, and its Sov metric is more than 15% higher than that of HMM and other methods.
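As a minimal sketch of how a contact map becomes a 2D matrix that a 2D CNN could consume, the snippet below thresholds pairwise Cα distances. The 8 Å cutoff and the toy coordinates are assumptions made for illustration; individual methods define contacts differently.

```python
import math

def contact_map(ca_coords, threshold=8.0):
    """Binary contact map: 1 if two C-alpha atoms lie within `threshold` angstroms."""
    n = len(ca_coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(ca_coords[i], ca_coords[j])  # Euclidean distance
            cmap[i][j] = 1 if d <= threshold else 0
    return cmap

# Toy coordinates: four residues spaced 3.8 angstroms apart along a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
```

The resulting matrix is symmetric with ones on the diagonal, so it can be treated like a single-channel image.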

3.2.2 RNNs and LSTM

RNNs and LSTM networks have become central to sequence-to-sequence prediction models, especially in the fields of protein folding and dynamics.⁸⁷ RNNs excel in processing protein sequences characterized by sequence dependencies. Compared to CNNs, the distinct advantage of RNNs lies in their ability to handle long-range dependencies within sequence data. This capability enables them to predict functional or structural domains of proteins and process longer sequences. One study⁸⁸ used an RNN-based seq2seq autoencoder to learn the embedding vectors, then used an attention mechanism to learn the binding site information between compounds and proteins, while using a CNN to train a compound–protein interaction (CPI) prediction model. LSTM was proposed⁸⁹ to alleviate the problem of vanishing or exploding gradients that occurs when RNNs process long sequences. It builds on standard RNNs by combining memory cells and gating mechanisms that regulate information flow, allowing the network to capture interactions between distal amino acids in protein sequences. This ability to recognize a sequence's overall structural features significantly improves prediction accuracy. One study⁸⁷ demonstrated that simple character-level language models based on LSTM neural networks can learn probabilistic models of time series generated from physical systems, and showed the reliability of the models using force spectroscopy trajectories for different benchmark systems and multi-state riboswitches. In addition, Wang et al.⁹⁰ obtained the best prediction performance in balanced F1 scores using an RNN with LSTM architecture to deal with the effect of mutations on protein–ligand binding affinity.
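The memory cell and gating mechanism described above can be sketched as a single LSTM time step. For readability this sketch assumes scalar inputs and states and uses arbitrary weight values; real models use vectors and learned weight matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step with scalar input, hidden state, and cell state."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate memory
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # hidden state passed to the next step
    return h, c

# Arbitrary illustrative weights (not trained values).
weights = {k: 0.5 for k in ("wf", "uf", "bf", "wi", "ui", "bi",
                            "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for x in [0.1, -0.3, 0.8]:  # a toy input sequence
    h, c = lstm_step(x, h, c, weights)
```

When the forget gate stays close to 1 and the input gate close to 0, the cell state is carried forward almost unchanged, which is how information from distant positions can persist.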

3.2.3 GANs

GANs consist of two competing networks, a generator and a discriminator, which are trained adversarially to further improve the quality of generated data. GANs have shown excellent performance in image processing and generative tasks, and recently, these advantages have gradually been introduced into bioinformatics and structural biology.⁹¹ In the field of PSP, GANs are mainly applied to data augmentation and structure generation. GANcon⁹² predicts accurate protein contact maps using GAN networks. To address the lack of homologous protein sequences and the low accuracy of long-distance contact prediction, CGAN-Cmap uses a GAN to capture and interpret distance distributions from 1D sequential and 2D paired feature maps, improving the accuracy of long-distance contacts by more than 3.5%.⁹³ In addition, one study⁹⁴ used a conditional Wasserstein GAN (CWGAN) for protein lysine modification site prediction. The resultant Euclidean distance was below 0.03 angstroms, much smaller than that of the conditional GAN (CGAN), whose distance was above 0.1. To reveal potential sequence-structure relationships and design protein sequences for possibly novel structural folds, gcWGAN⁹⁵ used the Wasserstein distance in the loss function of the WGAN. The folding accuracy was comparable to that predicted by the cVAE, but sequence diversity and novelty were significantly higher. In addition to GANs, generative models based on flow matching show tremendous potential.⁹⁶ These models include diffusion models and rectified flows, among others. 
Diffusion models generate high-quality structural predictions by gradually adding noise and learning the denoising process.⁹⁷ Rectified flows, on the other hand, generate accurate predictions by gradually transforming and correcting the data distribution.⁹⁸ These methods have significant advantages in capturing proteins' complex spatial configuration and dynamical behavior. Denoising autoencoders (DAEs) also help to learn dense, stable, and low-dimensional representations of proteins in an unsupervised manner.⁹⁹ Graph neural networks (GNNs), by processing non-Euclidean data structures consisting of vertices and edges, enable efficient capture of the complex topology of protein molecules and identification of residue interactions. The GCN-based DeepFRI¹⁰⁰ represents amino acid residues as graph vertices and encodes residue interactions as graph edges, enabling molecular-level annotation of protein functions. This graph representation enhances the flexibility and computational efficiency of modeling the complex topology of proteins.
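The "gradually adding noise" half of a diffusion model can be sketched as follows. The formulation (a linear variance schedule with closed-form sampling of the noised state) follows the widely used denoising diffusion setup and is an assumption here, as are the schedule values and the toy coordinates; the cited methods may differ.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)
x0 = [1.5, -0.2, 3.0]  # toy stand-in for one atom's coordinates
xt, eps = add_noise(x0, alpha_bars[-1], rng)
```

A denoising network would be trained to predict `eps` from `xt` and the step index; generation then runs the process in reverse, starting from pure noise.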

3.2.4 Transformer

The Transformer architecture proposed by Vaswani et al.¹⁰¹ has gained widespread attention because of its excellent natural language processing (NLP) performance. As shown by AlphaFold2, the Transformer has been successfully applied to protein structure prediction.¹⁰ The Transformer's core feature, the self-attention mechanism, enables it to capture global dependencies in sequence data efficiently.¹⁰² In a Transformer, each token in an input sentence can “attend” to all other tokens by exchanging activation patterns corresponding to the intermediate outputs of neurons in the neural network. In contrast to traditional CNN and RNN architectures, the global dependency modeling capability of the Transformer, combined with its parallel processing advantage, allows the model to comprehensively consider the information between any two points within a sequence, thus effectively identifying long-range dependencies. It introduced a paradigm shift in processing sequences by relying exclusively on a self-attention mechanism to weigh the importance of different parts of the input data.¹⁰³ Unlike RNN and LSTM, which process data sequentially, the Transformer processes data in parallel, greatly speeding up training and enhancing the ability to capture complex dependencies in the data. The self-attention mechanism enables the model to attend to all parts of the sequence simultaneously, which supports the modeling of different biological sequences for multitype structure prediction of proteins.¹⁰⁴ It is particularly effective when contextual relationships span long sequences, such as complex protein or RNA molecule folding patterns. Since the release of AlphaFold2, many models have adopted the Transformer architecture. The RoseTTAFoldNA model⁶⁹ significantly broadens the range of predictable structures by integrating various biomolecular data such as protein sequences, nucleic acid sequences, metal ions, small molecule ligands, and covalent linkage modifications as inputs. 
The Umol model⁶⁶ focuses on the structure prediction of protein–ligand complexes, accepting protein sequences and small molecule SMILES sequences as inputs and demonstrating the model's ability to resolve the interactions between proteins and small molecules, which is crucial for drug design and functional molecule research. The development of Transformers for protein structure prediction has not only led to revolutionary innovations but has also advanced the prediction of interactions between proteins and other biomolecules (e.g., nucleic acids, other proteins) and small-molecule ligands, expanding the variety and complexity of structures the model can predict.¹⁰⁵
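A minimal sketch of scaled dot-product self-attention, the operation by which every token attends to every other token, is given below. To stay short it assumes no learned query, key, or value projections (the token vectors are used directly) and a single attention head; the toy token vectors are invented.

```python
import math

def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    weights = []
    for qi in q:
        # Similarity of this token's query to every token's key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights.append(softmax(scores))
    # Each output is a weighted average of all value vectors.
    out = [[sum(w * vj[c] for w, vj in zip(wrow, v)) for c in range(len(v[0]))]
           for wrow in weights]
    return out, weights

# Three toy "tokens" (e.g., residue embeddings) of dimension 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(x, x, x)
```

Because each row of `weights` is computed independently of the others, all positions can be processed in parallel, in contrast to the step-by-step recurrence of an RNN.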

The Transformer provides a powerful framework for dealing with long-range dependencies across sequences, whereas CNNs apply convolutional filters to capture local structural motifs. RNN and LSTM excel in temporal data continuity, which is crucial for temporal dynamics. GANs, on the other hand, innovate by synthesizing realistic biological data, which is invaluable for training predictive models where experimental data is scarce or incomplete. The unique nature of these architectures provides researchers with a diverse toolkit, with each approach bringing unique advantages to different challenges in protein structure prediction. For example, CNN may be better suited for static structure analysis, RNN for dynamic processes, LSTM for long-term evolutionary studies, GAN for generating new biomolecular structures, and Transformer for comprehensive sequence analysis that requires understanding local and global context. However, these models also have certain limitations. CNN may ignore global dependencies when capturing local features.¹⁰⁶ RNN and LSTM are susceptible to the vanishing gradient problem and are particularly limited when dealing with long sequences.¹⁰⁷ GAN, although advantageous in the field of data generation, is unstable in the training process and is prone to mode collapse.¹⁰⁸ Despite the Transformer's outstanding performance in modeling cross-sequence dependencies, its high computational resource requirements must be reduced and its generalization ability on small-sample datasets must be improved. These limitations still need to be addressed in protein structure prediction.

In summary, the various integrations of network architectures based on deep learning provide powerful tools for advancing our understanding of protein structure. The unique features of each architecture enable it to handle specific aspects of the complex data involved in predicting protein folding and interactions. This complementary set of tools enhances our ability to model and understand biological systems. It paves the way for innovative approaches to drug design, genetic engineering, and other applications in biotechnology and medicine.

3.3 Training, validation, and evaluation of predictive models

Since the release of AlphaFold, optimization of the performance of protein structure prediction models has become one of the main focuses of computational biologists.¹⁰⁹ The optimal design of algorithms in the training, validation, and evaluation phases is a key component in achieving this goal. Training is the initial phase in which the model learns to predict results based on the data set, aiming to maximize prediction accuracy and generalization. This stage mainly involves data preparation and preprocessing, forward propagation and feature learning, loss calculation, backpropagation, parameter updating, and iterative optimization. Because proteins have a multilevel structure and form a variety of protein-ligand complexes, feature extraction from information such as sequences is a key aspect of training.¹¹⁰ Feature extraction has traditionally relied on position-specific scoring matrices (PSSMs) and simple coding schemes. PSSMs are derived from multiple sequence alignments and statistically represent the conservation of each amino acid at a specific position in the alignment. Conventional feature extraction often ignores local context or long-range interactions of amino acids. To address this shortcoming, evolutionary scale modeling (ESM) uses the Transformer architecture to learn evolutionary information and sequence patterns from a large amount of protein sequence data to capture local and long-range dependencies in sequence data.⁶⁷ ESMFold enables end-to-end atomic-level prediction from a single sequence by combining language models with structural modules.¹¹¹ The subsequent release of ESM-2⁶⁷ introduces a multiscale feature fusion strategy based on the attention mechanism, which optimizes position embedding and employs rotary position embedding (RoPE) to handle sequences of arbitrary length without being limited by the length of the pretraining sequences. 
Principal component analysis (PCA) is commonly used as a dimensionality reduction technique to reduce the high dimensionality of protein structure data. By identifying principal components or significant axes of variation in the data, PCA allows the model to focus on the essential features, thus reducing the computational load and improving learning efficiency.¹¹² For example, Ojeda-May et al.¹¹³ used PCA to describe the general folding characteristics of proteins, using a visual representation of the first principal component and a density map to capture and detail the typical rearrangement patterns of adenylate kinase (AdK) in the ATPlid and AMPlid regions. In biological sequence analysis, feature extraction methods such as NMBroto, Z curve-12bit, SNC, and DNC can effectively capture local and global features of sequences by computing positional correlation and geometric mapping.¹¹⁴ These methods perform well in predicting RNA and DNA sequences, but their application in protein structure prediction is challenging. The functions and structures of proteins depend on sequences and involve complex 3D conformations and multilevel interactions, so it is difficult to fully reveal the complex properties of proteins by these linear feature extraction methods alone. To address this problem, deep learning-driven feature extraction methods, such as autoencoders and multiscale feature extraction techniques, have shown greater potential in recent years, further improving the accuracy and adaptability of protein structure prediction by combining local and global features.¹¹⁵
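The way a PSSM summarizes per-position conservation from a multiple sequence alignment can be sketched as below. The toy alignment, the uniform background frequency, and the pseudocount are assumptions made for brevity; production tools use empirical background frequencies and sequence weighting.

```python
import math
from collections import Counter

def pssm(msa, alphabet="ACDEFGHIKLMNPQRSTVWY", pseudocount=1.0):
    """Log-odds score per alignment column and residue, uniform background."""
    background = 1.0 / len(alphabet)
    matrix = []
    for col in zip(*msa):  # iterate over alignment columns
        counts = Counter(c for c in col if c in alphabet)  # gaps are ignored
        denom = sum(counts.values()) + pseudocount * len(alphabet)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / denom) / background)
            for aa in alphabet
        })
    return matrix

# Toy alignment of four sequences; '-' marks a gap.
msa = ["MKVL", "MKIL", "MRVL", "M-VL"]
scores = pssm(msa)
```

Positive scores mark residues seen more often than the background at that position (here, M in the first column), while residues never observed receive negative scores.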

The validation and evaluation of protein structure prediction requires objective metrics to measure the similarity between computational models and experimentally determined reference structures. Objective metrics such as the TM-score and predicted aligned error (PAE) are essential for fine-tuning the parameters and preventing overfitting. The TM-score was proposed by Zhang¹¹⁶ as a metric of protein structural topological similarity. It improves on traditional metrics such as RMSD by weighting smaller distance errors more strongly than larger ones, thus increasing the sensitivity to the global fold rather than local differences. In addition, the TM-score employs length normalization to ensure that its value is independent of protein size, allowing for consistent structural comparisons of proteins of different lengths. Protein pairs with TM-scores >0.5 are mostly located in the same fold, whereas protein pairs with TM-scores <0.5 are predominantly not in the same fold.¹¹⁷ AlphaFold-Multimer uses a weighted combination of pTM and ipTM as the model confidence, in which ipTM scores the interactions between residues of different chains.²⁴ On the other hand, the PAE provides the expected error for each residue pair across the protein sequence and has been related to the dynamic nature of protein residues (Figure 3). One study¹¹⁹ concluded that the PAE plot of AF2 correlates with the distance variation (DV) matrix of MD simulations. Unlike traditional similarity metrics that rely on global superposition and are susceptible to structural domain shifts, the local distance difference test (lDDT) provides a robust assessment of local model quality and remains relevant even under structural domain shifts. lDDT is a superposition-free score that evaluates the local distance differences of all atoms in the model, including validation of stereochemical plausibility. 
AlphaFold2 applies this concept as the predicted per-residue lDDT-Cα score (pLDDT).¹⁰ The score ranges from 0 to 100 and indicates the confidence level of a single model residue; the higher the score, the higher the confidence. Language model-based protein structure prediction methods such as ESMFold also use pLDDT-based metrics, and very low pLDDT scores correlate with a high propensity for intrinsic disorder in protein structure.¹¹ Furthermore, for protein-protein complex structure prediction evaluation, DockQ combines several criteria, such as interface root mean square deviation (iRMSD), interface clustering, and interface local contacts, to quantify the accuracy of the prediction model in protein interactions.¹²⁰ Optimization of structural prediction models focuses on algorithmic improvement and strategy tuning in the training, validation, and evaluation phases. Together, these phases form the cornerstone of effective model development, ensuring that the models are highly accurate and robust for different protein structure prediction tasks.
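The length normalization and distance weighting of the TM-score can be made concrete with a short sketch. It evaluates the score for one fixed superposition, given the distances between aligned residue pairs; the full TM-score additionally searches for the superposition that maximizes this value. The 0.5 Å floor on the distance scale for very short chains is a practical convention assumed here.

```python
def tm_d0(l_target):
    """Length-dependent distance scale d0 of the TM-score, in angstroms."""
    if l_target <= 21:
        return 0.5  # assumed floor for very short chains
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, l_target):
    """TM-score for one superposition from distances of aligned residue pairs."""
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target

# Toy example: 100-residue target, 80 aligned pairs deviating by 2 angstroms.
example = tm_score([2.0] * 80, 100)
```

Each aligned pair contributes at most 1, so small deviations dominate the sum, and unaligned residues lower the score through the division by the target length.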

In model optimization for protein structure prediction, a systematic approach is essential to ensure the generality and robustness of the model. Chou's five-step rules provide a theoretical basis for model development, covering the complete process from benchmark data set construction, mathematical representation, and prediction algorithm development to cross-validation and software implementation.¹²¹ Based on Chou's five-step rule, multiscale feature extraction methods have achieved significant results in other biological sequence analysis tasks. For example, in DNA-protein binding (DPB) prediction, the researchers systematically analyzed the fused sequence features through a multiscale CNN to accurately predict the regulatory mechanism of gene expression.¹²² This approach combines local and global features, effectively solves the complex dependency problem, and provides a reference for applying deep learning in protein structure prediction. In deep learning model development, following Chou's five-step rule can ensure the transparency and reproducibility of model development, especially in the feature extraction and validation process, which shows significant advantages. Combining traditional rules with cutting-edge technologies enables protein structure prediction to achieve higher accuracy and adaptability in the training, validation, and evaluation processes.⁵⁶
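The cross-validation step of such a workflow can be sketched as a plain k-fold split. The function name and defaults are hypothetical, and the sketch splits samples at random.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        val = folds[i]
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, val

splits = list(k_fold_indices(10, k=5))
```

For protein data, folds are often built from sequence clusters rather than individual samples so that close homologs do not appear on both sides of a split; that step is omitted here.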

In the prediction of protein structures, regardless of the method chosen, the accuracy and efficiency of the results heavily rely on the availability of bioinformatics databases. Therefore, using these open-access databases is essential for protein structure prediction. The protein data bank (PDB),¹²³ UniProt,¹²⁴ and AlphaFold Protein Structure Database (AlphaFold DB)¹²⁵ contain exhaustive information ranging from protein sequences to structures, functions, and protein–protein interactions (Table 2). These databases are pivotal for algorithm development, model training, and statistical analysis within deep learning-based technologies. They also provide a foundation for validating and assessing the accuracy of new algorithm predictions, which is indispensable for advancing research and applications in the field of biomedical sciences.

Table 2. Summary of protein bioinformatics databases.

Database	Data type	Features	Cover range	Official website
Protein data bank	3D structures	X-ray, NMR, cryo-EM derived	Approximately 200,000 protein structures	rcsb.org
UniProt	Protein sequence database	Comprehensive biological information	More than 200 million sequences	uniprot.org
InterPro	Protein family and structural domain database	Aggregates multiple database info	More than 30,000 sequences	interpro
Gene ontology (GO)	Gene database	Biological processes, molecular functions	Approximately 1.5 million genes	geneontology.org
STRING	Protein interaction networks	Predicted and experimental interactions	More than 59 million proteins	string-db.org
ESM Atlas	Predicted protein structure	Computational prediction-based structural ensemble	More than 700 million metagenomic proteins	esmatlas.com
AlphaFold DB	Predicted protein structure	Computational prediction-based structural ensemble	Over 200 million protein structure predictions	alphafold.ebi.ac.uk
ESM Metagenomic Atlas	Predicted protein structure	Computational prediction-based structural ensemble	772 million predicted metagenomic protein structures	esmatlas.com
BFD (Big Fantastic Database)	Protein sequence database	Large-scale sequence data for machine learning	Over 2.5 billion sequences	bfd.mmseqs.com
MGnify	Metagenomic sequences and annotations	Automated analysis of microbiome sequencing data	Over 700,000 metagenomic datasets from various environments	metagenomics

4 MAJOR ADVANCES IN DEEP LEARNING-BASED PROTEIN STRUCTURE PREDICTION

Protein structure prediction is a fundamental problem with significant challenges in bioinformatics. Traditional methods are limited by the coverage of template libraries and the accuracy of force fields, while modern prediction methods have shown advantages in recent years.¹²⁶ When templates are available, deep learning methods improve the accuracy of template selection and optimize the alignment between templates and target sequences. For proteins for which no suitable template is available, deep learning methods can predict the structure of the protein from scratch using a strategy similar to free modeling. This approach typically utilizes deep learning models to process large amounts of data to understand and predict complex protein folding patterns, covering various protein structure prediction types (Table 3). In the 14th Critical Assessment of protein Structure Prediction (CASP14) competition, the AlphaFold 2 model performed well due to its innovative attention mechanism, advanced information encoding, and end-to-end framework.¹³² Following the release of AlphaFold 2, a series of models such as Umol, RGN2, and TrRosetta adopted deep learning architectures, including the Transformer, and further refined algorithmic strategies to greatly enhance predictive power. The recent release of AlphaFold 3 further extends the boundaries of predictive modeling, particularly in protein-small molecule structure prediction. This extension marks a key shift toward a holistic approach in biomolecular structure elucidation, aiming to address virtually all sequence-to-structure challenges in various biological phenomena.

Table 3. Advances in protein structure prediction.

Type of prediction	Main models	Solution	Features and application scenarios	Challenge	Reference
Single-chain protein	trRosettaX-Single	Transformer-based language modeling and multiscale networks	Predict natural protein and single-sequence structures	Single-sequence accuracy	[127]
Multichain protein/protein complexes	HelixFold-Multimer	Integration of domain expertise to optimize cross-chain interaction modeling	Predict antigen-antibody and peptide–protein interfaces	Capturing complex interactions; improving prediction accuracy	[128]
Protein-small molecule complex	RoseTTAFold All-Atom	Optimizing the three-track architecture	Atomic-level precision; Custom pockets per ligand	Computational efficiency; detailed feature learning	[129]
Protein-nucleic acid complex	RoseTTAFold NA	Extend all tracks of the network to support nucleic acids	Protein–NA interface modeling; designing sequence-specific nucleic acid binding proteins	Improving single-mode performance and prediction accuracy	[69]
Protein–protein Interaction	OpenFold	Retrain AlphaFold2 to improve generalization	Precise prediction; framework compatibility	New molecular modeling; different protein family	[130]
Covalent modification of structure	PPICT	Elastic Network Models (ENMs) and Network Embedding Method	Predict posttranslational modification cross-talk	Lack of gold standard data sets; large protein pair space	[131]
Multimodal structure	AlphaFold 3	Using the pairformer module and diffusion module	Unified deep learning for high-accuracy biomolecular modeling	Managing complex chemical entities; Dynamic structure; Optimizing computational demands	[13]

4.1 AlphaFold's development and innovation

The AlphaFold2 model primarily consists of two architectures: (i) the Evoformer module for learning protein MSA sequence information (48 blocks) and (ii) the 3D equivariant structure module for interpreting the three-dimensional structure of protein sequences (eight blocks). The Evoformer module utilizes a self-attention mechanism to process and integrate MSA data, enhancing the ability of the model to capture interactions between amino acid residues, particularly those that coevolve during protein folding. This enables efficient exchange of information within MSAs, pairwise representations that allow direct inference of spatial and evolutionary relationships, and success in learning complex patterns in protein sequences that are not explicitly bound to any known structure. In the 3D equivariant structure module, the model utilizes invariant point attention (IPA) and a recycling iteration mechanism to extract evolutionary information from MSA, achieving end-to-end structure prediction at the atomic level. Recently, new PSP systems, such as protein–nucleic acid complex,⁶⁹ protein–ligand complex,⁶⁸ and protein multiconformation^{70, 133} prediction, have also been developed using advanced protein structure prediction methods based on deep learning. Despite the remarkable results achieved by AlphaFold 2 for single protein structure prediction, structure prediction for noncanonical amino acids, protein–ligand complexes, protein–nucleic acid complexes, protein complexes (e.g., antibodies), and posttranslational modifications of proteins has yet to be successful. 
In terms of algorithm optimization, AlphaFold-Multimer is trained specifically for multimeric inputs of known stoichiometry, significantly improving the accuracy of predicted multimeric interfaces while maintaining high intrachain accuracy.¹³ AlphaMissense uses an AlphaFold-derived system to integrate structural contexts, enabling accurate prediction of proteome-wide missense variant effects.¹³⁴ AF-Cluster enables the prediction of multiple conformations of proteins through sequence clustering.⁷⁰ In addition, AlphaFill enriches AlphaFold models with ligands and cofactors.¹³⁵ However, the update of AlphaFold 3 is a major advancement in the problem of protein multitype structure prediction. The model reduces the reliance on MSA by replacing the Evoformer with the simpler Pairformer module. More importantly, AlphaFold 3 introduces a diffusion-based module. This novel component replaces the previous structure module, which focused on amino acid-specific frames and torsion angles. It can predict the joint structure of biomolecular complexes such as proteins, nucleic acids, small molecules, ions, and modified residues. The diffusion model generates atomic coordinates and provides a less computationally resource-intensive approach to modeling docking and biomolecular interactions. Nevertheless, AlphaFold 3 still needs to improve on the problem of proceeding judgment, the problem of hallucinations (overlapping of two chains, etc.), and insufficient information about protein dynamics. To address these issues, machine learning researchers are actively exploring diffusion generative models, especially flow matching,¹³⁶ Schrödinger bridges,¹³⁷ and stochastic interpolation,¹³⁶ to improve the accuracy and dynamic prediction ability of the models.

4.2 Advances in deep learning methods

With the optimization and integration of Transformer technology, models such as TrRosetta and RGN2 have significantly improved the accuracy of structural prediction of single-chain and multichain protein complexes by capturing long-range dependencies in protein sequences.¹³⁸ Meanwhile, integration methods that incorporate data from multiple sources, as well as self-supervised and semisupervised learning techniques, have provided innovative solutions to the scarcity of labeled data; these methods enhance the generalization ability and prediction accuracy of the models by utilizing a large amount of unlabeled biological sequence data.¹³⁹ With the rapid growth of biological data size and the increase of model complexity, traditional computing resources struggle to cope with the demands of large-scale data training.¹⁴⁰ Parallel computing techniques provide strong support for handling complex tasks and play an important role in large-scale protein structure prediction. With distributed deep learning and multi-GPU architectures, parallel computing frameworks can efficiently distribute computational loads, significantly reduce model training time, and exhibit higher computational efficiency when dealing with ultra-large-scale datasets.¹⁴¹ Khan et al.¹⁴² significantly improved the efficiency and stability of DNNs in the task of anti-inflammatory peptide prediction by using a parallel distributed computing approach. Specifically, the training time of the model was reduced by about 70% in a multi-GPU architecture. At the same time, the gradient convergence was accelerated when dealing with large-scale datasets, and the incidence of training instability was reduced, with a 20% reduction in error fluctuations. 
Modern deep learning-based protein structure prediction methods have succeeded for single-chain proteins and have recently been applied to new protein structure prediction systems such as protein–nucleic acid complexes, protein–ligand complexes, and protein multiconformational prediction. RoseTTAFoldNA extends the RoseTTAFold deep learning approach to accurately predict nucleic acids and protein–nucleic acid complexes and to design sequence-specific RNA and DNA binding proteins.⁶⁹ Co-folded protein–ligand complexes have the potential to accelerate drug repositioning, and Umol combines a large language model and a multimodal language model independent of structural information to enable prediction of the complete all-atom structure of protein–ligand complexes.¹⁴³ In addition, Schweke et al.¹⁴⁴ described a scalable AlphaFold2-based strategy for predicting homo-oligomer assembly across different proteomes to reveal the quaternary structure of proteins. In addition to continuously optimized algorithmic strategies, methods combining large and multimodal language models are gradually showing advantages. Protein language modeling is based on NLP technology and processes biological information such as amino acid sequences and protein sequences in a way analogous to text data. Its core advantage lies in capturing the long-range dependencies and complex patterns in protein sequences, and even kinetic information, to achieve end-to-end modeling of sequences and to predict the three-dimensional structure of proteins directly from the original amino acid sequences, including precise positioning at the atomic level. 
OmegaFold uses a protein language model combined with a Transformer architecture to predict high-resolution protein structures from only a single primary sequence.¹³⁰ trRosettaX-Single embeds a supervised Transformer protein language model, which performs well in dealing with orphan proteins and protein design, with an average template modeling score (TM-score) of 0.79 while using fewer computational resources.¹²⁷ Recent studies also explore dynamic prediction methods to simulate the folding process and functional state changes of proteins in organisms, opening up new directions in protein structure prediction techniques.¹⁴⁵ These technological advances greatly advance protein structure prediction and provide powerful new tools and methods for understanding complex disease mechanisms, new drug designs, and personalized medicine.

5 CHALLENGES AND SOLUTIONS

Modern hybrid methods demonstrate the potential for solving complex protein structure prediction problems. However, these methods also pose new challenges that require further optimization. High-accuracy protein structure prediction places higher demands on the availability and quality of training data.¹⁴⁶ At the same time, modern hybrid methods often rely on data integration from different architectures, which not only increases the complexity of managing and coordinating different learning paradigms and data representations but also significantly increases the demand on computational resources, including memory, processing power, and the complexity of model training and tuning.^{147, 148} In addition, deep learning models are prone to overfitting; that is, when dealing with nonlinear data, models may fit specific features of the training data (including noise and bias) while ignoring more general patterns.¹⁴⁹ This overfitting results in models performing well on the training set but much worse on new data.¹⁵⁰ Further, these models have a high dependency on large-scale, high-quality datasets, and the predictive power of the models is significantly reduced in the presence of scarce or noisy data.¹⁵¹ Current deep learning models mainly focus on static structure prediction and have limited ability to predict the dynamic changes of proteins under physiological conditions, especially in complex biological systems.¹⁵² Although these models are more effective in predicting the interactions of known molecules under static conditions, dynamic conformational changes of proteins in metabolic pathways or complete biological systems are still difficult to capture accurately.¹⁵³ This is mainly because proteins continuously adjust their spatial structure to fit the optimal molecular interaction conformation in a dynamic environment. 
Therefore, the predictive accuracy of existing models is still insufficient for large-scale, multicomponent protein systems, such as organelles or transmembrane protein complexes. In addition, the “black-box” nature of deep learning models leads to a lack of interpretability, and the high complexity of protein structures and data scarcity increase the uncertainty of predictions, especially for high-dimensional datasets.¹⁵⁴ This is a critical issue in biological applications, where biologists often need to understand the biological mechanisms behind the predictions. One potential solution is a hybrid model that combines deep learning with traditional physical models such as molecular dynamics (MD) simulations. Such a model leverages the strengths of deep learning in pattern recognition and big data processing and draws on the accuracy of physical models in capturing intermolecular mechanical interactions and temporal evolution, thereby improving the ability to model the dynamic behavior of proteins. Another approach is to develop multimodal learning frameworks capable of simultaneously processing sequence, structural, and time-series data to capture the dynamic properties of proteins. By integrating multi-level information more comprehensively, multimodal frameworks can provide more accurate solutions for predicting complex protein networks and large-scale multicomponent systems. Roll's study highlights the importance of capturing proteins' conformational changes and dynamic properties for understanding their functions.¹⁵⁵ Protein structure prediction is a multidisciplinary field that spans computational biology, structural biology, and artificial intelligence. Collaborative efforts and open-source initiatives are essential to drive the rapid development of this field.
Although models such as AlphaFold 3 have made breakthroughs in protein structure prediction, they have yet to be fully open-sourced, which limits the participation of the broader community and further innovation.
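The overfitting problem discussed in this section, where a model performs well on the training set but much worse on new data, is commonly monitored through the gap between training and validation loss. The sketch below illustrates one widely used mitigation, early stopping on validation loss. It is an illustrative example rather than a method prescribed by the studies cited here, and the function names, the `patience` parameter, and the `min_delta` parameter are assumptions.

```python
def generalization_gap(train_loss, val_loss):
    """Validation loss minus training loss; a large positive gap suggests overfitting."""
    return val_loss - train_loss


def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the best epoch seen before training would stop.

    Training is treated as stopped once the validation loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")

    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch as the checkpoint.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop and keep the checkpoint.
            break
    return best_epoch
```

In practice, the model weights saved at the returned epoch would be restored, so that later epochs, in which the model starts to fit noise in the training data, are discarded.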

6 APPLICATION AND SIGNIFICANCE OF PREDICTED PROTEIN STRUCTURES

The rapid development of protein structure prediction technology has revolutionized life science and medical research.¹⁵⁶ Predicting protein structures is important for drug discovery and design, understanding disease mechanisms, protein engineering, and personalized medicine, among other applications. The core of drug design lies in constructing molecules with controllable interaction characteristics that precisely regulate multiple target and nontarget proteins in organisms.¹⁵⁷ When screening drug candidates, combining accurately predicted target protein structures with large-scale virtual screening allows potential binding molecules to be identified more rapidly.¹⁵⁸ Zhang et al.¹⁵⁹ showed that a virtual screening method based on AlphaFold-predicted structures increased the hit rate by 30% compared with the traditional method, which greatly accelerated the discovery of lead compounds. Structure prediction of membrane protein complexes is particularly critical for understanding the mechanism of drug-target interactions, as these complexes are the main targets of numerous drugs.¹⁶⁰ By analyzing predicted protein–ligand complex structures, deep learning models can reveal key interactions such as hydrogen bonding, hydrophobic interactions, and π–π stacking, information that is crucial for the structural optimization and property improvement of drugs. However, predicting the structure of membrane protein complexes faces unique challenges. Because membrane proteins reside in cell membranes, their transmembrane regions are highly hydrophobic and structurally unstable, which makes high-quality experimental data very difficult to acquire and thus affects the accuracy of prediction models.
Even advanced deep learning models, such as AlphaFold and RoseTTAFold, still have limitations when dealing with transmembrane regions or multicomponent membrane protein complexes.^{161, 162} These challenges have important implications for drug discovery because membrane protein complexes usually involve complex protein–protein and protein–ligand interactions, which directly determine a drug's binding effect and biological activity. Liu et al. successfully designed a series of highly efficient inhibitors using the AlphaFold-predicted structure of the SARS-CoV-2 main protease, which provided a new approach to developing therapeutic drugs for COVID-19.¹⁶³ In addition, protein aggregates play a role in many diseases. In cystic fibrosis, the effects of 59 mutations on domain stability were predicted using 15 different algorithms, and the resulting structures help predict the effects and mechanisms of multiple disease-causing mutations in other proteins.¹⁶⁴ Meanwhile, by accurately predicting protein structures, researchers can purposefully design amino acid sequences to create novel proteins with specific functions, which have a wide range of applications in fields such as industrial biocatalysis, environmental protection, and biomaterial development.¹⁷
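As a brief worked illustration of the hit-rate comparison mentioned in this section, the sketch below computes a screening hit rate and the relative increase between two campaigns. The counts used in the usage note are hypothetical and are not taken from the cited study, and the text does not state whether the reported 30% is a relative or an absolute increase; the example assumes a relative one. The function names are likewise assumptions.

```python
def hit_rate(n_hits, n_tested):
    """Fraction of tested compounds that were confirmed as hits."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_hits / n_tested


def relative_increase(baseline, improved):
    """Relative change of `improved` over `baseline` (0.30 means +30%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline
```

With hypothetical counts of 20 hits among 1000 compounds for a traditional screen and 26 hits among 1000 for a structure-based screen, `relative_increase(hit_rate(20, 1000), hit_rate(26, 1000))` is approximately 0.30, that is, a 30% relative increase corresponding to only 0.6 percentage points in absolute terms.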

7 CONCLUSION AND OUTLOOK

As deep learning methods evolve in PSP, traditional methods are moving toward a revolutionary “neuralization” driven by growing computational power and the combination of multiple deep learning techniques. This review outlines the paradigm shift from traditional PSP methods to hybrid prediction methods. Modern prediction methods based on deep learning networks have significantly improved prediction accuracy, especially for nonhomologous proteins, where traditional template-based and TFM approaches often falter. Both TBM and free modeling require computers and corresponding algorithms to simulate the 3D structure of proteins. These methods rely on complex computational processes, including sequence alignment, structure modeling, and energy minimization. The establishment and improvement of various bioinformatics databases have laid a solid foundation for predicting protein sequences, structures, functional annotations, and protein–protein interactions.

The success of AlphaFold 2 lies in its innovative attention mechanism, evolutionary information encoding, and end-to-end framework. Subsequently, a series of models such as Umol, RGN2, and trRosetta adopted deep learning architectures, including the Transformer, and refined their algorithmic strategies to significantly enhance predictive power. The recent release of AlphaFold 3 introduces diffusion modeling techniques that further extend the boundaries of predictive models, particularly in protein-small molecule structure prediction. This extension marks a key shift toward a holistic approach to elucidating biomolecular structures. With the addition of more deep learning methods, such as generative adversarial networks, the applicability of protein structure prediction methods has been greatly broadened, and the prediction of RNA structure is now at the forefront. Meanwhile, in addition to improving the accuracy of the original models, methods combining large and multimodal language models are gradually showing advantages. In the future, as the size and complexity of biological data grow, parallel computing will play a key role in enabling deep learning techniques to handle larger and more complex protein structure prediction tasks. Through multi-GPU architectures and distributed computing frameworks, parallel computing is expected to significantly improve the computational efficiency of models and further optimize the performance of deep learning models in the context of big biological data.
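The idea behind the multi-GPU training mentioned above can be illustrated with a minimal data-parallel sketch. It splits a batch into equal shards, computes a gradient on each shard, and averages the results, which is the role an all-reduce step plays across devices. The toy one-parameter model, the function names, and the sequential loop standing in for separate GPUs are all assumptions made for illustration; real frameworks perform the shard computations concurrently.

```python
def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error of the model y = w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_gradient(w, xs, ys, n_workers):
    """Average per-shard gradients, mimicking an all-reduce over `n_workers` devices.

    Assumption: the batch divides evenly, so the mean of the shard means
    equals the full-batch mean gradient.
    """
    if n_workers <= 0 or len(xs) % n_workers != 0:
        raise ValueError("batch size must be a positive multiple of n_workers")
    shard = len(xs) // n_workers

    shard_grads = []
    for k in range(n_workers):
        lo, hi = k * shard, (k + 1) * shard
        # Each iteration stands in for one device working on its own shard.
        shard_grads.append(mse_gradient(w, xs[lo:hi], ys[lo:hi]))

    # The averaging step corresponds to the all-reduce (mean) across devices.
    return sum(shard_grads) / n_workers
```

Because the averaged shard gradients match the full-batch gradient, each device only needs to hold and process a fraction of the batch, which is what allows larger datasets and models to be handled as more devices are added.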

Although MSA is crucial in moving from sequence prediction to structure prediction, it may no longer be the only core challenge. The adoption of further deep learning techniques, such as diffusion models, GANs, and large language models, is gradually shifting the research focus of this field. The integration of protein language models with innovative neural network architectures will undoubtedly refine our predictive capabilities, heralding a new era in which accurate protein modeling is transformed from a scientific challenge into a routine capability with far-reaching implications for drug discovery, therapeutic interventions, and understanding of the molecular structure of life itself.

AUTHOR CONTRIBUTIONS

Yiming Qin: Conceptualization (equal); data curation (equal); investigation (equal); visualization (equal); writing—original draft (equal); writing—review and editing (equal). Zihan Chen: Writing—review and editing (equal). Ye Peng: Writing—review and editing (equal). Ying Xiao: Supervision (equal); writing—review and editing (equal). Tian Zhong: Resources (equal); supervision (equal); writing—review and editing (equal). Xi Yu: Conceptualization (equal); funding acquisition (equal); project administration (equal); supervision (equal). All authors have read and approved the final manuscript.

ACKNOWLEDGMENTS

We gratefully thank the anonymous reviewers for their important and constructive comments and suggestions. This study was supported by the Fundo para o Desenvolvimento das Ciências e da Tecnologia (Grant Number: 0065/2023/ITP2).

CONFLICT OF INTEREST STATEMENT

The authors declare no conflict of interest.

ETHICS STATEMENT

Not applicable.

Open Research

DATA AVAILABILITY STATEMENT

Not applicable.

REFERENCES

1Borkakoti N, Thornton JM. AlphaFold2 protein structure prediction: implications for drug discovery. Curr Opin Struct Biol. 2023; 78:102526.
10.1016/j.sbi.2022.102526
2Zhu J, Zou J, Li F, et al. Evaluation of neuroprotective agents acting via the BDNF–TrkB pathway using AI-enabled predictions of ligand–receptor interactions. MedComm Fut Med. 2022; 1(1):e15.
10.1002/mef2.15
3Du K, Huang H. Development of anti-PD-L1 antibody based on structure prediction of AlphaFold2. Front Immunol. 2023; 14:1275999.
10.3389/fimmu.2023.1275999
4Heinemann M, Panke S. Synthetic biology—putting engineering into biology. Bioinformatics. 2006; 22(22): 2790-2799.
10.1093/bioinformatics/btl469
5Anfinsen CB, Haber E, Sela M, White Jr. FH. The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc Natl Acad Sci USA. 1961; 47(9): 1309-1314.
10.1073/pnas.47.9.1309
6Brookes E, Rocco M, Vachette P, Trewhella J. AlphaFold-predicted protein structures and small-angle X-ray scattering: insights from an extended examination of selected data in the small-angle scattering biological data bank. J Appl Crystal. 2023; 56(Pt 4): 910-926.
10.1107/S1600576723005344
7Koehler Leman J, Künze G. Recent advances in NMR protein structure prediction with ROSETTA. Int J Mol Sci. 2023; 24(9): 7835.
10.3390/ijms24097835
8Su Q, Hu F, Ge X, et al. Structure of the human PKD1-PKD2 complex. Science. 2018; 361(6406):eaat9819.
10.1126/science.aat9819
9Suh D, Lee JW, Choi S, Lee Y. Recent applications of deep learning methods on evolution- and contact-based protein structure prediction. Int J Mol Sci. 2021; 22(11): 6032.
10.3390/ijms22116032
10Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596(7873): 583-589.
10.1038/s41586-021-03819-2
11Tunyasuvunakool K, Adler J, Wu Z, et al. Highly accurate protein structure prediction for the human proteome. Nature. 2021; 596(7873): 590-596.
10.1038/s41586-021-03828-1
12Fu L, Cao Y, Wu J, Peng Q, Nie Q, Xie X. UFold: fast and accurate RNA secondary structure prediction with deep learning. Nucleic Acids Res. 2022; 50(3):e14.
10.1093/nar/gkab1074
13Abramson J, Adler J, Dunger J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024; 630(8016): 493-500.
10.1038/s41586-024-07487-w
14Wu KE, Yang KK, van den Berg R, et al. Protein structure generation via folding diffusion. Nat Commun. 2024; 15(1): 1059.
10.1038/s41467-024-45051-2
15Schauperl M, Denny RA. AI-based protein structure prediction in drug discovery: impacts and challenges. J Chem Inf Model. 2022; 62(13): 3142-3156.
10.1021/acs.jcim.2c00026
16Oganov AR, Pickard CJ, Zhu Q, Needs RJ. Structure prediction drives materials discovery. Nat Rev Mater. 2019; 4(5): 331-348.
10.1038/s41578-019-0101-8
17Kuhlman B, Bradley P. Advances in protein structure prediction and design. Nat Rev Mol Cell Biol. 2019; 20(11): 681-697.
10.1038/s41580-019-0163-x
18Huang B, Kong L, Wang C, et al. Protein structure prediction: challenges, advances, and the shift of research paradigms. Genom Insights. 2023; 21(5): 913-925.
19Eisenhaber F, Persson B, Argos P. Protein structure prediction: recognition of primary, secondary, and tertiary structural features from amino acid sequence. Crit Rev Biochem Mol Biol. 1995; 30(1): 1-94.
10.3109/10409239509085139
20Godzik A. The structural alignment between two proteins: is there a unique answer. Prot Sci. 1996; 5(7): 1325-1338.
10.1002/pro.5560050711
21Meng Q, Guo F, Tang J. Improved structure-related prediction for insufficient homologous proteins using MSA enhancement and pre-trained language model. Brief Bioinform. 2023; 24(4):bbad217.
10.1093/bib/bbad217
22Nishimura M, Arimura Y, Nozawa K, Kurumizaka H. Linker DNA and histone contributions in nucleosome binding by p53. J Biochem. 2020; 168(6): 669-675.
10.1093/jb/mvaa081
23Sharma R, Kim JJ, Qin L, et al. An auto-inhibited state of protein kinase G and implications for selective activation. eLife. 2022; 11:e79530.
10.7554/eLife.79530
24Evans R, O'Neill M, Pritzel A, et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv. 2022:2021.10.04.463034. doi:10.1101/2021.10.04.463034
25Bryant P, Pozzati G, Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun. 2022; 13(1): 1265.
10.1038/s41467-022-28865-w
26Callaway E. ‘It will change everything’: DeepMind's AI makes gigantic leap in solving protein structures. Nature. 2020; 588(7837): 203-204.
10.1038/d41586-020-03348-4
27Zhang C, Zheng W, Huang X, Bell EW, Zhou X, Zhang Y. Protein structure and sequence reanalysis of 2019-nCoV genome refutes snakes as its intermediate host and the unique similarity between its spike protein insertions and HIV-1. J Proteome Res. 2020; 19(4): 1351-1360.
10.1021/acs.jproteome.0c00129
28Ryu JY, Kim HU, Lee SY. Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. Proc Natl Acad Sci USA. 2019; 116(28): 13996-14001.
10.1073/pnas.1821905116
29Aloy P, Ceulemans H, Stark A, Russell RB. The relationship between sequence and interaction divergence in proteins. J Mol Biol. 2003; 332(5): 989-998.
10.1016/j.jmb.2003.07.006
30Croll TI, Sammito MD, Kryshtafovych A, Read RJ. Evaluation of template-based modeling in CASP13. Prot Struct Funct Bioinf. 2019; 87(12): 1113-1127.
10.1002/prot.25800
31Pons J-L, Labesse G. TOME-2: a new pipeline for comparative modeling of protein–ligand complexes. Nucl Acids Res. 2009; 37(suppl 2): W485-W491.
10.1093/nar/gkp368
32Bordoli L, Kiefer F, Arnold K, Benkert P, Battey J, Schwede T. Protein structure homology modeling using SWISS-MODEL workspace. Nat Protoc. 2009; 4(1): 1-13.
10.1038/nprot.2008.197
33Jones DT, Taylor WR, Thornton JM. A new approach to protein fold recognition. Nature. 1992; 358(6381): 86-89.
10.1038/358086a0
34Rozano L, Jones DAB, Hane JK, Mancera RL. Template-based modelling of the structure of fungal effector proteins. Mol Biotechnol. 2024; 66(4): 784-813.
10.1007/s12033-023-00703-4
35Schwede T. SWISS-MODEL: an automated protein homology-modeling server. Nucl Acids Res. 2003; 31(13): 3381-3385.
10.1093/nar/gkg520
36Söding J. Protein homology detection by HMM–HMM comparison. Bioinformatics. 2005; 21(7): 951-960.
10.1093/bioinformatics/bti125
37Ye J, McGinnis S, Madden TL. BLAST: improvements for better sequence analysis. Nucl Acids Res. 2006; 34(suppl 2): W6-W9.
10.1093/nar/gkl164
38Rost B. Twilight zone of protein sequence alignments. Protein Eng Des Sel. 1999; 12(2): 85-94.
10.1093/protein/12.2.85
39Varadi M, Anyango S, Deshpande M, et al. AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucl Acids Res. 2022; 50(D1): D439-D444.
10.1093/nar/gkab1061
40Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993; 233(1): 123-138.
10.1006/jmbi.1993.1489
41Hamamsy T, Morton JT, Blackwell R, et al. Protein remote homology detection and structural alignment using deep learning. Nat Biotechnol. 2024; 42(6): 975-985.
10.1038/s41587-023-01917-2
42Murphy LR, Wallqvist A, Levy RM. Simplified amino acid alphabets for protein fold recognition and implications for folding. Protein Eng Des Sel. 2000; 13(3): 149-152.
10.1093/protein/13.3.149
43Yin R, Feng BY, Varshney A, Pierce BG. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Prot Sci. 2022; 31(8):e4379.
10.1002/pro.4379
44Jisna VA, Jayaraj PB. Protein structure prediction: conventional and deep learning perspectives. Protein J. 2021; 40(4): 522-544.
10.1007/s10930-021-10003-y
45Zhang Y, Kolinski A, Skolnick J. TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys J. 2003; 85(2): 1145-1164.
10.1016/S0006-3495(03)74551-2
46Zhang Y. Template-based modeling and free modeling by I-TASSER in CASP7. Prot Struct Funct Bioinf. 2007; 69(S8): 108-117.
10.1002/prot.21702
47Pons JL, Labesse G. @TOME-2: a new pipeline for comparative modeling of protein-ligand complexes. Nucl Acids Res. 2009; 37(Web Server issue): W485-W491.
10.1093/nar/gkp368
48Zhang Y, Kolinski A, Skolnick J. TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys J. 2003; 85(2): 1145-1164.
10.1016/S0006-3495(03)74551-2
49Ovchinnikov S, Park H, Varghese N, et al. Protein structure determination using metagenome sequence data. Science. 2017; 355(6322): 294-298.
10.1126/science.aah4043
50Kaufmann KW, Lemmon GH, DeLuca SL, Sheehan JH, Meiler J. Practically useful: what the Rosetta protein modeling suite can do for you. Biochemistry. 2010; 49(14): 2987-2998.
10.1021/bi902153g
51Bradley P, Misura KMS, Baker D. Toward high-resolution de novo structure prediction for small proteins. Science. 2005; 309(5742): 1868-1871.
10.1126/science.1113801
52Chen CY-C, Tou WI. How to design a drug for the disordered proteins? Drug Discovery Today. 2013; 18(19-20): 910-915.
10.1016/j.drudis.2013.04.008
53Bitran A, Jacobs WM, Zhai X, Shakhnovich E. Cotranslational folding allows misfolding-prone proteins to circumvent deep kinetic traps. Proc Natl Acad Sci USA. 2020; 117(3): 1485-1495.
10.1073/pnas.1913207117
54Yang C-H, Lin Y-S, Chuang L-Y, Lin Y-D. Effective hybrid approach for protein structure prediction in a two-dimensional hydrophobic–polar model. Comput Biol Med. 2019; 113:103397.
10.1016/j.compbiomed.2019.103397
55Dhingra S, Sowdhamini R, Cadet F, Offmann B. A glance into the evolution of template-free protein structure prediction methodologies. Biochimie. 2020; 175: 85-92.
10.1016/j.biochi.2020.04.026
56Senior AW, Evans R, Jumper J, et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020; 577(7792): 706-710.
10.1038/s41586-019-1923-7
57LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521(7553): 436-444.
10.1038/nature14539
58Jones DT, Singh T, Kosciolek T, Tetchner S. MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics. 2015; 31(7): 999-1006.
10.1093/bioinformatics/btu791
59Jones DT, Kandathil SM. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics. 2018; 34(19): 3308-3315.
10.1093/bioinformatics/bty341
60Wang T, Qiao Y, Ding W, Mao W, Zhou Y, Gong H. Improved fragment sampling for ab initio protein structure prediction using deep neural networks. Nat Mach Intel. 2019; 1(8): 347-355.
10.1038/s42256-019-0075-7
61Du Z, Su H, Wang W, et al. The trRosetta server for fast and accurate protein structure prediction. Nat Protoc. 2021; 16(12): 5634-5651.
10.1038/s41596-021-00628-9
62Ju F, Zhu J, Shao B, et al. CopulaNet: learning residue co-evolution directly from multiple sequence alignment for protein structure prediction. Nat Commun. 2021; 12(1): 2535.
10.1038/s41467-021-22869-8
63Chowdhury R, Bouatta N, Biswas S, et al. Single-sequence protein structure prediction using a language model and deep learning. Nat Biotechnol. 2022; 40(11): 1617-1623.
10.1038/s41587-022-01432-w
64Singh J, Litfin T, Singh J, Paliwal K, Zhou Y. SPOT-contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model. Bioinformatics. 2022; 38(7): 1888-1894.
10.1093/bioinformatics/btac053
65Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022; 19(6): 679-682.
10.1038/s41592-022-01488-1
66Bryant P, Kelkar A, Guljas A, Clementi C, Noé F. Structure prediction of protein-ligand complexes from sequence information with Umol. Nat Commun. 2024; 15(1):4536.
10.1038/s41467-024-48837-6
67Lin Z, Akin H, Rao R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023; 379(6637): 1123-1130.
10.1126/science.ade2574
68Ruffolo JA, Chu L-S, Mahajan SP, Gray JJ. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat Commun. 2023; 14(1): 2389.
10.1038/s41467-023-38063-x
69Baek M, McHugh R, Anishchenko I, Jiang H, Baker D, DiMaio F. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat Methods. 2024; 21(1): 117-121.
10.1038/s41592-023-02086-5
70Wayment-Steele HK, Ojoawo A, Otten R, et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature. 2024; 625(7996): 832-839.
10.1038/s41586-023-06832-9
71Qiao Y, Yang R, Liu Y, et al. DeepFusion: a deep bimodal information fusion network for unraveling protein-RNA interactions using in vivo RNA structures. Comput Struct Biotechnol J. 2024; 23: 617-625.
10.1016/j.csbj.2023.12.040
72Shor B, Schneidman-Duhovny D. CombFold: predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2. Nat Methods. 2024; 21(3): 477-487.
10.1038/s41592-024-02174-0
73Basheer IA, Hajmeer M. Artificial neural networks: fundamentals, computing, design, and application. J Microbiol Meth. 2000; 43(1): 3-31.
10.1016/S0167-7012(00)00201-3
74Yang L, Han Y, Zhang H, Li W, Dai Y. Prediction of protein-protein interactions with local weight-sharing mechanism in deep learning. BioMed Res Int. 2020; 2020:5072520.
10.1155/2020/5072520
75Damm KL, Carlson HA. Gaussian-weighted RMSD superposition of proteins: a structural comparison for flexible proteins and predicted protein structures. Biophys J. 2006; 90(12): 4558-4573.
10.1529/biophysj.105.066654
76Zhang Z, Xu M, Jamasb A, et al. Protein representation learning by geometric structure pretraining. arXiv preprint. 2022;arXiv:220306125. doi:10.48550/arXiv.2203.06125
77Santoso LW, Singh B, Rajest SS, Regin R, Kadhim KH. A genetic programming approach to binary classification problem. EAI Endorsed Trans Energy Web. 2021; 8(31):e11.
78Zhang Y, Gao S, Cai P, Lei Z, Wang Y. Information entropy-based differential evolution with extremely randomized trees and LightGBM for protein structural class prediction. Appl Soft Comp. 2023; 136:110064.
10.1016/j.asoc.2023.110064
79Cui F, Li S, Zhang Z, et al. DeepMC-iNABP: deep learning for multiclass identification and classification of nucleic acid-binding proteins. Comput Struct Biotechnol J. 2022; 20: 2020-2028.
10.1016/j.csbj.2022.04.029
80Seeger M. Learning With Labeled and Unlabeled Data. EPFL; 2000. https://infoscience.epfl.ch/handle/20.500.14299/61765
81Ibrar D, Khan S, Raza M, et al. Application of machine learning for identification of heterotic groups in sunflower through combined approach of phenotyping, genotyping and protein profiling. Sci Rep. 2024; 14(1): 7333.
10.1038/s41598-024-58049-z
82Chen C, Zhou J, Wang F, Liu X, Dou D. Structure-aware protein self-supervised learning. Bioinformatics. 2023; 39(4):btad189.
10.1093/bioinformatics/btad189
83LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998; 86(11): 2278-2324.
10.1109/5.726791
84Pun MN, Ivanov A, Bellamy Q, et al. Learning the shape of protein microenvironments with a holographic convolutional neural network. Proc Natl Acad Sci USA. 2024; 121(6):e2300838121.
10.1073/pnas.2300838121
85Zhang Y, Chen Y, Wang C, et al. Prodconn-protein design using a convolutional neural network. Biophys J. 2020; 118(3): 43a-44a.
10.1016/j.bpj.2019.11.419
86Cao X, He W, Chen Z, et al. PSSP-MVIRT: peptide secondary structure prediction based on a multi-view deep learning architecture. Brief Bioinform. 2021; 22(6):bbab203.
10.1093/bib/bbab203
87Tsai S-T, Kuo E-J, Tiwary P. Learning molecular dynamics with simple language model built upon long short-term memory neural network. Nat Commun. 2020; 11(1): 5115.
10.1038/s41467-020-18959-8
88Karimi M, Wu D, Wang Z, Shen Y. DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics. 2019; 35(18): 3329-3338.
10.1093/bioinformatics/btz111
89Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997; 9(8): 1735-1780.
10.1162/neco.1997.9.8.1735
90Wang DD, Ou-Yang L, Xie H, Zhu M, Yan H. Predicting the impacts of mutations on protein-ligand binding affinity based on molecular dynamics simulations and machine learning methods. Comput Struct Biotechnol J. 2020; 18: 439-454.
10.1016/j.csbj.2020.02.007
91Wan C, Jones DT. Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks. Nat Mach Intel. 2020; 2(9): 540-550.
10.1038/s42256-020-0222-1
92Yang H, Wang M, Yu Z, Zhao XM, Li A. GANcon: protein contact map prediction with deep generative adversarial network. IEEE Access. 2020; 8: 80899-80907.
10.1109/ACCESS.2020.2991605
93Madani M, Behzadi MM, Song D, Ilies H, Tarakanova A. CGAN-Cmap: protein contact map prediction using deep generative adversarial neural networks. bioRxiv. 2022:2022.07.26.501607. doi:10.1101/2022.07.26.501607
94Yang Y, Wang H, Li W, et al. Prediction and analysis of multiple protein lysine modified sites based on conditional wasserstein generative adversarial networks. BMC Bioinform. 2021; 22(1): 171.
10.1186/s12859-021-04101-y
95Karimi M, Zhu S, Cao Y, Shen Y. De novo protein design for novel folds using guided conditional wasserstein generative adversarial networks. J Chem Inf Model. 2020; 60(12): 5667-5681.
10.1021/acs.jcim.0c00593
96Yim J, Campbell A, Foong AY, et al. Fast protein backbone generation with SE(3) flow matching. arXiv preprint. 2023;arXiv:231005297. doi:10.48550/arXiv.2310.05297
97Guo Z, Liu J, Wang Y, et al. Diffusion models in bioinformatics and computational biology. Nat Rev Bioeng. 2024; 2(2): 136-154.
10.1038/s44222-023-00114-9
98Jing B, Berger B, Jaakkola T. AlphaFold meets flow matching for generating protein ensembles. arXiv preprint. 2024;arXiv:240204845. doi:10.48550/arXiv.2402.04845
99Sajadi SZ, Zare Chahooki MA, Gharaghani S, Abbasi K. AutoDTI++: deep unsupervised learning for DTI prediction by autoencoders. BMC Bioinformatics. 2021; 22(1): 204.
10.1186/s12859-021-04127-2
100Gligorijević V, Renfrew PD, Kosciolek T, et al. Structure-based protein function prediction using graph convolutional networks. Nat Commun. 2021; 12(1): 3168.
10.1038/s41467-021-23303-9
101Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017; 30.
102Chen L, Tan X, Wang D, et al. TransformerCPI: improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics. 2020; 36(16): 4406-4414.
10.1093/bioinformatics/btaa524
103Bao H, Dong L, Piao S, Wei F. BEiT: BERT pre-training of image transformers. arXiv preprint. 2021;arXiv:2106.08254. doi:10.48550/arXiv.2106.08254
104Ji Y, Zhou Z, Liu H, Davuluri RV. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics. 2021; 37(15): 2112-2120.
10.1093/bioinformatics/btab083
105Cai T, Lim H, Abbu KA, Qiu Y, Nussinov R, Xie L. MSA-regularized protein sequence transformer toward predicting genome-wide chemical-protein interactions: application to GPCRome deorphanization. J Chem Inf Model. 2021; 61(4): 1570-1582.
10.1021/acs.jcim.0c01285
CAS PubMed Google Scholar
106Pan X, Shen H-B. Predicting RNA–protein binding sites and motifs through combining local and global deep convolutional neural networks. Bioinformatics. 2018; 34(20): 3427-3436.
10.1093/bioinformatics/bty364
CAS PubMed Web of Science® Google Scholar
107Liu X. Deep recurrent neural network for protein function prediction from sequence. arXiv preprint. 2017;arXiv:1701.08318. doi:10.48550/arXiv.1701.08318
Google Scholar
108Rahman T, Du Y, Zhao L, Shehu A. Generative adversarial learning of protein tertiary structures. Molecules. 2021; 26(5): 1209.
10.3390/molecules26051209
CAS PubMed Google Scholar
109Lee J, Kim S-Y, Lee J. Protein structure prediction based on fragment assembly and parameter optimization. Biophys Chem. 2005; 115(2-3): 209-214.
10.1016/j.bpc.2004.12.046
CAS PubMed Google Scholar
110Saldaño T, Escobedo N, Marchetti J, et al. Impact of protein conformational diversity on AlphaFold predictions. Bioinformatics. 2022; 38(10): 2742-2748.
10.1093/bioinformatics/btac202
CAS PubMed Web of Science® Google Scholar
111Meier J, Rao R, Verkuil R, Liu J, Sercu T, Rives A. Language models enable zero-shot prediction of the effects of mutations on protein function. Adv Neural Inf Process Syst. 2021; 34: 29287-29303.
Google Scholar
112Maisuradze GG, Liwo A, Scheraga HA. Principal component analysis for protein folding dynamics. J Mol Biol. 2009; 385(1): 312-329.
10.1016/j.jmb.2008.10.018
CAS PubMed Web of Science® Google Scholar
113Ojeda-May P, Mushtaq AU, Rogne P, et al. Dynamic connection between enzymatic catalysis and collective protein motions. Biochemistry. 2021; 60(28): 2246-2258.
10.1021/acs.biochem.1c00221
CAS PubMed Web of Science® Google Scholar
114Khan S, Khan M, Iqbal N, Rahman MAA, Karim MKA. Deep-piRNA: bi-layered prediction model for PIWI-interacting RNA using discriminative features. Comp Mater Cont. 2022; 72(2): 2243-2258.
Google Scholar
115Uzma, Manzoor U, Halim Z. Protein encoder: an autoencoder-based ensemble feature selection scheme to predict protein secondary structure. Expert Syst Appl. 2023; 213:119081.
10.1016/j.eswa.2022.119081
Google Scholar
116Zhang Y. TM-align: a protein structure alignment algorithm based on the TM-score. Nucl Acids Res. 2005; 33(7): 2302-2309.
10.1093/nar/gki524
CAS PubMed Web of Science® Google Scholar
117Xu J, Zhang Y. How significant is a protein structure similarity with TM-score= 0.5? Bioinformatics. 2010; 26(7): 889-895.
10.1093/bioinformatics/btq066
CAS PubMed Web of Science® Google Scholar
118Zhang J, Vancea AI, Arold ST. Targeting plant UBX proteins: AI-enhanced lessons from distant cousins. Trends Plant Sci. 2022; 27(11): 1099-1108.
10.1016/j.tplants.2022.05.012
CAS PubMed Google Scholar
119Guo HB, Perminov A, Bekele S, et al. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci Rep. 2022; 12:10696.
10.1038/s41598-022-14382-9
CAS PubMed Web of Science® Google Scholar
120Basu S, Wallner B. DockQ: a quality measure for protein-protein docking models. PLoS One. 2016; 11(8):e0161879.
10.1371/journal.pone.0161879
PubMed Web of Science® Google Scholar
121Chou K-C. Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol. 2011; 273(1): 236-247.
10.1016/j.jtbi.2010.12.024
CAS PubMed Web of Science® Google Scholar
122Du X, Hu J, Li S. Using Chou's 5-step rule to predict DNA-protein binding with multi-scale complementary feature. J Proteome Res. 2021; 20(3): 1639-1656.
10.1021/acs.jproteome.0c00864
CAS PubMed Google Scholar
123Berman HM. The protein data bank. Nucl Acids Res. 2000; 28(1): 235-242.
10.1093/nar/28.1.235
CAS PubMed Web of Science® Google Scholar
124Apweiler R. UniProt: the universal protein knowledgebase. Nucl Acids Res. 2004; 32: 115D-119D.
10.1093/nar/gkh131
CAS PubMed Web of Science® Google Scholar
125Varadi M, Bertoni D, Magana P, et al. AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences. Nucl Acids Res. 2024; 52(D1): D368-D375.
10.1093/nar/gkad1011
PubMed Google Scholar
126Zou J. Artificial intelligence revolution in structure prediction for entire proteomes. MedComm Fut Med. 2022; 1(2):e19.
10.1002/mef2.19
Google Scholar
127Wang W, Peng Z, Yang J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat Comput Sci. 2022; 2(12): 804-814.
10.1038/s43588-022-00373-3
CAS PubMed Google Scholar
128Fang X, Gao J, Hu J, et al. HelixFold-multimer: elevating protein complex structure prediction to new heights. arXiv preprint. 2024;arXiv:240410260. doi:10.48550/arXiv.2404.10260
Google Scholar
129Krishna R, Wang J, Ahern W, et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science. 2024; 384(6693):eadl2528.
10.1126/science.adl2528
CAS PubMed Web of Science® Google Scholar
130Ahdritz G, Bouatta N, Floristean C, et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat Methods. 2024; 218:1514-1524.
10.1038/s41592-024-02272-z
PubMed Google Scholar
131Zhu F, Deng L, Dai Y, et al. PPICT: an integrated deep neural network for predicting inter-protein PTM cross-talk. Brief. Bioinform. 2023; 24(2):bbad052.
10.1093/bib/bbad052
PubMed Google Scholar
132Pereira J, Simpkin AJ, Hartmann MD, Rigden DJ, Keegan RM, Lupas AN. High-accuracy protein structure prediction in CASP14. Prot Struct Funct Bioinf. 2021; 89(12): 1687-1699.
10.1002/prot.26171
CAS PubMed Web of Science® Google Scholar
133Sala D, Engelberger F, Mchaourab HS, Meiler J. Modeling conformational states of proteins with AlphaFold. Curr Opin Struct Biol. 2023; 81:102645.
10.1016/j.sbi.2023.102645
CAS PubMed Web of Science® Google Scholar
134Cheng J, Novati G, Pan J, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023; 381(6664):eadg7492.
10.1126/science.adg7492
CAS PubMed Web of Science® Google Scholar
135Hekkelman ML, de Vries I, Joosten RP, Perrakis A. AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods. 2023; 20(2): 205-213.
10.1038/s41592-022-01685-y
CAS PubMed Web of Science® Google Scholar
136Albergo MS, Vanden-Eijnden E. Building normalizing flows with stochastic interpolant. arXiv preprint. 2022;arXiv:220915571. doi:10.48550/arXiv.2209.15571
Google Scholar
137Bortoli De, Thornton V, Heng J, Doucet J. A diffusion schrödinger bridge with applications to score-based generative modeling. Adv Neural Inf Process Syst. 2021; 34: 17695-17709.
Google Scholar
138Guo Z, Liu J, Skolnick J, Cheng J. Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks. Nat Commun. 2022; 13(1): 6963.
10.1038/s41467-022-34600-2
CAS PubMed Google Scholar
139Rao R, Bhattacharya N, Thomas N, et al. Evaluating protein transfer learning with TAPE. Adv Neural Inf Process Syst. 2019; 32: 9689-9701.
PubMed Google Scholar
140Liu J, Liu Q, Zhang L, Su S, Liu Y. Enabling massive XML-based biological data management in HBase. IEEE/ACM Trans Comput Biol Bioinf. 2020; 17(6): 1994-2004.
10.1109/TCBB.2019.2915811
PubMed Google Scholar
141Wong R, Chang W-L. Fast quantum algorithm for protein structure prediction in hydrophobic-hydrophilic model. J Paral Distribut Comp. 2022; 164: 178-190.
10.1016/j.jpdc.2022.03.011
Google Scholar
142Khan S, Khan MA, Khan M, et al. Optimized feature learning for anti-inflammatory peptide prediction using parallel distributed computing. Appl Sci. 2023; 13(12): 7059.
10.3390/app13127059
CAS Google Scholar
143Bryant P, Kelkar A, Guljas A, Clementi C, Noé F. Structure prediction of protein-ligand complexes from sequence information with Umol. Nat Commun. 2024; 15(1): 4536.
10.1038/s41467-024-48837-6
CAS PubMed Google Scholar
144Schweke H, Pacesa M, Levin T, et al. An atlas of protein homo-oligomerization across domains of life. Cell. 2024; 187(4): 999-1010.e15.
10.1016/j.cell.2024.01.022
CAS PubMed Web of Science® Google Scholar
145Zhao K, Zhao P, Wang S, Xia Y, Zhang G. FoldPAthreader: predicting protein folding pathway using a novel folding force field model derived from known protein universe. Genome Biol. 2024; 25(1): 152.
10.1186/s13059-024-03291-x
CAS PubMed Google Scholar
146Khan S, Uddin I, Khan M, et al. Sequence based model using deep neural network and hybrid features for identification of 5-hydroxymethylcytosine modification. Sci Rep. 2024; 14(1): 9116.
10.1038/s41598-024-59777-y
CAS PubMed Google Scholar
147Ramanathan A, Ma H, Parvatikar A, Chennubhotla SC. Artificial intelligence techniques for integrative structural biology of intrinsically disordered proteins. Curr Opin Struct Biol. 2021; 66: 216-224.
10.1016/j.sbi.2020.12.001
CAS PubMed Web of Science® Google Scholar
148Siebenmorgen T, Zacharias M. Computational prediction of protein–protein binding affinities. WIREs Comp Mol Sci. 2020; 10(3):e1448.
10.1002/wcms.1448
CAS Web of Science® Google Scholar
149khan S, Naeem M, Qiyas M. Deep intelligent predictive model for the identification of diabetes. AIMS Math. 2023; 8(7): 16446-16462.
10.3934/math.2023840
Google Scholar
150Chen CW, Yang HC. OPATs: Omnibus p-value association tests. Brief Bioinform. 2019; 20: 1-14.
10.1093/bib/bbx068
PubMed Google Scholar
151Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, Baker D. Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci USA. 2020; 117(3): 1496-1503.
10.1073/pnas.1914677117
CAS PubMed Web of Science® Google Scholar
152Jin S, Zeng X, Xia F, Huang W, Liu X. Application of deep learning methods in biological networks. Brief Bioinform. 2021; 22(2): 1902-1917.
10.1093/bib/bbaa043
CAS PubMed Google Scholar
153Romero P, Wagg J, Green ML, Kaiser D, Krummenacker M, Karp PD. Computational prediction of human metabolic pathways from the complete human genome. Genome Biol. 2004; 6(1):R2.
10.1186/gb-2004-6-1-r2
PubMed Web of Science® Google Scholar
154Khan S, Khan M, Iqbal N, Dilshad N, Almufareh MF, Alsubaie N. Enhancing sumoylation site prediction: a deep neural network with discriminative features. Life. 2023; 13(11): 2153.
10.3390/life13112153
CAS PubMed Google Scholar
155Croll TI, Sammito MD, Kryshtafovych A, Read RJ. Evaluation of template-based modeling in CASP13. Proteins Struct Funct Bioinf. 2019; 87(12): 1113-1127.
10.1002/prot.25800
CAS PubMed Web of Science® Google Scholar
156Branco I, Choupina A. Bioinformatics: new tools and applications in life science and personalized medicine. Appl Microbiol Biotechnol. 2021; 105(3): 937-951.
10.1007/s00253-020-11056-2
CAS PubMed Web of Science® Google Scholar
157Schmidt T, Bergner A, Schwede T. Modelling three-dimensional protein structures for applications in drug design. Drug Discovery Today. 2014; 19(7): 890-897.
10.1016/j.drudis.2013.10.027
CAS PubMed Web of Science® Google Scholar
158Li H, Sun X, Cui W, et al. Computational drug development for membrane protein targets. Nat Biotechnol. 2024; 42(2): 229-242.
10.1038/s41587-023-01987-2
CAS PubMed Google Scholar
159Zhou Y, Zhang Y, Lian X, et al. Therapeutic target database update 2022: facilitating drug discovery with enriched comparative data of targeted agents. Nucl Acids Res. 2022; 50(D1): D1398-D1407.
10.1093/nar/gkab953
CAS PubMed Web of Science® Google Scholar
160Arinaminpathy Y, Khurana E, Engelman DM, Gerstein MB. Computational analysis of membrane proteins: the largest class of drug targets. Drug Discovery Today. 2009; 14(23): 1130-1135.
10.1016/j.drudis.2009.08.006
CAS PubMed Web of Science® Google Scholar
161Dobson L, Szekeres LI, Gerdán C, Langó T, Zeke A, Tusnády GE. TmAlphaFold database: membrane localization and evaluation of AlphaFold2 predicted alpha-helical transmembrane protein structures. Nucl Acids Res. 2023; 51(D1): D517-D522.
10.1093/nar/gkac928
CAS PubMed Google Scholar
162Samanta R, Harmalkar A, Prathima P, Gray JJ. Advancing membrane-associated protein docking with improved sampling and scoring in Rosetta. bioRxiv. 2024:2024.07.09.602802. doi:10.1101/2024.07.09.602802
PubMed Google Scholar
163Khan I, Li S, Tao L, et al. Tubeimosides are pan-coronavirus and filovirus inhibitors that can block their fusion protein binding to Niemann-Pick C1. Nat Commun. 2024; 15(1): 162.
10.1038/s41467-023-44504-4
PubMed Google Scholar
164Bahia MS, Khazanov N, Zhou Q, et al. Stability prediction for mutations in the cytosolic domains of cystic fibrosis transmembrane conductance regulator. J Chem Inf Model. 2021; 61(4): 1762-1777.
10.1021/acs.jcim.0c01207
CAS PubMed Google Scholar

Volume 3, Issue 3, September 2024, e96
