Assessing model accuracy using the Homology Modeling Automatically software
Aneerban Bhattacharya
Center for Advanced Biotechnology and Medicine (CABM), Rutgers University and Robert Wood Johnson Medical School (UMDNJ), Piscataway, New Jersey 08854
Zeba Wunderlich
Center for Advanced Biotechnology and Medicine (CABM), Rutgers University and Robert Wood Johnson Medical School (UMDNJ), Piscataway, New Jersey 08854
Daniel Monleon
Center for Advanced Biotechnology and Medicine (CABM), Rutgers University and Robert Wood Johnson Medical School (UMDNJ), Piscataway, New Jersey 08854
Roberto Tejero
Center for Advanced Biotechnology and Medicine (CABM), Rutgers University and Robert Wood Johnson Medical School (UMDNJ), Piscataway, New Jersey 08854
Departamento de Química Física, Universidad de Valencia, Dr Moliner 50, 46100-Burjassot, Valencia, Spain
Gaetano T. Montelione (Corresponding Author)
Center for Advanced Biotechnology and Medicine (CABM), Rutgers University and Robert Wood Johnson Medical School (UMDNJ), Piscataway, New Jersey 08854
CABM, Rutgers University, 679 Hoes Lane, Piscataway, NJ 08854-5638

Abstract
Homology modeling is a powerful technique that greatly increases the value of experimental structure determination by using the structural information of one protein to predict the structures of homologous proteins. We have previously described a method of homology modeling by satisfaction of spatial restraints (Li et al., Protein Sci 1997;6:956–970). The Homology Modeling Automatically (HOMA) web site, <http://www-nmr.cabm.rutgers.edu/HOMA>, is a new tool that uses this method to predict the 3D structure of a target protein from the sequence alignment of the target protein to a template protein and the structure coordinates of the template. The user is presented with the resulting models, together with an extensive structure validation report providing critical assessments of the quality of the resulting homology models. The homology modeling method employed by HOMA was assessed and validated using twenty-four groups of homologous proteins. Using HOMA, homology models were generated for 510 proteins, including 264 proteins modeled with correct folds and 246 modeled with incorrect folds. Accuracies of these models were assessed by superimposition on the corresponding experimentally determined structures. A subset of these results was compared with parallel studies of modeling accuracy using several other automated homology modeling approaches. Overall, HOMA provides prediction accuracies similar to those of other state-of-the-art homology modeling methods. We also evaluate how well several structure quality validation tools assess the accuracy of homology models generated with HOMA. This study demonstrates that Verify3D (Luthy et al., Nature 1992;356:83–85) and ProsaII (Sippl, Proteins 1993;17:355–362) are the most sensitive in distinguishing between homology models with correct and incorrect folds. For homology models that have the correct fold, the steric conformational energy (primarily the van der Waals energy), MolProbity clashscore (Word et al., Protein Sci 2000;9:2251–2259), and the PROCHECK G-factors (Laskowski et al., J Biomol NMR 1996;8:477–486) provide sensitive and consistent methods for assessing accuracy and can distinguish between homology models of higher and lower accuracy. As demonstrated in the accompanying paper (Bhattacharya et al.), combinations of these scores for models generated with HOMA provide a basis for distinguishing low from high accuracy models. Proteins 2008. © 2007 Wiley-Liss, Inc.
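The abstract states that model accuracy was assessed by superimposing each model on its experimentally determined structure, but it does not say which atoms or which fitting algorithm were used. As an illustration only, the sketch below assumes matched atom coordinates (for example, Cα atoms) supplied as two equal-length arrays and uses the standard Kabsch algorithm with NumPy. The function name `superimposed_rmsd` and the toy coordinates are hypothetical and are not taken from HOMA.

```python
import numpy as np


def superimposed_rmsd(model_xyz, reference_xyz):
    """RMSD after optimal rigid-body superposition (Kabsch algorithm).

    Both inputs are (N, 3) arrays of matched atoms; this pairing is an
    assumption of the sketch, not something stated in the abstract.
    """
    P = np.asarray(model_xyz, dtype=float)
    Q = np.asarray(reference_xyz, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")

    # Remove the translational component by centering both sets.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the model onto the reference and measure the residual.
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    # Hypothetical toy coordinates: a rotated and translated copy of the
    # reference should superimpose almost exactly.
    reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                          [0.0, 1.5, 1.0], [2.0, 0.5, 2.5]])
    t = 0.7
    rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                    [np.sin(t), np.cos(t), 0.0],
                    [0.0, 0.0, 1.0]])
    model = reference @ rot.T + np.array([3.0, -2.0, 5.0])
    print(superimposed_rmsd(model, reference))
```

In practice the two coordinate sets would first have to be restricted to residues that are aligned between the model and the experimental structure; that step is omitted here.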
Supporting Information
The Supplementary Material referred to in this article can be found at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/
Filename | Description |
---|---|
jws-prot.21466.doc (235 KB) | Supporting Information file jws-prot.21466.doc |
REFERENCES
- 1 Lesk AM,Chothia C. How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J Mol Biol 1980; 136: 225–270.
- 2 Chothia C,Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J 1986; 5: 823–826.
- 3 Sali A,Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 1993; 234: 779–815.
- 4 Valencia A,Kjeldgaard M,Pai EF,Sander C. GTPase domains of ras p21 oncogene protein and elongation factor Tu: analysis of three-dimensional structures, sequence families, and functional sites. Proc Natl Acad Sci USA 1991; 88: 5443–5447.
- 5 Rost B. Twilight zone of protein sequence alignments. Protein Eng 1999; 12: 85–94.
- 6 Tramontano A,Morea V. Assessment of homology-based predictions in CASP5. Proteins 2003; 53(Suppl 6): 352–368.
- 7 DeWeese-Scott C,Moult J. Molecular modeling of protein function regions. Proteins 2004; 55: 942–961.
- 8 Baker D,Sali A. Protein structure prediction and structural genomics. Science 2001; 294: 93–96.
- 9 Marti-Renom MA,Stuart AC,Fiser A,Sanchez R,Melo F,Sali A. Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 2000; 29: 291–325.
- 10 Sutcliffe MJ,Haneef I,Carney D,Blundell TL. Knowledge based modelling of homologous proteins, Part I: three-dimensional frameworks derived from the simultaneous superposition of multiple structures. Protein Eng 1987; 1: 377–384.
- 11 Levitt M. Accurate modeling of protein conformation by automatic segment matching. J Mol Biol 1992; 226: 507–533.
- 12 Li H,Tejero R,Monleon D,Bassolino-Klimas D,Abate-Shen C,Bruccoleri RE,Montelione GT. Homology modeling using simulated annealing of restrained molecular dynamics and conformational search calculations with CONGEN: application in predicting the three-dimensional structure of murine homeodomain Msx-1. Protein Sci 1997; 6: 956–970.
- 13 Sahasrabudhe PV,Tejero R,Kitao S,Furuichi Y,Montelione GT. Homology modeling of an RNP domain from a human RNA-binding protein: homology-constrained energy optimization provides a criterion for distinguishing potential sequence alignments. Proteins 1998; 33: 558–566. doi:10.1002/(SICI)1097-0134(19981201)33:4<558::AID-PROT8>3.0.CO;2-Z
- 14 Güntert P,Mumenthaler C,Wüthrich K. Torsion angle dynamics for NMR structure calculation with the new program DYANA. J Mol Biol 1997; 273: 283–298.
- 15 Brünger AT. X-PLOR, Version 3.1: a system for X-ray crystallography and NMR. New Haven: Yale University Press; 1992. xvii, 382 pp.
- 16 Schwieters CD,Kuszewski JJ,Tjandra N,Marius Clore G. The Xplor-NIH NMR molecular structure determination package. J Magn Reson 2003; 160: 65–73.
- 17 Bhattacharya A,Tejero R,Montelione GT. Evaluating protein structures determined by structural genomics consortia. Proteins 2006; 62: 587–603.
- 18 Esnouf RM. An extensively modified version of MolScript that includes greatly enhanced coloring capabilities. J Mol Graph Model 1997; 15: 112–133, 132–134.
- 19 Altschul SF,Gish W,Miller W,Myers EW,Lipman DJ. Basic local alignment search tool. J Mol Biol 1990; 215: 403–410.
- 20 Thompson JD,Higgins DG,Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994; 22: 4673–4680.
- 21 Pearson WR. Empirical statistical estimates for sequence similarity searches. J Mol Biol 1998; 276: 71–84.
- 22 Laskowski RA,Rullmann JA,MacArthur MW,Kaptein R,Thornton JM. AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J Biomol NMR 1996; 8: 477–486.
- 23 Luthy R,Bowie JU,Eisenberg D. Assessment of protein models with three-dimensional profiles. Nature 1992; 356: 83–85.
- 24 Sippl MJ. Recognition of errors in three-dimensional structures of proteins. Proteins 1993; 17: 355–362.
- 25 Word JM,Bateman RC,Jr,Presley BK,Lovell SC,Richardson DC. Exploring steric constraints on protein mutations using MAGE/PROBE. Protein Sci 2000; 9: 2251–2259.
- 26 Word JM,Lovell SC,LaBean TH,Taylor HC,Zalis ME,Presley BK,Richardson JS,Richardson DC. Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J Mol Biol 1999; 285: 1711–1733.
- 27 Word JM,Lovell SC,Richardson JS,Richardson DC. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J Mol Biol 1999; 285: 1735–1747.
- 28 Richardson DC,Richardson JS. The kinemage: a tool for scientific communication. Protein Sci 1992; 1: 3–9.
- 29 Lovell SC,Davis IW,Arendall WB,III,de Bakker PI,Word JM,Prisant MG,Richardson JS,Richardson DC. Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins 2003; 50: 437–450.
- 30 Hyberts SG,Goldberg MS,Havel TF,Wagner G. The solution structure of eglin c based on measurements of many NOEs and coupling constants and its comparison with X-ray structures. Protein Sci 1992; 1: 736–751.
- 31 Murzin AG,Brenner SE,Hubbard T,Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995; 247: 536–540.
- 32 Brünger A. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 1992; 355: 472–475.
- 33 Koh IY,Eyrich VA,Marti-Renom MA,Przybylski D,Madhusudhan MS,Eswar N,Grana O,Pazos F,Valencia A,Sali A,Rost B. EVA: evaluation of protein structure prediction servers. Nucleic Acids Res 2003; 31: 3311–3315.
- 34 Moult J,Fidelis K,Zemla A,Hubbard T. Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins 2003; 53(Suppl 6): 334–339.
- 35 Kossiakoff AA,Randal M,Guenot J,Eigenbrot C. Variability of conformations at crystal contacts in BPTI represent true low-energy structures: correspondence among lattice packing and molecular dynamics structures. Proteins 1992; 14: 65–74.
- 36 Clore GM,Gronenborn AM. Two-, three-, and four-dimensional NMR methods for obtaining larger and more precise three-dimensional structures of proteins in solution. Annu Rev Biophys Biophys Chem 1991; 20: 29–63.
- 37 Fischer D,Rychlewski L,Dunbrack RL,Jr,Ortiz AR,Elofsson A. CAFASP3: the third critical assessment of fully automated structure prediction methods. Proteins 2003; 53(Suppl 6): 503–516.
- 38 Lambert C,Leonard N,De Bolle X,Depiereux E. ESyPred3D: prediction of proteins 3D structures. Bioinformatics 2002; 18: 1250–1256.
- 39 Bates PA,Kelley LA,MacCallum RM,Sternberg MJ. Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 2001; 45(Suppl 5): 39–46.
- 40 Bates PA,Sternberg MJ. Model building by comparison at CASP3: using expert knowledge and computer automation. Proteins 1999; 37(Suppl 3): 47–54. doi:10.1002/(SICI)1097-0134(1999)37:3+<47::AID-PROT7>3.0.CO;2-F
- 41 Contreras-Moreira B,Fitzjohn PW,Bates PA. Comparative modelling: an essential methodology for protein structure prediction in the post-genomic era. Appl Bioinformatics 2002; 1: 177–190.
- 42 Schwede T,Kopp J,Guex N,Peitsch MC. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res 2003; 31: 3381–3385.
- 43 Fiser A,Do RK,Sali A. Modeling of loops in protein structures. Protein Sci 2000; 9: 1753–1773.
- 44 Bassolino-Klimas D,Tejero R,Krystek SR,Metzler WJ,Montelione GT,Bruccoleri RE. Simulated annealing with restrained molecular dynamics using a flexible restraint potential: theory and evaluation with simulated NMR constraints. Protein Sci 1996; 5: 593–603.
- 45 Tejero R,Bassolino-Klimas D,Bruccoleri RE,Montelione GT. Simulated annealing with restrained molecular dynamics using CONGEN: energy refinement of the NMR solution structures of epidermal and type-alpha transforming growth factors. Protein Sci 1996; 5: 578–592.
- 46 Bhattacharya A,Ye J,Muchnick I,Montelione GT,Kulikowski C. Estimating the accuracy of homology models. J Struct Funct Genomics, submitted.
- 47 Wallner B,Elofsson A. All are not equal: a benchmark of different homology modeling programs. Protein Sci 2005; 14: 1315–1327.
- 48 Nayeem A,Sitkoff D,Krystek S,Jr. A comparative study of available software for high-accuracy homology modeling: from sequence alignments to structural models. Protein Sci 2006; 15: 808–824.
- 49 Moult J. Comparison of database potentials and molecular mechanics force fields. Curr Opin Struct Biol 1997; 7: 194–199.
- 50 Melo F,Sanchez R,Sali A. Statistical potentials for fold assessment. Protein Sci 2002; 11: 430–448.
- 51 Jernigan RL,Bahar I. Structure-derived potentials and protein simulations. Curr Opin Struct Biol 1996; 6: 195–209.
- 52 Kuhlman B,Baker D. Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci USA 2000; 97: 10383–10388.
- 53 Lazaridis T,Karplus M. Effective energy function for proteins in solution. Proteins 1999; 35: 133–152. doi:10.1002/(SICI)1097-0134(19990501)35:2<133::AID-PROT1>3.0.CO;2-N
- 54 Petrey D,Honig B. Free energy determinants of tertiary structure and the evaluation of protein models. Protein Sci 2000; 9: 2181–2191.
- 55 John B,Sali A. Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Res 2003; 31: 3982–3992.