Dissecting contact potentials for proteins: Relative contributions of individual amino acids
Corresponding Author
N.-V. Buchete
Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892
Laboratory of Chemical Physics, NIDDK, National Institutes of Health, 9000 Rockville Pike, Bldg. 5, Rm. 137A, Bethesda, Maryland, 20892-0520
J. E. Straub
Department of Chemistry, Boston University, Boston, Massachusetts 02215
D. Thirumalai
Biophysics Program, Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742
Abstract
Knowledge-based contact potentials are routinely used in fold recognition, studies of peptide binding to proteins, structure prediction, and coarse-grained models that probe protein folding kinetics. The dominant physical forces embodied in the contact potentials are revealed by eigenvalue analysis of the matrices whose elements describe the strengths of interaction between amino acid side chains. We propose a general method to rank quantitatively the importance of the various inter-residue interactions represented in currently popular pair contact potentials. Eigenvalue analysis and correlation diagrams are used to rank the inter-residue pair interactions by the magnitude of their relative contributions to the contact potentials. For several contact potentials, the amino acid ranking is shown to be consistent with a mean field approximation that reconstructs the original potential from its most relevant amino acids. By providing a general, relative ranking score for amino acids, this method permits a detailed, quantitative comparison of various contact interaction schemes. For most contact potentials, between 7 and 9 amino acids of varying chemical character are needed to reconstruct the full matrix accurately. By correlating the important amino acid residues identified in contact potentials with an analysis of about 7800 structural domains in the CATH database, we predict that it is important to model interactions between small hydrophobic residues accurately. In addition, only potentials that take interactions involving the protein backbone into account can predict dense packing in protein structures. Proteins 2008. © 2007 Wiley-Liss, Inc.
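To make the idea of eigenvalue analysis of a contact matrix concrete, the following is a minimal sketch, not the authors' procedure. It assumes a hypothetical 4-residue toy matrix with made-up contact energies (real contact potentials are 20 x 20), finds the dominant eigenpair by power iteration, ranks residues by the magnitude of their eigenvector components, and checks how well a rank-one reconstruction reproduces the matrix. The rank-one reconstruction is only a stand-in for the mean field approximation mentioned in the abstract, whose exact form is not given here.

```python
import math

# Hypothetical toy alphabet and made-up symmetric contact energies
# (more negative = more attractive). These are NOT published values.
residues = ["LEU", "VAL", "ALA", "LYS"]
contact_matrix = [
    [-7.0, -6.0, -4.5, -3.0],
    [-6.0, -5.5, -4.0, -2.5],
    [-4.5, -4.0, -2.5, -1.5],
    [-3.0, -2.5, -1.5, -1.0],
]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]


def power_iteration(matrix, iterations=100):
    """Return (eigenvalue, unit eigenvector) of largest magnitude.

    Assumes a symmetric matrix whose dominant eigenvector is not
    orthogonal to the uniform starting vector.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # The Rayleigh quotient gives the eigenvalue for the converged vector.
    eigenvalue = sum(v * mv for v, mv in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


eigenvalue, eigenvector = power_iteration(contact_matrix)

# Rank residues by the magnitude of their dominant-eigenvector component.
ranking = [
    name
    for name, _ in sorted(
        zip(residues, eigenvector), key=lambda pair: abs(pair[1]), reverse=True
    )
]

# Rank-one reconstruction: M_ij is approximated by lambda * v_i * v_j.
n = len(residues)
reconstruction = [
    [eigenvalue * eigenvector[i] * eigenvector[j] for j in range(n)]
    for i in range(n)
]

# Relative Frobenius-norm error of the reconstruction.
diff_sq = sum(
    (contact_matrix[i][j] - reconstruction[i][j]) ** 2
    for i in range(n)
    for j in range(n)
)
total_sq = sum(x * x for row in contact_matrix for x in row)
relative_error = math.sqrt(diff_sq / total_sq)

print("dominant eigenvalue:", round(eigenvalue, 3))
print("residue ranking:", ranking)
print("relative reconstruction error:", round(relative_error, 3))
```

With a real 20 x 20 potential, a library eigensolver would normally replace the hand-written power iteration, and several eigenvectors rather than one may be needed before the reconstruction error becomes small.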
Supporting Information
The Supplementary Material referred to in this article can be found online at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/
Filename | Description |
---|---|
jws-prot.21538.pdf (564.1 KB) | Supporting Information file jws-prot.21538.pdf |