Super folds, networks, and barriers
Sean Burke
Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, Texas 78712
Corresponding Author
Ron Elber
Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, Texas 78712
Department of Chemistry and Biochemistry, University of Texas at Austin, Austin, Texas 78712
Institute of Computational Engineering and Sciences, 1 University Station C0200, Austin, Texas 78712-0027
Abstract
Exhaustive enumeration of sequences and folds is conducted for a simple lattice model of conformations, sequences, and energies. Examination of all foldable sequences and their nearest connected neighbors (sequences that differ by no more than a point mutation) illustrates the following: (i) There exists an unusually large number of sequences that fold into a few structures (super-folds). The same observation was made experimentally and computationally using stochastic sampling and exhaustive enumeration of related models. (ii) There exist only a few large networks of connected sequences that are not restricted to one fold. These networks cover a significant fraction of fold space (super-networks). (iii) There exist barriers in sequence space that prevent foldable sequences of the same structure from “connecting” through a series of single point mutations (super-barriers), even when sequence connections between folds are present. While there is ample experimental evidence for the existence of super-folds, evidence for a super-network is just starting to emerge. The prediction of a sequence barrier is an intriguing characteristic of sequence space, suggesting that the overall sequence space may be disconnected. The implications and limitations of these observations for the evolution of protein structures are discussed. Proteins 2012. © 2011 Wiley Periodicals, Inc.
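The notion of sequences "connected" by single point mutations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-letter `HP` alphabet, the four-residue sequences, and the fold labels in `toy_fold_of` are made-up assumptions, and no lattice energy model is evaluated. The sketch only shows how foldable sequences could be grouped into networks and how one could count the number of disconnected networks that each fold is split across.

```python
from collections import defaultdict

ALPHABET = "HP"  # assumption: toy two-letter alphabet, not the paper's model


def point_mutants(seq, alphabet=ALPHABET):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, old in enumerate(seq):
        for new in alphabet:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def connected_networks(fold_of):
    """Group foldable sequences into networks linked by single point mutations.

    fold_of maps each foldable sequence to a fold label; any sequence that is
    absent from the mapping is treated as non-foldable.
    """
    seen = set()
    networks = []
    for start in fold_of:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # depth-first traversal over foldable neighbors
            seq = stack.pop()
            component.add(seq)
            for neighbor in point_mutants(seq):
                if neighbor in fold_of and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        networks.append(component)
    return networks


def networks_per_fold(fold_of, networks):
    """Count, for each fold, how many separate networks hold its sequences."""
    net_ids = defaultdict(set)
    for idx, net in enumerate(networks):
        for seq in net:
            net_ids[fold_of[seq]].add(idx)
    return {fold: len(ids) for fold, ids in net_ids.items()}


# Hypothetical fold assignment, invented purely for illustration.
toy_fold_of = {
    "HHPP": "A",
    "HHPH": "A",
    "HPPH": "B",
    "PPHH": "A",
}

networks = connected_networks(toy_fold_of)
fragments = networks_per_fold(toy_fold_of, networks)
print(networks)
print(fragments)
```

In this toy input, one network contains sequences of both fold A and fold B, while the fold A sequence `PPHH` cannot be reached from the other fold A sequences by foldable single point mutations. A fold whose count in `fragments` exceeds one is split by a barrier in the sense described above.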