Design of multispecific protein sequences using probabilistic graphical modeling
Menachem Fromer (corresponding author)
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
Chen Yanover
Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington
Michal Linial
Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
Abstract
In nature, proteins partake in numerous protein–protein interactions that mediate their functions. Moreover, proteins have been shown to be physically stable in multiple structures, induced by cellular conditions, small ligands, or covalent modifications. Understanding how protein sequences achieve this structural promiscuity at the atomic level is a fundamental step in the drug design pipeline and a critical question in protein physics. One way to investigate this subject is to computationally predict protein sequences that are compatible with multiple states, i.e., multiple target structures or binding to distinct partners. The goal of engineering such proteins has been termed multispecific protein design. We develop a novel computational framework to efficiently and accurately perform multispecific protein design. This framework utilizes recent advances in probabilistic graphical modeling to predict sequences with low energies in multiple target states. Furthermore, it is geared specifically to yield positional amino acid probability profiles compatible with these target states. Such profiles can be used as input to randomly bias high-throughput experimental sequence screening techniques, such as phage display, thus providing an alternative avenue for elucidating the multispecificity of natural proteins and for synthesizing novel proteins with specific functionalities. We demonstrate the utility of such multispecific design techniques in better recovering amino acid sequence diversities similar to those resulting from millions of years of evolution. We then compare two approaches, the prediction of low-energy ensembles and the prediction of amino acid profiles, and demonstrate their complementarity in providing more robust predictions for protein design. Proteins 2010. © 2009 Wiley-Liss, Inc.