A divide-and-conquer approach to determine the Pareto frontier for optimization of protein engineering experiments
Lu He
Department of Computer Science, Dartmouth College, Hanover, New Hampshire 03755
Alan M. Friedman
Department of Biological Sciences, Markey Center for Structural Biology, Purdue Cancer Center, and Bindley Bioscience Center, Purdue University, West Lafayette, Indiana
Corresponding Author
Chris Bailey-Kellogg
Department of Computer Science, Dartmouth College, Hanover, New Hampshire 03755
6211 Sudikoff Laboratory, Hanover, NH 03755
Abstract
In developing improved protein variants by site-directed mutagenesis or recombination, there are often competing objectives that must be considered in designing an experiment (selecting mutations or breakpoints): stability versus novelty, affinity versus specificity, activity versus immunogenicity, and so forth. Pareto optimal experimental designs make the best trade-offs between competing objectives. Such designs are not “dominated”; that is, no other design is better than a Pareto optimal design for one objective without being worse for another objective. Our goal is to produce all the Pareto optimal designs (the Pareto frontier), to characterize the trade-offs and suggest designs most worth considering, but to avoid explicitly considering the large number of dominated designs. To do so, we develop a divide-and-conquer algorithm, Protein Engineering Pareto FRontier (PEPFR), that hierarchically subdivides the objective space, using appropriate dynamic programming or integer programming methods to optimize designs in different regions. This divide-and-conquer approach is efficient in that the number of divisions (and thus calls to the optimizer) is directly proportional to the number of Pareto optimal designs. We demonstrate PEPFR with three protein engineering case studies: site-directed recombination for stability and diversity via dynamic programming, site-directed mutagenesis of interacting proteins for affinity and specificity via integer programming, and site-directed mutagenesis of a therapeutic protein for activity and immunogenicity via integer programming. We show that PEPFR is able to effectively produce all the Pareto optimal designs, discovering many more designs than previous methods. The characterization of the Pareto frontier provides additional insights into the local stability of design choices as well as global trends leading to trade-offs between competing criteria. Proteins 2011. © 2012 Wiley Periodicals, Inc.
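The dominance relation and the divide-and-conquer idea described in the abstract can be illustrated with a small sketch. It is not the published PEPFR implementation. It assumes exactly two objectives, both to be minimized, and distinct objective vectors, and it replaces the dynamic programming or integer programming optimizer with a hypothetical brute-force scan (`best_in_box`) that minimizes f1 + f2 inside an axis-aligned box of the objective space. The names `Design`, `dominates`, `best_in_box`, and `pareto_frontier` are illustrative only. Each optimizer call either returns a new Pareto optimal design, which splits its box into two sub-boxes, or reports an empty box. A frontier of k designs therefore costs 2k + 1 calls, which mirrors the linear relationship stated above.

```python
import math
from typing import List, Optional, Tuple

# (name, f1, f2); ASSUMPTION for this sketch: both objectives are minimized
Design = Tuple[str, float, float]


def dominates(a: Design, b: Design) -> bool:
    """a dominates b: no worse on both objectives and strictly better on at least one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def best_in_box(designs: List[Design], f1_max: float, f2_max: float) -> Optional[Design]:
    """Hypothetical stand-in for the DP/IP optimizer: brute-force scan for the design
    minimizing f1 + f2 subject to f1 < f1_max and f2 < f2_max (ties broken by f1).
    Any design dominating the result would lie in the box with a smaller sum,
    so the result is Pareto optimal over the whole design set."""
    feasible = [d for d in designs if d[1] < f1_max and d[2] < f2_max]
    if not feasible:
        return None
    return min(feasible, key=lambda d: (d[1] + d[2], d[1]))


def pareto_frontier(designs: List[Design]) -> Tuple[List[Design], int]:
    """Divide-and-conquer frontier search; returns (frontier sorted by f1, optimizer calls).
    ASSUMPTION: no two designs share the same (f1, f2) vector."""
    frontier: List[Design] = []
    calls = 0

    def search(f1_max: float, f2_max: float) -> None:
        nonlocal calls
        calls += 1
        p = best_in_box(designs, f1_max, f2_max)
        if p is None:  # empty box: a leaf of the recursion
            return
        frontier.append(p)
        # Any other Pareto optimal design in this box beats p on exactly one
        # objective, so it lies in one of these sub-boxes; their overlap would
        # hold only designs dominating p, and there are none.
        search(p[1], f2_max)  # better f1, worse f2
        search(f1_max, p[2])  # better f2, worse f1

    search(math.inf, math.inf)
    frontier.sort(key=lambda d: d[1])
    return frontier, calls


# Toy usage with made-up objective values
designs = [
    ("A", 1.0, 9.0),
    ("B", 2.0, 5.0),
    ("C", 3.0, 6.0),  # dominated by B
    ("D", 4.0, 2.0),
    ("E", 6.0, 1.0),
    ("F", 5.0, 5.0),  # dominated by B and D
]
front, calls = pareto_frontier(designs)
print([d[0] for d in front], calls)  # prints: ['A', 'B', 'D', 'E'] 9
```

Splitting on the design just found is sound because any other Pareto optimal design in the box must beat it on exactly one objective. A real implementation would pose the same box-constrained problem to its dynamic programming or integer programming solver instead of scanning an explicit list of designs.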
Supporting Information
Additional Supporting Information may be found in the online version of this article.
Filename | Description |
---|---|
PROT_23237_sm_SuppInfo.pdf (210 KB) | Supporting Information |