PDB-scale analysis of known and putative ligand-binding sites with structural sketches
Jun-Ichi Ito
Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan
Yasuo Tabei
Minato Discrete Structure Manipulation System Project, ERATO, Japan Science and Technology Agency, Sapporo 060-0814, Japan
Kana Shimizu
Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan
Corresponding Author
Kentaro Tomii
Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan
2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
Koji Tsuda
Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan
Minato Discrete Structure Manipulation System Project, ERATO, Japan Science and Technology Agency, Sapporo 060-0814, Japan
Abstract
Computational investigation of protein function is one of the most urgent and demanding tasks in structural bioinformatics. Exhaustive pairwise comparison of known and putative ligand-binding sites, across protein families and folds, is essential for elucidating the biological functions and evolutionary relationships of proteins. Given the vast amount of data now available, existing 3D structural comparison methods are inadequate because of their computational cost. In this article, we propose a new bit-string representation of binding sites called structural sketches, which is obtained by random projections of triplet descriptors. This representation allows us to use ultra-fast all-pairs similarity search methods for strings with strictly controlled error rates. An exhaustive comparison of 1.2 million known and putative binding sites finished in ∼30 h on a single core and yielded 88 million similar binding-site pairs. Careful investigation of 3.5 million pairs verified by TM-align revealed several notable analogous sites across distinct protein families or folds. In particular, we identified highly plausible functions for several pockets through strong structural analogies. These results indicate that our method is a promising tool for functional annotation of binding sites derived from structural genomics projects. Proteins 2011. © 2012 Wiley Periodicals, Inc.
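The idea of turning a binding-site descriptor into a bit string by random projection can be illustrated with a small example. The following is a minimal sketch under stated assumptions, not the authors' implementation: each site is assumed to be already summarized as a numeric feature vector (a stand-in for the triplet descriptors, whose exact definition is not given here), each bit is taken as the sign of one random Gaussian projection, and the all-pairs search is a plain quadratic loop rather than the fast string-based search the abstract refers to. All function names and the example vectors are hypothetical.

```python
import math
import random
from itertools import combinations


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian direction vectors of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(vector, hyperplanes):
    """Pack the signs of the random projections into one integer bit string."""
    bits = 0
    for i, plane in enumerate(hyperplanes):
        dot = sum(p * x for p, x in zip(plane, vector))
        if dot >= 0.0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of bit positions in which two sketches differ."""
    return bin(a ^ b).count("1")


def estimated_cosine(a, b, n_bits):
    """Each bit differs with probability angle/pi, so invert that relation."""
    return math.cos(math.pi * hamming(a, b) / n_bits)


def similar_pairs(sketches, max_distance):
    """Brute-force all-pairs search; quadratic, for illustration only."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(sketches), 2)
            if hamming(a, b) <= max_distance]


if __name__ == "__main__":
    n_bits = 64
    planes = make_hyperplanes(dim=4, n_bits=n_bits, seed=1)
    # Hypothetical descriptor vectors for three binding sites.
    sites = [[5.0, 1.0, 0.0, 2.0], [4.0, 1.0, 1.0, 2.0], [0.0, 6.0, 3.0, 0.0]]
    sketches = [sketch(v, planes) for v in sites]
    print(similar_pairs(sketches, max_distance=8))
```

Because similar vectors disagree on few projection signs, a small Hamming distance between sketches suggests a small angle between the original vectors. Comparing short bit strings is much cheaper than aligning 3D structures, which is what makes a database-wide all-pairs comparison feasible in principle.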
Supporting Information
Additional Supporting Information may be found in the online version of this article.
Filename | Description |
---|---|
PROT_23232_sm_suppinfo.pdf (849.2 KB) | Supporting Information |