Probabilistic alignment detects remote homology in a pair of protein sequences without homologous sequence information
Ryotaro Koike
Global Scientific Information and Computing Center, Tokyo Institute of Technology, Ookayama, Tokyo 152-8550, Japan
Institute for Bioinformatics Research and Development, Japan Science and Technology Agency, Japan
Kengo Kinoshita
Structure and Function of Biomolecules, SORST, Japan Science and Technology Agency, Kawaguchi 332-0012, Japan
Institute of Medical Science, University of Tokyo, Tokyo 108-8639, Japan
Corresponding Author
Akinori Kidera
International Graduate School of Arts and Sciences, Yokohama City University, Tsurumi, Yokohama 230-0045, Japan
Institute of Molecular Science, Okazaki, 444-8585, Japan
Abstract
Dynamic programming (DP) and the heuristic algorithms derived from it are the most fundamental methods for similarity searches of amino acid sequences. Their detection power has been improved by including supplemental information, such as homologous sequences in the profile method. Here, we describe a method, probabilistic alignment (PA), that improves detection power while, like the original DP, using only a pair of amino acid sequences. Receiver operating characteristic (ROC) analysis demonstrated that the PA method is far superior to BLAST, and that its sensitivity and selectivity approach those of PSI-BLAST. Particularly for orphan proteins, which have few homologues in the database, PA performs much better than PSI-BLAST. On the basis of this observation, we applied the PA method to a homology search for two orphan proteins, latexin and the resuscitation-promoting factor domain. Their molecular functions have been described on the basis of structural similarities, but PSI-BLAST has not identified sequence homologues for them. PA successfully detected sequence homologues for both proteins and confirmed that the observed structural similarities result from an evolutionary relationship. Proteins 2007. © 2006 Wiley-Liss, Inc.
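For readers unfamiliar with the pairwise DP baseline mentioned above, the following is a minimal sketch of Smith-Waterman local alignment scoring. It is not the PA method, which the abstract does not specify. The function name `smith_waterman_score` and the match, mismatch, and linear gap values are illustrative assumptions; practical searches typically use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score of sequences a and b.

    Scoring is a toy scheme (assumed for illustration): fixed match and
    mismatch scores and a linear gap penalty.
    """
    # prev holds the previous DP row; row 0 is all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared segment "ACDE" dominates; the flanking residues do not match.
    print(smith_waterman_score("WWACDEWW", "KKACDEKK"))
```

Only two rows of the DP table are kept because the sketch returns the score alone; recovering the alignment itself would require storing the full table or traceback pointers.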
Supporting Information
The Supplementary Material referred to in this article can be found at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/
Filename | Description |
---|---|
jws-prot.21240.doc (67 KB) | Supporting Information file jws-prot.21240.doc |