Conserved sequence features of inteins (protein introns) and their use in identifying new inteins and related proteins
Shmuel Pietrokovski
Department of Structural Biology, The Weizmann Institute of Science, Rehovot 76100, Israel
Abstract
Inteins (protein introns) are internal portions of protein sequences that are excised posttranslationally while the flanking regions are spliced together, yielding an additional protein product. Inteins have been found in a number of homologous genes in yeast, mycobacteria, and extremely thermophilic archaebacteria. Inteins are probably multifunctional: they autocatalyze their own splicing, and some have also been shown to be DNA endonucleases. The splice junction regions and two regions similar to homing endonucleases were previously thought to be the only sequence features common to inteins.
This work analyzes all published intein sequences with recently developed methods for detecting weak, conserved sequence features. The methods complement each other in identifying and assessing several patterns that characterize intein sequences. New conserved intein features are discovered, and the known ones are quantitatively described and localized. A general sequence description of all known inteins is derived from the motifs and their relative positions, and this description is used to search the sequence databases for intein-like proteins.
A sequence region in a mycobacterial open reading frame that possesses all of the intein motifs, and is absent from sequences homologous to both of its flanking regions, is identified as an intein. A newly discovered putative intein in red algal chloroplasts is found to lack the endonuclease motifs present in all other inteins. The yeast HO endonuclease is found to have an overall intein-like structure, and a few viral polyprotein cleavage sites are found to be significantly similar to the intein amino-terminal splice junction motif. The intein features described here may be used to detect intein sequences.
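The abstract describes summarizing inteins as a set of conserved motifs plus their relative positions, then using that description to search sequence databases. As a rough illustration of that general idea (not the specific methods or scoring used in this work), the Python sketch below builds a generic log-odds position-specific scoring matrix (PSSM) from an ungapped block of aligned segments, then looks for several motifs in a fixed order with bounded spacing. Everything here is an assumption for illustration: the function names are hypothetical, the residue background is taken as uniform, pseudocounts are simply additive, the order and spacing check is a greedy scan, and the blocks and query are invented toy data rather than real intein motifs.

```python
import math
from collections import Counter

# The 20 standard amino acids; any other symbol (e.g. X) scores 0 below.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, pseudocount=1.0):
    """Build a log-odds PSSM from an ungapped block of aligned segments.

    Assumption: uniform background frequency (1/20) for every residue and
    a simple additive pseudocount per residue type.
    """
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("all segments in a block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(segment[col] for segment in block)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm, sequence):
    """Return (score, start) of the best-scoring window, or (-inf, -1)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))
        if score > best_score:  # keep the first window on ties
            best_score, best_start = score, start
    return best_score, best_start


def find_ordered_motifs(pssms, sequence, min_score, max_gap):
    """Greedy search for motifs in a fixed order with bounded spacing.

    Each motif is searched only downstream of the previous hit. Returns a
    list of (start, score) pairs, or None if any motif scores below
    min_score or starts more than max_gap residues after the previous hit.
    """
    hits = []
    offset = 0  # first position after the previous hit
    for pssm in pssms:
        score, rel_start = best_hit(pssm, sequence[offset:])
        if rel_start < 0 or score < min_score:
            return None
        if hits and rel_start > max_gap:
            return None
        start = offset + rel_start
        hits.append((start, score))
        offset = start + len(pssm)
    return hits


# Invented toy blocks and query, for illustration only (not real intein motifs).
pssm_a = build_pssm(["CLAGD", "CLSGD", "CFAGD"])
pssm_b = build_pssm(["HNSW", "HNTY", "HNSF"])
query = "MKT" + "CLAGD" + "PPPPPPP" + "HNSF" + "EEK"

hits = find_ordered_motifs([pssm_a, pssm_b], query, min_score=4.0, max_gap=10)
print(hits)  # hit starts are 3 and 15
```

The greedy downstream scan keeps the sketch short, but it can miss a valid arrangement when an earlier motif's single best hit is not the one that satisfies the spacing; a fuller search would keep several candidate hits per motif and assess their statistical significance.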