Residue–residue contact substitution probabilities derived from aligned three-dimensional structures and the identification of common folds

Michael A. Rodionov

ICRF Unit of Structural Molecular Biology, Department of Crystallography, Birkbeck College, University of London, Malet Street, London WC1E 7HX, United Kingdom

Institute of Bioorganic Chemistry, Belarus Academy of Sciences, Zhodinscaya str. 5/2, Minsk-141, Republic of Belarus 220141

Mark S. Johnson

ICRF Unit of Structural Molecular Biology, Department of Crystallography, Birkbeck College, University of London, Malet Street, London WC1E 7HX, United Kingdom

First published: December 1994

https://doi.org/10.1002/pro.5560031221

Abstract

We report the derivation of scores for evaluating sequence–structure matches, based on the analysis of residue–residue contact matrices from 443 3-dimensional structures aligned structurally as 96 families. Residue–residue contacts and the more than 3 × 10⁶ amino acid substitutions that take place between pairs of these contacts at aligned positions within each family of structures have been tabulated and segregated according to the solvent accessibility of the residues involved. Contact maps within a family of structures are shown to be highly conserved (∼75%) even as the sequence identity approaches 10%. In a comparison involving a globin structure and the search of a sequence databank (>21,000 sequences), the contact probability scores are shown to provide a powerful secondary screen for the top-scoring sequence–structure matches, eliminating between 69% and 84% of the unrelated matches. The search of an aligned set of 2 globins against a sequence databank, followed by residue contact-based evaluation of the matches, locates all 618 globin sequences before the first non-globin match. Starting from a single bacterial serine proteinase structure, the structural template approach coupled with residue–residue contact substitution data leads to the detection of the mammalian serine proteinase family among the top matches in the search of a sequence databank.
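To make the quantities in the abstract concrete, the following is a minimal sketch of how a residue–residue contact map, the fraction of contacts conserved between two aligned structures, and a tally of substitutions at conserved contacts might be computed. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not give the contact criterion, so the 8.0 Å cutoff on one representative point per residue, the `min_separation` filter, and all function names here are hypothetical. The segregation by solvent accessibility described above is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def contact_map(coords, cutoff=8.0, min_separation=2):
    """Return the set of residue index pairs (i, j), i < j, within `cutoff`.

    coords: list of (x, y, z) tuples, one representative point per residue.
    cutoff and min_separation are illustrative assumptions, not the paper's values.
    """
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i < min_separation:
            continue  # skip residues adjacent along the chain
        if math.dist(coords[i], coords[j]) <= cutoff:
            contacts.add((i, j))
    return contacts


def contact_conservation(contacts_a, contacts_b, alignment):
    """Fraction of A's alignable contacts that are also contacts in B.

    alignment: dict mapping a residue index in A to its aligned index in B.
    Contacts involving unaligned residues are ignored.
    """
    mapped = {
        tuple(sorted((alignment[i], alignment[j])))
        for i, j in contacts_a
        if i in alignment and j in alignment
    }
    if not mapped:
        return 0.0
    return len(mapped & contacts_b) / len(mapped)


def contact_substitutions(seq_a, seq_b, contacts_a, contacts_b, alignment):
    """Count how contacting residue pairs in A are replaced in B.

    Only contacts present in both structures at aligned positions are counted.
    Keys are ((a_i, a_j), (b_i, b_j)) tuples of one-letter residue codes.
    """
    counts = Counter()
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            bi, bj = alignment[i], alignment[j]
            if tuple(sorted((bi, bj))) in contacts_b:
                counts[((seq_a[i], seq_a[j]), (seq_b[bi], seq_b[bj]))] += 1
    return counts
```

In a real analysis the counts would be accumulated over all structure pairs within each aligned family and then normalised into substitution probabilities; the normalisation is not specified in the abstract and is left out of this sketch.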

Volume 3, Issue 12

December 1994

Pages 2366-2377
