Volume 3, Issue 12 pp. 2366-2377
Article

Residue–residue contact substitution probabilities derived from aligned three-dimensional structures and the identification of common folds

Michael A. Rodionov

ICRF Unit of Structural Molecular Biology, Department of Crystallography, Birkbeck College, University of London, Malet Street, London WC1E 7HX, United Kingdom

Institute of Bioorganic Chemistry, Belarus Academy of Sciences, Zhodinscaya str. 5/2, Minsk-141, Republic of Belarus 220141

Mark S. Johnson

ICRF Unit of Structural Molecular Biology, Department of Crystallography, Birkbeck College, University of London, Malet Street, London WC1E 7HX, United Kingdom

First published: December 1994
Citations: 11

Abstract

We report the derivation of scores, based on the analysis of residue–residue contact matrices from 443 3-dimensional structures aligned structurally as 96 families, that can be used to evaluate sequence–structure matches. Residue–residue contacts and the more than 3 × 10⁶ amino acid substitutions that take place between pairs of these contacts at aligned positions within each family of structures have been tabulated and segregated according to the solvent accessibility of the residues involved. Contact maps within a family of structures are shown to be highly conserved (∼75%) even when the sequence identity approaches 10%. In a comparison involving a globin structure and the search of a sequence databank (>21,000 sequences), the contact probability scores are shown to provide a very powerful secondary screen for the top-scoring sequence–structure matches, where between 69% and 84% of the unrelated matches are eliminated. The search of an aligned set of 2 globins against a sequence databank and the subsequent residue contact-based evaluation of matches locates all 618 globin sequences before the first non-globin match. From a single bacterial serine proteinase structure, the structural template approach coupled with residue–residue contact substitution data leads to the detection of the mammalian serine proteinase family among the top matches in the search of a sequence databank.
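
The contact-map conservation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not state how a contact is defined, so the 8 Å distance cutoff, the use of one representative point per residue, the minimum sequence separation, and the function names `contact_map` and `conserved_fraction` are all assumptions made for illustration only.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=2):
    """Return the set of residue index pairs (i, j), i < j, in contact.

    coords: one (x, y, z) point per residue (assumed representative atom).
    cutoff: assumed contact distance in angstroms (not from the paper).
    min_separation: skip pairs closer than this along the chain.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def conserved_fraction(contacts_a, contacts_b, alignment):
    """Fraction of contacts in structure A that are also present in B.

    alignment: dict mapping a residue index in A to its structurally
    aligned residue index in B; unaligned residues are simply absent.
    Only contacts whose two residues are both aligned are counted.
    """
    mapped = 0
    conserved = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            mapped += 1
            pair_b = tuple(sorted((alignment[i], alignment[j])))
            if pair_b in contacts_b:
                conserved += 1
    return conserved / mapped if mapped else 0.0
```

Under these assumptions, applying `conserved_fraction` to each pair of structures in an aligned family gives a number comparable in spirit to the ∼75% conservation quoted above, although the exact value depends on the contact definition chosen.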
