Quantifying the evolutionary divergence of protein structures: The role of function change and function conservation†
Alberto Pascual-García
Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid 28049, Spain
David Abia
Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid 28049, Spain
Raúl Méndez
Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid 28049, Spain
Gonzalo S. Nido
Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid 28049, Spain
Ugo Bastolla (corresponding author)
Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid 28049, Spain
†The authors state no conflict of interest.
Abstract
The molecular clock hypothesis, which states that protein sequences diverge in evolution by accumulating amino acid substitutions at an almost constant rate, played a major role in the development of molecular evolution and boosted quantitative theories of evolutionary change. These studies were extended to protein structures by the seminal paper of Chothia and Lesk, which established the approximate proportionality between structure and sequence divergence. Here we analyse how function influences the relationship between sequence and structure divergence, studying four large superfamilies of evolutionarily related proteins: globins, aldolases, P-loop and NADP-binding. We introduce the contact divergence, which is more consistent with sequence divergence than previously used structure divergence measures. Our main findings are: (1) Small structure and sequence divergences are proportional, consistent with the molecular clock. The approximate validity of the clock is also supported by the analysis of the clustering coefficient of structure similarity networks. (2) Functional constraints strongly limit the structure divergence of proteins performing the same function and may help to identify incomplete or wrong functional annotations. (3) The rate of structure versus sequence divergence is larger for proteins performing different functions than for proteins performing the same function. We conjecture that this acceleration is due to positive selection for new functions. Accelerations in structure divergence are also suggested by the analysis of the clustering coefficient. (4) For low sequence identity, structural diversity explodes. We conjecture that this explosion is related to functional diversification. (5) Large indels are almost always associated with function changes. Proteins 2010. © 2009 Wiley-Liss, Inc.
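Findings (1) and (3) both appeal to the clustering coefficient of structure similarity networks. As a rough illustration only, the Python sketch below links items whose pairwise divergence falls below a cutoff and computes the mean local clustering coefficient of the resulting network. The function names, the cutoffs and the toy divergence matrices are hypothetical and are not taken from the paper. The reading that clock-like (ultrametric) divergences give fully transitive networks, and hence a coefficient of 1, is an interpretive assumption rather than the authors' stated procedure.

```python
from itertools import combinations

def threshold_graph(dist, cutoff):
    """Link items i and j whenever their divergence is below the cutoff."""
    n = len(dist)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if dist[i][j] < cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def mean_clustering(adj):
    """Mean local clustering coefficient over nodes with at least two neighbours."""
    values = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # the coefficient is undefined for fewer than two neighbours
        # Count neighbour pairs that are themselves linked (closed triangles).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        values.append(links / (k * (k - 1) / 2))
    return sum(values) / len(values) if values else 0.0

# Hypothetical ultrametric (clock-like) divergences: groups {0, 1, 2} and {3, 4}.
clock_like = [
    [0, 1, 1, 4, 4],
    [1, 0, 1, 4, 4],
    [1, 1, 0, 4, 4],
    [4, 4, 4, 0, 2],
    [4, 4, 4, 2, 0],
]

# Hypothetical non-ultrametric divergences: 0-1 and 1-2 are close, 0-2 is far.
chain = [
    [0, 1, 3],
    [1, 0, 1],
    [3, 1, 0],
]

print(mean_clustering(threshold_graph(clock_like, 1.5)))
print(mean_clustering(threshold_graph(chain, 2)))
```

In the first toy matrix every pair of neighbours is itself linked, so the similarity relation is transitive at that cutoff. In the second, similarity fails to be transitive, which lowers the coefficient.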
Supporting Information
Additional Supporting Information may be found in the online version of this article.
Filename | Description |
---|---|
PROT_22616_sm_suppinfo.pdf (285.9 KB) | Supporting Information |
REFERENCES
- 1 Bromham L, Penny D. The modern molecular clock. Nature Reviews Genetics 2003; 4: 216–224.
- 2 Zuckerkandl E, Pauling L. In: Kasha M, Pullman B, editors. Horizons in biochemistry. New York: Academic Press; 1962.
- 3 Kimura M. Evolutionary rate at the molecular level. Nature 1968; 217: 624–626.
- 4 King J-L, Jukes TH. Non-Darwinian evolution. Science 1969; 164: 788–798.
- 5 Ohta T, Kimura M. On the constancy of the evolutionary rate of cistrons. J Mol Evol 1971; 1: 18–25.
- 6 Kimura M. The neutral theory of molecular evolution. Cambridge: Cambridge University Press; 1983. doi:10.1017/CBO9780511623486
- 7 Ohta T. Role of very slightly deleterious mutations in molecular evolution and polymorphism. Theor Pop Biol 1976; 10: 254–275.
- 8 Durrett R. Probability models for DNA sequence evolution. New York: Springer; 2002. doi:10.1007/978-1-4757-6285-3
- 9 Sella G, Hirsch AE. The application of statistical physics to evolutionary biology. Proc Natl Acad Sci USA 2005; 102: 9541–9546.
- 10 Bastolla U, Moya A, Viguera E, van Ham RCHJ. Genomic determinants of protein folding thermodynamics. J Mol Biol 2004; 343: 1451–1466.
- 11 Gillespie JH. The causes of molecular evolution. Oxford: Oxford University Press; 1991.
- 12 Graur D, Li WH. Fundamentals of molecular evolution. Sunderland: Sinauer; 2000.
- 13 Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J 1986; 5: 823–826.
- 14 Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995; 247: 536–540.
- 15 Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM. CATH-a hierarchic classification of protein domain structures. Structure 1997; 5: 1093–1108.
- 16 Grishin NV. Fold change in evolution of protein structures. J Struct Biol 2001; 134: 167–185.
- 17 Krishna SS, Grishin NV. Structural drift: a possible path to protein fold change. Bioinformatics 2005; 21: 1308–1310.
- 18 Viksna J, Gilbert D. Assessment of the probabilities for evolutionary structural changes in protein folds. Bioinformatics 2007; 23: 832–841.
- 19 Pascual-García A, Abia D, Ortiz AR, Bastolla U. Cross-over between discrete and continuous protein structure space: Insights into automatic classification and networks of protein structures. PLoS Comput Biol 2009; 5: e1000331.
- 20 Devos D, Valencia A. Practical limits of function prediction. Proteins 2000; 41: 98–107. doi:10.1002/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S
- 21 Wilson CA, Kreychman J, Gerstein M. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol 2000; 297: 233–249.
- 22 Todd AE, Orengo CA, Thornton JM. Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol 2001; 307: 1113–1143.
- 23 Lecomte JT, Vuletich DA, Lesk AM. Structural divergence and distant relationships in proteins: evolution of the globins. Curr Opin Struct Biol 2005; 15: 290–301.
- 24 Sangar V, Blankenberg DJ, Altman N, Lesk AM. Quantitative sequence-function relationships in proteins based on gene ontology. BMC Bioinformatics 2007; 8: 294.
- 25 Shakhnovich BE, Dokholyan NV, DeLisi C, Shakhnovich EI. Functional fingerprints of folds: evidence for correlated structure-function evolution. J Mol Biol 2003; 326: 1–9.
- 26 Whisstock JC, Lesk AM. Prediction of protein function from protein sequence and structure. Q Rev Biophys 2003; 36: 307–340.
- 27 Friedberg I. Automated protein function prediction-the genomic challenge. Brief Bioinf 2006; 7: 225–242.
- 28 Ponomarenko JV, Bourne PE, Shindyalov IN. Assigning new GO annotations to protein data bank sequences by combining structure and sequence homology. Proteins 2005; 58: 855–865.
- 29 Wang K, Horst JA, Cheng G, Nickle DC, Samudrala R. Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information. PLoS Comput Biol 2008; 4: e1000181.
- 30 Shakhnovich BE, Max Harvey J. Quantifying structure-function uncertainty: a graph theoretical exploration into the origins and limitations of protein annotation. J Mol Biol 2004; 337: 933–949.
- 31 Shakhnovich BE. Improving the precision of the structure-function relationship by considering phylogenetic context. PLoS Comput Biol 2005; 1: e9.
- 32 Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol 1993; 233: 123–138.
- 33 Ortiz AR, Strauss C, Olmea O. MAMMOTH (Matching Molecular Models Obtained from Theory): an automated method for model comparison. Protein Sci 2002; 11: 2606–2621.
- 34 Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins 2004; 57: 702–710.
- 35 Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. CABIOS 1992; 8: 275–282.
- 36 Nei M, Kumar S. Molecular evolution and phylogenetics. Oxford: Oxford University Press; 2000. doi:10.1093/oso/9780195135848.001.0001
- 37 Bastolla U, Porto M, Roman HE, Vendruscolo M. Statistical properties of neutral evolution. J Mol Evol 2003; 57 (Suppl 1): S103–S119.
- 38 Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968; 70: 213–220.
- 39 Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet 2000; 25: 25–29.
- 40 Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. InterPro: the integrative protein signature database. Nucleic Acids Res 2009; 37 (Database Issue): D211–D215.
- 41 Clegg JB, Gagnon J. Structure of the zeta chain of human embryonic hemoglobin. Proc Natl Acad Sci USA 1981; 78: 6076–6080.
- 42 Giardina B, Messana I, Scatena R, Castagnola M. The multiple functions of hemoglobin. Crit Rev Biochem Mol Biol 1995; 30: 165–196.
- 43 Rammal R, Toulouse G, Virasoro MA. Ultrametricity for physicists. Rev Mod Phys 1986; 58: 765–788.
- 44 Feng L, Zhou S, Gu L, Gell DA, Mackay JP, Weiss MJ, Gow AJ, Shi Y. Structure of oxidized alpha-haemoglobin bound to AHSP reveals a protective mechanism for haem. Nature 2005; 435: 697–701.
- 45 Stenberg K, Clausen T, Lindqvist Y, Macheroux P. Involvement of Tyr24 and Trp108 in substrate binding and substrate specificity of glycolate oxidase. Eur J Biochem 1995; 228: 408–416.
- 46 Lustbader JW, Cirilli M, Lin C, Xu HW, Takuma K, Wang N, Caspersen C, Chen X, Pollak S, Chaney M, Trinchese F, Liu S, Gunn-Moore F, Lue LF, Walker DG, Kuppusamy P, Zewier ZL, Arancio O, Stern D, Yan SS, Wu H. ABAD directly links Abeta to mitochondrial toxicity in Alzheimer's disease. Science 2004; 304: 448–452.
- 47 Perozzo R, Kuo M, Sidhu AS, Valiyaveettil JT, Bittman R, Jacobs WR Jr, Fidock DA, Sacchettini JC. Structural elucidation of the specificity of the antibacterial agent triclosan for malarial enoyl acyl carrier protein reductase. J Biol Chem 2002; 277: 13106–13114.
- 48 Scheidig AJ, Sanchez-Llorente A, Lautwein A, Pai EF, Corrie JE, Reid GP, Wittinghofer A, Goody RS. Crystallographic studies on p21(H-ras) using the synchrotron Laue method: improvement of crystal quality and monitoring of the GTPase reaction at different time points. Acta Cryst 1994; D50: 512–520.
- 49 Ding F, Dokholyan NV. Emergence of protein fold families through rational design. PLoS Comput Biol 2006; 2: e85.
- 50 David FP, Yip YL. SSMap: A new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase. BMC Bioinformatics 2008; 9: 391.
- 51 Lupyan D, Leo-Macias A, Ortiz AR. A new progressive-iterative algorithm for multiple structure alignment. Bioinformatics 2005; 21: 3255–3263.
- 52 Wang JD, Du Z, Payattakool R, Yu PS, Chen CF. A new method to measure the semantic similarity of GO terms. Bioinformatics 2007; 23: 1274–1281.