Identification and application of the concepts important for accurate and reliable protein secondary structure prediction

Ross D. King

Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, London, WC2A 3PX, United Kingdom

Michael J.E. Sternberg (Corresponding Author)

Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, London, WC2A 3PX, United Kingdom

Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, P.O. Box 123, London, WC2A 3PX, United Kingdom

First published: November 1996

https://doi.org/10.1002/pro.5560051116

Citations: 343

Abstract

A protein secondary structure prediction method from multiply aligned homologous sequences is presented with an overall per-residue three-state accuracy of 70.1%. There are two aims: to obtain high accuracy by identification of a set of concepts important for prediction followed by use of linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequences, moments of conservation, auto-correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto-correlation is new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of >80%. Existing high-accuracy prediction methods are “black-box” predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium- to short-length chains (≥90 residues and <170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with PHD, an algorithm is formed that is significantly more accurate than either method, with an estimated overall three-state accuracy of 72.4%, the highest accuracy reported for any prediction method.
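Two of the quantities named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not give formulas, so the code assumes the conventional per-residue three-state score (fraction of residues whose predicted state, written here as H, E, or C, matches the observed state) and the hydrophobic moment in the form introduced by Eisenberg (1984). The function names, the single-letter state alphabet, and the default 100-degree step per residue for an α-helix are assumptions made for illustration.

```python
import math
from typing import Sequence


def q3_accuracy(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy: fraction of positions that agree.

    Both arguments are strings over an assumed H/E/C alphabet
    (helix, strand, coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("cannot score an empty chain")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def hydrophobic_moment(hydrophobicity: Sequence[float],
                       angle_deg: float = 100.0) -> float:
    """Magnitude of the hydrophobic moment over a window of residues.

    hydrophobicity holds one value per residue, taken from whatever scale
    the caller chooses (the choice of scale is left open here).
    angle_deg is the assumed angular step between successive side chains:
    about 100 degrees for an alpha-helix, larger for a beta-strand.
    """
    step = math.radians(angle_deg)
    sin_sum = sum(h * math.sin(n * step) for n, h in enumerate(hydrophobicity))
    cos_sum = sum(h * math.cos(n * step) for n, h in enumerate(hydrophobicity))
    return math.hypot(sin_sum, cos_sum)
```

For example, `q3_accuracy("HHHEECCCC", "HHHEEECCC")` scores 8 of 9 residues as correct. A window of identical hydrophobicity values has a moment near zero, since no face of the helix is preferred, whereas a window whose hydrophobic residues recur with the helical period gives a large moment.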

References

Benner SA, Cohen MA, Gerloff D. 1992. Correct structure prediction. Nature 359: 781. doi:10.1038/359781a0
Benner SA, Gerloff D. 1990. Patterns of divergence in homologous proteins as indicators of secondary and tertiary structure: A prediction of the structure of the catalytic domain of protein kinases. Adv Enz Reg 31: 121–181. doi:10.1016/0065-2571(91)90012-B
Benner SA, Gerloff DL. 1993. Predicting the conformation of proteins. FEBS Lett 325: 29–33. doi:10.1016/0014-5793(93)81408-R
Biou V, Gibrat JF, Levin JM, Robson B, Garnier J. 1988. Secondary structure prediction: Combination of three different methods. Protein Eng 2: 185–191. doi:10.1093/protein/2.3.185
Blaber M, Xue-jun Z, Matthews BW. 1993. Structural basis of amino acid α-helix propensity. Science 260: 1637–1640. doi:10.1126/science.8503008
Breiman L, Friedman JH, Olshen RA, Stone CJ. 1984. Classification and regression trees. Belmont: Wadsworth.
Bryson JW, Betz SF, Lu HS, Suich DJ, Zhou HX, O'Neil KT, DeGrado WF. 1995. Protein design: A hierarchic approach. Science 270: 935–941. doi:10.1126/science.270.5238.935
Chou PY, Fasman GD. 1974. Prediction of protein conformation. Biochemistry 13: 222–245. doi:10.1021/bi00699a002
Cohen FE, Abarbanel RM, Kuntz ID, Fletterick RJ. 1983. Secondary structure assignment for alpha/beta proteins by a combinatorial approach. Biochemistry 22: 4894–4904. doi:10.1021/bi00290a005
Colloc'h N, Cohen FE. 1991. β-Breakers: An aperiodic secondary structure. J Mol Biol 221: 603–613. doi:10.1016/0022-2836(91)80075-6
Colloc'h N, Etchebest C, Thoreau E, Henrissat B, Mornon JP. 1993. Comparison of three algorithms for the assignment of secondary structure in proteins: The advantages of consensus assignment. Protein Eng 6: 377–382. doi:10.1093/protein/6.4.377
Dowe LD, Oliver J, Dix T, Allison L, Wallace CS. 1993. A decision graph explanation of protein secondary structure prediction. In: TN Mudge, V Milutinovic, L Hunter, eds. Proceedings of the 26th Annual Hawaii International Conference on System Sciences. IEEE Computer Society Press. pp 669–678.
Eisenberg D. 1984. Three-dimensional structure of membrane and surface proteins. Annu Rev Biochem 53: 595–623. doi:10.1146/annurev.bi.53.070184.003115
Garnier J, Osguthorpe DJ, Robson B. 1978. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 120: 97–120. doi:10.1016/0022-2836(78)90297-8
Geourjon C, Deleage G. 1994. SOPM: A self optimised prediction method for protein secondary structure prediction. Protein Eng 7: 157–164. doi:10.1093/protein/7.2.157
Gibrat JF, Garnier J, Robson B. 1987. Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. J Mol Biol 198: 425–443. doi:10.1016/0022-2836(87)90292-0
Horovitz A, Matthews JM, Fersht AR. 1992. α-Helix stability in proteins. II. Factors that influence stability at an internal position. J Mol Biol 227: 560–568. doi:10.1016/0022-2836(92)90907-2
Jenny TF, Benner SA. 1994. Evaluating predictions of secondary structure in proteins. Biochem Biophys Res Commun 200: 149–155. doi:10.1006/bbrc.1994.1427
King RD. 1996. Secondary structure prediction. In: MJE Sternberg, ed. Protein structure prediction: A practical approach. Oxford: Oxford University Press. Forthcoming.
King RD, Feng C, Sutherland A. 1995. StatLog: Comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence 9: 289–335. doi:10.1080/08839519508945477
King RD, Sternberg MJE. 1990. Machine learning approach for the prediction of protein secondary structure. J Mol Biol 216: 441–457. doi:10.1016/S0022-2836(05)80333-X
Kneller DG, Cohen FE, Langridge R. 1990. Improvements in protein secondary structure prediction by an enhanced neural network. J Mol Biol 214: 171–182. doi:10.1016/0022-2836(90)90154-E
Lim VI. 1974a. Structural principles of the globular organisation of protein chains. A stereochemical theory of globular protein secondary structure. J Mol Biol 88: 857–872. doi:10.1016/0022-2836(74)90404-5
Lim VI. 1974b. Algorithms for prediction of alpha-helical and beta-structural regions in globular proteins. J Mol Biol 88: 873–894. doi:10.1016/0022-2836(74)90405-7
Matthews BW. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405: 442–451. doi:10.1016/0005-2795(75)90109-9
Mehta PK, Heringa J, Argos P. 1995. A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%. Protein Sci 4: 2517–2525. doi:10.1002/pro.5560041208
Michie D. 1986. The superarticulacy phenomenon in the context of software manufacture. Proc R Soc Lond (A) 405: 185–212. doi:10.1098/rspa.1986.0049
Michie D, Spiegelhalter DJ, Taylor CC. 1994. Machine learning, neural and statistical classification. London: Ellis Horwood.
Muggleton S, King RD, Sternberg MJE. 1992. Protein secondary structure prediction using logic. Protein Eng 5: 647–657. doi:10.1093/protein/5.7.647
Padmanabhan S, Marqusee S, Ridgeway T, Laue TM, Baldwin RL. 1990. Relative helix-forming tendencies of nonpolar amino acids. Nature 344: 268–270. doi:10.1038/344268a0
Qian N, Sejnowski TJ. 1988. Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 202: 865–884. doi:10.1016/0022-2836(88)90564-5
Richardson JS, Richardson DC. 1988. Amino acid preferences for specific locations at the ends of alpha helices. Science 240: 1648–1652. doi:10.1126/science.3381086
Robson B, Suzuki E. 1976. Conformational properties of amino acid residues in globular proteins. J Mol Biol 107: 327–356. doi:10.1016/S0022-2836(76)80008-3
Rost B, Sander C. 1993a. Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 232: 584–599. doi:10.1006/jmbi.1993.1413
Rost B, Sander C. 1993b. Secondary structure prediction of all-helical proteins in two states. Protein Eng 6: 831–836. doi:10.1093/protein/6.8.831
Rost B, Sander C, Schneider R. 1994. Redefining the goals of protein secondary structure prediction. J Mol Biol 235: 13–26. doi:10.1016/S0022-2836(05)80007-5
Russell RB, Barton GJ. 1993. The limits of protein secondary structure prediction accuracy from multiple sequence alignment. J Mol Biol 234: 951–957. doi:10.1006/jmbi.1993.1649
Salamov AA, Solovyev VV. 1995. Prediction of protein secondary structure by combining nearest-neighbour algorithms and multiple sequence alignments. J Mol Biol 247: 11–15. doi:10.1006/jmbi.1994.0116
Sander C, Schneider R. 1991. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins Struct Funct Genet 9: 56–68. doi:10.1002/prot.340090107
Schulz GE, Schirmer RH. 1979. Principles of protein structure. New York: Springer-Verlag. doi:10.1007/978-1-4612-6137-7
Solovyev VV, Salamov AA. 1994. Predicting α-helix and β-strand segments of globular proteins. CABIOS 10: 661–669.
Thornton JM, Sibanda BL. 1983. Amino and carboxy-terminus regions in globular proteins. J Mol Biol 167: 433–460. doi:10.1016/S0022-2836(83)80344-1
Wako H, Blundell TL. 1994. Use of amino acid environment-dependent substitution tables and conformation propensities in structure prediction from aligned sequence of homologous proteins. II. Secondary structures. J Mol Biol 238: 693–708. doi:10.1006/jmbi.1994.1330
Weiss SM, Kulikowski CA. 1991. Computer systems that learn. San Mateo: Morgan Kaufmann.
White SH. 1992. Amino acid preferences in small proteins. J Mol Biol 227: 991–995. doi:10.1016/0022-2836(92)90515-L
Williams RA, Chang A, Juretic D, Loughran S. 1987. Secondary structure predictions and medium range interactions. Biochim Biophys Acta 916: 200–204. doi:10.1016/0167-4838(87)90109-9
Wodak SJ, Rooman MJ. 1993. Generating and testing protein folds. Curr Opin Struct Biol 3: 247–259. doi:10.1016/S0959-440X(05)80160-5
Yi T, Lander ES. 1993. Protein secondary structure prediction using nearest-neighbour methods. J Mol Biol 232: 1117–1129. doi:10.1006/jmbi.1993.1464
Zhang X, Mesirov JP, Waltz DL. 1992. Hybrid system for protein secondary structure prediction. J Mol Biol 225: 1049–1063. doi:10.1016/0022-2836(92)90104-R
Zvelebil MJJM, Barton GJ, Taylor WR, Sternberg MJE. 1987. Prediction of protein secondary structure and active sites using the alignment of homologous sequences. J Mol Biol 195: 957–961. doi:10.1016/0022-2836(87)90501-8

Volume 5, Issue 11, November 1996, Pages 2298–2310
