Identification and application of the concepts important for accurate and reliable protein secondary structure prediction
Ross D. King
Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, London, WC2A 3PX, United Kingdom
Michael J.E. Sternberg (corresponding author)
Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, London, WC2A 3PX, United Kingdom
Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, P.O. Box 123, London, WC2A 3PX, United Kingdom
Abstract
A method for predicting protein secondary structure from multiply aligned homologous sequences is presented, with an overall per-residue three-state accuracy of 70.1%. There are two aims: to obtain high accuracy by identifying a set of concepts important for prediction and then applying linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequences, moments of conservation, auto-correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto-correlation is new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of >80%. Existing high-accuracy prediction methods are “black-box” predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium- to short-length chains (≥90 residues and <170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with PHD, an algorithm is formed that is significantly more accurate than either method alone, with an estimated overall three-state accuracy of 72.4%, the highest accuracy reported for any prediction method.
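The per-residue three-state accuracy quoted in the abstract can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the authors' evaluation code: the one-letter state labels (H for helix, E for strand, C for coil), the function names `q3` and `overall_q3`, and the example strings are all hypothetical choices made here.

```python
from typing import Iterable, Tuple

# Assumed three-state alphabet: H = helix, E = strand, C = coil.
STATES = frozenset("HEC")


def _count_matches(predicted: str, observed: str) -> int:
    """Count positions where the predicted and observed states agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not predicted:
        raise ValueError("chain must contain at least one residue")
    if set(predicted) - STATES or set(observed) - STATES:
        raise ValueError("states must be one of H, E, C")
    return sum(p == o for p, o in zip(predicted, observed))


def q3(predicted: str, observed: str) -> float:
    """Three-state accuracy of one chain, as a percentage of residues."""
    return 100.0 * _count_matches(predicted, observed) / len(observed)


def overall_q3(chains: Iterable[Tuple[str, str]]) -> float:
    """Per-residue accuracy pooled over all chains.

    Residues, not chains, are the unit, so long chains contribute more
    than short ones. `chains` yields (predicted, observed) pairs.
    """
    correct = 0
    total = 0
    for predicted, observed in chains:
        correct += _count_matches(predicted, observed)
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * correct / total


if __name__ == "__main__":
    # Hypothetical 10-residue chain with two mispredicted residues.
    print(q3("HHHHCCEEEE", "HHHCCCEEEC"))
```

Pooling over residues, as in `overall_q3`, differs from averaging the per-chain values returned by `q3`; the abstract uses both kinds of figure (an overall per-residue accuracy and a mean accuracy for a subset of chains), so the two should not be mixed when comparing methods.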
References
- Benner SA, Cohen MA, Gerloff D. 1992. Correct structure prediction. Nature 359: 781.
- Benner SA, Gerloff D. 1990. Patterns of divergence in homologous proteins as indicators of secondary and tertiary structure: A prediction of the structure of the catalytic domain of protein kinases. Adv Enz Reg 31: 121–181.
- Benner SA, Gerloff DL. 1993. Predicting the conformation of proteins. FEBS Lett 325: 29–33.
- Biou V, Gibrat JF, Levin JM, Robson B, Garnier J. 1988. Secondary structure prediction: Combination of three different methods. Protein Eng 2: 185–191.
- Blaber M, Xue-jun Z, Matthews BW. 1993. Structural basis of amino acid α-helix propensity. Science 260: 1637–1640.
- Breiman L, Friedman JH, Olshen RA, Stone CJ. 1984. Classification and regression trees. Wadsworth: Belmont.
- Bryson JW, Betz SF, Lu HS, Suich DJ, Zhou HX, O'Neil KT, DeGrado WF. 1995. Protein design: A hierarchic approach. Science 270: 935–941.
- Chou PY, Fasman GD. 1974. Prediction of protein conformation. Biochemistry 13: 222–245.
- Cohen FE, Abarbanel RM, Kuntz ID, Fletterick RJ. 1983. Secondary structure-assignment for alpha/beta proteins by a combinatorial approach. Biochemistry 22: 4894–4904.
- Colloc'h N, Cohen FE. 1991. β-Breakers: An aperiodic secondary structure. J Mol Biol 221: 603–613.
- Colloc'h N, Etchebest C, Thoreau E, Henrissat B, Mornon JP. 1993. Comparison of three algorithms for the assignment of secondary structure in proteins: The advantages of consensus assignment. Protein Eng 6: 377–382.
- Dowe LD, Oliver J, Dix T, Allison L, Wallace CS. 1993. A decision graph explanation of protein secondary structure prediction. In: TN Mudge, V Milutinovic, L Hunter, eds. Proceedings of the 26th Annual Hawaii International Conference on System Sciences. IEEE Computer Society Press, pp 669–678.
- Eisenberg D. 1984. Three-dimensional structure of membrane and surface proteins. Annu Rev Biochem 53: 595–623.
- Garnier J, Osguthorpe DJ, Robson B. 1978. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 120: 97–120.
- Geourjon C, Deleage G. 1994. SOPM: A self optimised prediction method for protein secondary structure prediction. Protein Eng 7: 157–164.
- Gibrat JF, Garnier J, Robson B. 1987. Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. J Mol Biol 198: 425–443.
- Horovitz A, Matthews JM, Fersht AR. 1992. α-Helix stability in proteins. II. Factors that influence stability at an internal position. J Mol Biol 227: 560–568.
- Jenny TF, Benner SA. 1994. Evaluating predictions of secondary structure in proteins. Biochem Biophys Res Commun 200: 149–155.
- King RD. 1996. Secondary structure prediction. In: MJE Sternberg, ed. Protein structure prediction: A practical approach. Oxford: Oxford University Press. Forthcoming.
- King RD, Feng C, Sutherland A. 1995. StatLog: Comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence 9: 289–335.
- King RD, Sternberg MJE. 1990. Machine learning approach for the prediction of protein secondary structure. J Mol Biol 216: 441–457.
- Kneller DG, Cohen FE, Langridge R. 1990. Improvements in protein secondary structure prediction by an enhanced neural network. J Mol Biol 214: 171–182.
- Lim VI. 1974a. Structural principles of the globular organisation of protein chains. A stereochemical theory of globular protein secondary structure. J Mol Biol 88: 857–872.
- Lim VI. 1974b. Algorithms for prediction of alpha-helical and beta-structural regions in globular proteins. J Mol Biol 88: 873–894.
- Matthews BW. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405: 442–451.
- Mehta PK, Heringa J, Argos P. 1995. A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%. Protein Sci 4: 2517–2525.
- Michie D. 1986. The superarticulacy phenomenon in the context of software manufacture. Proc R Soc Lond (A) 405: 185–212.
- Michie D, Spiegelhalter DJ, Taylor CC. 1994. Machine learning, neural and statistical classification. London: Ellis Horwood.
- Muggleton S, King RD, Sternberg MJE. 1992. Protein secondary structure prediction using logic. Protein Eng 5: 647–657.
- Padmanabhan S, Marqusee S, Ridgeway T, Laue TM, Baldwin RL. 1990. Relative helix-forming tendencies of nonpolar amino acids. Nature 344: 268–270.
- Qian N, Sejnowski TJ. 1988. Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 202: 865–884.
- Richardson JS, Richardson DC. 1988. Amino acid preferences for specific locations at the ends of alpha helices. Science 240: 1648–1652.
- Robson B. 1976. Conformational properties of amino acid residues in globular proteins. J Mol Biol 107: 327–356.
- Robson B, Suzuki E. 1976. Conformational properties of amino acid residues in globular proteins. J Mol Biol 107: 327–356.
- Rost B, Sander C. 1993a. Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 232: 584–599.
- Rost B, Sander C. 1993b. Secondary structure prediction of all-helical proteins in two states. Protein Eng 8: 831–836.
- Rost B, Sander C, Schneider R. 1994. Redefining the goals of protein secondary structure prediction. J Mol Biol 235: 13–26.
- Russell RB, Barton GJ. 1993. The limits of protein secondary structure prediction accuracy from multiple sequence alignment. J Mol Biol 234: 951–957.
- Salamov AA, Solovyev VV. 1995. Prediction of protein secondary structure by combining nearest-neighbour algorithms and multiple sequence alignments. J Mol Biol 247: 11–15.
- Sander C, Schneider R. 1991. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins Struct Funct Genet 9: 56–68.
- Schulz GE, Schirmer RH. 1979. Principles of protein structure. New York: Springer-Verlag. doi: 10.1007/978-1-4612-6137-7
- Solovyev VV, Salamov AA. 1994. Predicting α-helix and β-strand segments of globular proteins. CABIOS 10: 661–669.
- Thornton JM, Sibanda BL. 1983. Amino and carboxy-terminus regions in globular proteins. J Mol Biol 167: 433–460.
- Wako H, Blundell TL. 1994. Use of amino acid environment-dependent substitution tables and conformation propensities in structure prediction from aligned sequence of homologous proteins. II. Secondary structures. J Mol Biol 238: 693–708.
- Weiss SM, Kulikowski CA. 1991. Computer systems that learn. San Mateo: Morgan Kaufmann.
- White SH. 1992. Amino acid preferences in small proteins. J Mol Biol 227: 991–995.
- Williams RA, Chang A, Juretic D, Loughran S. 1987. Secondary structure predictions and medium range interactions. Biochim Biophys Acta 916: 200–204.
- Wodak SJ, Rooman MJ. 1993. Generating and testing protein folds. Curr Opin Struct Biol 3: 247–259.
- Yi T, Lander ES. 1993. Protein secondary structure prediction using nearest-neighbour methods. J Mol Biol 232: 1117–1129.
- Zhang X, Mesirov JP, Waltz DL. 1992. Hybrid system for protein secondary structure prediction. J Mol Biol 225: 1049–1063.
- Zvelebil MJJM, Barton GJ, Taylor WR, Sternberg MJE. 1987. Prediction of protein secondary structure and active sites using the alignment of homologous sequences. J Mol Biol 195: 957–961.