We report a C-atom-based scoring function, named OPUS-CSF, for ranking protein structural models. Rather than using the traditional Boltzmann formula, we built a scoring function (CSF score) based on the native distributions (derived from the entire PDB) of coordinate components of mainchain C (carbonyl) atoms on selected residues of peptide segments of 5, 7, 9, and 11 residues in length. In decoy-recognition tests, OPUS-CSF recognized up to 257 native structures out of 278 targets in 11 commonly used decoy sets, significantly outperforming other popular all-atom empirical potentials. The average correlation coefficient with TM-score was also comparable with those of other potentials. OPUS-CSF is a highly coarse-grained scoring function that requires only partial mainchain information as input and is very fast. Thus, it is suitable for applications at an early stage of structure building.

Introduction

A potential function plays a central role in predicting protein structures. Generally, there are two kinds of potential functions: physics-based potentials and knowledge-based potentials. Physics-based potentials typically are the all-atom molecular mechanics force-fields,1-5 such as CHARMM1,2 and AMBER.4 They also include coarse-grained potentials such as MARTINI,6 UNRES7, 8 and OPEP.9

The knowledge-based potentials are derived from statistical analysis of known structures and are widely used in structural prediction.10-41 They usually perform better than the physical potentials in structural prediction. In general, knowledge-based potentials can be constructed either at the coarse-grained residue level17, 21-31 or at the atomic level.32-41 Although coarse-grained potentials may not be rigorous, they help to focus on essential features and exclude less important details, thus reducing computational cost.42, 43 The performance of a coarse-grained potential depends on how the coarse-graining scheme is designed. For example, the OPUS-Ca potential30 uses the positions of C^α atoms as input, calculates other atomic positions as pseudo-positions, and significantly reduces the computing cost. Other applications of coarse-grained models using C^α positions have also been reported in the literature.44-55

In this work, unlike traditional empirical potential functions that use the Boltzmann formula, we built a scoring function based on the native distributions of coordinate components of mainchain C (carbonyl) atoms on a few selected residues of small peptide segments of 5, 7, 9, and 11 residues in length. A lookup table, termed the configurational native distribution (CND) lookup table, was first generated for the native distributions of coordinate components by analyzing peptide segments in the entire Protein Data Bank (PDB). Then the scoring function, termed the CSF scoring function, was calculated for a particular test structure by comparing the information of its segments with the CND lookup table. The performance of OPUS-CSF was tested on 11 commonly used decoy sets. The results indicated that OPUS-CSF was able to identify significantly more native structures from their decoys than other empirical potentials. The correlation coefficients between CSF scores and TM-scores were comparable to those of popular all-atom empirical potentials. Most importantly, OPUS-CSF achieved such performance despite its highly coarse-grained nature. This indicates the advantages of OPUS-CSF in terms of speed and applicability in the early stage of structural modeling, which is vitally important for applications such as building structural models from intermediate-resolution data from experimental techniques like cryogenic electron microscopy (cryo-EM).

Results and Discussion

We compared the performance of OPUS-CSF on 11 commonly used decoy sets with that of popular all-atom potential functions. Table 1 lists the results of the 5-residue segment case (OPUS-CSF5) and the all-segment combined case (OPUS-CSF). For the 5-residue segment case, OPUS-CSF5 successfully recognized 244 out of 278 native structures from their decoys and had an average Z-score (–3.56) nearly identical to that of GOAP (–3.57). For the combined segment case, OPUS-CSF performed even better: it successfully recognized 257 out of 278 native structures from their decoys and had an average Z-score (–4.12) better than that of GOAP (–3.57). It is interesting that, although OPUS-CSF is a highly coarse-grained scoring function, its performance is significantly better than that of the all-atom potentials.

Table 1. The results of OPUS-CSF5 (5-residue segment) and OPUS-CSF (combined segment length) on 11 decoy sets compared with different potentials^a

Decoy sets	Total # of targets	DFIRE	RWplus	dDFIRE	OPUS-PSP	GOAP	OPUS-CSF5	OPUS-CSF
4state_reduced	7	6 (–3.48)	6 (–3.51)	7 (–4.15)	7 (–4.49)	7 (–4.38)	7 (–3.38)	7 (–3.31)
fisa	4	3 (–4.87)	3 (–4.79)	3 (–3.80)	3 (–4.24)	3 (–3.97)	2 (–2.31)	2 (–2.55)
fisa_casp3	5	4 (–4.80)	4 (–5.17)	4 (–4.83)	5 (–6.33)	5 (–5.27)	4 (–4.38)	4 (–6.72)
hg_structal	29	12 (–1.97)	12 (–1.74)	16 (–1.33)	18 (–1.87)	22 (–2.73)	23 (–2.07)	23 (–2.06)
ig_structal	61	0 (0.92)	0 (1.11)	26 (–1.02)	20 (0.69)	47 (–1.62)	49 (–2.03)	56 (–2.14)
ig_structal_hires	20	0 (0.17)	0 (0.32)	16 (–2.05)	14 (–0.77)	18 (–2.35)	19 (–2.19)	20 (–2.08)
I-TASSER	56	49 (–4.02)	56 (–5.77)	48 (–5.03)	55 (–7.43)	45 (–5.36)	55 (–5.32)	56 (–6.39)
lattice_ssfit	8	8 (–9.44)	8 (–8.85)	8 (–10.12)	8 (–6.75)	8 (–8.38)	8 (–9.56)	8 (–11.79)
lmds	10	7 (–0.88)	7 (–1.03)	6 (–2.44)	8 (–5.63)	7 (–4.07)	8 (–5.47)	8 (–6.80)
MOULDER	20	19 (–2.97)	19 (–2.84)	18 (–2.74)	19 (–4.84)	19 (–3.58)	20 (–3.18)	20 (–3.16)
ROSETTA	58	20 (–1.82)	20 (–1.47)	12 (–0.83)	39 (–3.00)	45 (–3.70)	49 (–3.68)	53 (–4.53)
Total	278	128 (–1.94)	135 (–2.13)	164 (–2.52)	196 (–2.86)	226 (–3.57)	244 (–3.56)	257 (–4.12)

^a The results of other potentials come from the GOAP paper. The table lists, for each potential, the number of targets whose native structures were successfully recognized. The numbers in parentheses are the average Z-scores of the native structures. The larger the absolute value of the Z-score, the better. Out of the total 278 targets in 11 decoy sets, OPUS-CSF5 (5-residue segment) recognized 244 and OPUS-CSF (combined segment length) recognized 257 native structures from their decoys. The bold number in each row indicates the best one among all the potential functions for that particular decoy set (if the numbers of targets are the same, the boldface entries are those having the better Z-scores).
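The two quantities reported in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and it assumes that the Z-score of a native structure is its score minus the mean decoy score, divided by the population standard deviation of the decoy scores, and that a native is "recognized" when its score is strictly lower than that of every decoy. The text does not spell out either convention.

```python
import statistics


def native_z_score(native_score, decoy_scores):
    """Z-score of the native score relative to the decoy score distribution.

    Assumption: mean and standard deviation are taken over the decoys only,
    and the population standard deviation is used.
    """
    mean = statistics.fmean(decoy_scores)
    std = statistics.pstdev(decoy_scores)
    return (native_score - mean) / std


def native_recognized(native_score, decoy_scores):
    """True if the native has the lowest (best) score in its decoy set."""
    return native_score < min(decoy_scores)


# Hypothetical scores for one target: lower is better.
z = native_z_score(10.0, [20.0, 30.0, 40.0])
hit = native_recognized(10.0, [20.0, 30.0, 40.0])
```

With this sign convention a successfully recognized native always has a negative Z-score, which is why larger absolute values indicate better discrimination.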

We also calculated the Pearson correlation coefficients between CSF score and TM-score56 in all decoy sets. The results are shown in Table 2. The average correlation coefficient of OPUS-CSF is comparable to those of GOAP and OPUS-PSP, despite the fact that OPUS-CSF is highly coarse-grained and the other two are all-atom potentials.

Table 2. Average Pearson correlation coefficients of CSF scores with TM-scores^a

Decoy sets	OPUS-PSP	GOAP	OPUS-CSF
4state_reduced	−0.589	–0.694	−0.667
fisa	−0.282	−0.347	–0.552
fisa_casp3	−0.095	−0.221	–0.333
hg_structal	−0.752	–0.825	−0.803
ig_structal	−0.779	−0.865	–0.882
ig_structal_hires	−0.832	−0.885	–0.901
I-TASSER	−0.284	–0.477	−0.452
lattice_ssfit	−0.051	−0.058	–0.151
lmds	−0.091	−0.146	–0.342
MOULDER	−0.802	–0.886	−0.863
ROSETTA	−0.343	–0.476	−0.391
Average	−0.521	−0.632	−0.624

^a The correlation coefficient of a decoy set is the average coefficient of all targets in that decoy set. In calculating the correlation coefficients, the native structure was excluded. The average correlation coefficient of OPUS-CSF is comparable to those of the other two potentials. The bold number in each row indicates the best one among the three potential functions for that particular decoy set. For OPUS-CSF, only the results for the combined segment case are listed.
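The averaging described in the footnote can be sketched as follows. This is an illustrative Python sketch with hypothetical function names and made-up numbers. It assumes each target is given as a pair of equal-length lists (scores and TM-scores of its decoys, with the native already removed).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def decoy_set_correlation(targets):
    """Average the per-target coefficients over all targets of a decoy set.

    targets: list of (scores, tm_scores) pairs, decoys only.
    """
    per_target = [pearson(scores, tm) for scores, tm in targets]
    return sum(per_target) / len(per_target)


# Hypothetical target: the score rises as the TM-score falls.
r = pearson([1.0, 2.0, 3.0], [0.9, 0.6, 0.3])
```

Because a lower score is meant to indicate a more native-like model while a higher TM-score indicates greater similarity to the native, a good scoring function gives coefficients close to −1, as in Table 2.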

For further analysis of the method, we use the 5-residue segment case as an example. Figure 1 shows the histogram of standard deviations of the coordinate components of mainchain C (carbonyl) atoms of the 1st and 5th residues in the CND lookup table. The distribution clearly peaks at a very small value, indicating that the coordinate components are clustered in a narrow distribution; that is, the configurational distributions of the 5-residue peptide segments are narrow,57 which provides a foundation for the success of OPUS-CSF. The narrow configurational distribution of small peptide fragments is also seen in other studies.58 In addition, the average value of the standard deviation is 1.20 Å.

Figure 1. The histogram of standard deviations of the coordinate components in the CND lookup table for the 5-residue segment case. The distribution peaks at a very small value of standard deviation, indicating that the coordinate components of the 1st and 5th mainchain C (carbonyl) atoms are clustered in a narrow distribution; that is, the configurational distributions of the 5-residue peptide segments are narrow. In addition, the average value of the standard deviation is 1.20 Å.

It should be mentioned that, in the implementation of OPUS-CSF, we assume that the smaller the CSF score, the more likely the structure is to be native. This is an approximation because even a native structure usually does not have a zero CSF score. However, the narrow distributions of standard deviations of the coordinate components of mainchain C (carbonyl) atoms (Fig. 1) suggest small scores for the native structures. Figure 2 shows a population distribution of the CSF scores for 278 native structures in 11 decoy sets (per independent coordinate component). The average value of the native CSF scores is 0.84 and the standard deviation is 0.27. Thus, in native structures, the deviations of the coordinate components from their average values are on average less than one standard deviation of the coordinate component distribution in the CND lookup table. The fluctuation of the native CSF scores is also very small.
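One way to read "per independent coordinate component" is as the raw CSF score divided by the number of Z-score terms that were summed. The sketch below illustrates that reading only. The function name and the normalization are assumptions, not a procedure stated in the text.

```python
def per_component_score(total_csf, n_scored_segments, n_variables=6):
    """Mean absolute Z-score per coordinate component (assumed normalization).

    n_variables defaults to 6, the number of independent variables in the
    5-residue case.
    """
    n_terms = n_scored_segments * n_variables
    if n_terms == 0:
        raise ValueError("no scored segments")
    return total_csf / n_terms


# Hypothetical structure: 100 scored 5-residue segments, raw score 504.0.
normalized = per_component_score(504.0, 100)
```

Under this reading, a value below 1 means that a coordinate component deviates from its tabulated mean by less than one standard deviation on average.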

Figure 3 shows how often sequences repeat in the CND lookup table in the 5-residue case. In principle, the more times a sequence repeats in PDB, the better the statistics for that sequence in the CND lookup table. In the 5-residue case, half of the sequences repeat >26 times in the distribution. The largest value on the X-axis is 29,618, reached by one sequence. In constructing the CND lookup table, there is always a trade-off between sequence diversity and sequence repeating frequency in PDB.

We examined OPUS-CSF using segments of different lengths. As the segment length increases, the ratio of the number of segments that appear at least five times to the total number of segments in PDB decreases (Table 3). We define Coverage as the ratio between the number of segments available in the CND lookup table and the total number of segments of a test sequence. The average coverage over the 11 decoy sets (278 targets in total) also decreases as the segment length increases. A test sequence with <20% of its segments available in the CND lookup table, that is, with coverage <20%, is regarded as Unknown; the number of unknowns increases as the segment length increases. More details of OPUS-CSF on different segment lengths can be found in Supplemental Information.
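The coverage definition and the 20% threshold translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, and the lookup table is represented as any container that supports membership tests on segment sequences.

```python
def coverage(sequence, length, cnd_table):
    """Fraction of the sequence's segments that are present in the lookup table."""
    n_total = len(sequence) - length + 1
    if n_total <= 0:
        return 0.0  # sequence shorter than the segment length
    n_found = sum(
        1 for start in range(n_total)
        if sequence[start:start + length] in cnd_table
    )
    return n_found / n_total


def is_unknown(sequence, length, cnd_table, threshold=0.20):
    """A target is 'Unknown' when its coverage is below 20%."""
    return coverage(sequence, length, cnd_table) < threshold


# Hypothetical table holding two 5-residue sequences.
table = {"ACDEF", "CDEFG"}
cov = coverage("ACDEFGHIK", 5, table)  # 2 of the 5 windows are in the table
```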

Table 3. The results of OPUS-CSF built with residue segments of different lengths^a

	Num_above5	Num_all	Num_above5/Num_all
5-residues	1766273	2350969	0.751
7-residues	3736778	9544858	0.391
9-residues	3713506	10262243	0.362
11-residues	3743204	10698802	0.350

^a Num_above5 is the number of sequence segments which occur at least five times in PDB. Num_all shows the total number of sequence segments in PDB. The ratio decreases as the length of segments increases.

The 5-residue case delivers the best performance in terms of decoy recognition (244 out of 278 natives recognized in Table 4). However, the Z-scores are better for the longer-segment cases. This is probably because the longer segments preserve more sequence homology information.

Table 4. The performance of OPUS-CSF based on different lengths of residue segments on the 11 decoy sets^a

	5-residues	7-residues	9-residues	11-residues
Success numbers	244 (278)	218 (278)	220 (278)	219 (278)
Z-scores	−3.56	−4.55	−4.62	−4.57
Average Coverage	0.971	0.749	0.712	0.683
Unknowns	0	41	45	46

^a Success numbers are the numbers of native structures that OPUS-CSF correctly recognized from the decoys. Numbers in parentheses (278) are the total number of native structures (or targets) in the 11 decoy sets. The Z-scores are calculated for the CSF scores of the native structures with respect to their decoys. Coverage is the ratio between the number of segments available in the CND lookup table and the total number of segments of a target sequence. The table shows the average coverage among the 278 targets in 11 decoy sets. Unknowns are the numbers of target sequences that have <20% coverage. For these sequences, OPUS-CSF is not applicable. Note that the 5-residue case has no sequence classified as unknown, while the 7-residue case, for example, has 41 out of 278 sequences to which OPUS-CSF is not applicable. The number of unknowns increases slightly as the segment length increases. Note that, in the combined segment case, the longer segments may make no contribution to the CSF score if they are unknowns. Since the 5-residue segment case has no unknowns, it guarantees that OPUS-CSF is applicable to all target sequences, even in the rare cases where all longer segments are regarded as unknown.

For the 5-residue case, we also tested a scenario in which the CND lookup table was constructed using four residues (1, 2, 4, and 5) instead of the two terminal residues (1, 5). The number of natives recognized and the Z-score are 226 and −3.60, whereas in the case of (1, 5) they are 244 and −3.56 (as indicated in Table 4). This is very interesting, as it indicates that using the two terminal residues (1, 5) captures a better coarse-graining level than using more residues (1, 2, 4, and 5).

OPUS-CSF has some obvious advantages. First, the CND lookup table is constructed directly from the entire PDB, and it contains all allowed configurational information of the native segments (at least for the ones that occur at least five times in PDB). The results seem to indicate that it is better than methods based on the Boltzmann formula. Second, OPUS-CSF is very fast, especially for longer polypeptide chains. This is because the entire chain is scanned once, linearly, and only partial mainchain atom coordinates are required to calculate the CSF score for a structure. Unlike other potentials such as GOAP40 and OPUS-PSP,34 no inter-atomic distances need to be calculated. We want to emphasize that, in modeling protein structures, an empirical potential function or a scoring function should be fast and accurate. In the early stage of modeling, it is advantageous for the scoring function to require a minimal amount of structural information. In this regard, OPUS-CSF seems to be a good choice.

Methods

Scanning through the polypeptide chain with a step size of one residue, we collected small peptide segments with sequence lengths of 5, 7, 9, and 11 residues and searched for their configurations in the entire PDB. In total, we downloaded 130,054 PDB structures on June 7, 2017 via ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb. The sequences that appeared fewer than five times in PDB were discarded. The number five was chosen empirically. Peptide segments with poorly resolved structures, such as those with broken bonds, were not included.
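The scanning and filtering step can be sketched as a sliding window followed by a frequency count. The Python sketch below is a minimal illustration with hypothetical function names. It works on plain one-letter sequences and ignores structure quality checks such as broken bonds.

```python
from collections import Counter

SEGMENT_LENGTHS = (5, 7, 9, 11)
MIN_OCCURRENCES = 5  # sequences seen fewer than five times are discarded


def iter_segments(sequence, length):
    """Yield (start_index, segment) for every window, moving one residue at a time."""
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


def frequent_segments(sequences, length, min_occurrences=MIN_OCCURRENCES):
    """Count every segment of the given length and keep the frequent ones."""
    counts = Counter(
        segment
        for sequence in sequences
        for _, segment in iter_segments(sequence, length)
    )
    return {seg: n for seg, n in counts.items() if n >= min_occurrences}


windows = list(iter_segments("ACDEFGH", 5))
```

A chain of N residues yields N − L + 1 segments of length L, so the scan is linear in chain length.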

Here we use the 5-residue segment case as an example to illustrate the details of the procedure. The ratio of segments that appear at least five times to all segments in PDB is 75.1%, which means we can utilize 75.1% of the information in the whole PDB using 5-residue segments (also see Table 3 in Results and Discussion).

A local molecular coordinate system was defined for every segment using the positions of three mainchain atoms in the middle residue. The origin was set at the C^α atom, the X-axis was defined along the line connecting the C^α and C (carbonyl) atoms, the Y-axis was in the C^α-C-O plane, parallel to the component of the C-O vector perpendicular to the X-axis, and the Z-axis was defined correspondingly (Fig. 4).
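The frame construction amounts to a Gram-Schmidt step followed by a cross product. The Python sketch below illustrates it with hypothetical function names and made-up coordinates. It assumes a right-handed frame (Z = X × Y), which the text implies but does not state explicitly.

```python
import math


def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(v):
    norm = math.sqrt(dot(v, v))
    return tuple(vi / norm for vi in v)


def local_frame(ca, c, o):
    """Build (origin, x_axis, y_axis, z_axis) from the middle residue's CA, C and O."""
    x_axis = unit(sub(c, ca))  # X: along CA -> C
    co = sub(o, c)             # C -> O bond vector
    # Remove the part of C->O that is parallel to X; the rest lies in the CA-C-O plane.
    co_perp = sub(co, tuple(dot(co, x_axis) * xi for xi in x_axis))
    y_axis = unit(co_perp)
    z_axis = cross(x_axis, y_axis)  # right-handed frame (assumed)
    return ca, x_axis, y_axis, z_axis


def to_local(point, frame):
    """Express a Cartesian point in the local frame."""
    origin, x_axis, y_axis, z_axis = frame
    d = sub(point, origin)
    return (dot(d, x_axis), dot(d, y_axis), dot(d, z_axis))


# Hypothetical coordinates (in Å) for the middle residue.
frame = local_frame((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.0, 0.0))
c_local = to_local((1.5, 0.0, 0.0), frame)
```

By construction, the C atom of the middle residue lies on the local X-axis and its O atom lies in the local XY-plane with a positive Y component.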

For a 5-residue segment with a specific sequence, we saved the mainchain C (carbonyl) coordinates of the 1st and 5th residues in the local coordinate system, denoted as $(x_1, y_1, z_1)$ and $(x_5, y_5, z_5)$. Under our assumption, we treated the coordinate components $x_1, y_1, z_1, x_5, y_5, z_5$ as six independent variables. By scanning through the entire PDB, we generated six independent distributions of these variables, called the configurational native distributions (CNDs) of 5-residue segments. We then calculated the means and standard deviations of the distributions and kept them as the CND lookup table.
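A lookup table of this kind can be illustrated as a dictionary that maps each segment sequence to per-component means and standard deviations. The Python sketch below is not the authors' implementation. The function name and data layout are hypothetical, and the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics
from collections import defaultdict


def build_cnd_table(observations, min_occurrences=5):
    """Build {segment_sequence: (means, stds)} from observed local coordinates.

    observations: iterable of (segment_sequence, components), where components
    is a tuple of local coordinate components (six values in the 5-residue case).
    """
    grouped = defaultdict(list)
    for sequence, components in observations:
        grouped[sequence].append(components)

    table = {}
    for sequence, rows in grouped.items():
        if len(rows) < min_occurrences:
            continue  # too few occurrences for reliable statistics
        columns = list(zip(*rows))  # one column per coordinate component
        means = tuple(statistics.fmean(col) for col in columns)
        stds = tuple(statistics.pstdev(col) for col in columns)  # assumed estimator
        table[sequence] = (means, stds)
    return table


# Hypothetical data: five observations of one sequence, four of another.
obs = [("ACDEF", (float(i), 0.0, 0.0, 0.0, 0.0, 1.0)) for i in range(5)]
obs += [("GHIKL", (0.0,) * 6) for _ in range(4)]
cnd = build_cnd_table(obs)
```

Each component is summarized independently, which mirrors the independence assumption in the text.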

For a test structure, we scanned through its sequence with 5-residue segments. For each segment, we looked up the means and standard deviations for its sequence in the CND lookup table and calculated the Z-scores of the six independent variables. Finally, we added up the absolute values of the Z-scores of all variables for all segments; this sum is called the CSF score. We assume the structure with the smallest CSF score has the largest likelihood of being the native structure.
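The scoring step is a sum of absolute Z-scores over all segments and variables. The Python sketch below illustrates it with hypothetical names and a table layout of {sequence: (means, stds)}. Two choices are assumptions, not taken from the text: segments missing from the table are skipped, and variables with zero standard deviation are ignored.

```python
def csf_score(segments, cnd_table):
    """Sum of absolute Z-scores over all segments found in the lookup table.

    segments: iterable of (segment_sequence, components) for the test structure.
    Returns (score, number_of_scored_segments).
    """
    total = 0.0
    n_scored = 0
    for sequence, components in segments:
        entry = cnd_table.get(sequence)
        if entry is None:
            continue  # segment not in the table: contributes nothing (assumed)
        means, stds = entry
        for value, mean, std in zip(components, means, stds):
            if std > 0.0:  # skip degenerate distributions (assumed)
                total += abs(value - mean) / std
        n_scored += 1
    return total, n_scored


# Hypothetical one-entry table and a two-segment test structure.
table = {"ACDEF": ((0.0,) * 6, (1.0, 2.0, 1.0, 1.0, 1.0, 1.5))}
test_segments = [
    ("ACDEF", (1.0, -2.0, 0.0, 0.0, 0.0, 3.0)),
    ("WWWWW", (9.0, 9.0, 9.0, 9.0, 9.0, 9.0)),  # not in the table
]
score, n_scored = csf_score(test_segments, table)
```

In this example the first segment contributes |1|/1 + |−2|/2 + |3|/1.5 = 4, and the second is skipped because its sequence is absent from the table.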

The segments of varying lengths are denoted as 5(1, 3, 5), 7(2, 4, 6), 9(1, 3, 5, 7, 9) and 11(2, 4, 6, 8, 10). In the notation 5(1, 3, 5), for example, the first number 5 is the segment length, 1 and 5 in the parentheses are the residues whose C (carbonyl) atom positional distributions are recorded in the local coordinate system, and 3 is the residue on which the local coordinate system is defined. For 9(1, 3, 5, 7, 9) and 11(2, 4, 6, 8, 10), four atoms are used for recording mainchain C (carbonyl) positional distributions, so 12 independent variables are used in total.

The CSF score can be calculated either based on one particular segment length or by combining all segment lengths together. In the case of combined segment lengths, the final CSF score is a linear sum of the CSF scores of the different segment lengths. No weighting function is introduced for the contributions of different segment lengths.
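The segment schemes and the unweighted combination can be written down compactly. In the Python sketch below, the dictionary layout and names are hypothetical. The sketch assumes that the local frame always sits on the middle residue and that the remaining listed residues are the recorded ones, which matches the stated counts of 6 and 12 independent variables.

```python
# Segment length -> 1-based residue positions (assumed reading of the notation).
SCHEMES = {
    5:  {"recorded": (1, 5),        "frame": 3},
    7:  {"recorded": (2, 6),        "frame": 4},
    9:  {"recorded": (1, 3, 7, 9),  "frame": 5},
    11: {"recorded": (2, 4, 8, 10), "frame": 6},
}


def n_variables(length):
    """Three coordinate components per recorded C atom."""
    return 3 * len(SCHEMES[length]["recorded"])


def combined_csf(scores_by_length):
    """Unweighted linear sum of the CSF scores of the individual segment lengths."""
    return sum(scores_by_length.values())


# Hypothetical per-length scores; a length with no usable segments contributes 0.
total = combined_csf({5: 10.0, 7: 4.5, 9: 0.0, 11: 2.5})
```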

The 11 commonly used decoy sets we used to test OPUS-CSF are the same as those used in GOAP,40 including the decoy sets 4state_reduced,59 fisa,58 fisa_casp3,58 hg_structal, ig_structal and ig_structal_hires (R. Samudrala, E. Huang, and M. Levitt, unpublished), I-TASSER,39 lattice_ssfit,60, 61 lmds,62 MOULDER63 and ROSETTA.64

Accessibility of OPUS-CSF

The scoring function is freely available to the academic community.

Acknowledgments

The authors wish to thank Robert L. Jernigan for careful reading of the manuscript and numerous comments on how to improve it. J.M. thanks support from the National Institutes of Health (R01-GM067801, R01-GM116280), and the Welch Foundation (Q-1512). Q.W. thanks support from the National Institutes of Health (R01-AI067839, R01-GM116280), the Gillson-Longenbaugh Foundation, and The Welch Foundation (Q-1826).

Supporting Information

References

1 MacKerell AD, Bashford D, Bellott M, Dunbrack RL, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph-McCarthy D, Kuchnir L, Kuczera K, Lau FT, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE, Roux B, Schlenkrich M, Smith JC, Stote R, Straub J, Watanabe M, Wiórkiewicz-Kuczera J, Yin D, Karplus M (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B 102: 3586–3616.
10.1021/jp973084f
2 Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4: 187–217.
10.1002/jcc.540040211
3 Weiner SJ, Kollman PA, Nguyen DT, Case DA (1986) An all atom force field for simulations of proteins and nucleic acids. J Comput Chem 7: 230–252.
10.1002/jcc.540070216
4 Case DA, Cheatham TE, Darden T, Gohlke H, Luo R, Merz KM, Onufriev A, Simmerling C, Wang B, Woods RJ (2005) The Amber biomolecular simulation programs. J Comput Chem 26: 1668–1688.
10.1002/jcc.20290
5 Arnautova YA, Jagielska A, Scheraga HA (2006) A new force field (ECEPP-05) for peptides, proteins, and organic molecules. J Phys Chem B 110: 5025–5044.
10.1021/jp054994x
6 Marrink SJ, Risselada HJ, Yefimov S, Tieleman DP, De Vries AH (2007) The MARTINI force field: coarse grained model for biomolecular simulations. J Phys Chem B 111: 7812–7824.
10.1021/jp071097f
7 Liwo A, Ołdziej S, Pincus MR, Wawak RJ, Rackovsky S, Scheraga HA (1997) A united-residue force field for off-lattice protein-structure simulations. I. Functional forms and parameters of long-range side-chain interaction potentials from protein crystal data. J Comput Chem 18: 849–873.
10.1002/(SICI)1096-987X(199705)18:7<849::AID-JCC1>3.0.CO;2-R
8 Liwo A, Pincus MR, Wawak RJ, Rackovsky S, Ołdziej S, Scheraga HA (1997) A united-residue force field for off-lattice protein-structure simulations. II. Parameterization of short-range interactions and determination of weights of energy terms by Z-score optimization. J Comput Chem 18: 874–887.
10.1002/(SICI)1096-987X(199705)18:7<874::AID-JCC2>3.0.CO;2-O
9 Chebaro Y, Pasquali S, Derreumaux P (2012) The coarse-grained OPEP force field for non-amyloid and amyloid proteins. J Phys Chem B 116: 8741–8752.
10.1021/jp301665f
10 Skolnick J (2006) In quest of an empirical potential for protein structure prediction. Curr Opin Struct Biol 16: 166–171.
10.1016/j.sbi.2006.02.004
11 Sippl MJ (1995) Knowledge-based potentials for proteins. Curr Opin Struct Biol 5: 229–235.
10.1016/0959-440X(95)80081-6
12 Jernigan RL, Bahar I (1996) Structure-derived potentials and protein simulations. Curr Opin Struct Biol 6: 195–209.
10.1016/S0959-440X(96)80075-3
13 Moult J (1997) Comparison of database potentials and molecular mechanics force fields. Curr Opin Struct Biol 7: 194–199.
10.1016/S0959-440X(97)80025-5
14 Lazaridis T, Karplus M (2000) Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10: 139–145.
10.1016/S0959-440X(00)00063-4
15 Gohlke H, Klebe G (2001) Statistical potentials and scoring functions applied to protein–ligand binding. Curr Opin Struct Biol 11: 231–235.
10.1016/S0959-440X(00)00195-0
16 Russ WP, Ranganathan R (2002) Knowledge-based potential functions in protein design. Curr Opin Struct Biol 12: 447–452.
10.1016/S0959-440X(02)00346-9
17 Buchete N, Straub J, Thirumalai D (2004) Development of novel statistical potentials for protein fold recognition. Curr Opin Struct Biol 14: 225–232.
10.1016/j.sbi.2004.03.002
18 Poole AM, Ranganathan R (2006) Knowledge-based potentials in protein design. Curr Opin Struct Biol 16: 508–513.
10.1016/j.sbi.2006.06.013
19 Zhou Y, Zhou H, Zhang C, Liu S (2006) What is a desirable statistical energy functions for proteins and how can it be obtained? Cell Biochem Biophys 46: 165–174.
10.1385/CBB:46:2:165
20 Ma J (2009) Explicit orientation dependence in empirical potentials and its significance to side-chain modeling. Acc Chem Res 42: 1087–1096.
10.1021/ar900009e
21 Gilis D, Biot C, Buisine E, Dehouck Y, Rooman M (2006) Development of novel statistical potentials describing cation-π interactions in proteins and comparison with semiempirical and quantum chemistry approaches. J Chem Inform Model 46: 884–893.
10.1021/ci050395b
22 Hendlich M, Lackner P, Weitckus S, Floeckner H, Froschauer R, Gottsbacher K, Casari G, Sippl MJ (1990) Identification of native protein folds amongst a large number of incorrect models: the calculation of low energy conformations from potentials of mean force. J Mol Biol 216: 167–180.
10.1016/S0022-2836(05)80068-3
23 Hoppe C, Schomburg D (2005) Prediction of protein thermostability with a direction- and distance-dependent knowledge-based potential. Protein Sci 14: 2682–2692.
10.1110/ps.04940705
24 Jones DT, Taylor WR, Thornton JM (1992) A new approach to protein fold recognition. Nature 358: 86–89.
10.1038/358086a0
25 Koliński A, Bujnicki JM (2005) Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models. Proteins 61: 84–90.
10.1002/prot.20723
26 Miyazawa S, Jernigan RL (1985) Estimation of effective interresidue contact energies from protein crystal-structures: quasi-chemical approximation. Macromolecules 18: 534–552.
10.1021/ma00145a039
27 Sippl MJ (1990) Calculation of conformational ensembles from potentials of mean force: an approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol 213: 859–883.
10.1016/S0022-2836(05)80269-4
28 Skolnick J, Kolinski A, Ortiz A (2000) Derivation of protein-specific pair potentials based on weak sequence fragment similarity. Proteins 38: 3–16.
10.1002/(SICI)1097-0134(20000101)38:1<3::AID-PROT2>3.0.CO;2-S
29 Tobi D, Elber R (2000) Distance-dependent, pair potential for protein folding: Results from linear optimization. Proteins 41: 40–46.
10.1002/1097-0134(20001001)41:1<40::AID-PROT70>3.0.CO;2-U
30 Wu Y, Lu M, Chen M, Li J, Ma J (2007) OPUS-Ca: a knowledge-based potential function requiring only Cα positions. Protein Sci 16: 1449–1463.
10.1110/ps.072796107
31 Zhang Y, Kolinski A, Skolnick J (2003) TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys J 85: 1145–1164.
10.1016/S0006-3495(03)74551-2
32 DeBolt SE, Skolnick J (1996) Evaluation of atomic level mean force potentials via inverse folding and inverse refinement of protein structures: atomic burial position and pairwise non-bonded interactions. Protein Eng 9: 637–655.
10.1093/protein/9.8.637
33 Lu H, Skolnick J (2001) A distance-dependent atomic knowledge-based potential for improved protein structure selection. Proteins 44: 223–232.
10.1002/prot.1087
34 Lu M, Dousis AD, Ma J (2008) OPUS-PSP: an orientation-dependent statistical all-atom potential derived from side-chain packing. J Mol Biol 376: 288–301.
10.1016/j.jmb.2007.11.033
35 Samudrala R, Moult J (1998) An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. J Mol Biol 275: 895–916.
10.1006/jmbi.1997.1479
36 Shen M, Sali A (2006) Statistical potential for assessment and prediction of protein structures. Protein Sci 15: 2507–2524.
10.1110/ps.062416606
37 Yang Y, Zhou Y (2008) Specific interactions for ab initio folding of protein terminal regions with secondary structures. Proteins 72: 793–803.
10.1002/prot.21968
38 Zhang C, Vasmatzis G, Cornette JL, DeLisi C (1997) Determination of atomic desolvation energies from the structures of crystallized proteins. J Mol Biol 267: 707–726.
10.1006/jmbi.1996.0859
39 Zhang J, Zhang Y (2010) A novel side-chain orientation dependent potential derived from random-walk reference state for protein fold selection and structure prediction. PLoS One 5:e15386.
10.1371/journal.pone.0015386
40 Zhou H, Skolnick J (2011) GOAP: a generalized orientation-dependent, all-atom statistical potential for protein structure prediction. Biophys J 101: 2043–2052.
10.1016/j.bpj.2011.09.012
41 Zhou H, Zhou Y (2002) Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci 11: 2714–2726.
10.1110/ps.0217002
42 Noid W (2013) Perspective: Coarse-grained models for biomolecular systems. J Chem Phys 139: 090901.
10.1063/1.4818908
43 Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A (2016) Coarse-grained protein models and their applications. Chem Rev 116: 7898–7936.
10.1021/acs.chemrev.6b00163
44 Wu Y, Tian X, Lu M, Chen M, Wang Q, Ma J (2005) Folding of small helical proteins assisted by small-angle X-ray scattering profiles. Structure 13: 1587–1597.
10.1016/j.str.2005.07.023
45 Wu Y, Chen M, Lu M, Wang Q, Ma J (2005) Determining protein topology from skeletons of secondary structures. J Mol Biol 350: 571–586.
10.1016/j.jmb.2005.04.064
46 Maupetit J, Gautier R, Tufféry P (2006) SABBAC: online Structural Alphabet-based protein BackBone reconstruction from Alpha-Carbon trace. Nucleic Acids Res 34: W147–W151.
10.1093/nar/gkl289
47 Kong Y, Ma J (2003) A structural-informatics approach for mining β-sheets: locating sheets in intermediate-resolution density maps. J Mol Biol 332: 399–413.
10.1016/S0022-2836(03)00859-3
48 Kong Y, Zhang X, Baker TS, Ma J (2004) A structural-informatics approach for tracing β-sheets: Building pseudo-Cα traces for β-strands in intermediate-resolution density maps. J Mol Biol 339: 117–130.
10.1016/j.jmb.2004.03.038
49 Moore BL, Kelley LA, Barber J, Murray JW, MacDonald JT (2013) High–quality protein backbone reconstruction from alpha carbons using Gaussian mixture models. J Comput Chem 34: 1881–1889.
10.1002/jcc.23330
50 Reid LS, Thornton JM (1989) Rebuilding flavodoxin from Cα coordinates: a test study. Proteins 5: 170–182.
10.1002/prot.340050212
51 Rey A, Skolnick J (1992) Efficient algorithm for the reconstruction of a protein backbone from the α-carbon coordinates. J Comput Chem 13: 443–456.
10.1002/jcc.540130407
52 Liwo A, Wawak R, Scheraga H, Pincus M, Rackovsky S (1993) Calculation of protein backbone geometry from α-carbon coordinates based on peptide-group dipole alignment. Protein Sci 2: 1697–1714.
10.1002/pro.5560021015
53 Iwata Y, Kasuya A, Miyamoto S (2002) An efficient method for reconstructing protein backbones from α-carbon coordinates. J Mol Graph Model 21: 119–128.
10.1016/S1093-3263(02)00142-0
54 Correa PE (1990) The building of protein structures from α-carbon coordinates. Proteins 7: 366–377.
10.1002/prot.340070408
55 Payne PW (2008) Reconstruction of protein conformations from estimated positions of the Cα coordinates. Protein Sci 2: 315–324.
10.1002/pro.5560020303
56 Zhang Y, Skolnick J (2004) Scoring function for automated assessment of protein structure template quality. Proteins 57: 702–710.
10.1002/prot.20264
57 Tang H-Y, Zhang Z-G (2007) Using C′ deviation to study structures of central amino acids in peptide fragments. Amino Acids 33: 689–693.
10.1007/s00726-006-0463-2
58 Simons KT, Kooperberg C, Huang E, Baker D (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268: 209–225.
10.1006/jmbi.1997.0959
59 Park B, Levitt M (1996) Energy functions that discriminate X-ray and near-native folds from well-constructed decoys. J Mol Biol 258: 367–392.
10.1006/jmbi.1996.0256
60 Samudrala R, Xia Y, Levitt M, Huang E (1999) A combined approach for ab initio construction of low resolution protein tertiary structures from sequence. Pac Symp Biocomput 1999: 505–516.
61 Xia Y, Huang ES, Levitt M, Samudrala R (2000) Ab initio construction of protein tertiary structures using a hierarchical approach. J Mol Biol 300: 171–185.
10.1006/jmbi.2000.3835
62 Keasar C, Levitt M (2003) A novel approach to decoy set generation: designing a physical energy function having local minima with native structure characteristics. J Mol Biol 329: 159–174.
10.1016/S0022-2836(03)00323-1
63 John B, Sali A (2003) Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Res 31: 3982–3992.
10.1093/nar/gkg460
64 Tsai J, Bonneau R, Morozov AV, Kuhlman B, Rohl CA, Baker D (2003) An improved protein decoy set for testing energy functions for protein structure prediction. Proteins 53: 76–87.
10.1002/prot.10454

Volume 27, Issue 1

Special Issue on Tools for Protein Science

January 2018

Pages 286-292

OPUS-CSF: A C-atom-based scoring function for ranking protein structural models
