An ANI-2 enabled open-source protocol to estimate ligand strain after docking
Abstract
In protein-ligand docking, the score assigned to a protein-ligand complex is approximate. Especially, the internal energy of the ligand is difficult to compute precisely using a molecular mechanics based force-field, introducing significant noise in the rank-ordering of ligands. We propose an open-source protocol (https://github.com/UnixJunkie/MMO), using two quantum mechanics (QM) single point energy calculations, plus a Monte Carlo (Monte Carlo) based ligand minimization procedure in-between, to estimate ligand strain after docking. The MC simulation uses the ANI-2x (QM approximating) force field and is performed in the dihedral space. On some protein targets, using strain filtering after docking allows to significantly improve hit rates. We performed a structure-based virtual screening campaign on nine protein targets from the Laboratoire d'Innovation Thérapeutique—PubChem assays dataset using Cambridge crystallographic data centre genetic optimization for ligand docking. Then, docked ligands were submitted to the strain estimation protocol and the impact on hit rate was analyzed. As for docking, the method does not always work. However, if sufficient active and inactive molecules are known for a given protein target, its efficiency can be evaluated.
1 INTRODUCTION
Several researchers have warned about the difficulty of precisely calculating the internal energy of a ligand using Molecular Mechanics (MM) based methods.1-3 This fact has important implications for protein-ligand docking scores.
Tirado-Rives and Jorgensen formally investigated the fact that when a ligand binds to a protein, it is typically not in the same lowest-energy conformation as the unbound ligand.1 The authors name “conformer focusing” the loss of conformational freedom in the ligand upon binding to a protein. The authors explain that misestimation or neglect of this free-energy change alone is sufficient to prevent proper rank-ordering of chemically diverse ligands in a protein-ligand docking screen: docking scores are noisy. With force-fields at the time (2006), the authors optimistically estimated that the error in calculating the free-energy change due to conformer focusing might be between five to 10 kcal/mol.
In the same vein, Winkler3 noticed that “The inability to properly account for entropy in the binding interactions is one of the contributors to the relatively poor performance of scoring methods even when the docking force fields can generate good poses.” So, while it is known that docking can predict the binding-mode of a known ligand (the “docking power”,4, 5 as assessed by redocking experiments), Winkler acknowledges that the virtual screening (VS) power4, 5 is lacking.
Peach and colleagues2 did a literature review and gathered that estimates of ligand strain energy in protein crystal structures vary wildly (between zero to 25 kcal/mol). Brueckner et al.6 proposed a formula to compute an upper bound of ligand strain upon binding, solely based on the ligand's number of heavy atoms (HA): .
To those remarks about the inaccuracy of computing a ligand's conformer internal energy and so the significant noise in protein-ligand docking scores, we must add that to the best of our knowledge all flexible-ligand rigid-protein docking software only perform a heuristic search in a high dimensional space. For example, for a ligand with seven rotatable bonds, there are 13 degrees of freedom (three rotational plus three translational plus the seven ligand internal degrees of freedom). So, while a heuristic search can find a good solution in a timely manner, it does not guarantee it has found the global minimum. And, this global minimum might be several kcal/mol away from the found solution. In our opinion, this is another significant source of noise in protein-ligand docking scores.
Recently, researchers have tried to exploit ligand strain after docking.7 Rather than trying to “fix” docking scores, the authors propose to use a qualitative approach flagging docked ligands as “strained” or “non-strained.” One way they propose to do this is to compute the total ligand strain (measured in an unspecified “Torsion Energy Unit” [TEU]). Their approach uses an adapted version of the knowledge-based torsion library from Guba and colleagues.8 It is very fast, taking on average 0.04 s for one ligand conformer on their test computer. With their approach, if the total strain of a compound is ≤7.0 TEU, the conformer is classified as unstrained. Conversely, with a total strain TEU the conformer is deemed strained. In their benchmark on 40 database of useful decoys enhanced (DUD-E)9 targets and using DOCK3.7,10 the authors notice that with a total strain threshold of 7.0 TEU, 30 targets see an increase in 7 (a measure of early enrichment in VS). Encouraged by those results and recent developments in neural network potentials (NNP),11-14 we decided to investigate if a quantitative approach to estimate ligand strain after docking was feasible and study its effect on post processing docking results from a commercial software (Cambridge crystallographic data centre [CCDC] genetic optimization for ligand docking [GOLD]) on the stringent Laboratoire d'Innovation Thérapeutique—PubChem assays (LIT-PCBA)15 dataset.
2 METHODS
The standard definition of ligand strain is the difference in internal energy between the lowest energy conformer of a ligand in vacuum and its docked conformer on a protein.16 Unfortunately, we cannot guarantee that we are able to compute the lowest energy conformer of a ligand (a global minimum). Instead, we have opted to run a Monte Carlo (MC) simulation in the dihedral space starting from the docked ligand conformer. Bond lengths and bond angles are considered rigid. Only single bonds out of rings and not connecting a terminal atom (a heavy atom with degree one on the molecular graph) are considered rotatable during this simulation. The idea is that if there is a local energy minimum near the docked conformer, an MC simulation should be able to reach it. After looking into a few books,17-19 we implemented Algorithm 1 in order to perform minimization of the docked ligand conformer in vacuum, with a force field (ANI-2x11-14) described as reaching quantum mechanics (QM) accuracy. Our implementation uses the ANI-2x11 model provided by the torchani20 Python package. On our test computer, ANI-2x allows the ligand internal energy to be calculated about 100 times per second for a typical drug-like molecule. Experiments were run using MC simulations of steps. ANI-2x enables running MC simulations at QM-accuracy, which would be computationally intractable using a standard QM method (cf. Figure 2). Using psi4, the average run-time of a single point ligand internal energy calculation is 633 s when using a single core of our computing node (the minimum run-time is 1 s, median 536 s and maximum min!). It would have been possible to run MC simulations using a MM force field, but such force fields are known to be of poor accuracy1-3 when describing a ligand's internal energy. At each MC step, one of the rotatable bonds is selected uniformly at random. The angular value of the corresponding dihedral angle is tweaked by adding a uniform random quantity , where is the currently allowed maximal rotation increment for this bond ( at start). Every 100 MC steps, the selected bond is instead tried to be set to a new value in . Acceptance or rejection of the last move (and hence of a new ligand conformer) is subject to the standard Metropolis criterion. If is the number of rotatable bonds of a ligand, every steps, each rotatable bond maximum rotation increment allowed () is updated using a hysteresis scheme taking into account the move acceptance ratio for this bond and the target acceptance ratio (Algorithm 1). In essence, the more likely moves are accepted for this bond, the more this bond will be allowed to rotate. For bonds which are more constrained, the opposite is true. These schemes are intended to make the conformational space exploration more efficient.17
QM single point energy calculations (Figure 1) were performed using psi421 driven by Mayachemtools;22 using the Becke, 3-parameter, Lee-Yang-Parr (B3LYP) method and the 6-31G** basis set (6-31+G** for sulfur containing compounds) (Tables 1, 2).

ALDH1 (5l2m) | ESR1-(5ufx) | FEN1 (5fv7) |
![]() |
![]() |
![]() |
GBA (2v3d) | KAT2A (5mlj) | MAPK1 (4zzn) |
![]() |
![]() |
![]() |
MTORC1 (4dri) | PKM2 (3gr4) | VDR (3a2j) |
![]() |
![]() |
![]() |
- Note: The protein target name is on top of each binding-site with the corresponding protein data bank (PDB) id between parenthesis. Figures made at www.rcsb.org using NGL23 viewer.
![]() |
ALGORITHM 1. LIGAND-ONLY (VACUUM) MONTE CARLO SIMULATION IN THE DIHEDRAL SPACE. THE FORCE FIELD IS ANI-2X. COMMENTS ARE PREFIXED WITH //
function
param // properly initialized random number generator
param // number of MC steps to perform
param // simulation temperature in Kelvin (for Metropolis criterion)
param // ligand with previously identified rotatable bonds
output // minimized ligand conformer and its energy
// initial maximum rotation allowed per rotatable bond is ±5°
// target acceptance ratio
// hysteresis low bound
// hysteresis high bound
for in do
// a single rotatable bond is affected
// the force field is ANI-2x
if then
end if
// record lowest energy conformer
if then
end if
if then
// update maximum rotation allowed per rotatable bond
end if
end for
return
In order to compare diverse molecules, the calculated strain is divided by the number of rotatable bonds in a molecule; giving an average dihedral strain. To compute hit rates among strained or unstrained molecules, docked ligands were first sorted by increasing strain (Figures 6 and 7).
3 DATASET
3.1 A subset of the LIT-PCBA dataset
Structure-based virtual screening (SBVS) experiments were conducted on nine protein targets from the LIT-PCBA dataset15 (Figures 1-3). LIT-PCBA was designed to benchmark ligand and SBVS methods. It was carefully curated from 149 PubChem26 assays. It contains 7844 confirmed actives and 407381 inactive molecules. In LIT-PCBA, for a given protein target a true positive is a molecule passing all LIT-PCBA filters15 for “true active” (only C, H, N, O, P, S and halogens allowed, dose-response curve with Hill slope , molecule not a known PubChem aggregator or luciferase inhibitor or autofluorescent). A true negative is a molecule which is not detected as a hit in the corresponding PubChem assay, but still has to pass all LIT-PCBA filters (organic compound, adequate property ranges, etc.); cf. the LIT-PCBA article15 for details. LIT-PCBA is a tough but realistic dataset for VS experiments. Our nine targets were selected based on the existence of at least 100 known active molecules, so that hit rates are calculated using a significant number. Then, inactive molecules were randomly selected to achieve a total of about 1000 molecules per protein, to keep the computational cost reasonable. Numbers vary slightly (Figure 3) because some molecules fail the ligand preparation protocol or QM calculation with psi421 or scoring with ANI-2x.11 One protein structure protein data bank (PDB) was selected for each protein (Figure 3), mostly based on the high resolution of the crystallographic data (LIT-PCBA proposes several PDB entries for each protein target).
3.2 Protein preparation and docking
Each protein was prepared using the Hermes graphical interface from CCDC GOLD. Starting from PDB files downloaded from rcsb.org (5l2m, 5ufx, 5fv7, 2v3d, 5mlj, 4zzn, 4dri, 3gr4 and 3a2j), hydrogen atoms were added, all water molecules and the cognate ligand were removed, then the protein was saved in MOL2 format. The cognate ligand's geometric center in combination with a radius of 10 Å were used as the binding-site definition (GOLD's defaults). All other parameters were left to their default values. Only the highest scoring pose per ligand was considered for further analysis (default ChemPLP scoring function).
3.3 Ligands preparation prior to docking
Starting from the SMILES provided by LIT-PCBA: OpenEye tautomers27 was used to generate one reasonable tautomer at pH 7.4 for each ligand. Then, OpenEye OMEGA28, 29 was used to generate one low energy conformer per protonated ligand. Finally, OpenEye molcharge30 was used to assign partial charges to each conformer using merck molecular force field 94.31
4 RESULTS
Experiments were run on computers running Linux CentOS-7.9; equipped with 2.6 GHz Intel Xeon CPUs (56 threads per node) and 256 GB of RAM.
4.1 Flexible-ligand/rigid-protein docking VS performance
In Figure 4, Receiver operating characteristic (ROC) area under the curve (AUC) curves corresponding to the VS performance of CCDC GOLD are shown. If one considers the virtual screen to have acceptable performance only if the two following conditions are met: (1) the AUC's 95% confidence interval32, 33 is higher than 0.5 (Figure 5) and (2) the early enrichment is better than random (i.e., the left part of the ROC curve, between 0% false positive rate [FPR] and 5%, 10% or 20% FPR, is higher than the diagonal line), then only three targets are amenable to a docking screen: ALDH1 (AUC = 0.63), GBA (AUC = 0.6) and MAPK1 (AUC = 0.59). On ESR1m, the ROC AUC is borderline (AUC = 0.57), but the early enrichment is random or slightly worse. On three proteins (FEN1, KAT2A and VDR), with respective AUC values (0.37, 0.38 and 0.32), the VS performance is significantly worse than random. On the remaining two protein targets MTORC1 and PKM2, the AUC values (0.51 and 0.54 respectively) are dangerously close to 0.5, indicating random VS performance. While such results might be surprising, they are in line with our own past experience34 when trying to dock the full LIT-PCBA dataset (only five out of 15 protein targets had an acceptable enrichment factor at 1% and only six had an AUC ≥ 0.6). Some other facts about protein-ligand docking are worrisome: in a recent study on the PDBScan2235 dataset, Flachsenberg and colleagues35 found that only of ligands were positioned at most 2 Å away from the crystal ligand in a large, fully automated redocking study. In terms of Enrichment Factor at 1% (EF1%), the LIT-PCBA authors used Surflex-Dock and observed an average on only six out of 21 protein targets.
4.2 Effect of ligand strain filtering on hit rates
In Figure 6, the impact of strain filtering on hit rates is shown. On two protein targets (FEN1 and VDR), the method could be used to improve hit rates among selected molecules regardless of docking scores. For FEN1, the background hit rate is about 20%. Using a strain threshold of maximum 0.15 kcal/mol (average ligand strain per rotatable bond) the hit rate can be improved to 30%. Using a strain threshold of maximum 0.1 kcal/mol, the hit rate can be doubled. On the VDR protein, a threshold of 0.2 kcal/mol allows to improve the background hit rate from 20% to 30%. Deciding on a threshold to double the hit rate on this protein does not seem advisable: the range is too narrow and the filter would reject too many molecules. One has to keep in mind that the smaller the strain threshold, the more docked molecules are going to be rejected by the filter. In Figure 6, the green curve indicates the fraction of the docked molecules which pass the filter. For the two proteins where the method looks useful (FEN1 and VDR), a 0.2 kcal/mol strain threshold would keep about 5% of the docked molecules. We do not see a usable strain threshold for the seven other protein targets (ALDH1, ESR1m, GBA, KAT2A, MAPK1, MTORC1, PKM2). Especially, for five proteins (ALDH1, GBA, MAPK1, MTORC1 and PKM2), there is not any advisable strain filtering threshold. Strain filtering would be detrimental to the hit rate among docked molecules for those proteins.
4.3 Effect of total ligand strain filtering on hit rates
To check our results, we also analyzed our dataset using the method of Gu and colleagues.7 1 When considering total ligand strain, our method has a positive but small Spearman correlation with the University of California, San Francisco (UCSF) method7 ( depending on the protein target). Looking at Figure 7, the method looks advisable on one protein target (KAT2A), and to some extent to two others (FEN1 and VDR). On KAT2A, a total ligand strain threshold of four TEU allows to bump the hit rate from about 20% to 30%. Such a threshold would keep about 25% of the docked molecules. A strain filter of two TEU allows to double the hit rate. On FEN1, a total ligand strain threshold of two to three TEU allows to bump the hit rate slightly, from 20% to about 25% to 30%. On VDR, a similar strain filter of two to three TEU bumps the hit rate from 20% to 30%. On the six other protein targets (ALDH1, ESR1m, GBA, MAPK1, MTORC1 and PKM2), we do not see an advisable total strain filtering threshold. Finally, the two different approaches agree on two protein targets (FEN1 and VDR) where strain filtering might be useful.
5 DISCUSSION
5.1 Advantages
While we would have loved to see the method work on a majority of protein targets, this is not the case but the same can be told for docking. Docking's VS performance is acceptable on ALDH1, GBA and MAPK1 while strain filtering works on FEN1 and VDR. Strain filtering can be used regardless of the performance of docking in terms of VS power. That is, docking can be trusted to optimize non-bonded protein ligand interactions (optimizing a ligand's conformer into the binding site), then strain filtering used to select molecules instead of relying on docking scores.
5.2 Drawbacks
The method does not seem applicable in general and needs to be benchmarked on a given protein target of interest. The proposed protocol (Figure 1) is not fail-safe: in about 4% of the 7895 ligands analyzed after docking, the MC simulation failed to minimize the docked conformer. That is, the calculated strain was negative. Of course, such points were excluded from the hit rate calculations and figures. We attribute this to some local disagreement between the ANI-2x NNP and an actual QM calculation. We also wonder about the transferability of the analysis. Probably, results for one protein structure should not be used to infer anything about other structures. Also, results obtained using one docking software and docking parameters are probably not advisable for another docking software or set of docking parameters. For example, Gu and colleagues, using DOCK3.710 on DUD-E9 recommend a seven TEU total strain filtering threshold. In our benchmark and using CCDC-GOLD, it seems that the optimal total strain filtering threshold lies between two to four TEU (if and when the method is advisable).
The filter is very selective (cf. the retained curves in Figure 6). With a too low strain filtering threshold, no molecule will pass the filter.
While our protocol is computationally tractable for a thousand molecules and one protein target using at least one massively multi-core computer, it does take a few hours to run all the QM single-point energy calculations (cf. the outliers in Figure 2).






Following a suggestion from an anonymous reviewer, we did rerun all calculations with geometry optimization at the QM level (bond lengths and bond angles only; rotatable bonds were fixed) for the docked ligands prior to running the rest of the pipeline. Indeed, this could make the estimated strain values less noisy. However, this resulted in an explosion of the required computing power (we estimate, by a factor of at least ). Also, only about 10% of the molecules for each protein target survived such treatment without any error. cf. Figure S1 for results. Using this more accurate protocol, three protein targets seem amenable to strain filtering: ALDH1, ESR1- and VDR. Interestingly, in all those three cases, a threshold filtering range of maximum 0.2 kcal/mol per rotatable bond seems adequate. Also, we note that VDR is the only protein target that seems amenable to strain filtering whatever protocol we were using. We do note that the crystallographic ligand in this protein is completely buried inside a protein cavity and almost completely desolvated (PDB: 3A2J, ligand of interest identifier: TEJ), unlike all other protein-ligand complexes that were studied.
We leave to courageous researchers investigating if taking into account solvation36 during the QM single-point energy calculation might make such strain filtering protocol more generally applicable.
6 CONCLUSION
In this study, we have performed a SBVS campaign (protein-ligand docking) on nine protein targets of the LIT-PCBA dataset using CCDC GOLD. An open-source protocol taking advantage of the ANI-2x NNP was developed then used to estimate ligand strain after docking. The effect of strain filtering on hit rate was analyzed. The protocol was also checked against a torsion library8 based method.7
Because of the computational cost of the method and the low number of protein targets where it does benefit, we recommend potential users to first consider simpler and faster methods like pharmacophore filtering,37 Protein-Ligand Interaction Fingerprints38, 39 or maybe some rescoring function.40
ACKNOWLEDGMENTS
This work was supported by AMED grant JP20NK0101111. KT acknowledges an academic license from OpenEye for tautomers, OMEGA and molcharge.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.
Endnote
Open Research
DATA AVAILABILITY STATEMENT
Our source code is released as open-source under the BSD license at https://github.com/UnixJunkie/MMO (accessed March 19, 2024). The dataset used in this study can be downloaded from https://zenodo.org/records/10846630 (accessed March 21, 2024).