With the rise of machine learning (ML), the modeling of chemical systems has reached a new era and has the potential to revolutionize how we understand and predict chemical reactions. Here, we probe the historic dependence on enantiomeric excess (ee) as a target variable and discuss the benefits of using relative Gibbs free activation energies (ΔΔG^≠), grounded firmly in transition-state theory. This perspective is intended to discuss best practices that enhance modeling efforts, especially for chemists with an experimental background in asymmetric catalysis who wish to explore modeling of their data. We outline the enhanced modeling performance using ΔΔG^≠, escaping physical limitations, addressing temperature effects, managing non-linear error propagation, adjusting for data distributions, and dealing with unphysical predictions, in order to streamline modeling for the practical chemist and provide simple guidelines to strong statistical tools. For this endeavor, we gathered ten datasets from the literature covering very different reaction types. We evaluated the datasets using fingerprint-, descriptor-, and graph neural network-based models. Our results highlight the distinction in performance among varying model complexities with respect to the target representation, emphasizing practical benefits for chemists.

1 Introduction

Enantioselective catalysis is a challenging yet essential aspect of synthesis, for example, of pharmaceutically active compounds, and represents a critical hurdle towards developing and marketing these products. The precision required in the process makes the quest for highly selective catalysts a formidable task. However, methods such as computational modeling offer promising avenues to streamline the process and expedite discovery and optimization cycles. Computational modeling, a sophisticated tool for unraveling complex catalytic mechanisms, acts as an innovative conduit for better understanding these phenomena, subsequently aiding the development of improved catalysts.

Although the incorporation of machine learning (ML) in catalyst discovery and optimization is a comparatively newer modeling trend, it should be noted that most of the underlying principles have been established over the last several decades. Both ligand-based and quantitative structure–activity relationship models have been used for over thirty years, often under the umbrella terms of chemometrics or cheminformatics, setting a solid foundation for the integration of AI in chemistry.¹ The “newness” or revival of ML can be attributed to the significant rise in processing power of devices like graphics processing units, access to more extensive and information-rich data sets and new approaches to molecular representation.² Together, these developments enhance the accuracy and predictive power made possible by ML.

However, as the merger of ML and enantioselective catalysis is still in its relative infancy, best practices for its successful application are yet to be firmly established. While preliminary guidelines exist for implementing ML in chemistry and other fields, careful validation and methodical clarity will be needed to ensure its reliable application in catalyst discovery and optimization. Scientists are actively contributing to this field, setting out to establish these best practices and expand the potential of this exciting interdisciplinary approach.^3-5 When modeling enantioselective reactions, it is crucial to consider how the target variable is represented, with options including ee, enantiomeric ratio (er), and ΔΔG^≠. Each of these can be derived from one another and represent different aspects of the same underlying data, as long as the tenets of transition state theory are valid. In this discussion, we advocate for the use of ΔΔG^≠ when aiming for models that not only provide chemically meaningful insights but are also robust and generalizable across a variety of conditions. It is important to emphasize that linear regression and other modeling techniques allow for a wide array of transformations on both the target variables and the predictors. This flexibility enables the exploration of various transformations, such as logarithmic, polynomial, sigmoid, and Box-Cox,⁶ depending on the specific analytical needs and data characteristics.

Both ee and er are experimental quantities used to describe the selectivity and efficiency of asymmetric catalysts in producing one enantiomer over the other in an enantioselective reaction. However, they have different origins, historical uses, and reasons for their application. The ee has historically been used in asymmetric catalysis and has its roots in the early days of enantiomer separation and analysis.⁷ It is the difference between the mole fractions of the two enantiomers in the product mixture. The ee is typically expressed as a percentage ranging from 0 % (racemic mixture) to 100 % (single enantiomer). Enantiomeric excess was historically introduced as a simple and intuitive way to describe enantioselectivity because it is a quantity that is proportional to the optical purity, e.g., rotation of linear-polarized light of a sample relative to that of the pure enantiomer. Conversely, the er is defined as the ratio of the percentages of each enantiomer in a mixture,⁸ which, under kinetic control, corresponds to the ratio of the rate constants for forming each enantiomer. Thus, er emerged as an alternative to ee due to its direct measurability in high-performance liquid chromatography, particularly in kinetic resolutions and asymmetric catalysis, where the relative rates of formation of the two enantiomers are more relevant.⁹

In many cases, catalyzed enantioselective reactions are characterized by an irreversible (i.e., stereo-defining) progression through a transition structure (derived from the concepts of transition-state theory, TST) and the product ratio only depends on the difference of the diastereomeric transition structure free energies. Thus, ΔΔG^≠ as the relative free activation energy difference of the enantio-determining pathways translates into the product's chiral purity. In contrast, experimental observables of chiral purity such as ee and er are indirect representations of the underlying cause that determines chiral purity, i.e., ΔΔG^≠. For regression modeling, targeting a quantity that directly measures the underlying cause of a process, such as ΔΔG^≠, may yield models that are more physically meaningful and insightful. In reality, other factors can influence the chiral purity of reaction products, such as side reactions that lead to product racemization or the formation of enantiomeric products by different mechanisms (such as a competition between enantiospecific reactions of enantiomerically enriched starting materials). An example of a catalyzed reaction that converts an achiral starting material to a chiral product is the Corey-Bakshi-Shibata (CBS) reduction (Scheme 1).¹⁰

Scheme 1

Exemplary potential free energy surface showing the enantioselective reduction of acetophenone to the corresponding chiral benzyl alcohol via CBS reduction. The selectivity is determined by the difference in free energy of the diastereomeric transition structures (TS_S and TS_R) ΔΔG^≠.¹¹ Relative free energies at the B3LYP−D3(BJ)/6–311+G(d,p)-SMD(THF)//B3LYP−D3(BJ)/6–311G(d,p) level of theory.^12-16

In the CBS reduction, a prochiral ketone is reduced with a chiral oxazaborolidine catalyst to yield a chiral alcohol. This reaction occurs through two diastereomeric transition structures. The difference in their free energy, represented as ΔΔG^≠, dictates the chiral purity of the resulting product, i.e., the enantioselectivity of the reaction. In the example shown in Scheme 1, the reduction of acetophenone, the computed ΔΔG^≠ is 3.1 kcal mol⁻¹, which converts to an ee of 98.9 % at 25 °C. This is in reasonable agreement with the experimentally observed value of 97 % ee. Thus, the picture of competing transition structures serves as a useful bridge between the experimental observables ee or er and mechanistic concepts that are expressed in relative free energies.
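The conversion used in this example can be sketched in a few lines of Python. This is a minimal illustration, assuming that transition-state theory holds so that er = exp(ΔΔG^≠/RT), which gives ee = tanh(ΔΔG^≠/2RT); the function names are our own and not taken from any published code.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_to_ee(ddg_kcal, temp_k=298.15):
    """Convert ddG (kcal/mol) to ee (%), assuming er = exp(ddG / RT)."""
    return 100.0 * math.tanh(ddg_kcal / (2.0 * R_KCAL * temp_k))


def ee_to_ddg(ee_percent, temp_k=298.15):
    """Convert ee (%) to ddG (kcal/mol); undefined for |ee| >= 100 %."""
    return 2.0 * R_KCAL * temp_k * math.atanh(ee_percent / 100.0)


def ee_to_er(ee_percent):
    """Enantiomeric ratio (major over minor) from ee (%)."""
    return (100.0 + ee_percent) / (100.0 - ee_percent)


# Computed ddG of the CBS example at 25 degrees C
print(round(ddg_to_ee(3.1), 1))  # 98.9
```

The same functions convert in the opposite direction, e.g. `ee_to_ddg(97.0)` for the experimentally observed selectivity.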

Utilizing ΔΔG^≠ values rather than ee values in molecular modeling can further be rationalized based on Linear Free Energy Relationships (LFERs)^17-21 and the Bell−Evans−Polanyi principle,^{22, 23} as these concepts provide an understanding of the thermodynamic and kinetic factors influencing enantioselective reactions. The Bell−Evans−Polanyi principle asserts that the difference in activation energies between two reactions may be proportional to the difference in their reaction enthalpies.²⁴ In the context of enantioselective processes, this principle implies that the difference in activation energies between the formation of two enantiomers (represented by ΔΔG^≠) is related to the difference in their reaction energies (to the catalyst-bound product as the individual enantiomers have identical free energies). By incorporating ΔΔG^≠ values into molecular modeling, a reasonably complete representation of thermodynamic and kinetic aspects can be achieved. Additionally, LFERs posit that the free reaction energy may correlate linearly with specific attributes of the reactants, be they steric or electronic. Historically, these steric and electronic factors were treated as separate entities in chemical interactions. However, contemporary understanding has advanced to integrate these two factors, leading to the emergence of the stereo-electronic effects concept.²⁵ This convergence represents a significant advance in our current understanding and analysis of chemical reactions. It also is physically more sound as all such effects originate solely from electron distributions.

This concludes our outline of the relationship between ee and ΔΔG^≠ with a proposal to use ΔΔG^≠ as the target variable for modeling, as it is advantageous for several reasons:

Physicality: ΔΔG^≠ possesses direct physical significance as it is the resulting quantity behind observed selectivities (see above), making it a pertinent choice for understanding the underlying processes and mechanisms (which would ideally be accompanied by transition state computations), while ee is more of a historic quantity.
Temperature: ΔΔG^≠ incorporates temperature as a variable, which has proven to be an essential factor affecting selectivity in most catalyzed enantioselective reactions.^{26, 27} Only by using ΔΔG^≠, the model can routinely capture temperature effects on the selectivity according to the underlying physical effects. We hypothesize that this results in more accurate predictions and comparabilities of stereoselectivities. Conversely, ee does not include its inherent temperature dependence, which may lead to less accurate models, because the temperature has to be provided as an extra input feature and additional context has to be learned by the model.
Limits: A key consideration when selecting between ee and ΔΔG^≠ as target variables is their respective value ranges. While ee is by definition constrained to the limits −100 to +100 (for denoting the excess of R and S enantiomers), this restriction does not apply to ΔΔG^≠, as shown in Figure 1. Although a model can in principle cover the entire range for both variables, predictions for ee from numerical models might extend beyond its meaningful range (if not explicitly constrained). Consequently, these predictions would have no real-world significance, as values below −100 or above +100 are not physically possible. In contrast, the absence of such limitations for ΔΔG^≠ allows the model to make predictions across a much wider range of values, enhancing generalizability and ensuring that predictions maintain relevance in real-world scenarios.

Figure 1

Illustration of the non-linear relationship between ee and the change in ΔΔG^≠ at three temperatures. The green dashed lines represent the fundamental limits for ee, indicating that a model based on ee may yield implausible predictions if they exceed 100 %. In contrast, such restrictions are absent when utilizing ΔΔG^≠ as the modeling target. The marked part of the Figure emphasizes the approximately linear region at low selectivities (<50 % ee). This highlights that the non-linearity between ee and ΔΔG^≠ is especially noticeable at higher selectivities, which are more desirable.

Distribution: As a consequence of the non-linear relationship between ee and ΔΔG^≠, the distribution of a data set changes, meaning that the way data points are spread or clustered can vary when converting between these domains (as illustrated in Figure 2), typically resulting in more normally distributed data when expressed as ΔΔG^≠. Unbalanced data sets can lead to clustering effects and misleading patterns. For instance, the common goodness of fit metric R² (coefficient of determination) might be artificially high in strongly clustered data. A classic example of how fit metrics can be misleading for different data distributions is Anscombe's quartet.²⁸ Furthermore, metrics such as R² and RMSE are used to evaluate the quality of a model fit and the ability of a model to make predictions on unseen data, which we scaled by the all-mean prediction (the “null-model”). Both scores are affected by the transformation between ee and ΔΔG^≠ (see below). In practical terms, minor errors in the 95–99 % ee range are critical, while large errors in the 1–50 % ee range are less relevant when evaluating enantioselectivity. Data clustering can also result in non-representative training-testing splits in small data sets, as often encountered in enantioselective catalysis, unfavorably affecting the generality of test set metrics. Notably, the non-linearity of the ee-to-ΔΔG^≠ transformation only becomes pronounced at higher selectivities (Figure 1, bottom), i.e., for data < 50 % ee the transformation makes little difference to the distribution.

Figure 2

Histograms showing DS3 (small skew, top) and DS7 (high skew, bottom) in the ee (blue) and ΔΔG^≠ (pink) domain. The purple region indicates where both (pink and blue) distributions overlap. Note that a smaller skew is observed when looking at narrower selectivity ranges (top), compared to a wide-ranging dataset (bottom), where the skew gets larger due to the non-linear transformation.
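The temperature and range arguments listed above can be made concrete with a short sketch. It assumes the transition-state-theory relation ee = tanh(ΔΔG^≠/2RT) and a temperature-independent ΔΔG^≠; the value of 1.5 kcal mol⁻¹ and the three temperatures are hypothetical and chosen only for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_to_ee(ddg_kcal, temp_k):
    """ee (%) from ddG (kcal/mol) at a given temperature (K)."""
    return 100.0 * math.tanh(ddg_kcal / (2.0 * R_KCAL * temp_k))


ddg = 1.5  # hypothetical ddG in kcal/mol, assumed temperature independent
temps_c = (-78.0, 25.0, 80.0)  # hypothetical reaction temperatures

# The same free energy difference corresponds to a different ee at each temperature.
ee_by_temp = {t: ddg_to_ee(ddg, t + 273.15) for t in temps_c}
for t, ee in ee_by_temp.items():
    print(f"{t:6.1f} degC -> {ee:5.1f} % ee")

# Any real ddG maps into the physical ee range, so predictions made in the
# ddG domain can never leave the -100 to +100 % window after back-conversion.
for value in (-50.0, -3.0, 0.0, 3.0, 50.0):
    assert -100.0 <= ddg_to_ee(value, 298.15) <= 100.0
```

A model trained on ee would instead need temperature as an explicit input feature to reproduce this behavior.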

In the following sections, we demonstrate the effect of each of these arguments using real-world data from the literature as well as from artificial data sets. While target transformations can have an important effect on model performance, we stress that different choices may be right depending on the context.^29-32 For example, one upside of using ee is that it is intuitive to experimentalists, facilitating communication of model outcomes. However, this can also be achieved by modeling on ΔΔG^≠ and then transforming predictions and metrics to the ee domain for more intuitive representation.³³

In addressing the nuances of chemical data analysis, particularly the decision to use ee vs. ΔΔG^≠ for modeling enantioselectivity, it is important to highlight the practical reasons for choosing ΔΔG^≠. For chemists without extensive statistical training, ΔΔG^≠ offers a straightforward and interpretable alternative. Unlike other transformations (e.g., log, Box-Cox),⁶ which require additional steps and expertise, ΔΔG^≠ directly correlates with the underlying physical processes, making it more intuitive and easier to apply. These transformations are pivotal in linear regression models, where they serve to homogenize variance, thereby enhancing the reliability of statistical inferences. Such practices are not unique to chemistry but are extensively applied in various domains of data analysis. For instance, Box and Cox′s transformation is a well-known method for stabilizing variance in data that follow a non-normal distribution.⁶ Similarly, in bioinformatics, variance stabilization is crucial for analyzing microarray data, a topic explored by Huber and co-workers.³⁴ By integrating these foundational concepts, our discussion on ee vs. ΔΔG^≠ not only aligns with chemical specificity but also resonates with established statistical methods, thereby enriching our contextual framework within statistical modeling practices.

2 Benchmarking Methods

We collected data from the literature to investigate various aspects of the target transformation on real-world results in enantioselective catalysis that have been used for data-driven modeling. Our focus is on publications that utilize molecular descriptor-based models to compare the influence of featurization. Furthermore, we aimed for a representative selection with different data structures (i.e., combinatorial screening or traditional linear optimization) across a range of data set sizes commonly encountered in modeling enantioselectivity (about 20–1000 data points) and featuring examples from various types of catalytic reactions (Table 1).

Table 1. Overview of data sets utilized in this work. DAP=doubly-axially chiral phosphoric acid, IDPi=imidodiphosphorimidate, TDG=transient directing groups, PyrOx=pyridine oxazoline, and CPA=chiral phosphoric acid.

Dataset	Research groups	Reaction	Samples	Available features
DS1⁴⁷	Sigman, Biscoe	Pd-phosphine catalyzed Alkyl-Suzuki coupling	24	Descriptors
DS2⁴⁸	Doyle	Ni-BiOx/BiIm, photo-catalyzed Cross-electrophile coupling	29	Descriptors
DS3⁴⁹	Sigman, Toste	DAP-catalyzed allenoate-Claisen rearrangement	37	Descriptors
DS4⁵⁰	Tsuji, Sidorov, Varnek, List	IDPi-catalyzed Hydroalkoxylation	80	SMILES, Descriptors
DS5⁵¹	Ackermann, Hong	Pd+TDG, electrocatalyzed oxidative Heck reaction	127	SMILES, Descriptors
DS6⁵²	Toste, Sigman	Triazole-PA-catalyzed cross-dehydrogenative coupling	159	SMILES, Descriptors
DS7⁵³	Sunoj	Asymmetric hydrogenation	371	SMILES, Descriptors
DS8⁵⁴	Sunoj	Pd-PyrOx-catalyzed relay-Heck reaction	398	Descriptors
DS9⁵⁵	Belyk, Sherer	Phase transfer-catalyzed aza-Michael addition	471	SMILES, Descriptors
DS10⁵⁶	Denmark	CPA-catalyzed nucleophilic thiol addition	1075	SMILES

We employed three general classes of ML approaches to investigate the impact of target transformation on model performance:

1) For descriptor-based models, molecular features were utilized unchanged where available in the original publications. In data sets DS1–DS9, various steric and electronic descriptors were available from DFT computations. In DS9 and DS4-THF, 2D topological descriptors were used.

2) Morgan fingerprints with a radius of 2—meaning they capture the chemical environment up to two bonds away for each atom—were generated using the RDKit from the SMILES representations and folded to 1024 dimensions for fingerprint-based models.³⁵ In data sets with several variable molecules (e.g., substrates and catalysts) present, the resulting fingerprints were added. Models for representations 1) and 2) were trained with various standard machine learning regressors as implemented in Scikit-learn³⁶ on 50 different random 80 : 20 train:test splits of the data to obtain consistent model scores. Hyperparameter optimization was performed once for each model-feature set combination using repeated k-fold (four folds, five repetitions) cross-validation in the training set on the first 80 : 20 random split. The following linear regressors were used: Linear regression, ridge, Lasso, LassoLars, elastic net. The following non-linear regressors were used: random forest, gradient boosting, extra trees, kernel ridge, Gaussian processes, and k-nearest neighbors.^37-46

3) Graph-based models were trained with molecular graphs that were generated from the provided SMILES representations, which were converted into molecular objects using the RDKit and subsequently constructed into molecular graphs with standard node and edge features (refer to Supporting Information for a comprehensive list). In instances with n SMILES, we generated a graph comprising n distinct, non-interconnected subgraphs embedded within the overall molecular graph. These graphs were then processed using a Graph Neural Network (GNN) to produce a molecular embedding after a global pooling operation, which was fed into a feed-forward neural network (FFNN) (code is available free of charge, see below). For GNN and FFNN, we used PyTorch Geometric and PyTorch.^{57, 58} We trained the model both with and without temperature as an extra input feature in the FFNN to account for the potential effects of temperature dependence. We employed a Bayesian optimization approach via Optuna to optimize the hyperparameters of the GNN model.⁵⁹ As an optimizer we used Adam with the Mean Squared Error (MSE) as a training metric. To avoid overfitting, we employed the early stopping technique. While cross-validation is a widely used method for model evaluation, it may not be the most suitable choice for small datasets and GNNs due to the constraints posed by limited data, high computational cost, and potential bias in the molecular structure encoded in the molecular graph. Instead, assessing the performance using multiple random states can provide a more reliable estimate of the model's generalization ability while overcoming these limitations. This approach involves randomly splitting the data into training and test sets multiple times (here, 500 times) and calculating the performance metrics for each split. The average performance across all random states provides a more robust estimate of the model's generalization capability while mitigating the challenges associated with cross-validation.
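The repeated random-split evaluation described above can be sketched as follows. This is an illustrative standard-library version, not the code used in this work: `repeated_split_mae` and `mean_baseline` are hypothetical names, and the all-mean predictor merely stands in for any regressor with a fit-and-predict interface.

```python
import random
import statistics


def repeated_split_mae(x, y, fit_predict, n_splits=50, test_fraction=0.2, seed=0):
    """Mean and spread of the test-set MAE over repeated random splits."""
    rng = random.Random(seed)  # fixed seed keeps the splits reproducible
    indices = list(range(len(y)))
    n_test = max(1, round(test_fraction * len(y)))
    maes = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        preds = fit_predict(
            [x[i] for i in train_idx],
            [y[i] for i in train_idx],
            [x[i] for i in test_idx],
        )
        errors = [abs(p - y[i]) for p, i in zip(preds, test_idx)]
        maes.append(statistics.fmean(errors))
    return statistics.fmean(maes), statistics.pstdev(maes)


def mean_baseline(x_train, y_train, x_test):
    """All-mean 'null model': predicts the training mean for every test point."""
    mean = statistics.fmean(y_train)
    return [mean for _ in x_test]
```

Calling `repeated_split_mae(x, y, mean_baseline)` returns the average MAE and its standard deviation over the splits; replacing `mean_baseline` with a wrapper around a trained regressor gives the corresponding model score.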

We divided the model performance evaluation into four categories: descriptor-based (separated into linear and non-linear models), fingerprint-based, and GNN. We computed the mean absolute error (MAE) on each dataset for each model and variation (including or not including temperature). To fairly scrutinize the disparity in performance between ΔΔG^≠ and ee modeling, we transformed predictions from both into the er domain, which is mathematically possible without applying a capping threshold that could have led to potential distortions. To obtain comparable values across the data sets, we normalized the MAE between 0 and 1 by the MAE of an all-mean prediction.
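A sketch of this scoring scheme is given below. It rests on two assumptions that the text does not spell out: the er is expressed as the percentage of the major enantiomer, (100 + ee)/2, which stays defined even for ee predictions above 100 %, and the null model predicts the mean of the training targets in that domain. All function names are hypothetical.

```python
import math
import statistics

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ee_to_major_percent(ee_percent):
    """er expressed as % of the major enantiomer (assumed convention)."""
    return (100.0 + ee_percent) / 2.0


def ddg_to_major_percent(ddg_kcal, temp_k=298.15):
    """Same quantity from ddG (kcal/mol), assuming er = exp(ddG / RT)."""
    return 100.0 / (1.0 + math.exp(-ddg_kcal / (R_KCAL * temp_k)))


def relative_mae(y_true, y_pred, y_train):
    """MAE of the model divided by the MAE of an all-mean prediction."""
    baseline = statistics.fmean(y_train)
    mae_model = statistics.fmean([abs(p - t) for p, t in zip(y_pred, y_true)])
    mae_null = statistics.fmean([abs(baseline - t) for t in y_true])
    return mae_model / mae_null
```

With this normalization, a value of 0 corresponds to a perfect prediction and a value of 1 to a model that is no better than predicting the mean.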

3 Results

The transformation between ee and ΔΔG^≠ is non-linear and thus affects the distribution of data sets. We investigated the effect of this transformation on the data structure using the skew score and the Kolmogorov–Smirnov (KS) test. The skew score quantifies the skewness, that is, the degree of asymmetry in a distribution with a positive value indicating a right-skewed distribution, i.e., leaning towards lower values. A skewness of 0 indicates that the data are perfectly symmetric, and deviations from 0 indicate asymmetry in either direction of the tested datasets. The Kolmogorov–Smirnov (KS) score indicates the difference between two distributions, with 0 indicating identical distributions and 1 indicating entirely different distributions. Here, we test the experimental selectivities against a normal distribution, i.e., lower scores indicate distributions closer to normal. It should be noted that data from traditional “linear” reaction optimization and reaction scope tables tend to be biased towards higher selectivities because experimentalists focus on more selective reactions when pursuing a new method. Screening methods such as combinatorial evaluation of all substrate/catalyst pairs tend to generate data with a relatively higher portion of less selective or lower-yielding results, which is favorable for ML modeling.^{60, 61} Accordingly, nearly all data sets display a negative skew (towards higher values) when expressed as ee. Averaged over all ten sources, the absolute skew score decreases from 1.2 to 0.35 by transforming from ee to ΔΔG^≠, indicating a clear overall shift towards more symmetric data distributions in the free energy domain. Likewise, the average KS score decreases from 17 % to 11 % upon transformation of ee to ΔΔG^≠, indicating distributions that are closer to normal. The differences are most pronounced in more extensive data sets (>100 samples) of traditional “linear” optimization and scope results (DS7, DS8). In smaller data sets (<100 samples), the differences are less pronounced but also less significant due to the low sample size. Denmark's CPA catalysis data set (DS10) is the biggest screening-type data set, containing the complete combinatorial evaluation of substrate/catalyst pairs. Interestingly, while the absolute skewness increases and switches from negative to positive by transformation to ΔΔG^≠, the similarity to a normal distribution is still higher in the free energy domain (KS score of 17 % in ee vs. 8 % in ΔΔG^≠). Differences in data distribution are the smallest for data sets with lower overall selectivity (DS3) due to the approximate linearity between ee and ΔΔG^≠ in that range (Figure 1, bottom). A summary of all distribution metrics is shown in Table 2.
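Both distribution metrics can be computed with the Python standard library, as sketched below. The exact estimators behind the scores in this work are not specified here, so the choices in the sketch are assumptions: the skew is the Fisher-Pearson coefficient from population moments, and the KS statistic compares the sample with a normal distribution fitted to its own mean and standard deviation. The ee values are hypothetical.

```python
import math
import statistics

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 from population moments."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def ks_vs_normal(values):
    """KS statistic between the sample and a normal fitted to the sample."""
    normal = statistics.NormalDist(statistics.fmean(values), statistics.pstdev(values))
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, v in enumerate(ordered, start=1):
        cdf = normal.cdf(v)
        # largest gap between the empirical and the fitted normal CDF
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical selectivities biased towards high ee, as in scope tables
ee_values = [99.0, 98.0, 97.0, 96.0, 95.0, 92.0, 90.0, 85.0, 75.0, 50.0]
ddg_values = [2.0 * R_KCAL * 298.15 * math.atanh(ee / 100.0) for ee in ee_values]

skew_ee, skew_ddg = skewness(ee_values), skewness(ddg_values)
ks_ee, ks_ddg = ks_vs_normal(ee_values), ks_vs_normal(ddg_values)
```

For this toy sample the strongly negative skew in the ee domain shrinks considerably after conversion to ΔΔG^≠, mirroring the trend described above.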

Table 2. Summary of skewing and KS-test scores in the ee and the ΔΔG^≠ domain. N=Number of samples, AVG=Average over all datasets.

Dataset	skew (ee)	KS (ee)	skew (ΔΔG^≠)	KS (ΔΔG^≠)	|skew (ΔΔG^≠)| − |skew (ee)|	N
DS1	−0.81	0.22	0.29	0.16	−0.52	24
DS2	0.03	0.19	0.31	0.22	0.28	29
DS3	−0.15	0.11	0.40	0.11	0.25	37
DS4	0.18	0.12	0.78	0.12	0.60	80
DS5	−1.49	0.29	−0.54	0.17	−0.95	127
DS6	−0.46	0.08	0.39	0.07	−0.07	159
DS7	−3.24	0.23	−0.04	0.05	−3.20	371
DS8	−3.54	0.22	−0.39	0.09	−3.15	398
DS9	−1.05	0.14	−0.60	0.09	−0.45	471
DS10	−0.54	0.17	0.86	0.08	0.32	1075
AVG	−1.10	0.18	0.15	0.12	−0.69	277

A visual representation of the skew can be seen in Figure 2, in which a small skew (top, DS3) and a large skew (bottom, DS7) are shown between the ee and ΔΔG^≠ domains. In the case of a small skew, the distributions look similar and evenly-distributed, while the large skew shows a normally-distributed dataset in ΔΔG^≠ while being highly unbalanced in ee, as apparent from the large distribution density in the high ee range close to the maximum.

The detrimental effects of an unbalanced data set become clearly evident when examining a data set like DS7 (Figure 3). We simulate a model with an MAE of 5 % in the ee domain by adding random noise of that magnitude to the data (depicted in blue). Superficially, this score seems commendable, with a corresponding R²-score of 0.92. To the end-user, this might appear as a robust, “reliable” model. However, this is no longer the case when we transition to the more physically meaningful ΔΔG^≠ domain where the “model's” inadequacies become evident: The scatter plot reveals a far less convincing fit, with an R²-score of only 0.49 (shown in magenta). In stark contrast, the opposite approach, constructing an artificial model with 5 % noise on the ΔΔG^≠ values (R²-score of 0.98), translates seamlessly to the ee domain (R²-score of 0.99). This underscores the dangers of relying on seemingly “good” models without thoroughly probing their applicability across different domains. This is particularly relevant in the high ee regime where the largest errors occur, but which also is the most sought-after range for practical applications. We want to emphasize that any 5 % variance in % ee at the high end of the selectivity range would result in a significant impact in the ΔΔG^≠ domain. This highlights the risk of trusting a model trained on ee with a relatively low MAE. Depending on where the model's main error lies, a large error in ΔΔG^≠ at the high selectivity range cannot be prevented.

Figure 4 underscores the significance of maintaining predictions within the physically meaningful domain ΔΔG^≠ also from a practical perspective because the impact of an error in ee hinges on the specific selectivity range where the error transpires. An artificial data set is shown, and four specific “prediction” errors are highlighted (red bars): two with 0.2 kcal mol⁻¹ ΔΔG^≠ and two with 10 % ee, one each at a lower and a higher selectivity range. For constant ΔΔG^≠ errors, this translates to either an error of 16.1 % ee at low selectivity (<40 % ee) or 0.6 % ee at high selectivity (>98 % ee), thus resembling acceptable uncertainties from an experimental point of view, that change accordingly depending on the selectivity. Conversely, the constant ee error may correspond to 0.2 or 1.3 kcal mol⁻¹, i.e., a high error where higher precision would be needed and a low error where precision is less important.
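The asymmetry of this error propagation can be reproduced with a short calculation. The sketch assumes ee = tanh(ΔΔG^≠/2RT) at 25 °C; the selectivity values at which the errors are evaluated are hypothetical and are not the points highlighted in Figure 4, so the numbers differ slightly from those quoted above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
TWO_RT = 2.0 * R_KCAL * 298.15  # kcal/mol at an assumed temperature of 25 degC


def ddg_to_ee(ddg_kcal):
    """ee (%) from ddG (kcal/mol) at 25 degC."""
    return 100.0 * math.tanh(ddg_kcal / TWO_RT)


def ee_to_ddg(ee_percent):
    """ddG (kcal/mol) from ee (%) at 25 degC."""
    return TWO_RT * math.atanh(ee_percent / 100.0)


# A constant error of 0.2 kcal/mol, expressed in % ee
ee_error_low = ddg_to_ee(0.4) - ddg_to_ee(0.2)   # low selectivity
ee_error_high = ddg_to_ee(3.2) - ddg_to_ee(3.0)  # high selectivity

# A constant error of 10 % ee, expressed in kcal/mol
ddg_error_low = ee_to_ddg(30.0) - ee_to_ddg(20.0)   # low selectivity
ddg_error_high = ee_to_ddg(99.0) - ee_to_ddg(89.0)  # high selectivity

print(f"0.2 kcal/mol -> {ee_error_low:.1f} % ee (low), {ee_error_high:.1f} % ee (high)")
print(f"10 % ee -> {ddg_error_low:.2f} (low), {ddg_error_high:.2f} (high) kcal/mol")
```

A fixed error in ΔΔG^≠ therefore shrinks in the ee domain where precision matters most, whereas a fixed error in ee grows in the ΔΔG^≠ domain in exactly that range.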

Concerning the value limits, model predictions exceeding 100 % ee account for about 4 % of all predictions performed for the models in this work. The frequency of such predictions largely depends on the number of entries with very high ee values in the training data. The conversion to ΔΔG^≠ becomes arbitrarily large for values approaching 100 % and is undefined for ee values of 100 % and above. Thus, predictions in ee must be capped at a certain threshold before conversion (e.g., 99 % or 99.9 %) and selecting an appropriate capping threshold can significantly impact the model's metrics, potentially leading to misconceptions about its true performance. To investigate this effect, we looked at the model predictions for DS4 and varied the capping thresholds while calculating ΔΔG^≠ derived from the predicted ee, and subsequently computed MAE in kcal mol⁻¹. The results indicate an exponential growth in error with respect to the chosen threshold (Figure 5), illustrating the considerable influence of this seemingly innocuous parameter on the overall model evaluation. We suggest a capping threshold of 99.9 % when calculating ee back to ΔΔG^≠, based on the elbow method.⁶² The chosen capping threshold can also be justified by typical experimental error ranges of enantiomeric ratios.
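The influence of the capping threshold can be illustrated with a minimal sketch. The true and predicted ee values are hypothetical, the conversion assumes ee = tanh(ΔΔG^≠/2RT) at 25 °C, and capping the true values as well as the predictions is our own choice for the illustration.

```python
import math
import statistics

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def capped_ee_to_ddg(ee_percent, cap=99.9, temp_k=298.15):
    """Clip ee to [-cap, +cap] (in %) before converting to ddG (kcal/mol)."""
    clipped = max(-cap, min(cap, ee_percent))
    return 2.0 * R_KCAL * temp_k * math.atanh(clipped / 100.0)


def mae_in_ddg(ee_true, ee_pred, cap):
    """MAE in kcal/mol after converting capped ee values to ddG."""
    errors = [
        abs(capped_ee_to_ddg(p, cap) - capped_ee_to_ddg(t, cap))
        for p, t in zip(ee_pred, ee_true)
    ]
    return statistics.fmean(errors)


# Hypothetical data; the first prediction exceeds the physical limit of 100 % ee
ee_true = [98.0, 95.0, 90.0, 60.0]
ee_pred = [103.0, 99.5, 88.0, 65.0]

mae_by_cap = {cap: mae_in_ddg(ee_true, ee_pred, cap) for cap in (99.0, 99.9, 99.99)}
for cap, mae in mae_by_cap.items():
    print(f"cap = {cap} % ee -> MAE = {mae:.2f} kcal/mol")
```

The single unphysical prediction dominates the error more strongly the closer the cap is moved towards 100 %, which is why the threshold has to be reported alongside any metric converted from ee.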

We evaluate the impact of target transformation on general model performance by comparing the best of several standard ML regressors in four different categories, each modeled separately using either ee or ΔΔG^≠ as targets for each of 11 data sets. The goal was not to achieve the best possible models or to compare models with the results from the respective original publications, but to have a consistent approach that allows a fair comparison among the data sets and target transformations. Thus, we do not discuss absolute model metrics such as MAE or R², but rather relative MAEs after scaling all predictions by the mean baseline after transforming predictions to er (lower is better).

The performance evaluations shown in Figure 6 suggest that the discrepancy between modeling in the ee and ΔΔG^≠ domain is generally small but not negligible. For nearly all sections (model type and dataset), modeling ΔΔG^≠ appears to be superior to modeling ee; in the few cases where ee was superior to ΔΔG^≠, the differences were only marginal. For some datasets, however, the differences between both domains were very prominent, e.g., DS1, DS5, and DS10. Overall, this renders ΔΔG^≠ the better choice of modeling domain compared to ee.

Notably, the improvement of modeling on ΔΔG^≠ was more pronounced for linear, descriptor-based models (24 % improved relative MAE on average) than for non-linear or fingerprint-based models (6 and 10 % improved relative MAE, respectively), which is in line with an interpretation of linear descriptor-based models as an evolution of LFER, further suggesting that the transformation to ΔΔG^≠ does lead to improved models. Note that similar improvements compared to ee could perhaps be achieved with other target transformations (see above); however, when working with small datasets, optimizing the target transformation could lead to overfitted models, as the target transformation has to be considered another hyperparameter of the model. Especially in real-world modeling, where practical chemists create or use the output of trained models, it would be wise to use a target transformation that is familiar to practical chemists and makes decision support via modeling easier to convey. Conversely, non-linear models may be able to learn the effect of the non-linear nature of ee given a physically-based molecular representation. Despite this, GNN models also showed an average relative MAE improvement of 20 % when using ΔΔG^≠ as targets.

As Figure 7 indicates, incorporating temperature as an additional modeling feature is generally not of major importance: the differences upon excluding temperature are often subtle and not statistically relevant. However, for GNNs, using temperature as an extra input feature seems superior in every case, and for DS5 a clear preference for including the temperature is observed. This is bolstered by our quantitative evaluation, which indicates that excluding temperature from modeling is favorable for FP-based models by 0.5 %. Descriptor-based models are not influenced by the inclusion or exclusion of temperature as an additional modeling parameter, with an average difference of 0 %. For GNN-based models, including temperature is beneficial by 5.9 %.
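To illustrate why temperature enters the picture at all, the following sketch converts between ee and ΔΔG^≠ at two temperatures. It assumes ee as a fraction and the transition-state-theory relation ΔΔG^≠ = RT ln((1 + ee)/(1 − ee)), which is equivalent to ee = tanh(ΔΔG^≠/(2RT)); the function names and the two example temperatures are chosen purely for illustration.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def ddg_to_ee(ddg, temperature):
    """ee (fraction) from ddG (kJ/mol) at temperature (K): ee = tanh(ddG / (2RT))."""
    return math.tanh(ddg / (2.0 * R * temperature))


def ee_to_ddg(ee, temperature):
    """ddG (kJ/mol) from ee (fraction): ddG = RT ln((1 + ee) / (1 - ee))."""
    return R * temperature * math.log((1.0 + ee) / (1.0 - ee))


# The same measured ee corresponds to a smaller ddG at lower temperature.
for temperature in (195.15, 298.15):  # -78 degC and 25 degC
    print(temperature, round(ee_to_ddg(0.90, temperature), 2))
```

The temperature is thus already contained in the ΔΔG^≠ target itself, which may be one reason why supplying it again as an input feature changes the results only slightly for most model types.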

4 Conclusions

In organic chemistry, particularly in asymmetric organocatalysis, we face a reporting bias (publication bias), which refers to the selective publication and reporting of results that exhibit high yields, enantioselectivities, and reaction rates. This bias may arise from various factors, such as the desire to present new, impactful findings or the pressure to publish positive results to secure funding and advance careers. Consequently, results perceived as less favorable or less exciting may be reported less often in full or possibly not at all, leading to a skewed understanding of catalysts' true scope and limitations in asymmetric reactions. Such a biased representation of results in the literature can hinder the development of a comprehensive understanding of catalysts, mechanisms, and their limitations, especially when using ML, and can lead to a distorted view and biased models. The lack of transparency and the failure to report negative or less favorable results can slow scientific progress, as such data are valuable for model training. In data modeling, this reporting bias leads to data markedly skewed toward higher selectivities (or yields). This is unfortunate because an unbiased distribution would be expected to be skewed toward lower selectivities, as experience shows that achieving high selectivity is difficult and most “random” combinations of catalyst and substrate will result in no or low selectivity. However, for a model to truly learn the structure-selectivity relationships of a reaction, the reasons why specific catalysts result in low selectivity are as important as those leading to high selectivity.⁶³ Especially when thinking about decision support, the statistical tool at hand should provide guidelines for the practical chemist. As shown in Figure 2, the resulting unbalanced data sets can be compensated for when staying in the ΔΔG^≠ domain.

To truly identify a model of potential use, the domain has to be considered, as models resulting in a good fit in the ee domain do not necessarily remain useful after a non-linear transformation to, e.g., the ΔΔG^≠ domain. Since the error transformation between the two domains is non-linear, comparing models from the ee domain with those from a non-linearly transformed domain, i.e., ΔΔG^≠, becomes challenging. It is thus advisable to model directly in the physically grounded ΔΔG^≠ domain, which aligns with its overall superior modeling performance. Using one domain and not relying on transformations between domains also eliminates the need for a cutoff threshold, which we discuss in Figure 5. This simplification is particularly beneficial for chemists without extensive statistical training, making the modeling process more accessible and the results more interpretable. While the impact of including temperature as an additional input feature is subtle, we still advocate for its incorporation to enhance model accuracy.
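The non-linear error propagation between the two domains can be made explicit with a first-order estimate. The sketch below assumes ee expressed as a fraction and the relation ΔΔG^≠ = RT ln((1 + ee)/(1 − ee)); the symbol δ for a small error is introduced here for illustration only.

```latex
\Delta\Delta G^{\neq} = RT \ln\frac{1 + ee}{1 - ee}
\quad\Longrightarrow\quad
\frac{\partial\,\Delta\Delta G^{\neq}}{\partial\, ee} = \frac{2RT}{1 - ee^{2}}
\quad\Longrightarrow\quad
\delta\!\left(\Delta\Delta G^{\neq}\right) \approx \frac{2RT}{1 - ee^{2}}\,\delta ee
```

Under these assumptions, the same error of δee = 0.01 at 298.15 K corresponds to about 0.05 kJ mol⁻¹ near ee = 0 but to about 1.25 kJ mol⁻¹ at ee = 0.98, roughly 25 times larger. The factor diverges as ee approaches 1, which is presumably why a cutoff threshold is needed whenever values at or near 100 % ee are transformed between domains.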

The process of target transformations, as discussed in this work, is crucial not only in asymmetric catalysis but also in kinetics, specifically in reaction rates. Recently, Votsmeier et al. used the hyperbolic sine function to model chemical kinetics through transformation.⁶⁴ A compelling extension of our research could involve examining the variations in model explanations such as attention maps, saliency maps, integrated gradients, and layer-wise relevance propagation methods, all contingent on the specific modeling domain, further improving trust in the reasoning of statistical tools and hence increasing their impact through increased usage in the laboratory. We hypothesize that a model trained with physically sound principles would yield more reliable and practically beneficial explanations, which could, for example, be leveraged for catalyst optimization, and would reduce the chance of an overfitted model by rendering the optimization of the target transformation obsolete.

Disclaimer

The opinions expressed in this publication are the view of the author(s) and do not necessarily reflect the opinions or views of Angewandte Chemie International Edition/Angewandte Chemie, the Publisher, the GDCh, or the affiliated editors.

Acknowledgments

This work was supported by the Deutsche Forschungsgemeinschaft within the priority program “Utilization and Development of Machine Learning for Molecular Applications – Molecular Machine Learning” (SPP 2363, Schr 597/41-1 and GE 3064/2-1). M.R. thanks the Fonds der Chemischen Industrie for a doctoral scholarship. We thank Dennis Gerbig (JLU Giessen) for fruitful discussions. Open Access funding enabled and organized by Projekt DEAL.

Conflict of Interests

The authors declare no conflict of interest.

Open Research

Data Availability Statement

The authors have cited additional references within the Supporting Information.³⁰,³¹ A more detailed description of the datasets, modeling details, and selected plots for specific datasets are included. The data and code are available free of charge at https://github.com/prs-group/Contrasting-Historical-and-Physical-Perspectives-in-Asymmetric-Catalysis.git.

Supporting Information

References

1 W. L. Williams, L. Zeng, T. Gensch, M. S. Sigman, A. G. Doyle, E. V. Anslyn, ACS Cent. Sci. 2021, 7, 1622–1637. DOI: 10.1021/acscentsci.1c00535
2 E. O. Pyzer-Knapp, T. Laino, in Machine Learning in Chemistry: Data-Driven Algorithms, Learning Systems, and Predictions, Vol. 1326, American Chemical Society, Washington, DC, 2019, pp. ix–x.
3 N. Artrith, K. T. Butler, F.-X. Coudert, S. Han, O. Isayev, A. Jain, A. Walsh, Nat. Chem. 2021, 13, 505–508. DOI: 10.1038/s41557-021-00716-z
4 A. Bender, N. Schneider, M. Segler, W. Patrick Walters, O. Engkvist, T. Rodrigues, Nat. Chem. Rev. 2022, 6, 428–442. DOI: 10.1038/s41570-022-00391-9
5 M. P. Maloney, C. W. Coley, S. Genheden, N. Carson, P. Helquist, P.-O. Norrby, O. Wiest, Org. Lett. 2023, 25, 2945–2947. DOI: 10.1021/acs.orglett.3c01282
6 G. E. P. Box, D. R. Cox, J. Roy. Stat. Soc. Ser. B. (Stat. Method.) 1964, 26, 211–252. DOI: 10.1111/j.2517-6161.1964.tb00553.x
7 R. E. Gawley, J. Org. Chem. 2006, 71, 2411–2416. DOI: 10.1021/jo052554w
8 G. P. Moss, Pure Appl. Chem. 1996, 68, 2193–2222. DOI: 10.1351/pac199668122193
9 M. Wernerova, T. Hudlicky, Synlett 2010, 2010, 2701–2707. DOI: 10.1055/s-0030-1259018
10 E. J. Corey, R. K. Bakshi, S. Shibata, C. P. Chen, V. K. Singh, J. Am. Chem. Soc. 1987, 109, 7925–7926. DOI: 10.1021/ja00259a075
11 C. Eschmann, L. Song, P. R. Schreiner, Angew. Chem. Int. Ed. 2021, 60, 4823–4832. DOI: 10.1002/anie.202012760
12 S. Grimme, S. Ehrlich, L. Goerigk, J. Comput. Chem. 2011, 32, 1456–1465. DOI: 10.1002/jcc.21759
13 A. D. Becke, J. Chem. Phys. 1993, 98, 1372–1377. DOI: 10.1063/1.464304
14 A. V. Marenich, C. J. Cramer, D. G. Truhlar, J. Phys. Chem. B 2009, 113, 6378–6396. DOI: 10.1021/jp810292n
15 A. D. McLean, G. S. Chandler, J. Chem. Phys. 1980, 72, 5639–5648. DOI: 10.1063/1.438980
16 R. Krishnan, J. S. Binkley, R. Seeger, J. A. Pople, J. Chem. Phys. 1980, 72, 650–654. DOI: 10.1063/1.438955
17 P. R. Wells, Chem. Rev. 1963, 63, 171–219. DOI: 10.1021/cr60222a005
18 C. Hansch, T. Fujita, J. Am. Chem. Soc. 1964, 86, 1616–1626. DOI: 10.1021/ja01062a035
19 C. Hansch, A. Leo, R. W. Taft, Chem. Rev. 1991, 91, 165–195. DOI: 10.1021/cr00002a004
20 L. P. Hammett, J. Am. Chem. Soc. 1937, 59, 96–103. DOI: 10.1021/ja01280a022
21 S. M. Free, J. W. Wilson, J. Med. Chem. 1964, 7, 395–399. DOI: 10.1021/jm00334a001
22 M. G. Evans, M. Polanyi, Trans. Faraday Soc. 1935, 32, 1333–1360. DOI: 10.1039/tf9363201333
23 R. P. Bell, Proc. R. Soc. London Ser. A 1936, 154, 414–429. DOI: 10.1098/rspa.1936.0060
24 P. Muller, Pure Appl. Chem. 1994, 66, 1077–1184. DOI: 10.1351/pac199466051077
25 I. V. Alabugin, Stereoelectronic Effects: A Bridge Between Structure and Reactivity, John Wiley & Sons, Ltd., Chichester, UK, 2016.
26 H. Zhang, K. Shing Chan, J. Chem. Soc. Perkin Trans. 1 1999, 381–382. DOI: 10.1039/a808920e
27 A. Matsumoto, S. Fujiwara, Y. Hiyoshi, K. Zawatzky, A. A. Makarov, C. J. Welch, K. Soai, Org. Biomol. Chem. 2017, 15, 555–558. DOI: 10.1039/C6OB02415G
28 F. J. Anscombe, Am. Stat. 1973, 27, 17–21. DOI: 10.1080/00031305.1973.10478966
29 B. Li, J. Tang, Q. Yang, X. Cui, S. Li, S. Chen, Q. Cao, W. Xue, N. Chen, F. Zhu, Sci. Rep. 2016, 6, 38881. DOI: 10.1038/srep38881
30 H. S. Obaid, S. A. Dheyab, S. S. Sabry, in 2019 9th Annual Information Technology, Electromechanical Engineering and Microelectronics Conference (IEMECON), 2019, pp. 279–283.
31 D. Singh, B. Singh, Appl. Soft Comput. 2020, 97, 105524. DOI: 10.1016/j.asoc.2019.105524
32 L. Huang, J. Qin, Y. Zhou, F. Zhu, L. Liu, L. Shao, ITPAM 2023, 1–20.
33 B. T. Rose, J. C. Timmerman, S. A. Bawel, S. Chin, H. Zhang, S. E. Denmark, J. Am. Chem. Soc. 2022, 144, 22950–22964. DOI: 10.1021/jacs.2c08820
34 W. Huber, A. von Heydebreck, H. Sültmann, A. Poustka, M. Vingron, Bioinformatics 2002, 18 Suppl 1, 96–104.
35 G. Landrum, RDKit: Open-Source Cheminformatics Software. https://www.rdkit.org.
36 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, J. Mach. Learn. Res. 2011, 12, 2825–2830.
37 A. E. Hoerl, R. W. Kennard, Technometrics 1970, 12, 55–67. DOI: 10.1080/00401706.1970.10488634
38 R. Tibshirani, J. Roy. Stat. Soc. Ser. B. (Stat. Method.) 1996, 58, 267–288. DOI: 10.1111/j.2517-6161.1996.tb02080.x
39 B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Ann. Stat. 2004, 32, 407–499.
40 H. Zou, T. Hastie, J. Roy. Stat. Soc. Ser. B. (Stat. Method.) 2005, 67, 301–320. DOI: 10.1111/j.1467-9868.2005.00503.x
41 L. Breiman, Mach. Learn. 2001, 45, 5–32. DOI: 10.1023/A:1010933404324
42 J. H. Friedman, Ann. Stat. 2001, 29, 1189–1232. DOI: 10.1214/aos/1013203451
43 P. Geurts, D. Ernst, L. Wehenkel, Mach. Learn. 2006, 63, 3–42. DOI: 10.1007/s10994-006-6226-1
44 V. Vovk, in Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik (Eds.: B. Schölkopf, Z. Luo, V. Vovk), Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 105–116. DOI: 10.1007/978-3-642-41136-6_11
45 C. Williams, C. Rasmussen, in NeurIPS, 1995.
46 A. Mucherino, P. J. Papajorgji, P. M. Pardalos, in Data Mining in Agriculture (Eds.: A. Mucherino, P. J. Papajorgji, P. M. Pardalos), Springer New York, New York, NY, 2009, pp. 83–106. DOI: 10.1007/978-0-387-88615-2_4
47 S. Zhao, T. Gensch, B. Murray, Z. L. Niemeyer, M. S. Sigman, M. R. Biscoe, Science 2018, 362, 670–674. DOI: 10.1126/science.aat2299
48 S. H. Lau, M. A. Borden, T. J. Steiman, L. S. Wang, M. Parasram, A. G. Doyle, J. Am. Chem. Soc. 2021, 143, 15873–15881. DOI: 10.1021/jacs.1c08105
49 J. Miró, T. Gensch, M. Ellwart, S.-J. Han, H.-H. Lin, M. S. Sigman, F. D. Toste, J. Am. Chem. Soc. 2020, 142, 6390–6399. DOI: 10.1021/jacs.0c01637
50 N. Tsuji, P. Sidorov, C. Zhu, Y. Nagata, T. Gimadiev, A. Varnek, B. List, Angew. Chem. Int. Ed. 2023, 62, e202218659. DOI: 10.1002/anie.202218659
51 L.-C. Xu, J. Frey, X. Hou, S.-Q. Zhang, Y.-Y. Li, J. C. A. Oliveira, S.-W. Li, L. Ackermann, X. Hong, Nat. Synth. 2023, 2, 321–330. DOI: 10.1038/s44160-022-00233-y
52 A. Milo, A. J. Neel, F. D. Toste, M. S. Sigman, Science 2015, 347, 737–743. DOI: 10.1126/science.1261043
53 S. Singh, M. Pareek, A. Changotra, S. Banerjee, B. Bhaskararao, P. Balamurugan, R. B. Sunoj, PNAS 2020, 117, 1339–1345. DOI: 10.1073/pnas.1916392117
54 M. Das, P. Sharma, R. B. Sunoj, J. Chem. Phys. 2022, 156.
55 K. W. Lexa, K. M. Belyk, J. Henle, B. Xiang, R. P. Sheridan, S. E. Denmark, R. T. Ruck, E. C. Sherer, Org. Process Res. Dev. 2022, 26, 670–682. DOI: 10.1021/acs.oprd.1c00155
56 A. F. Zahrt, J. J. Henle, B. T. Rose, Y. Wang, W. T. Darrow, S. E. Denmark, Science 2019, 363, eaau5631. DOI: 10.1126/science.aau5631
57 A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, Adv. Neural Inf. Process. Syst. 2019, 32.
58 M. Fey, J. E. Lenssen, in International Conference on Learning Representations, New Orleans, USA, 2019.
59 T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, Anchorage, AK, USA, 2019, pp. 2623–2631.
60 T. Gensch, S. R. Smith, T. J. Colacot, Y. N. Timsina, G. Xu, B. W. Glasspoole, M. S. Sigman, ACS Catal. 2022, 12, 7773–7780. DOI: 10.1021/acscatal.2c01970
61 J. Schleinitz, M. Langevin, Y. Smail, B. Wehnert, L. Grimaud, R. Vuilleumier, J. Am. Chem. Soc. 2022, 144, 14722–14730. DOI: 10.1021/jacs.2c05302
62 R. L. Thorndike, Psychometrika 1953, 18, 267–276. DOI: 10.1007/BF02289263
63 F. Strieth-Kalthoff, F. Sandfort, M. Kühnemund, F. R. Schäfer, H. Kuchen, F. Glorius, Angew. Chem. Int. Ed. 2022, 61, e202204647. DOI: 10.1002/anie.202204647
64 F. A. Döppel, M. Votsmeier, React. Chem. Eng. 2023, 8, 2620–2631. DOI: 10.1039/D3RE00212H

Contrasting Historical and Physical Perspectives in Asymmetric Catalysis: ΔΔG^≠ versus Enantiomeric Excess. Angewandte Chemie (German version), Volume 136, Issue 48, November 25, 2024, e202410308.
