Volume 46, Issue 1 e27504
SHORT COMMUNICATION
Open Access

Efficient acceleration of the convergence of the minimum free energy path via a path-planning generated initial guess

Yi Sun

Corresponding Author

Yi Sun

Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois, USA

Correspondence

Yi Sun, Department of Chemistry, Chicago Center for Theoretical Chemistry, James Franck Institute, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL, USA.

Email: [email protected]

Search for more papers by this author
First published: 18 September 2024
Citations: 1

Abstract

We demonstrate that combining a shifted clustering algorithm with a fast-marching-based algorithm can generate accurate approximations of the minimum energy path (MEP) given a free energy landscape (FEL). Using this approximation as the initial guess for the MEP, followed by further refinement with the string method (referred to as the fast marching tree (FMT)-string combined approach), significantly reduces the number of iterations required for MEP convergence. This approach saves substantial time compared to using linear interpolation (LI) for the initial guess. Our method offers a viable solution for obtaining an effective initial guess of the MEP when an approximate or converged FEL is available. This work highlights the potential of applying FMT-based approaches to extract the MEP in chemical reactions.

1 INTRODUCTION

Analyzing the most probable reaction pathway from reactants to products is a central task in theoretical chemistry. This involves locating transition states (TS) and calculating barrier heights by determining the energy difference between the TS and the reactants to assess the reaction's feasibility.1-7 However, locating the TS is often more challenging than finding the energy minima of stable reactants and products. A common approach involves performing single-point TS optimizations based on an initial TS guess, followed by intrinsic reaction coordinate (IRC) calculations using the converged structure to determine the transition pathway.8-11 Unfortunately, single-point calculations may fail to converge due to the influence of low-frequency modes on the Hessian matrix's quality.

To address this issue, collective variables (CVs) and molecular dynamics (MD) simulations can be used to evaluate the free energy landscape (FEL), locate the TS, and obtain a minimum energy path (MEP). This approach, which relies solely on gradient (force) calculations, is unaffected by low-frequency harmonic modes when evaluating MEPs. Given that reactive processes generally have activation barrier heights greater than k B T, enhanced sampling methods such as umbrella sampling,12-14 metadynamics (MetaD),15-17 replica exchange,18-21 and integrated temperature sampling (ITS)22-24 are employed to overcome activation barriers by biasing the potential energy surface (PES).

There are two distinct methods to obtain a MEP. The first involves guessing an initial string of coordinates and generating structures corresponding to each point on the string. These structures are then used to perform (MD) runs, where the resulting trajectories provide information for modifying the string. Examples of methods in this category include the fly string method,25, 26 finite temperature string (FTS),19, 27, 28 and swarm of trajectories.29-31

The second method involves first obtaining a converged (FEL), assuming the simulation time is sufficiently long to sample most configurations. The free energy F ( q ) corresponding to each combination of (CVs) q can then be expressed as:
p 0 ( q ) = d q δ [ q q ( q ) ] p 0 ( q ) = δ [ q q ( q ) ] , (1)
F ( q ) = 1 β log p 0 ( q ) V ( q ) , (2)
where
p 0 ( q ) = e β U ( q ) d q e β U ( q ) , (3)
U ( q ) is the potential energy at q, β = 1 k B T , k B is the Boltzmann constant, and V ( q ) is the bias potential at a specific position q. An initial guess, typically an interpolation from the starting point to the end point, is then optimized using methods such as the original,32 simplified string method,33 or nudged elastic band (NEB).34-37

In this study, we focus on providing a good initial guess when an approximate or converged FEL is available. To achieve this, we present an efficient approach for obtaining an approximate MEP by utilizing fast-marching-based path planning algorithms.

2 THEORY AND METHODOLOGY

The fast marching method is a numerical technique for solving the Eikonal equation:
| u ( x ) | = 1 f ( x ) for x Ω , (4)
u ( x ) = 0 for x Ω , (5)
where f ( x ) is the speed function, u ( x ) is the time function, and Ω is the boundary of the surface Ω. As the gradient of the time decreases with the increase in speed, u ( x ) can be treated as the shortest time to reach Ω from a point x on the surface. In the context of finding the MEP, if a starting point and an endpoint are specified, one seeks the path that requires the least amount of “time” from start to end, with “speed” being inversely proportional to the energy, U ( q ).
Several studies have attempted to find the MEP by directly solving the Eikonal equation.38-42 The method presented in this work, however, utilizes an approach known as the fast marching tree (FMT).43 This algorithm was originally developed for robotics path planning,44, 45 searching for the shortest path between two points on a 2D surface with obstacles. A detailed description of the FMT implementation can be found in the Reference 43. In summary, the 2D FMT algorithm can be broken down into six key steps:
  1. Initialization: The first step is the stochastic sampling of the points in the free space, where a number n, specified by the user, will be randomly selected for path generation. The initial set Q is created, which includes the initial state x init and the n sampled points. The edge set E is initialized as empty, and the sets W and H are set as Q \ { x init } and { x init }, respectively.
  2. Neighbor identification: The next step is to generate all the points N z that have a distance smaller than r n from the current point z, which initially is the starting point x init .
  3. Path exploration: In the third step, for all the points x in N z that are also in W:

    1. Identify the neighbors N x of x that are within the radius r n .

    2. Find the set Y near of points in N x that are also in H.

    3. Determine y min as the point in Y near that minimizes the cost to come to x.

  4. Edge addition: For each x in N z :

    1. If the path from y min to x is collision-free (the path does not cover the obstacle region), add the edge ( y min , x ) to the tree.

    2. Move x from W to a temporary set H new .

  5. Tree update: After all potential connections for the current z are explored:

    1. Update H as ( H H new ) \ { z }.

    2. If H is empty, return failure as no further expansion is possible.

    3. Otherwise, set z to be the point in H with the minimum cost-to-come.

  6. Termination: Repeat steps 2–5 until z reaches the goal region X goal . Once the goal is reached, return the path from x init to z.

The benefit of FMT is that it always returns a path where the ‘cost’ is minimized given a search radius r and converges asymptotically to its global minimum with the increase in the search radius r, similar to the scaled hypersphere method.46-49 Additionally, FMT is considered more efficient than FMM due to sampling. Here, we describe how the original 2D FMT algorithm can be modified for path planning to find the initial guess of MEP when there are two CVs.

The first modification is that the map X in FMT consists of three-number lists, each containing q 1 , q 2 , F ( q ), which represent the values of q 1 , q 2 , and the corresponding free energy value F ( q ) instead of x and y in a typical path planning problem. The second modification is in the collision check process, where a “collision” is confirmed if the straight line connecting two sets of q 1 , q 2 passes through a pair of q 1 , q 2 whose energy is infinite or undefined. Additionally, the search radius, r, is defined as the set of points that have the distance Δ q 1 2 + Δ q 2 2 + Δ F ( q ) 2 less than r from a specific point in X.

We then present the pipeline for obtaining an MEP from an FEL without any prior guess. The first step is to implement the shifted k-means algorithm to locate the positions of the energy minima. Two inputs are required: the shift m and the number of reactants n. The shift m discards every point with a free energy F ( q ) larger than m, so that only the most stable points are clustered with a weight of F ( q ) n , where n is an even integer.

The second step is to use the centers of the clusters to perform FMT runs, where one center can be used as the start and the other as the goal. A key parameter for efficient calculation is the search radius; setting it too low might lead to failure in pathfinding, as the curve might never surpass the reaction's barrier height. Conversely, setting the radius too high can be inefficient, as too many points will be considered simultaneously to form branches in the tree. Empirically, the appropriate search radius range can be expressed as:
E a 20 r E a 5 , (6)
where E a is the estimated barrier height.
After converging the FMT calculations, the path obtained can then be processed into the initial guess of the string. To increase the number of points, and therefore the granularity of the string, the initial coordinate guess can be expressed as follows:
q 1 u = q 1 i + j m ( q 1 ( i + 1 ) q 1 i ) , (7)
q 2 u = q 2 i + j m ( q 2 ( i + 1 ) q 2 i ) , (8)
where q n i is the value of CV n at the i th point on the path obtained from FMT, m is the integer multiplier, and j can take integer values from 0 to m 1. If q 1 i and q 2 i are the last points on the FMT path, j = 0.
The initial guess can then be used in any string-related algorithm for final polishing. If we define the string as a path φ ( σ ), where σ ( s ) is a function of s, the normalized length of the arc such that φ ( 0 ) corresponds to the configuration of the reactant and φ ( 1 ) corresponds to the configuration of the product, the goal of the string method is to minimize the functional F:
F [ φ ] = 0 1 d σ f ( φ ( σ ) ) T f ( φ ( σ ) ) = k = 0 n f φ σ k 2 , (9)
for discrete images. f ( φ ( σ ) ) is the normal force on the string at σ, defined as:
f ( φ ( σ ) ) = F ( φ ( σ ) ) + t ^ ( σ ) T F ( φ ( σ ) ) t ^ ( σ ) , (10)
where t ^ ( σ ) is the unit tangent vector along the string at φ ( σ ).
MEP, by definition, requires the normal forces on the MEP, f ( φ MEP ), to be zero. To achieve this, the points on the string should first evolve using numerical methods such as steepest descent or Newton–Raphson. Then, the string should be re-parameterized to redistribute the nodes along the string uniformly after each evolution step. This can be done by defining the parameterization density, ρ ( s ), and a constant c that satisfy the integral:
c = 0 1 ρ ( s ) d s , (11)
so that the re-parameterized position, φ k = φ ( s k ), can be determined by finding the value of s k such that:
0 s k ρ ( s ) d s = c σ k , (12)
where s k = s ( σ k ) and σ k = k n .

3 RESULTS AND DISCUSSION

We then compare the number of iterations required to converge a string method calculation using linear interpolation (LI) and the FMT-generated initial guess on model free energy surfaces (FES). The first FES is shown in Figure 1 below. Despite its simplicity, it is a typical FES for an SN 2 reaction or a hydrogen atom transfer (HAT) reaction that involves bond breaking and bond forming.

Details are in the caption following the image
The first model free energy surfaces (FES). The horizontal and vertical axes represent model collective variable (CV) values, and the unit of free energy is kJ/mol.

The two energy minima are first identified using the k-means algorithm, storing only points with an energy value less than 400 kJ/mol. The relationship between the matching error and the number of iterations is plotted in Figure 2. Given the estimated barrier height of 100 kJ/mol, the search radius in FMT is set to 10. Since the initial FMT-generated path produces 25 points, the integer multiplier is set to 4, resulting in a string containing 101 nodes. For simplicity, only the first and last nodes will be shown in all the reaction paths in this article. The q 1 and q 2 gradients of each point on the FEL are evaluated using the “linear” option in scipy for all of the FELs in this article.

Details are in the caption following the image
The relationship between the number of iterations (horizontal axis) and step length, which is the difference in the positions of the points (vertical axis) of each string iteration for both the linear interpolation (LI) generated initial guess and the FMT-generated initial guess.

According to the graph, it is clear that the FMT-generated initial guess performs much better than the LI initial guess, as the step length decreases monotonically in the FMT case, while it takes roughly 18 iterations for the LI initial guess to show a similar trend. To achieve a step length of less than 0.01, 24 iterations are required for LI, whereas only 6 are required for FMT. This advantage is further visualized by observing that after 6 iterations, the FMT initial guess-generated MEP is already very close to convergence, unlike the LI case, as shown in Figure 3. Although the decrease in step length is faster in the LI path from iteration 18 to 32, the speed of the drop becomes the same as in the FMT path after that.

Details are in the caption following the image
The minimum energy paths (MEPs) obtained from fast marching tree (FMT) and linear interpolation (LI) initial guess. (A) The MEP after six iterations using FMT generated path as the initial guess. (B) The MEP after six iterations using LI generated path as the initial guess.

To evaluate the computational savings, we measure the compute time required to reduce the step length to be below 1 × 1 0 2 , 5 × 1 0 3 , 1 × 1 0 3 , 5 × 1 0 4 , and 1 × 1 0 4 using both LI path and FMT path as the initial guess. In addition to r = 10, two other search radii in FMT, namely r = 5 and r = 20, are also experimented with. The computational time to reduce the step length to the aforementioned values, measured in seconds, is shown in Table 1 below.

TABLE 1. The percentage of time saved using the FMT-generated initial path compared to the LI-generated initial path.
Step length LI (s) FMT( r = 5) (s) % Save FMT( r = 10) (s) % Save FMT( r = 20) (s) % Save
0.01 4.171 2.744 35.2 2.017 50.9 2.680 35.7
0.005 4.763 3.441 25.5 2.700 41.1 3.190 33.0
0.001 6.188 4.923 20.4 5.164 16.5 5.843 5.6
0.0005 7.209 6.001 16.8 6.213 13.8 6.874 4.7
0.0001 9.674 8.859 8.4 9.133 5.6 9.466 2.2
  • Note: All times are measured in seconds.

According to the table, using a combined FMT and string approach requires less time than using the string method alone. The time savings is most prominent when the search radius r is 10 and the step length is less than 0.01 due to a relatively optimal choice of r and the fact that the FMT-generated initial path is already close enough to the MEP.

We then proceed to test the performance on a more complex system: the Muller-Brown potential M ( x , y ), defined by the following formula:
M ( x , y ) = k = 1 4 A k exp a k x x k 0 2 (13)
+ b k x x k 0 y y k 0 + c k y y k 0 2 , (14)
with
A = ( 200 , 100 , 170 , 15 ) , a = ( 1 , 1 , 6 . 5 , 0 . 7 ) , b = ( 0 , 0 , 11 , 0 . 6 ) , c = ( 10 , 10 , 6 . 5 , 0 . 7 ) , x 0 = ( 1 , 0 , 0 . 5 , 1 ) , y 0 = ( 0 , 0 . 5 , 1 . 5 , 1 ) .

A plot of the potential is shown in Figure 4 below.

Details are in the caption following the image
The Muller-Brown potential.

To demonstrate the superiority of the FMT-string combined approach, we plot the evolution of the string using the FMT-generated initial path at the fifth and fifteenth iterations, and the LI initial path at the fifth, fifteenth, twenty-fifth, and thirty-fifth iterations. The plots are shown in Figures 5 and 6 below, and the associated computational costs are outlined in Table 2. The search radius r is set to 10, as the estimated barrier height is also 100 kJ/mol. Since the initial FMT-generated path produces 24 points, the integer multiplier is set to 4, resulting in a string containing 97 nodes.

Details are in the caption following the image
The minimum energy paths (MEPs) obtained from fast marching tree (FMT)-generated initial guess. (A) The MEP after five iterations using the FMT-generated path as the initial guess. (B) The MEP after 15 iterations using the FMT-generated path as the initial guess.
Details are in the caption following the image
The minimum energy paths (MEPs) obtained from linear interpolation (LI)-generated initial guess. (A) The MEP after five iterations using the LI-generated path as the initial guess. (B) The MEP after 15 iterations using the LI-generated path as the initial guess. (C) MEP after 25 iterations using the LI-generated path as the initial guess. (D) The MEP after 35 iterations using the LI-generated path as the initial guess.
TABLE 2. The compute time measured in seconds using FMT-generated initial path and LI-generated initial path.
Iterations LI (s) FMT (s)
5 0.970 1.898
15 2.678 3.710
25 4.759
35 5.893

According to the figures, it can be observed that the FMT-generated initial path is nearly converged after 5 iterations, with full convergence achieved after 15 iterations, while 35 iterations are required for the LI-generated initial path. This corresponds to a roughly 40% time savings according to the compute time outlined in Table 2.

We finally present an example where there is a competitive pathway, posing difficulties in finding the MEP. In short, a potential well is created in the bottom-right region of the FES in Figure 1, resulting in a competition between two potential paths: the upper and the lower path. The modified FES is shown in Figure 7.

Details are in the caption following the image
The modified potential. Compared to the potential in Figure 1, this potential contains a competitive pathway.

We then compare the evolution of the string using the FMT-generated initial path at 10 and 20 iterations, shown in Figure 8, while that for the LI-generated initial path at 10, 40, 75, and 100 iterations is shown in Figure 9. The search radius is also set to 10. Since the initial FMT-generated path produces 31 points, the integer multiplier is set to 4, resulting in a string containing 125 nodes.

Details are in the caption following the image
The minimum energy paths (MEPs) obtained from fast marching tree (FMT)-generated initial guess. (A) The MEP after 10 iterations using FMT-generated path as the initial guess. (B) The MEP after 20 iterations using FMT-generated path as the initial guess.
Details are in the caption following the image
The minimum energy paths (MEPs) obtained from linear interpolation (LI)-generated initial guess. (A) The MEP after 10 iterations using LI-generated path as the initial guess. (B) The MEP after 40 iterations using LI-generated path as the initial guess. (C) The MEP after 75 iterations using LI-generated path as the initial guess. (D) The MEP after 100 iterations using LI-generated path as the initial guess.

According to the graphs, it is evident that the MEP iterated from an FMT-generated initial guess has nearly converged after 10 iterations, while that iterated from LI does not converge even after 75 iterations. The time cost for the LI and FMT-based algorithms to reach a certain number of iterations is outlined in Table 3. It can be deduced that using an FMT-generated initial guess results in approximately 75% time savings compared to the LI-generated one, as about 20 iterations are required for convergence in the FMT case but around 100 for the LI counterpart. It is also expected that FMT would be able to find a better initial guess and converge faster than LI when the upper path has a lower barrier height than that of the lower path.

TABLE 3. The compute time measured in seconds using FMT-generated initial path and LI-generated initial path.
Iterations LI (s) FMT (s)
10 1.976 4.186
20 3.984 6.117
40 7.937
75 16.384
100 21.650

4 CONCLUSION

To summarize, we have presented a novel approach that utilizes a state-of-the-art path planning algorithm, FMT, to generate accurate initial approximations of the MEP. We then coupled this with the string algorithm to further refine the path. Our results from obtaining the MEP of three model (FES) demonstrate that fewer iterations are required to converge the MEP, thus reducing the computational time significantly. This indicates that modern path-finding algorithms can play a pivotal role in efficiently extracting MEPs from complex free energy surfaces.

Future work should focus on applying the FMT-string method to a wider range of chemical reactions to facilitate the search for the MEP. Plausible approaches include initially obtaining an approximate MEP from molecular simulations, then using the FMT-string method to generate a reaction path. Umbrella sampling techniques can subsequently be employed around this path to obtain a more refined reaction pathway, the MEP, and potentially the minimum free energy path (MFEP). Additionally, this method's efficiency should be tested in higher-dimensional FES or PESs. It is also interesting to see how this method is combined with other reaction path optimization methods, such as the MaxFlux method.50-52

The promising results suggest that FMT, combined with the string method, offers a powerful tool for theoretical and computational chemists. By providing efficient and accurate initial guesses, this approach can enhance the study of reaction pathways across a broad spectrum of chemical and biochemical processes.

ACKNOWLEDGMENTS

The author acknowledge the University of Chicago for the teaching assistant fellowship that supports this research. The author also thanks Prof. Gregory Voth, Prof. Lijiang Yang and Fanpeng Meng for valuable discussions.

    DATA AVAILABILITY STATEMENT

    The data that support the findings of this study are available from the corresponding author upon reasonable request.

      The full text of this article hosted at iucr.org is unavailable due to technical difficulties.