Understanding recent deep-learning techniques for identifying collective variables of molecular dynamics
Abstract
High-dimensional metastable molecular dynamics (MD) can often be characterised by a few features of the system, that is, its collective variables (CVs). Thanks to rapid advances in machine learning and deep learning, various deep learning-based CV identification techniques have been developed in recent years, allowing accurate modelling and efficient simulation of complex molecular systems. In this paper, we look at two different categories of deep learning-based approaches for finding CVs: those that compute the leading eigenfunctions of the transfer operator associated with the underlying dynamics, and those that learn an autoencoder by minimising the reconstruction error. We present a concise overview of the mathematics behind these two approaches and conduct a comparative numerical study on illustrative examples.
1 INTRODUCTION
Molecular dynamics (MD) simulation is a mature computational technique for the study of biomolecular systems. It has proven valuable in a wide range of applications, for example, understanding functional mechanisms of proteins and discovering new drugs [1, 2]. However, the capability of direct (all-atom) MD simulations is often limited by the disparity between the tiny step-sizes the simulations must adopt to ensure numerical stability and the large timescales on which functionally relevant conformational changes of biomolecules, such as protein folding, typically occur.
One general approach to overcome the aforementioned challenge in MD simulations is to exploit the fact that, in many cases, the dynamics of a high-dimensional metastable molecular system can be characterised by a few features, that is, collective variables (CVs) of the system. Indeed, many enhanced sampling methods (see ref. [3] for a review) and approaches for building surrogate models [4-7] rely on knowing CVs of the underlying molecular system. While empirical approaches and physical/chemical intuition are still widely used to choose CVs (e.g., centres of mass, bonds, or angles), intuiting suitable CVs is often difficult or even impossible for biomolecular systems in real-life applications, due to their high dimensionality and their structural and dynamical complexity.
Thanks to the large amounts of molecular data being generated and the rapid advance of machine learning techniques, data-driven automatic identification of CVs has attracted considerable research interest. Numerous machine learning-based techniques for CV identification have emerged, such as the well-known principal component analysis (PCA) [8], diffusion maps [9], isometric feature mapping (ISOMAP) [10], sketch-map [11], time-lagged independent component analysis (TICA) [12], as well as kernel-PCA [13] and kernel-TICA [14], which employ kernel techniques. See refs. [15, 16] for reviews. The recent developments mostly employ deep learning techniques and largely fall into two categories. Methods in the first category are based on the operator approach to the study of stochastic dynamical systems. These include the variational approach for Markov processes using neural networks (VAMPnets) [17] and its variant, state-free reversible VAMPnets (SRV) [18], the deep-TICA approach [19], and invariant subspaces of Koopman operators learned by a neural network (ISOKANN) [20], all of which are capable of learning eigenfunctions of Koopman/transfer operators. The authors of this paper have also developed a deep learning-based method for learning eigenfunctions of the infinitesimal generator associated with overdamped Langevin dynamics [21]. Methods in the second category combine deep learning with dimension-reduction techniques, typically by training autoencoders [22]. For instance, several approaches have been proposed that iteratively train autoencoders and improve the training data by “on-the-fly” enhanced sampling. These include Molecular Enhanced Sampling with Autoencoders (MESA) [23], Free Energy Biasing and Iterative Learning with Autoencoders (FEBILAE) [24], the method based on the predictive information bottleneck framework [25], the Spectral Gap Optimisation of Order Parameters (SGOOP) [26], and deep linear discriminant analysis (deep-LDA) [27]. In addition, various generalised autoencoders have been proposed, such as the extended autoencoder (EAE) model [28], time-lagged (variational) autoencoders [29, 30], the Gaussian mixture variational autoencoder [31], and EncoderMap [32].
Motivated by these rapid advances, in this paper we study the aforementioned two categories of deep learning-based approaches for finding CVs, that is, approaches that compute the leading eigenfunctions of the transfer operator associated with the underlying dynamics, and approaches that learn an autoencoder by minimising a reconstruction error. The remainder of this article is organised as follows. In Section 2, we present the approach to CV identification based on computing eigenfunctions of the transfer operator; we give a variational characterisation of these eigenfunctions, as well as a loss function for learning them. In Section 3, we study autoencoders; in particular, we present a characterisation of the optimal (time-lagged) autoencoder. In Section 4, we illustrate the numerical approaches for learning eigenfunctions and autoencoders by applying them to two simple yet illustrative systems.
2 EIGENFUNCTIONS AS CVs FOR THE STUDY OF MOLECULAR KINETICS ON LARGE TIMESCALES
In this section, we consider eigenfunctions of the transfer operator associated with the underlying dynamics. Approaches for computing eigenfunctions of the infinitesimal generator, as well as further motivation for using eigenfunctions as CVs, can be found in the extended version of this paper [33].
Transfer operator
To fix notation, we consider a system whose state $x_t \in \mathbb{R}^d$ evolves according to the overdamped Langevin dynamics

$$\mathrm{d}x_t = -\nabla V(x_t)\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}w_t, \qquad (1)$$

where $V: \mathbb{R}^d \to \mathbb{R}$ is a smooth potential, $\beta > 0$ is the inverse temperature, and $(w_t)_{t \ge 0}$ is a $d$-dimensional Brownian motion. Under mild conditions on $V$, the dynamics (1) is ergodic with respect to its unique invariant measure

$$\mu(\mathrm{d}x) = \frac{1}{Z}\,\mathrm{e}^{-\beta V(x)}\,\mathrm{d}x, \qquad Z = \int_{\mathbb{R}^d} \mathrm{e}^{-\beta V(x)}\,\mathrm{d}x. \qquad (2)$$

Given a lag-time $\tau > 0$, the transfer operator $T$ associated with the dynamics acts on observables $f \in L^2(\mu)$ as the conditional expectation

$$(Tf)(x) = \mathbb{E}\big[f(x_\tau) \mid x_0 = x\big], \qquad x \in \mathbb{R}^d. \qquad (3)$$

Since the dynamics (1) is reversible with respect to $\mu$, the operator $T$ is self-adjoint on $L^2(\mu)$ and satisfies

$$\langle Tf, g\rangle_\mu = \langle f, Tg\rangle_\mu = \mathbb{E}\big[f(x_0)\, g(x_\tau)\big], \qquad f, g \in L^2(\mu), \qquad (4)$$

where $\langle f, g\rangle_\mu = \int_{\mathbb{R}^d} f g\,\mathrm{d}\mu$ and the expectation on the right-hand side is over trajectories with $x_0 \sim \mu$. In the following, we assume that $T$ has a discrete spectrum consisting of the eigenvalues

$$1 = \lambda_0 > \lambda_1 \ge \lambda_2 \ge \cdots, \qquad (5)$$

with corresponding eigenfunctions $\varphi_0 \equiv 1, \varphi_1, \varphi_2, \dots \in L^2(\mu)$. The leading nontrivial eigenfunctions $\varphi_1, \dots, \varphi_K$ encode the dynamics of the system on its largest timescales and are therefore natural candidates for CVs.
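Relation (4) is what makes the transfer operator accessible from simulation data: the quantity $\langle f, Tf\rangle_\mu$ can be estimated from trajectory correlations without knowing $T$ explicitly. The following is a minimal NumPy sketch of such an estimator; the function and variable names are illustrative, not taken from any particular implementation.

```python
import numpy as np

def transfer_correlation(f, traj, lag):
    """Monte Carlo estimate of <f, T f>_mu = E[f(x_0) f(x_tau)], cf. (4).

    f    : callable mapping an array of states (N, d) to values (N,)
    traj : array of shape (N, d), states sampled along a long trajectory
    lag  : the lag-time tau, expressed in number of trajectory steps
    """
    fx = f(traj[:-lag])   # f evaluated at x_0
    fy = f(traj[lag:])    # f evaluated at x_tau
    return np.mean(fx * fy)
```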
Variational characterisation and loss function
In the following, we record a variational characterisation of the eigenfunctions of the transfer operator. Such a characterisation is useful in developing numerical algorithms [37, 38], in particular in designing loss functions in recent deep learning-based approaches [17, 18, 21]. In fact, using the same proof as in ref. [21, Theorem 1], we can show the following variational characterisation for eigenfunctions of the transfer operator.
Theorem 2.1. Let $K \ge 1$ and $\omega_1 > \omega_2 > \cdots > \omega_K > 0$. Assume that $T$ has a discrete spectrum consisting of the eigenvalues in (5) with the corresponding eigenfunctions $\varphi_i$, $i \ge 0$. Define $\mathcal{F} = \{(f_1, \dots, f_K) : f_i \in L^2(\mu),\ \mathbb{E}_\mu[f_i] = 0,\ \langle f_i, f_j\rangle_\mu = \delta_{ij},\ 1 \le i, j \le K\}$. We have

$$\sum_{i=1}^K \omega_i \lambda_i = \max_{(f_1, \dots, f_K) \in \mathcal{F}} \sum_{i=1}^K \omega_i \langle f_i, T f_i\rangle_\mu, \qquad (6)$$

where the maximum is attained when $f_i = \varphi_i$ for $i = 1, \dots, K$.

Theorem 2.1 suggests learning the leading eigenfunctions by representing $f_1, \dots, f_K$ as neural networks and maximising the right-hand side of (6), subject to the constraints

$$\mathbb{E}_\mu[f_i] = 0, \qquad \langle f_i, f_j\rangle_\mu = \delta_{ij}, \qquad 1 \le i \le j \le K. \qquad (7)$$

Using (4), the quantities $\langle f_i, T f_i\rangle_\mu$ can be estimated as correlations along a sampled trajectory. Imposing the constraints (7) by a quadratic penalty with constant $\alpha > 0$, we arrive at the loss

$$\mathrm{Loss}(f_1, \dots, f_K) = -\sum_{i=1}^K \omega_i\, \mathbb{E}\big[f_i(x_0)\, f_i(x_\tau)\big] + \alpha \sum_{1 \le i \le j \le K} \big(\langle f_i, f_j\rangle_\mu - \delta_{ij}\big)^2, \qquad (8)$$

where, in practice, the expectations and inner products are replaced by their empirical estimates over pairs $(x_0, x_\tau)$ extracted from trajectory data, and the mean-zero constraints are imposed by centring the outputs of the networks.
Compared to VAMPnets [17], the loss (8) imposes the orthogonality constraints (7) explicitly and directly targets the leading eigenfunctions rather than a basis of their eigenspaces. Also, in contrast to the approach of ref. [18], training with the loss (8) does not require backpropagation through matrix eigenvalue problems.
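For concreteness, the following PyTorch sketch evaluates a loss of the form (8) on a batch of trajectory pairs $(x_0, x_\tau)$. The network `fs` (mapping states to $K$ outputs), the centring of the outputs to handle the mean-zero constraints, and the penalty over the full empirical Gram matrix are implementation choices of this sketch, not necessarily those of refs. [17, 18, 21].

```python
import torch

def eigenfunction_loss(fs, x0, xtau, omega, alpha):
    """Sketch of the loss (8).

    fs    : network mapping a batch of states (N, d) to K outputs (N, K)
    x0    : batch of states x_0, shape (N, d)
    xtau  : corresponding states x_tau, shape (N, d)
    omega : weights omega_1 > ... > omega_K > 0, shape (K,)
    alpha : penalty constant enforcing the constraints (7)
    """
    fx = fs(x0)                        # f_i(x_0), shape (N, K)
    fy = fs(xtau)                      # f_i(x_tau), shape (N, K)
    # centre the outputs so that E_mu[f_i] ~ 0 (first constraint in (7))
    fx = fx - fx.mean(dim=0, keepdim=True)
    fy = fy - fy.mean(dim=0, keepdim=True)
    # negative weighted correlations: -sum_i omega_i E[f_i(x_0) f_i(x_tau)]
    corr = (fx * fy).mean(dim=0)       # shape (K,)
    loss = -(omega * corr).sum()
    # quadratic penalty enforcing <f_i, f_j>_mu = delta_ij; summing over
    # all (i, j) pairs is equivalent to (8) up to a factor of two
    gram = fx.t() @ fx / fx.shape[0]   # empirical Gram matrix, (K, K)
    eye = torch.eye(fx.shape[1])
    return loss + alpha * ((gram - eye) ** 2).sum()
```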
3 ENCODER AS CVs FOR LOW-DIMENSIONAL REPRESENTATION OF MOLECULAR CONFIGURATIONS
In this section, we briefly discuss autoencoders in the context of CV identification for MD.
An autoencoder consists of two neural networks: an encoder $f_{\mathrm{enc}}: \mathbb{R}^d \to \mathbb{R}^k$, which maps a configuration $x \in \mathbb{R}^d$ to a low-dimensional latent representation $z = f_{\mathrm{enc}}(x) \in \mathbb{R}^k$, $k < d$, and a decoder $f_{\mathrm{dec}}: \mathbb{R}^k \to \mathbb{R}^d$, which maps latent representations back to the full-dimensional space. Given training data $x^{(1)}, \dots, x^{(N)} \in \mathbb{R}^d$, the two networks are trained jointly by minimising the reconstruction error

$$\mathrm{Loss}(f_{\mathrm{enc}}, f_{\mathrm{dec}}) = \frac{1}{N} \sum_{l=1}^N \big|f_{\mathrm{dec}}(f_{\mathrm{enc}}(x^{(l)})) - x^{(l)}\big|^2, \qquad (9)$$

which, for data sampled from $\mu$ and $N \to \infty$, approximates the expectation $\mathbb{E}_{x \sim \mu}\big[|f_{\mathrm{dec}}(f_{\mathrm{enc}}(x)) - x|^2\big]$. When both the encoder and the decoder are linear maps, minimising (9) recovers the PCA of the data.
In the context of CV identification for molecular systems, the trained encoder is used to define a CV map. Note that the loss (9) is invariant under reordering of the training data. For trajectory data, it would therefore be beneficial to employ, instead of (9), a loss that incorporates the temporal information in the data. In this regard, several variants, such as time-lagged autoencoders [29, 39] and the EAE based on the committor function [28], have been proposed in order to learn low-dimensional representations of the system that capture its dynamics.
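The standard and time-lagged losses differ only in the reconstruction target, as the following PyTorch sketch illustrates (with hypothetical `encoder` and `decoder` modules): passing `y = x` gives the reconstruction loss (9), while passing as `y` the states observed a lag-time $\tau$ after `x` gives the time-lagged loss (10) introduced below.

```python
import torch

def reconstruction_loss(encoder, decoder, x, y):
    """Loss (9) when y = x; time-lagged loss (10) when y holds the states
    observed a lag-time tau after the states in x along the trajectory."""
    z = encoder(x)        # latent representations, shape (N, k)
    x_rec = decoder(z)    # reconstructions, shape (N, d)
    return ((x_rec - y) ** 2).sum(dim=1).mean()
```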
Characterisation of time-lagged autoencoders
A time-lagged autoencoder [29] is trained to predict, from the current state of the system, its future state after a lag-time $\tau > 0$. Given pairs $(x^{(l)}, y^{(l)})$ extracted from a trajectory, where $y^{(l)}$ is the state observed a time $\tau$ after $x^{(l)}$, the networks are trained with the loss

$$\mathrm{Loss}(f_{\mathrm{enc}}, f_{\mathrm{dec}}) = \frac{1}{N} \sum_{l=1}^N \big|f_{\mathrm{dec}}(f_{\mathrm{enc}}(x^{(l)})) - y^{(l)}\big|^2 \approx \mathbb{E}\big[\big|f_{\mathrm{dec}}(f_{\mathrm{enc}}(x_0)) - x_\tau\big|^2\big], \qquad (10)$$

where the expectation is over trajectory pairs with $x_0 \sim \mu$. To characterise the optimal networks, consider, for $z \in \mathbb{R}^k$, the levelset of the encoder

$$\Sigma_z = \{x \in \mathbb{R}^d : f_{\mathrm{enc}}(x) = z\}, \qquad (11)$$

and denote by $\mu_z$ the conditional measure of $\mu$ on $\Sigma_z$, which satisfies

$$\int_{\mathbb{R}^d} g\,\mathrm{d}\mu = \int_{\mathbb{R}^k} \Big(\int_{\Sigma_z} g\,\mathrm{d}\mu_z\Big)\, \hat{\nu}(\mathrm{d}z), \qquad g \in L^1(\mu), \qquad (12)$$

where $\hat{\nu} = (f_{\mathrm{enc}})_\# \mu$ is the pushforward of $\mu$ by the encoder. Denoting by $p_\tau(x, \mathrm{d}y)$ the transition kernel of the dynamics at time $\tau$, we define, for each $z \in \mathbb{R}^k$, the probability measure

$$\nu_z(\mathrm{d}y) = \int_{\Sigma_z} p_\tau(x, \mathrm{d}y)\, \mu_z(\mathrm{d}x). \qquad (13)$$

A conditioning argument then shows that, for a fixed encoder $f_{\mathrm{enc}}$, the loss (10) is minimised over decoders by $f_{\mathrm{dec}}(z) = \mathbb{E}_{\nu_z}[y]$, and that the corresponding minimal value is

$$\min_{f_{\mathrm{dec}}} \mathrm{Loss}(f_{\mathrm{enc}}, f_{\mathrm{dec}}) = \int_{\mathbb{R}^k} \mathrm{Var}(\nu_z)\, \hat{\nu}(\mathrm{d}z), \qquad (14)$$

where $\mathrm{Var}(\nu_z) = \mathbb{E}_{\nu_z}\big[|y - \mathbb{E}_{\nu_z}[y]|^2\big]$.
Note that (13) is the distribution of $y$ at time $\tau$ starting from points $x$ on the levelset $\Sigma_z$ distributed according to the conditional measure $\mu_z$. To summarise, (14) implies that, when $\tau > 0$, training the time-lagged autoencoder yields (in theory) the encoder map $f_{\mathrm{enc}}$ that minimises the average variance of the future states $y$ (at time $\tau$) of points $x$ on $\Sigma_z$ distributed according to $\mu_z$, and the decoder that is given by the mean of the future states $y$, that is, $f_{\mathrm{dec}}(z) = \mathbb{E}_{\nu_z}[y]$ for $z \in \mathbb{R}^k$. Similar results hold for the standard autoencoder with the reconstruction loss (9). In fact, choosing $\tau = 0$ in the above derivation leads to the conclusion that the optimal encoder minimises the average variance of the conditional measures $\mu_z$ on the levelsets.
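The characterisation (14) can also be checked empirically: if the latent values are discretised into bins, the optimal decoder on each bin is the mean of the corresponding future states, and the minimal loss is the average within-bin variance. A small NumPy sketch of this sanity check (for $k = 1$, with illustrative names) follows.

```python
import numpy as np

def optimal_decoder_loss(z, y, n_bins=50):
    """Empirical version of (14): z = f_enc(x_0), shape (N,), and
    y = x_tau, shape (N, d). Returns the minimal achievable loss (10)
    over all decoders, i.e. the average conditional variance of y."""
    bins = np.linspace(z.min(), z.max(), n_bins + 1)
    idx = np.clip(np.digitize(z, bins) - 1, 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        yb = y[idx == b]
        if len(yb) > 0:
            # the optimal decoder on this bin is the mean of the future
            # states; the residual is the within-bin squared deviation
            total += ((yb - yb.mean(axis=0)) ** 2).sum()
    return total / len(y)
```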
To conclude, we note that although the loss (10) in time-lagged autoencoders encodes temporal information of the data, the characterisation (14) implies that this information may not be sufficient to yield encoders that define good CVs capturing the slow modes of the dynamics (see ref. [33, Section 2]). Our characterisation of time-lagged autoencoders is in line with the previous study [39], where the authors analysed the capability and limitations of time-lagged autoencoders in finding the slowest mode of the system and proposed modifications in order to discover it. In the next section, we further compare autoencoders and eigenfunctions on concrete numerical examples.
4 NUMERICAL EXAMPLES
In this section, we present numerical results for eigenfunctions and autoencoders on two simple two-dimensional systems.
4.1 First example
In the first example, we consider the overdamped Langevin dynamics (1) in two dimensions, with a potential that possesses two metastable regions connected by a transition path. The training data were generated by sampling a long trajectory of the dynamics using the Euler-Maruyama scheme; the empirical distribution of the sampled states is shown in Figure 2.
We trained neural networks with the loss (9) for standard autoencoders and the loss (10) for time-lagged autoencoders. In each test, since the total dimension is two, we chose the bottleneck dimension $k = 1$. The encoder is represented by a neural network that has an input layer of size two, an output layer of size one, and four hidden layers of size 30 each. The decoder is represented by a neural network that has an input layer of size one, an output layer of size two, and three hidden layers of size 30 each. We took tanh as the activation function in all neural networks. For the training, we used the Adam optimiser [40] with batch size $2 \times 10^4$ and learning rate 0.005. The random seed was fixed to 2046 and the total number of training epochs was set to 500. Figure 3 shows the trained autoencoders with different lag-times. As one can see there, for both the standard autoencoder ($\tau = 0$) and the time-lagged autoencoder with a small lag-time, the contour lines of the trained encoder match well with the stiff direction of the potential. The curves determined by the images of the decoders are also close to the transition path. However, the results for time-lagged autoencoders become unsatisfactory when the lag-time is chosen as 1.0 or 2.0.
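As an illustration, the encoder and decoder described above can be assembled in PyTorch as follows. This is a sketch matching the stated layer sizes; whether an activation is applied at the output layers, as well as all data handling, is left unspecified here.

```python
import torch
import torch.nn as nn

# encoder: 2 -> 30 -> 30 -> 30 -> 30 -> 1, tanh activations
encoder = nn.Sequential(
    nn.Linear(2, 30), nn.Tanh(),
    nn.Linear(30, 30), nn.Tanh(),
    nn.Linear(30, 30), nn.Tanh(),
    nn.Linear(30, 30), nn.Tanh(),
    nn.Linear(30, 1),
)

# decoder: 1 -> 30 -> 30 -> 30 -> 2, tanh activations
decoder = nn.Sequential(
    nn.Linear(1, 30), nn.Tanh(),
    nn.Linear(30, 30), nn.Tanh(),
    nn.Linear(30, 30), nn.Tanh(),
    nn.Linear(30, 2),
)

# Adam optimiser over both networks, with the learning rate stated above
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.005
)
```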


We also learned the first eigenfunction $\varphi_1$ of the transfer operator using the loss (8), where we chose $K = 1$ with a fixed coefficient $\omega_1$, lag-time $\tau$, and penalty constant $\alpha$. The same dataset and the same training parameters as in the training of autoencoders were used, except that for the eigenfunction we employed a neural network that has three hidden layers of size 20 each. The learned eigenfunction is shown in Figure 4. We can see that the eigenfunction is indeed capable of identifying the two metastable regions, and that its contour lines are well aligned with the stiff directions of the potential in the transition region (but not inside the metastable regions).
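A sketch of the corresponding training loop, reusing the `eigenfunction_loss` function from Section 2, is given below; the concrete values of $\omega_1$ and $\alpha$ (not reproduced above) and the random input tensors are placeholders so that the sketch runs as-is.

```python
import torch
import torch.nn as nn

torch.manual_seed(2046)

# eigenfunction network: 2 -> 20 -> 20 -> 20 -> 1, tanh activations
f1 = nn.Sequential(
    nn.Linear(2, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 1),
)
optimizer = torch.optim.Adam(f1.parameters(), lr=0.005)

omega = torch.tensor([1.0])   # K = 1; placeholder value for omega_1
alpha = 10.0                  # placeholder penalty constant

# x0, xtau: trajectory pairs (x_0, x_tau); random placeholders here,
# to be replaced by the sampled dataset described above
x0, xtau = torch.randn(1000, 2), torch.randn(1000, 2)

for epoch in range(500):
    loss = eigenfunction_loss(f1, x0, xtau, omega, alpha)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```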


4.2 Second example
In the second example, we consider the two-dimensional dynamics

$$\mathrm{d}x_t = -\nabla V(x_t)\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}w_t, \qquad (15)$$

where the potential $V$ differs from the one in the first example and again exhibits two metastable regions separated by a transition region.
To prepare the training data, we sampled a trajectory of (15) using the Euler-Maruyama scheme with the same parameters as in the previous example, except that in this example we sampled $5 \times 10^5$ steps in total. By recording the sampled states every two steps, we obtained a dataset of size $2.5 \times 10^5$.
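The sampling step can be sketched as follows; the gradient `grad_V`, the step-size `dt`, and the inverse temperature `beta` stand in for the parameters of the example, which are not specified here.

```python
import numpy as np

def sample_trajectory(grad_V, x0, dt, beta, n_steps, rng):
    """Euler-Maruyama discretisation of (15):
    x_{n+1} = x_n - grad_V(x_n) * dt + sqrt(2 * dt / beta) * xi_n,
    with xi_n independent standard Gaussian vectors."""
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps + 1, x.size))
    traj[0] = x
    noise_scale = np.sqrt(2.0 * dt / beta)
    for n in range(n_steps):
        x = x - grad_V(x) * dt + noise_scale * rng.standard_normal(x.size)
        traj[n + 1] = x
    return traj

# keep every second state of a 5e5-step trajectory, as described above:
# traj = sample_trajectory(grad_V, x0, dt, beta, 500000,
#                          np.random.default_rng(2046))
# data = traj[::2]
```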
We learned the autoencoder with the standard reconstruction loss (9) and the eigenfunction $\varphi_1$ of the transfer operator with the loss (8), respectively. For both the autoencoder and the eigenfunction, we used the same network architectures as in the previous example. We also used the same training parameters, except that in this example a larger batch size of $10^5$ was used and the total number of training epochs was set to 1000; the transfer operator was considered at a fixed lag-time $\tau$. Figure 6 shows the learned autoencoder and the eigenfunction $\varphi_1$. As one can see there, since the autoencoder was trained to minimise the reconstruction error and most of the sampled data fall into the two metastable regions, the contour lines of the learned encoder match the stiff directions of the potential in the metastable regions, but the transition region is poorly characterised. In contrast, the learned eigenfunction $\varphi_1$, while being close to constant inside the two metastable regions, gives a good parameterisation of the transition region. We also tried time-lagged autoencoders with several different lag-times (results not shown here), but we did not obtain results comparable to the learned eigenfunction in Figure 6.


ACKNOWLEDGMENTS
Wei Zhang thanks Tony Lelièvre and Gabriel Stoltz for fruitful discussions on autoencoders. The work of Christof Schütte and Wei Zhang is supported by the DFG under Germany's Excellence Strategy (MATH+: The Berlin Mathematics Research Centre, EXC-2046/1, project ID: 390685689).
Open access funding enabled and organized by Projekt DEAL.
ENDNOTE
1. Note that the empirical distribution of the data (shown in Figure 2) slightly differs from the true invariant distribution $\mu$ of the dynamics. However, there are sufficiently many samples both in the two metastable regions and in the transition region. In particular, the discrepancy between the empirical distribution and the true invariant distribution is not the main factor determining the quality of the numerical results.