Data-driven model reduction and transfer operator approximation
Stefan Klus1, Feliks Nüske1, Péter Koltai1, Hao Wu1, Ioannis Kevrekidis2,3, Christof Schütte1,3, and Frank Noé1
1Department of Mathematics and Computer Science, Freie Universität Berlin, Germany 2Department of Chemical and Biological Engineering & Program in Applied and Computational Mathematics, Princeton University, USA 3Zuse Institute Berlin, Germany
Abstract. In this review paper, we will present different data-driven dimension reduction techniques for dynamical systems that are based on transfer operator theory, as well as methods to approximate transfer operators and their eigenvalues, eigenfunctions, and eigenmodes. The goal is to point out similarities and differences between methods developed independently by the dynamical systems, fluid dynamics, and molecular dynamics communities such as time-lagged independent component analysis (TICA), dynamic mode decomposition (DMD), and their respective generalizations. As a result, extensions and best practices developed for one particular method can be carried over to other related methods.
1 Introduction
The numerical solution of complex systems of differential equations plays an important role in many areas such as molecular dynamics, fluid dynamics, mechanical as well as electrical engineering, and physics. These systems often exhibit multi-scale behavior, which can be due to the coupling of subsystems with different time scales – for instance, fast electrical and slow mechanical components – or due to the intrinsic properties of the system itself – for instance, the fast vibrations and slow conformational changes of molecules. Analyzing such problems using transfer operator based methods is often infeasible or prohibitively expensive from a computational point of view due to the so-called curse of dimensionality. One possibility to avoid this is to project the dynamics of the high-dimensional system onto a lower-dimensional space and to then analyze the reduced system representing, for instance, only the relevant slow dynamics, see, e.g., [1, 2].

In this paper, we will introduce different methods such as time-lagged independent component analysis (TICA) [3, 1, 4] and dynamic mode decomposition (DMD) [5, 6, 7, 8] to identify the dominant dynamics using only simulation data or experimental data. It was shown that
these methods are related to Koopman operator approximation techniques [9, 10, 11, 12]. Extensions of the aforementioned methods, called the variational approach of conformation dynamics (VAC) [13, 14, 15], developed mainly for reversible molecular dynamics problems, and extended dynamic mode decomposition (EDMD) [16, 17, 18], can be used to compute eigenfunctions, eigenvalues, and eigenmodes of the Koopman operator (and its adjoint, the Perron–Frobenius operator). Interestingly, although the underlying ideas, derivations, and intended applications of these methods differ, the resulting algorithms share a lot of similarities.

The goal of this paper is to show the equivalence of different data-driven methods which have been widely used by the dynamical systems, fluid dynamics, and molecular dynamics communities, but under different names. Hence, extensions, generalizations, and algorithms developed for one method can be carried over to its counterparts, resulting in a unified theory and set of tools. An alternative approach to data-driven model reduction – also related to transfer operators and their generators – would be to use diffusion maps [19, 20, 21, 22]. Manifold learning methods, however, are beyond the scope of this paper.

The outline of the paper is as follows: Section 2 briefly introduces transfer operators and the concept of reversibility. In Section 3, different data-driven methods for the approximation of the eigenvalues, eigenfunctions, and eigenmodes of transfer operators will be described. The theoretical background and the derivation of these methods will be outlined in Section 4. Section 5 addresses open problems and lists possible future work.
2 Transfer operators and reversibility
In the literature, the term transfer operator is sometimes used in different contexts. In this section, we will briefly introduce the Perron–Frobenius operator, the Perron–Frobenius operator with respect to the equilibrium density, and the Koopman operator. All three of these operators are, according to our definition, transfer operators.
2.1 Guiding example

Our paper will deal with data-driven methods to analyze both stochastic and deterministic dynamical systems. To illustrate the concepts of transfer operators and their spectral components, we first introduce a simple stochastic dynamical system that will be revisited throughout the paper.
Example 2.1. Consider the following one-dimensional Ornstein–Uhlenbeck process, given by an Itô stochastic differential equation¹ of the form

    dX_t = −αD X_t dt + √(2D) dW_t.
Here, {W_t}_{t≥0} is a one-dimensional standard Wiener process (Brownian motion), the parameter α is the friction coefficient, and D = β^{−1} is the diffusion coefficient. The stochastic forcing usually models physical effects, most often thermal fluctuations, and it is customary to call β the inverse temperature.
¹ A general time-homogeneous Itô stochastic differential equation is given by dX_t = −α(X_t) X_t dt + σ(X_t) dW_t, where α: ℝ^d → ℝ^d and σ: ℝ^d → ℝ^{d×d} are coefficient functions, and {W_t}_{t≥0} is a d-dimensional standard Wiener process.
The transition density of the Ornstein–Uhlenbeck process, i.e., the conditional probability density to find the process near y a time τ after it had been at x, is given by
    p_τ(x, y) = 1/√(2π σ²(τ)) exp(−(y − x e^{−αDτ})² / (2 σ²(τ))),    (1)

where σ²(τ) = α^{−1} (1 − e^{−2αDτ}). Figure 1a shows the transition densities for different values of τ. More details can be found in [23]. For complex dynamical systems, the transition density is not known explicitly, but must be estimated from simulation or measurement data.
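As a concrete reference, equation (1) can be implemented in a few lines. The default parameter values α = 4 and D = 0.25 are taken from Figure 1; the function name and interface are our own choices, not part of the paper:

```python
import numpy as np

def transition_density(x, y, tau, alpha=4.0, D=0.25):
    """Transition density p_tau(x, y) of the Ornstein-Uhlenbeck process,
    equation (1): a Gaussian in y with mean x*exp(-alpha*D*tau) and
    variance sigma^2(tau) = (1 - exp(-2*alpha*D*tau)) / alpha."""
    sigma2 = (1.0 - np.exp(-2.0 * alpha * D * tau)) / alpha
    mean = x * np.exp(-alpha * D * tau)
    return np.exp(-(y - mean) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
```

For each fixed x, the density integrates to one over y, and for small τ it is sharply peaked near x, in line with Figure 1a.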
Figure 1: a) Transition density function of the Ornstein–Uhlenbeck process for different values of τ. If τ is small, values starting in x will stay close to it. For larger values of τ, the influence of the starting point x is negligible. Densities converge to the equilibrium density, denoted by π. Here, α = 4 and D = 0.25. b) Evolution of the probability to find the dynamical system at any point x over time t, after starting with a peaked distribution at t = 0. We show the resulting distributions at times t = 0.1, t = 1, and t = 10. The system relaxes towards the stationary density π(x).
In this work we will describe the dynamics of a system in terms of dynamical operators such as the propagator Pτ , which is defined by the transition density pτ (x, y) and propagates a probability density of Brownian walkers in time by
    p_{t+τ}(x) = P_τ p_t(x).
See Figure 1b for the time evolution of the Ornstein–Uhlenbeck process initiated from a localized starting condition. It can be seen that the distribution spreads out and converges towards a Gaussian distribution, which is then invariant in time. For this simple dynamical system we can give the equation for the invariant density explicitly:
    π(x) = 1/√(2π α^{−1}) exp(−x² / (2 α^{−1})),    (2)

which is a Gaussian whose variance is decreasing with increasing friction and decreasing temperature. △
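The convergence visible in Figure 1 can be checked numerically: for a large lag time, the transition density (1) coincides with the invariant density (2) regardless of the starting point. A minimal sketch, using the parameter values of Figure 1 (the grid and lag are our choices):

```python
import numpy as np

alpha, D = 4.0, 0.25

def p_tau(x, y, tau):
    # transition density of equation (1)
    sigma2 = (1.0 - np.exp(-2.0 * alpha * D * tau)) / alpha
    return np.exp(-(y - x * np.exp(-alpha * D * tau))**2 / (2.0 * sigma2)) \
        / np.sqrt(2.0 * np.pi * sigma2)

def pi_density(x):
    # invariant density of equation (2), a Gaussian with variance 1/alpha
    return np.exp(-x**2 / (2.0 / alpha)) / np.sqrt(2.0 * np.pi / alpha)

y = np.linspace(-4.0, 4.0, 1001)
# for a large lag time the transition density forgets its starting point x
assert np.allclose(p_tau(2.0, y, 50.0), pi_density(y), atol=1e-10)
```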
2.2 Transfer operators

Let {X_t}_{t≥0} be a time-homogeneous² stochastic process defined on the bounded state space X ⊂ ℝ^d. It can be genuinely stochastic or it might as well be deterministic, such that there is a flow map³ Φ_τ : X → X with Φ_τ(X_t) = X_{t+τ} for τ ≥ 0. Let the measure P denote the law of the process {X_t}_{t≥0} that we will study in terms of its statistical transition properties. To this end, under some mild regularity assumptions⁴ which are satisfied by Itô diffusions with smooth coefficients [24, 25], we can give the following definition.
Definition 2.2. The transition density function p_τ : X × X → [0, ∞] of a process {X_t}_{t≥0} is defined by

    P[X_{t+τ} ∈ A | X_t = x] = ∫_A p_τ(x, y) dy

for every measurable set A. Here and in what follows, P[ · | E] denotes probabilities conditioned on the event E. That is, p_τ(x, y) is the conditional probability density of X_{t+τ} = y given that X_t = x.
For 1 ≤ r ≤ ∞, L^r(X) denotes the usual space of r-Lebesgue integrable functions, which is a Banach space with the corresponding norm ‖·‖_{L^r}.
Definition 2.3. Let p_t ∈ L^1(X) be the probability density and f_t ∈ L^∞(X) an observable of the system. For a given lag time τ:
a) The Perron–Frobenius operator or propagator P_τ : L^1(X) → L^1(X) is defined by
    P_τ p_t(x) = ∫_X p_τ(y, x) p_t(y) dy.

b) The Koopman operator K_τ : L^∞(X) → L^∞(X) is defined by
    K_τ f_t(x) = ∫_X p_τ(x, y) f_t(y) dy = E[f_t(X_{t+τ}) | X_t = x].
² We call a stochastic process {X_t}_{t≥0} time-homogeneous, or autonomous, if it holds for every t ≥ s ≥ 0 that the distribution of X_t conditional on X_s = x only depends on x and (t − s). It is the stochastic analogue of the flow of an autonomous (time-independent) ordinary differential equation.
³ For a measure-theoretic discussion of this construction, please refer to [18]. For our purposes, it is sufficient to equip X with the standard Lebesgue measure. In particular, if not stated otherwise, measurability of a set A ⊂ X is meant with respect to the Borel σ-algebra.
⁴ These conditions are called interchangeably absolute continuity, µ-compatibility, or null preservingness.
Both P_τ and K_τ are linear but infinite-dimensional operators which are adjoint to each other with respect to the standard duality pairing ⟨·, ·⟩, defined by ⟨f, g⟩ = ∫_X f(x) g(x) dx. The homogeneity of the stochastic process {X_t}_{t≥0} implies the so-called semigroup property of the operators, i.e., P_{τ+σ} = P_τ P_σ and K_{τ+σ} = K_τ K_σ for τ, σ ≥ 0. In other words, these operators describe time-stationary Markovian dynamics. While the Perron–Frobenius operator describes the evolution of densities, the Koopman operator describes the evolution of observables. For the analysis of the long-term behavior of dynamical systems, densities that remain unchanged by the dynamics play an important role (one can think of the concept of ergodicity).
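The duality ⟨P_τ p, f⟩ = ⟨p, K_τ f⟩ can be verified numerically for the Ornstein–Uhlenbeck process of Example 2.1 by discretizing both integral operators on a grid. This is a quadrature sketch with our own choice of grid, test density p, and observable f:

```python
import numpy as np

alpha, D, tau = 4.0, 0.25, 0.5
x = np.linspace(-5.0, 5.0, 801)
dx = x[1] - x[0]

sigma2 = (1.0 - np.exp(-2.0 * alpha * D * tau)) / alpha
# P[i, j] = p_tau(x_i, x_j): transition density (1) evaluated on the grid
P = np.exp(-(x[None, :] - x[:, None] * np.exp(-alpha * D * tau))**2
           / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

p = np.exp(-(x - 1.0)**2)          # some probability density ...
p /= p.sum() * dx                  # ... normalized on the grid
f = np.sin(x)                      # some observable

Pp = P.T @ p * dx                  # (P_tau p)(x) = integral of p_tau(y, x) p(y) dy
Kf = P @ f * dx                    # (K_tau f)(x) = integral of p_tau(x, y) f(y) dy

# duality pairing: <P_tau p, f> = <p, K_tau f>
assert abs(np.sum(Pp * f) * dx - np.sum(p * Kf) * dx) < 1e-10
```

Note how the Perron–Frobenius operator integrates over the first argument of p_τ (pushing densities forward) while the Koopman operator integrates over the second (pulling observables back).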
Definition 2.4. A density π is called an invariant density or equilibrium density if Pτ π = π. That is, the equilibrium density π is an eigenfunction of the Perron–Frobenius operator Pτ with corresponding eigenvalue 1.
In what follows, L^r_π(X) denotes the weighted L^r-space of functions f such that ‖f‖^r_{L^r_π} := ∫_X |f(x)|^r π(x) dx < ∞. While one can consider the evolution of densities with respect to any density, we are particularly interested in the evolution with respect to the equilibrium density. From this point on, we assume there is a unique invariant density. This assumption is typically satisfied for molecular dynamics applications, where the invariant density is given by the Boltzmann distribution.
Definition 2.5. Let u_t(x) = π(x)^{−1} p_t(x) ∈ L^1_π(X) be a probability density with respect to the equilibrium density π. Then the Perron–Frobenius operator (propagator) with respect to the equilibrium density, denoted by T_τ, is defined by

    T_τ u_t(x) = ∫_X (π(y)/π(x)) p_τ(y, x) u_t(y) dy.
The operators P_τ and K_τ can be defined on other spaces L^r and L^{r′}, with r ≠ 1 and r′ ≠ ∞, see [26, 18] for more details. By defining the weighted duality pairing ⟨f, g⟩_π = ∫_X f(x) g(x) π(x) dx for f ∈ L^r_π(X) and g ∈ L^{r′}_π(X), where 1/r + 1/r′ = 1, the operator T_τ defined on L^{r′}_π(X) is the adjoint of K_τ defined on L^r_π(X) with respect to ⟨·, ·⟩_π:
    ⟨K_τ f, g⟩_π = ⟨f, T_τ g⟩_π.
For more details, see [27, 28, 29, 13, 14, 18, 30]. The two operators Pτ and Tτ are often referred to as forward operators, whereas Kτ is also called backward operator, as they are the solution operators of the forward (Fokker–Planck) and backward Kolmogorov equations [27, Section 11], respectively.
2.3 Spectral decomposition of transfer operators
In what follows, let A_τ denote one of the transfer operators defined above, i.e., P_τ, T_τ, or K_τ. We are particularly interested in computing eigenvalues λ_ℓ(τ) ∈ ℂ and eigenfunctions ϕ_ℓ : X → ℂ of transfer operators, i.e.:

    A_τ ϕ_ℓ = λ_ℓ(τ) ϕ_ℓ.
Figure 2: Dominant eigenfunctions of the Ornstein–Uhlenbeck process computed analytically (dotted lines) and using VAC/EDMD (solid lines). a) Eigenfunctions of the Perron–Frobenius operator P_τ. b) Eigenfunctions of the Koopman operator K_τ.
Note that the eigenvalues depend on the lag time τ. For the sake of simplicity, we will often omit this dependency. The eigenvalues and eigenfunctions of transfer operators contain important information about the global properties of the system such as metastable sets or fast and slow processes and can also be used as reduced coordinates, see [31, 28, 10, 32, 29, 2] and references therein.
Example 2.6. The eigenvalues λ_ℓ and eigenfunctions ϕ_ℓ of K_τ associated with the Ornstein–Uhlenbeck process introduced in Example 2.1 are given by
    λ_ℓ(τ) = e^{−αD(ℓ−1)τ},    ϕ_ℓ(x) = (1/√((ℓ−1)!)) H_{ℓ−1}(√α x),    ℓ = 1, 2, ...,

where H_ℓ denotes the ℓth probabilists' Hermite polynomial [23]. The eigenfunctions of P_τ are given by the eigenfunctions of K_τ multiplied by the equilibrium density π, see also Figure 2. △
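Using NumPy's probabilists' Hermite polynomials, these eigenpairs can be checked against a quadrature discretization of K_τ. The parameters α = 4 and D = 0.25 are those of Figure 1; the grid, lag time, and tolerance are our own choices:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval   # probabilists' Hermite He_n
from math import factorial

alpha, D, tau = 4.0, 0.25, 0.5
x = np.linspace(-6.0, 6.0, 1201)
dx = x[1] - x[0]
sigma2 = (1.0 - np.exp(-2.0 * alpha * D * tau)) / alpha
# quadrature discretization of K_tau: P[i, j] = p_tau(x_i, x_j)
P = np.exp(-(x[None, :] - x[:, None] * np.exp(-alpha * D * tau))**2
           / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

for ell in (1, 2, 3):
    c = np.zeros(ell); c[-1] = 1.0                # select He_{ell-1}
    phi = hermeval(np.sqrt(alpha) * x, c) / np.sqrt(factorial(ell - 1))
    lam = np.exp(-alpha * D * (ell - 1) * tau)
    K_phi = P @ phi * dx                          # quadrature approximation of K_tau phi
    mid = np.abs(x) < 3                           # compare away from the grid boundary
    assert np.max(np.abs(K_phi[mid] - lam * phi[mid])) < 1e-6
```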
In addition to the eigenvalues and eigenfunctions, an essential part of the Koopman operator analysis is the set of Koopman modes for the so-called full-state observable g(x) = x. The Koopman modes are vectors that, together with the eigenvalues and eigenfunctions, allow us to reconstruct and to propagate the system's state [16]. More precisely, assume that each component g_i of the full-state observable, i.e., g_i(x) = x_i for i = 1, ..., d, can be written in terms of the eigenfunctions as g_i(x) = Σ_ℓ ϕ_ℓ(x) η_{iℓ}. Defining the Koopman modes by η_ℓ = [η_{1ℓ}, ..., η_{dℓ}]^T, we obtain g(x) = x = Σ_ℓ ϕ_ℓ(x) η_ℓ and thus

    K_τ g(x) = E[g(X_τ) | X_0 = x] = E[X_τ | X_0 = x] = Σ_ℓ λ_ℓ(τ) ϕ_ℓ(x) η_ℓ.    (3)

For vector-valued functions, the Koopman operator is defined to act componentwise. In order to be able to compute eigenvalues, eigenfunctions, and eigenmodes numerically, we project the infinite-dimensional operators onto finite-dimensional spaces spanned by a given set of basis functions. This will be described in detail in Section 4.
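For the Ornstein–Uhlenbeck process the mode decomposition collapses to a single term: since ϕ_2(x) = √α x, the full-state observable is x = α^{−1/2} ϕ_2(x) with mode η_2 = α^{−1/2}, so (3) reduces to E[X_τ | X_0 = x] = λ_2(τ) x. A Monte Carlo sketch using an Euler–Maruyama discretization (step sizes, sample count, and starting point are our own choices):

```python
import numpy as np

alpha, D, tau = 4.0, 0.25, 0.5
x0 = 1.7
# equation (3) for the OU process: E[X_tau | X_0 = x] = lambda_2(tau) * x
predicted = np.exp(-alpha * D * tau) * x0

# Monte Carlo estimate of the conditional expectation via Euler-Maruyama
rng = np.random.default_rng(0)
n_steps, n_samples = 500, 100_000
dt = tau / n_steps
X = np.full(n_samples, x0)
for _ in range(n_steps):
    X += -alpha * D * X * dt + np.sqrt(2.0 * D * dt) * rng.normal(size=n_samples)

assert abs(X.mean() - predicted) < 0.01
```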
2.4 Reversibility

We briefly recapitulate the properties of reversible systems. For many applications, including commonly used molecular dynamics models, the dynamics in full phase space are known to be reversible.
Definition 2.7. A system is said to be reversible if the so-called detailed balance condition is fulfilled, i.e., it holds for all x, y ∈ X that
    π(x) p_τ(x, y) = π(y) p_τ(y, x).    (4)
Example 2.8. The Ornstein–Uhlenbeck process is reversible. It is straightforward to verify that (4) is fulfilled by the transition density (1) and the stationary density (2) for all values of x, y, and τ. Also, general Smoluchowski equations of a d-dimensional system of the form

    dX_t = −D ∇V(X_t) dt + √(2D) dW_t

with dimensionless potential V(x) are reversible. The stationary density is then given by π ∝ exp(−V(x)) [33]. △
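The detailed balance condition (4) for the Ornstein–Uhlenbeck process can be spot-checked numerically using (1) and (2); the parameter values and test points below are our own choices:

```python
import numpy as np

alpha, D, tau = 4.0, 0.25, 0.5

def p_tau(x, y):
    # transition density of equation (1)
    sigma2 = (1.0 - np.exp(-2.0 * alpha * D * tau)) / alpha
    return np.exp(-(y - x * np.exp(-alpha * D * tau))**2 / (2.0 * sigma2)) \
        / np.sqrt(2.0 * np.pi * sigma2)

def pi(x):
    # invariant density of equation (2), variance 1/alpha
    return np.exp(-alpha * x**2 / 2.0) * np.sqrt(alpha / (2.0 * np.pi))

x, y = 0.7, -1.3
# detailed balance: pi(x) p_tau(x, y) = pi(y) p_tau(y, x)
assert abs(pi(x) * p_tau(x, y) - pi(y) * p_tau(y, x)) < 1e-12
```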
As a result of the detailed balance condition, the Koopman operator K_τ and the Perron–Frobenius operator with respect to the equilibrium density, T_τ, are identical (hence also self-adjoint):
    K_τ f = ∫_X p_τ(x, y) f(y) dy = ∫_X (π(y)/π(x)) p_τ(y, x) f(y) dy = T_τ f.
Moreover, both Kτ and Pτ become self-adjoint with respect to the stationary density, i.e.
    ⟨P_τ f, g⟩_{π^{−1}} = ⟨f, P_τ g⟩_{π^{−1}},    ⟨K_τ f, g⟩_π = ⟨f, K_τ g⟩_π.
Hence, the eigenvalues λ_ℓ are real and the eigenfunctions ϕ_ℓ of K_τ form an orthogonal basis with respect to ⟨·, ·⟩_π. That is, the eigenfunctions can be scaled so that ⟨ϕ_ℓ, ϕ_{ℓ′}⟩_π = δ_{ℓℓ′}. Furthermore, the leading eigenvalue λ_1 is the only eigenvalue with absolute value 1 and we obtain

    1 = λ_1 > λ_2 ≥ λ_3 ≥ ...,

see, e.g., [14]. We can then expand a function f ∈ L^2_π(X) in terms of the eigenfunctions as f = Σ_{ℓ=1}^∞ ⟨f, ϕ_ℓ⟩_π ϕ_ℓ such that
    K_τ f = Σ_{ℓ=1}^∞ λ_ℓ(τ) ⟨f, ϕ_ℓ⟩_π ϕ_ℓ.    (5)
Furthermore, the eigenvalues decay exponentially with λ_ℓ(τ) = exp(−κ_ℓ τ), with relaxation rate κ_ℓ and relaxation timescale t_ℓ = κ_ℓ^{−1}. Thus, for a sufficiently large lag time τ, the fast relaxation processes have decayed and (5) can be approximated by finitely many terms. The propagator P_τ has the same eigenvalues and the eigenfunctions ϕ̃_ℓ are given by ϕ̃_ℓ(x) = π(x) ϕ_ℓ(x).
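The relation between eigenvalues and relaxation timescales is easily made concrete for the Ornstein–Uhlenbeck process, where κ_ℓ = αD(ℓ−1) by Example 2.6, so t_ℓ = −τ / ln λ_ℓ(τ) is independent of the lag time. A short sketch (parameter values as in Figure 1):

```python
import numpy as np

alpha, D, tau = 4.0, 0.25, 0.5
# OU eigenvalues lambda_ell(tau) = exp(-alpha*D*(ell-1)*tau) for ell = 2, 3, 4
lams = np.exp(-alpha * D * np.arange(1, 4) * tau)
# implied relaxation timescales t_ell = kappa_ell^{-1} = -tau / ln(lambda_ell)
timescales = -tau / np.log(lams)
assert np.allclose(timescales, 1.0 / (alpha * D * np.arange(1, 4)))
```

In data-driven settings the same formula is used in reverse: estimated eigenvalues of a transfer operator approximation are converted to implied timescales to judge which processes are slow.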
3 Data-driven approximation of transfer operators
In this section, we will describe different data-driven methods to identify the dominant dynamics of dynamical systems and to compute eigenfunctions of transfer operators associated with the system, namely TICA and DMD as well as VAC and EDMD. A formal derivation of methods to compute finite-dimensional approximations of transfer operators – resulting in the aforementioned methods – will be given in Section 4. Although TICA can be regarded as a special case of VAC, and DMD as a special case of EDMD, these methods are often used in different settings. With the aid of TICA, for instance, it is possible to identify the main slow coordinates and to project the dynamics onto the resulting reduced space, which can then be discretized using conventional Markov state models (a special case of VAC or EDMD, respectively, see Subsection 3.5). We will introduce the original methods – TICA and DMD – first and then extend these methods to the more general case. Since many publications use a different notation, we will start with the required basic definitions.

In what follows, let x_i, y_i ∈ ℝ^d, i = 1, ..., m, be a set of pairs of d-dimensional data vectors, where x_i = X_{t_i} and y_i = X_{t_i + τ}. Here, the underlying dynamical system is not necessarily known; the vectors x_i and y_i can simply be measurement data or data from a black-box simulation. In matrix form, this can be written as

    X = [x_1  x_2  ···  x_m]    and    Y = [y_1  y_2  ···  y_m],    (6)
with X, Y ∈ ℝ^{d×m}. If one long trajectory {z_0, z_1, z_2, ...} of a dynamical system is given, i.e., z_i = X_{t_0 + i h}, where h is the step size and τ = n_τ h the lag time, we obtain

    X = [z_0  z_1  ···  z_{m−1}]    and    Y = [z_{n_τ}  z_{n_τ+1}  ···  z_{n_τ+m−1}].
That is, in this case Y is simply X shifted by the lag time τ. Naturally, if more than one trajectory is given, the data matrices X and Y can be concatenated.

In addition to the data, VAC and EDMD require a set of uniformly bounded basis functions or observables, given by {ψ_1, ψ_2, ..., ψ_k} ⊂ L^∞(X). Since X is assumed to be bounded, we have ψ_i ∈ L^r(X) for all i = 1, ..., k and 1 ≤ r ≤ ∞. The basis functions could, for instance, be monomials, indicator functions, radial basis functions, or trigonometric functions. The optimal choice of basis functions remains an open problem and depends strongly on the system. If the set of basis functions is not sufficient to represent the eigenfunctions, the results will be inaccurate. A too large set of basis functions, on the other hand, might lead to ill-conditioned matrices and overfitting. Cross-validation strategies have been developed to detect overfitting [34].

For a basis ψ_i, i = 1, ..., k, define ψ : ℝ^d → ℝ^k to be the vector-valued function given by
    ψ(x) = [ψ_1(x), ψ_2(x), ..., ψ_k(x)]^T.    (7)
The goal then is to find the best approximation of a given transfer operator in the space spanned by these basis functions. This will be explained in detail in Section 4. In addition to the data matrices X and Y, we will need the transformed data matrices

    Ψ_X = [ψ(x_1)  ψ(x_2)  ···  ψ(x_m)]    and    Ψ_Y = [ψ(y_1)  ψ(y_2)  ···  ψ(y_m)].    (8)
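The constructions in this section can be sketched in a few lines of NumPy: building the time-lagged data matrices X and Y of (6) from a single trajectory, and assembling the transformed matrices Ψ_X and Ψ_Y of (8). The toy trajectory, the lag n_τ, and the monomial basis are our own illustrative choices:

```python
import numpy as np

def time_lagged_data(traj, n_tau):
    """Split one long trajectory (d x T array) into the data matrices
    X = [z_0 ... z_{m-1}] and Y = [z_{n_tau} ... z_{n_tau+m-1}] of eq. (6)."""
    return traj[:, :-n_tau], traj[:, n_tau:]

def psi(x):
    """Vector-valued basis function psi of eq. (7); here monomials
    [1, x, x^2] for one-dimensional data (one possible choice)."""
    return np.vstack([np.ones_like(x), x, x**2])

traj = np.sin(0.1 * np.arange(100.0))[None, :]   # toy trajectory, d = 1, T = 100
X, Y = time_lagged_data(traj, 5)
Psi_X, Psi_Y = psi(X[0]), psi(Y[0])              # k x m matrices of eq. (8)

assert X.shape == (1, 95) and Psi_X.shape == (3, 95)
assert np.allclose(Y[:, 0], traj[:, 5])          # Y is X shifted by the lag
```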
3.1 Time-lagged independent component analysis

Time-lagged independent component analysis (TICA) was introduced in [3] as a solution to the blind source separation problem, where the correlation matrix and the time-delayed correlation matrix are used to separate superimposed signals. The term TICA was introduced later [35]. TDSEP [36], an extension of TICA, is popular in the machine learning community. It was shown only recently that TICA is a special case of VAC, computing the optimal linear projection for approximating the slowest relaxation processes, and as such provides an approximation of the leading eigenvalues and eigenfunctions of transfer operators [1]. TICA is now a popular dimension reduction technique in the field of molecular dynamics [1, 4]. That is, TICA is used as a preprocessing step to reduce the size of the state space by projecting the dynamics onto the main coordinates. The time-lagged independent components are required (a) to be uncorrelated and (b) to maximize the autocovariances at lag time τ, see [35, 1] for more details. Assuming that the system is reversible, the TICA coordinates are the eigenfunctions of T_τ or K_τ, respectively, projected onto the space spanned by linear basis functions, i.e., ψ(x) = x. Let C(τ) be the time-lagged covariance matrix defined by
    C_{ij}(τ) = ⟨X_{t,i} X_{t+τ,j}⟩_t = E_π[X_{t,i} X_{t+τ,j}].
Given data X and Y as defined above, estimators C0 and Cτ for the covariance matrices C(0) and C(τ) can be computed as
    C_0 = 1/(m−1) Σ_{k=1}^m x_k x_k^T = 1/(m−1) X X^T,
    C_τ = 1/(m−1) Σ_{k=1}^m x_k y_k^T = 1/(m−1) X Y^T.    (9)

The time-lagged independent components are defined to be solutions of the eigenvalue problem

    C_τ ξ_ℓ = λ_ℓ C_0 ξ_ℓ    or    C_0^+ C_τ ξ_ℓ = λ_ℓ ξ_ℓ,    (10)

respectively. In what follows, let M_TICA = C_0^+ C_τ, where C_0^+ denotes the Moore–Penrose pseudoinverse of C_0. In applications, often the symmetrized estimators
1 T T 1 T T C0 = 2m−2 (XX + YY ) and Cτ = 2m−2 (XY + YX ) are used so that the resulting TICA coordinates become real-valued. This corresponds to averaging over the trajectory and the time-reversed trajectory. Note that this symmetriza- tion can introduce a large estimator bias that affects the dominant spectrum of (10), if the process is non-stationary, or the distribution of the data is far from the equilibrium of the process. In the latter case, a reweighting procedure can be applied to obtain weighted versions of the estimators (9), to reduce that bias [30]. Example 3.1. Let us illustrate the idea behind TICA with a simple example. Consider the data shown in Figure3, which was generated by a stochastic process which will typically spend a long time in one of the two clusters before it jumps to the other. We are interested
in finding these metastable sets. Standard principal component analysis (PCA) leads to the coordinate shown in red, whereas TICA – shown in black – takes time information into account and is thus able to identify the slow direction of the system correctly. Projecting the system onto the x-coordinate will preserve the slow process while eliminating the fast stochastic noise. △
Figure 3: The difference between PCA and TICA. The top and bottom plots show the x- and y-components of the system, respectively; the plot on the right shows the resulting main principal component vector and the main TICA coordinate.
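In the same spirit as Example 3.1, the covariance estimators (9) and the eigenvalue problem (10) can be applied to a synthetic two-dimensional process with one slow and one fast direction. The toy process below (two independent AR(1) signals) is our own stand-in for the clustered data of Figure 3; the dominant TICA eigenvalue recovers the autocorrelation of the slow coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)
m, rho_slow, rho_fast = 20_000, 0.99, 0.1
Z = np.zeros((2, m))
for k in range(1, m):                 # slow and fast AR(1) components
    Z[0, k] = rho_slow * Z[0, k - 1] + rng.normal()
    Z[1, k] = rho_fast * Z[1, k - 1] + rng.normal()
X, Y = Z[:, :-1], Z[:, 1:]            # lag n_tau = 1

C0 = X @ X.T / (X.shape[1] - 1)       # estimators of eq. (9)
Ct = X @ Y.T / (X.shape[1] - 1)
M_tica = np.linalg.pinv(C0) @ Ct      # eq. (10) in its second form
eigvals, eigvecs = np.linalg.eig(M_tica)
order = np.argsort(-eigvals.real)

# the dominant eigenvalue matches the slow autocorrelation rho_slow
assert abs(eigvals.real[order[0]] - rho_slow) < 0.05
```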
Algorithm 1: AMUSE algorithm to compute TICA.
1. Compute a reduced SVD of X, i.e., X = U Σ V^T.
2. Whiten the data: X̃ = Σ^{−1} U^T X and Ỹ = Σ^{−1} U^T Y.
3. Compute M̃_TICA = X̃ Ỹ^T = Σ^{−1} U^T X Y^T U Σ^{−1}.
4. Solve the eigenvalue problem M̃_TICA w_ℓ = λ_ℓ w_ℓ.
5. The TICA coordinates are then given by ξ_ℓ = U Σ^{−1} w_ℓ.
The TICA coordinates can be computed using AMUSE⁵ [37] as shown in Algorithm 1. Instead of computing a singular value decomposition of the data matrix X in step 1, an eigenvalue decomposition of the covariance matrix X X^T could be computed, which is more efficient if m ≫ d, but less accurate. The vectors ξ_ℓ computed by AMUSE are solutions of
⁵ Algorithm for Multiple Unknown Signals Extraction.
the eigenvalue problem (10), since