
Bridging Data Science and Dynamical Systems Theory

Tyrus Berry, Dimitrios Giannakis, and John Harlim

Modern science is undergoing what might arguably be called a "data revolution," manifested by a rapid growth of observed and simulated data from complex systems, as well as vigorous research on mathematical and computational frameworks for data analysis. In many scientific branches, these efforts have led to the creation of statistical models of complex systems that match or exceed the skill of first-principles models. Yet, despite these successes, statistical models are oftentimes treated as black boxes, providing limited guarantees about stability and convergence as the amount of training data increases. Black-box models also offer limited insights about the operating mechanisms, the understanding of which is central to the advancement of science.

In this short review, we describe mathematical techniques for statistical analysis and prediction of time-evolving phenomena, ranging from simple examples such as an oscillator, to highly complex systems such as the turbulent motion of the Earth's atmosphere, the folding of proteins, and the evolution of species populations in an ecosystem. Our main thesis is that combining ideas from the theory of dynamical systems with learning theory provides an effective route to data-driven models of complex systems, with refinable predictions as the amount of training data increases, and physical interpretability through discovery of coherent patterns around which the dynamics is organized. Our article thus serves as an invitation to explore ideas at the interface of the two fields.

This is a vast subject, and invariably a number of important developments in areas such as deep learning, reservoir computing, control, and nonautonomous/stochastic systems are not discussed here.¹ Our focus will be on topics drawn from the authors' research and related work.

Tyrus Berry is an assistant professor of mathematics at George Mason University. His email address is [email protected].
Dimitrios Giannakis is an associate professor of mathematics at New York University. His email address is [email protected].
John Harlim is a professor of mathematics and meteorology, and Faculty Fellow of the Institute for Computational and Data Sciences, at the Pennsylvania State University. His email address is [email protected].
Communicated by Notices Associate Editor Reza Malek-Madani.
For permission to reprint this article, please contact: [email protected].
DOI: https://doi.org/10.1090/noti2151
¹See https://arxiv.org/abs/2002.07928 for a version of this article with references to the literature on these topics.

Statistical Forecasting and Coherent Pattern Extraction

Consider a dynamical system of the form Φ^t ∶ Ω → Ω, where Ω is the state space and Φ^t, t ∈ ℝ, the flow map. For example, Ω could be Euclidean space ℝ^d, or a more general manifold, and Φ^t the solution map for a system of ODEs defined on Ω.

1336 NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY VOLUME 67, NUMBER 9

Alternatively, in a PDE setting, Ω could be an infinite-dimensional function space and Φ^t an evolution group acting on it. We consider that Ω has the structure of a metric space equipped with its Borel σ-algebra, playing the role of an event space, with measurable functions on Ω acting as random variables, called observables.

In a statistical modeling scenario, we consider that available to us are time series of various such observables, sampled along a dynamical trajectory which we will treat as being unknown. Specifically, we assume that we have access to two observables, X ∶ Ω → 𝒳 and Y ∶ Ω → 𝒴, respectively referred to as covariate and response functions, together with corresponding time series x_0, x_1, …, x_{N−1} and y_0, y_1, …, y_{N−1}, where x_n = X(ω_n), y_n = Y(ω_n), and ω_n = Φ^{n Δt}(ω_0). Here, 𝒳 and 𝒴 are metric spaces, Δt is a positive sampling interval, and ω_0 is an arbitrary point in Ω initializing the trajectory. We shall refer to the collection {(x_0, y_0), …, (x_{N−1}, y_{N−1})} as the training data. We require that 𝒴 be a Banach space (so that one can talk about expectations and other functionals applied to Y), but allow the covariate space 𝒳 to be nonlinear.

Many problems in statistical modeling of dynamical systems can be expressed in this framework. For instance, in a low-dimensional ODE setting, X and Y could both be the identity map on Ω = ℝ^d, and the task could be to build a model for the evolution of the full system state. Weather forecasting is a classical high-dimensional application, where Ω is the abstract state space of the climate system, and X a (highly noninvertible) map representing measurements from satellites, meteorological stations, and other sensors available to a forecaster. The response Y could be temperature at a specific location, 𝒴 = ℝ, illustrating that the response space may be of considerably lower dimension than the covariate space. In other cases, e.g., forecasting the temperature field over a geographical region, 𝒴 may be a function space. The two primary questions that will concern us here are:

Problem 1 (Statistical forecasting). Given the training data, construct ("learn") a function Z_t ∶ 𝒳 → 𝒴 that predicts Y at a lead time t ≥ 0. That is, Z_t should have the property that Z_t ∘ X is closest to Y ∘ Φ^t among all functions in a suitable class.

Problem 2 (Coherent pattern extraction). Given the training data, identify a collection of observables z_j ∶ Ω → 𝒴 that have the property of evolving coherently under the dynamics. By that, we mean that z_j ∘ Φ^t should be relatable to z_j in a natural way.

These problems have an extensive history of study from an interdisciplinary perspective spanning mathematics, statistics, physics, and many other fields. Here, our focus will be on nonparametric methods, which do not employ explicit parametric models for the dynamics. Instead, they use universal structural properties of dynamical systems to inform the design of data analysis techniques. From a learning standpoint, Problems 1 and 2 can be thought of as supervised and unsupervised learning, respectively. A mathematical requirement we will impose on methods addressing either problem is that they have a well-defined notion of convergence, i.e., they are refinable, as the number N of training samples increases.

Analog and POD Approaches

Among the earliest examples of nonparametric forecasting techniques is Lorenz's analog method [Lor69]. This simple, elegant approach makes predictions by tracking the evolution of the response along a dynamical trajectory in the training data (the analogs). Good analogs are selected according to a measure of geometrical similarity between the covariate variable observed at forecast initialization and the covariate training data. This method posits that past behavior of the system is representative of its future behavior, so looking up states in a historical record that are closest to current observations is likely to yield a skillful forecast. Subsequent methodologies have also emphasized aspects of state space geometry, e.g., using the training data to approximate the evolution map through patched local linear models, often leveraging delay coordinates for state space reconstruction.

Early approaches to coherent pattern extraction include the proper orthogonal decomposition (POD), which is closely related to principal component analysis (PCA, introduced in the early twentieth century by Pearson), the Karhunen–Loève expansion, and empirical orthogonal function (EOF) analysis. Assuming that 𝒴 is a Hilbert space, POD yields an expansion Y ≈ Y_L = ∑_{j=1}^L z_j, z_j = u_j σ_j ψ_j. Arranging the data into a matrix 𝐘 = (y_0, …, y_{N−1}), the σ_j are the singular values of 𝐘 (in decreasing order), the u_j are the corresponding left singular vectors, called EOFs, and the ψ_j are given by projections of Y onto the EOFs, ψ_j(ω) = ⟨u_j, Y(ω)⟩_𝒴. That is, the principal component ψ_j ∶ Ω → ℝ is a linear feature characterizing the unsupervised data {y_0, …, y_{N−1}}. If the data is drawn from a probability measure μ, as N → ∞ the POD expansion is optimal in an L²(μ) sense; that is, Y_L has minimal L²(μ) error ‖Y − Y_L‖_{L²(μ)} among all rank-L approximations of Y. Effectively, from the perspective of POD, the important components of Y are those capturing maximal variance.

Despite many successes in challenging applications (e.g., turbulence), it has been recognized that POD may not reveal dynamically significant observables, offering limited predictability and physical insight. In recent years, there has been significant interest in techniques that address this shortcoming by modifying the linear map 𝐘 to have an explicit dependence on the dynamics [BK86], or replacing it by an evolution operator [DJ99, Mez05].
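As a minimal illustration of the POD construction just described, the following Python sketch recovers the leading EOF of a synthetic two-dimensional dataset stretched along the direction (1, 1)/√2. The dataset and all constants are hypothetical choices made so that the answer is known in closed form; the 2×2 eigendecomposition stands in for the SVD of 𝐘.

```python
import math

# Synthetic 2D response samples y_n, stretched along (1, 1)/sqrt(2);
# POD/PCA should recover that direction as the leading EOF u_1.
# (All names and constants here are illustrative, not from the text.)
N = 500
ys = []
for n in range(N):
    t = 2 * math.pi * n / N
    a, b = 3.0 * math.cos(t), 0.3 * math.sin(t)   # large/small amplitude components
    ys.append(((a + b) / math.sqrt(2), (a - b) / math.sqrt(2)))

# Second-moment matrix C = Y Y^T / N (the data has zero mean).
cxx = sum(y[0] * y[0] for y in ys) / N
cxy = sum(y[0] * y[1] for y in ys) / N
cyy = sum(y[1] * y[1] for y in ys) / N

# Leading eigenvector of the symmetric 2x2 matrix via the half-angle formula;
# for this dataset (cxx == cyy) the corresponding variance is cxx + cxy.
theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
u1 = (math.cos(theta), math.sin(theta))

# Alignment of the leading EOF with (1, 1)/sqrt(2); should be close to 1.
alignment = abs((u1[0] + u1[1]) / math.sqrt(2))
```

The principal components ψ_j would then be obtained by projecting each sample onto u1, exactly as in the expansion above.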

Either directly or indirectly, these methods make use of operator-theoretic ergodic theory, which we now discuss.

Operator-Theoretic Formulation

The operator-theoretic formulation of dynamical systems theory shifts attention from the state-space perspective, and instead characterizes the dynamics through its action on linear spaces of observables. Denoting the vector space of 𝒴-valued functions on Ω by ℱ, for every time t the dynamics has a natural induced action U^t ∶ ℱ → ℱ given by composition with the flow map, U^t f = f ∘ Φ^t. It then follows by definition that U^t is a linear operator; i.e., U^t(αf + g) = αU^t f + U^t g for all observables f, g ∈ ℱ and every scalar α ∈ ℂ. The operator U^t is known as a composition operator, or Koopman operator after classical work of Bernard Koopman in the 1930s [Koo31], which established that a general (potentially nonlinear) dynamical system can be characterized through intrinsically linear operators acting on spaces of observables. A related notion is that of the transfer operator, P^t ∶ ℳ → ℳ, which describes the action of the dynamics on a space of measures ℳ via the pushforward map, P^t m ∶= Φ^t_* m = m ∘ Φ^{−t}. In a number of cases, ℱ and ℳ are dual spaces to one another (e.g., continuous functions and Radon measures), in which case U^t and P^t are dual operators.

If the space of observables under consideration is equipped with a Banach or Hilbert space structure, and the dynamics preserves that structure, the operator-theoretic formulation allows a broad range of tools from spectral theory and approximation theory for linear operators to be employed in the study of dynamical systems. For our purposes, a particularly advantageous aspect of this approach is that it is amenable to rigorous statistical approximation, which is one of our principal objectives. It should be kept in mind that the spaces of observables encountered in applications are generally infinite-dimensional, leading to behaviors with no counterparts in finite-dimensional linear algebra, such as unbounded operators and continuous spectrum. In fact, as we will see below, the presence of continuous spectrum is a hallmark of mixing (chaotic) dynamics.

In this review, we restrict attention to the operator-theoretic description of measure-preserving, ergodic dynamics. By that, we mean that there is a probability measure μ on Ω such that (i) μ is invariant under the flow, i.e., Φ^t_* μ = μ; and (ii) every measurable, Φ^t-invariant set has either zero or full μ-measure. We also assume that μ is a Borel measure with compact support A ⊆ Ω; this set is necessarily Φ^t-invariant. An example known to rigorously satisfy these properties is the Lorenz 63 (L63) system on Ω = ℝ³, which has a compactly supported, ergodic invariant measure supported on the famous "butterfly" attractor; see Figure 1. L63 exemplifies the fact that a smooth dynamical system may exhibit invariant measures with nonsmooth supports. This behavior is ubiquitous in models of physical phenomena, which are formulated in terms of smooth differential equations, but whose long-term dynamics concentrate on lower-dimensional subsets of state space due to the presence of dissipation. Our methods should therefore not rely on the existence of a smooth structure for A.

In the setting of ergodic, measure-preserving dynamics on a metric space, two relevant structures that the dynamics may be required to preserve are continuity and μ-measurability of observables. If the flow Φ^t is continuous, then the Koopman operators act on the Banach space ℱ = C(A, 𝒴) of continuous, 𝒴-valued functions on A, equipped with the uniform norm, by isometries, i.e., ‖U^t f‖_ℱ = ‖f‖_ℱ. If Φ^t is μ-measurable, then U^t lifts to an operator on equivalence classes of 𝒴-valued functions in L^p(μ, 𝒴), 1 ≤ p ≤ ∞, and acts again by isometries. If 𝒴 is a Hilbert space (with inner product ⟨⋅, ⋅⟩_𝒴), the case p = 2 is special, since L²(μ, 𝒴) is a Hilbert space with inner product ⟨f, g⟩_{L²(μ,𝒴)} = ∫_Ω ⟨f(ω), g(ω)⟩_𝒴 dμ(ω), on which U^t acts as a unitary map, U^{t∗} = U^{−t}.

Clearly, the properties of approximation techniques for observables and evolution operators depend on the underlying space. For instance, C(A, 𝒴) has a well-defined notion of pointwise evaluation at every ω ∈ Ω by a continuous linear map δ_ω ∶ C(A, 𝒴) → 𝒴, δ_ω f = f(ω), which is useful for interpolation and forecasting, but lacks an inner-product structure and associated orthogonal projections. On the other hand, L²(μ) has inner-product structure, which is very useful theoretically as well as for numerical algorithms, but lacks the notion of pointwise evaluation.

Letting ℱ stand for any of the C(A, 𝒴) or L^p(μ, 𝒴) spaces, the family U = {U^t ∶ ℱ → ℱ}_{t∈ℝ} forms a strongly continuous group under composition of operators. That is, U^t ∘ U^s = U^{t+s}, (U^t)^{−1} = U^{−t}, and U^0 = Id, so that U is a group, and for every f ∈ ℱ, U^t f converges to f in the norm of ℱ as t → 0. A central notion in such evolution groups is that of the generator, defined by the ℱ-norm limit V f = lim_{t→0} (U^t f − f)/t for all f ∈ ℱ for which the limit exists. It can be shown that the domain D(V) of all such f is a dense subspace of ℱ, and V ∶ D(V) → ℱ is a closed, unbounded operator. Intuitively, V can be thought of as a directional derivative of observables along the dynamics. For example, if 𝒴 = ℂ, A is a C¹ manifold, and the flow Φ^t ∶ A → A is generated by a continuous vector field V⃗ ∶ A → TA, then the generator of the Koopman group on C(A) has as its domain the space C¹(A) ⊂ C(A) of continuously differentiable, complex-valued functions, and V f = V⃗ ⋅ ∇f for f ∈ C¹(A).
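For a concrete instance of these definitions, consider the rotation flow Φ^t(ω) = ω + at on the circle (a hypothetical example system; the frequency and evaluation points below are arbitrary). The sketch checks the composition-operator definition of U^t and the finite-difference limit defining the generator, for which z(ω) = e^{2πiω} is a Koopman eigenfunction with V z = 2πi a z:

```python
import cmath
import math

# Circle rotation Phi^t(w) = w + a*t (mod 1): a hypothetical example system.
a = math.sqrt(2)                      # rotation frequency
alpha = 2j * math.pi * a              # generator eigenvalue for z below

def flow(w, t):                       # flow map Phi^t on the circle [0, 1)
    return (w + a * t) % 1.0

def U(t, f):                          # Koopman operator U^t f = f o Phi^t
    return lambda w: f(flow(w, t))

def z(w):                             # candidate Koopman eigenfunction
    return cmath.exp(2j * math.pi * w)

w0, t = 0.123, 0.7
z0 = z(w0)

# Eigenvalue relation: U^t z = e^{alpha t} z
lhs = U(t, z)(w0)
rhs = cmath.exp(alpha * t) * z0

# Finite-difference generator: (U^eps z - z)/eps -> V z = alpha * z as eps -> 0
eps = 1e-6
Vz = (U(eps, z)(w0) - z0) / eps
gen_error = abs(Vz - alpha * z0)
```

Here the linearity of U^t is manifest even though the underlying flow need not be linear; the finite-difference quotient mirrors the norm limit defining V.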

A strongly continuous evolution group is completely characterized by its generator, as any two such groups with the same generator are identical.

The generator acquires additional properties in the setting of unitary evolution groups on H = L²(μ, 𝒴), where it is skew-adjoint, V* = −V. Note that the skew-adjointness of V holds for more general measure-preserving dynamics than Hamiltonian systems, whose generator is skew-adjoint with respect to Lebesgue measure. By the spectral theorem for skew-adjoint operators, there exists a unique projection-valued measure E ∶ ℬ(ℝ) → B(H), giving the generator and Koopman operator as the spectral integrals

  V = ∫_ℝ iα dE(α),  U^t = e^{tV} = ∫_ℝ e^{iαt} dE(α).

Here, ℬ(ℝ) is the Borel σ-algebra on the real line, and B(H) the space of bounded operators on H. Intuitively, E can be thought of as an operator analog of a complex-valued spectral measure in Fourier analysis, with ℝ playing the role of frequency space. That is, given f ∈ H, the ℂ-valued Borel measure E_f(S) = ⟨f, E(S)f⟩_H is precisely the Fourier spectral measure associated with the time-autocorrelation function C_f(t) = ⟨f, U^t f⟩_H. The latter admits the Fourier representation C_f(t) = ∫_ℝ e^{iαt} dE_f(α).

The Hilbert space H admits a U^t-invariant splitting H = H_a ⊕ H_c into orthogonal subspaces H_a and H_c associated with the point and continuous components of E, respectively. In particular, E has a unique decomposition E = E_a + E_c with H_a = ran E_a(ℝ) and H_c = ran E_c(ℝ), where E_a is a purely atomic spectral measure, and E_c is a spectral measure with no atoms. The atoms of E_a (i.e., the singletons {α_j} with E_a({α_j}) ≠ 0) correspond to eigenfrequencies of the generator, for which the eigenvalue equation V z_j = iα_j z_j has a nonzero solution z_j ∈ H_a. Under ergodic dynamics, every eigenspace of V is one-dimensional, so that if z_j is normalized to unit L²(μ) norm, E({α_j})f = ⟨z_j, f⟩_{L²(μ)} z_j. Every such z_j is an eigenfunction of the Koopman operator U^t at eigenvalue e^{iα_j t}, and {z_j} is an orthonormal basis of H_a. Thus, every f ∈ H_a has the quasiperiodic evolution U^t f = ∑_j e^{iα_j t} ⟨z_j, f⟩_{L²(μ)} z_j, and the autocorrelation C_f(t) is also quasiperiodic. While H_a always contains constant eigenfunctions with zero frequency, it might not have any nonconstant elements. In that case, the dynamics is said to be weak-mixing. In contrast to the quasiperiodic evolution of observables in H_a, observables in the continuous spectrum subspace exhibit a loss of correlation characteristic of mixing (chaotic) dynamics. Specifically, for every f ∈ H_c the time-averaged autocorrelation function C̄_f(t) = ∫_0^t |C_f(s)| ds / t tends to 0 as |t| → ∞, as do cross-correlation functions ⟨g, U^t f⟩_{L²(μ)} between observables in H_c and arbitrary observables in L²(μ).

Data-Driven Forecasting

Based on the concepts introduced above, one can formulate statistical forecasting in Problem 1 as the task of constructing a function Z_t ∶ 𝒳 → 𝒴 on covariate space 𝒳, such that Z_t ∘ X optimally approximates U^t Y among all functions in a suitable class. We set 𝒴 = ℂ, so the response variable is scalar-valued, and consider the Koopman operator on L²(μ), so we have access to orthogonal projections. We also assume for now that the covariate function X is injective, so Ŷ_t ∶= Z_t ∘ X should be able to approximate U^t Y to arbitrarily high precision in L²(μ) norm. Indeed, let {u_0, u_1, …} be an orthonormal basis of L²(ν), where ν = X_*μ is the pushforward of the invariant measure onto 𝒳. Then, {φ_0, φ_1, …} with φ_j = u_j ∘ X is an orthonormal basis of L²(μ). Given this basis, and because U^t is bounded, we have U^t Y = lim_{L→∞} U^t_L Y, where the partial sum U^t_L Y ∶= ∑_{j=0}^{L−1} ⟨U^t Y, φ_j⟩_{L²(μ)} φ_j converges in L²(μ) norm. Here, U^t_L is a finite-rank map on L²(μ) with range span{φ_0, …, φ_{L−1}}, represented by an L × L matrix 𝐔(t) with elements U_ij(t) = ⟨φ_i, U^t φ_j⟩_{L²(μ)}. Defining y⃗ = (ŷ_0, …, ŷ_{L−1})^⊤ with ŷ_j = ⟨φ_j, Y⟩_{L²(μ)}, and (ẑ_0(t), …, ẑ_{L−1}(t))^⊤ = 𝐔(t) y⃗, we have U^t_L Y = ∑_{j=0}^{L−1} ẑ_j(t) φ_j. Since φ_j = u_j ∘ X, this leads to the estimator Ẑ_{t,L} ∈ L²(ν), with Ẑ_{t,L} = ∑_{j=0}^{L−1} ẑ_j(t) u_j.

The approach outlined above tentatively provides a consistent forecasting framework. Yet, while in principle appealing, it has three major shortcomings: (i) Apart from special cases, the invariant measure and an orthonormal basis of L²(μ) are not known. In particular, orthogonal functions with respect to an ambient measure on Ω (e.g., Lebesgue-orthogonal polynomials) will not suffice, since there are no guarantees that such functions form a Schauder basis of L²(μ), let alone be orthonormal. Even with a basis, we cannot evaluate U^t on its elements without knowing Φ^t. (ii) Pointwise evaluation on L²(μ) is not defined, making Ẑ_{t,L} inadequate in practice, even if the coefficients ẑ_j(t) are known. (iii) The covariate map X is oftentimes noninvertible, and thus the φ_j span a strict subspace of L²(μ). We now describe methods to overcome these obstacles using learning theory.

Sampling measures and ergodicity. The dynamical trajectory {ω_0, …, ω_{N−1}} in state space underlying the training data is the support of a discrete sampling measure μ_N ∶= ∑_{n=0}^{N−1} δ_{ω_n}/N. A key consequence of ergodicity is that for Lebesgue-a.e. sampling interval Δt and μ-a.e. starting point ω_0 ∈ Ω, as N → ∞, the sampling measures μ_N weak-converge to the invariant measure μ; that is,

  lim_{N→∞} ∫_Ω f dμ_N = ∫_Ω f dμ  ∀f ∈ C(Ω).  (1)
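The convergence (1) can be observed directly on a simple ergodic system. In the Python sketch below, an irrational rotation on the circle stands in for the dynamics (the system and all constants are illustrative), and the time average of a continuous observable over a sampled trajectory approaches its integral against the invariant (Lebesgue) measure:

```python
import math

# Irrational rotation on the circle: ergodic, with invariant measure mu = Lebesgue.
# (The system and all constants are illustrative stand-ins.)
a = math.sqrt(2)          # rotation frequency
dt = 0.05                 # sampling interval Delta t

def f(w):                 # a continuous observable with known mean 1/2
    return math.cos(2 * math.pi * w) ** 2

def time_average(N, w0=0.2):
    # integral of f against the sampling measure mu_N (an ergodic time average)
    s, w = 0.0, w0
    for _ in range(N):
        s += f(w)
        w = (w + a * dt) % 1.0        # advance one sampling step along the flow
    return s / N

space_average = 0.5       # integral of f with respect to mu
err_small = abs(time_average(500) - space_average)
err_large = abs(time_average(50000) - space_average)
```

As N grows the empirical average tightens around the μ-expectation, which is exactly the refinability property demanded of the methods in this review.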

Since integrals against μ_N are time averages on dynamical trajectories, i.e., ∫_Ω f dμ_N = ∑_{n=0}^{N−1} f(ω_n)/N, ergodicity provides an empirical means of accessing the statistics of the invariant measure. In fact, many systems encountered in applications possess so-called physical measures, where (1) holds for ω_0 in a "larger" set of positive measure with respect to an ambient measure (e.g., Lebesgue measure) from which experimental initial conditions are drawn. Hereafter, we will let M be a compact subset of Ω, which is forward-invariant under the dynamics (i.e., Φ^t(M) ⊆ M for all t ≥ 0), and contains A. For example, in dissipative dynamical systems such as L63, M can be chosen as a compact absorbing ball.

Shift operators. Ergodicity suggests that appropriate data-driven analogs of the L²(μ) spaces are the L²(μ_N) spaces induced by the sampling measures μ_N. For a given N, L²(μ_N) consists of equivalence classes of measurable functions f ∶ Ω → ℂ having common values at the sampled states ω_n, and the inner product of two elements f, g ∈ L²(μ_N) is given by an empirical time-correlation, ⟨f, g⟩_{μ_N} = ∫_Ω f* g dμ_N = ∑_{n=0}^{N−1} f*(ω_n) g(ω_n)/N. Moreover, if the ω_n are distinct (as we will assume for simplicity of exposition), L²(μ_N) has dimension N, and is isomorphic as a Hilbert space to ℂ^N equipped with a normalized dot product. Given that, we can represent every f ∈ L²(μ_N) by a column vector f⃗ = (f(ω_0), …, f(ω_{N−1}))^⊤ ∈ ℂ^N, and every linear map A ∶ L²(μ_N) → L²(μ_N) by an N × N matrix 𝐀, so that g⃗ = 𝐀 f⃗ is the column vector representing g = A f. The elements of f⃗ can also be understood as expansion coefficients in the standard basis {e_{0,N}, …, e_{N−1,N}} of L²(μ_N), where e_{j,N}(ω_n) = N^{1/2} δ_{jn}; that is, f(ω_n) = N^{1/2} ⟨e_{n,N}, f⟩_{L²(μ_N)}. Similarly, the elements of 𝐀 correspond to the operator matrix elements A_ij = ⟨e_{i,N}, A e_{j,N}⟩_{L²(μ_N)}.

Next, we would like to define a Koopman operator on L²(μ_N), but this space does not admit such an operator as a composition map induced by the dynamical flow Φ^t on Ω. This is because Φ^t does not preserve null sets with respect to μ_N, and thus does not lead to a well-defined composition map on equivalence classes of functions in L²(μ_N). Nevertheless, on L²(μ_N) there is an analogous construct to the Koopman operator on L²(μ), namely, the shift operator, U^q_N ∶ L²(μ_N) → L²(μ_N), q ∈ ℤ, defined as

  U^q_N f(ω_n) = f(ω_{n+q}) if 0 ≤ n + q ≤ N − 1, and 0 otherwise.

Even though U^q_N is not a composition map, intuitively it should have a connection with the Koopman operator U^{q Δt}. One could consider, for instance, the matrix representation 𝐔_N(q) = [⟨e_{i,N}, U^q_N e_{j,N}⟩_{L²(μ_N)}] in the standard basis, and attempt to connect it with a matrix representation of U^{q Δt} in an orthonormal basis of L²(μ). However, the issue with this approach is that the e_{j,N} do not have N → ∞ limits in L²(μ), meaning that there is no suitable notion of N → ∞ convergence of the matrix elements of U^q_N in the standard basis. In response, we will construct a representation of the shift operator in a different orthonormal basis with a well-defined N → ∞ limit. The main tools that we will use are kernel integral operators, which we now describe.

Kernel integral operators. In the present context, a kernel function will be a real-valued, continuous function k ∶ Ω × Ω → ℝ with the property that there exists a strictly positive, continuous function d ∶ Ω → ℝ such that

  d(ω) k(ω, ω′) = d(ω′) k(ω′, ω)  ∀ω, ω′ ∈ Ω.  (2)

Notice the similarity between (2) and the detailed balance relation in reversible Markov chains. Now let ρ be any Borel probability measure with compact support S ⊆ M included in the forward-invariant set M. It follows by continuity of k and compactness of S that the integral operator K_ρ ∶ L²(ρ) → C(M),

  K_ρ f = ∫_Ω k(⋅, ω) f(ω) dρ(ω),  (3)

is well-defined as a bounded operator mapping elements of L²(ρ) into continuous functions on M. Using ι_ρ ∶ C(M) → L²(ρ) to denote the canonical inclusion map, we consider two additional integral operators, G_ρ ∶ L²(ρ) → L²(ρ) and G̃_ρ ∶ C(M) → C(M), with G_ρ = ι_ρ K_ρ and G̃_ρ = K_ρ ι_ρ, respectively.

The operators G_ρ and G̃_ρ are compact operators acting with the same integral formula as K_ρ in (3), but their codomains and domains, respectively, are different. Nevertheless, their nonzero eigenvalues coincide, and φ ∈ L²(ρ) is an eigenfunction of G_ρ corresponding to a nonzero eigenvalue λ if and only if ϕ ∈ C(M) with ϕ = K_ρ φ/λ is an eigenfunction of G̃_ρ at the same eigenvalue. In effect, φ ↦ ϕ "interpolates" the L²(ρ) element φ (defined only up to ρ-null sets) to the continuous, everywhere-defined function ϕ. It can be verified that if (2) holds, G_ρ is a trace-class operator with real eigenvalues, |λ_0| ≥ |λ_1| ≥ ⋯ ↘ 0⁺. Moreover, there exists a Riesz basis {φ_0, φ_1, …} of L²(ρ) and a corresponding dual basis {φ′_0, φ′_1, …} with ⟨φ′_i, φ_j⟩_{L²(ρ)} = δ_ij, such that G_ρ φ_j = λ_j φ_j and G*_ρ φ′_j = λ_j φ′_j. We say that the kernel k is L²(ρ)-universal if G_ρ has no zero eigenvalues; this is equivalent to ran G_ρ being dense in L²(ρ). Moreover, k is said to be L²(ρ)-Markov if G_ρ is a Markov operator, i.e., G_ρ ≥ 0, G_ρ f ≥ 0 if f ≥ 0, and G_ρ 1 = 1.

Observe now that the operators G_{μ_N} associated with the sampling measures μ_N, henceforth abbreviated G_N, are represented by N × N kernel matrices 𝐆_N = [⟨e_{i,N}, G_N e_{j,N}⟩_{L²(μ_N)}] = [k(ω_i, ω_j)] in the standard basis of L²(μ_N). Further, if k is a pullback kernel from covariate space, i.e., k(ω, ω′) = κ(X(ω), X(ω′)) for κ ∶ 𝒳 × 𝒳 → ℝ, then 𝐆_N = [κ(x_i, x_j)] is empirically accessible from the training data.
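As a small data-driven illustration of these kernel constructions, the Python sketch below assembles a pullback Gaussian kernel matrix [κ(x_i, x_j)] from trajectory samples and applies a simple Markov normalization by the row sums; the circle trajectory, covariate map, and bandwidth are hypothetical stand-ins. After normalization each row sums to one, and the normalized kernel satisfies a relation of the form (2) with d given by the row sums:

```python
import math

# Trajectory samples on the circle stand in for states omega_n; the kernel is the
# pullback of a radial Gaussian kernel through the covariate map X.
# (All constants and names are illustrative.)
a, dt, N = math.sqrt(2), 0.05, 200
omegas = [(0.3 + a * dt * n) % 1.0 for n in range(N)]

def X(w):                                   # covariate map into R^2
    return (math.cos(2 * math.pi * w), math.sin(2 * math.pi * w))

eps = 0.1                                   # kernel bandwidth

def kappa(x, xp):                           # radial Gaussian kernel on covariate space
    return math.exp(-((x[0] - xp[0]) ** 2 + (x[1] - xp[1]) ** 2) / eps)

# Kernel matrix [kappa(x_i, x_j)], computable from training data alone.
K = [[kappa(X(wi), X(wj)) for wj in omegas] for wi in omegas]

# Markov normalization: divide each row by d(omega_i) = sum_j k(omega_i, omega_j).
d = [sum(row) for row in K]
P = [[K[i][j] / d[i] for j in range(N)] for i in range(N)]

row_sums = [sum(row) for row in P]
# Detailed-balance-like relation (2): d(w) k_N(w, w') = d(w') k_N(w', w),
# which holds exactly here because the unnormalized kernel is symmetric.
balance_residual = max(abs(d[i] * P[i][j] - d[j] * P[j][i])
                       for i in range(N) for j in range(N))
```

The row-normalized matrix is the discrete counterpart of a data-dependent Markov kernel k_N, with the row sums playing the role of the function d in (2).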

Popular kernels in applications include the covariance kernel κ(x, x′) = ⟨x, x′⟩_𝒳 on an inner-product space and the radial Gaussian kernel κ(x, x′) = e^{−‖x−x′‖²_𝒳/ε}. It is also common to employ Markov kernels constructed by normalization of symmetric kernels [CL06, BH16]. We will use k_N to denote kernels with data-dependent normalizations.

A widely used strategy for learning with integral operators [vLBB08] is to construct families of kernels k_N converging in C(M × M) norm to k. This implies that for every nonzero eigenvalue λ_j of G ≡ G_μ, the sequence of eigenvalues λ_{j,N} of G_N satisfies lim_{N→∞} λ_{j,N} = λ_j. Moreover, there exists a sequence of eigenfunctions φ_{j,N} ∈ L²(μ_N) corresponding to λ_{j,N}, whose continuous representatives, ϕ_{j,N} = K_N φ_{j,N}/λ_{j,N}, converge in C(M) to ϕ_j = K φ_j/λ_j, where φ_j ∈ L²(μ) is any eigenfunction of G at eigenvalue λ_j. In effect, we use C(M) as a "bridge" to establish spectral convergence of the operators G_N, which act on different spaces. Note that (λ_{j,N}, ϕ_{j,N}) does not converge uniformly with respect to j, and for a fixed N, eigenvalues/eigenfunctions at larger j exhibit larger deviations from their N → ∞ limits. Under measure-preserving, ergodic dynamics, convergence occurs for μ-a.e. starting state ω_0 ∈ M, and ω_0 in a set of positive ambient measure if μ is physical. In particular, the training states ω_n need not lie on A. See Figure 1 for eigenfunctions of G_N computed from data sampled near the L63 attractor.

Diffusion forecasting. We now have the ingredients to build a concrete statistical forecasting scheme based on data-driven approximations of the Koopman operator. In particular, note that if φ′_{i,N}, φ_{j,N} are biorthogonal eigenfunctions of G*_N and G_N, respectively, at nonzero eigenvalues, we can evaluate the matrix element U_{N,ij}(q) ∶= ⟨φ′_{i,N}, U^q_N φ_{j,N}⟩_{L²(μ_N)} of the shift operator using the continuous representatives ϕ′_{i,N}, ϕ_{j,N},

  U_{N,ij}(q) = (1/N) ∑_{n=0}^{N−1−q} ϕ′_{i,N}(ω_n) ϕ_{j,N}(ω_{n+q}) = ((N−q)/N) ∫_Ω ϕ′_{i,N} U^{q Δt} ϕ_{j,N} dμ_{N−q},

where U^{q Δt} is the Koopman operator on C(M). Therefore, if the corresponding eigenvalues λ_i, λ_j of G are nonzero, by the weak convergence of the sampling measures in (1) and C(M) convergence of the eigenfunctions, as N → ∞, U_{N,ij}(q) converges to the matrix element U_ij(q Δt) = ⟨φ_i, U^{q Δt} φ_j⟩_{L²(μ)} of the Koopman operator on L²(μ). This convergence is not uniform with respect to i, j, but if we fix a parameter L ∈ ℕ (which can be thought of as spectral resolution) such that λ_{L−1} ≠ 0, we can obtain a statistically consistent approximation of L × L Koopman operator matrices, 𝐔(q Δt) = [U_ij(q Δt)], by shift operator matrices, 𝐔_N(q) = [U_{N,ij}(q)], with i, j ∈ {0, …, L−1}. Checkerboard plots of 𝐔_N(q) for the L63 system are displayed in Figure 1.

This method for approximating matrix elements of Koopman operators was proposed in a technique called diffusion forecasting (named after the diffusion kernels employed) [BGH15]. Assuming that the response Y is continuous, and by spectral convergence of G_N, for every j ∈ ℕ_0 such that λ_j > 0, the inner products Ŷ_{j,N} = ⟨φ′_{j,N}, Y⟩_{μ_N} converge, as N → ∞, to Ŷ_j = ⟨φ′_j, Y⟩_{L²(μ)}. This implies that for any L ∈ ℕ such that λ_{L−1} > 0, ∑_{j=0}^{L−1} Ŷ_{j,N} ϕ_{j,N} converges in C(M) to a continuous representative of Π_L Y, where Π_L is the orthogonal projection on L²(μ) mapping into span{φ_0, …, φ_{L−1}}. Suppose now that ϱ_N is a sequence of continuous functions converging uniformly to ϱ ∈ C(M), such that the ϱ_N are probability densities with respect to μ_N (i.e., ϱ_N ≥ 0 and ‖ϱ_N‖_{L¹(μ_N)} = 1). By similar arguments as for Y, as N → ∞, the continuous function ∑_{j=0}^{L−1} ϱ̂_{j,N} ϕ_{j,N} with ϱ̂_{j,N} = ⟨ϕ′_{j,N}, ϱ_N⟩_{L²(μ_N)} converges to Π_L ϱ in L²(μ). Putting these facts together, and setting ϱ⃗_N = (ϱ̂_{0,N}, …, ϱ̂_{L−1,N})^⊤ and Y⃗_N = (Ŷ_{0,N}, …, Ŷ_{L−1,N})^⊤, we conclude that

  ϱ⃗_N^⊤ 𝐔_N(q) Y⃗_N → ⟨Π_L ϱ, Π_L U^{q Δt} Y⟩_{L²(μ)}  as N → ∞.  (4)

Here, the left-hand side is given by matrix–vector products obtained from the data, and the right-hand side is equal to the expectation of Π_L U^{q Δt} Y with respect to the probability measure ρ with density dρ/dμ = ϱ; i.e., ⟨Π_L ϱ, Π_L U^{q Δt} Y⟩_{L²(μ)} = 𝔼_ρ(Π_L U^{q Δt} Y), where 𝔼_ρ(⋅) ∶= ∫_Ω (⋅) dρ.

What about the dependence of the forecast on L? As L increases, Π_L converges strongly to the orthogonal projection Π_G ∶ L²(μ) → L²(μ) onto the closure of the range of G. Thus, if the kernel k is L²(μ)-universal (i.e., the range of G is dense in L²(μ)), Π_G = Id, and under the iterated limit of L → ∞ after N → ∞ the left-hand side of (4) converges to 𝔼_ρ U^{q Δt} Y.

In summary, implemented with an L²(μ)-universal kernel, diffusion forecasting consistently approximates the expected value of the time-evolution of any continuous observable with respect to any probability measure with continuous density relative to μ. An example of an L²(μ)-universal kernel is the pullback of a radial Gaussian kernel on 𝒳 = ℝ^m. In contrast, the covariance kernel is not L²(μ)-universal, as in this case the rank of G is bounded by m. This illustrates that forecasting in the POD basis may be subject to intrinsic limitations, even with full observations.

Kernel analog forecasting. While providing a flexible framework for approximating expectation values of observables under measure-preserving, ergodic dynamics, diffusion forecasting does not directly address the problem of constructing a concrete forecast function, i.e., a function Z_t ∶ 𝒳 → ℂ approximating U^t Y as stated in Problem 1.
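The convergence of shift-operator matrix elements to Koopman matrix elements can be checked on a system where both are known. In the Python sketch below (the circle rotation and all constants are hypothetical choices), the Fourier functions play the role of the data-driven basis, and the empirical time average ⟨φ_i, U^q_N φ_j⟩_{L²(μ_N)} approaches the exact diagonal Koopman matrix element e^{2πi j a q Δt} δ_ij:

```python
import cmath
import math

# Circle rotation sampled at interval dt; its Fourier modes phi_j(w) = e^{2 pi i j w}
# form an orthonormal basis of L^2(mu) of Koopman eigenfunctions, and play the
# role of the data-driven basis. (All constants are illustrative.)
a, dt = math.sqrt(2), 0.05
N, q = 10000, 50
omegas = [(0.1 + a * dt * n) % 1.0 for n in range(N)]

def phi(j, w):
    return cmath.exp(2j * math.pi * j * w)

def U_N(i, j):
    # shift-operator matrix element <phi_i, U_N^q phi_j> as an empirical average
    s = sum(phi(i, omegas[n]).conjugate() * phi(j, omegas[n + q])
            for n in range(N - q))
    return s / N

# Exact Koopman matrix element on L^2(mu): diagonal, with U_11 = e^{2 pi i a q dt}
U11_exact = cmath.exp(2j * math.pi * a * q * dt)
err_diag = abs(U_N(1, 1) - U11_exact)     # O(q/N) boundary effect
err_offdiag = abs(U_N(1, 2))              # exact value is 0
```

The O(q/N) discrepancy on the diagonal reflects the truncated upper limit of the time average, mirroring the (N−q)/N factor in the identity above.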

Figure 1. Panel (a) shows eigenfunctions φ_{j,N} of G_N for a dataset sampled near the L63 attractor. Panel (b) shows the action of the shift operator U^q_N on the φ_{j,N} from (a) for q = 50 steps, approximating the Koopman operator U^{q Δt}. Panels (c, d) show the matrix elements ⟨φ_{i,N}, U^q_N φ_{j,N}⟩_{μ_N} of the shift operator for q = 5 and 50. The mixing dynamics is evident in the larger far-from-diagonal components for q = 50 vs. q = 5. Panel (e) shows the matrix representation of a finite-difference approximation of the generator V, which is skew-symmetric. Panels (f, g) summarize the diffusion forecast (DF) and kernel analog forecast (KAF) for lead time t = q Δt. In each diagram, the data-driven finite-dimensional approximation (top row) converges to the true forecast (middle row). DF maps an initial state ω ∈ M ⊆ Ω to the future expectation of an observable, 𝔼_{Ψ(ω)} U^t Y = 𝔼_{U^{t∗}Ψ(ω)} Y, and KAF maps a response function Y ∈ C(M) to the conditional expectation 𝔼(U^t Y ∣ X).

1342 NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY VOLUME 67, NUMBER 9

Problem 1. One way of defining such a function is to let 휅_푁 be an 퐿²(휈_푁)-Markov kernel on 풳 for 휈_푁 = 푋_∗휇_푁, and to consider the "feature map" Ψ_푁 ∶ 풳 → 퐶(푀) mapping each point 푥 ∈ 풳 in covariate space to the kernel section Ψ_푁(푥) = 휅_푁(푥, 푋(⋅)). Then, Ψ_푁(푥) is a continuous probability density with respect to 휇_푁, and we can use diffusion forecasting to define 푍_{푞Δ푡}(푥) = Ψ⃗_푁(푥)^⊤ 퐔_푁(푞) 푌⃗_푁 with notation as in (4).

While this approach has a well-defined 푁 → ∞ limit, it does not provide optimality guarantees, particularly in situations where 푋 is noninjective. Indeed, the 퐿²(휇)-optimal approximation to 푈^푡푌 of the form 푍_푡 ∘ 푋 is given by the conditional expectation 피(푈^푡푌 ∣ 푋). In the present 퐿² setting we have 피(푈^푡푌 ∣ 푋) = Π_푋푈^푡푌, where Π_푋 is the orthogonal projection onto 퐿²_푋(휇) ∶= {푓 ∈ 퐿²(휇) ∶ 푓 = 푔 ∘ 푋}. That is, the conditional expectation minimizes the error ‖푓 − 푈^푡푌‖_{퐿²(휇)} among all pullbacks 푓 ∈ 퐿²_푋(휇) from covariate space. Even though 피(푈^푡푌 ∣ 푋 = 푥) can be expressed as an expectation with respect to a conditional probability measure 휇(⋅ ∣ 푥) on Ω, that measure will generally not have an 퐿²(휇) density, and there is no map Ψ ∶ 풳 → 퐶(푀) such that ⟨Ψ(푥), 푈^푡푌⟩_{퐿²(휇)} equals 피(푈^푡푌 ∣ 푋 = 푥).

To construct a consistent estimator of the conditional expectation, we require that 푘 be a pullback of a kernel 휅 ∶ 풳 × 풳 → ℝ on covariate space which is (i) symmetric, 휅(푥, 푥′) = 휅(푥′, 푥) for all 푥, 푥′ ∈ 풳 (so (2) holds); (ii) strictly positive; and (iii) strictly positive-definite. The latter means that for any sequence 푥_0, …, 푥_{푛−1} of distinct points in 풳 the matrix [휅(푥_푖, 푥_푗)] is strictly positive. These properties imply that there exists a Hilbert space ℋ of complex-valued functions on Ω, such that (i) for every 휔 ∈ Ω, the kernel sections 푘_휔 = 푘(휔, ⋅) lie in ℋ; (ii) the evaluation functional 훿_휔 ∶ ℋ → ℂ is bounded and satisfies 훿_휔푓 = ⟨푘_휔, 푓⟩_ℋ; (iii) every 푓 ∈ ℋ has the form 푓 = 푔 ∘ 푋 for a continuous function 푔 ∶ 풳 → ℂ; and (iv) 휄_휇ℋ lies dense in 퐿²_푋(휇).

A Hilbert space of functions satisfying (i) and (ii) above is known as a reproducing kernel Hilbert space (RKHS), and the associated kernel 푘 is known as a reproducing kernel. RKHSs have many useful properties for statistical learning [CS02], not least because they combine the Hilbert space structure of 퐿² spaces with pointwise evaluation in spaces of continuous functions. The density of ℋ in 퐿²_푋(휇) is a consequence of the strict positive-definiteness of 휅. In particular, because the conditional expectation 피(푈^푡푌 ∣ 푋) lies in 퐿²_푋(휇), it can be approximated by elements of ℋ to arbitrarily high precision in 퐿²(휇) norm, and every such approximation will be a pullback 푌̂_푡 = 푍_푡 ∘ 푋 of a continuous function 푍_푡 that can be evaluated at arbitrary covariate values.

We now describe a data-driven technique for constructing such a prediction function, which we refer to as kernel analog forecasting (KAF) [AG20]. Mathematically, KAF is closely related to kernel principal component regression. To build the KAF estimator, we work again with integral operators as in (3), with the difference that now 퐾_휌 ∶ 퐿²(휌) → ℋ(푀) takes values in the restriction of ℋ to the forward-invariant set 푀, denoted ℋ(푀). One can show that the adjoint 퐾_휌^∗ ∶ ℋ(푀) → 퐿²(휌) coincides with the inclusion map 휄_휌 on continuous functions, so that 퐾_휌^∗ maps 푓 ∈ ℋ(푀) ⊂ 퐶(푀) to its corresponding 퐿²(휌) equivalence class. As a result, the integral operator 퐺_휌 ∶ 퐿²(휌) → 퐿²(휌) takes the form 퐺_휌 = 퐾_휌^∗퐾_휌, becoming a self-adjoint, positive-definite, compact operator with eigenvalues 휆_0 ≥ 휆_1 ≥ ⋯ ↘ 0⁺, and a corresponding orthonormal eigenbasis {휙_0, 휙_1, …} of 퐿²(휌). Moreover, {휓_0, 휓_1, …} with 휓_푗 = 퐾_휌휙_푗/휆_푗^{1/2} is an orthonormal set in ℋ(푀).

In fact, Mercer's theorem provides an explicit representation 푘(휔, 휔′) = ∑_{푗=0}^∞ 휓_푗(휔)휓_푗(휔′), where direct evaluation of the kernel in the left-hand side (known as the "kernel trick") avoids the cost of inner-product computations between feature vectors 휓_푗. Here, our perspective is to rely on the orthogonality of the eigenbasis to approximate observables of interest at fixed 퐿, and establish convergence of the estimator as 퐿 → ∞. A similar approach was adopted for density estimation on noncompact domains, with Mercer-type kernels based on orthogonal polynomials [ZHL19].

Now a key operation that the RKHS enables is the Nyström extension, which interpolates 퐿²(휌) elements of appropriate regularity to RKHS functions. The Nyström operator 풩_휌 ∶ 퐷(풩_휌) → ℋ(푀) is defined on the domain 퐷(풩_휌) = {∑_푗 푐_푗휙_푗 ∶ ∑_푗 |푐_푗|²/휆_푗 < ∞} by linear extension of 풩_휌휙_푗 = 휓_푗/휆_푗^{1/2}. Note that 풩_휌휙_푗 = 퐾_휌휙_푗/휆_푗 = 휑_푗, so 풩_휌 maps 휙_푗 to its continuous representative, and 퐾_휌^∗풩_휌푓 = 푓, meaning that 풩_휌푓 = 푓, 휌-a.e. While 퐷(풩_휌) may be a strict 퐿²(휌) subspace, for any 퐿 with 휆_{퐿−1} > 0 we define a spectrally truncated operator 풩_{퐿,휌} ∶ 퐿²(휌) → ℋ(푀), 풩_{퐿,휌} ∑_푗 푐_푗휙_푗 = ∑_{푗=0}^{퐿−1} 푐_푗휓_푗/휆_푗^{1/2}. Then, as 퐿 increases, 퐾_휌^∗풩_{퐿,휌}푓 converges to Π_{퐺_휌}푓 in 퐿²(휌).

To make empirical forecasts, we set 휌 = 휇_푁, compute the expansion coefficients 푐_{푗,푁}(푡) of 푈^푡푌 in the {휙_{푗,푁}} basis of 퐿²(휇_푁), and construct 푌_{푡,퐿,푁} = 풩_{퐿,푁}푈^푡푌 ∈ ℋ(푀). Because the 휓_{푗,푁} are pullbacks of known functions 푢_{푗,푁} ∈ 퐶(풳), we have 푌_{푡,퐿,푁} = 푍_{푡,퐿,푁} ∘ 푋, where 푍_{푡,퐿,푁} = ∑_{푗=0}^{퐿−1} 푐_푗(푡)푢_{푗,푁}/휆_{푗,푁}^{1/2} can be evaluated at any 푥 ∈ 풳.

The function 푌_{푡,퐿,푁} is our estimator of the conditional expectation 피(푈^푡푌 ∣ 푋). By spectral convergence of kernel integral operators, as 푁 → ∞, 푌_{푡,퐿,푁} converges to 푌_{푡,퐿} ∶= 풩_퐿푈^푡푌 in 퐶(푀) norm, where 풩_퐿 ≡ 풩_{퐿,휇}. Then, as 퐿 → ∞, 퐾^∗푌_{푡,퐿} converges in 퐿²(휇) norm to Π_퐺푈^푡푌. Because 휅 is strictly positive-definite, 퐺 has dense range in 퐿²_푋(휇), and thus Π_퐺푈^푡푌 = Π_푋푈^푡푌 = 피(푈^푡푌 ∣ 푋). We therefore conclude that 푌_{푡,퐿,푁} converges to the conditional

expectation as 퐿 → ∞ after 푁 → ∞. Forecast results from the L63 system are shown in Figure 2.

Figure 2. KAF applied to the L63 state vector component 푌(휔) = 휔_1 with full (blue) and partial (red) observations. In the fully observed case, the covariate 푋 is the identity map on Ω = ℝ³. In the partially observed case, 푋(휔) = 휔_1 is the projection to the first coordinate. Top: Forecasts 푍_{푡,퐿,푁}(푥) initialized from fixed 푥 = 푋(휔), compared with the true evolution 푈^푡푌(휔) (black). Shaded regions show error bounds based on KAF estimates of the conditional standard deviation, 휎_푡(푥). Bottom: RMS forecast errors (solid lines) and 휎_푡 (dashed lines). The agreement between actual and estimated errors indicates that 휎_푡 provides useful uncertainty quantification.

Coherent Pattern Extraction

We now turn to the task of coherent pattern extraction in Problem 2. This is a fundamentally unsupervised learning problem, as we seek to discover observables of a dynamical system that exhibit a natural time evolution (by some suitable criterion), rather than approximate a given observable as in the context of forecasting. We have mentioned POD as a technique for identifying coherent observables through eigenfunctions of covariance operators. Kernel PCA [SSM98] is a generalization of this approach utilizing integral operators with potentially nonlinear kernels. For data lying on Riemannian manifolds, it is popular to employ kernels approximating geometrical operators, such as heat operators and their associated Laplacians. Examples include Laplacian eigenmaps [BN03], diffusion maps [CL06], and variable-bandwidth kernels [BH16]. Meanwhile, coherent pattern extraction techniques based on evolution operators have also gained popularity in recent years. These methods include spectral analysis of transfer operators for detection of invariant sets [DJ99, DFS00], harmonic averaging [Mez05] and dynamic mode decomposition (DMD) [RMB+09, Sch10, WKR15, KNK+18] techniques for approximating Koopman eigenfunctions, and Darboux kernels for approximating spectral projectors [KPM20]. While natural from a theoretical standpoint, evolution operators tend to have more complicated spectral properties than kernel integral operators, including nonisolated eigenvalues and continuous spectrum. The following examples illustrate distinct behaviors associated with the point (퐻_푎) and continuous (퐻_푐) spectrum subspaces of 퐿²(휇).

Example 1 (Torus rotation). A quasiperiodic rotation on the 2-torus, Ω = 핋², is governed by the system of ODEs 휔̇ = 푉⃗(휔), where 휔 = (휔_1, 휔_2) ∈ [0, 2휋)², 푉⃗ = (휈_1, 휈_2), and 휈_1, 휈_2 ∈ ℝ are rationally independent frequency parameters. The resulting flow, Φ^푡(휔) = (휔_1 + 휈_1푡, 휔_2 + 휈_2푡) mod 2휋, has a unique Borel ergodic invariant probability measure 휇 given by a normalized Lebesgue measure. Moreover, there exists an orthonormal basis of 퐿²(휇) consisting of Koopman eigenfunctions 푧_{푗푘}(휔) = 푒^{푖(푗휔_1+푘휔_2)}, 푗, 푘 ∈ ℤ, with eigenfrequencies 훼_{푗푘} = 푗휈_1 + 푘휈_2. Thus, 퐻_푎 = 퐿²(휇), and 퐻_푐 is the zero subspace. Such a system is said to have a pure point spectrum.

Example 2 (Lorenz 63 system). The L63 system on Ω = ℝ³ is governed by a system of smooth ODEs with two quadratic nonlinearities. This system is known to exhibit a physical ergodic invariant probability measure 휇 supported on a compact set (the L63 attractor), with mixing dynamics. This means that 퐻_푎 is the one-dimensional subspace of 퐿²(휇) consisting of constant functions, and 퐻_푐 consists of all 퐿²(휇) functions orthogonal to the constants (i.e., with zero expectation value with respect to 휇).

Delay-coordinate approaches. For the point spectrum subspace 퐻_푎, a natural class of coherent observables is provided by the Koopman eigenfunctions. Every Koopman eigenfunction 푧_푗 ∈ 퐻_푎 evolves as a harmonic oscillator at the corresponding eigenfrequency, 푈^푡푧_푗 = 푒^{푖훼_푗푡}푧_푗, and the associated autocorrelation function, 퐶_{푧_푗}(푡) = 푒^{푖훼_푗푡}, also has a harmonic evolution. Short of temporal invariance (which only occurs for constant eigenfunctions under measure-preserving ergodic dynamics), it is natural to think of a harmonic evolution as being "maximally" coherent. In particular, if 푧_푗 is continuous, then for any 휔 ∈ 퐴, the real and imaginary parts of the time series 푡 ↦ 푈^푡푧_푗(휔) are pure sinusoids, even if the flow Φ^푡 is aperiodic. Further attractive properties of Koopman eigenfunctions include the facts that they are intrinsic to the dynamical system generating the data, and they are closed under pointwise multiplication, 푧_푗푧_푘 = 푧_{푗+푘}, allowing one to generate every eigenfunction from a potentially finite generating set.
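These properties are easy to verify numerically for the torus rotation of Example 1. The sketch below is ours (illustrative names; the frequencies 1 and √2 are chosen rationally independent): it checks the harmonic evolution 푈^푡푧_{푗푘} = 푒^{푖훼_{푗푘}푡}푧_{푗푘} and the multiplicative closure of the eigenfunctions.

```python
import numpy as np

nu = np.array([1.0, np.sqrt(2.0)])   # rationally independent frequencies

def flow(t, w):
    """Quasiperiodic torus rotation Phi^t(w) = w + t*(nu1, nu2) mod 2*pi."""
    return np.mod(w + t * nu, 2.0 * np.pi)

def z(j, k, w):
    """Koopman eigenfunction z_{jk}(w) = exp(i(j*w1 + k*w2)), j, k integers."""
    return np.exp(1j * (j * w[0] + k * w[1]))

w0 = np.array([0.3, 1.7])
t, (j, k) = 2.5, (2, -1)
alpha = j * nu[0] + k * nu[1]        # eigenfrequency alpha_{jk}

# Harmonic evolution: U^t z_{jk} = e^{i alpha t} z_{jk}.
lhs = z(j, k, flow(t, w0))
rhs = np.exp(1j * alpha * t) * z(j, k, w0)

# Multiplicative closure: z_{jk} * z_{j'k'} = z_{j+j', k+k'}.
prod = z(1, 0, w0) * z(1, -1, w0)
```

The mod 2휋 in `flow` is immaterial here because 푗, 푘 are integers, so 푧_{푗푘} is well-defined on the torus.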

Yet, consistently approximating Koopman eigenfunctions from data is a nontrivial task, even for simple systems. For instance, the torus rotation in Example 1 has a dense set of eigenfrequencies by rational independence of the basic frequencies 휈_1 and 휈_2. Thus, any open interval in ℝ contains infinitely many eigenfrequencies 훼_{푗푘}, necessitating some form of regularization. Arguably, the term "pure point spectrum" is somewhat of a misnomer for such systems since a nonempty continuous spectrum is present. Indeed, since the spectrum of an operator on a Banach space includes the closure of the set of eigenvalues, 푖ℝ ⧵ {푖훼_{푗푘}} lies in the continuous spectrum.

As a way of addressing these challenges, observe that if 퐺 is a self-adjoint, compact operator commuting with the Koopman group (i.e., 푈^푡퐺 = 퐺푈^푡), then any eigenspace 푊_휆 of 퐺 corresponding to a nonzero eigenvalue 휆 is invariant under 푈^푡, and thus under the generator 푉. Moreover, by compactness of 퐺, 푊_휆 has finite dimension. Thus, for any orthonormal basis {휙_0, …, 휙_{푙−1}} of 푊_휆, the generator 푉 on 푊_휆 is represented by a skew-symmetric, and thus unitarily diagonalizable, 푙 × 푙 matrix 퐕 = [⟨휙_푖, 푉휙_푗⟩_{퐿²(휇)}]. The eigenvectors 푢⃗ = (푢_0, …, 푢_{푙−1})^⊤ ∈ ℂ^푙 of 퐕 then contain expansion coefficients of Koopman eigenfunctions 푧 = ∑_{푗=0}^{푙−1} 푢_푗휙_푗 in 푊_휆, and the eigenvalues corresponding to 푢⃗ are eigenvalues of 푉.

On the basis of the above, since any integral operator 퐺 on 퐿²(휇) associated with a symmetric kernel 푘 ∈ 퐿²(휇 × 휇) is Hilbert–Schmidt (and thus compact), and we have a wide variety of data-driven tools for approximating integral operators, we can reduce the problem of consistently approximating the point spectrum of the Koopman group on 퐿²(휇) to the problem of constructing a commuting integral operator. As we now argue, the success of a number of techniques, including singular spectrum analysis (SSA) [BK86], diffusion-mapped delay coordinates (DMDC) [BCGFS13], nonlinear Laplacian spectral analysis (NLSA) [GM12], and Hankel DMD [BBP+17], in identifying coherent patterns can at least be partly attributed to the fact that they employ integral operators that approximately commute with the Koopman operator.

A common characteristic of these methods is that they employ, in some form, delay-coordinate maps [SYC91]. With our notation for the covariate function 푋 ∶ Ω → 풳 and sampling interval Δ푡, the 푄-step delay-coordinate map is defined as 푋_푄 ∶ Ω → 풳^푄 with 푋_푄(휔) = (푋(휔_0), 푋(휔_{−1}), …, 푋(휔_{−푄+1})) and 휔_푞 = Φ^{푞Δ푡}(휔). That is, 푋_푄 can be thought of as a lift of 푋, which produces "snapshots," to a map taking values in the space 풳^푄 containing "videos." Intuitively, by virtue of its higher-dimensional codomain and dependence on the dynamical flow, a delay-coordinate map such as 푋_푄 should provide additional information about the underlying dynamics on Ω over the raw covariate map 푋. This intuition has been made precise in a number of "embedology" theorems [SYC91], which state that under mild assumptions, for any compact subset 푆 ⊆ Ω (including, for our purposes, the invariant set 퐴), the delay-coordinate map 푋_푄 is injective on 푆 for sufficiently large 푄. As a result, delay-coordinate maps provide a powerful tool for state space reconstruction, as well as for constructing informative predictor functions in the context of forecasting.

Aside from considerations associated with topological reconstruction, however, observe that a metric 푑 ∶ 풳 × 풳 → ℝ on covariate space pulls back to a distance-like function 푑̃_푄 ∶ Ω × Ω → ℝ such that

푑̃²_푄(휔, 휔′) = (1/푄) ∑_{푞=0}^{푄−1} 푑²(푋(휔_{−푞}), 푋(휔′_{−푞})). (5)

In particular, 푑̃²_푄 has the structure of an ergodic average of a continuous function under the product dynamical flow Φ^푡 × Φ^푡 on Ω × Ω. By the von Neumann ergodic theorem, as 푄 → ∞, 푑̃_푄 converges in 퐿²(휇 × 휇) norm to a bounded function 푑̃_∞, which is invariant under the Koopman operator 푈^푡 ⊗ 푈^푡 of the product dynamical system. Note that 푑̃_∞ need not be 휇 × 휇-a.e. constant, as Φ^푡 × Φ^푡 need not be ergodic, and aside from special cases it will not be continuous on 퐴 × 퐴. Nevertheless, based on the 퐿²(휇 × 휇) convergence of 푑̃_푄 to 푑̃_∞, it can be shown [DG19] that for any continuous function ℎ ∶ ℝ → ℝ, the integral operator 퐺_∞ on 퐿²(휇) associated with the kernel 푘_∞ = ℎ ∘ 푑̃_∞ commutes with 푈^푡 for any 푡 ∈ ℝ. Moreover, as 푄 → ∞, the operators 퐺_푄 associated with 푘_푄 = ℎ ∘ 푑̃_푄 converge to 퐺_∞ in 퐿²(휇) operator norm, and thus in spectrum.

Many of the operators employed in SSA, DMDC, NLSA, and Hankel DMD can be modeled after the 퐺_푄 described above. In particular, because 퐺_푄 is induced by a continuous kernel, its spectrum can be consistently approximated by data-driven operators 퐺_{푄,푁} on 퐿²(휇_푁), as described in the context of forecasting. The eigenfunctions of these operators at nonzero eigenvalues approximate eigenfunctions of 퐺_푄, which approximate in turn eigenfunctions of 퐺_∞ lying in finite unions of Koopman eigenspaces. Thus, for sufficiently large 푁 and 푄, the eigenfunctions of 퐺_{푄,푁} at nonzero eigenvalues capture distinct timescales associated with the point spectrum of the dynamical system, providing physically interpretable features. These kernel eigenfunctions can also be employed in Galerkin schemes to approximate individual Koopman eigenfunctions.

Besides the spectral considerations described above, in [BCGFS13] a geometrical characterization of the eigenspaces of 퐺_푄 was given based on Lyapunov metrics of dynamical systems. In particular, it follows by Oseledets's multiplicative ergodic theorem that for 휇-a.e. 휔 ∈ 푀 there exists a decomposition 푇_휔푀 = 퐹_{1,휔} ⊕ ⋯ ⊕ 퐹_{푟,휔}, where 푇_휔푀

is the tangent space at 휔 ∈ 푀, and 퐹_{푗,휔} are subspaces satisfying the equivariance condition 퐷Φ^푡퐹_{푗,휔} = 퐹_{푗,Φ^푡(휔)}. Moreover, there exist Λ_1 > ⋯ > Λ_푟, such that for every 푣 ∈ 퐹_{푗,휔},

Λ_푗 = lim_{푡→∞} (1/푡) ∫_0^푡 log‖퐷Φ^푠푣‖ 푑푠,

where ‖⋅‖ is the norm on 푇_휔푀 induced by a Riemannian metric. The numbers Λ_푗 are called Lyapunov exponents, and are metric-independent. Note that the dynamical vector field 푉⃗(휔) lies in a subspace 퐹_{푗_0,휔} with a corresponding zero Lyapunov exponent. If 퐹_{푗_0,휔} is one-dimensional, and the norms ‖퐷Φ^푡푣‖ obey appropriate uniform growth/decay bounds with respect to 휔 ∈ 푀, the dynamical flow is said to be uniformly hyperbolic. If, in addition, the support 퐴 of 휇 is a differentiable manifold, then there exists a class of Riemannian metrics, called Lyapunov metrics, for which the 퐹_{푗,휔} are mutually orthogonal at every 휔 ∈ 퐴. In [BCGFS13], it was shown that using a modification of the delay-distance in (5) with exponentially decaying weights, as 푄 → ∞, the top eigenfunctions 휙_푗^{(푄)} of 퐺_푄 vary predominantly along the subspace 퐹_{푟,휔} associated with the most stable Lyapunov exponent. That is, for every 휔 ∈ Ω and tangent vector 푣 ∈ 푇_휔푀 orthogonal to 퐹_{푟,휔} with respect to a Lyapunov metric, lim_{푄→∞} 푣 ⋅ ∇휙_푗^{(푄)} = 0.

RKHS approaches. While delay-coordinate maps are effective for approximating the point spectrum and associated Koopman eigenfunctions, they do not address the problem of identifying coherent observables in the continuous spectrum subspace 퐻_푐. Indeed, one can verify that in mixed-spectrum systems the infinite-delay operator 퐺_∞, which provides access to the eigenspaces of the Koopman operator, has a nontrivial nullspace that includes 퐻_푐 as a subspace. More broadly, there is no obvious way of identifying coherent observables in 퐻_푐 as eigenfunctions of an intrinsic evolution operator. As a remedy to this problem, we relax the problem of seeking Koopman eigenfunctions, and consider instead approximate eigenfunctions. An observable 푧 ∈ 퐿²(휇) is said to be an 휖-approximate eigenfunction of 푈^푡 if there exists 휆_푡 ∈ ℂ such that

‖푈^푡푧 − 휆_푡푧‖_{퐿²(휇)} < 휖‖푧‖_{퐿²(휇)}. (6)

The number 휆_푡 is then said to lie in the 휖-approximate spectrum of 푈^푡. A Koopman eigenfunction is an 휖-approximate eigenfunction for every 휖 > 0, so we think of (6) as a relaxation of the eigenvalue equation, 푈^푡푧 − 휆_푡푧 = 0. This suggests that a natural notion of coherence of observables in 퐿²(휇), appropriate to both the point and continuous spectrum, is that (6) holds for 휖 ≪ 1 and all 푡 in a "large" interval.

We now outline an RKHS-based approach [DGS18], which identifies observables satisfying this condition through eigenfunctions of a regularized operator 푉̃_휏 on 퐿²(휇) approximating 푉 with the properties of (i) being skew-adjoint and compact; and (ii) having eigenfunctions in the domain of the Nyström operator, which maps them to differentiable functions in an RKHS. Here, 휏 is a positive regularization parameter such that, as 휏 → 0⁺, 푉̃_휏 converges to 푉 in a suitable spectral sense. We will assume that the forward-invariant, compact manifold 푀 has 퐶¹ regularity, but will not require that the support 퐴 of the invariant measure be differentiable.

With these assumptions, let 푘 ∶ Ω × Ω → ℝ be a symmetric, positive-definite kernel, whose restriction on 푀 × 푀 is continuously differentiable. Then, the corresponding RKHS ℋ(푀) embeds continuously in the Banach space 퐶¹(푀) of continuously differentiable functions on 푀, equipped with the standard norm. Moreover, because 푉 is an extension of the directional derivative 푉⃗ ⋅ ∇ associated with the dynamical vector field, every function in ℋ(푀) lies, upon inclusion, in 퐷(푉). The key point here is that regularity of the kernel induces RKHSs of observables which are guaranteed to lie in the domain of the generator. In particular, the range of the integral operator 퐺 = 퐾^∗퐾 on 퐿²(휇) associated with 푘 lies in 퐷(푉), so that 퐴 = 푉퐺 is well-defined. This operator is, in fact, Hilbert–Schmidt, with Hilbert–Schmidt norm bounded by the 퐶¹(푀 × 푀) norm of the kernel 푘. What is perhaps less obvious is that 퐺^{1/2}푉퐺^{1/2} (which "distributes" the smoothing by 퐺 to the left and right of 푉), defined on the dense subspace {푓 ∈ 퐿²(휇) ∶ 퐺^{1/2}푓 ∈ 퐷(푉)}, is also bounded, and thus has a unique closed extension 푉̃ ∶ 퐿²(휇) → 퐿²(휇), which turns out to be Hilbert–Schmidt. Unlike 퐴, 푉̃ is skew-adjoint, and thus preserves an important structural property of the generator. By skew-adjointness and compactness of 푉̃, there exists an orthonormal basis {푧̃_푗 ∶ 푗 ∈ ℤ} of 퐿²(휇) consisting of its eigenfunctions 푧̃_푗, with purely imaginary eigenvalues 푖훼̃_푗. Moreover, (i) all 푧̃_푗 corresponding to nonzero 훼̃_푗 lie in the domain of the Nyström operator, and therefore have 퐶¹ representatives in ℋ(푀); and (ii) if 푘 is 퐿²(휇)-universal, Markov, and ergodic, then 푉̃ has a simple eigenvalue at zero, in agreement with the analogous property of 푉.

Based on the above, we seek to construct a one-parameter family of such kernels 푘_휏, with associated RKHSs ℋ_휏(푀), such that as 휏 → 0⁺, the regularized generators 푉̃_휏 converge to 푉 in a sense suitable for spectral convergence. Here, the relevant notion of convergence is strong resolvent convergence; that is, for every element 휆 of the resolvent set of 푉 and every 푓 ∈ 퐿²(휇), (푉̃_휏 − 휆)⁻¹푓 must converge to (푉 − 휆)⁻¹푓. In that case, for every element 푖훼 of the spectrum of 푉 (both point and continuous), there exists a sequence of eigenvalues 푖훼̃_{푗_휏,휏} of 푉̃_휏 converging to 푖훼 as 휏 → 0⁺. Moreover, for any 휖 > 0 and 푇 > 0, there exists 휏_∗ > 0 such that for all 0 < 휏 < 휏_∗ and |푡| < 푇, 푒^{푖훼̃_{푗_휏,휏}푡} lies in the 휖-approximate spectrum of 푈^푡 and 푧̃_{푗_휏,휏} is an 휖-approximate eigenfunction.
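Criterion (6) is straightforward to estimate from trajectory data. The sketch below is ours (illustrative names; a circle rotation stands in for the dynamics): it compares the empirical residual of a genuine Koopman eigenfunction with that of a generic observable evaluated at the same candidate eigenvalue.

```python
import numpy as np

def approx_eigen_residual(z, q, lam):
    """Empirical version of (6): ||U^t z - lam_t z|| / ||z|| in L^2(mu_N).

    z   : (N,) samples z(omega_n) of an observable along a trajectory
    q   : lead time in sampling intervals (t = q * dt)
    lam : candidate eigenvalue lam_t on the unit circle
    """
    num = np.sqrt(np.mean(np.abs(z[q:] - lam * z[:-q]) ** 2))
    den = np.sqrt(np.mean(np.abs(z[:-q]) ** 2))
    return num / den

dt, N, q = 0.05, 4_000, 10
theta = dt * np.arange(N)

# A genuine Koopman eigenfunction of a circle rotation: residual ~ 0.
z_eig = np.exp(1j * theta)
eps_good = approx_eigen_residual(z_eig, q, np.exp(1j * q * dt))

# A generic observable mixing two incommensurate frequencies is not
# epsilon-coherent for small epsilon at this candidate eigenvalue.
z_mix = np.cos(theta) * np.cos(np.sqrt(2.0) * theta)
eps_bad = approx_eigen_residual(z_mix, q, np.exp(1j * q * dt))
```

Here the time averages stand in for 퐿²(휇) norms, which is justified for ergodic dynamics in the large-푁 limit.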

In [DGS18], a constructive procedure was proposed for obtaining the kernel family 푘_휏 through a Markov semigroup on 퐿²(휇). This method has a data-driven implementation, with spectral convergence results for the associated integral operators 퐺_{휏,푁} on 퐿²(휇_푁) analogous to those described in the setting of forecasting. Given these operators, we approximate 푉̃_휏 by 푉̃_{휏,푁} = 퐺_{휏,푁}^{1/2}푉_푁퐺_{휏,푁}^{1/2}, where 푉_푁 is a skew-adjoint, finite-difference approximation of the generator. For example, 푉_푁 = (푈_푁^1 − 푈_푁^{1∗})/(2Δ푡) is a second-order finite-difference approximation based on the 1-step shift operator 푈_푁^1. See Figure 1 for a graphical representation of a generator matrix for L63. As with our data-driven approximations of 푈^푡, we work with a rank-퐿 operator 푉̂_휏 ∶= Π_{휏,푁,퐿}푉̃_{휏,푁}Π_{휏,푁,퐿}, where Π_{휏,푁,퐿} is the orthogonal projection onto the subspace spanned by the first 퐿 eigenfunctions of 퐺_{휏,푁}. This family of operators converges spectrally to 푉̃_휏 in a limit of 푁 → ∞, followed by Δ푡 → 0 and 퐿 → ∞, where we note that 퐶¹ regularity of 푘_휏 is important for the finite-difference approximations to converge.

At any given 휏, an a posteriori criterion for identifying candidate eigenfunctions 푧̂_{푗,휏} satisfying (6) for small 휖 is to compute a Dirichlet energy functional, 풟(푧̂_{푗,휏}) = ‖풩_{휏,푁}푧̂_{푗,휏}‖²_{ℋ_휏(푀)}/‖푧̂_{푗,휏}‖²_{퐿²(휇_푁)}. Intuitively, 풟 assigns a measure of roughness to every nonzero element in the domain of the Nyström operator (analogously to the Dirichlet energy in Sobolev spaces on differentiable manifolds), and the smaller 풟(푧̂_{푗,휏}) is, the more coherent 푧̂_{푗,휏} is expected to be. Indeed, as shown in Figure 3, the 푧̂_{푗,휏} corresponding to low Dirichlet energy identify observables of the L63 system with a coherent dynamical evolution, even though this system is mixing and has no nonconstant Koopman eigenfunctions. Sampled along dynamical trajectories, the approximate Koopman eigenfunctions resemble amplitude-modulated wavepackets, exhibiting a low-frequency modulating envelope while maintaining phase coherence and a precise carrier frequency. This behavior can be thought of as a "relaxation" of Koopman eigenfunctions, which generate pure sinusoids with no amplitude modulation.

Figure 3. A representative eigenfunction 푧̂_{푗,휏} of the compactified generator 푉̂_휏 for the L63 system, with low corresponding Dirichlet energy. Top: Scatterplot of Re 푧̂_{푗,휏} on the L63 attractor. Bottom: Time series of Re 푧̂_{푗,휏} sampled along a dynamical trajectory.

Conclusions and Outlook

We have presented mathematical techniques at the interface of dynamical systems theory and data science for statistical analysis and modeling of dynamical systems. One of our primary goals has been to highlight a fruitful interplay of ideas from ergodic theory, functional analysis, and differential geometry, which, coupled with learning theory, provide an effective route for data-driven prediction and pattern extraction, well-adapted to handle nonlinear dynamics and complex geometries.

There are several open questions and future research directions stemming from these topics. First, it should be possible to combine pointwise estimators derived from methods such as diffusion forecasting and KAF with the Mori–Zwanzig formalism so as to incorporate memory effects. Another potential direction for future development is to incorporate wavelet frames, particularly when the measurements or probability densities are highly localized. Moreover, when the attractor 퐴 is not a manifold, appropriate notions of regularity need to be identified so as to fully characterize the behavior of kernel algorithms such as diffusion maps. While we suspect that kernel-based constructions will still be the fundamental tool, the choice of kernel may need to be adapted to the regularity of the attractor to obtain optimal performance. Finally, a number of applications (e.g., analysis of perturbations) concern the action of dynamics on more general vector bundles besides functions, potentially with a noncommutative algebraic structure, calling for the development of suitable data-driven techniques for such spaces.
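To illustrate the finite-difference generator approximation 푉_푁 and the Dirichlet-energy criterion discussed above, here is a minimal sketch of ours (illustrative names; a harmonic pair stands in for kernel eigenfunctions sampled along a trajectory, assumed orthonormal with respect to 휇_푁).

```python
import numpy as np

def generator_matrix(phi, dt):
    """Matrix of V_N = (U_N^1 - U_N^{1*}) / (2 dt) in the basis phi.

    phi : (N, L) samples of L^2(mu_N)-orthonormal basis functions along a
          trajectory with sampling interval dt. Returns the skew-symmetric
          L x L matrix [<phi_i, V_N phi_j>] (cf. Figure 1, panel (e)).
    """
    N = phi.shape[0]
    A = phi[:-1].T @ phi[1:] / N         # <phi_i, U_N^1 phi_j>
    return (A - A.T) / (2.0 * dt)        # skew-symmetric by construction

def dirichlet_energy(c, lam):
    """D(z) = sum_j |c_j|^2 / lambda_j for z = sum_j c_j phi_j, ||z|| = 1."""
    c = np.asarray(c, dtype=complex)
    c = c / np.linalg.norm(c)
    return float(np.sum(np.abs(c) ** 2 / lam))

# Harmonic pair: for a circle rotation with unit frequency, the generator
# matrix in this basis should be approximately [[0, 1], [-1, 0]].
dt = 0.01
t = dt * np.arange(200_000)
phi = np.sqrt(2.0) * np.column_stack([np.cos(t), np.sin(t)])
V = generator_matrix(phi, dt)
```

Candidate eigenfunctions with expansion coefficients concentrated on leading (large-휆) eigenfunctions have low Dirichlet energy, matching the roughness intuition in the text.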

ACKNOWLEDGMENTS. Research of the authors described in this review was supported by DARPA grant HR0011-16-C-0116; NSF grants 1842538, DMS-1317919, DMS-1521775, DMS-1619661, DMS-172317, DMS-1854383; and ONR grants N00014-12-1-0912, N00014-14-1-0150, N00014-13-1-0797, N00014-16-1-2649, N00014-16-1-2888.

References

[AG20] R. Alexander and D. Giannakis, Operator-theoretic framework for forecasting nonlinear time series with kernel analog techniques, Phys. D 409 (2020), 132520, DOI 10.1016/j.physd.2020.132520. MR4093838
[BBP+17] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, Chaos as an intermittently forced linear system, Nat. Commun. 8 (2017), no. 19, DOI 10.1038/s41467-017-00030-8.
[BCGFS13] T. Berry, J. R. Cressman, Z. Gregurić-Ferenček, and T. Sauer, Time-scale separation from diffusion-mapped delay coordinates, SIAM J. Appl. Dyn. Syst. 12 (2013), no. 2, 618–649, DOI 10.1137/12088183X. MR3047439
[BGH15] T. Berry, D. Giannakis, and J. Harlim, Nonparametric forecasting of low-dimensional dynamical systems, Phys. Rev. E 91 (2015), 032915, DOI 10.1103/PhysRevE.91.032915.
[BH16] T. Berry and J. Harlim, Variable bandwidth diffusion kernels, Appl. Comput. Harmon. Anal. 40 (2016), no. 1, 68–96, DOI 10.1016/j.acha.2015.01.001. MR3431485
[BK86] D. S. Broomhead and G. P. King, Extracting qualitative dynamics from experimental data, Phys. D 20 (1986), no. 2-3, 217–236, DOI 10.1016/0167-2789(86)90031-X. MR859354
[BN03] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (2003), 1373–1396.
[CL06] R. R. Coifman and S. Lafon, Diffusion maps, Appl. Comput. Harmon. Anal. 21 (2006), no. 1, 5–30, DOI 10.1016/j.acha.2006.04.006. MR2238665
[CS02] F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc. (N.S.) 39 (2002), no. 1, 1–49, DOI 10.1090/S0273-0979-01-00923-5. MR1864085
[DFS00] M. Dellnitz, G. Froyland, and S. Sertl, On the isolated spectrum of the Perron-Frobenius operator, Nonlinearity 13 (2000), no. 4, 1171–1188, DOI 10.1088/0951-7715/13/4/310. MR1767953
[DG19] S. Das and D. Giannakis, Delay-coordinate maps and the spectra of Koopman operators, J. Stat. Phys. 175 (2019), no. 6, 1107–1145, DOI 10.1007/s10955-019-02272-w. MR3962976
[DGS18] S. Das, D. Giannakis, and J. Slawinska, Reproducing kernel Hilbert space compactification of unitary evolution groups, 2018, https://arxiv.org/abs/1808.01515.
[DJ99] M. Dellnitz and O. Junge, On the approximation of complicated dynamical behavior, SIAM J. Numer. Anal. 36 (1999), no. 2, 491–515, DOI 10.1137/S0036142996313002. MR1668207
[GM12] D. Giannakis and A. J. Majda, Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability, Proc. Natl. Acad. Sci. USA 109 (2012), no. 7, 2222–2227, DOI 10.1073/pnas.1118984109. MR2898568
[KNK+18] S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, Data-driven model reduction and transfer operator approximation, J. Nonlinear Sci. 28 (2018), no. 3, 985–1010, DOI 10.1007/s00332-017-9437-7. MR3800253
[Koo31] B. O. Koopman, Hamiltonian systems and transformation in Hilbert space, Proc. Natl. Acad. Sci. 17 (1931), no. 5, 315–318, DOI 10.1073/pnas.17.5.315.
[KPM20] M. Korda, M. Putinar, and I. Mezić, Data-driven spectral analysis of the Koopman operator, Appl. Comput. Harmon. Anal. 48 (2020), no. 2, 599–629, DOI 10.1016/j.acha.2018.08.002. MR4047538
[Lor69] E. N. Lorenz, Atmospheric predictability as revealed by naturally occurring analogues, J. Atmos. Sci. 26 (1969), 636–646, DOI 10.1175/1520-0469(1969)26<636:aparbn>2.0.co;2.
[Mez05] I. Mezić, Spectral properties of dynamical systems, model reduction and decompositions, Nonlinear Dynam. 41 (2005), no. 1-3, 309–325, DOI 10.1007/s11071-005-2824-x. MR2157184
[RMB+09] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson, Spectral analysis of nonlinear flows, J. Fluid Mech. 641 (2009), 115–127, DOI 10.1017/S0022112009992059. MR2577895
[Sch10] P. J. Schmid, Dynamic mode decomposition of numerical and experimental data, J. Fluid Mech. 656 (2010), 5–28, DOI 10.1017/S0022112010001217. MR2669948
[SSM98] B. Schölkopf, A. Smola, and K. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (1998), 1299–1319.
[SYC91] T. Sauer, J. A. Yorke, and M. Casdagli, Embedology, J. Statist. Phys. 65 (1991), no. 3-4, 579–616, DOI 10.1007/BF01053745. MR1137425
[vLBB08] U. von Luxburg, M. Belkin, and O. Bousquet, Consistency of spectral clustering, Ann. Statist. 36 (2008), no. 2, 555–586, DOI 10.1214/009053607000000640. MR2396807
[WKR15] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, A data-driven approximation of the Koopman operator: extending dynamic mode decomposition, J. Nonlinear Sci. 25 (2015), no. 6, 1307–1346, DOI 10.1007/s00332-015-9258-5. MR3415049
[ZHL19] H. Zhang, J. Harlim, and X. Li, Computing linear response statistics using orthogonal polynomial based estimators: An RKHS formulation, 2019, https://arxiv.org/abs/1912.11110.

1348 NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY VOLUME 67, NUMBER 9

Tyrus Berry

Dimitrios Giannakis

John Harlim

Credits

Figures are courtesy of the authors.
Photo of Tyrus Berry is courtesy of Miruna Tecuci.
Photo of Dimitrios Giannakis is courtesy of Joanna Slawinska.
Photo of John Harlim is courtesy of Leonie Vachon.