Bridging Data Science and Dynamical Systems Theory

Tyrus Berry, Dimitrios Giannakis, and John Harlim

Modern science is undergoing what might arguably be called a "data revolution," manifested by a rapid growth of observed and simulated data from complex systems, as well as vigorous research on mathematical and computational frameworks for data analysis. In many scientific branches, these efforts have led to the creation of statistical models of complex systems that match or exceed the skill of first-principles models. Yet, despite these successes, statistical models are oftentimes treated as black boxes, providing limited guarantees about stability and convergence as the amount of training data increases. Black-box models also offer limited insights about the operating mechanisms (physics), the understanding of which is central to the advancement of science.

In this short review, we describe mathematical techniques for statistical analysis and prediction of time-evolving phenomena, ranging from simple examples such as an oscillator, to highly complex systems such as the turbulent motion of the Earth's atmosphere, the folding of proteins, and the evolution of species populations in an ecosystem. Our main thesis is that combining ideas from the theory of dynamical systems with learning theory provides an effective route to data-driven models of complex systems, with refinable predictions as the amount of training data increases, and physical interpretability through discovery of coherent patterns around which the dynamics is organized. Our article thus serves as an invitation to explore ideas at the interface of the two fields.

This is a vast subject, and invariably a number of important developments in areas such as deep learning, reservoir computing, control, and nonautonomous/stochastic systems are not discussed here.¹ Our focus will be on topics drawn from the authors' research and related work.

Tyrus Berry is an assistant professor of mathematics at George Mason University. His email address is [email protected].
Dimitrios Giannakis is an associate professor of mathematics at New York University. His email address is [email protected].
John Harlim is a professor of mathematics and meteorology, and Faculty Fellow of the Institute for Computational and Data Sciences, at the Pennsylvania State University. His email address is [email protected].
Communicated by Notices Associate Editor Reza Malek-Madani.
For permission to reprint this article, please contact: [email protected].
DOI: https://doi.org/10.1090/noti2151
¹See https://arxiv.org/abs/2002.07928 for a version of this article with references to the literature on these topics.
1336 NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY VOLUME 67, NUMBER 9

Statistical Forecasting and Coherent Pattern Extraction

Consider a dynamical system of the form Φ^t : Ω → Ω, where Ω is the state space and Φ^t, t ∈ ℝ, the flow map. For example, Ω could be Euclidean space ℝ^d, or a more general manifold, and Φ^t the solution map for a system of ODEs defined on Ω. Alternatively, in a PDE setting, Ω could be an infinite-dimensional function space and Φ^t an evolution group acting on it. We consider that Ω has the structure of a metric space equipped with its Borel σ-algebra, playing the role of an event space, with measurable functions on Ω acting as random variables, called observables.
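As a minimal illustration of this setup (a toy example of our own, not taken from the article), the following sketch realizes a flow map Φ^t on Ω = ℝ² as a rotation (a harmonic oscillator in action-angle form) and evaluates an observable along a sampled trajectory:

```python
import numpy as np

def flow(omega, t):
    """Flow map Phi^t on Omega = R^2: rotate the state omega by angle t."""
    c, s = np.cos(t), np.sin(t)
    rot = np.array([[c, -s], [s, c]])
    return rot @ omega

def f(omega):
    """An observable on Omega: the first coordinate of the state."""
    return omega[0]

# Sample the trajectory omega_n = Phi^{n*dt}(omega_0) at a fixed
# sampling interval dt, and evaluate the observable along it.
dt, N = 0.1, 100
omega0 = np.array([1.0, 0.0])
series = np.array([f(flow(omega0, n * dt)) for n in range(N)])

# The group property Phi^{s+t} = Phi^s o Phi^t holds for this flow:
assert np.allclose(flow(omega0, 0.3), flow(flow(omega0, 0.1), 0.2))
```

Here the observable time series is simply cos(nΔt); in the statistical setting described next, such time series are all that is available to the modeler, while the underlying trajectory ω_n is treated as unknown.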
In a statistical modeling scenario, we consider that available to us are time series of various such observables, sampled along a dynamical trajectory which we will treat as being unknown. Specifically, we assume that we have access to two observables, X : Ω → 𝒳 and Y : Ω → 𝒴, respectively referred to as covariate and response functions, together with corresponding time series x_0, x_1, …, x_{N−1} and y_0, y_1, …, y_{N−1}, where x_n = X(ω_n), y_n = Y(ω_n), and ω_n = Φ^{nΔt}(ω_0). Here, 𝒳 and 𝒴 are metric spaces, Δt is a positive sampling interval, and ω_0 is an arbitrary point in Ω initializing the trajectory. We shall refer to the collection {(x_0, y_0), …, (x_{N−1}, y_{N−1})} as the training data. We require that 𝒴 be a Banach space (so that one can talk about expectations and other functionals applied to Y), but allow the covariate space 𝒳 to be nonlinear.

Many problems in statistical modeling of dynamical systems can be expressed in this framework. For instance, in a low-dimensional ODE setting, X and Y could both be the identity map on Ω = ℝ^d, and the task could be to build a model for the evolution of the full system state. Weather forecasting is a classical high-dimensional application, where Ω is the abstract state space of the climate system, and X a (highly noninvertible) map representing measurements from satellites, meteorological stations, and other sensors available to a forecaster. The response Y could be temperature at a specific location, 𝒴 = ℝ, illustrating that the response space may be of considerably lower dimension than the covariate space. In other cases, e.g., forecasting the temperature field over a geographical region, 𝒴 may be a function space. The two primary questions that will concern us here are:

Problem 1 (Statistical forecasting). Given the training data, construct ("learn") a function Z_t : 𝒳 → 𝒴 that predicts Y at a lead time t ≥ 0. That is, Z_t should have the property that Z_t ∘ X is closest to Y ∘ Φ^t among all functions in a suitable class.

Problem 2 (Coherent pattern extraction). Given the training data, identify a collection of observables z_j : Ω → 𝒴 that have the property of evolving coherently under the dynamics. By that, we mean that z_j ∘ Φ^t should be relatable to z_j in a natural way.

These problems have an extensive history of study from an interdisciplinary perspective spanning mathematics, statistics, physics, and many other fields. Here, our focus will be on nonparametric methods, which do not employ explicit parametric models for the dynamics. Instead, they use universal structural properties of dynamical systems to inform the design of data analysis techniques. From a learning standpoint, Problems 1 and 2 can be thought of as supervised and unsupervised learning, respectively. A mathematical requirement we will impose on methods addressing either problem is that they have a well-defined notion of convergence, i.e., they are refinable, as the number N of training samples increases.

Analog and POD Approaches

Among the earliest examples of nonparametric forecasting techniques is Lorenz's analog method [Lor69]. This simple, elegant approach makes predictions by tracking the evolution of the response along a dynamical trajectory in the training data (the analogs). Good analogs are selected according to a measure of geometrical similarity between the covariate variable observed at forecast initialization and the covariate training data. This method posits that past behavior of the system is representative of its future behavior, so looking up states in a historical record that are closest to current observations is likely to yield a skillful forecast. Subsequent methodologies have also emphasized aspects of state space geometry, e.g., using the training data to approximate the evolution map through patched local linear models, often leveraging delay coordinates for state space reconstruction.

Early approaches to coherent pattern extraction include the proper orthogonal decomposition (POD), which is closely related to principal component analysis (PCA, introduced in the early twentieth century by Pearson), the Karhunen–Loève expansion, and empirical orthogonal function (EOF) analysis. Assuming that 𝒴 is a Hilbert space, POD yields an expansion Y ≈ Y_L = ∑_{j=1}^L z_j, z_j = u_j σ_j ψ_j. Arranging the data into a matrix 퐘 = (y_0, …, y_{N−1}), the σ_j are the singular values of 퐘 (in decreasing order), the u_j are the corresponding left singular vectors, called EOFs, and the ψ_j are given by projections of Y onto the EOFs, ψ_j(ω) = ⟨u_j, Y(ω)⟩_𝒴. That is, the principal component ψ_j : Ω → ℝ is a linear feature characterizing the unsupervised data {y_0, …, y_{N−1}}. If the data is drawn from a probability measure μ, as N → ∞ the POD expansion is optimal in an L²(μ) sense; that is, Y_L has minimal L²(μ) error ‖Y − Y_L‖_{L²(μ)} among all rank-L approximations of Y. Effectively, from the perspective of POD, the important components of Y are those capturing maximal variance.

Despite many successes in challenging applications (e.g., turbulence), it has been recognized that POD may not reveal dynamically significant observables, offering limited predictability and physical insight. In recent years, there has been significant interest in techniques that address this shortcoming by modifying the linear map 퐘 to have an explicit dependence on the dynamics [BK86], or replacing it by an evolution operator [DJ99, Mez05]. Either directly or indirectly, these methods make use of operator-theoretic ergodic theory, which we now discuss.

Operator-Theoretic Formulation

The operator-theoretic formulation of dynamical systems theory shifts attention from the state-space perspective, and instead characterizes the dynamics through its action on linear spaces of observables.

The Lorenz 63 (L63) system has an invariant measure supported on the famous "butterfly" fractal attractor; see Figure 1. L63 exemplifies the fact that a smooth dynamical system may exhibit invariant measures with nonsmooth supports. This behavior is ubiquitous in models of physical phenomena, which are formulated in terms of smooth differential equations, but whose long-term dynamics concentrate on lower-dimensional subsets of state space due to the presence of dissipation. Our methods should therefore not rely on the existence of a smooth
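To make Lorenz's analog method, described in the preceding section, concrete: the following minimal sketch (our own toy example and variable names, not the authors' code) forecasts a response by locating the geometrically closest covariate analog in the training data and reading off the stored response at the desired lead time:

```python
import numpy as np

# Training data from a toy circle rotation: covariates x_n in R^2 and
# scalar responses y_n sampled along one trajectory at interval dt.
dt, N = 0.05, 2000
theta = dt * np.arange(N)
x_train = np.column_stack([np.cos(theta), np.sin(theta)])  # covariates
y_train = np.cos(theta)                                    # responses

def analog_forecast(x_new, lead_steps):
    """Forecast the response lead_steps*dt ahead of the observed covariate.

    Select the best analog by geometrical similarity (Euclidean
    distance) to x_new, then follow the training trajectory forward.
    """
    dists = np.linalg.norm(x_train[:-lead_steps] - x_new, axis=1)
    n = np.argmin(dists)               # index of the best analog
    return y_train[n + lead_steps]

# Forecast from an observed state and compare with the true evolution:
x_obs = np.array([np.cos(0.7), np.sin(0.7)])
pred = analog_forecast(x_obs, lead_steps=10)
truth = np.cos(0.7 + 10 * dt)
```

On this toy system the analog forecast is essentially exact because the historical record densely samples the attractor; in high-dimensional systems, finding good analogs is the central practical difficulty.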
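The POD/EOF construction from the "Analog and POD Approaches" section can likewise be sketched with a plain SVD (an illustrative example with our own synthetic data; normalization conventions for the principal components vary across the literature):

```python
import numpy as np

# POD/EOF analysis via the SVD.  Columns of the data matrix Y are
# samples y_0, ..., y_{N-1} of a response taking values in R^d.
rng = np.random.default_rng(0)
d, N, L = 5, 400, 2

# Synthetic data dominated by two fixed spatial patterns plus noise:
patterns = rng.standard_normal((d, 2))
amps = rng.standard_normal((2, N)) * np.array([[10.0], [3.0]])
Y = patterns @ amps + 0.1 * rng.standard_normal((d, N))

# u_j = EOFs (left singular vectors), sigma_j = singular values in
# decreasing order; principal components are projections onto EOFs.
U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
psi = U.T @ Y                      # psi[j, n] = <u_j, y_n>

# Rank-L truncation Y_L (by Eckart-Young, the best rank-L
# approximation of Y in the least-squares sense):
Y_L = U[:, :L] @ psi[:L]
rel_err = np.linalg.norm(Y - Y_L) / np.linalg.norm(Y)
```

Because the synthetic data has two dominant variance-carrying patterns, the rank-2 POD truncation captures nearly all of it; as the text notes, however, maximal variance need not coincide with dynamical significance.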