Neural SDEs as Infinite-Dimensional GANs

Patrick Kidger 1 2 James Foster 1 2 Xuechen Li 3 Harald Oberhauser 1 2 Terry Lyons 1 2

1 Mathematical Institute, University of Oxford. 2 The Alan Turing Institute, The British Library. 3 Stanford. Correspondence to: Patrick Kidger.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Abstract

Stochastic differential equations (SDEs) are a staple of mathematical modelling of temporal dynamics. However, a fundamental limitation has been that such models have typically been relatively inflexible, which recent work introducing neural SDEs has sought to solve. Here, we show that the current classical approach to fitting SDEs may be approached as a special case of (Wasserstein) GANs, and in doing so the neural and classical regimes may be brought together. The input noise is Brownian motion, the output samples are time-evolving paths produced by a numerical solver, and by parameterising a discriminator as a neural controlled differential equation (CDE), we obtain neural SDEs as (in modern machine learning parlance) continuous-time generative time series models. Unlike previous work on this problem, this is a direct extension of the classical approach without reference to either prespecified statistics or density functions. Arbitrary drift and diffusions are admissible, and as the Wasserstein loss has a unique global minimum, in the infinite-data limit any SDE may be learnt. Example code has been made available as part of the torchsde repository.

1. Introduction

1.1. Neural differential equations

Since their introduction, neural ordinary differential equations (Chen et al., 2018) have prompted the creation of a variety of similarly-inspired models, for example based around controlled differential equations (Kidger et al., 2020; Morrill et al., 2020), Lagrangians (Cranmer et al., 2020), higher-order ODEs (Massaroli et al., 2020; Norcliffe et al., 2020), and equilibrium points (Bai et al., 2019).

In particular, several authors have introduced neural stochastic differential equations (neural SDEs), such as Tzen & Raginsky (2019a), Li et al. (2020) and Hodgkinson et al. (2020), among others. This is our focus here.

Neural differential equations parameterise the vector field(s) of a differential equation by neural networks. They are an elegant concept, bringing together the two dominant modelling paradigms of neural networks and differential equations.

The main idea – fitting a parameterised differential equation to data, often via stochastic gradient descent – has been a cornerstone of mathematical modelling for a long time (Giles & Glasserman, 2006). The key benefit of the neural network hybridisation is the availability of easily-trainable, high-capacity function approximators.

1.2. Stochastic differential equations

Stochastic differential equations have seen widespread use for modelling real-world random phenomena, such as particle systems (Coffey et al., 2012; Pavliotis, 2014; Lelièvre & Stoltz, 2016), financial markets (Black & Scholes, 1973; Cox et al., 1985; Brigo & Mercurio, 2001), population dynamics (Arató, 2003; Soboleva & Pleasants, 2003) and genetics (Huillet, 2007). They are a natural extension of ordinary differential equations (ODEs) for modelling systems that evolve in continuous time subject to uncertainty.

The dynamics of an SDE consist of a deterministic term and a stochastic term:

    dX_t = f(t, X_t) dt + g(t, X_t) ∘ dW_t,    (1)

where X = {X_t}_{t ∈ [0,T]} is a continuous R^x-valued stochastic process, f : [0,T] × R^x → R^x and g : [0,T] × R^x → R^{x×w} are functions, and W = {W_t}_{t ≥ 0} is a w-dimensional Brownian motion. We refer the reader to Revuz & Yor (2013) for a rigorous account of stochastic integration.

The notation "∘" in the noise term refers to the SDE being understood using Stratonovich integration. The difference between Itô and Stratonovich will not be an important choice here; we happen to prefer the Stratonovich formulation, as the dynamics of (1) may then be informally interpreted as

    X_{t+∆t} ≈ ODESolve(X_t, f(·) + g(·) ∆W/∆t, [t, t+∆t]),

where ∆W ∼ N(0, ∆t I_w) denotes the increment of the Brownian motion over the small time interval [t, t+∆t].

Historically, workflows for SDE modelling have two steps:

1. A domain expert will formulate an SDE model using their experience and knowledge. One frequent and straightforward technique is to add "σ ∘ dW_t" to a pre-existing ODE model, where σ is a fixed matrix.

2. Once an SDE model is chosen, the model parameters must be calibrated from real-world data. Since SDEs produce random sample paths, parameters are often chosen to capture some desired expected behaviours. That is, one trains the model to match target statistics

        E[F_i(X)],    1 ≤ i ≤ n,    (2)

   where the real-valued functions {F_i} are prespecified. For example in mathematical finance, the statistics (2) represent option prices that correspond to the functions F_i, which are termed payoff functions; for the well-known and analytically tractable Black–Scholes model, these prices can then be computed explicitly for call and put options (Black & Scholes, 1973). (A code sketch of this calibration step is given at the end of this subsection.)

The aim of this paper (and neural SDEs more generally) is to strengthen the capabilities of SDE modelling by hybridising with deep learning.
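To make the classical workflow above concrete, the following is a minimal sketch in PyTorch of calibrating a toy SDE to prespecified statistics as in (2). The drift and diffusion, the statistics being matched, their target values and all hyperparameters are illustrative placeholders, not taken from the paper; Heun's method is used as a simple Stratonovich-consistent discretisation of the informal ODESolve interpretation above.

```python
# A sketch of classical SDE calibration: simulate paths, estimate E[F_i(X)] by
# Monte Carlo, and fit the parameters so that these match prespecified targets.
import torch

def heun_step(f, g, t, x, dt, dW):
    # One step of (stochastic) Heun's method; converges to the Stratonovich solution of (1).
    fx, gx = f(t, x), g(t, x)
    x_tilde = x + fx * dt + gx * dW
    return x + 0.5 * (fx + f(t + dt, x_tilde)) * dt + 0.5 * (gx + g(t + dt, x_tilde)) * dW

def simulate(f, g, x0, T, n_steps):
    # Sample paths of dX_t = f(t, X_t) dt + g(t, X_t) ∘ dW_t; each step integrates
    # f + g ∆W/∆t over [t, t + ∆t], as in the informal interpretation above.
    dt = T / n_steps
    xs, x = [x0], x0
    for i in range(n_steps):
        dW = torch.randn_like(x) * dt ** 0.5      # ∆W ~ N(0, ∆t)
        x = heun_step(f, g, i * dt, x, dt, dW)
        xs.append(x)
    return torch.stack(xs, dim=1)                 # shape (batch, n_steps + 1)

# A toy scalar SDE with learnable parameters: dX_t = -a X_t dt + b ∘ dW_t.
a = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)
f = lambda t, x: -a * x
g = lambda t, x: b * torch.ones_like(x)

# Prespecified statistics F_i as in (2): here, mean and variance of the terminal value.
target_stats = torch.tensor([0.0, 0.3])           # placeholder targets, e.g. estimated from data
optimiser = torch.optim.SGD([a, b], lr=0.05)
for step in range(200):
    paths = simulate(f, g, torch.zeros(1024), T=1.0, n_steps=32)
    terminal = paths[:, -1]
    model_stats = torch.stack([terminal.mean(), terminal.var()])
    loss = ((model_stats - target_stats) ** 2).sum()   # match E[F_i(X)] to the targets
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```

In Section 3 this fixed, hand-chosen loss is replaced by a learnt statistic: the discriminator.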

[Figure 1: schematic. Left panel, "Classical approach": Brownian motion continuously injects noise into an SDE, which is fit to fixed statistics. Right panel, "Generalised (GAN) approach": data or the SDE continuously perform control of a CDE, whose final value gives a learnt statistic.]

Figure 1. Pictorial summary of just the high-level ideas: Brownian motion is continuously injected as noise into an SDE. The classical approach fits the SDE to prespecified statistics. Generalising to (Wasserstein) GANs, which instead introduce a learnt statistic (the discriminator), we may fit much more complicated models.

1.3. Contributions

SDEs are a classical way to understand uncertainty over paths or over time series. Here, we show that the current classical approach to fitting SDEs may be generalised, and approached from the perspective of Wasserstein GANs. In particular, this is done by putting together a neural SDE and a neural CDE (controlled differential equation) as a generator–discriminator pair; see Figure 1.

Arbitrary drift and diffusions are admissible, which from the point of view of the classical SDE literature offers unprecedented modelling capacity. As the Wasserstein loss has a unique global minimum, in the infinite-data limit arbitrary SDEs may be learnt.

Unlike much previous work on neural SDEs, this operates as a direct extension of the classical, tried-and-tested approach. Moreover, to the best of our knowledge, this is the first approach to SDE modelling that involves neither prespecified statistics nor the use of density functions.

In modern machine learning parlance, neural SDEs become continuous-time generative models. We anticipate applications in the main settings for which SDEs are already used – now with enhanced modelling power. For example, later we will consider an application to financial time series.

2. Related work

We begin by discussing previous formulations, and applications, of neural SDEs. Broadly speaking these may be categorised into two groups. The first use SDEs as a way to gradually insert noise into a system, so that the terminal state of the SDE is the quantity of interest. The second instead consider the full time-evolution of the SDE as the quantity of interest.

                 Initial                Hidden state                                         Output
Noise            V ∼ N(0, I_v)          W_t = Brownian motion
Generator        X_0 = ζ_θ(V)           dX_t = µ_θ(t, X_t) dt + σ_θ(t, X_t) ∘ dW_t           Y_t = α_θ X_t + β_θ
Discriminator    H_0 = ξ_φ(Y_0)         dH_t = f_φ(t, H_t) dt + g_φ(t, H_t) ∘ dY_t           D = m_φ · H_T

Figure 2. Summary of equations.

Tzen & Raginsky (2019a;b) obtain neural SDEs as a continuous limit of deep latent Gaussian models. They train by optimising a variational bound, using forward-mode autodifferentiation. They consider only theoretical applications, for modelling distributions as the terminal value of an SDE.

Li et al. (2020) give arguably the closest analogue to the neural ODEs of Chen et al. (2018). They introduce neural SDEs via a subtle argument involving two-sided filtrations and backward Stratonovich integrals, but in doing so are able to introduce a backward-in-time adjoint equation, using only efficient-to-compute vector–Jacobian products. In applications, they use neural SDEs in a latent variable modelling framework, using the stochasticity to model Bayesian uncertainty.

Hodgkinson et al. (2020) introduce neural SDEs as a limit of random ODEs. The limit is made meaningful via rough path theory. In applications, they use the limiting random ODEs, and treat stochasticity as a regulariser within a normalising flow. However, they remark that in this setting the optimal diffusion is zero. This is a recurring problem: Innes et al. (2019) also train neural SDEs for which the optimal diffusion is zero.

Rackauckas et al. (2020) treat neural SDEs in classical Feynman–Kac fashion, and like Hodgkinson et al. (2020) and Tzen & Raginsky (2019a;b), optimise a loss on just the terminal value of the SDE.

Briol et al. (2020), Gierjatowicz et al. (2020) and Cuchiero et al. (2020) instead consider the more general case of using a neural SDE to model a time-varying quantity, that is to say not just considering the terminal value of the SDE. Letting µ, ν denote the learnt and true distributions on path space, they all train by minimising |∫ f dµ − ∫ f dν| over functions of interest f (such as derivative payoffs). This corresponds to training with a non-characteristic MMD (Gretton et al., 2013).

Several authors, such as Oganesyan et al. (2020), Hodgkinson et al. (2020) and Liu et al. (2019), seek to use stochasticity as a way to enhance or regularise a neural ODE model.

Song et al. (2021), building on the discrete-time counterparts of Song & Ermon (2019) and Ho et al. (2020), consider an SDE that is fixed (and prespecified) rather than learnt. However, by approximating one of its terms with a neural network trained with score matching, the SDE becomes a controlled way to inject noise so as to sample from complex high-dimensional distributions such as images.

Our approach is most similar to Li et al. (2020), in that we treat neural SDEs as learnt continuous-time model components of a differentiable computation graph. Like both Rackauckas et al. (2020) and Gierjatowicz et al. (2020), we emphasise the connection of our approach to standard mathematical formalisms. In terms of the two groups mentioned at the start of this section, we fall into the second: we use stochasticity to model distributions on path space. The resulting neural SDE is not an improvement to a similar neural ODE, but a standalone concept in its own right.

3. Method

3.1. SDEs as GANs

Consider some (Stratonovich) integral equation of the form

    X_0 ∼ µ,    dX_t = f(t, X_t) dt + g(t, X_t) ∘ dW_t,

for initial probability distribution µ, (Lipschitz continuous) functions f, g and Brownian motion W. The strong solution to this SDE may be defined as the unique function S such that S(µ, W) = X almost surely (Rogers & Williams, 2000, Chapter V, Definition 10.9).

Intuitively, this means that SDEs are maps from a noise distribution (Wiener measure, the distribution of Brownian motion) to some solution distribution, which is a probability distribution on path space. We recommend any of Karatzas & Shreve (1991), Rogers & Williams (2000) or Revuz & Yor (2013) as an introduction to the theory of SDEs.

SDEs can be sampled from: this is what a numerical SDE solver does. However, evaluating its probability density is not possible; in fact it is not even defined in the usual sense.[1] As such, an SDE is typically fit to data by asking that the model statistics

    E_{X ∼ model}[F_i(X)],    1 ≤ i ≤ n,

match the data statistics

    E_{X ∼ data}[F_i(X)],    1 ≤ i ≤ n,

for some functions of interest F_i. Training may be done via stochastic gradient descent (Giles & Glasserman, 2006).

[1] Technically speaking, a probability density is the Radon–Nikodym derivative of the measure with respect to the Lebesgue measure. However, the Lebesgue measure only exists for finite-dimensional spaces. In infinite dimensions, it is instead necessary to define densities with respect to, for example, Gaussian measures.

For completeness we now additionally introduce the relevant ideas for GANs. Consider some noise distribution µ on a space X, and a target probability distribution ν on a space Y. A generative model for ν is a learnt function G_θ : X → Y trained so that the (pushforward) distribution G_θ(µ) approximates ν. Sampling from a trained model is typically straightforward, by sampling ω ∼ µ and then evaluating G_θ(ω).

Many training methods rely on obtaining a probability density for G_θ(µ); for example this is used in normalising flows (Rezende & Mohamed, 2015). However this is not in general computable, perhaps due to the complicated internal structure of G_θ. Instead, GANs examine the statistics of samples from G_θ(µ), and seek to match the statistics of the model to the statistics of the data. Most typically this is a learnt scalar statistic, called the discriminator. An optimally-trained generator is one for which

    E_{X ∼ model}[F(X)] = E_{X ∼ data}[F(X)]

for all statistics F, so that there is no possible statistic (or "witness function" in the language of integral probability metrics (Bińkowski et al., 2018)) that the discriminator may learn to represent, so as to distinguish real from fake. There are some variations on this theme; GMMNs instead use fixed vector-valued statistics (Li et al., 2015), and MMD-GANs use learnt vector-valued statistics (Li et al., 2017).

In both cases – SDEs and GANs – the model generates samples by transforming random noise. In neither case are densities available. However, sampling is available, so that model fitting may be performed by matching statistics. With this connection in hand, we now seek to combine these two approaches.

3.2. Generator

Let Y_true be a random variable on y-dimensional path space. Loosely speaking, path space is the space of continuous functions f : [0,T] → R^y for some fixed time horizon T > 0. For example, this may correspond to the (interpolated) evolution of stock prices over time. Y_true is what we wish to model.

Let W : [0,T] → R^w be a w-dimensional Brownian motion, and let V ∼ N(0, I_v) be drawn from a v-dimensional standard multivariate normal. The values w, v are hyperparameters describing the size of the noise. Let

    ζ_θ : R^v → R^x,
    µ_θ : [0,T] × R^x → R^x,
    σ_θ : [0,T] × R^x → R^{x×w},
    α_θ ∈ R^{y×x},
    β_θ ∈ R^y,

where ζ_θ, µ_θ and σ_θ are (Lipschitz) neural networks. Collectively they are parameterised by θ. The dimension x is a hyperparameter describing the size of the hidden state.

We define neural SDEs of the form

    X_0 = ζ_θ(V),
    dX_t = µ_θ(t, X_t) dt + σ_θ(t, X_t) ∘ dW_t,    (3)
    Y_t = α_θ X_t + β_θ,

for t ∈ [0,T], with X : [0,T] → R^x the (strong) solution to the SDE, such that in some sense Y ≈ Y_true. That is to say, the model Y should have approximately the same distribution as the target Y_true (for some notion of approximate). The solution X is guaranteed to exist given mild conditions (such as Lipschitz µ_θ, σ_θ).

Architecture  Equation (3) has a certain minimum amount of structure. First, the solution X represents hidden state. If it were the output, then future evolution would satisfy a Markov property which need not be true in general. This is the reason for the additional readout operation to Y. Practically speaking, Y may be concatenated alongside X during an SDE solve.

Second, there must be an additional source of noise for the initial condition, passed through a nonlinear ζ_θ, as Y_0 = α_θ ζ_θ(V) + β_θ does not depend on the Brownian noise W.

Beyond this, ζ_θ, µ_θ and σ_θ may be taken to be any standard network architecture, such as a simple feedforward network. (The choice does not affect the GAN construction.)

Sampling  Given a trained model, we sample from it by sampling some initial noise V and some Brownian motion W, and then solving equation (3) with standard numerical SDE solvers. In our experiments we use the midpoint method, which converges to the Stratonovich solution. (The Euler–Maruyama method converges to the Itô solution.) A code sketch is given at the end of this subsection.

Comparison to the Fokker–Planck equation  The distribution of an SDE, as learnt by a neural SDE, contains more information than the distribution obtained by learning a corresponding Fokker–Planck equation. The solution to a Fokker–Planck equation gives the (time evolution of the) probability density of a solution at fixed times. It does not encode information about the time evolution of individual sample paths. This is exemplified by stationary processes, whose sample paths may be nonconstant but whose distribution does not change over time.

Stratonovich versus Itô  The choice of Stratonovich solutions over Itô solutions is not mandatory. As the vector fields are learnt, in general either choice is equally admissible.
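The following is a minimal sketch of the generator (3) in plain PyTorch. Network shapes are illustrative, and a Heun discretisation is used as a stand-in for the midpoint solver mentioned above (both converge to the Stratonovich solution); the authors' example code in the torchsde repository should be treated as the reference implementation.

```python
# A sketch of the generator (3): ζ_θ maps initial noise to hidden state, (µ_θ, σ_θ) define
# the SDE, and a linear readout produces the output path Y.
import torch

class Generator(torch.nn.Module):
    def __init__(self, noise_dim, hidden_dim, data_dim, bm_dim):
        super().__init__()
        self.bm_dim = bm_dim
        self.zeta = torch.nn.Linear(noise_dim, hidden_dim)                # ζ_θ
        self.mu = torch.nn.Sequential(torch.nn.Linear(hidden_dim + 1, 64), torch.nn.Tanh(),
                                      torch.nn.Linear(64, hidden_dim))    # µ_θ(t, x)
        self.sigma = torch.nn.Sequential(torch.nn.Linear(hidden_dim + 1, 64), torch.nn.Tanh(),
                                         torch.nn.Linear(64, hidden_dim * bm_dim))  # σ_θ(t, x)
        self.readout = torch.nn.Linear(hidden_dim, data_dim)              # α_θ, β_θ

    def _drift(self, t, x):
        tx = torch.cat([t * torch.ones_like(x[:, :1]), x], dim=1)
        return self.mu(tx)

    def _diffusion(self, t, x):
        tx = torch.cat([t * torch.ones_like(x[:, :1]), x], dim=1)
        return self.sigma(tx).view(x.size(0), -1, self.bm_dim)

    def forward(self, v, ts):
        # v: initial noise V ~ N(0, I_v).  ts: times at which to return Y_t.
        x = self.zeta(v)
        ys = [self.readout(x)]
        for t0, t1 in zip(ts[:-1], ts[1:]):
            dt = t1 - t0
            dW = torch.randn(x.size(0), self.bm_dim) * dt.sqrt()
            # Heun step for dX_t = µ_θ dt + σ_θ ∘ dW_t.
            f0 = self._drift(t0, x)
            g0dW = (self._diffusion(t0, x) @ dW.unsqueeze(-1)).squeeze(-1)
            x_tilde = x + f0 * dt + g0dW
            f1 = self._drift(t1, x_tilde)
            g1dW = (self._diffusion(t1, x_tilde) @ dW.unsqueeze(-1)).squeeze(-1)
            x = x + 0.5 * (f0 + f1) * dt + 0.5 * (g0dW + g1dW)
            ys.append(self.readout(x))
        return torch.stack(ys, dim=1)              # shape (batch, len(ts), data_dim)

ts = torch.linspace(0., 1., 33)
generator = Generator(noise_dim=5, hidden_dim=32, data_dim=2, bm_dim=3)
samples = generator(torch.randn(16, 5), ts)        # 16 sample paths, shape (16, 33, 2)
```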

3.3. Discriminator

Each sample from the generator is a path Y : [0,T] → R^y; these are infinite-dimensional and the discriminator must accept such paths as inputs. There is a natural choice: parameterise the discriminator as another neural SDE.

Let

    ξ_φ : R^y → R^h,
    f_φ : [0,T] × R^h → R^h,
    g_φ : [0,T] × R^h → R^{h×y},
    m_φ ∈ R^h,    (4)

where ξ_φ, f_φ and g_φ are (Lipschitz) neural networks. Collectively they are parameterised by φ. The dimension h is a hyperparameter describing the size of the hidden state.

Recalling that Y is the generated sample, we take the discriminator to be an SDE of the form

    H_0 = ξ_φ(Y_0),
    dH_t = f_φ(t, H_t) dt + g_φ(t, H_t) ∘ dY_t,    (5)
    D = m_φ · H_T,

for t ∈ [0,T], with H : [0,T] → R^h the (strong) solution to this SDE, which exists given mild conditions (such as Lipschitz f_φ, g_φ). The value D ∈ R, which is a function of the terminal hidden state H_T, is the discriminator's score for real versus fake.

Neural CDEs  The discriminator follows the formulation of a neural CDE (Kidger et al., 2020) with respect to the control Y. Neural CDEs are the continuous-time analogue to RNNs, just as neural ODEs are the continuous-time analogue to residual networks (Chen et al., 2018). This is what motivates equation (5) as a probably sensible choice of discriminator. Moreover, it means that the discriminator enjoys properties such as universal approximation.

Architecture  There is a required minimum amount of structure. There must be a learnt initial condition, and the output should be a function of H_T and not a univariate H_T itself. See Kidger et al. (2020), who emphasise these points in the context of CDEs specifically.

Single SDE solve  In practice, both generator and discriminator may be concatenated together into a single SDE solve. The state is the combined [X, H], the initial condition is the combined

    [ζ_θ(V), ξ_φ(α_θ ζ_θ(V) + β_θ)],

the drift is the combined

    [µ_θ(t, X_t), f_φ(t, H_t) + g_φ(t, H_t) α_θ µ_θ(t, X_t)],

and the diffusion is the combined

    [σ_θ(t, X_t), g_φ(t, H_t) α_θ σ_θ(t, X_t)].

H_T is extracted from the final hidden state, and m_φ applied, to produce the discriminator's score for that sample.

Dense data regime  We still need to apply the discriminator to the training data. First suppose that we observe samples from Y_true as an irregularly sampled time series z = ((t_0, z_0), ..., (t_n, z_n)), potentially with missing data, but which is (informally speaking) densely sampled. Without loss of generality let t_0 = 0 and t_n = T.

Then it is enough to interpolate ẑ : [0,T] → R^y such that ẑ(t_i) = z_i, and compute

    H_0 = ξ_φ(ẑ(t_0)),
    dH_t = f_φ(t, H_t) dt + g_φ(t, H_t) ∘ dẑ_t,    (6)
    D = m_φ · H_T,

where g_φ(t, H_t) ∘ dẑ_t is defined as a Riemann–Stieltjes integral, stochastic integral, or rough integral, depending on the regularity of ẑ.

In doing so, the interpolation produces a distribution on path space; the one that is desired to be modelled. For example, linear interpolation (Levin et al., 2013), splines (Kidger et al., 2020), Gaussian processes (Li & Marlin, 2016; Futoma et al., 2019) and so on are all acceptable. In each case the relatively dense sampling of the data makes the choice of interpolation largely unimportant. We use linear interpolation for three of our four experiments (stocks, air quality, weights) later.
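As a companion to the generator sketch above, the following sketches the discriminator (5)/(6) in plain PyTorch, applied to a path observed (or generated) on a grid of times. With piecewise-linear interpolation the integral against dY reduces to a sum of g_φ(t, H_t)(Y_{k+1} − Y_k) terms; network shapes are again illustrative, and for simplicity the single-SDE-solve formulation described above is not used here.

```python
# A sketch of the discriminator (a neural CDE): hidden state H is driven by the
# increments of the input path, and the score is m_φ · H_T.
import torch

class Discriminator(torch.nn.Module):
    def __init__(self, data_dim, hidden_dim):
        super().__init__()
        self.xi = torch.nn.Linear(data_dim, hidden_dim)                    # ξ_φ
        self.f = torch.nn.Sequential(torch.nn.Linear(hidden_dim + 1, 64), torch.nn.Tanh(),
                                     torch.nn.Linear(64, hidden_dim))      # f_φ(t, h)
        self.g = torch.nn.Sequential(torch.nn.Linear(hidden_dim + 1, 64), torch.nn.Tanh(),
                                     torch.nn.Linear(64, hidden_dim * data_dim))  # g_φ(t, h)
        self.m = torch.nn.Linear(hidden_dim, 1, bias=False)                # m_φ · H_T

    def forward(self, ts, ys):
        # ts: shape (length,).  ys: a batch of paths, shape (batch, length, data_dim).
        batch, _, data_dim = ys.shape
        h = self.xi(ys[:, 0])                                              # H_0 = ξ_φ(Y_0)
        for k in range(len(ts) - 1):
            t, dt = ts[k], ts[k + 1] - ts[k]
            dy = ys[:, k + 1] - ys[:, k]                                   # increment of the control
            th = torch.cat([t * torch.ones_like(h[:, :1]), h], dim=1)
            gh = self.g(th).view(batch, -1, data_dim)
            # dH_t = f_φ(t, H_t) dt + g_φ(t, H_t) dY_t, discretised on the observation grid.
            h = h + self.f(th) * dt + (gh @ dy.unsqueeze(-1)).squeeze(-1)
        return self.m(h).squeeze(-1)                                       # one scalar score per path

discriminator = Discriminator(data_dim=2, hidden_dim=16)
scores = discriminator(torch.linspace(0., 1., 33), torch.randn(16, 33, 2))  # dummy paths
```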
Sparse data regime  The previous option becomes a little less convincing when z is potentially sparsely observed. In this case, we instead first sample the generator at whatever time points are desired, and then interpolate both the training data and the generated data, solving equation (6) in both cases. In this case, the choice of interpolation is simply part of the discriminator, and the interpolation is simply a way to embed discrete data into continuous space. We use this approach for the time-dependent Ornstein–Uhlenbeck experiment later.

Training loss  The training losses used are the usual ones for Wasserstein GANs (Goodfellow et al., 2014; Arjovsky et al., 2017). Let Y_θ : (V, W) ↦ Y represent the overall action of the generator, and let D_φ : Y ↦ D represent the overall action of the discriminator. Then the generator is optimised with respect to

    min_θ E_{V,W}[D_φ(Y_θ(V, W))],    (7)

and the discriminator is optimised with respect to

    max_φ ( E_{V,W}[D_φ(Y_θ(V, W))] − E_z[D_φ(ẑ)] ).    (8)

Training is performed via stochastic gradient descent techniques as usual.
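A minimal sketch of one alternating optimisation step for (7) and (8) follows, assuming generator and discriminator modules with the call signatures of the sketches above. The gradient penalty term anticipates the Lipschitz regularisation discussed next; the penalty weight and the choice of optimisers are illustrative, not the paper's hyperparameters (see Appendix A for those).

```python
# A sketch of alternating WGAN training for the objectives (7) and (8).
import torch

def gradient_penalty(discriminator, ts, real, fake):
    # Penalise deviations of ||∇_y D_φ(y)|| from 1 along real/fake interpolants
    # (Gulrajani et al., 2017).  real, fake: shape (batch, length, data_dim).
    eps = torch.rand(real.size(0), 1, 1)
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad, = torch.autograd.grad(discriminator(ts, mix).sum(), mix, create_graph=True)
    return ((grad.flatten(1).norm(dim=1) - 1) ** 2).mean()

def training_step(generator, discriminator, gen_opt, disc_opt, ts, real, noise_dim, gp_weight=10.):
    batch = real.size(0)

    # Discriminator step: maximise E[D_φ(Y_θ(V, W))] − E[D_φ(ẑ)] as in (8),
    # implemented by minimising its negative, plus the gradient penalty.
    fake = generator(torch.randn(batch, noise_dim), ts).detach()
    disc_loss = (discriminator(ts, real).mean() - discriminator(ts, fake).mean()
                 + gp_weight * gradient_penalty(discriminator, ts, real, fake))
    disc_opt.zero_grad()
    disc_loss.backward()
    disc_opt.step()

    # Generator step: minimise E[D_φ(Y_θ(V, W))], as in (7).
    gen_loss = discriminator(ts, generator(torch.randn(batch, noise_dim), ts)).mean()
    gen_opt.zero_grad()
    gen_loss.backward()
    gen_opt.step()
    return disc_loss.item(), gen_loss.item()
```

In use, `real` would be a minibatch of (interpolated) training paths sampled on the same time grid `ts` as the generator's output.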

Lipschitz regularisation  Wasserstein GANs need a Lipschitz discriminator, for which a variety of methods have been proposed. We use gradient penalty (Gulrajani et al., 2017), finding that neither weight clipping nor spectral normalisation worked (Arjovsky et al., 2017; Miyato et al., 2018).

We attribute this to the observation that neural SDEs (as with RNNs) have a recurrent structure. If a single step has Lipschitz constant λ, then the Lipschitz constant of the overall neural SDE will be O(λ^T) in the time horizon T. Even small positive deviations from λ = 1 may produce large Lipschitz constants. In contrast, gradient penalty regularises the Lipschitz constant of the entire discriminator.

Training with gradient penalty implies the need for a double backward. If using the continuous-time adjoint equations of Li et al. (2020), then this implies the need for a double adjoint. Mathematically this is fine; however, for moderate step sizes this produces gradients that are sufficiently inaccurate as to prevent models from training. For this reason we instead backpropagate through the internal operations of the solver.

Learning any SDE  The Wasserstein metric has a unique global minimum at Y = Y_true. By universal approximation of neural CDEs (with respect to either continuous inputs or interpolated sequences, corresponding to the dense and sparse data regimes respectively) (Kidger et al., 2020), the discriminator is sufficiently powerful to approximate the Wasserstein metric over any compact set of inputs.

Meanwhile, by the universal approximation theorem for neural networks (Pinkus, 1999; Kidger & Lyons, 2020) and convergence results for SDEs (Friz & Victoir, 2010, Theorem 10.29), it is immediate that any (Markov) SDE of the form

    dY_t = µ(t, Y_t) dt + σ(t, Y_t) ∘ dW_t

may be represented by the generator. Beyond this, the use of hidden state X means that non-Markov dependencies may also be modelled by the generator. (This time without theoretical guarantees, however – we found that proving a formal statement hit theoretical snags.)

4. Experiments

We perform experiments across four datasets; each one is selected to represent a different regime. First is a univariate synthetic example, to readily compare model results to the data. Second is a large-scale (14.6 million samples) dataset of Google/Alphabet stocks. Third is a conditional generative problem for air quality data in Beijing. Fourth is a dataset of weight evolution under SGD.

In all cases see Appendix A for details of hyperparameters, learning rates, optimisers and so on.

4.1. Synthetic example: time-dependent Ornstein–Uhlenbeck process

We begin by considering neural SDEs only (our other experiments feature comparisons to other models), and attempt to mimic a time-dependent one-dimensional Ornstein–Uhlenbeck process. This is an SDE of the form

    dz_t = (µt − θz_t) dt + σ ∘ dW_t.

We let µ = 0.02, θ = 0.1, σ = 0.4, and generate 8192 samples from t = 0 to t = 63, sampled at every integer. (A code sketch of this data generation is given below.)

Marginal distributions  We plot marginal distributions at t = 6, 19, 32, 44, 57 (corresponding to 10%, 30%, 50%, 70% and 90% of the way along). See Figure 3. We can visually confirm that the model has accurately recovered the true marginal distributions.
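For reference, the synthetic dataset just described may be generated along the following lines. This is only a sketch: the initial condition and the number of Euler–Maruyama substeps per observation are our assumptions, and since the noise here is additive the Itô and Stratonovich solutions coincide.

```python
# Generate 8192 sample paths of dz_t = (µt − θz_t) dt + σ ∘ dW_t with µ = 0.02,
# θ = 0.1, σ = 0.4, observed at the integer times t = 0, 1, ..., 63.
import torch

mu, theta, sigma = 0.02, 0.1, 0.4
n_samples, n_times = 8192, 64
ts = torch.arange(n_times, dtype=torch.float32)

z = torch.zeros(n_samples, n_times)      # z[:, 0] = 0: assumed initial condition
substeps = 10                            # Euler–Maruyama substeps per unit time (assumption)
dt = 1.0 / substeps
x = z[:, 0].clone()
for i in range(n_times - 1):
    for j in range(substeps):
        t = float(ts[i]) + j * dt
        dW = torch.randn(n_samples) * dt ** 0.5
        x = x + (mu * t - theta * x) * dt + sigma * dW
    z[:, i + 1] = x
# `ts` and `z` now form the dataset: 8192 univariate time series observed at integer times.
```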

Figure 3. Left to right: marginal distributions at t = 6, 19, 32, 44, 57.

Sample paths  Next we plot 50 samples from the true distribution against 50 samples from the learnt distribution. See Figure 4. Once again we see excellent agreement between the data and the model.

Figure 4. Sample paths from the time-dependent Ornstein–Uhlenbeck SDE, and from the neural SDE trained to match it.

Overall we see that neural SDEs are sufficient to recover classical non-neural SDEs: at least on this experiment, nothing has been lost in the generalisation.

4.2. Google/Alphabet stock prices

Dataset  Next we consider a dataset consisting of Google/Alphabet stock prices, obtained from LOBSTER (Haase, 2013). The data consists of limit orders, in particular ask and bid prices. A year of data corresponding to 2018–2019 is used, with an average of 605,054 observations per day. This is then downsampled and sliced into windows of length approximately one minute, for a total of approximately 14.6 million datapoints. We model the two-dimensional path consisting of the midpoint and the log-spread.

Models  Here we compare against two recently-proposed and state-of-the-art competing neural differential equation models: the Latent ODE model of Rubanova et al. (2019) and the continuous time flow process (CTFP) of Deng et al. (2020). The extended version of CTFPs, including latent variables, is used. Between them these models cover several training regimes: Latent ODEs are trained as variational autoencoders; CTFPs are trained as normalising flows; neural SDEs are trained as GANs. (To the best of our knowledge, neural SDEs as considered here are in fact the first model in their class, namely continuous-time GANs.)

Performance metrics  We study three test metrics: classification, prediction, and MMD.

Classification is given by training an auxiliary model to distinguish real data from fake data. We use a neural CDE (Kidger et al., 2020) for the classifier. Larger losses, meaning inability to classify, indicate better performance of the generative model.

Prediction is a train-on-synthetic-test-on-real (TSTR) metric (Hyland et al., 2017). We train a sequence-to-sequence model to predict the latter part of a time series given the first part, using generated data. Testing is performed on real data. We use a neural CDE/ODE as an encoder/decoder pair. Smaller losses, meaning ability to predict, are better.

Maximum mean discrepancy (MMD) is a distance between probability distributions with respect to a kernel or feature map. We use the depth-5 signature transform as the feature map (Király & Oberhauser, 2019; Toth & Oberhauser, 2020). Smaller values, meaning closer distributions, are better. (A code sketch of this metric follows at the end of this subsection.)

Results  The results are shown in Table 1. We see that neural SDEs outperform both competitors in all metrics. Notably, the Latent ODE fails completely on this dataset. We believe this reflects the stochasticity inherent in the problem; this highlights the inadequacy of neural ODE-based modelling for such tasks, and the need for neural SDE-based modelling instead.
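As an illustration of the MMD metric described under "Performance metrics" above: with an explicit feature map φ, the MMD between two empirical distributions is the norm of the difference of the mean feature vectors. The sketch below uses the depth-5 signature transform as φ via the signatory package; the exact call used, and whether any time channel or normalisation is applied, are our assumptions rather than details taken from the paper.

```python
# A sketch of the signature-feature MMD: MMD(P, Q) = ||E_P φ(X) − E_Q φ(Y)||
# with φ the depth-5 signature transform.
import torch
import signatory

def signature_mmd(real, fake, depth=5):
    # real, fake: tensors of shape (batch, length, channels), e.g. held-out vs generated paths.
    phi_real = signatory.signature(real, depth).mean(dim=0)
    phi_fake = signatory.signature(fake, depth).mean(dim=0)
    return (phi_real - phi_fake).norm()

real = torch.randn(128, 64, 2).cumsum(dim=1)    # placeholder data: random walks
fake = torch.randn(128, 64, 2).cumsum(dim=1)
print(signature_mmd(real, fake))
```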

Table 1. Results for stocks dataset. Bold indicates best performance; mean ± standard deviation over three repeats.

Metric           Neural SDE            CTFP                  Latent ODE
Classification   0.357 ± 0.045         0.165 ± 0.087         0.000239 ± 0.000086
Prediction       0.144 ± 0.045         0.725 ± 0.233         46.2 ± 12.3
MMD              1.92 ± 0.09           2.70 ± 0.47           60.4 ± 35.8

4.3. Air Quality in Beijing

Next we consider a dataset of the air quality in Beijing, from the UCI repository (Zhang et al., 2017; Dua & Graff, 2017). Each sample is a 6-dimensional time series of the SO2, NO2, CO, O3, PM2.5 and PM10 concentrations, as they change over the course of a day.

We consider the same collection of models and performance statistics as before. We train this as a conditional generative problem, using class labels that correspond to 14 different locations the data was measured at. Class labels are additionally made available to the auxiliary models performing classification and prediction. See Table 2.

Table 2. Results for air quality dataset. Bold indicates best performance; mean ± standard deviation over three repeats.

Metric           Neural SDE               CTFP                   Latent ODE
Classification   0.589 ± 0.051            0.764 ± 0.064          0.392 ± 0.011
Prediction       0.395 ± 0.056            0.810 ± 0.083          0.456 ± 0.095
MMD              0.000160 ± 0.000029      0.00198 ± 0.00001      0.000242 ± 0.000002

On this problem we observe that neural SDEs win on two out of the three metrics (prediction and MMD). CTFPs outperform neural SDEs on classification; however, the CTFP severely underperforms on prediction. We believe this reflects the fact that CTFPs are strongly diffusive models; in contrast, see how the drift-only Latent ODE performs relatively well on prediction. Once again this highlights the benefits of SDE-based modelling, with its combination of drift and diffusion terms.

4.4. Weights trained via SGD

Finally we consider a problem that is classically understood via (stochastic) differential equations: the weight updates when training a neural network via stochastic gradient descent with momentum. We train several small convolutional networks on MNIST (LeCun et al., 2010) for 100 epochs, and record their weights on every epoch. This produces a dataset of univariate time series, each time series corresponding to a particular scalar weight.

We repeat the comparisons of the previous section. Doing so we obtain the results shown in Table 3. Neural SDEs once again perform excellently. On this task we observe similar behaviour to the air quality dataset: the CTFP obtains a small edge on the classification metric, but the neural SDE outcompetes it by an order of magnitude on prediction, and by a factor of about two on the MMD. Latent ODEs perform relatively poorly in comparison to both.

Table 3. Results for weights dataset. Bold indicates best performance; mean ± standard deviation over three repeats.

Metric           Neural SDE              CTFP                 Latent ODE
Classification   0.507 ± 0.019           0.676 ± 0.014        0.0112 ± 0.0025
Prediction       0.00843 ± 0.00759       0.0808 ± 0.0514      0.127 ± 0.152
MMD              5.28 ± 1.27             12.0 ± 0.5           23.2 ± 11.8

4.5. Successfully training neural SDEs

As a result of our experiments, we empirically observed that successful training of neural SDEs was predicated on several factors.

Final tanh nonlinearity  Using a final tanh nonlinearity (on both drift and diffusion, for both generator and discriminator) constrains the rate of change of the hidden state. This avoids model blow-up, as in Kidger et al. (2020).

Stochastic weight averaging  Using the Cesàro mean of both the generator and discriminator weights, averaged over training, improves performance in the final model (Yazıcı et al., 2019). This averages out the oscillatory training behaviour of the min-max objective used in GAN training. (A code sketch follows at the end of this subsection.)

Adadelta  We experimented with several different standard optimisers, in particular including SGD, Adadelta (Zeiler, 2012) and Adam (Kingma & Ba, 2015). Amongst all optimisers considered, Adadelta produced substantially better performance. We do not have an explanation for this.

Weight decay  Nonzero weight decay also helped to damp the oscillatory behaviour resulting from the min-max objective used in GAN training.
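A sketch of the stochastic weight averaging just described: keep a running (Cesàro) mean of the generator and discriminator parameters over training, and use the averaged copies as the final model. The update schedule shown (every step, from the start of training) is an assumption for illustration; PyTorch's torch.optim.swa_utils offers similar functionality.

```python
# A sketch of averaging model weights over training (Cesàro / running mean).
import copy
import torch

class WeightAverager:
    def __init__(self, module):
        self.average = copy.deepcopy(module)    # averaged copy, used as the final model
        self.count = 0

    @torch.no_grad()
    def update(self, module):
        self.count += 1
        for p_avg, p in zip(self.average.parameters(), module.parameters()):
            p_avg += (p - p_avg) / self.count   # running mean: avg_k = avg_{k-1} + (x_k - avg_{k-1}) / k

# Usage, assuming `generator` and `discriminator` modules and a training loop:
#     gen_avg, disc_avg = WeightAverager(generator), WeightAverager(discriminator)
#     for step in range(num_steps):
#         ...one training step...
#         gen_avg.update(generator)
#         disc_avg.update(discriminator)
#     final_generator = gen_avg.average
```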

5. Conclusion

By coupling together a neural SDE and a neural CDE as a generator/discriminator pair, we have shown that neural SDEs may be trained as continuous-time GANs. Moreover, we have shown that this approach extends the existing classical approach to SDE modelling – using prespecified payoff functions – so that it may be integrated into existing SDE modelling workflows. Overall, we have demonstrated the capability of neural SDEs as a means of modelling distributions over path space.

Acknowledgements

PK was supported by the EPSRC grant EP/L015811/1. JF was supported by the EPSRC grant EP/N509711/1. PK, JF, HO, TL were supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1. PK thanks Penny Drinkwater for advice on Figure 2.

References

Arató, M. A famous nonlinear stochastic equation (Lotka–Volterra model with diffusion). Mathematical and Computer Modelling, 38(7–9):709–726, 2003.
Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein Generative Adversarial Networks. Volume 70 of Proceedings of Machine Learning Research, pp. 214–223, International Convention Centre, Sydney, Australia, 2017. PMLR.
Bai, S., Kolter, J., and Koltun, V. Deep equilibrium models. In Advances in Neural Information Processing Systems 32, pp. 690–701. Curran Associates, Inc., 2019.
Bińkowski, M., Sutherland, D. J., Arbel, M., and Gretton, A. Demystifying MMD GANs. In International Conference on Learning Representations, 2018.
Black, F. and Scholes, M. The Pricing of Options and Corporate Liabilities. Journal of Political Economy, 81(3):637–654, 1973.
Brigo, D. and Mercurio, F. Interest Rate Models: Theory and Practice. Springer, Berlin, 2001.
Briol, F.-X., Barp, A., Duncan, A., and Girolami, M. Statistical Inference for Generative Models with Maximum Mean Discrepancy. arXiv:1906.05944, 2020.
Chen, R. T. Q. torchdiffeq, 2018. URL https://github.com/rtqichen/torchdiffeq.
Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. Neural Ordinary Differential Equations. In Advances in Neural Information Processing Systems 31, pp. 6571–6583. Curran Associates, Inc., 2018.
Coffey, W. T., Kalmykov, Y. P., and Waldron, J. T. The Langevin Equation: With Applications to Stochastic Problems in Physics, Chemistry and Electrical Engineering. World Scientific, 2012.
Cox, J. C., Ingersoll, J. E., and Ross, S. A. A theory of the term structure of interest rates. Econometrica, 53(2):385–407, 1985.
Cranmer, M., Greydanus, S., Hoyer, S., Battaglia, P., Spergel, D., and Ho, S. Lagrangian neural networks. In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, 2020.
Cuchiero, C., Khosrawi, W., and Teichmann, J. A generative adversarial network approach to calibration of local stochastic volatility models. Risks, 8(4), 2020.
Deng, R., Chang, B., Brubaker, M., Mori, G., and Lehrmann, A. Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows. In Advances in Neural Information Processing Systems, volume 33, pp. 7805–7815. Curran Associates, Inc., 2020.
Dua, D. and Graff, C. UCI Machine Learning Repository, 2017. URL http://archive.ics.uci.edu/ml.
Friz, P. K. and Victoir, N. B. Multidimensional stochastic processes as rough paths: theory and applications. Cambridge University Press, 2010.
Futoma, J., Hariharan, S., and Heller, K. Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier. Proceedings of the 34th International Conference on Machine Learning, pp. 1174–1182, 2019.
Gierjatowicz, P., Sabate-Vidales, M., Šiška, D., Szpruch, L., and Žurič, Ž. Robust Pricing and Hedging via Neural SDEs. arXiv:2007.04154, 2020.
Giles, M. and Glasserman, P. Smoking Adjoints. Risk, 2006.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27, pp. 2672–2680. Curran Associates, Inc., 2014.
Grathwohl, W., Chen, R. T. Q., Bettencourt, J., Sutskever, I., and Duvenaud, D. FFJORD: Free-form continuous dynamics for scalable reversible generative models. International Conference on Learning Representations, 2019.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. A kernel two-sample test. Journal of Machine Learning Research, 13(1):723–773, 2013.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. Improved Training of Wasserstein GANs. In Advances in Neural Information Processing Systems 30, pp. 5767–5777. Curran Associates, Inc., 2017.
Haase, J. Limit order book system – the efficient reconstructor, 2013. URL https://lobsterdata.com/.
Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2020.
Hodgkinson, L., van der Heide, C., Roosta, F., and Mahoney, M. Stochastic Normalizing Flows. arXiv:2002.09547, 2020.
Huillet, T. On Wright–Fisher diffusion and its relatives. Journal of Statistical Mechanics: Theory and Experiment, 11, 2007.
Hyland, S. L., Esteban, C., and Rätsch, G. Real-Valued (Medical) Time Series Generation with Recurrent Conditional GANs. arXiv:1706.02633, 2017.
Innes, M., Edelman, A., Fischer, K., Rackauckas, C., Saba, E., Shah, V. B., and Tebbutt, W. A Differentiable Programming System to Bridge Machine Learning and Scientific Computing. arXiv:1907.07587, 2019.
Karatzas, I. and Shreve, S. Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics. Springer New York, 1991.
Kidger, P. torchcde, 2020. URL https://github.com/patrick-kidger/torchcde.
Kidger, P. and Lyons, T. Universal Approximation with Deep Narrow Networks. COLT 2020, 2020.
Kidger, P. and Lyons, T. Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU. International Conference on Learning Representations, 2021. URL https://github.com/patrick-kidger/signatory.
Kidger, P., Morrill, J., Foster, J., and Lyons, T. Neural Controlled Differential Equations for Irregular Time Series. arXiv:2005.08926, 2020.
Kingma, D. and Ba, J. Adam: A method for stochastic optimization. International Conference on Learning Representations, 2015.
Király, F. and Oberhauser, H. Kernels for sequentially ordered data. Journal of Machine Learning Research, 2019.
LeCun, Y., Cortes, C., and Burges, C. MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.
Lelièvre, T. and Stoltz, G. Partial differential equations and stochastic methods in molecular dynamics. Acta Numerica, 25:681–880, 2016.
Levin, D., Lyons, T., and Ni, H. Learning from the past, predicting the statistics for the future, learning an evolving system. arXiv:1309.0260, 2013.
Li, C.-L., Chang, W.-C., Cheng, Y., Yang, Y., and Poczos, B. MMD GAN: Towards Deeper Understanding of Moment Matching Network. In Advances in Neural Information Processing Systems, volume 30, pp. 2203–2213. Curran Associates, Inc., 2017.
Li, S. C.-X. and Marlin, B. M. A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification. In Advances in Neural Information Processing Systems, pp. 1804–1812. Curran Associates, Inc., 2016.
Li, X. torchsde, 2020. URL https://github.com/google-research/torchsde.
Li, X., Wong, T.-K. L., Chen, R. T. Q., and Duvenaud, D. Scalable Gradients and Variational Inference for Stochastic Differential Equations. AISTATS, 2020.
Li, Y., Swersky, K., and Zemel, R. Generative Moment Matching Networks. In Proceedings of the 32nd International Conference on Machine Learning, 2015.
Liu, X., Xiao, T., Si, S., Cao, Q., Kumar, S., and Hsieh, C.-J. Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise. arXiv:1906.02355, 2019.
Massaroli, S., Poli, M., Park, J., Yamashita, A., and Asama, H. Dissecting Neural ODEs. arXiv:2002.08071, 2020.
Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral Normalization for Generative Adversarial Networks. In International Conference on Learning Representations, 2018.
Morrill, J., Kidger, P., Salvi, C., Foster, J., and Lyons, T. Neural CDEs for Long Time-Series via the Log-ODE Method. arXiv:2009.08295, 2020.
Norcliffe, A., Bodnar, C., Day, B., Simidjievski, N., and Liò, P. On Second Order Behaviour in Augmented Neural ODEs. arXiv:2006.07220, 2020.
Oganesyan, V., Volokhova, A., and Vetrov, D. Stochasticity in Neural ODEs: An Empirical Study. arXiv:2002.09779, 2020.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., 2019.
Pavliotis, G. A. Stochastic Processes and Applications: Diffusion Processes, the Fokker–Planck and Langevin Equations. Springer, New York, 2014.
Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, 1999.
Rackauckas, C., Ma, Y., Martensen, J., Warner, C., Zubov, K., Supekar, R., Skinner, D., and Ramadhan, A. Universal Differential Equations for Scientific Machine Learning. arXiv:2001.04385, 2020.
Revuz, D. and Yor, M. Continuous martingales and Brownian motion, volume 293. Springer Science & Business Media, 2013.
Rezende, D. and Mohamed, S. Variational inference with normalizing flows. In Bach, F. and Blei, D. (eds.), Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pp. 1530–1538, Lille, France, 2015. PMLR.
Rogers, L. and Williams, D. Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus. Cambridge Mathematical Library. Cambridge University Press, 2000.
Rubanova, Y., Chen, R. T. Q., and Duvenaud, D. Latent Ordinary Differential Equations for Irregularly-Sampled Time Series. In Advances in Neural Information Processing Systems 32, pp. 5320–5330. Curran Associates, Inc., 2019.
Soboleva, T. K. and Pleasants, A. B. Population Growth as a Nonlinear Stochastic Process. Mathematical and Computer Modelling, 38(11–13):1437–1442, 2003.
Song, Y. and Ermon, S. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, pp. 11918–11930. Curran Associates, Inc., 2019.
Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
Toth, C. and Oberhauser, H. Variational Gaussian Processes with Signature Covariances. ICML 2020, 2020.
Tzen, B. and Raginsky, M. Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit. arXiv:1905.09883, 2019a.
Tzen, B. and Raginsky, M. Theoretical guarantees for sampling and inference in generative models with latent diffusions. COLT, 2019b.
Yazıcı, Y., Foo, C.-S., Winkler, S., Yap, K.-H., Piliouras, G., and Chandrasekhar, V. The unusual effectiveness of averaging in GAN training. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=SJgw_sRqFQ.
Zeiler, M. D. ADADELTA: An Adaptive Learning Rate Method. arXiv:1212.5701, 2012.
Zhang, S., Guo, B., Dong, A., He, J., Xu, Z., and Chen, S. X. Cautionary Tales on Air-Quality Improvement in Beijing. Proceedings of the Royal Society A, 473(2205), 2017.