
Choosing a Fast Initial Propagator for Rapid Convergence of the Algorithm in the Context of Simple Model Problems

Thomas Roy University of Oxford Supervisors: Andy Wathen, Debbie Samaddar

A technical report for InFoMM CDT Mini-Project 2 in partnership with Culham Centre for Fusion Energy, Trinity 2016

Contents

1 Introduction

2 The Parareal Algorithm
2.1 Time-stepping methods
2.2 Convergence Results from the Literature
2.3 Properties and Options for Parareal
2.4 Choice of the Coarse Solver G

3 Models
3.1 Lorenz System
3.2 Wave Equation

4 Numerical Results
4.1 Scalar Linear Problem
4.2 Lorenz System
4.3 Wave Equation

5 Discussion

6 Conclusion and Further Work

References

A Numerical Methods

1 Introduction

In recent decades, advances in hardware have made possible the numerical solution of increasingly complex models. However, these advances are limited by the more recent stagnation in CPU clock speed, which has motivated the focus on efficient parallel hardware and algorithms. In general, numerical solvers are parallelized through the spatial variables, i.e. by separating the spatial domain into independent subdomains assigned to different CPUs. There have been multiple successful efforts to extend this to temporal parallelization in the case of time-dependent ordinary differential equations (ODEs), or time-dependent partial differential equations (PDEs) where spatial parallelization is saturated. These parallel-in-time methods are intrinsically more challenging due to causality: the later solution depends on the earlier solution.

Over the last 50 years, a variety of time-parallel time integration methods have been introduced (see [5] for a survey of current methods). Different strategies include multiple shooting methods, domain decomposition and waveform relaxation, space-time multigrid, and direct time-parallel methods. This research focuses on the Parareal algorithm, introduced by Lions, Maday, and Turinici in [11]. This algorithm is a multiple shooting method where a fast initial propagator gives a coarse approximation of the solution on the whole time domain, while a fine solver is used to obtain more accurate solutions on independent smaller subdomains. The choice of these components determines whether the overall Parareal iteration converges and, if so, its rate of convergence (contraction).

The Culham Centre for Fusion Energy (CCFE) is interested in Parareal for complicated physics simulations associated with plasmas, particularly in the behaviour of plasmas at the edge of the system where neutral transport becomes important. Previous attempts have been made by CCFE to characterise the behaviour of the algorithm in these contexts [13, 17]. CCFE seeks a better understanding of how the fast initial propagator affects the outcome and performance of the Parareal algorithm. The goal of this project is to go back to simpler problems in order to determine which factors are favourable for the convergence and stability of the algorithm.

In Section 2, we detail the Parareal algorithm in a very general formulation. Then, we introduce the needed concepts of numerical analysis before detailing theoretical results from the literature. We discuss the different choices of parameters for the algorithm, including the order of convergence of the coarse solver. In Section 3, we detail the different models on which the theoretical results will be tested. In Section 4, we compare theoretical results with numerical results for our test models. In Section 5, we discuss the importance of our results and include observations on the use of multi-step methods.

2 The Parareal Algorithm

In this section, we describe the Parareal algorithm as done in [6, 7]. We consider a system of ordinary differential equations (ODEs) of the form

$$u'(t) = f(t, u(t)), \quad t \in [0, T], \quad u(0) = u^0, \tag{1}$$

where $f : [0,T] \times \mathbb{R}^M \to \mathbb{R}^M$ and $u : \mathbb{R} \to \mathbb{R}^M$. For the Parareal algorithm, we decompose the time domain $\Omega = [0,T]$ into $N$ time subdomains (time chunks, time slices) $\Omega_n = [T_n, T_{n+1}]$, $n = 0, 1, \ldots, N-1$, with $0 = T_0 < T_1 < \ldots < T_{N-1} < T_N = T$, and $\Delta T_n = T_{n+1} - T_n$. On each time subdomain $\Omega_n$, $n = 0, \ldots, N-1$, we consider the problem

$$u_n'(t) = f(t, u_n(t)), \quad t \in [T_n, T_{n+1}], \quad u_n(T_n) = U_n, \tag{2}$$

where the initial values $U_n$ are given by the matching condition

$$U_0 = u^0, \quad U_n = u_{n-1}(T_n, U_{n-1}), \quad n = 1, \ldots, N-1, \tag{3}$$

where $u_n(T_{n+1}, U_n)$ denotes the solution of (2) with the initial condition $u_n(T_n) = U_n$ after time $\Delta T_n$. Letting $U = (U_0^\top, \ldots, U_{N-1}^\top)^\top$, we rewrite the system (3) in the form

$$F(U) = \begin{pmatrix} U_0 - u^0 \\ U_1 - u_0(T_1, U_0) \\ \vdots \\ U_{N-1} - u_{N-2}(T_{N-1}, U_{N-2}) \end{pmatrix} = 0, \tag{4}$$

where $F : \mathbb{R}^{M \times N} \to \mathbb{R}^{M \times N}$. Solving this with Newton's method leads to the process

$$U^{k+1} = U^k - J_F^{-1}(U^k) F(U^k), \tag{5}$$

where $J_F$ denotes the Jacobian of $F$. We can expand this into the following recurrence:

$$\begin{cases} U_0^{k+1} = u^0, \\ U_{n+1}^{k+1} = u_n(T_{n+1}, U_n^k) + \dfrac{\partial u_n}{\partial U_n}(T_{n+1}, U_n^k)\,(U_n^{k+1} - U_n^k), \end{cases} \tag{6}$$

where $n = 0, \ldots, N-2$. In general, the Jacobian terms in (6) are too expensive to compute exactly. Instead, the Parareal algorithm uses two approximations with different accuracy: let $F(T_n, T_{n+1}, U_n)$ be an accurate approximation of the solution $u_n(T_{n+1}, U_n)$ on the time subdomain $\Omega_n$, and let $G(T_n, T_{n+1}, U_n)$ be a less accurate approximation, obtained for example on a coarser grid, with a lower order method, or with a simpler model than (1). Then, we approximate the solution in the time subdomains in (2) by $u_n(T_{n+1}, U_n^k) \approx F(T_n, T_{n+1}, U_n^k)$, and the Jacobian terms in (6) by

$$\frac{\partial u_n}{\partial U_n}(T_{n+1}, U_n^k)\,(U_n^{k+1} - U_n^k) \approx G(T_n, T_{n+1}, U_n^{k+1}) - G(T_n, T_{n+1}, U_n^k). \tag{7}$$

This gives us an approximation to (6) given by

$$\begin{cases} U_0^{k+1} = u^0, \\ U_{n+1}^{k+1} = F(T_n, T_{n+1}, U_n^k) + G(T_n, T_{n+1}, U_n^{k+1}) - G(T_n, T_{n+1}, U_n^k), \end{cases} \tag{8}$$

which is the Parareal algorithm introduced in [11]. A natural initial guess for (8) is the coarse solution, i.e. $U_{n+1}^0 = G(T_n, T_{n+1}, U_n^0)$. Let $H(T_n, T_{n+1}, U_n^k) = F(T_n, T_{n+1}, U_n^k) - G(T_n, T_{n+1}, U_n^k)$. We illustrate the recurrence relation (8) in Figure 1. The coarse solutions given by $G$ are computed serially, while the fine solutions given by $F$ can be computed in parallel, with each subproblem (2) assigned to a different CPU.

Figure 1: The recurrence relation (8).
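As an illustration of the recurrence (8), the following is a minimal serial sketch in Python, not the implementation used for the results in this report. The function name `parareal` and the propagator signatures `f_fine(n, u)` and `g_coarse(n, u)` are assumptions made for this example. The sketch also propagates the last subdomain, so each iterate holds values at $T_0, \ldots, T_N$.

```python
import math


def parareal(f_fine, g_coarse, u0, n_chunks, n_iter):
    """Return the list of iterates [U^0, U^1, ...] of recurrence (8).

    f_fine(n, u) and g_coarse(n, u) propagate the state u from T_n to T_{n+1}.
    These signatures are hypothetical and chosen only for this sketch.
    The state can be a number or any type supporting + and -.
    """
    # Initial guess: serial coarse sweep, U_{n+1}^0 = G(T_n, T_{n+1}, U_n^0).
    U = [u0]
    for n in range(n_chunks):
        U.append(g_coarse(n, U[n]))
    history = [U]
    for _ in range(n_iter):
        # The fine solves only use the previous iterate, so they are
        # independent across n and could be assigned to different CPUs.
        fine = [f_fine(n, U[n]) for n in range(n_chunks)]
        coarse_old = [g_coarse(n, U[n]) for n in range(n_chunks)]
        # Serial correction sweep with the coarse propagator.
        U_new = [u0]
        for n in range(n_chunks):
            U_new.append(fine[n] + g_coarse(n, U_new[n]) - coarse_old[n])
        U = U_new
        history.append(U)
    return history


# Usage on u' = a u with an exact fine propagator and backward Euler as G.
a, T, N = -1.0, 2.0, 8
dT = T / N
fine = lambda n, u: u * math.exp(a * dT)
coarse = lambda n, u: u / (1.0 - a * dT)
history = parareal(fine, coarse, 1.0, N, N)
errors = [max(abs(U[n] - math.exp(a * n * dT)) for n in range(N + 1))
          for U in history]
```

In this sketch the fine solves are evaluated in a plain loop; in a parallel setting that loop is the part that would be distributed, while the coarse sweep stays serial.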

2.1 Time-stepping methods

In this section, we introduce key notions for time-stepping methods [8, 9], and the methods considered in this report. For more details on the time-stepping methods, see Appendix A. We first consider the initial value problem

$$y'(t) = \lambda y, \quad y(0) = 1, \tag{9}$$

the famous Dahlquist test equation. For one-step methods, we can always find a function $R(z)$ such that the method applied to (9) may be written as

$$y_{n+1} = R(z)\, y_n, \tag{10}$$

where $z = \Delta t\, \lambda$.

Definition 2.1. The function $R(z)$ is called the stability function of the method. It can be interpreted as the numerical solution after one step for the Dahlquist test equation. The set

$$S = \{z \in \mathbb{C};\ |R(z)| \le 1\} \tag{11}$$

is called the stability domain, stability region, or region of absolute stability of the method.

Definition 2.2. A method whose stability domain satisfies

$$S \supset \mathbb{C}^- = \{z;\ \mathrm{Re}(z) \le 0\}$$

is called A-stable.

This concept of absolute stability can be extended beyond the scalar case. Consider a linear system

$$y'(t) = A y(t), \tag{12}$$

where $A$ is a constant $m \times m$ matrix. For simplicity, we suppose that $A$ is diagonalizable, which means it has a set of $m$ linearly independent eigenvectors $v_p$ such that $A v_p = \lambda_p v_p$ for $p = 1, \ldots, m$, where $\lambda_p$ are the corresponding eigenvalues. Let $P = [v_1, \ldots, v_m]$ be the matrix of eigenvectors and $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ be the diagonal matrix of eigenvalues, then

$$A = P D P^{-1} \quad \text{and} \quad D = P^{-1} A P. \tag{13}$$

Let $u(t) = P^{-1} y(t)$. We can rewrite (12) as

$$u' = D u. \tag{14}$$

This is a diagonal system of equations that we decouple into $m$ independent scalar equations of the form

$$u_p' = \lambda_p u_p, \quad p = 1, \ldots, m, \tag{15}$$

where $u_p$ are the components of $u$. For the overall method to be stable, each of the scalar problems must be stable, and this requires $\Delta t\, \lambda_p$ to be in the stability region of the method for $p = 1, \ldots, m$. This can be rewritten as a condition on the spectral radius of the matrix $A$, $\rho(A)$.

The concept of absolute stability does not directly apply to nonlinear systems. As in [9], we will consider a linearized approximation of the nonlinear system.

$$y' = f(t, y). \tag{16}$$

Let $\varphi(t)$ be a smooth solution of (16). We linearize $f$ around $\varphi(t)$ as follows:

$$y'(t) = f(t, \varphi(t)) + \frac{\partial f}{\partial y}(t, \varphi(t))\,(y(t) - \varphi(t)) + O(\|y - \varphi\|^2). \tag{17}$$

We let $u(t) = y(t) - \varphi(t)$ to obtain

$$u'(t) = \frac{\partial f}{\partial y}(t, \varphi(t))\, u(t) + O(\|u\|^2) = J(t)\, u(t) + O(\|u\|^2), \tag{18}$$

where $J(t)$ is the Jacobian matrix of the system. As an approximation, we consider the Jacobian to be constant and drop $O(\|u\|^2)$ to obtain the linear system

$$u'(t) = J u(t). \tag{19}$$

The stability analysis that we presented for linear systems can now be used on this linearized approximation.

In this report, we consider a variety of one-step time-stepping methods. The forward Euler method (FE) is a first-order explicit method. Its stability function is given by

$$R(z) = 1 + z. \tag{20}$$

The backward Euler method (BE) is a first-order implicit method. Its stability function is given by

$$R(z) = \frac{1}{1 - z}. \tag{21}$$

The second-order Runge-Kutta method (RK2) is a second-order explicit method. Its stability function is given by

$$R(z) = 1 + z + \frac{z^2}{2}. \tag{22}$$

The trapezoidal rule (TR) is a second-order implicit method. Its stability function is given by

$$R(z) = \frac{1 + z/2}{1 - z/2}. \tag{23}$$

The original fourth-order Runge-Kutta method (RK4) is a fourth-order explicit method. Its stability function is given by

$$R(z) = 1 + z + \frac{z^2}{2} + \frac{z^3}{6} + \frac{z^4}{24}. \tag{24}$$

Runge-Kutta methods of order $p \le 4$ require $p$ function evaluations. However, for $p > 4$, more than $p$ function evaluations are necessary. One can obtain a fifth-order Runge-Kutta method with six function evaluations. We consider the fifth-order Runge-Kutta method (RK5) from [4]. Its stability function is given by

$$R(z) = 1 + z + \frac{z^2}{2} + \frac{z^3}{6} + \frac{z^4}{24} + \frac{z^5}{120} + \frac{z^6}{1280}. \tag{25}$$

The stability regions for RK methods of order $p = 1, \ldots, 5$ are illustrated in Figure 2.

We also consider a multi-step method. The second-order backward differentiation formula (BDF2) is a second-order implicit two-step method. When applied to the problem $y' = f(t, y)$ with time-step $\Delta t$, the BDF2 method is given by

$$y_{n+1} = \frac{4}{3} y_n - \frac{1}{3} y_{n-1} + \frac{2}{3} \Delta t\, f(t_{n+1}, y_{n+1}). \tag{26}$$

Figure 2: Stability regions for some explicit Runge-Kutta methods (taken from [4]).
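The stability functions (20)-(25) and the eigenvalue condition for linear systems can be checked numerically. The sketch below is only an illustration: the names `STABILITY`, `in_stability_region` and `step_is_stable`, as well as the sample spectrum, are assumptions made for this example and do not come from the report.

```python
# Stability functions R(z) of the one-step methods, equations (20)-(25).
STABILITY = {
    "FE": lambda z: 1 + z,
    "BE": lambda z: 1 / (1 - z),
    "RK2": lambda z: 1 + z + z**2 / 2,
    "TR": lambda z: (1 + z / 2) / (1 - z / 2),
    "RK4": lambda z: 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24,
    "RK5": lambda z: (1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24
                      + z**5 / 120 + z**6 / 1280),
}


def in_stability_region(method, z):
    """True if z belongs to S = {z : |R(z)| <= 1}, see (11)."""
    return abs(STABILITY[method](z)) <= 1.0


def step_is_stable(method, eigenvalues, dt):
    """True if dt * lambda_p lies in S for every eigenvalue lambda_p."""
    return all(in_stability_region(method, dt * lam) for lam in eigenvalues)


# Hypothetical spectrum of a diagonalizable matrix A (illustrative values).
spectrum = [-1.0, -50.0, complex(-2.0, 3.0)]
fe_small_step = step_is_stable("FE", spectrum, 0.01)
fe_large_step = step_is_stable("FE", spectrum, 0.1)
be_large_step = step_is_stable("BE", spectrum, 0.1)
```

With this sample spectrum, forward Euler is restricted by the eigenvalue of largest magnitude, while backward Euler, being A-stable, accepts any time-step for eigenvalues with negative real part.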

2.2 Convergence Results from the Literature

Many publications include results for the accuracy of the $U_n^k$ values [2, 3, 6, 7]. The first result is from the original publication [11], and applies to the scalar linear problem given by

$$u'(t) = a u(t), \quad t \in [0, T], \quad u(0) = u^0, \quad \text{with } a \in \mathbb{C}. \tag{27}$$

Proposition 1. Let $\Delta T = T/N$, $T_n = n\Delta T$ for $n = 0, 1, \ldots, N$. Consider (27) with $a \in \mathbb{R}$. Let $F(T_n, T_{n+1}, U_n^k)$ be the exact solution at $T_{n+1}$ of (27) with $u(T_n) = U_n^k$, and let $G(T_n, T_{n+1}, U_n^k)$ be the corresponding backward Euler approximation with time-step $\Delta T$. Then,

$$\max_{1 \le n \le N} |u(T_n) - U_n^k| \le C_k \Delta T^{k+1}. \tag{28}$$

Therefore, for a fixed iteration number $k$, the Parareal algorithm behaves like an $O(\Delta T^{k+1})$ method in terms of $\Delta T$. This result can be extended to higher order time-stepping methods. Indeed, it has been shown in [3] that the error of the Parareal algorithm behaves like $O(\Delta T^{m(k+1)})$ when a method of order $m$ is used for the coarse propagator $G$.

A different approach was taken in [7]. The authors fix $\Delta T$ and study the behaviour of the Parareal algorithm as $k$ goes to infinity. They obtain the following results of superlinear and linear convergence on bounded and unbounded intervals, respectively. The following theorems are from [7].

Theorem 1 (Superlinear convergence on bounded intervals). Let $T < \infty$, $\Delta T = T/N$, $T_n = n\Delta T$ for $n = 0, 1, \ldots, N$. Let $F(T_n, T_{n+1}, U_n^k)$ be the exact solution at $T_{n+1}$ of (27) with $u(T_n) = U_n^k$, and let $G(T_n, T_{n+1}, U_n^k) = R(a\Delta T)\, U_n^k$ be a one-step method in its region of absolute stability. Then,

$$\max_{1 \le n \le N} |u(T_n) - U_n^k| \le \frac{|e^{a\Delta T} - R(a\Delta T)|^k}{k!} \prod_{j=1}^{k} (N - j) \max_{1 \le n \le N} |u(T_n) - U_n^0|. \tag{29}$$

Note that $\max_{1 \le n \le N} |\cdot|$ is equivalent to the discrete version of the infinity norm or maximum norm $\|\cdot\|_\infty$ over $\mathbb{R}^N$.
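To get a feeling for the bound (29), it can be evaluated directly. The sketch below is a hedged illustration: the function name `superlinear_bound` and the choice of parameters ($a = -1$, $T = 20$, $N = 64$, backward Euler as coarse solver) are assumptions made for this example, and the initial error is normalised to one.

```python
import cmath
import math


def superlinear_bound(R, z, N, k, initial_error=1.0):
    """Right-hand side of (29) with z = a * dT and stability function R."""
    coeff = abs(cmath.exp(z) - R(z))
    product = 1.0
    for j in range(1, k + 1):
        product *= N - j  # the factor vanishes once j reaches N
    return coeff**k / math.factorial(k) * product * initial_error


backward_euler = lambda z: 1 / (1 - z)
N = 64
z = -20.0 / N  # a = -1 and T = 20, so z = a * T / N
bounds = [superlinear_bound(backward_euler, z, N, k) for k in range(N + 1)]
```

For these parameters the bound first grows slightly before decaying rapidly, and it vanishes at $k = N$ because the product contains the factor $N - N$.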

Theorem 2 (Linear convergence on long time intervals). Let $\Delta T$ be given, and $T_n = n\Delta T$ for $n = 0, 1, \ldots$. Let $F(T_n, T_{n+1}, U_n^k)$ be the exact solution at $T_{n+1}$ of (27) with $u(T_n) = U_n^k$, and let $G(T_n, T_{n+1}, U_n^k) = R(a\Delta T)\, U_n^k$ be a one-step method in its region of absolute stability. Then,

$$\sup_{n > 0} |u(T_n) - U_n^k| \le \left( \frac{|e^{a\Delta T} - R(a\Delta T)|}{1 - |R(a\Delta T)|} \right)^{k} \sup_{n > 0} |u(T_n) - U_n^0|. \tag{30}$$

Theorem 3 (Asymptotic convergence factor). Let $\Delta T$ be given, and $T_n = n\Delta T$ for $n = 0, 1, \ldots$. Let $F(T_n, T_{n+1}, U_n^k)$ be the exact solution at $T_{n+1}$ of (27) with $u(T_n) = U_n^k$, and let $G(T_n, T_{n+1}, U_n^k) = R(a\Delta T)\, U_n^k$ be a one-step method in its region of absolute stability. Then, the asymptotic convergence factor of the Parareal algorithm is

$$\rho(R, a\Delta T) = \frac{|e^{a\Delta T} - R(a\Delta T)|}{1 - |R(a\Delta T)|}. \tag{31}$$

Remark. The Parareal algorithm reaches asymptotic convergence if (31) is smaller than 1, which is not necessarily achieved. For $a < 0$, $a \in \mathbb{R}$, $\rho(R, a\Delta T) < 1$ is equivalent to

$$\frac{e^{a\Delta T} - 1}{2} < R(a\Delta T) < \frac{e^{a\Delta T} + 1}{2}. \tag{32}$$

Condition (32) is essentially equivalent to the stability condition from [19]. We define the stability region of the Parareal algorithm, using a one-step method with stability function $R(z)$ as the coarse solver and the exact solution as the fine solver, to be the set $\{z \in \mathbb{C};\ |\rho(R, z)| \le 1,\ |R(z)| \le 1\}$. The stability regions of the Parareal algorithm for RK methods of order $p = 1, \ldots, 5$ are illustrated in Figure 3. Comparing these regions with the stability regions in Figure 2, we easily see that the Parareal algorithm is not necessarily stable when the time-stepping method used in the coarse solver is. Nonetheless, RK methods with larger stability regions do give similarly larger stability regions for the Parareal algorithm. Note that convergence can be achieved on bounded intervals without satisfying the condition $\rho(R, a\Delta T) < 1$.

Figure 3: Stability regions of the Parareal algorithm for some explicit Runge-Kutta methods (orders $p = 1, \ldots, 5$).

We can also consider the fully discrete case, where $F(T_n, T_{n+1}, U_n^k)$ is an approximate solution at $T_{n+1}$. In the previous results, it suffices to replace the term $e^{a\Delta T}$ by a term of the form $R_f(a\Delta T)$. Here, $R_f(z)$ denotes the stability function of the $F$-propagator as a solver of (1) over a time interval of length $\Delta T$. If the approximation is obtained by $M$ steps of the fine solver with stability function $r(z)$, then $R_f(a\Delta T) = r(a\Delta T/M)^M$. By design, the fine solver $F$ should be accurate enough that $R_f(z) \approx e^z$ is a good approximation.

Similarly to what was done in Section 2.1, we extend these results to nonlinear systems of the form (1). The constant $a$ can be taken as the most negative eigenvalue of the Jacobian of the system. In simple cases, the eigenvalues can be found analytically. In general, however, numerical approximations of the Jacobian are easier to obtain. These approximations usually use some kind of finite difference to approximate partial derivatives (e.g. the numjac function in Matlab).
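The convergence factor (31) and its fully discrete variant can be evaluated with a few lines of Python. This is a sketch under stated assumptions: the names `fine_stability` and `convergence_factor` are chosen for this example, and the sample values of $z$ and $M$ are illustrative only.

```python
import cmath


def fine_stability(r, z, M):
    """R_f(z) = r(z / M)^M: M fine steps with stability function r."""
    return r(z / M) ** M


def convergence_factor(R, z, Rf=None):
    """rho(R, z) from (31); e^z is replaced by Rf(z) when Rf is given.

    The value is only meaningful where |R(z)| < 1; the denominator
    vanishes when |R(z)| = 1 and is negative when |R(z)| > 1.
    """
    fine = cmath.exp(z) if Rf is None else Rf(z)
    return abs(fine - R(z)) / (1.0 - abs(R(z)))


backward_euler = lambda z: 1 / (1 - z)
trapezoidal = lambda z: (1 + z / 2) / (1 - z / 2)
rk4 = lambda z: 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24

rho_be = convergence_factor(backward_euler, -1.0)
rho_tr = convergence_factor(trapezoidal, -6.0)
# Fully discrete case: the fine solver takes M = 16 RK4 steps per subdomain.
rho_be_discrete = convergence_factor(
    backward_euler, -1.0, Rf=lambda z: fine_stability(rk4, z, 16))
```

In this example the factor for backward Euler is well below one, the factor for the trapezoidal rule at $z = -6$ slightly exceeds one although the method is A-stable, and replacing $e^z$ by an accurate fine solver changes the factor only marginally.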

2.3 Properties and Options for Parareal

In this section, we give an overview of the known properties of the Parareal algorithm. We then discuss the possible options for the algorithm.

We start by reiterating that the solution given by the Parareal algorithm only converges to the analytical solution of the system (1) if the fine solver $F$ gives the exact solution of the problems (2). Indeed, in general, the solution of the Parareal algorithm will converge to the solution given by the concatenation of the approximate solutions of the problems (2) as given by the fine solver $F$. In most cases, this approximate solution is equivalent to solving the problem (1) serially with the fine solver $F$. In Section 5, we will discuss how this may not be the case for more complex implementations such as using a multi-step method for $F$ or a coarser spatial grid for $G$.

We notice from the relation (8) that since $U_0^0 = U_0^1 = u^0$, then $U_1^1 = F(T_0, T_1, U_0^0)$, i.e. the solution at $T_1$ has converged after one Parareal iteration. Similarly, after $k$ iterations, the solution at $T_n$, for $n \le k$, has converged. Therefore, after $k = N$ iterations the solution will have converged over the whole time domain. This property can be thought of as the information from the initial condition travelling to later time subdomains.

Obviously, the Parareal algorithm provides no computational gain if it converges in $k = N$ iterations. In fact, each CPU would then do the same work as a single CPU solving the problem serially, even before counting the cost of the coarse solver. Therefore, the algorithm is only efficient if fewer iterations are needed for sufficient convergence (sufficient relative to a chosen tolerance and stopping criterion). In practice, sufficient convergence can be achieved for $k \ll N$.

We seek to determine what parameter options favour a fast convergence of the algorithm. The different parameters for the Parareal algorithm include the choice of the solvers $F$ and $G$, and the length $\Delta T_n$ of the time subdomains $\Omega_n$ (or equivalently, for a fixed time domain and fixed size of $\Delta T_n$, the number of subdomains $N$). The importance of the size of $\Delta T_n$ depends on the choice of $F$ and $G$, in that the time-step used for the solvers may be chosen as $\Delta T_n$. Alternatively, the solvers may use time-steps smaller than $\Delta T_n$.

The general idea for the choice of $F$ and $G$ is that $F$ should be accurate, and $G$ should be cheap to evaluate. One simple choice for $F$ and $G$ is using the same time-stepping method, but with smaller time-steps for $F$. In addition or alternatively, one could also use a higher order method for $F$.
This leaves many choices for the time-stepping methods used for the coarse and fine solvers. First of all, there is a choice between explicit and implicit methods. The advantage of many implicit methods is that they are A-stable, or have a large stability region. As a matter of fact, using an A-stable method for the coarse solver may guarantee convergence of the algorithm, at the very least in the simple case considered in Theorems 1 and 2. However, there are cases where usually stable implicit methods cause the Parareal algorithm to become unstable. In [19], the authors observe that the Parareal algorithm is unstable for purely imaginary eigenvalues, as well as for some complex eigenvalues where the imaginary part is much larger than the real part. This also implies that the Parareal algorithm is typically unstable for hyperbolic equations.

The main advantage of explicit methods for $G$ is that they are generally much cheaper to evaluate. In the case of applications considered by CCFE, coarse solvers usually use explicit methods, since implicit ones are too expensive. However, additional care may be needed for the stability of the numerical solution, especially in the case of PDEs, where the spatial discretization has an effect on stability.

For PDEs, the coarse solver $G$ could solve the equations with a coarser spatial discretization than the one used with $F$. This is of course cheaper to compute, and may even be necessary for the stability of the numerical solution if an explicit time-stepping method is used (e.g. to satisfy the Courant-Friedrichs-Lewy (CFL) condition). For finite differences, this may consist of a coarser spatial grid, and in the context of spectral methods, this includes the use of a reduced spectrum [14]. The use of two different spatial grids adds complexity to the algorithm since the operations in (8) must be done on the same grid. To achieve this, some interpolation method can be used to transfer coarse grid solutions to the fine grid. The resulting coarse solutions are understandably less accurate than if they had been computed on the fine grid. Nonetheless, the interpolation can be done in a more sophisticated way to add some smoothness and thus accuracy (e.g. by using fine solutions from previous iterations). The use of a coarser spatial grid can negatively impact convergence unless high order interpolation is used [15].

Furthermore, the coarse solver $G$ could solve a simpler problem than the original problem solved by $F$ [16]. This simplification of the original model should give similar solutions while being easier to solve.

In this report, we consider the case where $G$ is a time-stepping method using one fixed $\Delta T_n = \Delta T$ as its time-step, and $F$ is another time-stepping method using a time-step $\delta t < \Delta T$. The various time-stepping methods introduced in Section 2.1 are considered for both solvers.

2.4 Choice of the Coarse Solver G

In most observed cases, the Parareal algorithm converges superlinearly as in Theorem 1. At iteration $k$, the error bound (29) is multiplied by the factor $|e^z - R(z)| \frac{N-k}{k}$, where $z = a\Delta T$ in the case of (27). Therefore, an error reduction at iteration $k$ is only guaranteed if

$$|e^z - R(z)| < \frac{k}{N-k}. \tag{33}$$

In chaotic systems, it is essential for numerical methods to be very accurate. Hence, we want to avoid Parareal iterations where the error increases. We seek an error reduction starting at the first iteration, i.e. $|e^z - R(z)| < 1/(N-1)$. In general, a larger $N$ does not imply that condition (33) is harder to satisfy, since $z \propto 1/N$. In fact, $|e^z - R(z)|$ decreases faster than $1/N$. In Figure 4, we illustrate how $|e^z - R(z)|$ decreases as $N$ increases for real values of $z = -10/N, -20/N$. We observe that a larger $N$ eventually results in error reduction. Higher order methods exhibit smaller convergence coefficients. This is expected since $|e^z - R(z)|$ is the truncation error of the method for the scalar linear problem (27). Indeed, for a method of order $p$ and $|z| < 1$, $|e^z - R(z)| = O(|z|^{p+1})$. This suggests that high order methods lead to faster convergence of the Parareal algorithm, as long as $|z|$ is small.

Figure 4: Reduction of $|e^z - R(z)|$ as $N$ increases. Panels: (a) $z = -10/N$, (b) $z = -20/N$. Curves: FE, RK2, RK4, BE, TR, RK5, and $1/N$.

In Figure 5, we illustrate the behaviour of the convergence coefficients for bounded intervals as $z \in \mathbb{R}^-$ varies. We plot $|e^z - R(z)|$ for various methods and different ranges of $z$. In the top left graph, we observe that the higher order methods (RK4 and RK5) have very small convergence coefficients. Again, $|e^z - R(z)|$ is expected to be smaller for higher order methods when $|z| < 1$. In the top right graph, we also observe that the coefficients of the higher order methods remain small for a wider range of $z$. Indeed, the region of $z \in \mathbb{R}$ such that $|e^z - R(z)| < 1$ is larger for the higher order RK methods. We note that $|e^z - R(z)|$ only needs to be bounded for the algorithm to eventually converge, due to the superlinear convergence on bounded intervals. Again, we need $|e^z - R(z)| < k/(N-k)$ to have an error reduction. In the bottom left graph, we see that the coefficients for the explicit methods eventually grow without bound when $|z|$ becomes large enough. On the other hand, as seen in the bottom right graph, the implicit methods have bounded coefficients. The BE coefficient goes asymptotically to zero, while the TR coefficient goes asymptotically to one.

In Figure 6, we illustrate the behaviour of the convergence coefficients for unbounded intervals as $z \in \mathbb{R}^-$ varies. We plot $\frac{|e^z - R(z)|}{1 - |R(z)|}$ for various methods and different ranges of $z$. Similarly to what is observed in Figure 5, higher order methods have very small coefficients for small values of $|z|$. The coefficients for the explicit methods eventually become greater than one when $|z|$ becomes larger than one, which prevents asymptotic convergence of the algorithm as mentioned in Remark 2.2. We observe in the two bottom graphs of Figure 6 that after becoming unbounded, the coefficients of RK4 and RK2 become negative. They then remain bounded, but only for values of $z$ outside the stability regions of their respective methods, whereas lying inside the stability region is necessary for asymptotic convergence as described in Theorem 3. It is easier to observe where the RK methods have coefficients smaller than one in Figure 3. In the bottom right graph, we observe that the convergence coefficient for the trapezoidal rule becomes larger than one for values around $z = -6$. Indeed, this shows that using an A-stable time-stepping method does not ensure the asymptotic convergence of the Parareal algorithm. On the other hand, the coefficients for the backward Euler method remain bounded and converge slowly to zero.
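Condition (33) is easy to test for a given coarse method and number of subdomains. The following sketch assumes real $z$; the function name `error_reduces` and the sampled values of $N$ are assumptions made for this example, with $z = -10/N$ as in the first panel of Figure 4.

```python
import math


def error_reduces(R, z, N, k=1):
    """Condition (33): |e^z - R(z)| < k / (N - k), for real z and 1 <= k < N."""
    return abs(math.exp(z) - R(z)) < k / (N - k)


forward_euler = lambda z: 1 + z
rk4 = lambda z: 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24

c = -10.0  # z = c / N
fe_ok = {N: error_reduces(forward_euler, c / N, N) for N in (10, 40, 64)}
rk4_ok = {N: error_reduces(rk4, c / N, N) for N in (10, 40, 64)}
```

For these sample values, forward Euler only satisfies the first-iteration condition for the largest $N$, while RK4 satisfies it for all three, in line with the smaller coefficients of higher order methods.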

Figure 5: $|e^z - R(z)|$ as a function of $z$. Curves: FE, RK2, RK4, BE, TR, RK5.

3 Models

Most results from the literature, including those mentioned in Section 2.2, only consider the scalar linear problem (27). In addition to this simple case, we will consider other models to verify whether the theory can be applied to more complex equations. We consider the Lorenz system, a system of nonlinear ODEs, as well as the wave equation, a linear PDE.

In [7], two PDEs are considered: the pure heat equation ($u_t = u_{xx}$) and the advection equation ($u_t = u_x$). In that paper, the authors use a Fourier transform in space in order to obtain decoupled ODEs for each Fourier mode. For both the heat and advection equations, this specific spatial discretization requires time to be advanced with an A-stable method. The authors then obtain convergence results specific to this discretization. Bounds on the convergence coefficients are given for the following coarse solvers: the backward Euler method, the trapezoidal rule, the two-stage singly diagonally implicit Runge-Kutta (SDIRK) method, and the three-stage Radau IIA method. These results are specific to the spatial discretization, which forces the use of implicit methods. As mentioned before, explicit methods are generally preferred for their cheaper computational cost. For the wave equation, we will therefore consider a spatial discretization allowing for explicit time-stepping methods.

Figure 6: $|e^z - R(z)|/(1 - |R(z)|)$ as a function of $z$. Curves: FE, RK2, RK4, BE, TR, RK5.

3.1 Lorenz System

The Lorenz system is a system of nonlinear ODEs introduced by Lorenz in 1963 [12]. It is known for having chaotic solutions for certain parameter values and initial conditions. The Lorenz equations are as follows:

$$\frac{dx}{dt} = \sigma(y - x), \tag{34}$$

$$\frac{dy}{dt} = x(\rho - z) - y, \tag{35}$$

$$\frac{dz}{dt} = xy - \beta z, \tag{36}$$

where $x$, $y$, and $z$ make up the system state, $t$ is time and $\sigma$, $\rho$, and $\beta$ are parameters. When $\rho = 28$, $\sigma = 10$, and $\beta = 8/3$, the system (34)-(36) has chaotic solutions. For almost all initial values, the solution approaches an invariant set known as the Lorenz attractor. The solution of the Lorenz system with initial state $(x, y, z) = (1, 1, 1)$ is illustrated in Figure 7.

Figure 7: Components of the Lorenz system for $0 \le t \le 4$. Panels: (a) $x$, (b) $y$, (c) $z$.
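A trajectory such as the one in Figure 7 can be reproduced with a short script. The sketch below is an illustration only: it uses the classical RK4 scheme with a fixed step, and the function names and the number of steps are assumptions made for this example rather than the settings used for the figure.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (34)-(36)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """One classical RK4 step for an autonomous system y' = f(y)."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(f, state, t_end, n_steps):
    """Return the list of states at the n_steps + 1 equally spaced times."""
    dt = t_end / n_steps
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(f, state, dt)
        trajectory.append(state)
    return trajectory


# Initial state (1, 1, 1) on 0 <= t <= 4; the step count is illustrative.
trajectory = integrate(lorenz_rhs, (1.0, 1.0, 1.0), 4.0, 4000)
```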

In order to apply the theoretical results from Section 2.2, we need to compute the eigenvalues of the system Jacobian. The Jacobian of the Lorenz system is given by

$$J = \begin{pmatrix} -\sigma & \sigma & 0 \\ \rho - z & -1 & -x \\ y & x & -\beta \end{pmatrix}. \tag{37}$$

The eigenvalues can be computed numerically for fixed parameters and known system state.
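As a hedged example of this computation, the sketch below builds the Jacobian (37) and extracts the eigenvalue with the most negative real part using NumPy. The function names `lorenz_jacobian` and `lambda_min` are assumptions made for this example.

```python
import numpy as np


def lorenz_jacobian(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Jacobian (37) of the Lorenz system evaluated at state = (x, y, z)."""
    x, y, z = state
    return np.array([[-sigma, sigma, 0.0],
                     [rho - z, -1.0, -x],
                     [y, x, -beta]])


def lambda_min(state, **params):
    """Eigenvalue of the Jacobian with the most negative real part."""
    eigenvalues = np.linalg.eigvals(lorenz_jacobian(state, **params))
    return eigenvalues[np.argmin(eigenvalues.real)]


lam_initial = lambda_min((1.0, 1.0, 1.0))  # initial state used in Figure 7
lam_origin = lambda_min((0.0, 0.0, 0.0))   # state where J is block diagonal
```

At the origin the Jacobian is block diagonal, so its eigenvalues are $-\beta$ and the roots of $\lambda^2 + (\sigma + 1)\lambda + \sigma(1 - \rho) = 0$, which gives a simple check of the numerical result.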

3.2 Wave Equation

The wave equation ($u_{tt} = u_{xx}$) can be written as the following first-order system:

$$\frac{\partial u}{\partial t} = \frac{\partial v}{\partial x}, \tag{38}$$

$$\frac{\partial v}{\partial t} = \frac{\partial u}{\partial x}. \tag{39}$$

Here, $u(t, x)$ can represent the velocity of the wave, while $v(t, x)$ represents the height of the wave, $t \in [0, T]$ represents time and $x \in [L, U]$ is the spatial variable. In addition to (38)-(39), we need initial conditions of the form $u(0, x) = u_0(x)$, $v(0, x) = v_0(x)$, and boundary conditions. In our case, we consider homogeneous Neumann boundary conditions and the initial conditions $u_0(x) = 0$, $v_0(x) = 1 + \frac{2}{5} e^{-5x^2}$. The solution $v$ is shown in Figure 8 for various values of $t$.

Figure 8: Height $v$ of the wave as time varies. Panels: (a) $t = 0$, (b) $t = 0.25$, (c) $t = 0.5$, (d) $t = 0.75$.

In order to apply the Parareal algorithm as defined in Section 2, we need a system of the form (1). Using a method of lines for the spatial derivatives, we can obtain such a system. For the spatial discretization, we use finite differences. We use a staggered grid [1], which is necessary for the convergence of the numerical solution. Let $N_x$ be the number of grid points for each variable, then $v_j \approx v(t, x_j)$, $u_{j+1/2} \approx u(t, x_{j+1/2})$, where $x_j = L + hj$, for $j = 0, \ldots, N_x - 1$ and $h = (U - L)/(N_x - 1)$. We then use the following spatial discretization:

$$\frac{\partial u_{j+1/2}}{\partial t} = \frac{v_{j+1} - v_j}{h}, \tag{40}$$

$$\frac{\partial v_j}{\partial t} = \frac{u_{j+1/2} - u_{j-1/2}}{h}, \tag{41}$$

for $j = 1, \ldots, N_x - 2$. The boundary conditions are given by

$$\frac{\partial u_{1/2}}{\partial t} = \frac{\partial u_{N_x - 1/2}}{\partial t} = 0, \qquad \frac{\partial v_0}{\partial t} = \frac{\partial v_{N_x - 1}}{\partial t} = 0.$$

The system (40)-(41) is of the form (1) with size $M = 2N_x$.
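The semi-discrete system (40)-(41) translates directly into a right-hand side function. The sketch below is an illustration in plain Python; the function names, the storage convention `u_half[j]` for $u_{j+1/2}$, and the sample domain $[-1, 1]$ with five grid points are assumptions made for this example.

```python
import math


def staggered_grid(L, U, nx):
    """Grid points x_j = L + h j for j = 0, ..., nx - 1 and spacing h."""
    h = (U - L) / (nx - 1)
    return [L + h * j for j in range(nx)], h


def wave_rhs(u_half, v, h):
    """Time derivatives from (40)-(41); u_half[j] stores u_{j+1/2}.

    The first and last entries of both outputs stay zero, which encodes
    the boundary conditions stated after (41).
    """
    nx = len(v)
    du = [0.0] * nx
    dv = [0.0] * nx
    for j in range(1, nx - 1):
        du[j] = (v[j + 1] - v[j]) / h            # equation (40)
        dv[j] = (u_half[j] - u_half[j - 1]) / h  # equation (41)
    return du, dv


# Initial conditions u_0 = 0 and v_0 = 1 + (2/5) exp(-5 x^2) on a sample grid.
x, h = staggered_grid(-1.0, 1.0, 5)
v0 = [1.0 + 0.4 * math.exp(-5.0 * xj**2) for xj in x]
u0 = [0.0] * len(x)
du0, dv0 = wave_rhs(u0, v0, h)
```

Stacking `u_half` and `v` into a single vector of length $2N_x$ gives a system of the form (1) to which any of the time-stepping methods of Section 2.1 can be applied.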

4 Numerical Results

In this section, we investigate the behaviour of the error at different Parareal iterations for the models of Section 3. We compare the numerical results with the theoretical superlinear and linear bounds from Theorems 1 and 2.

4.1 Scalar Linear Problem

We start by confirming the theoretical bounds of Section 2.2 for the linear scalar problem (27) with $a = -1$, for $T = 20$ and $T = 30$. For the Parareal algorithm, we separate the time domain into $N = 64$ time subdomains of equal length $\Delta T = T/N$. For the coarse solver, we use backward Euler with time-step $\Delta T$. For the fine solver, we seek a very accurate solution and thus we use the fourth-order Runge-Kutta method with the very fine time-step $\delta t = T/2^{14}$.

In Figures 9 and 10, we illustrate the errors as the Parareal iterations progress. We evaluate the error between the solution at the $k$th iteration and the analytical solution $u = e^{-t}$. In the left graphs, we study the absolute error of the solution on the last time subdomain. We observe that this error reduces to zero superlinearly. In the right graphs, we study the errors as defined in Theorem 1, i.e. using the maximum norm. We observe that this error reduces according to the linear bound. See [7] for more results on the scalar linear case.

Figure 9: Error for the scalar linear problem for $\Omega = [0, 20]$. Panels: error of the last chunk (left) and $L^\infty$ error (right), both as functions of $k$. Curves: error, superlinear bound, linear bound.

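Figure 10: Error for the scalar linear problem for $\Omega = [0, 30]$. Panels: error of the last chunk (left) and $L^\infty$ error (right), both as functions of $k$. Curves: error, superlinear bound, linear bound.

4.2 Lorenz System

We now study the error for the Lorenz system solved on the time interval $[0, T]$. For the Parareal algorithm, we separate the time domain into $N$ time subdomains of equal length $\Delta T = T/N$. Here, we use $T = 4$.

In order to compare the numerical results with the theoretical linear and superlinear bounds, we need a value for the equivalent of the constant $a$ in the scalar linear case. We will use the eigenvalues of the Jacobian (37). These values will be different throughout the time domain since $x$, $y$ and $z$ will change as illustrated in Figure 7. Given an initial trajectory (e.g. given by the coarse solver), we can easily obtain $N$ sets of eigenvalues, one for each set of initial values of the subproblems. Given a single time chunk, the best estimate for $a$ would be the eigenvalue of the Jacobian with the most negative real part, denoted $\lambda_{\min}$. However, we are given $N$ different sets of eigenvalues and therefore $N$ different $\lambda_{\min}^n$. The most natural value for $a$ would be the eigenvalue with the most negative real part out of all the $\lambda_{\min}^n$. This surely gives a very strict bound on convergence if the eigenvalues vary substantially throughout the time domain. Alternatively, we can update $a$ such that it depends on the current iteration $k$, for example by ignoring the eigenvalues from the time subdomains which have converged (which is guaranteed for each $T_n$, $n \le k$). A much less formal alternative for $a$ would be to consider the average of the eigenvalues with minimal real parts. It remains to be seen if one can find a formal yet less strict value for $a$. For this specific example, the $\lambda_{\min}^n$ at the beginning of the simulation have more negative real parts than later in the simulation. This is reflected in the larger variations of the solution visible in Figure 7.

In Figures 11-14, we illustrate how the observed errors compare to the theoretical bounds from Theorems 1 and 2 when using TR and RK4 as coarse solvers with $N = 32, 64$ subdomains. For each variable $x$, $y$, $z$, we calculate the maximal absolute error over the different time subdomains and take the sum of these errors. In the left graphs, for the bounds, we use the minimum of all the $\lambda_{\min}^n$, as discussed above.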

2 10 10 -2

10 0 10 -4

10 -2 10 -6

10 -4 10 -8

10 -6 10 -10

-8 Error -12 Error 10 Superlinear bound 10 Superlinear bound Linear bound Linear bound 10 -10 10 -14 0 5 10 15 20 0 5 10 15 20 k k Figure 10: Error for the scalar linear problem for Ω = [0, 30]. throughout the time domain since x, y and z will change as illustrated in Figure 7. Given an initial trajectory (e.g. given by the coarse solver), we can easily obtain N sets of eigenvalues, for each set of initial values of the subproblems. Given a single time chunk, the best estimation for a would be the eigenvalue of the Jacobian with the most negative real part, denoted λmin. However, we are given N different sets n of eigenvalues and therefore N different λmin. The most natural value for a would n be to take the eigenvalue with the most negative real part out of all the λmin. This surely gives a very strict bound on convergence if the eigenvalues vary substantially throughout the time domain. Alternatively, we can update a such that it depends on the current iteration k, for example by ignoring the eigenvalues from the time subdomains which have converged (which is guaranteed for each Tn, n ≤ k). A much less formal alternative for a would be to consider the average of the eigenvalues with minimal real parts. It remains to be seen if one can find a formal yet less strict value n for a. For this specific example, the λmin at the beginning of the simulation have more negative real parts than later in the simulation. This is easily seen in the larger variations of the solution seen in Figure 7. In Figures 11-14, we illustrate how the observed errors compare to the theoretical bounds from Theorems 1 and 2 when using TR and RK4 as coarse solvers with N = 32, 64 subdomains. For each variable x, y, z, we calculate the maximal absolute error over the different time subdomain and take the sum of these errors. In the left n graphs, for the bounds, we use the minimum of all the λmin, as discussed above. 
For the right graphs, we use the average value of the $\lambda^n_{\min}$ as a more lenient value. In Figure 11, we observe that the algorithm converges superlinearly. As expected, the bound is more restrictive in Figure 11a than in Figure 11b. However, in Figure 12b, we observe that the error exceeds the bound given by the average value of the $\lambda^n_{\min}$. Again, the Parareal algorithm performs much better than the stricter bound in Figure 12a.
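The two candidate values for a discussed above (strictest and average) can be computed along a trajectory as in the following sketch. Several things here are assumptions rather than taken from the report: the classical Lorenz parameters σ = 10, ρ = 28, β = 8/3, the initial condition (1, 1, 1), the helper names, and the use of a finely stepped RK4 trajectory as a stand-in for the initial (coarse) trajectory. For simplicity only the real parts of the eigenvalues are kept.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # assumed classical Lorenz parameters

def lorenz_rhs(u):
    x, y, z = u
    return np.array([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z])

def lorenz_jacobian(u):
    # Jacobian of the Lorenz right-hand side at the state u = (x, y, z).
    x, y, z = u
    return np.array([[-SIGMA, SIGMA, 0.0],
                     [RHO - z, -1.0, -x],
                     [y, x, -BETA]])

def rk4_step(u, h):
    k1 = lorenz_rhs(u)
    k2 = lorenz_rhs(u + 0.5 * h * k1)
    k3 = lorenz_rhs(u + 0.5 * h * k2)
    k4 = lorenz_rhs(u + h * k3)
    return u + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def lambda_min_real_parts(u0, T=4.0, N=64, substeps=10):
    # Most negative eigenvalue real part at the start of each subdomain.
    dT = T / N
    h = dT / substeps
    u = np.array(u0, dtype=float)
    lam = np.empty(N)
    for n in range(N):
        eigenvalues = np.linalg.eigvals(lorenz_jacobian(u))
        lam[n] = eigenvalues.real.min()
        for _ in range(substeps):  # advance to the next subdomain boundary
            u = rk4_step(u, h)
    return lam

lam = lambda_min_real_parts([1.0, 1.0, 1.0])  # hypothetical initial condition
a_strict = lam.min()    # strictest choice: most negative over all subdomains
a_average = lam.mean()  # more lenient, informal choice
print(a_strict, a_average)
```

Since the trace of the Jacobian is the constant −(σ + 1 + β), every entry of `lam` is negative, and the strict value is never larger than the average.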

[Figure 11 panels: (a) Minimum $\lambda^n_{\min}$ and (b) Average $\lambda^n_{\min}$, each titled "L∞ error for Lorenz, at T = 4"; error (log scale) against iteration k; curves: Error, Superlinear bound, Linear bound.]

Figure 11: Sum of the errors of each variable for the Lorenz system for N = 64, using TR for the coarse solver.

[Figure 12 panels: (a) Minimum $\lambda^n_{\min}$ and (b) Average $\lambda^n_{\min}$, each titled "L∞ error for Lorenz, at T = 4"; error (log scale) against iteration k; curves: Error, Superlinear bound, Linear bound.]

Figure 12: Sum of the errors of each variable for the Lorenz system for N = 32, using TR for the coarse solver.

In Figure 13a, we observe that the algorithm performs even better than the strict linear bound. In Figure 13b, the algorithm also performs significantly better than the superlinear bounds. This indicates that further consideration is needed as to the choice of the parameter a, especially when the methods used are very accurate. Furthermore, in Figure 14a, we see that the algorithm converges relatively quickly even when the error bound increases with iterations and the coefficient for linear convergence is greater than one. As mentioned above, the $\lambda^n_{\min}$ at the beginning of the simulation have more negative real parts than later in the simulation. These first eigenvalues actually result in $z = \lambda^n_{\min} \Delta T$ outside the stability region of RK4. Since these first time subdomains converge early in the algorithm, this ceases to influence later iterations. Nonetheless, the theoretical bounds hold under the assumption that the coarse solver is in its region of absolute stability. In Figure 14b, we see that the algorithm performs considerably better than the theoretical bounds.
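Whether a given $z = \lambda \Delta T$ lies in the region of absolute stability of RK4 can be checked with the stability function of the classical method, $R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24$, by testing $|R(z)| \le 1$. The sketch below uses hypothetical function names, and the two eigenvalues in the usage lines are illustrative values, not values computed in this report.

```python
def rk4_stability_function(z):
    # Stability function of the classical fourth-order Runge-Kutta method.
    return 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24

def coarse_step_is_stable(lam, dT):
    # True if z = lam*dT lies in the region of absolute stability of RK4.
    # lam may be real or complex.
    return abs(rk4_stability_function(lam * dT)) <= 1.0

dT = 4.0 / 32  # T = 4 with N = 32 subdomains
print(coarse_step_is_stable(-20.0, dT))  # illustrative eigenvalue, z = -2.5
print(coarse_step_is_stable(-25.0, dT))  # illustrative eigenvalue, z = -3.125
```

On the negative real axis the stability interval of RK4 ends near z ≈ −2.79, so halving ∆T (i.e. doubling N) can bring a given eigenvalue back inside the region.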

[Figure 13 panels: (a) Minimum $\lambda^n_{\min}$ and (b) Average $\lambda^n_{\min}$, each titled "L∞ error for Lorenz, at T = 4"; error (log scale) against iteration k; curves: Error, Superlinear bound, Linear bound.]

Figure 13: Sum of the errors of each variable for the Lorenz system for N = 64, using RK4 for the coarse solver.

As discussed in Section 2.4, higher order methods for the coarse solver should result in faster convergence of the Parareal algorithm for small values of z = a∆T. This is easily seen when comparing the numerical results in Figures 11-12, where a second-order method is used, to those in Figures 13-14, where a fourth-order method is used. Since CCFE already commonly uses RK4 for its coarse solver, we wish to determine whether a fifth-order method could result in faster convergence. In Figure 15, we compare the errors of RK4 and RK5 for N = 32, 64. As expected, RK5 results in faster convergence of the Parareal algorithm.

[Figure 14 panels: (a) Minimum $\lambda^n_{\min}$ and (b) Average $\lambda^n_{\min}$, each titled "L∞ error for Lorenz, at T = 4"; error (log scale) against iteration k; curves: Error, Superlinear bound, Linear bound.]

Figure 14: Sum of the errors of each variable for the Lorenz system for N = 32, using RK4 for the coarse solver.

[Figure 15 panels: (a) N = 32 and (b) N = 64, each titled "L∞ error for Lorenz, at T = 4"; error (log scale) against iteration k; curves: Error RK4, Error RK5.]

Figure 15: Comparison of the errors for the RK4 and RK5 coarse solvers.

4.3 Wave Equation

We now consider the wave equation with the spatial discretization from Section 3.2. For the Parareal algorithm, we divide the time domain into N time subdomains of equal length ∆T = T/N. In the scope of this project, we did not evaluate the eigenvalues of the Jacobian of the system given by (40)-(41). These could be calculated numerically. For this model, we only verify that the algorithm exhibits the qualitative behaviour of superlinear convergence. In Figure 16, we illustrate the error of the algorithm for N = 32, 64 using TR as the coarse solver. We observe superlinear convergence. It remains to be seen if the observed errors behave similarly to the theoretical bound.

[Figure 16 panel: "L∞ error for wave equation, at T = 6"; error (log scale) against iteration k; curves: N = 32, N = 64.]

Figure 16: Error for the wave equation using the TR coarse solver. We use the maximal norm in time, and the L2 error in space.

5 Discussion

CCFE has observed cases where the Parareal algorithm fails to converge when the size of the time subdomains is too small. However, we were unable to reproduce similar results with the simple cases and our simple implementation of Parareal. One main difference from those implementations is their use of multi-step methods for the fine solver. In the literature, multi-step methods are usually not considered compatible with the Parareal algorithm. In [18], Staff wrote that multi-step methods were ill-suited to Parareal:

"The algorithm favors single-step methods. The Parareal algorithm consists of N individual problems with its own initial values. The startup problems of the multi-step schemes disfavours them in the Parareal algorithm context. Single-step schemes, which suffers from no start-up problems, are therefore preferable."

In the simplest implementation of the algorithm, the coarse solver G is a time-stepping method using the time-step ∆T. It is clear from (8) that, for the original Parareal algorithm, this time-stepping method is meant to be a one-step method. Indeed, this formulation only includes one initial condition for the coarse solve, while a multi-step method would require more than one. To be clear, we are not considering the case where the coarse solver is a time-stepping method with multiple time-steps smaller than ∆T. It can easily be verified that, for small enough time-steps, higher order methods are usually preferable to lower order methods using multiple smaller time-steps.

The fine solver F solves an initial value problem on an interval of size ∆T. In most cases, the fine solver will consist of a high order method using several time-steps smaller than ∆T. For the turbulence application [13, 14, 17], the system of ODEs is solved using a pseudo-spectral method to evaluate the non-linearities. The fine solver F is the stiff solver within the VODPK package [10], which uses a (diagonally, right-) preconditioned Krylov implicit solver based on BDF of variable order and time-step. For an mth order BDF, we need m initial values. This means there need to be m − 1 steps before reaching the desired order. This start-up is not a problem if the interval is long and the number of time-steps is large. Special considerations must be taken when the time subdomains are small, since this leaves less time for the BDF method to reach its desired accuracy. However, the desired accuracy can still be achieved by using lower order BDF with very small time-steps, or an mth order one-step method.

Since similar precautions were taken in the case of the turbulence application, it is likely that the convergence problems are not related to the choice of the fine solver, but are rather specific to the model being solved. In fact, there are few practical results in the literature for models other than the scalar linear problem or other simple models. In general, it is not guaranteed that the algorithm will behave as expected for any nonlinear system of differential equations. There are even fewer results in the case of partial differential equations. The choice of the coarse solver, including the spatial discretization, has a great influence on the convergence of the algorithm.
For example, for the advection equation, certain choices of the numerical methods used with Parareal result in non-convergence of the algorithm [7]. Further studies are needed for different classes of equations as well as different choices of spatial discretization.

In this report, we determined that using higher order methods for the coarse solver results in faster convergence of the Parareal algorithm. However, the measure of fast convergence we use is simply the number of Parareal iterations needed to reach negligible error levels. We did not consider the gain in terms of computational time, which is the main motivation behind parallel processing. If we suppose that the cost of the coarse solver is negligible, then a computational gain is achieved as long as sufficient convergence is reached with k < N iterations (even though the total computational power needed is higher since we use N CPUs). However, the cost of the coarse solver is usually not negligible. Higher order methods usually require a larger number of function evaluations per time-step. Therefore, there is a trade-off between fast convergence and a cheap coarse solver. Given a particular application, one needs to determine where the optimal trade-off lies. For explicit one-step methods such as the RK methods considered here, reaching increasingly higher order is not necessarily trivial. Indeed, RK4 requires four function evaluations, but RK5 requires six. Hence, the faster convergence provided by RK5 must be significant enough to justify a 50% increase in the cost of the coarse solver.
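The trade-off described above can be made concrete with an idealised cost model. The model below is an assumption introduced for illustration, not a result of this report: it supposes N processors, no communication cost, one sequential coarse sweep to initialise, and k iterations each consisting of one parallel fine solve followed by one sequential coarse sweep. The function name and the cost units (function evaluations per subdomain) are hypothetical.

```python
def parareal_speedup(N, k, cost_fine, cost_coarse):
    # Idealised speedup of Parareal over a serial fine solve.
    # cost_fine:   cost of the fine solver F over one subdomain
    # cost_coarse: cost of the coarse solver G over one subdomain
    serial = N * cost_fine
    initial_sweep = N * cost_coarse
    per_iteration = cost_fine + N * cost_coarse
    return serial / (initial_sweep + k * per_iteration)

# With a negligible coarse cost the speedup reduces to N/k,
# so a gain requires k < N.
print(parareal_speedup(N=64, k=8, cost_fine=1.0, cost_coarse=0.0))

# Coarse cost per subdomain: 4 evaluations for RK4, 6 for RK5, against a
# hypothetical fine solve of 256 RK4 steps (1024 evaluations).
for k in (6, 7, 8):
    print(k,
          parareal_speedup(64, k, 1024.0, 4.0),
          parareal_speedup(64, k, 1024.0, 6.0))
```

Under this model, a more expensive coarse solver pays off only if it reduces the iteration count k enough to offset the larger sequential coarse sweeps.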

6 Conclusion and Further Work

In this report, we investigated the different options which have an influence on the convergence of the Parareal algorithm. Both theoretical and numerical studies were conducted. Based on theoretical results from the literature, we determined that the choice of the coarse solver is key to the resulting rate of convergence (or non-convergence) of the algorithm. From our observations, using higher order time-stepping methods for the coarse solver results in faster convergence of the Parareal algorithm. This result was then confirmed by numerical results for a simple scalar linear problem, the Lorenz system, and the wave equation with finite differences for the spatial discretization. However, the faster convergence of the high order coarse solvers is also accompanied by more expensive coarse solves. We recommend that CCFE test the performance of higher order coarse solvers such as the fifth-order Runge-Kutta method. Other more complex one-step high order methods could be considered.

The theoretical results from the literature prove the linear and superlinear convergence of the Parareal algorithm, for unbounded and bounded intervals, respectively, in the case of a simple scalar linear problem. It is not clear how well these results apply to nonlinear systems. In the observed cases at least, the theoretical bounds computed with the strictest choice of a are satisfied by the numerical errors of the algorithm. However, these bounds tend to be much more pessimistic than the errors actually observed. Nonetheless, the qualitative behaviour of the error corresponds to superlinear convergence. Therefore, further work is needed to get better convergence results for the nonlinear cases. The obtained theoretical error bounds will potentially be very specific to the application as well as the spatial discretization in the case of PDEs.

Additionally, more analysis of the more advanced versions of Parareal is needed.
In fact, most theoretical results focus on simple implementations of the algorithm, similarly to what is done in this report. A formal analysis of the use of a coarser spatial grid for the coarse solver is needed, including the way the interpolation between the coarse and fine grids is made. Further analysis on the use of simpler physical models for the coarse solver is also needed.

References

[1] Akio Arakawa and Vivian R Lamb. Computational design of the basic dynamical processes of the UCLA general circulation model. Methods in computational physics, 17:173–265, 1977.

[2] Guillaume Bal. Parallelization in time of (stochastic) ordinary differential equations. Math. Meth. Anal. Num. (submitted), 2003.

[3] Guillaume Bal. On the convergence and the stability of the parareal algorithm to solve partial differential equations. In Domain decomposition methods in science and engineering, pages 425–432. Springer, 2005.

[4] John C Butcher. Numerical methods for ordinary differential equations. John Wiley & Sons, 2008.

[5] Martin J Gander. 50 years of time parallel time integration. In Multiple Shooting and Time Domain Decomposition Methods, pages 69–113. Springer, 2015.

[6] Martin J Gander and Ernst Hairer. Nonlinear convergence analysis for the parareal algorithm. In Domain decomposition methods in science and engineering XVII, pages 45–56. Springer, 2008.

[7] Martin J Gander and Stefan Vandewalle. Analysis of the parareal time-parallel time-integration method. SIAM Journal on Scientific Computing, 29(2):556–578, 2007.

[8] E. Hairer, S.P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I: Nonstiff Problems, volume 8 of Springer Series in Comp. Math. Springer, 1993.

[9] E. Hairer and G. Wanner. Solving Ordinary Differential Equations II: Stiff Problems and Differential-Algebraic Problems, volume 14 of Springer Series in Comp. Math. Springer, 1996.

[10] AC Hindmarsh. Gear: Ordinary differential equation system solver. UCID-30001, Rev. 3, 1974.

[11] Jacques-Louis Lions, Yvon Maday, and Gabriel Turinici. Résolution d’EDP par un schéma en temps «pararéel». Comptes Rendus de l’Académie des Sciences-Series I-Mathematics, 332(7):661–668, 2001.

[12] Edward N Lorenz. Deterministic nonperiodic flow. Journal of the atmospheric sciences, 20(2):130–141, 1963.

[13] José Miguel Reynolds-Barredo, David E Newman, R Sanchez, D Samaddar, Lee A Berry, and Wael R Elwasif. Mechanisms for the convergence of time-parallelized, parareal turbulent plasma simulations. Journal of Computational Physics, 231(23):7851–7867, 2012.

[14] José Miguel Reynolds-Barredo, DE Newman, and R Sanchez. An analytic model for the convergence of turbulent simulations time-parallelized via the parareal algorithm. Journal of Computational Physics, 255:293–315, 2013.

[15] Daniel Ruprecht. Convergence of parareal with spatial coarsening. PAMM, 14(1):1031–1034, 2014.

[16] D. Samaddar, D.P. Coster, X. Bonnin, C. Bergmeister, E. Havlickova, L. A. Berry, W. R. Elwasif, and D. B. Batchelor. Temporal parallelization of edge plasma simulations using the parareal algorithm and the SOLPS code. Computer Physics Communications, submitted for publication.

[17] Debasmita Samaddar, David E Newman, and Raúl Sánchez. Parallelization in time of numerical simulations of fully-developed plasma turbulence using the parareal algorithm. Journal of Computational Physics, 229(18):6558–6573, 2010.

[18] GA Staff and E Rønquist. The parareal algorithm, a survey of present work. NOTUR Emerging technologies, Cluster technologies, 2003.

[19] Gunnar Andreas Staff and Einar M Rønquist. Stability of the parareal algorithm. In Domain decomposition methods in science and engineering, pages 449–456. Springer, 2005.

A Numerical Methods

In this appendix, we detail the time-stepping methods from Section 2.1. The methods are applied to a problem of the form (1), where $u_n \approx u(t_n)$ and $t_n = n\Delta t$.

(i) Forward Euler (FE):
\[ \frac{u_{n+1} - u_n}{\Delta t} = f(t_n, u_n). \tag{42} \]

(ii) Backward Euler (BE):
\[ \frac{u_{n+1} - u_n}{\Delta t} = f(t_{n+1}, u_{n+1}). \tag{43} \]

(iii) Second-order Runge-Kutta (RK2):
\[ \frac{u_{n+1} - u_n}{\Delta t} = f\left(t_n + \tfrac{1}{2}\Delta t,\; u_n + \tfrac{1}{2}\Delta t\, f(t_n, u_n)\right). \tag{44} \]

(iv) Trapezoidal rule (TR):
\[ \frac{u_{n+1} - u_n}{\Delta t} = \frac{1}{2}\left(f(t_n, u_n) + f(t_{n+1}, u_{n+1})\right). \tag{45} \]

(v) Fourth-order Runge-Kutta (RK4):

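\[
\begin{aligned}
k_1 &= f(t_n, u_n), \\
k_2 &= f\left(t_n + \tfrac{\Delta t}{2},\; u_n + \tfrac{\Delta t}{2} k_1\right), \\
k_3 &= f\left(t_n + \tfrac{\Delta t}{2},\; u_n + \tfrac{\Delta t}{2} k_2\right), \\
k_4 &= f\left(t_n + \Delta t,\; u_n + \Delta t\, k_3\right), \\
\frac{u_{n+1} - u_n}{\Delta t} &= \frac{1}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right).
\end{aligned}
\tag{46}
\]

(vi) Fifth-order Runge-Kutta (RK5) [4]: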

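\[
\begin{aligned}
k_1 &= f(t_n, u_n), \\
k_2 &= f\left(t_n + \tfrac{\Delta t}{4},\; u_n + \tfrac{\Delta t}{4} k_1\right), \\
k_3 &= f\left(t_n + \tfrac{\Delta t}{4},\; u_n + \tfrac{\Delta t}{8} k_1 + \tfrac{\Delta t}{8} k_2\right), \\
k_4 &= f\left(t_n + \tfrac{\Delta t}{2},\; u_n - \tfrac{\Delta t}{2} k_2 + \Delta t\, k_3\right), \\
k_5 &= f\left(t_n + \tfrac{3\Delta t}{4},\; u_n + \tfrac{3\Delta t}{16} k_1 + \tfrac{9\Delta t}{16} k_4\right), \\
k_6 &= f\left(t_n + \Delta t,\; u_n - \tfrac{3\Delta t}{7} k_1 + \tfrac{2\Delta t}{7} k_2 + \tfrac{12\Delta t}{7} k_3 - \tfrac{12\Delta t}{7} k_4 + \tfrac{8\Delta t}{7} k_5\right), \\
\frac{u_{n+1} - u_n}{\Delta t} &= \frac{1}{90}\left(7k_1 + 32k_3 + 12k_4 + 32k_5 + 7k_6\right).
\end{aligned}
\tag{47}
\]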
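As a sanity check on the two explicit Runge-Kutta schemes of this appendix, the sketch below implements one step of classical RK4 (weights (1, 2, 2, 1)/6) and of the six-stage RK5 scheme, and estimates their observed order on the scalar test problem u' = −u, u(0) = 1. The function names and the choice of test problem and step sizes are assumptions made for illustration.

```python
import math

def rk4_step(f, t, u, h):
    # One step of the classical fourth-order Runge-Kutta method.
    k1 = f(t, u)
    k2 = f(t + h / 2, u + h / 2 * k1)
    k3 = f(t + h / 2, u + h / 2 * k2)
    k4 = f(t + h, u + h * k3)
    return u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def rk5_step(f, t, u, h):
    # One step of the six-stage fifth-order Runge-Kutta method.
    k1 = f(t, u)
    k2 = f(t + h / 4, u + h / 4 * k1)
    k3 = f(t + h / 4, u + h / 8 * k1 + h / 8 * k2)
    k4 = f(t + h / 2, u - h / 2 * k2 + h * k3)
    k5 = f(t + 3 * h / 4, u + 3 * h / 16 * k1 + 9 * h / 16 * k4)
    k6 = f(t + h, u + h / 7 * (-3 * k1 + 2 * k2 + 12 * k3 - 12 * k4 + 8 * k5))
    return u + h / 90 * (7 * k1 + 32 * k3 + 12 * k4 + 32 * k5 + 7 * k6)

def global_error(step, n_steps, T=1.0):
    # Error at time T for u' = -u, u(0) = 1, using n_steps equal steps.
    f = lambda t, u: -u
    h = T / n_steps
    u = 1.0
    for n in range(n_steps):
        u = step(f, n * h, u, h)
    return abs(u - math.exp(-T))

def observed_order(step, n_steps=8):
    # Order estimated from the error ratio when the step size is halved.
    return math.log2(global_error(step, n_steps) / global_error(step, 2 * n_steps))

print(observed_order(rk4_step))  # expected to be close to 4
print(observed_order(rk5_step))  # expected to be close to 5
```

Each RK5 step calls f six times against four for RK4, which is the cost difference discussed in Section 5.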
