JUNE 2016 HOTTA ET AL. 2215

A Semi-Implicit Modification to the Lorenz N-Cycle Scheme and Its Application for Integration of Meteorological Equations

DAISUKE HOTTA Japan Meteorological Agency, Tokyo, Japan, and University of Maryland, College Park, College Park, Maryland

EUGENIA KALNAY University of Maryland, College Park, College Park, Maryland

PAUL ULLRICH Department of Land, Air and Water Resources, University of California, Davis, Davis, California

(Manuscript received 23 September 2015, in final form 13 January 2016)

ABSTRACT

The Lorenz N-cycle is an economical time integration scheme that requires only one function evaluation per time step and a minimal memory footprint, yet possesses a high order of accuracy. Despite these advantages, it has remained less commonly used in meteorological applications, partly because it lacks a semi-implicit formulation. In this paper, a novel semi-implicit modification to the Lorenz N-cycle is proposed. The advantage of the proposed scheme is that it preserves the economical memory use of the original explicit scheme. Unlike the traditional Robert–Asselin (RA) filtered semi-implicit leapfrog scheme, whose formal accuracy is only of first order, the new scheme has second-order accuracy if it adopts the Crank–Nicolson scheme for the implicit part. A linear stability analysis based on a univariate split-frequency oscillation equation suggests that the 4-cycle is more stable than other choices of N. Numerical experiments performed using the dynamical core of the Simplified Parameterizations Primitive Equation Dynamics (SPEEDY) atmospheric general circulation model under the framework of the Jablonowski–Williamson baroclinic wave test case confirm that the new scheme in fact has second-order accuracy and is more accurate than the traditional RA-filtered leapfrog scheme. The experiments also give evidence for Lorenz's claim that the explicit 4-cycle scheme can be improved by running its two “isomeric” versions in alternating sequences. Unlike the explicit scheme, however, the proposed semi-implicit scheme is not improved by alternation of the two versions.

Corresponding author address: Daisuke Hotta, Numerical Prediction Division, Japan Meteorological Agency, 1-3-4 Otemachi, Chiyoda-ku, Tokyo 100-8122, Japan.
E-mail: [email protected]

DOI: 10.1175/MWR-D-15-0330.1

© 2016 American Meteorological Society
MONTHLY WEATHER REVIEW VOLUME 144

1. Introduction

A unique feature of the atmospheric and oceanic sciences is that, unlike other fields of natural sciences, controlled experiments are difficult to perform. Accordingly, numerical experimentation has become an increasingly important methodology in meteorology and physical oceanography. A key role in numerical experimentation is played by atmospheric or oceanic models, which numerically integrate hydrodynamic partial differential equations (PDEs) that describe the governing laws of geophysical fluid flows. There is thus a high demand for improving the accuracy of such models.

One of the major challenges in designing numerical integration schemes for atmospheric models, in particular the atmospheric general circulation models (AGCMs), is the so-called stiffness problem: the equations solved by AGCMs contain not only the slower waves that are relevant to the actual weather phenomena, but also the faster waves that are of little meteorological interest. The phase speeds of the faster waves are typically an order of magnitude larger than those of the slower waves. To satisfy the Courant–Friedrichs–Lewy (CFL) stability condition, an explicit temporal integration scheme needs to use an overly short time step just to maintain stability, making the integration significantly more expensive. While several advanced approaches have been proposed that resolve this issue, including Laplace transform methods (Clancy and Lynch 2011), semi-implicit predictor-corrector methods (Clancy and Pudykiewicz 2013a), and exponential integrator methods (Clancy and Pudykiewicz 2013b), the solution adopted by most current AGCMs is to use a semi-implicit scheme that treats the terms responsible for the fast waves implicitly and the other terms explicitly (Robert 1969); this treatment removes the CFL restriction for the fast waves and thus allows an efficient integration with much longer time steps. The availability of this semi-implicit treatment has been an important factor for a temporal integration scheme to be used in AGCMs. It should be noted, however, that the advantage of the semi-implicit approach is challenged by the increasing trend in high-performance computing to exploit massively parallel machines with relatively slow internode communications, because implicit solvers typically necessitate global communication that risks making the scheme less scalable. We shall revisit this point in the last section of the paper.

Traditionally, in the community of atmospheric modeling, a rather simple, three-time-level centered-differencing scheme, commonly known as the leapfrog scheme, has been regarded as the method of choice and adopted by many AGCMs. In pursuit of more accuracy and sophistication, recent new-generation AGCMs have switched to more advanced higher-order temporal integration schemes. At present, however, the leapfrog is still in wide use in many traditional AGCMs, perhaps because of its desirable properties, which include the following: ease of implementation, availability of a stable semi-implicit version, low cost in computational time, low memory consumption, and conservation of energy for a nondissipative system.

The above desirable properties are, however, tainted by the following undesirable features (Durran 1991). First, the scheme is unstable when applied to a system with dissipation. Second, being a three-time-level scheme, it necessitates special treatment at the very first step(s). Last, and most importantly, the leapfrog scheme produces, when applied to a nonlinear system, a spurious computational mode, which, if left unattenuated, results in time-splitting instability. In AGCMs, the first issue is typically dealt with by applying the leapfrog only to the nondissipative dynamics part; dissipative processes such as physics and damping are treated with separate schemes, such as the explicit forward-Euler or implicit backward-Euler scheme. This treatment unfortunately comes with the side effect of making the scheme only first-order accurate. The second issue is commonly dealt with by running a two-time-level scheme, such as the forward-Euler scheme, at the very first step. The third issue can be resolved by filtering out the computational mode by applying the Robert–Asselin (RA) filter (Robert 1966; Asselin 1972); this treatment also introduces the side effect of degrading the scheme to first-order accuracy by damping not only the computational mode but also the physical mode. Despite these disadvantages, the leapfrog scheme with semi-implicit modification (Robert 1969), combined with the RA filter and separate treatment of the dissipative part, has been the most widely used scheme for AGCMs, and a better scheme free from these limitations has long been sought.

One way to achieve this goal is to alleviate the limitations by improving the classical RA-filtered leapfrog scheme. Recently, Williams (2009) proposed an improvement to the RA filter. The new filter, called the Robert–Asselin–Williams (RAW) filter, preserves the second-order accuracy of the leapfrog scheme without any significant increase in computational cost. The advantage of the RAW filter over the RA filter is confirmed also for the semi-implicit leapfrog scheme (Williams 2011; Amezcua et al. 2011). Williams (2013) devised further improvements in this line, leading to schemes with even higher accuracy in amplitude (up to seventh order; the phase error remains second order). While these improved schemes effectively eliminate the undesirable artificial damping of physical modes, other shortcomings of the filtered leapfrog scheme (i.e., the instability for dissipative terms and the necessity of special treatment for the initial steps) remain unresolved. The benefit of making the filter second-order accurate is also diminished if a first-order scheme is used for dissipative terms to suppress instability.

Attempts have also been made to seek alternative schemes that are better suited for atmospheric and oceanic models. Multistep schemes such as the Adams–Bashforth family of schemes, for example, can have an order of accuracy higher than that of the leapfrog without increasing computational expenses. Durran (1991) found, however, that while the three-step third-order Adams–Bashforth scheme is a viable alternative to the RA-filtered explicit leapfrog scheme, this scheme cannot replace the semi-implicit leapfrog scheme because the Adams–Bashforth scheme becomes unstable if it is combined with a semi-implicit scheme for fast modes. The Runge–Kutta family of schemes can also be more accurate than the leapfrog. Kar (2006) and Whitaker and Kar (2013), for example, have successfully developed semi-implicit versions of Runge–Kutta-type schemes and showed their advantages over the RA-filtered semi-implicit leapfrog scheme. Their schemes, however, are more computationally expensive and consume more memory than the leapfrog.

In 1971, Lorenz devised an ingenious temporal integration scheme, now called the Lorenz N-cycle scheme, for a system of first-order ordinary differential equations (ODEs) (Lorenz 1971). Its ingenuity resides in its high order of accuracy, low computational expense, and economy of memory usage. It is self-starting (i.e., it does not require model states at multiple steps for initiating integration) and computationally as efficient as the leapfrog, both in terms of speed and memory usage, but can be Nth-order accurate at every Nth step for an integer $N \le 4$. Despite these advantages, the Lorenz N-cycle remained rather obscure in atmospheric and oceanic sciences. Although there is at least one oceanic model that uses this scheme (Gent and Cane 1989), it seems not to have been applied to an atmospheric model. In particular, a semi-implicit modification to this scheme has not been developed.

The aim of this paper is to present a new semi-implicit modification to the Lorenz N-cycle and show that this scheme can improve the accuracy of an AGCM. In designing our semi-implicit version, we put particular emphasis on preserving the low memory consumption of the original explicit scheme. As we describe in section 2, the Lorenz N-cycle schemes can be thought of as a special subfamily of Runge–Kutta schemes (in the sense that they can be represented with Butcher tableau notation). In fact, treating the Lorenz 3-cycle as a special case of the Runge–Kutta scheme, Whitaker and Kar (2013) proposed a semi-implicit formulation. Their scheme, however, is designed from a different motivation than ours: while our priority is in minimizing memory footprint, their priority was in ensuring stability. Consequently, the two schemes have different pros and cons, which are discussed in section 6 of this paper.

This paper is organized as follows: section 2 concisely summarizes the algorithm of the Lorenz N-cycle and discusses its advantages, especially in comparison with the traditional leapfrog scheme. Section 3 first describes the traditional semi-implicit modification to the leapfrog scheme and then presents the formulation of our semi-implicit modification to the Lorenz N-cycle. It then gives the analysis of its accuracy and stability. Section 4 briefly describes the AGCM, called the Simplified Parameterizations Primitive Equation Dynamics (SPEEDY) model, whose dynamical core is tested in our study. It also describes our verification method, which is based on a standardized baroclinic wave dynamical core test and whose results are presented in section 5. Section 6 concludes the paper with a summary and an outlook for future research.

2. Explicit Lorenz N-cycle

This section describes the algorithm of the explicit Lorenz N-cycle and discusses its advantages, especially in comparison to the traditional RA-filtered leapfrog scheme.

a. The algorithm

Let us consider the problem of numerically integrating the system of nonlinear PDEs that govern the evolution of a geophysical fluid system such as the atmosphere and ocean. After applying spatial discretization (e.g., by finite difference, finite element, finite volume, or spectral methods), the PDEs result in the following system of ODEs:

$$\frac{dx}{dt} = F(x), \tag{1}$$

where $x = [x_1(t), \ldots, x_M(t)]$ is an $M$-dimensional vector function of $t$ and $F(x) = [F_1(x_1, \ldots, x_M), \ldots, F_M(x_1, \ldots, x_M)]$ is a function from $\mathbb{R}^M$ to $\mathbb{R}^M$. In geophysical applications, the number of prognostic variables, $M$, tends to be huge: as of 2015, in an operational deterministic numerical weather prediction (NWP) model, for example, $M$ can typically be as large as $O(10^9)$. Because of the huge size of the problem, minimizing memory consumption is of particular importance in designing a numerical scheme. In trying to find an economical scheme, Lorenz (1971) derived two “isomeric” versions of schemes for the above problem. Using the elegant notation introduced by Purser and Leslie (1997), the algorithm of one version of the schemes, which we refer to as “version A” hereafter, can be schematically written as follows:

N-cycle A
    $w_0 = 1$, (2)
    $w_k = \dfrac{N}{N - k}$ $(k = 1, \ldots, N - 1)$ (3)
    do $k = 0, \ldots$
        $w \leftarrow w_{\mathrm{mod}(k,N)}$ (4)
        $G \leftarrow wF(x) + (1 - w)G$ (5)
        $x \leftarrow x + G\,\Delta t$ (6)
    end do.

Similarly, the other version, which we call “version B,” can be schematically written as follows:


N-cycle B
    $w_0 = 1$, (7)
    $w_k = \dfrac{N}{k}$ $(k = 1, \ldots, N - 1)$ (8)
    do $k = 0, \ldots$
        $w \leftarrow w_{\mathrm{mod}(k,N)}$ (9)
        $G \leftarrow wF(x) + (1 - w)G$ (10)
        $x \leftarrow x + G\,\Delta t$ (11)
    end do.

Note that the two versions differ only in the ordering of the “weight” coefficients $w_k$. In implementing these schemes, we only need to store two arrays of $M$ words, $x$ and $G$. Also, they require only one evaluation of the tendency term $F(x)$ per time step.

The mathematical idea behind the above algorithms is simple: to “reuse” the previously computed tendencies [$F(x^{k-j})$, $j = 1, 2, \ldots, k - 1$, where $x^{k-j}$ denotes the value of $x$ at the $(k - j)$th step of each cycle] by forming a weighted average of them to produce a tendency that yields the highest order of accuracy after the completion of the $N$th step, under the constraint that each intermediate step retains at least first-order accuracy. Thus, in the sense that a Butcher tableau representation (Butcher 2007) is possible, the Lorenz N-cycle can be regarded as a special instance of the (broadly defined) Runge–Kutta family. In fact, if $F$ is linear, Lorenz 4-cycles, both versions A and B, give, at every four time steps, the same solution as the classical four-stage fourth-order Runge–Kutta scheme with a $4\Delta t$ time step.

If $F$ is linear, at every $N$th step, both versions A and B give the Taylor expansion with respect to $\Delta t$ of the true solution truncated at the $(N + 1)$th-order term. Thus, for a linear case, if we only look at results at every $N$ steps, the two versions are identical Nth-order schemes. If $F$ is nonlinear, the accuracy of the N-cycle schemes reduces to second order for $N \ge 3$. However, for $N = 3$ and for $N = 4$, the $(N + 1)$th-order term in the truncation error of versions A and B can be shown to be of the same magnitude with opposite signs. Thus, for these values of $N$, Nth-order accuracy can be attained by running both A and B cycles simultaneously and then averaging the predictions, at the expense of doubling the computational cost in both speed and memory consumption.

To avoid doubling the computational cost, Lorenz (1971) proposed to use versions A and B in a suitable alternating sequence, based on the intuition that the errors of the two versions should tend to cancel each other. In fact, Lorenz (1971) claims, without proof, that, for $N = 3$, true third-order accuracy can be retained even for a nonlinear case by alternating versions A and B. Likewise, he claimed that, for $N = 4$, full fourth-order accuracy can be achieved by forming a $4N$ ($= 16$)-cycle of the sequence A, B, B, A. Numerical computations for a simple nonlinear ODE performed by Purser (2007) corroborate Lorenz's claim for the 3-cycle, but there seems to have been no work that supports or denies the claim for $N = 4$. In section 5c, we present a result for an AGCM that corroborates this claim for $N = 4$.

The stability of Lorenz N-cycle schemes was investigated by Lorenz (1971) for the case of scalar linear $F$ and by Israeli and Gottlieb (1974) for linear PDEs discretized in space by centered finite differencing. Unlike the leapfrog scheme, which is stable for nondissipative (hyperbolic) systems but unstable for dissipative (parabolic) systems, Lorenz N-cycle schemes are shown to be reasonably stable for both hyperbolic and parabolic systems. For example, as we stated above, the Lorenz 4-cycle with a time step of $\Delta t$ and the classic Runge–Kutta fourth-order scheme with a time step of $4\Delta t$ yield mathematically equivalent results for a linear system, and thus their stability conditions are identical.

We emphasize here that Lorenz N-cycle schemes ($N = 3$ or $N = 4$) are Nth-order accurate only every $N$ time steps and that they are only first-order accurate at the intermediate steps. As an example, we show, in Fig. 1, the root-mean-square errors (RMSEs) verified against the exact solution at every time step of the numerical solutions of the Korteweg–de Vries (KdV) equation integrated by the RA-filtered leapfrog scheme (with the smoothing parameter set to 0.05), the Lorenz 3-cycle scheme with the AB alternating sequence, the Lorenz 4-cycle scheme with the ABBA alternating sequence, and the classic Runge–Kutta fourth-order (RK4) scheme. To make a fair comparison, the leapfrog and the Lorenz 3- and 4-cycles are integrated with the time step $\Delta t = 10^{-6}$, whereas the RK4 scheme is integrated with $4\Delta t = 4 \times 10^{-6}$; with these time steps, the four schemes have roughly the same computational cost (in terms of operation counts). The detailed experimental setup for this numerical integration is given in appendix B. Figure 1 clearly illustrates that

• the Lorenz 4-cycle (ABBA) is as accurate as the classical RK4 scheme (with the same operation counts) at every four time steps, but is comparable to the leapfrog scheme at the intermediate steps, and that
• the Lorenz 3-cycle (AB) has an accuracy between that of the leapfrog and RK4 (indicative of third-order accuracy) at every three time steps, but is comparable to or less accurate than the leapfrog at the intermediate steps.

It is possible, however, to significantly improve the accuracy of the intermediate steps by forming a linear combination of all the $N + 1$ steps within a single N-cycle so that the linear part of the errors cancels out, provided that the model state at every time step is retained (see appendix C for derivation). The improvement with this method is clearly illustrated in Fig. 1, where the RMSEs for the improved intermediate steps for the Lorenz 3-cycle (AB) and 4-cycle (ABBA) are shown with dashed lines with filled circles; except for the very first two steps of the 3-cycle, the numerical predictions at the intermediate steps are now nearly as accurate as those at every $N$th time step.

FIG. 1. RMSE of numerical solutions of KdV equations verified against the exact solution for different temporal integration schemes, shown as a function of time steps.

b. Advantages of the Lorenz N-cycle

The principal advantage of the Lorenz N-cycle, besides its high-order accuracy, is its economy in memory consumption. It requires only one evaluation of $F(x)$ per time step, which, in most cases, is the most computationally demanding part. Also, the scheme consumes only $2M$ words of memory. Thus, the Lorenz N-cycle has the same computational cost as the widely used leapfrog scheme. Compared to the fourth-order Runge–Kutta scheme, which is accurate but too memory demanding for the purpose of AGCMs, the Lorenz N-cycle consumes less than half the memory.

Another important advantage of the Lorenz N-cycle is the absence of computational modes: the Lorenz N-cycle, being a two-time-level method rather than a three-time-level method like the leapfrog scheme, does not suffer from the presence of computational modes. This feature proves to be particularly useful for nonlinear systems, for which computational modes can grow exponentially, causing divergence of the numerical solution from the actual solution. Being a two-time-level (or single-step) scheme also facilitates the initialization process. Unlike schemes with three or more time levels such as the leapfrog or Adams–Bashforth method, the Lorenz N-cycle does not require any special treatment for the initial step(s).

Still another important advantage is that the Lorenz N-cycle is stable not only for oscillatory terms but also for dissipative terms. In contrast, the leapfrog scheme is unstable for dissipative terms. For this reason, AGCMs that adopt the leapfrog scheme typically avoid instability by applying the first-order forward Euler scheme with $2\Delta t$ to the diffusion and physics tendencies (cf. section 4a), at the expense of rendering their time integration scheme only first-order accurate as a whole. The Lorenz N-cycle, on the other hand, enables us to consistently use a single time integration scheme for all terms of AGCMs.

3. Semi-implicit modification

As we discussed in the introduction, the equations solved by AGCMs are stiff: the external inertia–gravity waves (also known as Lamb waves), which are contained in the solutions of these equations but are of little meteorological importance, exhibit a very fast phase speed (approximately 300 m s$^{-1}$), whereas waves that are relevant to the actual weather phenomena, such as internal inertia–gravity waves and Rossby waves, exhibit an order-of-magnitude slower phase speed. Because of the CFL restriction, the presence of these fast waves requires an order-of-magnitude shorter time step than what is otherwise required to resolve the slower, meteorologically meaningful waves. In AGCMs, it is customary to circumvent this issue by using the semi-implicit technique (Robert 1969).

This section presents our new semi-implicit modification to the Lorenz N-cycle scheme and discusses its accuracy and stability. In deriving our semi-implicit scheme, we utilize a somewhat nonconventional notation in which the tendency term (not the model state itself) is modified to account for semi-implicit treatment. To familiarize the readers with our tendency-based notation, the classical semi-implicit leapfrog scheme is presented using this notation in section 3a. Section 3b describes our semi-implicit formulation of the Lorenz N-cycle. Section 3c then discusses its accuracy and stability for illustrative linear cases.

a. Semi-implicit leapfrog scheme

Consider integration of an equation of the following form:

$$\frac{dx}{dt} = F^E(x) + L^I x, \tag{12}$$

where $F^E: \mathbb{R}^M \to \mathbb{R}^M$ is a nonlinear function and $L^I$ is an $M \times M$ matrix. It is assumed that the term $L^I x$ is responsible for the fast external inertia–gravity waves. The semi-implicit modification to the leapfrog scheme that was originally introduced by Robert (1969) takes the following form:

$$\frac{x^{n+1} - x^{n-1}}{2\Delta t} = F^E(x^n) + L^I\left[\alpha x^{n+1} + (1 - \alpha)x^{n-1}\right], \tag{13}$$

where $x^n$ signifies the predicted state at the $n$th step and $0 \le \alpha \le 1$ is a “centering factor.” Here $\alpha = 1/2$, $\alpha = 1$, and $\alpha = 0$ correspond, respectively, to the Crank–Nicolson, backward Euler, and forward Euler scheme.

To solve Eq. (13) for $x^{n+1}$, let us first define the “tendency” $\delta x$ by

$$\delta x = \frac{x^{n+1} - x^{n-1}}{2\Delta t} \tag{14}$$

and express $x^{n+1}$ on the right-hand side as $x^{n+1} = x^{n-1} + 2\Delta t\,\delta x$. Then, substituting this expression into Eq. (13), we have

$$\delta x = F^E(x^n) + L^I x^{n-1} + 2\alpha\Delta t\,L^I\,\delta x, \tag{15}$$

$$\delta x = (I - 2\alpha\Delta t\,L^I)^{-1}\left[F^E(x^n) + L^I x^{n-1}\right], \tag{16}$$

where $I$ is the identity matrix. Once $\delta x$ is obtained, the integration can be completed by

$$x^{n+1} = x^{n-1} + 2\Delta t\,\delta x. \tag{17}$$

In summary, to solve Eq. (13) for $x^{n+1}$, we first evaluate the nonlinear tendency $F^E(x^n)$ at the central step, and then evaluate and add the linear tendency $L^I x^{n-1}$ at the older step. We then multiply the sum by the inverse matrix $(I - 2\alpha\Delta t\,L^I)^{-1}$ and finally integrate the equation by Eq. (17). Note that the matrix $(I - 2\alpha\Delta t\,L^I)$ is constant as long as the time step $\Delta t$ is unchanged, so that the matrix inversion needs to be carried out only once for the whole integration.

b. Formulation of the semi-implicit Lorenz N-cycle

The important advantage of Lorenz N-cycle schemes over other schemes such as the leapfrog or the Runge–Kutta scheme is their economy in terms of memory consumption. Thus, in designing a semi-implicit modification to them, we sought to preserve this favorable property. Our proposed scheme achieves this goal by applying a tendency adjustment similar to Eq. (16) on each step of the N-cycle:

    do $k = 0, \ldots$
        $w \leftarrow w_{\mathrm{mod}(k,N)}$ (18)
        $G \leftarrow wF^E(x) + (1 - w)G$ (19)
        $\delta x = (I - \alpha\Delta t\,L^I)^{-1}(G + L^I x)$ (20)
        $x \leftarrow x + \Delta t\,\delta x$ (21)
    end do.

Note that no additional variables are introduced in this semi-implicit formulation. Also, unlike other semi-implicit Runge–Kutta-type schemes (e.g., Whitaker and Kar 2013), this scheme involves only one matrix inversion, which allows simplicity in implementation.

c. Stability and accuracy analysis

Semi-implicit time-stepping schemes are traditionally examined by applying them to the following split-frequency linear oscillation equation (Durran 1991; Williams 2011; Whitaker and Kar 2013):

$$\frac{dx}{dt} = i\omega_L x + i\omega_H x, \tag{22}$$

where the first and second terms on the right-hand side correspond, respectively, to $F^E(x)$ and $L^I x$ in Eq. (12). This equation arises from spectrally discretizing (or equivalently, applying the von Neumann stability analysis to) the linear advection equation for a slow and a fast wave, with phase speeds $c_{\mathrm{slow}} = \omega_L/k$ and $c_{\mathrm{fast}} = \omega_H/k$, respectively, for a mode with wavenumber $k$. Positive values of $\omega_L$ or $\omega_H$ correspond to forward advection ($c_{\mathrm{slow}} > 0$ or $c_{\mathrm{fast}} > 0$). After applying the temporal discretization algorithm, the particular case of $\omega_L = 0$ corresponds to a fully implicit case; by choosing the centering factor to be $\alpha = 1/2$, all schemes lead to the unconditionally stable Crank–Nicolson scheme, which has the constant amplification factor $|A| = 1$ for all $\omega_H$ (see the $y$ axis in each panel of Fig. 2).

By carrying out the algorithm, we can show, as in the case of the explicit N-cycle, that, for a linear system, versions A and B give identical expressions at every $N$th step. The truncation errors are

$$\frac{x^N - x^{\mathrm{Exact}}}{x^0} = \frac{1}{2N}(1 - 2\alpha)\,\omega_H(\omega_H + \omega_L)(N\Delta t)^2 + O(\Delta t^3). \tag{23}$$

Thus, the semi-implicit Lorenz N-cycle can be of second order by taking $\alpha = 1/2$ (i.e., the Crank–Nicolson scheme).

Stability of these schemes for each $\omega_L$ and $\omega_H$ can be visualized by plotting the modulus of the corresponding amplification factor $|A|$ as a function of $(\omega_L\Delta t, \omega_H\Delta t)$. The scheme is unstable if $|A|$ exceeds unity. For a fair comparison among different values of $N$, we define the average amplification factor per step $A$ by

$$A := \sqrt[N]{x^N/x^0}. \tag{24}$$

Figure 2 shows $|A|$ of Crank–Nicolson ($\alpha = 1/2$) semi-implicit Lorenz N-cycle schemes for values of $N$ from $N = 1$ to $N = 6$. The areas of instability are filled with light gray ($1 < |A| < 1.01$) or with dark gray ($|A| > 1.01$).

FIG. 2. The modulus of the average amplification factor per step $|A|$ defined by Eq. (24) for the Crank–Nicolson semi-implicit Lorenz N-cycle schemes applied to the scalar split-frequency problem in Eq. (22). Here $N =$ (a) 1, (b) 2, (c) 3, (d) 4, (e) 5, and (f) 6. The contour intervals are 0.1. Regions with $1 < |A| < 1.01$ are filled with light gray to show where the schemes are slightly unstable. Regions of instability with $|A|$ exceeding 1.01 are filled with dark gray. Red thick lines in each panel represent $|\omega_H| = |\omega_L|$; we are interested in the region above these lines.
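The amplification-factor calculation described above can be sketched numerically. The following is a minimal illustration rather than the authors' code: it applies steps (18)–(21) to the scalar problem of Eq. (22) and evaluates Eq. (24). The function names `lorenz_weights`, `cycle_ratio`, and `average_amplification` are hypothetical, and the sketch assumes that all quantities are nondimensionalized by $\Delta t$ (so the variable `G` holds $G\,\Delta t$) and that the principal branch of the $N$th root is taken.

```python
import numpy as np


def lorenz_weights(N, version="A"):
    """Weights w_0..w_{N-1}: version A uses N/(N-k), version B uses N/k."""
    k = np.arange(1, N)
    tail = N / (N - k) if version == "A" else N / k
    return np.concatenate(([1.0], tail))


def cycle_ratio(omega_L_dt, omega_H_dt, N=4, alpha=0.5, version="A"):
    """Return x^N / x^0 after one full N-cycle for dx/dt = i*wL*x + i*wH*x.

    The wL term is treated explicitly, the wH term implicitly with
    centering factor alpha (0.5 corresponds to Crank-Nicolson).
    """
    w = lorenz_weights(N, version)
    x = 1.0 + 0.0j          # x^0 = 1, so the final x equals x^N / x^0
    G = 0.0 + 0.0j          # holds G * dt (assumption: nondimensional form)
    for k in range(N):
        # Eq. (19): weighted reuse of the explicit (slow) tendency
        G = w[k] * (1j * omega_L_dt * x) + (1.0 - w[k]) * G
        # Eq. (20): scalar version of the tendency adjustment
        dx = (G + 1j * omega_H_dt * x) / (1.0 - alpha * 1j * omega_H_dt)
        # Eq. (21): advance the state
        x = x + dx
    return x


def average_amplification(omega_L_dt, omega_H_dt, N=4, alpha=0.5, version="A"):
    """Eq. (24): principal N-th root of x^N / x^0."""
    return cycle_ratio(omega_L_dt, omega_H_dt, N, alpha, version) ** (1.0 / N)


if __name__ == "__main__":
    # Fully implicit Crank-Nicolson limit (omega_L = 0): |A| should be 1.
    print(abs(average_amplification(0.0, 2.0, N=4)))
    # A point above the |omega_H| = |omega_L| line for the 3- and 4-cycles.
    for N in (3, 4):
        print(N, abs(average_amplification(0.2, 1.0, N=N)))
```

Setting `omega_H_dt = 0` recovers the explicit N-cycle, so the same sketch can be used to check the equivalence with the classical RK4 polynomial stated in section 2a. Evaluating `abs(average_amplification(...))` over a grid of `(omega_L_dt, omega_H_dt)` values would give the kind of contour data that Fig. 2 displays.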


In general, semi-implicit techniques are applied when the oscillations produced by the implicitly treated term are faster than those produced by the explicitly treated term. Thus, in interpreting Fig. 2, we should focus on the area above the thick red lines (jvH j . jvLj). From Figs. 2a and 2b, we see that the Lorenz 1- and 2-cycles are stable only when the two frequencies vH and vL are of opposite signs and the frequency is larger for vH.Un- fortunately, however, these schemes are unconditionally unstable if vH and vL areofsamesign.The3-cyclescheme (Fig. 2c) exhibit better stability for cases where vH and vL areofsamesign,butitalsointroduces weak instability for regions where the two frequencies are of opposite signs. The 4-cycle (Fig. 2d) has larger stability region than the 3-cycle, but increasing N further does not improve the situation; the 5- and 6-cycles (Figs. 2e and 2f) show stability regions that are smaller than that of the 4-cycle. Values of N larger than 6 (from 7 to 12; not shown) result in even smaller stability regions. For these reasons, hereafter we focus only on the Lorenz 3- and 4-cycles. Figure 3 shows the relative phase errors (in percent) of Lorenz 3- (Fig. 3a) and 4-cycles (Fig. 3b). Here, the relative phase errors are defined as

arg A 2 (v 1 v )Dt FIG.3.AsinFig. 2, but for the average phase error per step L H 3 100, (25) (v 1 v )Dt defined by Eq. (25) for the Crank–Nicolson semi-implicit Lo- L H renz (a) 3-cycle and (b) 4-cycle schemes, applied to the scalar The Lorenz 3- and 4-cycles show similar phase errors. split-frequency problem in Eq. (22). The contour levels are 650%, 610%, 65%, 61%, and 0. Nonnegative and negative The errors are small (less than 10%) in most of the contours are drawn, respectively, with solid and dashed lines. Re- stability regions except very near to the boundaries. gions where the magnitude of the relative errors exceeds 1% and Interestingly, the phase errors are predominantly neg- 10% are filled, respectively, with light and dark gray. Red thick v 5 v ative, which means the oscillations in the numerical so- lines in each panel represent j H j j Lj; we are interested in the lutions tend to be slower than the exact solution. This is region above these lines. Phase errors are drawn only for areas where the modulus of the amplification factor (shown in Figs. 2c,d) consistent with our intuition that the semi-implicit is smaller than 1.01. method stabilizes the scheme by slowing down high- frequency waves (Kalnay 2003). Stability analysis based on the split-frequency equa- stability by looking at the modulus of the amplification tion in Eq. (22) provides us with an insight on how the factor jAj. We first fix the frequency of the implicitly schemes would behave for a pure, nondissipative oscil- treated term, vHDt to a prescribed value and draw the lating system. However, it is also important to examine contour of jAj on a complex plane for vLDt. Figure 4 shows how the schemes behave for a system with dissipation be- the stability regions of the semi-implicit Lorenz 3- (Fig. 4a) cause most geophysical fluid systems, including AGCMs, and 4-cycles (Fig. 4b). Each curve represents the boundary contain dissipative terms. Following S. K. 
Kar (2013, per- of stability region for the corresponding value of vHDt sonal communication), we account for dissipation in the shown in the legend. For both 3- and 4-cycles, the insta- stability analysis by introducing imaginary component to bility that is present in the nondissipative case (Figs. 2c,d) vL in Eq. (22): for negative Re(vL)Dt and positive vHDt can be sup- pressed by very small damping [Im(v )Dt & 0:025]. If we dx L 5 iv x 1 iv x 5 [iRe(v )x 2 Im(v )x] 1 iv x. focus on the stability of 3- and 4-cycles for small damping dt L H L L H [Im(v )Dt & 0:1, highlighted with pink shades], we find (26) L that 4-cycle has broader range of stability than 3-cycle. AsinthecaseofEq.(22), we integrate this equation with From the above stability analysis we conclude that, the semi-implicit Lorenz N-cycle schemes treating the vL with our semi-implicit formulation, the Lorenz 4-cycle is term explicitly and vH term implicitly, and examine the more stable than the Lorenz 3-cycle. For this reason, in

the experiments with an AGCM described in later sections we focus only on the Lorenz 4-cycle.

4. Experimental setup

a. SPEEDY model

Here, we implement and test the semi-implicit Lorenz N-cycle in a simplified low-resolution AGCM known as the SPEEDY model (Molteni 2003). This model is a primitive equation model with T30L7 resolution; its horizontal discretization is a spectral representation with respect to the spherical harmonics triangularly truncated at the total wavenumber of 30, and its vertical discretization is finite differencing on seven layers in the $\sigma$-coordinate system. In gridpoint space, it has 96 longitudinal points and 48 latitudinal points. For temporal discretization, the SPEEDY model uses the standard RA-filtered semi-implicit leapfrog scheme. However, due to the intrinsic instability of the leapfrog scheme against dissipative processes, the SPEEDY model treats physical parameterizations with the first-order forward Euler scheme (with the time step of $2\Delta t$) to avoid numerical instability. We note that this well-known limitation of the leapfrog scheme can be avoided with our semi-implicit Lorenz 4-cycle schemes; we have in fact confirmed, through 100-yr integrations for each of the schemes, that, with semi-implicit Lorenz 4-cycle schemes, the tendencies from physical parameterizations can be treated together with the nonlinear part of the dynamics tendencies without introducing instability and that such schemes do not alter the model's climatology in a statistically significant way (not shown).

In addition to the special treatment of physical parameterizations, the SPEEDY model treats the horizontal spectral biharmonic dampings for momentum and temperature equations with the implicit backward Euler scheme to achieve further stabilization. The scheme can be written in pseudocode as

do $k = 1, \ldots$
  $\delta x \leftarrow F^{NL}_{Dyn}(x^{k}) + F_{Phys}(x^{k-1}) + L_{Dyn}\,x^{k-1}$ (27)
  $\delta x \leftarrow (1 - 2\alpha\Delta t\,L_{Dyn})^{-1}\,\delta x$ (28)
  $\delta x \leftarrow [I + \kappa(2\Delta t)\nabla^{4}]^{-1}\,[\delta x - \kappa(2\Delta t)\nabla^{4} x^{k-1}]$ (29)
  $x^{k+1} \leftarrow x^{k-1} + 2\Delta t\,\delta x$ (30)
  $x^{k} \leftarrow x^{k} + \nu\,(x^{k+1} - 2x^{k} + x^{k-1})$ (31)
  $t \leftarrow t + \Delta t$ (32)
  $x^{k-1} \leftarrow x^{k},\quad x^{k} \leftarrow x^{k+1}$ (33)
end do,

where $F^{NL}_{Dyn}(x)$ represents the nonlinear part of the tendency terms associated with dynamical processes, $L_{Dyn}\,x$ is the linear part, and $F_{Phys}(x)$ is the tendency term from physical parameterizations. The term $\kappa$ in Eq. (29) represents the diffusion coefficient for the biharmonic hyper-diffusion. Note that, since the SPEEDY model is a spectral model based on spherical harmonics, in spectral space, the horizontal biharmonic operator $\nabla^{4}$ and its inverse both become simple scalar multiplication. The smoothing parameter $\nu$ in the RA filter in Eq. (31) is set to 0.05. The centering parameter for the implicit component $\alpha$ is 1/2 for the Crank–Nicolson and 1 for the backward Euler. The use of the implicit backward Euler scheme for harmonic damping in Eq. (29) reduces the formal accuracy of the scheme to only first order. This reduction in accuracy can be justified for SPEEDY's default dynamical core since its formal accuracy is only first order due to the use of the RA filter. In our study, however, it is undesirable, because our goal is to achieve second-order accuracy by adopting our semi-implicit version of the Lorenz N-cycle. Also, as we have shown in Fig. 4, unlike the leapfrog scheme, the Lorenz N-cycle does not suffer from instability when applied to a dissipative system. For this reason, in implementing our schemes in SPEEDY's dynamical core, we modified it so that the harmonic dampings are included in the NL term $F^{NL}_{Dyn}(x)$ and thus are treated explicitly.

The SPEEDY model includes a simplified set of physical parameterizations whose descriptions can be found in the appendix of Molteni (2003). Simplified physical processes and coarse resolution enable the SPEEDY model to be integrated very fast. Despite such simplification, this model is able to produce realistic simulations of a wide range of atmospheric phenomena including precipitation, midlatitude synoptic features, and climatology. As we describe in the next section, however, we switch off all physical parameterizations in the SPEEDY model to evaluate its performance under the framework of a dry dynamical core test.

b. Jablonowski–Williamson dynamical core test

In our study, we are interested in assessing how our semi-implicit Lorenz N-cycle schemes behave for various values of the time step $\Delta t$, in particular, compared to the conventional RA-filtered leapfrog scheme. For this purpose, physical parameterizations are undesirable because some of them are designed to work well only for a specific time step. Adjustment processes such as large-scale condensation violate the assumption that the tendency term $F(x)$ is smooth (i.e., has continuous first derivatives), which also complicates the interpretation of the results. Thus, we test our schemes under the framework of a dry dynamical core test.

Several standardized benchmarks have been proposed for dynamical cores of AGCMs (e.g., Held and Suarez


FIG. 4. Stability regions of semi-implicit Lorenz (a) 3- and (b) 4-cycles applied to the split-frequency damped oscillation problem in Eq. (26) for different values of $\omega_H \Delta t$. Each curve represents the contour of $|A| = 1$ for $\omega_H \Delta t$ labeled in the legend. The scheme is stable in the regions enclosed by these curves. Note that we are interested in the area with small damping [$0 < \mathrm{Im}(\omega_L)\Delta t \lesssim 0.1$]. This area is highlighted with pink shading.

1994; Boer and Denis 1997). Among those standardized test cases we adopt the baroclinic wave test case proposed by Jablonowski and Williamson (2006). Unlike other previously proposed test cases that are primarily focused on quantifying long-term or climatological performance, this baroclinic wave test case allows us to evaluate the performance of a dynamical core in an initial-value problem.

In the baroclinic wave test case of Jablonowski and Williamson (2006), the initial and boundary conditions are designed so that a train of unstable baroclinic waves develops from a small disturbance superposed on a zonally uniform steady state. This specific configuration enables us to measure the performance of a model in terms of its ability to simulate baroclinic waves. Precise specification of the initial and boundary conditions for this test case is given in appendix A.

c. Verification method

No analytic solution is known for this problem. Jablonowski and Williamson (2006), thus, provide a set of reference solutions that can be used by the users of this test case. Their reference solutions are produced by integrating multiple different high-resolution models. The differences among these reference solutions can be used to quantify their uncertainties. We do not use their reference solutions, however, because the main focus of this study is on temporal integration schemes: comparison with solutions from a higher-resolution model would complicate the interpretation by introducing spatial

discretization as another factor. To facilitate a fair comparison among different temporal integration schemes, we produce a reference solution by implementing the traditional explicit fourth-order Runge–Kutta (RK4) scheme in the SPEEDY model and then integrating it with a very small time step $\Delta t = 10$ s. The reference solution thus obtained is regarded and used as "truth" in our verification.

Following Jablonowski and Williamson (2006), for quantitative comparison, we use the difference of surface pressure $p_s$ from the reference RK4 solution measured with the $l_2$ norm:

$$l_2[p_s(t)] := \left\{ \frac{1}{4\pi} \int_{0}^{2\pi}\!\int_{-\pi/2}^{\pi/2} \left[ p_s(\lambda,\theta,t) - p_s^{RK4}(\lambda,\theta,t) \right]^2 \cos\theta \, d\theta \, d\lambda \right\}^{1/2}, \tag{34}$$

where $\lambda$, $\theta$, and $t$ denote, respectively, the longitude, latitude, and forecast time. Here $p_s^{RK4}$ denotes the reference solution produced from the RK4 scheme described in the last paragraph. Jablonowski and Williamson (2006) report that the choice of verified variables (temperature or vorticity vertically interpolated to some specified pressure level, in addition to surface pressure) and error norms ($l_1$ and $l_\infty$, in addition to $l_2$) do not sensitively affect results of verification.

d. Temporal integration schemes implemented on the SPEEDY model

The temporal integration schemes implemented on the SPEEDY model in our study are summarized in Table 1. This table also defines the abbreviated names of each scheme that we use in the following sections. As we described in section 4a, the SPEEDY model, by default (ImLF + CN), adopts a somewhat complicated combination of different schemes. This is necessary because of the limited stability of the leapfrog scheme when it is applied to dissipative terms. In contrast, our Lorenz N-cycle schemes (ImL4 + CN, ImL4 + BE, and ExL4) treat all the terms consistently with a single scheme; this is possible because the Lorenz N-cycle, being a variant of the Runge–Kutta scheme, can better tolerate dissipation.

Since the Lorenz N-cycle scheme has two versions (which we refer to by A and B), we can construct several variants of our Lorenz N-cycle schemes by using one of the two versions or by using them in alternating sequences. For each of our Lorenz N-cycle schemes (ImL4 + CN, ImL4 + BE, and ExL4), we tried versions A, B, AB, and ABBA. For example, the ImL4 + CN scheme with Lorenz 4-cycle version ABBA is referred to, hereafter, as ImL4 + CN-ABBA.

5. Results

This section presents the results of the Jablonowski–Williamson baroclinic wave test case described in the previous section. Particular emphasis is placed on comparison between our semi-implicit Lorenz 4-cycle scheme and the traditional RA-filtered semi-implicit leapfrog scheme. To grasp the qualitative features of these schemes, we first show snapshots of the evolution of the baroclinic waves. We then show the results of the estimation of the orders of accuracy.

a. Snapshots

Figure 5 shows snapshots of the surface pressure field at the ninth day of integration for different integration schemes. In the reference solution produced from the RK4 scheme with a small time step $\Delta t = 10$ s (Fig. 5a), a deep low with a minimum of less than 960 hPa develops at the grid point (61.23°N, 146.25°W). SPEEDY's default scheme ImLF + CN with the time step of $\Delta t = 1200$ s (Fig. 5b) also produces a low with its minimum at the same grid point as the reference solution, but the intensity is weaker. With this traditional leapfrog scheme, successful simulation of the intensity of the low requires a much smaller time step, as shown in Fig. 5c (ImLF + CN with $\Delta t = 10$ s). On the other hand, our new ImL4 + CN-A scheme successfully produces the deep low even with the longer time step of $\Delta t = 1200$ s (Fig. 5d). Other versions of ImL4 + CN schemes (versions B, AB, and ABBA; not shown) also produced solutions that are visually indistinguishable from Fig. 5d. This qualitative result suggests that ImL4 + CN is the most advantageous scheme in that it alone successfully reproduces the deep low with the larger time step. Quantitative results shown in the following section strongly support this supposition.

b. Order estimation

As we discussed in section 3c, theoretically, our Lorenz 4-cycle schemes combined with the Crank–Nicolson scheme for the semi-implicit part (ImL4 + CN-A, ImL4 + CN-B, ImL4 + CN-AB, and ImL4 + CN-ABBA schemes) should have second-order accuracy. Figure 6 verifies this expectation. For 5-day forecasts (the left panel), if we focus on small values of time step ($\Delta t < 300$ s), all versions of ImL4 + CN schemes collapse on a single line that has a slope of 2.0 (meaning that they


TABLE 1. A list of temporal integration schemes implemented in the SPEEDY model in this study. Each scheme treats different terms of the right-hand side of the governing equation differently. In the leftmost column, "Ext.Grav.Wav." signifies the terms that are responsible for the external gravity waves; "Diffusion" signifies spectral biharmonic diffusion; "Other Dyn." signifies the rest of the terms that arise from dynamical processes; "Phys. Param." signifies tendencies from physical parameterizations. An em dash (—) in each column of the last row ("Filter") indicates that no filters are applied. (Im) or (Ex) in each item indicates that the scheme that follows it is implicit or explicit, respectively.

Name | ImL4 + CN | ImL4 + BE | ExL4 | ImLF + CN | ImLF + BE | RK4
Ext.Grav.Wav. | (Im) Crank–Nicolson | (Im) backward Euler | (Ex) Lorenz 4-cycle | (Im) Crank–Nicolson | (Im) backward Euler | (Ex) RK4
Diffusion | (Ex) Lorenz 4-cycle | (Ex) Lorenz 4-cycle | (Ex) Lorenz 4-cycle | (Im) backward Euler | (Im) backward Euler | (Ex) RK4
Other Dyn. | (Ex) Lorenz 4-cycle | (Ex) Lorenz 4-cycle | (Ex) Lorenz 4-cycle | (Ex) leapfrog | (Ex) leapfrog | (Ex) RK4
Phys. Param. | (Ex) Lorenz 4-cycle | (Ex) Lorenz 4-cycle | (Ex) Lorenz 4-cycle | (Ex) forward Euler | (Ex) forward Euler | (Ex) RK4
Filter | — | — | — | Robert–Asselin | Robert–Asselin | —

are of second-order accuracy). The errors for these schemes are clearly better than that of the ImLF + CN scheme whose slope is 1.0 (first-order accuracy). If we look at larger values of $\Delta t$, however, the curves begin to saturate as $\Delta t$ becomes larger. Curiously, all the schemes with the backward-Euler scheme for the implicit part (ImLF + BE, ImL4 + BE-A, ImL4 + BE-B, ImL4 + BE-AB, and ImL4 + BE-ABBA) exhibit slopes of 0.4 instead of the theoretical expectation 1. This indicates that the use of the backward-Euler scheme instead of the Crank–Nicolson scheme for the implicit terms introduces large errors even when using short time steps.

The result becomes somewhat different if we look at 25-day forecasts (Fig. 6, right panel). All the schemes with the backward-Euler scheme for the implicit part now exhibit orders of accuracy that are higher than 0.4 (from 0.6 to 0.7), which are more consistent with the theoretical expectation. The ImL4 + CN-A and ImL4 + CN-B schemes continue to be of second-order accuracy, with the latter being more accurate for $\Delta t > 300$ s. However, the alternating combinations of the two

FIG. 5. Snapshots of surface pressure (in hPa) at day 9 of the integration. (a) Reference solution produced from the RK4 scheme with time step $\Delta t = 10$ s. (b) SPEEDY's default ImLF + CN scheme with $\Delta t = 1200$ s. (c) As in (b), but with $\Delta t = 10$ s. (d) ImL4 + CN-A scheme with $\Delta t = 1200$ s. Contour intervals are 10 hPa.
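The error measure of Eq. (34) and the order-of-accuracy estimates quoted in this section can be illustrated with a short sketch. This is not the verification code used in the study. It assumes that surface pressure is stored as an array of shape (latitude, longitude) on a Gaussian grid, approximates the latitude integral with Gauss–Legendre weights, and estimates the order as the slope of a least-squares line on a log–log plane. The names `l2_error` and `estimate_order` are hypothetical.

```python
import numpy as np


def l2_error(ps, ps_ref):
    """Approximate the l2 norm of Eq. (34) for fields of shape (n_lat, n_lon).

    Assumption: rows lie on Gaussian latitudes, so the integral over
    cos(lat) d(lat) becomes a Gauss-Legendre sum in sin(lat).
    """
    n_lat, n_lon = ps.shape
    # Quadrature weights for integration over sin(lat) in [-1, 1]; they sum to 2.
    _, weights = np.polynomial.legendre.leggauss(n_lat)
    d_lon = 2.0 * np.pi / n_lon  # uniform spacing in longitude
    integral = np.sum(weights[:, None] * (ps - ps_ref) ** 2) * d_lon
    return np.sqrt(integral / (4.0 * np.pi))


def estimate_order(time_steps, errors):
    """Slope of the regression line of log(error) against log(time step)."""
    slope, _ = np.polyfit(np.log(time_steps), np.log(errors), 1)
    return slope


# Synthetic usage: errors proportional to dt**2 correspond to second order.
dts = np.array([30.0, 45.0, 60.0, 90.0, 120.0, 180.0])
order = estimate_order(dts, 1.0e-3 * dts ** 2)
```

With the normalization $1/4\pi$, a uniform pressure difference of 3 Pa over the whole sphere gives an $l_2$ error of 3 Pa, which is a convenient sanity check for the quadrature.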


FIG. 6. Errors of surface pressure measured with the $l_2$ norm for various semi-implicit temporal integration schemes plotted against time steps $\Delta t$ on a log–log plane. Shown are the errors for $\Delta t$ = 1200, 600, 300, 180, 120, 90, 60, 45, and 30 s. The errors are computed with respect to the reference solution produced by running the RK4 scheme with $\Delta t = 10$ s. The slopes of regression lines fitted using errors for $30 < \Delta t < 180$ s are shown for each scheme in the legend. The names of the schemes are defined in Table 1 and section 4d. The results are for (left) 5-day and (right) 25-day forecasts. Note that on the left plot, green, blue, and magenta lines, both solid and dashed, are hidden underneath the yellow ones.

versions (ImL4 + CN-AB and ImL4 + CN-ABBA) exhibit larger errors for $\Delta t > 300$ s. In fact, the ImL4 + CN-ABBA scheme with $\Delta t = 1200$ s even blows up at day 30 of the integration. It is not clear why the alternation of versions A and B does not improve the accuracy of the ImL4 + CN schemes. Nonetheless, the superiority of the uncombined versions ImL4 + CN-A or ImL4 + CN-B over the traditional ImLF + CN is clear for all time steps. Interestingly, ImL4 + CN-B is more accurate than ImL4 + CN-A, although, for a linear system, the two schemes give identical predictions.

c. Explicit Lorenz 4-cycles

While the main focus of this paper is on our new semi-implicit Lorenz N-cycle scheme, it is interesting to see how the original explicit Lorenz N-cycle schemes perform when applied to the AGCM. As we mentioned in section 2a, we show a result that supports the claim of Lorenz (1971) that the explicit Lorenz 4-cycle can attain fourth-order accuracy by running it in the alternating A–B–B–A sequence.

Figure 7 shows the equivalent of Fig. 6 for the explicit Lorenz 4-cycle schemes (ExL4-A, ExL4-B, ExL4-AB, and ExL4-ABBA). At the fifth day, all versions exhibit fourth-order accuracy, while, at day 25, the versions A and B only have second-order accuracy. However, their alternating combinations AB and ABBA both retain fourth-order accuracy. These results are consistent with theoretical expectations that both versions A and B of the Lorenz 4-cycle should be fourth-order accurate for a linear problem. In the Jablonowski–Williamson baroclinic wave dynamical core test, the integration starts from a zonally uniform steady-state solution, which is superposed with a weak and localized perturbation. Thus, during the initial period of the integration, the system is only weakly nonlinear, until the breaking of the baroclinic wave occurs at around day 10. By day 25, the system develops into a fully nonlinear regime, making versions A and B only second order. As claimed by Lorenz (1971), the combination ABBA actually regains fourth-order accuracy. What is surprising is that, unlike what was suggested in Lorenz (1971), the simpler combination AB also regains fourth-order accuracy; in fact, for $\Delta t > 150$ s, it yields more accurate predictions than ABBA.

6. Summary and discussion

The Lorenz N-cycle scheme for numerical integration of ODEs proposed by Lorenz (1971) has remained essentially unused in the atmospheric and oceanic sciences despite its major advantages, notably its economical use of memory, higher-order accuracy, and ease of implementation. This may be partly due to the lack of a semi-implicit formulation. Recently, Whitaker and Kar (2013) proposed a semi-implicit scheme based on the Lorenz 3-cycle and reported promising results with both an idealized shallow-water system and an operational numerical weather prediction (NWP) model. Their focus, however, was on improving stability of their previously proposed scheme and thus the economical memory use of the Lorenz N-cycle was lost in their formulation. In this study, we presented a


FIG. 7. As in Fig. 6, but for explicit Lorenz 4-cycle schemes. Shown are the errors for $\Delta t$ = 240, 180, 150, 90, 60, and 30 s. The slopes of regression lines fitted using errors for $30 < \Delta t < 120$ s are shown for each scheme in the legend. For ease of comparison, the errors for ImL4 + CN-B are shown here again.

new semi-implicit formulation of the Lorenz N-cycle that can preserve the memory efficiency.

The accuracy and stability analysis conducted for a linear, univariate split-frequency oscillation equation shows that all of our new schemes can be made second-order accurate if we adopt the Crank–Nicolson scheme for the implicit part. For a purely oscillatory equation with no damping, our semi-implicit Lorenz N-cycle combined with the Crank–Nicolson scheme for the implicit part is found to be unstable for N = 1 and N = 2. The stability is improved by increasing N to 3 and 4, with the latter having the larger stability region, but it becomes less stable again if N is further increased. A linear stability analysis that allows dissipation shows that the small instability regions found in the semi-implicit 4-cycle scheme disappear if a small damping is present in the system.

Numerical experiments performed using the dynamical core of the SPEEDY AGCM under the framework of the baroclinic wave test case of Jablonowski and Williamson (2006) confirmed that our semi-implicit Lorenz 4-cycle scheme combined with the Crank–Nicolson scheme for the implicit part, both version A (ImL4 + CN-A) and version B (ImL4 + CN-B), exhibits second-order accuracy and is more accurate than the traditional semi-implicit leapfrog scheme (ImLF + CN) for any time step $\Delta t$. Intriguingly, however, contrary to our expectation that running the two versions in an alternating sequence should improve the scheme because their truncation errors tend to cancel each other, the alternating combinations of the two (ImL4 + CN-AB and ImL4 + CN-ABBA) actually proved to be less accurate than the nonalternating versions. ImL4 + CN-ABBA even proved to be unstable for $\Delta t = 1200$ s although other versions were stable for this $\Delta t$.

We have also confirmed that, for the explicit Lorenz 4-cycle, running the two versions (ExL4-A and ExL4-B) in alternating sequences (ExL4-AB and ExL4-ABBA) in fact improves the accuracy to fourth order. In view of the critical comment posed by Purser (2001, p. 5) that "the efficacy of this strategy in a highly nonlinear numerical weather prediction model is extremely doubtful," the fact that the alternation strategy suggested by Lorenz (1971) in fact worked for the primitive equation system is itself a surprising result. Unlike what is claimed by Lorenz (1971), however, for this particular problem, a simpler combination ExL4-AB turned out to be more accurate than the suggested ExL4-ABBA.

It remains unclear why, for our semi-implicit scheme, alternation of the two versions did not lead to improved accuracy. The reason may be that, unlike in the original explicit scheme, truncation errors of the two versions that arise from nonlinearity of the explicit tendency [$F_E(x)$ of Eq. (12)] do not cancel each other.

In summary, the semi-implicit Lorenz 4-cycle schemes we propose in this paper are computationally as efficient as the traditional RA-filtered semi-implicit leapfrog scheme, in terms of both the amount of computation and memory usage. Moreover, our schemes have second-order accuracy, which is higher than that of the traditional scheme. Another advantage of our scheme over the traditional leapfrog is that it is a two-time level scheme, meaning that it requires only the model state at a single time to initialize the integration. Being a two-time level scheme also means that it is free from troublesome computational modes. Furthermore, unlike the leapfrog scheme, it is stable also for dissipative terms.

Given our success with the primitive equations model, it is tempting to hope that we might be able to improve

operational weather forecasts by adopting our scheme in operational NWP systems. We conclude this paper by suggesting some future work in this direction. Most advanced operational NWP centers, including the European Centre for Medium-Range Weather Forecasts (ECMWF), Environment and Climate Change Canada, the Japan Meteorological Agency (JMA), and the Met Office, have adopted semi-implicit semi-Lagrangian temporal integration schemes in their global models. The National Centers for Environmental Prediction (NCEP) also upgraded their global model in February 2015 to adopt the semi-Lagrangian scheme. Our next challenge is therefore to formulate a semi-implicit semi-Lagrangian version of the Lorenz N-cycle. For a semi-Lagrangian scheme, temporal and spatial discretizations cannot be clearly separated; in testing such a scheme, it would be advantageous to evaluate the accuracy of the schemes within the framework of the shallow-water model of Williamson et al. (1992) for which the analytical solutions are available, as has been adopted, for example, by Clancy and Pudykiewicz (2013a,b).

Another direction to be explored is the feasibility of using the explicit Lorenz N-cycle on global models: as we mentioned in the introduction, semi-implicit schemes are now facing difficult scalability issues because, on the one hand, semi-implicit schemes necessitate a global exchange of data to solve implicit equations of the forms in Eqs. (16) and (20) and, on the other hand, the current trend in high-performance computers to rely increasingly on massive parallelism requires that the algorithms that run on them minimize long-range internode communication. Thus, it is possible, depending on how high-performance computers evolve, that global NWP models in the foreseeable future would adopt fully explicit time integration schemes, in which case the original explicit Lorenz N-cycle becomes more appropriate than the semi-implicit version presented in this paper. In this respect, the promising results that we obtained for the explicit Lorenz 4-cycle are encouraging.

The trend in regional NWP is to use nonhydrostatic models that include acoustic waves in their solutions. In fact, for example, JMA, Météo-France, NCEP, Met Office, and the German weather service (DWD) are already using nonhydrostatic regional models in their operation. In such models, the very fast acoustic waves are typically accommodated by using split-explicit techniques. Implementing the Lorenz N-cycle within a split-explicit scheme should be straightforward. Thus, application of the Lorenz N-cycle in regional nonhydrostatic models would also be an attractive research topic.

Acknowledgments. The authors express their sincere gratitude to Dr. Sajal Kar for his helpful suggestions and discussions. Constructive comments from Drs. Jim Purser, Jeff Whitaker, and Takemasa Miyoshi are also gratefully acknowledged. The Python scripts for plotting Figs. 2 and 3 were kindly provided by Dr. Whitaker. The SPEEDY model was kindly provided by Drs. Franco Molteni and Fred Kucharski, extended by Dr. Miyoshi to enable 6-hourly output. Three anonymous reviewers provided numerous constructive comments and suggestions that improved the accuracy and readability of the manuscript, to which the authors are most grateful. The author D. Hotta also expresses appreciation to Profs. Kayo Ide and Radu Balan of the University of Maryland for their advising through coursework. D. Hotta's graduate study was generously supported by the Japan Meteorological Agency (JMA) through the Japanese Government Long-term Overseas Fellowship Program.

APPENDIX A

Initial and Boundary Conditions for the Baroclinic Wave Test Case

The initial condition for the baroclinic wave test case of Jablonowski and Williamson (2006) comprises two parts: first, a zonally symmetric state that is an analytic steady-state solution of the primitive equations, and second, a horizontally localized disturbance to the steady state that triggers development of a baroclinic wave train. We first describe the steady state and then describe the disturbance, followed by the description of the boundary condition.

First, we define an intermediate vertical coordinate $\sigma_v$ by

$$\sigma_v = (\sigma - \sigma_0)\,\frac{\pi}{2} \tag{A1}$$

with $\sigma_0 = 0.252$. Here $\sigma \in [0, 1]$ represents the $\sigma$-vertical coordinate. The zonal wind of the steady state is defined as

$$u(\theta, \sigma) = u_0 \cos^{3/2}\sigma_v \,\sin^{2}(2\theta) \tag{A2}$$

with $u_0 = 35$ m s$^{-1}$. Here $\theta$ represents the latitude in radians. The meridional wind $v$ is set to zero everywhere:

$$v(\theta, \sigma) = 0. \tag{A3}$$

The temperature is defined by

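$$T(\theta,\sigma) = \langle T(\sigma)\rangle + \frac{3}{4}\,\frac{\sigma\pi u_0}{R_d}\,\sin\sigma_v\,\cos^{1/2}\sigma_v \times \left\{ \left[ -2\sin^{6}\theta\left(\cos^{2}\theta + \frac{1}{3}\right) + \frac{10}{63} \right] 2u_0\cos^{3/2}\sigma_v + \left[ \frac{8}{5}\cos^{3}\theta\left(\sin^{2}\theta + \frac{2}{3}\right) - \frac{\pi}{4} \right] a\Omega \right\}, \tag{A4}$$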

where $R_d = 289.0$ J kg$^{-1}$ K$^{-1}$ is the ideal gas constant for dry air and $a = 6.371229 \times 10^{6}$ m is the mean radius of the earth, with the horizontal average temperature $\langle T(\sigma)\rangle$ given by

$$\langle T(\sigma)\rangle = \begin{cases} T_0\,\sigma^{R_d\Gamma/g} & \text{for } 1 \ge \sigma \ge \sigma_t, \\ T_0\,\sigma^{R_d\Gamma/g} + \Delta T\,(\sigma_t - \sigma)^{5} & \text{for } \sigma_t > \sigma, \end{cases} \tag{A5}$$

where $g = 9.80616$ m s$^{-2}$ is the gravitational acceleration, $\Gamma = 0.005$ K m$^{-1}$ is the temperature lapse rate, $\Delta T = 4.8 \times 10^{5}$ K, and $\sigma_t = 0.2$ is the tropopause level. The surface pressure $p_s$ is globally set to a constant value:

$$p_s = 10^{5}\ \text{Pa}, \tag{A6}$$

which completes the specification of the steady state.

On top of this steady state, a horizontally localized disturbance of the zonal wind $u'(\lambda, \theta, \sigma)$ centered at $(\lambda_c, \theta_c) = (\pi/9, 2\pi/9)$ is superposed to form the complete initial condition (other prognostic variables are not touched). The disturbance in the zonal wind $u'$ is specified by

$$u'(\lambda, \theta, \sigma) = u_p \exp\left[ -\left(\frac{r}{R}\right)^{2} \right], \tag{A7}$$

with radius $R = a/10$ and $u_p = 1$ m s$^{-1}$. The great circle distance $r$ from the center $(\lambda_c, \theta_c)$ is defined by

$$r = a \cos^{-1}\left[ \sin\theta_c \sin\theta + \cos\theta_c \cos\theta \cos(\lambda - \lambda_c) \right]. \tag{A8}$$

Finally, we describe the boundary condition. The orography (or surface height) $z_s$ is also zonally uniform and is specified by

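$$z_s(\lambda, \theta) = u_0 \cos^{3/2}(1 - \sigma_0) \times \left\{ \left[ -2\sin^{6}\theta\left(\cos^{2}\theta + \frac{1}{3}\right) + \frac{10}{63} \right] u_0\cos^{3/2}(1 - \sigma_0) + \left[ \frac{8}{5}\cos^{3}\theta\left(\sin^{2}\theta + \frac{2}{3}\right) - \frac{\pi}{4} \right] a\Omega \right\}. \tag{A9}$$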
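The steady state and the wind disturbance of Eqs. (A1)–(A8) can be written compactly as follows. This is a minimal sketch, not the initialization code of the SPEEDY model. The constants are the ones quoted in this appendix, except for the earth's rotation rate `OMEGA` and the reference temperature `T_0`, which are not stated in this section and are assumed here to take the values given by Jablonowski and Williamson (2006). The great-circle distance uses the standard spherical law of cosines, and all function names are hypothetical.

```python
import numpy as np

# Constants quoted in appendix A.
A_EARTH = 6.371229e6   # mean radius of the earth (m)
R_D = 289.0            # gas constant for dry air as quoted (J kg^-1 K^-1)
GRAV = 9.80616         # gravitational acceleration (m s^-2)
GAMMA = 0.005          # temperature lapse rate (K m^-1)
DELTA_T = 4.8e5        # stratospheric temperature parameter (K)
SIGMA_0 = 0.252
SIGMA_T = 0.2          # tropopause level
U_0 = 35.0             # maximum zonal wind of the steady state (m s^-1)
U_P = 1.0              # amplitude of the wind disturbance (m s^-1)
LON_C, LAT_C = np.pi / 9.0, 2.0 * np.pi / 9.0

# Assumptions: not stated in this section; values from JW (2006).
OMEGA = 7.29212e-5     # rotation rate of the earth (s^-1)
T_0 = 288.0            # reference surface temperature (K)


def sigma_v(sigma):
    """Intermediate vertical coordinate, Eq. (A1)."""
    return (sigma - SIGMA_0) * np.pi / 2.0


def zonal_wind(lat, sigma):
    """Steady-state zonal wind, Eq. (A2); lat in radians."""
    return U_0 * np.cos(sigma_v(sigma)) ** 1.5 * np.sin(2.0 * lat) ** 2


def mean_temperature(sigma):
    """Horizontal-mean temperature, Eq. (A5)."""
    sigma = np.asarray(sigma, dtype=float)
    t_mean = T_0 * sigma ** (R_D * GAMMA / GRAV)
    return np.where(sigma < SIGMA_T,
                    t_mean + DELTA_T * (SIGMA_T - sigma) ** 5, t_mean)


def temperature(lat, sigma):
    """Steady-state temperature, Eq. (A4)."""
    sv = sigma_v(sigma)
    wind_part = ((-2.0 * np.sin(lat) ** 6 * (np.cos(lat) ** 2 + 1.0 / 3.0)
                  + 10.0 / 63.0) * 2.0 * U_0 * np.cos(sv) ** 1.5)
    rotation_part = ((8.0 / 5.0 * np.cos(lat) ** 3
                      * (np.sin(lat) ** 2 + 2.0 / 3.0) - np.pi / 4.0)
                     * A_EARTH * OMEGA)
    factor = 0.75 * sigma * np.pi * U_0 / R_D * np.sin(sv) * np.sqrt(np.cos(sv))
    return mean_temperature(sigma) + factor * (wind_part + rotation_part)


def wind_perturbation(lon, lat):
    """Gaussian zonal-wind disturbance, Eqs. (A7) and (A8)."""
    cos_angle = (np.sin(LAT_C) * np.sin(lat)
                 + np.cos(LAT_C) * np.cos(lat) * np.cos(lon - LON_C))
    r = A_EARTH * np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return U_P * np.exp(-(r / (A_EARTH / 10.0)) ** 2)
```

Two quick consistency checks follow directly from the formulas: the zonal wind reaches $u_0$ at 45° latitude on the level $\sigma = \sigma_0$, and the disturbance equals $u_p$ at its center.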

For the upper boundary condition, Jablonowski and Williamson (2006) require that no Rayleigh friction near the model top be applied. Thus, in our experiment, we switched off the Rayleigh friction which, in the SPEEDY model, is applied to only the horizontal winds at the topmost layer.

APPENDIX B

Numerical Integration of the Korteweg–de Vries (KdV) Equation

The Korteweg–de Vries equation, commonly abbreviated as the KdV equation, is a one-dimensional nonlinear PDE that describes evolution of surface waves on shallow water. In its simplest form, the equation is given by

$$\frac{\partial u}{\partial t} = -u\,\frac{\partial u}{\partial x} - \frac{\partial^{3} u}{\partial x^{3}}, \tag{B1}$$

where $u$ denotes the height of the surface, $t$ the time, and $x$ the space. The KdV equation is particularly suitable for examination of numerical schemes because exact solutions in analytic form are known for several classes of special cases. A particular case that has an analytic exact solution is known as the "single soliton":

$$u(x, t) = c \times \mathrm{sech}^{2}\left\{ \sqrt{\frac{c}{12}}\,[\,x - x_0 - (c/3)\,t\,] \right\}, \tag{B2}$$

which means that the sech$^2$-shaped solitary wave with its peak height $c$ at $x = x_0$ at the initial time $t = 0$ moves to the right with the phase speed $c/3$ without changing its shape. The width of the wave is narrower and the phase speed is faster as the amplitude (height at the peak) of the wave is larger.

In our numerical experimentation, the KdV equation in Eq. (B1) is solved by the Galerkin spectral method for the periodic spatial domain $I = [0, 2\pi]$ with trigonometric functions $\{\exp(ikx)\}_{k=-N_e}^{N_e}$ as the horizontal basis functions. The spectral representation of the quadratic advection term [the first term on the right-hand side of Eq. (B1)] is computed by the transform method of Orszag (1970).
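The spatial discretization described above can be sketched with a real FFT. The following is an illustration under stated assumptions, not the FORTRAN code used for the experiments: it takes 256 grid points and a truncation wavenumber of 65, evaluates the advection term in grid space (the transform method), and compares the resulting tendency of the soliton in Eq. (B2) with the analytic tendency of a wave translating at speed $c/3$. The names `soliton` and `kdv_tendency` are hypothetical.

```python
import numpy as np

N_GRID = 256     # number of grid points (assumed)
N_TRUNC = 65     # largest retained wavenumber (assumed)
C = 720.0        # soliton peak height
X0 = np.pi       # initial position of the peak

K = np.arange(N_GRID // 2 + 1)   # wavenumbers of the real FFT
KEEP = K <= N_TRUNC              # spectral truncation mask


def soliton(x, t):
    """Single-soliton solution of Eq. (B2)."""
    kappa = np.sqrt(C / 12.0)
    return C / np.cosh(kappa * (x - X0 - (C / 3.0) * t)) ** 2


def kdv_tendency(u_hat):
    """Spectral coefficients of -u u_x - u_xxx for truncated input u_hat."""
    u_hat = np.where(KEEP, u_hat, 0.0)
    u = np.fft.irfft(u_hat, n=N_GRID)                # back to grid space
    u_x = np.fft.irfft(1j * K * u_hat, n=N_GRID)     # derivative on the grid
    adv_hat = np.fft.rfft(u * u_x)                   # product formed in grid space
    return np.where(KEEP, -adv_hat - (1j * K) ** 3 * u_hat, 0.0)


x = 2.0 * np.pi * np.arange(N_GRID) / N_GRID
u0_hat = np.fft.rfft(soliton(x, 0.0))
tendency = np.fft.irfft(kdv_tendency(u0_hat), n=N_GRID)

# Analytic tendency of a wave translating at speed c/3: du/dt = -(c/3) du/dx.
theta = np.sqrt(C / 12.0) * (x - X0)
exact = (2.0 * C * (C / 3.0) * np.sqrt(C / 12.0)
         * np.tanh(theta) / np.cosh(theta) ** 2)
rel_err = np.max(np.abs(tendency - exact)) / np.max(np.abs(exact))
```

Because 256 grid points exceed three times the truncation wavenumber, the quadratic product is free of aliasing in the retained modes, so the remaining discrepancy `rel_err` comes mainly from the spectral truncation itself.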


The spatial domain $I$ is discretized by $N_g = 256$ equally spaced grid points $\{x_j = 2\pi \times j/N_g\}_{j=0}^{N_g-1}$. Discretized values of $u$ on these grid points are represented in spectral space by the Fourier coefficients $\{\hat{u}_k\}_{k=0}^{N_e}$ truncated at the largest wavenumber $N_e = 65$. The initial condition is given by Eq. (B2) (for $t = 0$) with the parameters $c = 720$ and $x_0 = \pi$. The time integration schemes examined are as follows: the leapfrog scheme with the time step $\Delta t = 1 \times 10^{-6}$, the Lorenz 3-cycle scheme with the alternating AB sequence with the same time step $\Delta t = 1 \times 10^{-6}$, the Lorenz 4-cycle scheme with the alternating ABBA sequence with the same time step $\Delta t = 1 \times 10^{-6}$, and the classical Runge–Kutta fourth-order scheme with the time step $4\Delta t = 4 \times 10^{-6}$, which is 4 times larger than for the other schemes. This choice of time step ensures that each scheme has about equal computational cost.

The Robert–Asselin (RA) filter with the smoothing parameter $\alpha = 0.05$ is applied to the leapfrog scheme to avoid numerical instability due to nonlinearity and diffusion [the second term on the right-hand side of Eq. (B1)]. Also, a special treatment is performed, as illustrated in Fig. 3.2.4 of Kalnay (2003), for the first steps of the leapfrog scheme: the forward Euler scheme with $\Delta t/2$ is used to generate the guess of $u$ at $t = \Delta t/2$; then, using this guess at $t = \Delta t/2$ and the initial condition, the leapfrog scheme with half the regular time step ($\Delta t$) computes the guess of $u$ at $t = \Delta t$, followed by the regular leapfrog scheme.

The root-mean-square errors (RMSEs) of each scheme at the $n$th time step are evaluated in the grid space by

$$\mathrm{RMSE}^2 = \frac{1}{N_g} \sum_{j=0}^{N_g-1} \left[ u^{\mathrm{Numerical}}(x_j, n\Delta t) - u^{\mathrm{Exact}}(x_j, n\Delta t) \right]^2, \tag{B3}$$

where $u^{\mathrm{Numerical}}$ denotes the numerical solution obtained by the schemes described above and $u^{\mathrm{Exact}}$ denotes the exact solution given in Eq. (B2).

All computations are performed with FORTRAN double precision (IEEE 64-bit floating-point representation).

It should be noted that, strictly speaking, the exact solution given in Eq. (B2) is only valid for the infinite domain $(-\infty, +\infty)$ and is not the exact solution for our problem with the periodic boundary condition. To avoid complication arising from this, we have chosen a steep soliton ($c = 720$) centered away from the boundaries ($x_0 = \pi$). The solution in Eq. (B2) remains fairly accurate for small $t$ as long as the peak of the soliton $x_0 + ct$ remains away from the boundaries $x = 0$ and $x = 2\pi$.

APPENDIX C

Improvement of Intermediate Steps by a Postprocessing

As shown in section 2a, the intermediate steps of the Lorenz N-cycle schemes, which are only first-order accurate, can be made significantly more accurate by forming weighted averages of all the $N + 1$ steps within a single N-cycle. This appendix shows how to derive the weights for general $N$ and gives the explicit forms of the linear transform for versions A and B of 3- and 4-cycles.

As a special case of Eq. (1), consider the following system of linear ODEs:

$$\frac{d\mathbf{x}}{dt} = \mathbf{L}\mathbf{x}, \tag{C1}$$

where $\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_M(t)]^{\mathrm{T}}$ is an $M$-dimensional vector function of time $t$ and $\mathbf{L}$ is an $M \times M$ matrix, and we numerically integrate this system from time $t$ to $t + N\Delta t$ with the initial condition $\mathbf{x}(t) = \mathbf{x}^0$ by one of the Lorenz N-cycle schemes. The numerical solution at the $k$th step ($k = 0, 1, \ldots, N$) can be written in the following form:

$$\mathbf{x}(t + k\Delta t) = \left[ \sum_{j=0}^{k} a_{jk}\, \mathbf{L}^j (\Delta t)^j \right] \mathbf{x}^0, \tag{C2}$$

with $a_{jk} \in \mathbb{Q}$. Lorenz (1971) showed that $a_{jk}$ for version A can be written explicitly as

$$a_{jk} = \frac{k!\,(N - j)!}{j!\,(k - j)!\,N!}\, N^j.$$

For version B, no explicit form of $a_{jk}$ seems to be known.

Now, observing that $\{\mathbf{L}^j (\Delta t)^j\}_{j=0}^{N}$ formally spans an $(N + 1)$-dimensional vector space $V^{N+1}$, we can represent, with this basis set, the model state at the $k$th step $\mathbf{x}(t + k\Delta t)$ by a vector $\mathbf{a}_k = (a_{0k}, a_{1k}, \ldots, a_{kk}, 0, \ldots, 0)^{\mathrm{T}}$. Similarly, the exact solution to this problem for time $t + k\Delta t$ is given by

$$\mathbf{x}^{\mathrm{Exact}}(t + k\Delta t) = e^{\mathbf{L} k \Delta t}\, \mathbf{x}^0 = \left[ \sum_{l=0}^{N} b_{lk}\, \mathbf{L}^l (\Delta t)^l + O(\Delta t^{N+1}) \right] \mathbf{x}^0, \tag{C3}$$

with $b_{lk} = k^l/l!$. The projection of this exact solution $\mathbf{x}^{\mathrm{Exact}}(t + k\Delta t)$ onto $V^{N+1}$ can be represented by a vector $\mathbf{b}_k = (b_{0k}, b_{1k}, \ldots, b_{Nk})^{\mathrm{T}}$.

We wish to form, for each $l \in \{0, 1, \ldots, N\}$, a linear combination of $\{\mathbf{a}_k\}_{k=0}^{N}$ so that it agrees with the truncated exact solution $\mathbf{b}_l$:

$$\sum_{k=0}^{N} c_{kl}\, \mathbf{a}_k = \mathbf{b}_l, \quad l = 0, 1, \ldots, N, \tag{C4}$$

or, in matrix form,

$$\mathbf{A}\mathbf{C} = \mathbf{B}, \tag{C5}$$

where $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are $(N + 1) \times (N + 1)$ matrices defined by $\mathbf{A} = (\mathbf{a}_0, \mathbf{a}_1, \ldots, \mathbf{a}_N)$, $\mathbf{B} = (\mathbf{b}_0, \mathbf{b}_1, \ldots, \mathbf{b}_N)$, and


$(\mathbf{C})_{kl} = c_{kl}$. The matrix $\mathbf{A}$ is an upper triangular matrix with nonzero diagonal elements and, hence, is invertible; we can thus solve this equation for the weighting coefficients $c_{kl}$ by inverting $\mathbf{A}$ to give

$$c_{kl} = (\mathbf{A}^{-1}\mathbf{B})_{kl}. \tag{C6}$$

By carrying out the computation, we find, for Lorenz 3-cycle A, that the following postprocessing (or "filtering") gives an improved numerical solution:

$$\begin{aligned}
\mathbf{x}(t) &= \mathbf{x}^0, \\
\mathbf{x}(t + \Delta t) &= \tfrac{8}{27}\mathbf{x}^0 + \tfrac{4}{9}\mathbf{x}^1 + \tfrac{2}{9}\mathbf{x}^2 + \tfrac{1}{27}\mathbf{x}^3, \\
\mathbf{x}(t + 2\Delta t) &= \tfrac{1}{27}\mathbf{x}^0 + \tfrac{2}{9}\mathbf{x}^1 + \tfrac{4}{9}\mathbf{x}^2 + \tfrac{8}{27}\mathbf{x}^3, \\
\mathbf{x}(t + 3\Delta t) &= \mathbf{x}^3,
\end{aligned} \tag{C7}$$

where, for convenience, we have used an abbreviated notation $\mathbf{x}^k = \mathbf{x}(t + k\Delta t)$ for the raw (unfiltered) N-cycle states on the right-hand side.

Similarly, for Lorenz 3-cycle B:

$$\begin{aligned}
\mathbf{x}(t) &= \mathbf{x}^0, \\
\mathbf{x}(t + \Delta t) &= \tfrac{5}{27}\mathbf{x}^0 + \tfrac{2}{3}\mathbf{x}^1 + \tfrac{1}{9}\mathbf{x}^2 + \tfrac{1}{27}\mathbf{x}^3, \\
\mathbf{x}(t + 2\Delta t) &= -\tfrac{5}{27}\mathbf{x}^0 + \tfrac{2}{3}\mathbf{x}^1 + \tfrac{2}{9}\mathbf{x}^2 + \tfrac{8}{27}\mathbf{x}^3, \\
\mathbf{x}(t + 3\Delta t) &= \mathbf{x}^3,
\end{aligned} \tag{C8}$$

for Lorenz 4-cycle A:

$$\begin{aligned}
\mathbf{x}(t) &= \mathbf{x}^0, \\
\mathbf{x}(t + \Delta t) &= \tfrac{81}{256}\mathbf{x}^0 + \tfrac{27}{64}\mathbf{x}^1 + \tfrac{27}{128}\mathbf{x}^2 + \tfrac{3}{64}\mathbf{x}^3 + \tfrac{1}{256}\mathbf{x}^4, \\
\mathbf{x}(t + 2\Delta t) &= \tfrac{1}{16}\mathbf{x}^0 + \tfrac{1}{4}\mathbf{x}^1 + \tfrac{3}{8}\mathbf{x}^2 + \tfrac{1}{4}\mathbf{x}^3 + \tfrac{1}{16}\mathbf{x}^4, \\
\mathbf{x}(t + 3\Delta t) &= \tfrac{1}{256}\mathbf{x}^0 + \tfrac{3}{64}\mathbf{x}^1 + \tfrac{27}{128}\mathbf{x}^2 + \tfrac{27}{64}\mathbf{x}^3 + \tfrac{81}{256}\mathbf{x}^4, \\
\mathbf{x}(t + 4\Delta t) &= \mathbf{x}^4,
\end{aligned} \tag{C9}$$

and finally, for Lorenz 4-cycle B:

$$\begin{aligned}
\mathbf{x}(t) &= \mathbf{x}^0, \\
\mathbf{x}(t + \Delta t) &= \tfrac{37}{256}\mathbf{x}^0 + \tfrac{47}{64}\mathbf{x}^1 + \tfrac{13}{128}\mathbf{x}^2 + \tfrac{1}{64}\mathbf{x}^3 + \tfrac{1}{256}\mathbf{x}^4, \\
\mathbf{x}(t + 2\Delta t) &= -\tfrac{17}{48}\mathbf{x}^0 + \tfrac{11}{12}\mathbf{x}^1 + \tfrac{7}{24}\mathbf{x}^2 + \tfrac{1}{12}\mathbf{x}^3 + \tfrac{1}{16}\mathbf{x}^4, \\
\mathbf{x}(t + 3\Delta t) &= -\tfrac{107}{256}\mathbf{x}^0 + \tfrac{39}{64}\mathbf{x}^1 + \tfrac{45}{128}\mathbf{x}^2 + \tfrac{9}{64}\mathbf{x}^3 + \tfrac{81}{256}\mathbf{x}^4, \\
\mathbf{x}(t + 4\Delta t) &= \mathbf{x}^4.
\end{aligned} \tag{C10}$$

To derive these coefficients as rational numbers rather than as floating-point numbers, we used a computer algebra system, Maxima.

The postprocessed predictions for the intermediate steps $\mathbf{x}(t + k\Delta t)$ are, by construction, $N$th-order accurate for linear problems; for nonlinear problems, however, they are formally only first-order accurate but are substantially more accurate than the raw predictions. In fact, as shown in Fig. 1, for the weakly nonlinear KdV single soliton problem, the postprocessed solutions exhibit significant improvement over the raw N-cycles on the intermediate steps.

REFERENCES

Amezcua, J., E. Kalnay, and P. D. Williams, 2011: The effects of the RAW filter on the climatology and forecast skill of the SPEEDY model. Mon. Wea. Rev., 139, 608–619, doi:10.1175/2010MWR3530.1.
Asselin, R., 1972: Frequency filter for time integrations. Mon. Wea. Rev., 100, 487–490, doi:10.1175/1520-0493(1972)100<0487:FFFTI>2.3.CO;2.
Boer, G., and B. Denis, 1997: Numerical convergence of the dynamics of a GCM. Climate Dyn., 13, 359–374, doi:10.1007/s003820050171.
Butcher, J., 2007: Runge–Kutta methods. Scholarpedia, 2, 3147, doi:10.4249/scholarpedia.3147.
Clancy, C., and P. Lynch, 2011: Laplace transform integration of the shallow-water equations. Part I: Eulerian formulation and Kelvin waves. Quart. J. Roy. Meteor. Soc., 137, 792–799, doi:10.1002/qj.793.
——, and J. Pudykiewicz, 2013a: A class of semi-implicit predictor–corrector schemes for the time integration of atmospheric models. J. Comput. Phys., 250, 665–684, doi:10.1016/j.jcp.2012.08.032.
——, and ——, 2013b: On the use of exponential time integration methods in atmospheric models. Tellus, 65A, 20898, doi:10.3402/tellusa.v65i0.20898.
Durran, D. R., 1991: The third-order Adams–Bashforth method: An attractive alternative to leapfrog time differencing. Mon. Wea. Rev., 119, 702–720, doi:10.1175/1520-0493(1991)119<0702:TTOABM>2.0.CO;2.
Gent, P., and M. Cane, 1989: A reduced gravity, primitive equation model of the upper equatorial ocean. J. Comput. Phys., 81, 444–480, doi:10.1016/0021-9991(89)90216-7.
Held, I. M., and M. J. Suarez, 1994: A proposal for the intercomparison of the dynamical cores of atmospheric general circulation models. Bull. Amer. Meteor. Soc., 75, 1825–1830, doi:10.1175/1520-0477(1994)075<1825:APFTIO>2.0.CO;2.
Israeli, M., and D. Gottlieb, 1974: On the stability of the N cycle scheme of Lorenz. Mon. Wea. Rev., 102, 254–265, doi:10.1175/1520-0493(1974)102<0254:OTSOTC>2.0.CO;2.
Jablonowski, C., and D. L. Williamson, 2006: A baroclinic instability test case for atmospheric model dynamical cores. Quart. J. Roy. Meteor. Soc., 132, 2943–2975, doi:10.1256/qj.06.12.
Kalnay, E., 2003: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, 368 pp.


Kar, S. K., 2006: A semi-implicit Runge–Kutta time-difference scheme for the two-dimensional shallow-water equations. Mon. Wea. Rev., 134, 2916–2926, doi:10.1175/MWR3214.1.
Lorenz, E. N., 1971: An N-cycle time-differencing scheme for stepwise numerical integration. Mon. Wea. Rev., 99, 644–648, doi:10.1175/1520-0493(1971)099<0644:ATSFSN>2.3.CO;2.
Molteni, F., 2003: Atmospheric simulations using a GCM with simplified physical parametrizations. I: Model climatology and variability in multi-decadal experiment. Climate Dyn., 20, 175–191, doi:10.1007/s00382-002-0268-2.
Orszag, S. A., 1970: Transform method for the calculation of vector-coupled sums: Application to the spectral form of the vorticity equation. J. Atmos. Sci., 27, 890–895, doi:10.1175/1520-0469(1970)027<0890:TMFTCO>2.0.CO;2.
Purser, R. J., 2001: Proposed semi-implicit adaptations of two low-storage Runge–Kutta schemes. Part I: Theoretical formulation and stability analysis. NCEP Office Note 435, 40 pp.
——, 2007: Accuracy considerations of time-splitting methods for models using two-time-level schemes. Mon. Wea. Rev., 135, 1158–1164, doi:10.1175/MWR3339.1.
——, and L. M. Leslie, 1997: High-order generalized Lorenz N-cycle schemes for semi-Lagrangian models employing second derivatives in time. Mon. Wea. Rev., 125, 1261–1276, doi:10.1175/1520-0493(1997)125<1261:HOGLNC>2.0.CO;2.
Robert, A. J., 1966: The integration of a low order spectral form of the primitive meteorological equations. J. Meteor. Soc. Japan, 44, 237–245.
——, 1969: The integration of a spectral model of the atmosphere by the implicit method. Proc. WMO-IUGG Symp. on Numerical Weather Prediction, Vol. VII, Tokyo, Japan, Japan Meteorological Agency, 19–24.
Whitaker, J. S., and S. K. Kar, 2013: Implicit–explicit Runge–Kutta methods for fast–slow wave problems. Mon. Wea. Rev., 141, 3426–3434, doi:10.1175/MWR-D-13-00132.1.
Williams, P. D., 2009: A proposed modification to the Robert–Asselin time filter. Mon. Wea. Rev., 137, 2538–2546, doi:10.1175/2009MWR2724.1.
——, 2011: The RAW filter: An improvement to the Robert–Asselin filter in semi-implicit integrations. Mon. Wea. Rev., 139, 1996–2007, doi:10.1175/2010MWR3601.1.
——, 2013: Achieving seventh-order amplitude accuracy in leapfrog integrations. Mon. Wea. Rev., 141, 3037–3051, doi:10.1175/MWR-D-12-00303.1.
Williamson, D. L., J. B. Drake, J. J. Hack, R. Jakob, and P. N. Swarztrauber, 1992: A standard test set for numerical approximations to the shallow water equations in spherical geometry. J. Comput. Phys., 102, 211–224, doi:10.1016/S0021-9991(05)80016-6.
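
Supplementary sketch for appendix C. The postprocessing weights of appendix C can be cross-checked with exact rational arithmetic. The following is a minimal Python sketch, not the authors' Maxima computation: it assumes the closed form of $a_{jk}$ that appendix C quotes from Lorenz (1971) for version A, takes $b_{lk} = k^l/l!$ from Eq. (C3), and solves the triangular system of Eq. (C5) by back substitution instead of forming $\mathbf{A}^{-1}$ explicitly. All function names are hypothetical. Version B is not covered, because appendix C gives no closed form for its $a_{jk}$.

```python
from fractions import Fraction
from math import factorial


def version_a_matrix(N):
    """A[j][k] = a_jk for version A, from the closed form quoted in appendix C."""
    A = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for k in range(N + 1):
        for j in range(k + 1):  # a_jk = 0 for j > k, so A is upper triangular
            A[j][k] = Fraction(
                factorial(k) * factorial(N - j) * N**j,
                factorial(j) * factorial(k - j) * factorial(N),
            )
    return A


def exact_matrix(N):
    """B[l][k] = b_lk = k**l / l!, the Taylor coefficients of exp(k L dt)."""
    return [[Fraction(k**l, factorial(l)) for k in range(N + 1)]
            for l in range(N + 1)]


def postprocessing_weights(N):
    """Solve A C = B exactly by back substitution; returns C[k][l] = c_kl."""
    A, B = version_a_matrix(N), exact_matrix(N)
    C = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for l in range(N + 1):          # one right-hand side b_l per target step
        for j in range(N, -1, -1):  # last row first, since A is upper triangular
            known = sum(A[j][m] * C[m][l] for m in range(j + 1, N + 1))
            C[j][l] = (B[j][l] - known) / A[j][j]
    return C


def weights_for_step(N, l):
    """Weights applied to the raw states x^0, ..., x^N to filter step l."""
    C = postprocessing_weights(N)
    return [C[k][l] for k in range(N + 1)]


if __name__ == "__main__":
    for N in (3, 4):
        for l in range(N + 1):
            print(N, l, [str(w) for w in weights_for_step(N, l)])
```

Under these assumptions, `weights_for_step(3, 1)` returns the coefficients 8/27, 4/9, 2/9, 1/27 of Eq. (C7), and `weights_for_step(4, 2)` returns 1/16, 1/4, 3/8, 1/4, 1/16 as in Eq. (C9). Using `Fraction` keeps the result rational throughout, in the same spirit as the computer algebra derivation described in appendix C.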
