FEBRUARY 2017 B U E H N E R E T A L . 617

An Ensemble Kalman Filter for Numerical Weather Prediction Based on Variational Data Assimilation: VarEnKF

MARK BUEHNER Data Assimilation and Satellite Meteorology Research Section, Environment and Climate Change Canada, Dorval, Quebec, Canada

RON MCTAGGART-COWAN Numerical Weather Prediction Research Section, Environment and Climate Change Canada, Dorval, Quebec, Canada

SYLVAIN HEILLIETTE Data Assimilation and Satellite Meteorology Research Section, Environment and Climate Change Canada, Dorval, Quebec, Canada

(Manuscript received 15 March 2016, in final form 27 September 2016)

ABSTRACT

Several NWP centers currently employ a variational data assimilation approach for initializing deterministic forecasts and a separate ensemble Kalman filter (EnKF) system both for initializing ensemble forecasts and for providing ensemble background error covariances for the deterministic system. This study describes a new approach for performing the data assimilation step within a perturbed-observation EnKF. In this approach, called VarEnKF, the analysis increment is computed with a variational data assimilation approach both for the ensemble mean and for all of the ensemble perturbations. To obtain a computationally efficient algorithm, a much simpler configuration is used for the ensemble perturbations, whereas the configuration used for the ensemble mean is similar to that used for the deterministic system. Numerous practical benefits may be realized by using a variational approach for both deterministic and ensemble prediction, including improved efficiency for the development and maintenance of the computer code. Also, the use of essentially the same data assimilation algorithm would likely reduce the amount of numerical experimentation required when making system changes, since their impacts in the two systems would be very similar. The variational approach enables the use of hybrid background error covariances and may also allow the assimilation of a larger volume of observations. Preliminary tests with the Canadian global 256-member system produced significantly improved ensemble forecasts with VarEnKF as compared with the current EnKF and at a comparable computational cost. These improvements resulted entirely from changes to the ensemble mean analysis increment calculation. Moreover, because each ensemble perturbation is updated independently, VarEnKF scales perfectly up to a very large number of processors.

Corresponding author e-mail: Mark Buehner, mark.buehner@canada.ca

1. Introduction

Most numerical weather prediction (NWP) centers operationally produce both deterministic and ensemble forecasts. The deterministic forecast represents the best single estimate of the atmospheric conditions in the future, whereas the ensemble forecast provides information on the range of possible conditions that could occur given the uncertainties inherent in all aspects of the prediction system.

Data assimilation is used to provide the analyses (i.e., initial conditions) for both deterministic and ensemble forecasts. For deterministic forecasts, variational data assimilation approaches are most often used. These include three-dimensional variational data assimilation (3DVar), four-dimensional variational data assimilation (4DVar), and, more recently, ensemble–variational assimilation (EnVar; Buehner et al. 2013; Kleist and Ide 2015; Wang and Lei 2014). For ensemble data assimilation approaches, the goal is to produce an ensemble of model states consistent with the probability density of the initial condition uncertainty. Several of these ensemble techniques are based on Monte Carlo simulation

DOI: 10.1175/MWR-D-16-0106.1

For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

618 MONTHLY WEATHER REVIEW VOLUME 145

in which all uncertain components of the prediction system are randomly perturbed in a way that is consistent with the presumed uncertainty of each component. This includes the observations and forecast model parameters. It is also common to directly modify the ensemble spread of the complete model state to account for multiple sources of uncertainties (e.g., Houtekamer et al. 2009; Whitaker and Hamill 2012). Such data assimilation approaches include the perturbed-observation ensemble Kalman filter (EnKF; Houtekamer et al. 2005) and the ensemble of data assimilations (EDA; Isaksen et al. 2010). Other EnKF algorithms, referred to as ensemble square root filters, rely on a modification to the original algorithm to avoid the need to perturb the observations (e.g., Whitaker and Hamill 2002; Tippett et al. 2003; Hunt et al. 2007). Alternatively, some centers compute perturbations with an approach not directly related to data assimilation (e.g., singular vectors, bred vectors; Buizza et al. 2005) and these are then added to the deterministic analysis for initializing the ensemble forecast.

Ensembles of short-term forecasts are also used within several types of data assimilation algorithms to either partially or fully specify the background error covariances. This includes the EnKF (both the perturbed-observation version and all of the ensemble square root filter variants), EnVar, and some implementations of 4DVar that use ensemble covariances to specify the background error covariances at the beginning of the data assimilation time window (Buehner et al. 2010; Clayton et al. 2013).

Several NWP centers currently employ a variational data assimilation approach for their deterministic forecasts and a separate EnKF system that is used both for initializing the ensemble forecasts and for providing ensemble covariances for the deterministic system [e.g., Environment and Climate Change Canada (ECCC), the National Centers for Environmental Prediction (NCEP), and the Met Office]. The data assimilation procedures used within current EnKF systems differ fundamentally from the variational approach by relying on either the serial assimilation of individual observations or small batches of observations (e.g., Whitaker and Hamill 2002; Tippett et al. 2003; Houtekamer et al. 2005) or an algorithm that independently updates spatial subdomains by simultaneously assimilating all surrounding observations (e.g., Hunt et al. 2007). Several practical benefits may be realized by using the same data assimilation approach for both deterministic and ensemble prediction, including a reduction in the effort required to develop and maintain the computer code and an improved consistency of the impacts from major changes made to the two systems. To that end, the goal of the present study is to evaluate a new approach for performing the data assimilation step within a perturbed-observation EnKF by adapting the variational algorithm used for deterministic data assimilation. The analysis increment is computed with a variational assimilation approach separately for the ensemble mean and for all of the ensemble perturbations (i.e., the deviations of each member from the ensemble mean). To obtain a computationally efficient algorithm, a much simpler configuration is used for the ensemble perturbations, whereas the configuration used for the ensemble mean is similar to that used for the deterministic system. Since the new approach is essentially an EnKF implemented with a variational approach, it is called VarEnKF.

The next section includes a description of the VarEnKF approach together with its expected benefits. Section 3 provides a description of the numerical data assimilation experiments performed using either a standard EnKF approach, VarEnKF, or a combination of the two. The results from these experiments are presented in section 4. The final section provides the conclusions.

2. VarEnKF approach

a. General approach

The most straightforward approach for using variational data assimilation within an EnKF is to simply run independent data assimilation cycles for each ensemble member. Each assimilates independently perturbed observations, while the other sources of uncertainty are simulated with appropriate random perturbations. This is similar to the EDA approach (Isaksen et al. 2010) and the so-called system simulation approach (Houtekamer et al. 1996), except in those approaches, unlike with EnVar and the EnKF, the assimilation does not fully use the ensemble covariances. However, if the analysis step for each member were performed with a variational approach that uses the ensemble of background states to define the background error covariances, such as EnVar, this would be theoretically equivalent (other than the unavoidable differences related to the use of a different solution technique for obtaining each member's analysis increment) to the perturbed-observation EnKF (e.g., Fairbairn et al. 2014). Because a full data assimilation system is used for each member with the same complexity as a typical deterministic system, the computational cost is comparable to that of the deterministic system times the number of members, which is much higher than the cost of current EnKF approaches. Consequently, this would typically limit the number of ensemble members to O(10), too few to be used to fully specify the background error covariances. For example, the Canadian EnKF currently uses 256 members and, in contrast, Météo-France and ECMWF both use an EDA consisting of only 25 members of 4DVar. Because of this small ensemble size, a large

amount of filtering is applied to the EDA ensemble covariances by combining members over several analysis times, using a wavelet-diagonal approach for the correlations, and using spatially smoothed ensemble variance estimates (Berre et al. 2015). Simplifying the data assimilation procedure used for all the members, such as by using 3DVar instead of 4DVar to allow for a larger ensemble size, would result in a degradation in the quality of the resulting ensemble forecasts.

An approach with the potential to be competitive with current EnKF algorithms in terms of computational cost is based on a variational "mean-pert" approach, first suggested by Lorenc et al. (2014). Similar to many formulations of the ensemble square root filter, the idea of this approach is to compute the analysis increment for the ensemble mean separately from the increment for the deviation of each member from the ensemble mean (hereafter referred to as the ensemble perturbations). Then, both are computed with a variational technique, such as the four-dimensional version of EnVar (4DEnVar). The increment for the ensemble mean shifts all background members by the same amount in phase space and therefore has no effect on the ensemble spread, whereas the increments for the ensemble perturbations modify the ensemble spread with no effect on the ensemble mean. We hypothesize that the overall quality of the ensemble forecasts depends to a much greater extent on the analysis increment for the ensemble mean as compared with those for the ensemble perturbations. This is supported by a comparison of various ensemble prediction systems that found that the ensemble mean has a large impact on the quality of ensemble forecasts (e.g., Buizza et al. 2005) and that simple approaches for computing the initial ensemble perturbations may be as effective as more theoretically and computationally complex approaches (e.g., Magnusson et al. 2008; Raynaud and Bouttier 2016). If this hypothesis is valid, then the overall computational cost of obtaining the analysis ensemble could be significantly reduced by simplifying only the component of the assimilation procedure that is used for the ensemble perturbations. This is critical, since an independent calculation of the analysis increment for the ensemble perturbation is required for each member. This computation must be highly efficient and parallelized to be feasible for a large ensemble of O(100) members. It must be emphasized, however, that no simplifications need be made to the assimilation procedure, such as EnVar, used to compute the analysis increment for the ensemble mean.

The only simplification proposed by Lorenc et al. (2014) is a significant reduction in the number of iterations used in the minimization of the cost function for computing the increments for the ensemble perturbations. As will be shown later, these increments generally act to reduce the ensemble spread. Consequently, a reduction in the number of iterations would lead to less reduction in the ensemble spread during the analysis step, which could potentially be compensated for by a reduction to the multiplicative or additive ensemble inflation used to maintain a realistic spread (e.g., Houtekamer et al. 2009; Whitaker and Hamill 2012). However, the use of EnVar with this simplification alone would likely not reduce the cost to a point that is comparable with many EnKF algorithms. Additional simplifications should therefore be considered for computing the perturbation increment, including the following:

• a reduction in ensemble size used to compute the ensemble background error covariances;
• the use of 3D instead of 4D ensemble background error covariances;
• the use of only the static climatological background error covariance matrix that is typically based on homogeneous and isotropic correlations (i.e., 3DVar);
• a decrease in spatial resolution or a more severe spectral truncation for the static climatological background error covariance matrix; and
• a reduction in the number of observations assimilated.

When combined, such simplifications could potentially provide a large reduction in computational cost and could therefore make the cost of VarEnKF comparable with, for example, the 256-member EnKF currently operational at ECCC. The numerical data assimilation experiments performed during this investigation were designed to evaluate the impact of a particular combination of such simplifications on the quality of the ensemble forecasts.

b. Benefits of VarEnKF approach

Only a single execution of variational data assimilation is required to compute the analysis increment for the ensemble mean. Nearly the same configuration as the deterministic system can be used for this, including a much larger volume of assimilated observations than what is currently used in most operational EnKF systems. The lower number of assimilated observations in most EnKF algorithms follows from the fact that their computational cost scales linearly with the number of observations. This increase in the number of assimilated observations for computing the analysis increment for the ensemble mean in VarEnKF could potentially improve the quality of ensemble forecasts. Some centers (e.g., NCEP; National Weather Service 2015) obtain a similar benefit from directly using the high-resolution deterministic analysis to replace the EnKF ensemble

mean analysis at each analysis time. In this case, the EnKF data assimilation procedure is, in effect, only used to compute the analysis increments for the ensemble perturbations.

Because it uses the same data assimilation approach as the deterministic system, VarEnKF has the practical advantage that only a single data assimilation algorithm need be developed and maintained, reducing the required software development effort. For example, the introduction of a new type of observation or an improved treatment of the error covariances need only be implemented once for it to be available for both the deterministic and ensemble prediction systems. Moreover, a much higher degree of consistency would exist between the deterministic and ensemble systems if they were to employ the same assimilation algorithm and use the same set of observations (for computing the ensemble mean analysis increment). Consequently, the impact on forecast accuracy from any major change, such as adding a new type of observation, would be similar in both systems. This would likely reduce the overall amount of numerical experimentation required.

The variational approach also allows for covariance localization to be applied directly to the background error covariances in gridpoint space (Lorenc 2003; Buehner 2005) instead of partially in observation space, as in the EnKF (e.g., Houtekamer et al. 2005). This is an important distinction for nonlocal observations, such as satellite radiances. Campbell et al. (2010) showed that gridpoint space localization can result in improved analysis accuracy as compared with observation space localization when assimilating satellite radiances in an idealized experimental context. For some situations, however, Lei and Whitaker (2015) showed that localization in observation space can be superior, though the difficulty of assigning an appropriate single vertical location to each radiance observation remains.

The use of a variational approach within VarEnKF could lead to additional benefits from covariance modeling approaches that are more readily implemented in the context of the variational approach. These include the use of hybrid covariances that combine localized ensemble covariances with static climatological covariances (e.g., Hamill and Snyder 2000; Lorenc 2003) and scale-dependent covariance localization (Buehner and Shlyaeva 2015). In addition to these covariance modeling approaches, the use of the variational quality control technique for rejecting potentially erroneous observations can also be easily incorporated in the calculation of the ensemble mean analysis increment (Andersson and Järvinen 1999).

c. Formulation

The VarEnKF approach performs a variational minimization of a cost function to obtain the analysis increment for the ensemble mean and a set of separate minimizations to obtain the increment for each ensemble member perturbation. To derive these cost functions, first the cost function for the kth member of a perturbed-observation ensemble is written in its incremental form as

$$J(\Delta\mathbf{x}_k) = \frac{1}{2}(\Delta\mathbf{x}_k)^{\mathrm{T}}\mathbf{B}^{-1}(\Delta\mathbf{x}_k) + \frac{1}{2}[\mathbf{y}_k - H(\mathbf{x}_k^{b}) - \mathbf{H}\Delta\mathbf{x}_k]^{\mathrm{T}}\mathbf{R}^{-1}[\mathbf{y}_k - H(\mathbf{x}_k^{b}) - \mathbf{H}\Delta\mathbf{x}_k], \tag{1}$$

where the following approximation has been used:

$$H(\mathbf{x}_k^{b} + \Delta\mathbf{x}_k) \approx H(\mathbf{x}_k^{b}) + \mathbf{H}\Delta\mathbf{x}_k.$$

Here $\mathbf{x}_k^{b}$ is the background state for the kth member and $\mathbf{y}_k$ is the vector of perturbed observations for the kth member, which is obtained by adding Gaussian random perturbations, $\mathbf{r}_k$, with covariance $\mathbf{R}$ to the observations $\mathbf{y}^{o}$. The value of $\Delta\mathbf{x}_k$ that minimizes this cost function is denoted as $\Delta\mathbf{x}_k^{a}$, the analysis increment for the kth member, such that the analysis state itself is given by $\mathbf{x}_k^{a} = \mathbf{x}_k^{b} + \Delta\mathbf{x}_k^{a}$. The background error and observation error covariance matrices are denoted by $\mathbf{B}$ and $\mathbf{R}$, respectively, and $\mathbf{H}$ is the tangent linear version of the observation operator $H$. The explicit solution to this minimization problem is given by the Kalman filter analysis equation:

$$\Delta\mathbf{x}_k^{a} = \mathbf{K}[\mathbf{y}_k - H(\mathbf{x}_k^{b})], \tag{2}$$

where the Kalman gain matrix is

$$\mathbf{K} = \mathbf{B}\mathbf{H}^{\mathrm{T}}(\mathbf{H}\mathbf{B}\mathbf{H}^{\mathrm{T}} + \mathbf{R})^{-1}. \tag{3}$$

Then, by expressing each vector as the sum of the ensemble mean value (first terms on the rhs) plus an ensemble perturbation (second terms on the rhs):

$$\Delta\mathbf{x}_k^{a} = \overline{\Delta\mathbf{x}}^{a} + \Delta\mathbf{x}_k^{\prime a}, \qquad \mathbf{x}_k^{b} = \overline{\mathbf{x}}^{b} + \mathbf{x}_k^{\prime b}, \qquad \mathbf{y}_k = \mathbf{y}^{o} + \mathbf{r}_k,$$

the analysis equation above [Eq. (2)] can be separated into a single expression for the analysis increment of the ensemble mean,

$$\overline{\Delta\mathbf{x}}^{a} = \mathbf{K}[\mathbf{y}^{o} - H(\overline{\mathbf{x}}^{b})], \tag{4}$$

and an equation for the analysis increment for the ensemble perturbation of each member,


$$\Delta\mathbf{x}_k^{\prime a} = \mathbf{K}(\mathbf{r}_k - \mathbf{H}\mathbf{x}_k^{\prime b}), \tag{5}$$

where the following approximation has been used:

$$H(\mathbf{x}_k^{b}) = H(\overline{\mathbf{x}}^{b} + \mathbf{x}_k^{\prime b}) \approx H(\overline{\mathbf{x}}^{b}) + \mathbf{H}\mathbf{x}_k^{\prime b}.$$

The cost function for the ensemble mean increment can then be written as

$$J(\overline{\Delta\mathbf{x}}) = \frac{1}{2}(\overline{\Delta\mathbf{x}})^{\mathrm{T}}\mathbf{B}^{-1}(\overline{\Delta\mathbf{x}}) + \frac{1}{2}[\mathbf{y}^{o} - H(\overline{\mathbf{x}}^{b}) - \mathbf{H}\overline{\Delta\mathbf{x}}]^{\mathrm{T}}\mathbf{R}^{-1}[\mathbf{y}^{o} - H(\overline{\mathbf{x}}^{b}) - \mathbf{H}\overline{\Delta\mathbf{x}}], \tag{6}$$

where $\overline{\Delta\mathbf{x}}^{a}$ is the value of $\overline{\Delta\mathbf{x}}$ that minimizes this cost function. Similarly, the cost function for the ensemble perturbation increment can be written as

$$J(\Delta\mathbf{x}_k^{\prime}) = \frac{1}{2}(\Delta\mathbf{x}_k^{\prime})^{\mathrm{T}}\mathbf{B}^{-1}(\Delta\mathbf{x}_k^{\prime}) + \frac{1}{2}[\mathbf{r}_k - \mathbf{H}\mathbf{x}_k^{\prime b} - \mathbf{H}\Delta\mathbf{x}_k^{\prime}]^{\mathrm{T}}\mathbf{R}^{-1}[\mathbf{r}_k - \mathbf{H}\mathbf{x}_k^{\prime b} - \mathbf{H}\Delta\mathbf{x}_k^{\prime}], \tag{7}$$

where $\Delta\mathbf{x}_k^{\prime a}$ is the value of $\Delta\mathbf{x}_k^{\prime}$ that minimizes this cost function. In practice, these two cost functions are reformulated in terms of the control vectors $\mathbf{v}$ and $\mathbf{v}_k^{\prime}$, respectively, such that the background term of the cost function is perfectly preconditioned. Making the following change of variable:

$$\overline{\Delta\mathbf{x}} = \mathbf{B}^{1/2}\mathbf{v}, \qquad \Delta\mathbf{x}_k^{\prime} = \mathbf{B}^{1/2}\mathbf{v}_k^{\prime},$$

results in the preconditioned cost functions:

$$J(\mathbf{v}) = \frac{1}{2}\mathbf{v}^{\mathrm{T}}\mathbf{v} + \frac{1}{2}[\mathbf{y}^{o} - H(\overline{\mathbf{x}}^{b}) - \mathbf{H}\mathbf{B}^{1/2}\mathbf{v}]^{\mathrm{T}}\mathbf{R}^{-1}[\mathbf{y}^{o} - H(\overline{\mathbf{x}}^{b}) - \mathbf{H}\mathbf{B}^{1/2}\mathbf{v}], \tag{8}$$

$$J(\mathbf{v}_k^{\prime}) = \frac{1}{2}(\mathbf{v}_k^{\prime})^{\mathrm{T}}(\mathbf{v}_k^{\prime}) + \frac{1}{2}(\mathbf{r}_k - \mathbf{H}\mathbf{x}_k^{\prime b} - \mathbf{H}\mathbf{B}^{1/2}\mathbf{v}_k^{\prime})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{r}_k - \mathbf{H}\mathbf{x}_k^{\prime b} - \mathbf{H}\mathbf{B}^{1/2}\mathbf{v}_k^{\prime}). \tag{9}$$

It is these cost functions that form the foundation of the VarEnKF approach. The cost function for the ensemble perturbation increments [Eq. (9)] employs a configuration with a much lower computational cost (e.g., fewer assimilated observations, a much simpler configuration for the background error covariance matrix) than that used for the ensemble mean increment [Eq. (8)]. This results in a significant reduction in the overall computational cost of computing the O(100) ensemble member perturbation increments, while maintaining the highest quality possible for the ensemble mean analysis increment. The ensemble of analyses is then obtained by adding the sum of the ensemble mean increment and the ensemble perturbation increment to the background state of each member.

The tangent linear observation operator $\mathbf{H}$ used in both cost functions is linearized with respect to the ensemble mean background state. As a result, the same linearized operator is used for the calculation of the analysis increment for the ensemble mean and for all of the ensemble member perturbations. Because the cost functions [Eqs. (8) and (9)] have no dependence on each other, the minimizations for the ensemble mean and all of the ensemble perturbations can be performed in parallel, making the VarEnKF approach embarrassingly parallel by nature.

3. Description of numerical experiments

A series of numerical data assimilation experiments is performed to evaluate the impact on the ensemble analyses and forecasts of using a variational approach to compute the analysis increments within a perturbed-observation ensemble data assimilation system. Table 1 provides a summary of the configurations of these data assimilation experiments. For comparison purposes, the EnKF-control experiment uses the same assimilation algorithm as in the ECCC operational system that serially assimilates batches of observations [described by Houtekamer et al. (2014a), with subsequent modifications described by Gagnon et al. (2014) and Gagnon et al. (2015)]. The VarEnKF experiment employs the variational approach for both the ensemble mean analysis increment and the ensemble perturbation analysis increments, while the EnVar-mean and EnVar-mean2 experiments use the variational approach only for the ensemble mean increment and rely on the serial batch EnKF algorithm for the ensemble perturbation increments. To test the forecast error sensitivity to the set of assimilated observations, the EnVar-mean2 experiment assimilates a much larger set of observations for computing the ensemble mean analysis increment than the EnVar-mean experiment.

All experiments begin from the same ensemble at 0000 UTC 28 December 2014. The initial ensemble is generated by adding, to a single 24-h deterministic forecast, a set of 256 random Gaussian perturbations computed using the static climatological background error covariances from the variational data assimilation system. No results are considered from the first six days to allow the ensemble to spin up from these initial conditions. The data assimilation experiments are


TABLE 1. Summary of the configurations used for the numerical experiments. Note that Bens refers to the background error covariance matrix estimated from the ensemble of background states with spatial localization applied, Bnmc refers to the static climatological background error covariances, and VarQC refers to the variational quality control procedure for observations.

| Name | Mean increment | Perturbation increment | Assimilated observations |
|---|---|---|---|
| EnKF-control (a) | EnKF serial batch algorithm: Bens (224 members), no interchannel correlations, no VarQC | EnKF serial batch algorithm: Bens (224 members), no interchannel correlations, no VarQC | Mean and perturbations: EnKF subset |
| EnVar-mean | 4DEnVar: 90% Bens (256 members) + 10% Bnmc (T270), nonzero interchannel correlations, VarQC applied | EnKF serial batch algorithm: As in EnKF-control | Mean and perturbations: EnKF subset |
| EnVar-mean2 | 4DEnVar: As in EnVar-mean | EnKF serial batch algorithm: As in EnKF-control | Mean: full set, perturbations: EnKF subset |
| VarEnKF | 4DEnVar: As in EnVar-mean | 3DVar: 100% Bnmc (T108), no interchannel correlations, no VarQC | Mean: full set, perturbations: EnKF subset plus ground-based GPS |

(a) For the EnKF-control experiment, the ensemble mean and perturbations are not computed separately. Instead, the entire analysis increment for each ensemble member is computed using the serial batch assimilation algorithm.

then run with four analyses per day (i.e., 6-h assimilation windows) until 1200 UTC 15 January 2015, for the EnVar-mean and EnVar-mean2 experiments, and until 1200 UTC 28 January 2015, for the EnKF-control and VarEnKF experiments. Five-day, 20-member ensemble forecasts and an additional "control member" forecast are launched twice per day, at 0000 and 1200 UTC. As in the operational system, the ensemble forecasts are initialized with the first 20 analysis ensemble members after they are recentered about the mean of the full 256-member analysis ensemble. This ensemble mean is also used to initialize the control member forecast.

All experiments are performed using the same set of configurations of the Global Environmental Multiscale (GEM) model (Girard et al. 2014) for the 256-member ensemble within the data assimilation cycle and for the 20-member medium-range ensemble forecasts. This includes the use of the same digital filter initialization procedure applied to the full analysis fields in all experiments. These configurations differ from those used operationally starting 15 December 2015 (Gagnon et al. 2015) in that the model top was raised from 2 to 0.1 hPa and the number of vertical levels increased from 74 to 80 to be more consistent with the deterministic prediction system (Buehner et al. 2015). Together with these changes, microwave radiances from channels 13 and 14 of AMSU-A instruments are additionally assimilated in these experiments. Compared with the operational EnKF configuration, these changes result in large improvements for the ensemble forecasts in the stratosphere at and above 100 hPa, but with little impact in the troposphere (results not shown). Also unlike the operational system, which uses fewer vertical levels for the medium-range ensemble forecasts than for the background states in the data assimilation cycle, the same 80 vertical levels are used in all model configurations. In all experiments, a set of random perturbations is added to the analysis ensemble to simulate the additional uncertainty due to various sources of system error. As in the operational EnKF, these are computed using the same static climatological background error covariances used in the variational data assimilation system, but after multiplying the standard deviations for all variables by 0.33 when initializing the forecasts for the ensemble of background states within the data assimilation cycle, and by 0.66 when initializing the medium-range ensemble forecasts.

The configuration of EnVar used for the ensemble mean is similar to the version of the deterministic analysis that has been used operationally since 15 November 2015 [Buehner et al. (2015) with subsequent modifications described by Qaddouri et al. (2015)]. Unlike the operational version, however, the ensemble covariances are obtained using each experiment's own 256-member ensemble with model top at 0.1 hPa. Also, hybrid covariances are used with only a 10% contribution from climatological covariances [referred to as "Bnmc" in Table 1 since they are computed using the so-called NMC method of Parrish and Derber (1992)] at all levels, instead of 50% within the troposphere and gradually tapering to 100% between 40 and 10 hPa as in the operational version. Above 5 hPa, the contribution from the ensemble covariances is gradually reduced to zero to avoid large analysis increments near the model top caused by large ensemble spread and the lack of any direct observations. It should be noted that increments near the model top are also forced to be small or zero in the serial batch EnKF algorithm due to the application of vertical covariance localization partially in observation space. This results in analysis increments near the model top that are qualitatively similar in the two systems.
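The recentering of the 20 forecast members about the mean of the full 256-member analysis ensemble, described earlier in this section, can be illustrated with a minimal Python sketch. The function names (`ensemble_mean`, `recenter`) and the three-element toy state vectors are hypothetical and are not taken from the operational system; only the member counts (256 and 20) come from the text.

```python
import random


def ensemble_mean(members):
    """Return the element-wise mean of a list of equally sized state vectors."""
    n = len(members)
    return [sum(values) / n for values in zip(*members)]


def recenter(subset, target_mean):
    """Shift every member of `subset` so that the subset mean equals `target_mean`."""
    subset_mean = ensemble_mean(subset)
    shift = [t - s for t, s in zip(target_mean, subset_mean)]
    return [[x + d for x, d in zip(member, shift)] for member in subset]


# Toy stand-in for the analysis ensemble: 256 members, 3 "grid points" each.
rng = random.Random(0)
analysis_ensemble = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(256)]

full_mean = ensemble_mean(analysis_ensemble)

# The first 20 members are recentered about the 256-member mean;
# the mean itself serves as the control member.
forecast_members = recenter(analysis_ensemble[:20], full_mean)
control_member = full_mean
```

Under these assumptions, each recentered member keeps its own deviation from the 20-member mean, while the mean of the 20 members coincides with that of the full ensemble.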


TABLE 2. Summary of the types of observations assimilated in each experiment for computing the analysis increments for the ensemble mean and ensemble perturbation. Note that AMW refers to atmospheric motion winds from both geostationary and polar-orbiting satellites.

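| Observation type | EnKF-control: Mean (a) | EnKF-control: Perturbation (a) | EnVar-mean: Mean (a) | EnVar-mean: Perturbation (a) | EnVar-mean2: Mean (b) | EnVar-mean2: Perturbation (a) | VarEnKF: Mean (b) | VarEnKF: Perturbation |
|---|---|---|---|---|---|---|---|---|
| Radiosonde | X | X | X | X | X | X | X | X |
| Aircraft | X | X | X | X | X | X | X | X |
| AMW | X | X | X | X | X | X | X | X |
| Surface observations | X | X | X | X | X | X | X | X |
| Scatterometer winds | X | X | X | X | X | X | X | X |
| GPS radio occultation | X | X | X | X | X | X | X | X |
| Ground-based GPS | | | | | X | | X | X |
| Radiances: AMSU-A/B, MHS, ATMS | X | X | X | X | X | X | X | X |
| Radiances: SSMIS | | | | | X | | X | |
| Radiances: AIRS, IASI, CrIS, geostationary | | | | | X | | X | |

(a) This is the "EnKF subset" referred to in Table 1.
(b) This is the "full set" referred to in Table 1.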
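As a compact restatement of Table 2, the observation sets can be written as Python sets. This is only an illustrative sketch: the variable names are hypothetical, and the strings simply repeat the row labels of the table.

```python
# Row labels are copied from Table 2; all variable names are hypothetical.
ENKF_SUBSET = frozenset({
    "Radiosonde",
    "Aircraft",
    "AMW",
    "Surface observations",
    "Scatterometer winds",
    "GPS radio occultation",
    "Radiances: AMSU-A/B, MHS, ATMS",
})

# Types that appear only in the "full set" columns of Table 2.
FULL_SET_ONLY = frozenset({
    "Ground-based GPS",
    "Radiances: SSMIS",
    "Radiances: AIRS, IASI, CrIS, geostationary",
})
FULL_SET = ENKF_SUBSET | FULL_SET_ONLY

OBSERVATIONS = {
    "EnKF-control": {"mean": ENKF_SUBSET, "perturbation": ENKF_SUBSET},
    "EnVar-mean": {"mean": ENKF_SUBSET, "perturbation": ENKF_SUBSET},
    "EnVar-mean2": {"mean": FULL_SET, "perturbation": ENKF_SUBSET},
    "VarEnKF": {
        "mean": FULL_SET,
        "perturbation": ENKF_SUBSET | {"Ground-based GPS"},
    },
}

# List the observation types used for the mean increment only.
for name, sets in OBSERVATIONS.items():
    mean_only = sorted(sets["mean"] - sets["perturbation"])
    print(f"{name}: used only for the mean increment: {mean_only}")
```

Written this way, the one difference between the perturbation sets of EnVar-mean2 and VarEnKF (ground-based GPS) is explicit.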

As in the operational EnKF, a smaller subset of observation types is assimilated in the EnKF-control and EnVar-mean experiments (referred to as the "EnKF subset") than in the operational deterministic system (referred to as the "full set"). The EnVar-mean2 and VarEnKF experiments both use the full set of observation types for computing only the ensemble mean increment. For the ensemble perturbation increments, the EnVar-mean2 experiment uses the EnKF subset, whereas the VarEnKF experiment uses the EnKF subset with the addition of ground-based GPS data. The specific types of observations assimilated in each experiment for computing both the ensemble mean increment and the ensemble perturbation increments are summarized in Table 2. As an example of the difference in the number of observations used in each experiment, for the 6-h time window centered at 1800 UTC 12 January 2015, the number of assimilated observations of all types was 1.22 million in the EnKF subset versus 3.14 million in the full set of observations. Of these observations, 0.85 million in the EnKF subset and 2.77 million in the full set of observations were satellite radiances.

The observation error covariances differ between the EnVar used for computing the ensemble mean analysis increment and the EnKF serial batch algorithm. For the EnVar configuration, nonzero interchannel observation error correlations are included for all types of satellite radiance observations. These error correlations were estimated using the method of Desroziers et al. (2005) as described by Heilliette and Garand (2015). In combination with the inclusion of these correlations, it was found to be beneficial to decrease the error variances for water vapor and surface-sensitive channels, which are the most highly intercorrelated. These are the same observation error covariances as used in the EnVar to initialize the operational deterministic prediction system. In that system, it was found that inclusion of nonzero correlations did not significantly affect the convergence of the cost function minimization. In comparison, all observation error correlations are assumed to be zero in the EnKF serial batch algorithm used in this study, consistent with the operational EnKF configuration.

The VarEnKF experiment uses a much simpler configuration for computing the increments to the ensemble perturbations than it does for the increment to the ensemble mean. This includes, as already mentioned, the assimilation of far fewer observations (Table 2). In addition, only the static climatological background error covariances at a reduced spectral truncation of T108 are used (instead of the spectral truncation of T270 used for the ensemble mean). Since the ensemble covariances are not used in specifying B in the cost function for the ensemble perturbations [Eq. (9)], we refer to this as 3DVar, as opposed to the four-dimensional version of EnVar that is used for the ensemble mean. Also, unlike for the ensemble mean increment, the interchannel observation error correlations are assumed to be zero. This diagonal observation error covariance matrix is also used for computing the random observation perturbations, consistent with all of the other experiments. However, contrary to the suggestion of Lorenc et al. (2014), the number of iterations used for the minimization was not reduced from the number used in the EnVar assimilation for the ensemble mean and the operational deterministic system (70). This is because preliminary experiments with a reduced number of iterations showed that the ensemble spread was not sufficiently reduced in specific areas of rapid growth in

spread during the preceding forecasts. This lack of reduction in spread could not be compensated for by reducing the magnitude of the random system error perturbations because of their nearly horizontally homogeneous variance. Nonetheless, taken together, the simplifications to the background error covariances and the reduction in the number of assimilated observations have a major impact on the computational cost, as detailed in section 4d.

A cross-validation approach is used in the serial batch EnKF algorithm to avoid an excessive reduction in ensemble spread during the analysis step that would otherwise occur (Mitchell and Houtekamer 2009). Following this approach, the 256-member background ensemble is split into 8 subensembles and only the other 7 of these (224 members) are used to estimate the background error covariances when computing the analysis increments for the 32 members of each subensemble (Houtekamer et al. 2014b). However, by calculating the ensemble mean analysis increment separately from the ensemble perturbation increments in the EnVar-mean, EnVar-mean2, and VarEnKF experiments, this problem is avoided. Consequently, the entire 256-member ensemble of background states is used to estimate the B matrix in the cost function for the ensemble mean analysis increment [Eq. (8)]. The problem is also avoided in the VarEnKF experiment for the calculation of the ensemble perturbation increments since the background ensemble members are not used to estimate the B matrix in the corresponding cost function [Eq. (9)].

4. Results from numerical experiments

a. Analysis and background ensembles

The impact on the ensemble spread of using the simplified variational approach instead of the serial batch EnKF algorithm for computing the ensemble perturbation analysis increments is shown for a single analysis time for surface pressure in Fig. 1. Only the ensemble spreads from the EnVar-mean2 and VarEnKF experiments are compared since these experiments use identical configurations for computing the ensemble mean increment. This allows the impact of using 3DVar instead of the serial batch EnKF algorithm for the ensemble perturbation increments to be isolated; this substitution is critical for the VarEnKF approach to have a computational cost comparable to that of the current EnKF. The overall amplitude and large-scale spatial variation of the background and analysis ensemble spreads are very similar in the EnVar-mean2 and VarEnKF experiments. The background ensemble spreads for the two experiments also have many smaller-scale features in common, including areas of relatively large spread in the extratropical oceanic regions, likely caused by rapid error growth related to low pressure systems. Several of these areas, however, have higher background spread in the VarEnKF experiment than in the EnVar-mean2 experiment. As expected, the analysis ensemble spread is generally reduced in each experiment relative to the background ensemble spread. This is most noticeable in areas of high background ensemble spread. In some areas with low background ensemble spread (e.g., over Greenland), however, the analysis spread is actually larger than the background spread, likely due to the addition of the random system error perturbations. As a result, the amount of spatial variation of the ensemble spread is reduced in the analysis ensembles relative to the background ensembles.

The relationship between the analysis and background ensemble spreads is more clearly shown by computing the relative frequency of occurrence of different combinations of the background and analysis spread over all grid points for each experiment as shown in Fig. 2. The combined effect of adding the perturbation analysis increments and the random system error perturbations results in a slight flattening of the relationship between the analysis and background spread, consistent with the reduction in spatial variation of the analysis ensemble spread seen in Fig. 1. The analysis spread also appears to deviate more from the background spread in the VarEnKF experiment (Fig. 2b) than in the EnVar-mean2 experiment (Fig. 2a). Figure 3 also shows the relative frequency of occurrence of different combinations of ensemble spread, but comparing either the background spread (Fig. 3a) or the analysis spread (Fig. 3b) between the two experiments. This shows that the two approaches generally result in very similar background and analysis ensemble spread across all grid points, except that for areas of low ensemble spread (around 0.4 hPa), the VarEnKF experiment has slightly lower spread than the EnVar-mean2 experiment. This can be explained by the differences in the background error covariances used for computing the ensemble perturbation increments. As part of the static climatological covariances used in the VarEnKF approach, the background error variance is nearly spatially constant, whereas the EnVar-mean2 experiment uses the spatially varying background error variance obtained from the ensemble spread. Consequently, in areas of low background ensemble spread the EnVar-mean2 experiment will correctly produce lower-amplitude analysis increments for the ensemble perturbations than the VarEnKF experiment. Since these analysis increments generally act to reduce the ensemble spread [as seen in Eq. (5)], the spread will be reduced less strongly in areas with low spread in the EnVar-mean2 experiment and the opposite will occur


FIG. 1. The ensemble spread of surface pressure (in hPa) at 0000 UTC 10 Jan 2015 for the (top) EnVar-mean2 and (bottom) VarEnKF experiments computed from the 256-member (left) background (6-h forecast) ensemble and (right) analysis ensemble. The analysis ensemble spread is computed after the addition of the random perturbations that simulate the various sources of system uncertainty.

in areas of high ensemble spread. This may also explain why several regions of high background ensemble spread have larger values in the VarEnKF experiment than in the EnVar-mean2 experiment. For example, large ensemble spread in the North Sea related to the rapidly developing cyclone “Nina” is more notable in the VarEnKF experiment (cf. Figs. 1a and 1c). To partially remove this difference between the two approaches, a simple modification with negligible additional computational cost could be made to the VarEnKF approach

FIG. 2. Normalized frequency of occurrence of the ensemble spread standard deviation of surface pressure (in hPa) from the analysis ensemble vs from the background ensemble for the (a) EnVar-mean2 and (b) VarEnKF experiments at 0000 UTC 10 Jan 2015. The frequencies are normalized such that the maximum value is 1. The solid contour indicates the 0.05 level.
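The normalization described in the Fig. 2 caption can be illustrated with a short sketch. It is not the authors' plotting code: the function name `normalized_spread_frequency`, the bin count, and the synthetic spread values are all assumptions made for illustration. Only the idea of counting grid points per (background spread, analysis spread) bin and dividing by the largest count comes from the caption.

```python
import numpy as np


def normalized_spread_frequency(bg_spread, an_spread, bins=50, value_range=None):
    """Joint frequency of (background, analysis) spread, scaled so max == 1.

    bg_spread, an_spread: arrays of spread values (hPa), one per grid point.
    Hypothetical helper; binning choices are assumptions, not from the paper.
    """
    bg = np.asarray(bg_spread, dtype=float).ravel()
    an = np.asarray(an_spread, dtype=float).ravel()
    counts, bg_edges, an_edges = np.histogram2d(bg, an, bins=bins, range=value_range)
    peak = counts.max()
    # Divide by the largest bin count so that the maximum frequency is 1.
    freq = counts / peak if peak > 0 else counts
    return freq, bg_edges, an_edges


# Synthetic stand-in for spread fields (not data from the experiments).
rng = np.random.default_rng(0)
bg = rng.gamma(shape=4.0, scale=0.2, size=10_000)
an = 0.7 * bg + 0.1 + rng.normal(0.0, 0.02, size=bg.size)
freq, bg_edges, an_edges = normalized_spread_frequency(bg, an, bins=40)
```

A contour of `freq` at 0.05 would then correspond to the solid contour mentioned in the caption.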


FIG. 3. As in Fig. 2, but for the EnVar-mean2 vs the VarEnKF experiments computed from (a) the background ensemble and (b) the analysis ensemble.

to use the background ensemble spread only for specifying the background error variance, while still using the static climatological correlations.

b. Control member forecast

The control member forecasts from each experiment are evaluated by comparing them with the set of independent atmospheric analyses from the ERA-Interim (Dee et al. 2011). This comparison is done after first using a spatial averaging procedure for both the forecasts and the ERA-Interim analyses to interpolate them onto a global 1.5° latitude–longitude grid. Consequently, this evaluation does not include scales that cannot be resolved on this relatively low-resolution grid. Since the length of the experiments after the spinup period is relatively short (i.e., 26 forecasts over 13 days or 52 forecasts over 26 days), only the globally averaged forecast error statistics were computed and only for forecasts up to a maximum lead time of 72 h.

Figure 4 shows the global error standard deviation and bias over the 13-day period measured relative to ERA-Interim analyses on pressure levels between 1000 and 10 hPa for the 72-h forecasts initialized from the analysis ensemble from the EnVar-mean (red) and EnKF-control (blue) experiments. A significant¹ reduction in error standard deviation and bias is seen at several levels in the troposphere for the temperature and geopotential height. Significant improvements to the bias for geopotential height extend into the stratosphere. Recall that the only difference between these experiments is the assimilation procedure used for the ensemble mean (Table 1). Tests performed with the deterministic prediction system to isolate the impact of only including the interchannel observation error correlations showed that a reduction in global 72-h forecast error standard deviation for temperature of up to 0.02 K can be expected from this change, with little impact on the mean error (Heilliette and Garand 2015). The length of the experiments is likely too short to determine definitively if the larger improvements seen in Fig. 4 (up to 0.05 K for the temperature standard deviation) are due to the use of EnVar with hybrid background error covariances for the ensemble mean instead of the serial batch EnKF algorithm. A significant improvement in EnVar-mean relative to EnKF-control is also seen, and over even more pressure levels, for the 24- and 48-h global forecasts (not shown).

A similar comparison of 72-h global forecast error is shown in Fig. 5 for the EnVar-mean2 (red) and EnKF-control (blue) experiments. A small additional reduction in the forecast error standard deviation of all variables in the troposphere can be seen due to the assimilation of a much larger volume of satellite observations for computing the ensemble mean analysis increment (cf. Figs. 4 and 5). This comparison shows the benefit of using a configuration of EnVar nearly identical to the configuration used for the current deterministic prediction system when computing the ensemble mean analysis increment, since both experiments use the serial batch EnKF algorithm for the perturbation increments.

As in section 4a, the impact of changing the approach for computing the ensemble perturbation analysis increments is evaluated by comparing the VarEnKF (red) with EnVar-mean2 (blue) experiments (Fig. 6). The use of a highly simplified variational approach for the ensemble perturbation increments appears to result in a nearly neutral impact on the accuracy of the control member forecast relative to using the serial batch algorithm.

¹ Statistical significance is computed using a permutation test and reported for p values ≥ 0.9.


FIG. 4. The error in the 72-h global control member forecasts relative to ERA-Interim analyses from the EnVar-mean (red) and EnKF-control (blue) experiments computed over the period 0000 UTC 3 Jan–1200 UTC 15 Jan 2015. Both the standard deviation (solid curves) and average (dashed curves) of the error are shown for (a) zonal wind, (b) relative humidity, (c) geopotential height, and (d) temperature. Boxes containing numbers on the right and left side of each panel indicate the level of statistical significance that, respectively, the error standard deviation and the mean error are different for the two experiments. The color of the box, EnVar-mean (red) or EnKF-control (blue), indicates the experiment with the error statistic closer to zero.
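The significance levels in the boxes of Fig. 4 come from a permutation test, but the exact statistic and resampling scheme are not given in this section. The sketch below is therefore only one plausible variant, a paired sign-flip permutation test on a per-forecast error statistic. The function name `paired_permutation_test`, the choice of the mean difference as test statistic, and the synthetic inputs are assumptions.

```python
import numpy as np


def paired_permutation_test(stat_a, stat_b, n_perm=10_000, seed=0):
    """Two-sided paired permutation test on the mean difference.

    stat_a, stat_b: one error statistic per forecast case for experiments
    A and B (same cases, same order). Swapping the experiment labels of a
    case is equivalent to flipping the sign of its difference.
    Assumed scheme; not confirmed to match the paper's test.
    """
    diff = np.asarray(stat_a, dtype=float) - np.asarray(stat_b, dtype=float)
    observed = diff.mean()
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    permuted = (signs * diff).mean(axis=1)
    # The +1 terms count the observed labelling itself as one permutation.
    p_value = (np.sum(np.abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic example with 26 forecast cases (values are made up).
rng = np.random.default_rng(1)
errors_control = 1.0 + 0.1 * rng.standard_normal(26)
errors_test = errors_control - 0.05 + 0.02 * rng.standard_normal(26)
mean_diff, p = paired_permutation_test(errors_test, errors_control)
```

Under this convention a small `p` indicates a significant difference; a confidence level such as the 0.9 threshold in the footnote would correspond to `1 - p`.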

A small negative impact is only seen for zonal wind and geopotential height in the stratosphere. This negative impact, however, is much smaller than the positive impact of computing the ensemble mean analysis increment using EnVar with hybrid background error covariances, nonzero interchannel observation error correlations for satellite radiances, and a much larger number of assimilated observations.

All comparisons shown previously were based on only a 13-day period. To obtain a more robust result, the VarEnKF and EnKF-control experiments were extended until 1200 UTC 28 January 2015, a total period of 26 days. A direct comparison of these experiments (Fig. 7) shows the combined impact of using a variational approach for computing the analysis increments for both the ensemble mean and perturbations and all of


FIG. 5. As in Fig. 4, but showing the 72-h global forecast error relative to ERA-Interim analyses for the EnVar-mean2 (red) and EnKF-control (blue) experiments.

the other changes for computing the ensemble mean increment. Even though the period is twice as long, most of the improvements seen in the comparison of EnVar-mean2 versus EnKF-control (Fig. 5) are maintained in the VarEnKF versus EnKF-control comparison. The statistical significance of the difference appears to be lower for tropospheric wind and geopotential height in this comparison relative to EnVar-mean2 versus EnKF-control. This can be partially explained by the increased variability in the error standard deviations for both experiments during the second half of the 26-day period. Figure 8 shows the time series of error standard deviation for the global 500-hPa geopotential height forecasts during the entire period. For the 72-h forecast lead time, the error standard deviation is shown to increase substantially for both experiments during the second half of the period. The VarEnKF experiment produces forecasts with a lower global error standard deviation for 77% of the forecasts during this period, further supporting the significance of the improvement relative to the EnKF-control experiment. The time series for the 24- and 120-h forecast lead times show that this improvement is even more robust for the shorter forecasts, whereas the forecast accuracy is less systematically improved for the longer forecasts. Qualitatively


FIG. 6. As in Fig. 4, but showing the 72-h global forecast error relative to ERA-Interim analyses for the VarEnKF (red) and EnVar-mean2 (blue) experiments.

similar and statistically significant improvements for the VarEnKF experiment were also seen in comparisons of forecast error relative to radiosonde observations (not shown).

c. Ensemble forecast

The analysis ensembles are used to initialize 20-member medium-range ensemble forecasts. Figure 9 shows an example of the ensemble spread for surface pressure computed from the 24- and 72-h forecasts valid at 0000 UTC 10 January 2015 from three experiments. The ensemble spread from the EnVar-mean2 experiment (Figs. 9c,d) is very similar to that from the EnKF-control experiment (Figs. 9a,b), especially for the 24-h lead time. This demonstrates that the changes to the assimilation approach used for the ensemble mean have only a small visible impact on the medium-range ensemble forecast spread, at least for surface pressure. This is somewhat surprising given the significant impact on the accuracy of the control member forecasts from this change (Fig. 5). A slightly more noticeable difference is apparent when comparing the ensemble spread from the VarEnKF experiment (Figs. 9e,f) with the EnVar-mean2 experiment. While most of the same areas of large spread are seen in both, the


FIG. 7. As in Fig. 4, but showing the 72-h global forecast error relative to ERA-Interim analyses for the VarEnKF (red) and EnKF-control (blue) experiments and over the longer period of 0000 UTC 3 Jan 2015–1200 UTC 28 Jan 2015.

amplitude of the spread is different in several of these. This is caused by the difference in the assimilation procedure used for computing the analysis increments for the ensemble perturbations, though this change had little impact on the accuracy of the control member forecasts (Fig. 6).

The continuous ranked probability score (CRPS) was computed relative to radiosonde zonal wind and temperature observations to evaluate changes to the overall accuracy of the ensemble forecasts for the VarEnKF experiment relative to the EnKF-control experiment over the 26-day period. The CRPS measures the error in the cumulative probability distribution computed from the ensemble members relative to observations (e.g., Hersbach 2000). The CRPS for the VarEnKF (red) and EnKF-control (blue) experiments is shown in Fig. 10 for three pressure levels (250, 500, and 850 hPa) as a function of lead time between 24 and 120 h. Consistent with the impacts on the control member forecasts, the change of the assimilation procedure for the ensemble mean results in a reduction to the CRPS in the VarEnKF experiment relative to the EnKF-control experiment for both temperature and zonal wind at the three pressure levels shown and for most lead times. This also indicates


FIG. 8. Time series of the error standard deviation (solid curves) in the global control member forecasts of 500-hPa geopotential height relative to ERA-Interim analyses from the VarEnKF (red) and EnKF-control (blue) experiments computed over the period 0000 UTC 3 Jan 2015–1200 UTC 28 Jan 2015. Results are shown for forecast lead times of (a) 24, (b) 72, and (c) 120 h. Note that the proportion of the 52 forecasts for which the global error standard deviation is lower for the VarEnKF experiment is 100%, 77%, and 69% for the 24-, 72-, and 120-h forecast lead times, respectively. The dotted lines are linear fits to the time evolution of the standard deviations.

that the use of a simplified variational assimilation approach instead of the serial batch EnKF algorithm for the ensemble perturbation increments has a small impact on the overall quality of the ensemble forecasts, relative to the changes for the ensemble mean. The statistical significance of these differences was determined using the two-sided 90% confidence interval computed using a bootstrap method (Candille et al. 2007). By this measure, the improvements to the forecasts are statistically significant for both variables, the three pressure levels shown, and for most lead times (as indicated with the colored circles in the figure).

d. Computational cost

In this section, the computational cost of the serial batch EnKF algorithm used in the current operational system is compared with the overall cost of the VarEnKF configuration evaluated in this study. The comparison focuses mostly on the cost of calculating the ensemble perturbation analysis increments since this dominates the overall cost of the VarEnKF approach. The computational cost of the EnVar-mean2 experiment would be simple to determine since it equals the cost of the serial batch EnKF algorithm used for the ensemble perturbations plus the cost of the full EnVar used for the ensemble mean, with the two being executed simultaneously.

The 3DVar used in the VarEnKF experiment to compute the ensemble perturbation increments takes approximately 90 s to complete the 70 iterations of the cost function minimization for one member using 128 processors on the IBM Power7 cluster at ECCC. Currently, 16 such jobs are submitted in parallel, each computing the analysis increments for 16 members sequentially, taking a total of about 24 min (using 2048 processors). This compares with the four-dimensional version of EnVar used for the ensemble mean analysis increment that currently takes about 16 min on 512 processors to also complete 70 iterations. Therefore, the EnVar requires approximately 40 times the computational resources of the 3DVar due to both the larger number of assimilated observations and the use of 4D background error covariances obtained from the 256 ensemble members.
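The timing figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this subsection (90 s per member on 128 processors, 16 parallel jobs, 16 min on 512 processors for the EnVar). The helper names are hypothetical, the model ignores job start-up and I/O overheads, and the 20-job case is an inference from the 2560-processor figure mentioned later in this subsection rather than a configuration the authors describe.

```python
import math

MEMBERS = 256               # ensemble size
PROCS_PER_3DVAR_JOB = 128   # processors used by one 3DVar job
SECONDS_PER_MEMBER = 90     # 70 iterations for one member
ENVAR_MINUTES = 16          # 4D EnVar for the ensemble mean
ENVAR_PROCS = 512


def perturbation_wall_minutes(n_jobs):
    """Wall-clock minutes when members are split evenly over n_jobs."""
    members_per_job = math.ceil(MEMBERS / n_jobs)
    return members_per_job * SECONDS_PER_MEMBER / 60


def processors_needed(n_jobs):
    """Total processors when all n_jobs run at the same time."""
    return n_jobs * PROCS_PER_3DVAR_JOB


# Current configuration: 16 parallel jobs of 16 members each.
print(perturbation_wall_minutes(16), processors_needed(16))  # 24.0 2048

# Cost ratio of one EnVar to one 3DVar, in processor-minutes.
cost_3dvar = PROCS_PER_3DVAR_JOB * SECONDS_PER_MEMBER / 60   # 192.0
cost_envar = ENVAR_PROCS * ENVAR_MINUTES                     # 8192
print(round(cost_envar / cost_3dvar, 1))                     # 42.7

# Assumed 20-job split, i.e. 2560 processors.
print(perturbation_wall_minutes(20), processors_needed(20))  # 19.5 2560
```

The ratio of about 43 is consistent with the "approximately 40 times" stated in the text, and because the members are independent, the wall-clock time falls in steps as `n_jobs` increases up to 256.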


FIG. 9. Ensemble spread of surface pressure (in hPa) for the (top) EnKF-control, (middle) EnVar-mean2, and (bottom) VarEnKF experiments computed from the 20-member ensemble forecasts with (left) 24- and (right) 72-h lead times valid at 0000 UTC 10 Jan 2015.
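Two of the ensemble diagnostics used in section 4c, the ensemble spread mapped in Fig. 9 and the CRPS, can be sketched as follows. This is a minimal illustration, not the verification code used in the study. The spread is taken as the sample standard deviation across members (the use of `ddof=1` is an assumption), and the CRPS uses the standard kernel form for the empirical ensemble distribution, which may differ in detail from the Hersbach (2000) formulation cited in the text. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def ensemble_spread(members):
    """Standard deviation across the member axis (axis 0).

    members: array of shape (n_members, ...), e.g. (20, nlat, nlon).
    ddof=1 (sample standard deviation) is an assumed convention.
    """
    return np.std(np.asarray(members, dtype=float), axis=0, ddof=1)


def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one scalar observation.

    Kernel form for the empirical CDF of the members:
        CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|
    """
    x = np.asarray(members, dtype=float).ravel()
    accuracy = np.mean(np.abs(x - obs))
    dispersion = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return accuracy - dispersion


# Synthetic 20-member surface pressure field on a small grid (made-up values).
rng = np.random.default_rng(0)
fields = 1000.0 + rng.normal(0.0, 1.5, size=(20, 4, 8))
spread = ensemble_spread(fields)                # shape (4, 8), in hPa
score = crps_ensemble(fields[:, 0, 0], 1000.0)  # CRPS at one point
```

In practice the CRPS would be averaged over all radiosonde observations and verification times; lower values indicate a better ensemble.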

The EnVar for the ensemble mean can be executed in parallel with multiple 3DVar jobs for the ensemble perturbations. Moreover, since the analysis increment for each ensemble perturbation is independent of all the others, their calculation scales perfectly up to a very large number of processors. It would therefore be straightforward to reduce the execution time for computing the perturbation increments, if more processors were available, by simply splitting the 256 members over more than 16 jobs (up to a maximum of 256 separate parallel jobs).

For comparison, the assimilation algorithm of serially assimilating batches of observations to update the entire ensemble as used in the current EnKF takes about 11 min using 2048 processors. This algorithm, however, also requires a separate preprocessing of the observations that takes another 8 min on 256 processors. Accounting for this additional execution time (assuming it cannot be significantly decreased by increasing the number of processors beyond 256) results in a total of approximately 19 min. Therefore the total execution time of VarEnKF is only moderately higher than that of the current EnKF algorithm, but with the benefit of only needing to develop, maintain and test a single data assimilation system for both deterministic and ensemble prediction. This increase in computational cost relative to the current EnKF


FIG. 10. The continuous ranked probability score (CRPS) relative to radiosonde observations of (left) zonal wind and (right) temperature at (top) 250, (middle) 500, and (bottom) 850 hPa for lead times between 24 and 120 h for the VarEnKF (red) and EnKF-control (blue) experiments. The small colored circles indicate the lead times for which the difference in CRPS is statistically significant, with lower values for VarEnKF in all cases. The CRPS is computed over the period 0000 UTC 3 Jan 2015–1200 UTC 28 Jan 2015.

algorithm is small compared with the two orders of magnitude increase that would be required to use the basic EDA approach with an ensemble of 256 data assimilation systems with each equivalent to the deterministic system. The total execution time of VarEnKF could be reduced to match the execution time of the current EnKF, when including its observation preprocessing step, by increasing the total number of processors used for the ensemble perturbation analysis increments from 2048 to 2560.

5. Conclusions

In the context of the currently operational prediction systems at ECCC, a dramatic increase in the volume of assimilated observations along with other benefits could be realized by using the four-dimensional version of EnVar for computing the ensemble mean analysis increment within the ensemble data assimilation system. In other words, the EnKF analysis ensemble would be recentered on an EnVar analysis produced using the background ensemble mean as the background state. The results from this study indicate that this relatively easy modification would lead to significantly improved ensemble forecasts. This would also increase the consistency between the ensemble mean analysis and the deterministic analysis, since the EnVar configurations for the corresponding systems would be nearly identical. Alternatively, it may be possible to obtain a similar improvement by using the deterministic analysis state itself to recenter the EnKF analysis ensemble. This would be more similar to how most other NWP centers construct their analysis ensembles. Such a change would

also significantly reduce the impact of further improvements to the serial batch EnKF assimilation approach used in the current system (e.g., from assimilating additional types of observations), since it would only be used to modify the ensemble spread and not the ensemble mean. Consequently, more research resources could be dedicated to improving the quality of the EnVar analysis technique, having an automatic benefit for both deterministic and ensemble systems. Alternatively, by also using a simplified variational approach for computing the ensemble perturbation analysis increments, that is, the complete VarEnKF approach, the need to maintain two different data assimilation algorithms would be avoided.

The conclusion that the analysis increments for the ensemble perturbations can be computed using a simpler data assimilation approach without significant impact on the accuracy of the ensemble forecasts could possibly be used to adapt other ensemble data assimilation schemes. For example, it suggests that the EDA approach used at ECMWF and Météo-France could be made much more efficient without a significant degradation to forecast skill. This could be accomplished by using the full 4DVar only for updating the ensemble mean and a much cheaper 3DVar (possibly assimilating fewer observations) for the ensemble perturbations, instead of the current approach of using 4DVar for all members. The resulting reduction in computational cost would facilitate an increase in the ensemble size and thus improve the usefulness of the resulting ensembles for application in both the ensemble and deterministic prediction systems.

Additional research is required to determine whether simplifications similar to those employed in this study for computing the ensemble perturbation increments would also have little impact on ensemble forecast accuracy in other types of systems. For example, the VarEnKF approach could be tested in a higher-resolution NWP system with a more rapid cycling strategy (e.g., 3-hourly or hourly cycling). More numerical experiments are also needed to confirm the results presented in this study for a global system over longer test periods and during different seasons before proposing any changes to the operational system.

Acknowledgments. The authors thank Martin Charron and two anonymous reviewers whose comments helped to improve an earlier version of the paper.

REFERENCES

Anderson, E., and H. Järvinen, 1999: Variational quality control. Quart. J. Roy. Meteor. Soc., 125, 697–722, doi:10.1002/qj.49712555416.
Berre, L., H. Varella, and G. Desroziers, 2015: Modelling of flow-dependent ensemble-based background-error correlations using a wavelet formulation in 4D-Var at Météo-France. Quart. J. Roy. Meteor. Soc., 141, 2803–2812, doi:10.1002/qj.2565.
Buehner, M., 2005: Ensemble-derived stationary and flow-dependent background-error covariances: Evaluation in a quasi-operational setting. Quart. J. Roy. Meteor. Soc., 131, 1013–1043, doi:10.1256/qj.04.15.
——, and A. Shlyaeva, 2015: Scale-dependent background-error covariance localisation. Tellus, 67A, 28027, doi:10.3402/tellusa.v67.28027.
——, P. L. Houtekamer, C. Charette, H. L. Mitchell, and B. He, 2010: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and single-observation experiments. Mon. Wea. Rev., 138, 1550–1566, doi:10.1175/2009MWR3157.1.
——, J. Morneau, and C. Charette, 2013: Four-dimensional ensemble-variational data assimilation for global deterministic weather prediction. Nonlinear Processes Geophys., 20, 669–682, doi:10.5194/npg-20-669-2013.
——, and Coauthors, 2015: Implementation of deterministic weather forecasting systems based on ensemble-variational data assimilation at Environment Canada. Part I: The global system. Mon. Wea. Rev., 143, 2532–2559, doi:10.1175/MWR-D-14-00354.1.
Buizza, R., P. L. Houtekamer, G. Pellerin, Z. Toth, Y. Zhu, and M. Wei, 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon. Wea. Rev., 133, 1076–1097, doi:10.1175/MWR2905.1.
Campbell, W. F., C. H. Bishop, and D. Hodyss, 2010: Vertical covariance localization for satellite radiances in ensemble Kalman filters. Mon. Wea. Rev., 138, 282–290, doi:10.1175/2009MWR3017.1.
Candille, G., C. Côté, P. L. Houtekamer, and G. Pellerin, 2007: Verification of an ensemble prediction system against observations. Mon. Wea. Rev., 135, 2688–2699, doi:10.1175/MWR3414.1.
Clayton, A. M., A. C. Lorenc, and D. M. Barker, 2013: Operational implementation of a hybrid ensemble/4D-Var global data assimilation system at the Met Office. Quart. J. Roy. Meteor. Soc., 139, 1445–1461, doi:10.1002/qj.2054.
Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, doi:10.1002/qj.828.
Desroziers, G., L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of observation, background and analysis-error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, 3385–3396, doi:10.1256/qj.05.108.
Fairbairn, D., S. R. Pring, A. C. Lorenc, and I. Roulstone, 2014: A comparison of 4DVar with ensemble data assimilation methods. Quart. J. Roy. Meteor. Soc., 140, 281–294, doi:10.1002/qj.2135.
Gagnon, N., X.-X. Deng, P. L. Houtekamer, S. Beauregard, A. Erfani, M. Charron, R. Lahlou, and J. Marcoux, 2014: Improvements to the Global Ensemble Prediction System (GEPS) from version 3.1.0 to 4.0.0. Canadian Meteorological Centre Tech. Note, 49 pp. [Available online at http://collaboration.cmc.ec.gc.ca/cmc/CMOI/product_guide/docs/lib/technote_geps-400_20141118_e.pdf.]
——, and Coauthors, 2015: Improvements to the Global Ensemble Prediction System (GEPS) from version 4.0.1 to version 4.1.1. Canadian Meteorological Centre Tech. Note, 36 pp. [Available online at http://collaboration.cmc.ec.gc.ca/cmc/CMOI/product_guide/docs/lib/technote_geps-411_20151215_e.pdf.]


Girard, C., and Coauthors, 2014: Staggered vertical discretization of the Canadian Environmental Multiscale (GEM) model using a coordinate of the log-hydrostatic-pressure type. Mon. Wea. Rev., 142, 1183–1196, doi:10.1175/MWR-D-13-00255.1.
Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919, doi:10.1175/1520-0493(2000)128<2905:AHEKFV>2.0.CO;2.
Heilliette, S., and L. Garand, 2015: Impact of accounting for interchannel error covariances at the Canadian Meteorological Center. Oral Proceedings of the 2015 EUMETSAT Meteorological Satellite Conference, Session 1, Toulouse, France, EUMETSAT. [Available online at www.eumetsat.int/website/home/News/ConferencesandEvents/PreviousEvents/DAT_2305526.html.]
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570, doi:10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.
Houtekamer, P. L., L. Lefaivre, J. Derome, H. Ritchie, and H. L. Mitchell, 1996: A system simulation approach to ensemble prediction. Mon. Wea. Rev., 124, 1225–1242, doi:10.1175/1520-0493(1996)124<1225:ASSATE>2.0.CO;2.
——, H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with the ensemble Kalman filter: Results with real observations. Mon. Wea. Rev., 133, 604–620, doi:10.1175/MWR-2864.1.
——, ——, and X. Deng, 2009: Model error representation in an operational ensemble Kalman filter. Mon. Wea. Rev., 137, 2126–2143, doi:10.1175/2008MWR2737.1.
——, X. Deng, H. L. Mitchell, S.-J. Baek, and N. Gagnon, 2014a: Higher resolution in an operational ensemble Kalman filter. Mon. Wea. Rev., 142, 1143–1162, doi:10.1175/MWR-D-13-00138.1.
——, B. He, and H. L. Mitchell, 2014b: Parallel implementation of an ensemble Kalman filter. Mon. Wea. Rev., 142, 1163–1182, doi:10.1175/MWR-D-13-00011.1.
Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126, doi:10.1016/j.physd.2006.11.008.
Isaksen, L., M. Bonavita, R. Buizza, M. Fisher, J. Hasler, M. Leutbecher, and L. Raynaud, 2010: Ensemble of data assimilations at ECMWF. ECMWF Tech. Memo. 636, 48 pp. [Available online at http://www.ecmwf.int/sites/default/files/elibrary/2010/10125-ensemble-data-assimilations-ecmwf.pdf.]
Kleist, D., and K. Ide, 2015: An OSSE-based evaluation of hybrid variational-ensemble data assimilation for the NCEP GFS. Part II: 4DEnVar and hybrid variants. Mon. Wea. Rev., 143, 452–470, doi:10.1175/MWR-D-13-00350.1.
Lei, L., and J. S. Whitaker, 2015: Model space localization is not always better than observation space localization for assimilation of satellite radiances. Mon. Wea. Rev., 143, 3948–3955, doi:10.1175/MWR-D-14-00413.1.
Lorenc, A. C., 2003: The potential of the ensemble Kalman filter for NWP—A comparison with 4D-Var. Quart. J. Roy. Meteor. Soc., 129, 3183–3203, doi:10.1256/qj.02.132.
——, N. Bowler, A. Clayton, and S. Pring, 2014: Development of the Met Office’s 4DEnVar System. Sixth EnKF Data Assimilation Workshop, Buffalo, NY, Pennsylvania State University. [Available online at http://hfip.psu.edu/fuz4/EnKF2014/EnKF-Day1/Lorenc_4DEnVar.pptx.]
Magnusson, L., M. Leutbecher, and E. Källén, 2008: Comparison between singular vectors and breeding vectors as initial perturbations for the ECMWF Ensemble Prediction System. Mon. Wea. Rev., 136, 4092–4104, doi:10.1175/2008MWR2498.1.
Mitchell, H. L., and P. L. Houtekamer, 2009: Ensemble Kalman filter configurations and their performance with the logistic map. Mon. Wea. Rev., 137, 4325–4343, doi:10.1175/2009MWR2823.1.
National Weather Service, 2015: Technical implementation notice 15-43: Global Ensemble Forecast System (GEFS). (Changes effective 2 December 2015), accessed 8 January 2016. [Available online at http://www.nws.noaa.gov/os/notification/tin15-43gefsaad.htm.]
Parrish, D., and J. Derber, 1992: The National Meteorological Center’s spectral statistical interpolation analysis scheme. Mon. Wea. Rev., 120, 1747–1763, doi:10.1175/1520-0493(1992)120<1747:TNMCSS>2.0.CO;2.
Qaddouri, A., C. Girard, L. Garand, A. Plante, and D. Anselmo, 2015: Changes to the Global Deterministic Prediction System (GDPS) from version 4.0.1 to version 5.0.0—Yin–Yang grid configuration. Canadian Meteorological Centre Tech. Note, 59 pp. [Available online at http://collaboration.cmc.ec.gc.ca/cmc/CMOI/product_guide/docs/lib/technote_gdps-500_20151215_e.pdf.]
Raynaud, L., and F. Bouttier, 2016: Comparison of initial perturbation methods for ensemble prediction at convective scale. Quart. J. Roy. Meteor. Soc., 142, 854–866, doi:10.1002/qj.2686.
Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490, doi:10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2.
Wang, X., and T. Lei, 2014: GSI-based four dimensional ensemble-variational (4DEnsVar) data assimilation: Formulation and single resolution experiments with real data for NCEP Global Forecast System. Mon. Wea. Rev., 142, 3303–3325, doi:10.1175/MWR-D-13-00303.1.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924, doi:10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.
——, and ——, 2012: Evaluating methods to account for system errors in ensemble data assimilation. Mon. Wea. Rev., 140, 3078–3089, doi:10.1175/MWR-D-11-00276.1.
