FEBRUARY 2009 RUIZ ET AL. 319

Comparison of Methods Used to Generate Probabilistic Quantitative Precipitation Forecasts over South America

JUAN RUIZ AND CELESTE SAULO
Centro de Investigaciones del Mar y la Atmósfera–CONICET/University of Buenos Aires, and Departamento de Ciencias de la Atmósfera y los Océanos, Facultad de Ciencias Exactas y Naturales, University of Buenos Aires, Buenos Aires, Argentina

EUGENIA KALNAY Department of Atmospheric and Oceanic Sciences, University of Maryland, College Park, College Park, Maryland

(Manuscript received 6 December 2007, in final form 22 August 2008)

ABSTRACT

In this work, the quality of several probabilistic quantitative precipitation forecasts (PQPFs) is examined. The analysis is focused over South America during a 2-month period in the warm season. Several ways of generating and calibrating the PQPFs have been tested, using different ensemble systems and single-model runs. Two alternative calibration techniques (static and dynamic) have been tested. To take into account different precipitation regimes, PQPF performance has been evaluated over two regions: the northern part of South America, characterized by a tropical regime, and the southern part, where synoptic-scale forcing is stronger. The results support the adoption of such area separation, since differences in the precipitation regimes produce significant differences in PQPF performance. The most skillful PQPFs are those obtained after calibration. PQPFs derived from the ensemble mean also show higher skill and better reliability than those derived from the single ensemble members. The performance of the PQPFs derived from both ensemble systems is similar over the southern part of the region; however, over the northern part the superensemble approach seems to achieve better results in both reliability and skill. Finally, the impact of using Climate Prediction Center morphing technique (CMORPH) estimates to calibrate the precipitation forecast has been explored, since the more extensive coverage of this dataset would allow its use over areas where the rain gauge coverage is insufficient. Results suggest that systematic biases present in the CMORPH estimates produce only a slight degradation of the resulting PQPF.

1. Introduction

Quantitative precipitation forecasts (QPFs) are one of the most difficult and least accurate products available via numerical weather prediction (NWP) (Ebert 2001; Stensrud and Yussouf 2007). Moreover, the continuous increase in model resolution poses an extra challenge to QPFs given the highly unpredictable character of mesoscale precipitation features. According to Zhang et al. (2002, 2006), the forecasted details of precipitation patterns for a particular day are sensitive to the initial conditions, the model resolution, and even to the assimilation of single observations. Continuous efforts are devoted to improving forecast quality, with ensemble forecasting being an example of one possible strategy for dealing with errors arising from uncertainties in the initial conditions (Toth and Kalnay 1993, 1997; Molteni et al. 1996, and many others). Although initially conceived for use in medium- to long-range global forecasts, in the last few years ensemble forecasting has been tailored to short-range weather prediction through the use of regional model ensembles (Du et al. 1997; Hamill and Colucci 1997). This possibility makes ensemble systems more appealing to operational and/or research centers with limited computational resources, as in our own case. An interesting characteristic of ensemble systems is that probability forecasts can easily be created, leading to the generation of probabilistic QPFs (PQPFs) (Du et al. 1997, among others).

Corresponding author address: Juan Ruiz, Centro de Investigaciones del Mar y la Atmósfera, Facultad de Ciencias Exactas y Naturales, Ciudad Universitaria, Pabellón II, 2do Piso, Buenos Aires 1428, Argentina. E-mail: [email protected]

DOI: 10.1175/2008WAF2007098.1

© 2009 American Meteorological Society

WEATHER AND FORECASTING, VOLUME 24

Different methodologies for obtaining PQPFs, and corresponding measures to quantify their usefulness, have been developed. Of particular interest is how to obtain a reliable PQPF, that is, a system where the forecasted frequency of a particular weather phenomenon is close to the observed probability. The importance of PQPF reliability is directly related to its effect upon the economic value of the forecast. As discussed by Zhu et al. (2002) for a simple cost–loss analysis of the forecast economic value, the optimum probability threshold to take protection from a particular weather phenomenon can be determined theoretically, provided that the information is reliable. However, as shown by Hamill and Colucci (1998, hereafter HC98), the PQPFs derived directly from the ensemble are usually not reliable, since model errors and the methods selected to construct the ensemble introduce biases in the forecasted probabilities.

Several techniques have been developed to generate reliable PQPFs. For example, Hamill and Colucci (1997) introduced a technique based on rank histograms constructed with previous forecasts, which are then used to calibrate the PQPF. This PQPF proved to be more reliable than the uncalibrated forecasts. This technique was further investigated and improved by Eckel and Walters (1998), who performed a more detailed analysis of the dependence of the rank histograms on ensemble spread. Another interesting alternative arises from creating PQPFs based on the ensemble mean (HC98), which can be as reliable as the calibrated ensemble PQPF. Gallus and Segal (2004) applied this idea to a single-model run and showed that the PQPF derived from a deterministic forecast is reliable and provides good guidance to forecasters. These results should be taken into account when a cost–benefit strategy is under consideration: Are PQPFs derived from ensemble systems reliable, skillful, and valuable? If so, do they outperform those generated from single-model runs, which are much cheaper and faster to obtain?

To address this issue, various PQPFs generated through two different regional ensemble systems and through individual model runs over South America are evaluated in this work. One regional ensemble system is based on the scaled lagged averaged forecasting (SLAF) technique (Ebisuzaki and Kalnay 1991), which is one of the simplest and computationally most inexpensive methods to incorporate uncertainties in the initial and boundary conditions into a regional ensemble. The other one is based on a mixed global–regional model ensemble [Super Model Ensemble System (SMES)] approach (Silva Dias et al. 2006). As has been discussed by Krishnamurti et al. (1999) and Ebert (2001), PQPFs based on such multicenter "superensemble" approaches have the potential to produce reliable forecasts, since they incorporate both uncertainty in the initial conditions (members start from different times or from different analyses) and model errors (particularly in the representation of subgrid-scale processes) because of their multimodel nature, while being essentially cost free in terms of computer use.

The effects of static and dynamic calibrations upon PQPF quality are assessed in this study. Previous work (e.g., HC98) used a fixed dataset to calibrate the PQPFs (i.e., static calibration), but here a dynamic calibration dataset, consisting of data from days previous to the actual forecast date, is also tested. The potential benefit of the dynamic calibration is that it includes information about the current weather regime in the calibration process, while it is not affected by model changes, since it is recalculated every day at very low computational cost. These advantages, if accompanied by a level of performance similar to that obtained with a static calibration procedure, make this alternative very attractive within an operational framework.

PQPFs obtained via the combination of the abovementioned ensemble systems and calibration techniques are analyzed through the computation of several scores that also allow for comparison with results obtained in previous works. To the authors' knowledge, this is the first time that a comparison of a wide variety of PQPFs has been carried out, especially for South America, a region that is limited by poor data coverage.

The lack of enough rain gauge precipitation data to perform the calibration process is one of the main constraints in applying the previous methodologies to operational PQPFs over this region. Moreover, the amount of rain gauge data is usually insufficient to even perform a proper verification of the precipitation forecasts (Saulo and Ferreira 2003). However, in the last decade, precipitation datasets such as the Climate Prediction Center morphing technique (CMORPH; Joyce et al. 2004), which combine microwave estimates of precipitation with high-temporal-resolution IR estimates of cloud motion, have become available. CMORPH has a homogeneous regional coverage and high spatial as well as temporal resolution (30-min accumulated precipitation). For these reasons, we have explored the potential of using CMORPH data for PQPF calibration, since it could be an interesting alternative over this and other regions where the gauge network is too coarse.

This work is organized as follows: in section 2, the datasets used for verification and calibration and the methodologies used to obtain PQPFs are presented. This section also introduces the scores selected to describe PQPF

attributes. The results obtained with the different PQPFs are analyzed in section 3, while section 4 contains a discussion and suggestions for future research.

2. Data and methodology

a. The construction of regional ensemble systems

1) THE SLAF–WRF ENSEMBLE SYSTEM

This is a single-model, single-configuration ensemble based on version 2.0 of the Advanced Research Weather Research and Forecasting model (ARW-WRF; Skamarock et al. 2005). The grid follows a Mercator projection with a horizontal resolution of approximately 50 km and 31 vertical sigma levels. The microphysics scheme utilized is the Eta grid-scale cloud and precipitation approach (EGCP; information online at http://www.emc.ncep.noaa.gov/mmb/mmbpll/eta12tpb/). Convection is parameterized using the Grell scheme (Grell and Devenyi 2002); radiative fluxes are treated following the rapid radiative transfer model (RRTM; Mlawer et al. 1997) and Dudhia (1989). Boundary layer processes are parameterized following Mellor and Yamada (1982), and the Noah four-layer surface model is used to represent surface processes (Chen and Dudhia 2001).

Different ensemble members are obtained through the perturbation of initial and boundary conditions using the SLAF technique (Ebisuzaki and Kalnay 1991). This technique was applied to a regional ensemble forecast for the first time during the Storm and Mesoscale Ensemble Experiment (SAMEX; Hou et al. 2001). As described by Hou et al. (2001), the SLAF technique seeks to create dynamically growing perturbations that can be applied both to initial and boundary conditions. These perturbations include "errors of the day"; that is, they are flow dependent. One of the major advantages of this technique is that no extra computational cost is required to generate the perturbations. In the SLAF method, perturbations to the initial conditions are the differences between previous global forecasts verifying at the same time and the analysis corresponding to that time. The perturbations are scaled in order to account for the fact that perturbations associated with "older" forecasts are usually larger than the ones associated with "younger" ones. Ensemble dispersion and root-mean-square errors (RMSEs) were previously analyzed for this particular ensemble configuration, showing dispersion growth rates comparable to other short-range ensemble systems.

Global Forecasting System (GFS) forecasts started up to 48 h before each initialization time were employed to generate the perturbations, so each ensemble is composed of four pairs of positively and negatively perturbed members (generated using the global forecasts started 12, 24, 36, and 48 h before the initialization time) plus a control run. The global forecast data are available through the National Oceanic and Atmospheric Administration's (NOAA) ftp site (ftpprd.ncep.noaa.gov). The SLAF ensemble runs were performed between 22 October 2005 and 27 November 2005 (period used for calibration) and also between 3 October 2006 and 31 December 2006 (period used for verification).

2) SUPER MODEL ENSEMBLE SYSTEM

The Super Model Ensemble System (SMES), available through the Meteorology Applied to Regional Weather Systems (MASTER) Laboratory at the University of São Paulo (Silva Dias et al. 2006), is a coordinated effort of several operational and research institutions involved in weather prediction for the South American region. The SMES is composed of more than 10 different models and more than 34 alternative configurations and settings. It also includes diverse initial conditions obtained by different data assimilation systems. In this work a subset of the SMES is selected to evaluate PQPFs over South America. The selection is based mainly on domain size: only the models that cover larger domains were considered, because this increases the number of rain gauges available for verification. Calibration and verification were only performed over the area where all the SMES subsample forecasts were available (see Fig. 1).

The individual members of the Centro de Previsão de Tempo e Estudos Climáticos (CPTEC) global ensemble were not included in the SMES in order to avoid a bias of the ensemble toward the behavior of this particular model; instead, the global ensemble mean was treated as a member of the SMES. The models included in this subset of the MASTER superensemble and their basic characteristics are listed in Table 1.

Most of the SMES members included in the subsample are initialized and archived twice a day at 0000 and 1200 UTC; for these cases, both runs are included as different ensemble members. These members can be considered to be representative perturbations of initial and boundary conditions for a particular model configuration based on lagged averaged forecasts (Dalcher et al. 1988). In this way the SMES takes into account not only model errors but also errors in the initial and boundary conditions. This also holds for the CPTEC global ensemble mean, which intrinsically contains information about different initial conditions. On the other hand, the Medium-Range Forecast model (MRF), the Eta Model at the University of Maryland, and the Coupled Aerosol and Tracer Transport Model within the Brazilian Developments on the Regional Atmospheric Modeling System (CATT-BRAMS) are only archived once a day at 0000 UTC.
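The SLAF perturbation construction described in section 2a(1) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the array names are hypothetical, and the simple age-based scaling (weighting older forecast-minus-analysis differences by the inverse of their age) is an assumption, since the exact scaling factors are not given here.

```python
import numpy as np

def slaf_members(analysis, lagged_forecasts, lags_h):
    """Build a SLAF-style ensemble from one analysis and a set of
    older global forecasts valid at the analysis time.

    analysis: 2D array (the current analysis field).
    lagged_forecasts: list of 2D arrays, forecasts started
        lags_h[i] hours before the analysis time.
    lags_h: forecast ages in hours, e.g. [12, 24, 36, 48].

    Each perturbation is a forecast-minus-analysis difference,
    scaled down with forecast age so that "older" differences do
    not dominate; each yields a +/- pair of perturbed members.
    """
    members = [analysis.copy()]              # control member
    for fcst, lag in zip(lagged_forecasts, lags_h):
        scale = lags_h[0] / lag              # assumed age-based scaling
        pert = scale * (np.asarray(fcst) - analysis)
        members.append(analysis + pert)      # positive member
        members.append(analysis - pert)      # negative member
    return members
```

With four lagged forecasts this yields the nine members used here: four positively/negatively perturbed pairs plus the control.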


FIG. 1. SLAF-ensemble domain (gray rectangle), SMES subsample domain (black rectangle), and GTS rain gauge locations (black dots). The black dashed line indicates the boundary between the NR and SR (see explanation in the text).

It should be noted that the SMES precipitation forecasts are stored and interpolated to a large number of stations in the domain area by MASTER (D. Soares Moreira 2007, personal communication). For this reason, the PQPF product derived here is constructed on the basis of station values, and not in grid space. SMES forecasts used in this study span from 1 September to 31 December 2006.

TABLE 1. List of members included in the SMES subsample used in this study.

Model                                    Resolution (km)   Reference
Eta 40                                   40                Mesinger et al. (1988)
Eta 20                                   20                Mesinger et al. (1988)
Global T213                              63*               Kinter et al. (1997)
BRAMS University of Buenos Aires (UBA)   80–20             http://www.bramsuba.com.ar/
HRM                                      30                http://www.mar.mil.br/dhn/chm/meteo/
Eta RPSAS                                40                http://www.cptec.inpe.br/~ioweb/index.shtml
BRAMS-CATT                               30                Freitas et al. (2006)
WRF                                      60–20             Skamarock et al. (2005)
Eta University of Maryland (UMD)         80–22             Mesinger et al. (1988)
MRF                                      200*              Kalnay et al. (1990)
Ensemble CPTEC                           100*              Mendonça and Bonatti (2006)

* The approximate resolution of the spectral model.

b. Calibration process

1) REFERENCE DATASETS

Two different datasets were used to perform the forecast calibration and verification. Most of the study has been performed using rain gauge precipitation observations from the Global Telecommunication System (GTS), which provides 24-h accumulated precipitation at individual stations, measured at 1200 UTC. The number of stations over the domain of interest is 251, and their locations are indicated in Fig. 1. Only precipitation values under 300 mm day⁻¹ were considered in the calibration–verification process. All of the verification scores and diagrams discussed later in the text are computed by comparing the forecasts, interpolated to the rain gauge locations, with the observed precipitation at these locations.

As mentioned in the introduction, an alternative to the coarse rain gauge network over South America is the use of passive microwave–IR precipitation estimates (e.g., CMORPH; information online at http://www.cpc.ncep.noaa.gov/products/janowiak/cmorph_description.html), which have a horizontal resolution of 8 km and a temporal resolution of 30 min (Joyce et al. 2004). This particular dataset has been selected because in this type of algorithm the use of microwave data produces better


FIG. 2. Rank histograms for the SMES during the whole experimental period at 48-h forecast time: (a) low dispersion, (b) medium dispersion, (c) high dispersion, and (d) total for the NR. (e)–(h) Same as in (a)–(d) but for the SR.

estimates of the precipitation amounts than the previous infrared-based estimations (Joyce et al. 2004; Ebert et al. 2007). CMORPH performance and its potential to be used for verification purposes over the region of interest have been addressed in Kousky et al. (2006) and Ruiz (2008, manuscript submitted to Rev. Bras. Meteor., hereafter RRBM). For comparison with the GTS rain gauge data, the 0.25°-resolution CMORPH estimates were interpolated using the nearest neighbor to the GTS station locations. To assess the potential error resulting from using these precipitation estimates, only the CMORPH data interpolated to the GTS stations will be used for calibration.

2) STATIC AND DYNAMIC CALIBRATION

Uncalibrated PQPFs can be directly calculated from any ensemble system: for example, if 7 out of 10 members forecast precipitation above a specified threshold, the forecasted precipitation probability for this threshold is 70%. However, it has been shown that uncalibrated PQPFs do not represent the observed probability; that is, they are not reliable, as can be inferred from the rank histograms in Fig. 2. To deal with this limitation, different calibration strategies based on the use of rank histograms are tested (refer to HC98 for a detailed explanation of the method). The HC98 technique uses the rank histogram of previous forecasts to compute the probability associated with each precipitation threshold. The only difference with HC98 is in the estimation of probabilities of threshold exceedance for thresholds beyond the maximum ensemble forecast value. For these thresholds, instead of using the parametric distribution with the best fit to the probability distribution of precipitation, we have estimated this probability directly from the data for the corresponding period.

As discussed in Hamill and Colucci (1997) and Eckel and Walters (1998), the rank histogram is highly dependent on the ensemble spread.
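The raw probability computation and the verification rank histogram described above can be sketched as follows. This is an illustrative sketch only: the array shapes and names are assumptions, and ties between the observation and ensemble members (e.g., many members and the gauge all reporting zero precipitation) are ignored here, whereas the study treats null-dispersion cases separately.

```python
import numpy as np

def uncalibrated_pqpf(ens_precip, threshold):
    """Raw PQPF: fraction of members whose forecast exceeds the
    threshold; e.g., 7 of 10 members above it gives 0.7.

    ens_precip: array of shape (n_members, n_stations).
    """
    return (np.asarray(ens_precip, float) > threshold).mean(axis=0)

def rank_histogram(ens_forecasts, observations):
    """Verification rank histogram: for each case, count how many
    sorted ensemble members fall below the observed value.

    ens_forecasts: (n_cases, n_members); observations: (n_cases,).
    Returns n_members + 1 bin counts; bin 0 means the observation
    fell below every member, the last bin above every member.
    """
    ens = np.sort(np.asarray(ens_forecasts, float), axis=1)
    obs = np.asarray(observations, float)[:, None]
    ranks = (ens < obs).sum(axis=1)          # members below the obs
    return np.bincount(ranks, minlength=ens.shape[1] + 1)
```

In an HC98-style calibration, the normalized histogram counts accumulated from past forecasts give the probability mass assigned between consecutive sorted members of the current forecast.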

To take into account this dependence, three dispersion categories are defined (low, medium, and high) and different rank histograms are computed for each category (Fig. 2). These categories are equally probable, since the thresholds selected to define them are the terciles of the dispersion distribution (without taking into account the null-dispersion cases). The number of categories could be increased, as in Eckel and Walters (1998), to obtain a more detailed calibration, but in the present case the lack of data to compute various rank histograms would lead to small, and unrepresentative, samples for each category.

A relevant issue for the area of interest is the sensitivity of rank histograms to different precipitation regimes and/or times of the year (Eckel and Walters 1998). In the present study the geographical dependence of the rank histogram is handled through the division of the domain into two main subdomains separated by the 20°S parallel. We will refer to these regions as the northern region (NR), broadly characterized by a tropical regime, and the southern region (SR), which is more affected by midlatitude synoptic variability (Fig. 1). Important differences in individual model performance over these two areas (Ruiz et al. 2006a) further support this geographical division. The seasonal dependence of rank histograms is also taken into account in both calibration strategies, as will be discussed in what follows.

The essential difference between the static and the dynamic calibrations arises from the choice of the time periods employed to perform the calibration. In the static calibration the rank histograms are computed using ensemble forecasts for a similar period but from a previous year (or years, if available), keeping in some way the information about the "typical" precipitation regime during a certain month (or season) of the year. In this study, this calibration could only be applied to the SLAF ensemble, since the SMES model outputs were not available from the previous year. SLAF forecasts starting on 22 October 2005 and ending on 27 November 2005 were used to perform the static calibration. Notice that the period used for calibration is not exactly the same as the period used for the evaluation. This limitation is related to the availability of model data and could not be handled in any other way.

On the other hand, in the dynamic calibration the rank histograms are computed using a fixed number of forecasts verifying x number of days prior to the actual forecast day. In this case, 1000 verified forecasts are used to construct each rank histogram, which is equivalent to the 15–20 days prior to the initialization time of the ensemble forecast. The exact number of days depends on the availability of previous forecasts. This time window is similar to that used by Silva Dias et al. (2006) for the calibration of 2-m temperature and by Stensrud and Yussouf (2007). In this way, the current weather regime is taken into account. This advantage cannot be assured when data from previous years and/or long time periods are employed, although with the dynamic calibration there is a slight shift in the seasonal cycle used for training. A drawback of this implementation becomes clear over a region with a limited number of observations, since the use of few data could lead to a noisier rank histogram, which in turn could degrade the calibration. A way to avoid this problem would be to use satellite precipitation estimates as the verifying truth to construct the rank histogram, since this would increase the number of points available for calibration. This particular issue will be explored in section 3c of this work.

The dynamic calibration has been applied to both ensemble systems. In the case of the SMES, the implementation is more complex, since it is possible that not all the ensemble members are available for a particular day and/or for the whole calibration period. To avoid this source of variability in the calibration, a minimum number of 10 models (past and current forecast data) is required to start the calibration process. In the case of the SMES the rank histogram has to be recomputed every day for the available members of the ensemble (because the number of bins changes as the number of members changes). Consequently, the number of members in the SMES is as large as or larger than the number of members in the SLAF ensemble.

Figure 2 shows the rank histograms for the SMES over the southern and northern regions, which illustrate some of the issues discussed above. There is a strong wet bias in the SMES, especially in the NR, as denoted by the higher frequency of the truth lying in the first ranks of the histogram (e.g., Figs. 2b–d). A strong dependence of histogram shape upon ensemble spread can be identified: for cases where the ensemble dispersion is high, the rank histogram denotes an underdispersive ensemble (most of the verifications lie in the lower ranks). This behavior is similar to that observed in the SLAF ensemble rank histogram (not shown) and also to that reported in previous studies, and suggests a relationship between the ensemble spread and the model error. For low ensemble dispersion (high agreement among the forecasts; Figs. 2a and 2e), the probability of the truth falling outside the ensemble members (bins 0 or 17) is low, indicating a low probability of large errors. As the ensemble spread increases, the probability of the truth falling outside the ensemble range grows, indicating an increase in the probability of larger errors in the precipitation forecast. However, as discussed in HC98, a strong relation exists between model error and the ensemble mean precipitation amount: increasing


TABLE 2. Summary of the different ensemble systems and calibration strategies adopted in this work and their corresponding training and verification periods.

PQPF                 Type of calibration   Training period (*)   Verification period
SLAF ensemble        Static and dynamic    22 Oct–27 Nov 2005    28 Oct–31 Dec 2006
SLAF ensemble mean   Static and dynamic    22 Oct–27 Nov 2005    28 Oct–31 Dec 2006
SLAF control         Static and dynamic    22 Oct–27 Nov 2005    28 Oct–31 Dec 2006
SMES ensemble        Dynamic               —                     18 Oct–31 Dec 2006
SMES ensemble mean   Dynamic               —                     18 Oct–31 Dec 2006
SMES BEM             Dynamic               —                     18 Oct–31 Dec 2006

* The training period is only indicated for the static calibration. forecasted precipitation leads to larger ensemble spread selected scores). For this particular period the BEM was and larger ensemble error. To test the ability of the the operational run of the Eta Model at CPTEC with a ensemble spread to forecast model skill, the depen- horizontal resolution of 20 km. dence of the ensemble spread and error upon the en- Figure 3 shows the relation between PoP and fore- semble mean was removed. After this procedure the casted accumulated precipitation above different correlation between the spread and error was small (i.e., thresholds (0.254, 2.54, 12.7, and 50.8 mm), for the pe- correlation values around 0.02) for both ensemble systems. riod between 1 October and 31 December 2006 over the c. Construction of probability forecasts from SR for the SLAF control run and for the SLAF en- single runs semble mean. A strong relationship exists between both variables, and similar results are obtained for the SMES As has been discussed by HC98, the ensemble mean BEM and ensemble mean (not shown). As can be seen can be used to generate a reliable and competitive from Fig. 3, the PQPF obtained from the ensemble PQPF forecast. This method relies on the existence of a mean possesses slightly better resolution than the PQPF relation between forecasted precipitation and the derived from the control run, as inferred from the wider probability of rainfall above a certain threshold, as in a range in observed precipitation frequency. conditional probability calculation. Here, a technique d. Scores selected for verification similar to that documented by HC98 has been employed: the forecasted precipitation is divided into There are a variety of attributes that contribute to several categories and for each category a sample of forecast quality (Murphy 1993). Our choice of scores is forecasts and observations is obtained. 
Using this sam- such that at least three of these attributes can be ple, the probability for each precipitation threshold quantified: reliability, skill, and resolution. As stated within each category can be evaluated (e.g., what is the before, the main goal of the calibration process is to probability of rainfall above 2.5 mm given that the increase PQPF reliability. Reliability diagrams are used forecasted precipitation is between 20 and 30 mm?). in this paper to study the relation between the fore- After the computation of the probability of each casted probability and the observed frequency of an threshold at each precipitation category, a curve can be event. Usually, these diagrams are complemented with fitted to determine the probability of precipitation a measure of forecast resolution, given by the number of (PoP) above a certain threshold as a function of the times that each probability is forecasted by the PQPF forecasted precipitation. As the PoP is obtained directly system. A desirable characteristic of a PQPF is that from previous data, the dynamic and static strategies most of the time the forecasted probability is near 0% or will also be applied to this PQPF. 100%, meaning that the forecast has the ability to dis- This methodology can be applied not only to the en- criminate between the occurrence and nonoccurrence semble mean but also to individual model runs (Gallus of the event. Another common measure of reliability and Segal 2004). In this work, PQPFs from both en- and resolution is given by the Brier skill score (BSS; semble means (SLAF and SMES) as well as from the Buizza et al. 2005), defined in Eq. (1), where N is the

SLAF control run (CR) and the SMES best ensemble total number of observations, pi is the predicted prob- member (BEM) were obtained (see Table 2 for a syn- ability, and oi is 1 when the event did occur and 0 when thesis of the PQPFs analyzed throughout this study). the event did not occur. In addition, BR is the Brier The BEM has been determined to be the individual score (Brier 1950) [see Eq. (2)] of the forecast and member with the highest relative operating character- BRclim stands for the BR of the climatology. According istic (ROC) area, best equitable threat score (ETS), and to Hamill and Juras (2006) the BR of the climatology bias (see subsequent section for a brief discussion of should be computed by taking into account the spatial

Unauthenticated | Downloaded 09/29/21 10:36 PM UTC 326 WEATHER AND FORECASTING VOLUME 24

FIG. 3. Probability of precipitation above different thresholds as a function of forecasted precipitation amount (mm) as calculated from the (a) SLAF control and (b) SLAF ensemble mean for the 0.254- (black solid line), 2.54- (black dashed line), 12.7- (open circles), and 50.8- mm (triangles) precipitation thresholds during the period between 1 Oct 2006 and 31 Dec 2006 over the SR. A logarithmic scale is used in the abscissas.

and temporal variabilities of the climatology. In the present case, regions of 2.5° × 2.5° were used and 3 yr of precipitation observations corresponding to the same time of the year were taken into account to compute the BR of the climatology. This 3-yr period includes the year of the experiment. To take into account the temporal variability of the climatology, this computation was performed for each month of the period under consideration. The BR of the climatology for each month and each region was averaged over the NR and the SR using weights proportional to the amount of data available. The BSS ranges from minus infinity to one, where one is the perfect score and zero indicates that the performance of the PQPF is similar to that of the climatology (i.e., the forecast has no added value). As stated in Buizza et al. (2005), the BR can be partitioned into three components as shown in Eq. (3), where the first term on the r.h.s. of Eq. (3) is the reliability, the second term is the resolution, and the last one is the uncertainty. While the uncertainty component depends only on the characteristics of the climatology (i.e., the probability of exceedance of a particular precipitation threshold), the other two components depend on the forecast:

    BSS = 1 − BR/BR_clim,                                                   (1)

    BR = (1/N) Σ_{i=1}^{N} (p_i − o_i)²,  and                               (2)

    BR = (1/N) Σ_k n_k (f_k − o_k)² − (1/N) Σ_k n_k (o_k − ō)² + ō(1 − ō).  (3)

To compute the BR decomposition, the forecasted probability range should be divided into k ranks (10 in this study). At each rank, f_k is the forecasted probability at the center of the kth rank, o_k is the observed probability at rank k, and n_k is the number of times that the forecast falls in the kth rank. In addition, N is the size of the full verification sample and ō is the observed probability over the full sample.

Skill is assessed through the equitable threat score (ETS; Schaefer 1990). This score accounts for the number of hits by chance and ranges from −1/3 to 1, the latter being the value for a perfect forecast. As described by Hamill (1999), the ETS is dependent on the forecast bias (computed as the ratio between the forecasted frequency of an event and its observed frequency) in such a way that the ETS of a forecast can be increased or decreased by only modifying the bias, so that the ETS of two forecasts with significantly different biases cannot be compared. In the present case, given that an ETS for each probability threshold can be constructed, and each ETS diagram will have a different bias, the limitation discussed by Hamill (1999) is potentially worse. To solve this problem, an ETS–bias diagram is used in this paper, which allows for a more meaningful comparison of the ETS from the different PQPFs.
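The scores in Eqs. (1)-(3) are straightforward to compute. A minimal sketch follows, with illustrative names, using the rank convention described above (probabilities binned into equally spaced ranks, with f_k taken at the rank center); this is a sketch of the standard formulas, not the authors' code.

```python
# Brier score, Brier skill score, and the reliability/resolution/uncertainty
# partition of Eqs. (1)-(3).
import numpy as np

def brier(p, o):
    """Eq. (2): BR = (1/N) * sum (p_i - o_i)^2, with o_i in {0, 1}."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

def brier_skill(p, o, br_clim):
    """Eq. (1): BSS = 1 - BR / BR_clim."""
    return 1.0 - brier(p, o) / br_clim

def brier_decomposition(p, o, n_ranks=10):
    """Eq. (3): returns (reliability, resolution, uncertainty) so that
    BR ~= reliability - resolution + uncertainty."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    N, obar = len(p), o.mean()
    k = np.minimum((p * n_ranks).astype(int), n_ranks - 1)  # rank index
    rel = res = 0.0
    for b in range(n_ranks):
        mask = k == b
        n_k = mask.sum()
        if n_k:
            f_k = (b + 0.5) / n_ranks  # forecasted probability at rank center
            o_k = o[mask].mean()       # observed frequency in the rank
            rel += n_k * (f_k - o_k) ** 2
            res += n_k * (o_k - obar) ** 2
    unc = obar * (1.0 - obar)
    return rel / N, res / N, unc

# Toy sample: forecast probabilities and binary outcomes.
p = [0.05, 0.05, 0.95, 0.95]
o = [0, 0, 1, 1]
rel, res, unc = brier_decomposition(p, o)
bss = brier_skill(p, o, br_clim=0.25)
```

Note that with f_k defined at the rank center (as in the text), the partition reproduces BR exactly only when the forecasted probabilities fall at the rank centers; otherwise a small residual remains.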


FIG. 4. Reliability diagrams for the 2.54-mm precipitation threshold. Uncalibrated forecast (dark gray solid line) and its confidence limits (dark gray dashed line), dynamic calibration (black solid line), static calibration (black dashed line, where available), ensemble-mean-based PQPF (open circles), and single-model-forecast-based PQPF (triangles) for (a) a 24-h forecast for the SLAF over the NR, (b) a 24-h forecast for the SMES over the NR, (c) a 24-h forecast over the SR for the SLAF, and (d) a 24-h forecast over the SR for the SMES. The inset shows the frequency of forecasts as a function of the forecasted probability. No-resolution and perfect-reliability lines are indicated with pale gray solid lines.
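An uncalibrated ensemble PQPF like the ones in these diagrams is commonly obtained as the fraction of members forecasting precipitation at or above the threshold; the paper does not spell out its exact construction here, so the sketch below is written under that common assumption, with illustrative names and toy values.

```python
# Uncalibrated ensemble PQPF: fraction of members at or above the threshold.
import numpy as np

def ensemble_pqpf(member_precip_mm, threshold_mm):
    """member_precip_mm: array of shape (n_members, n_points).
    Returns probabilities in [0, 1] with shape (n_points,)."""
    members = np.asarray(member_precip_mm, dtype=float)
    return (members >= threshold_mm).mean(axis=0)

# Toy 4-member ensemble at 3 grid points (24-h accumulations in mm).
members = [[0.0, 5.0, 30.0],
           [0.1, 1.0, 20.0],
           [0.0, 4.0, 60.0],
           [0.3, 3.0, 10.0]]
pop = ensemble_pqpf(members, threshold_mm=2.54)
```

Plotting the observed event frequency against these forecasted probabilities, binned as in the inset histograms, yields the reliability curves of Fig. 4.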

A bootstrap resampling technique to test the statistical significance of the difference between the scores obtained with the uncalibrated PQPF (derived from the SLAF and SMES) and the rest of the PQPFs has been employed, using a 90% confidence level and a sample size of 1000 (Hamill 1999).

3. Results

a. Reliability of the PQPFs

Reliability diagrams have been created for several thresholds as well as for 24- and 48-h forecasts, but only a sampling of these results is included in this section. Figure 4 shows the reliability diagrams corresponding to the 2.54-mm threshold: in all cases the different calibrations seem to improve the relation between the forecasted probability and the observed frequency of the event, in close agreement with what has been found by HC98 and Eckel and Walters (1998). When the static and dynamic calibrations are compared (black solid and dashed lines in Figs. 4a and 4c), the differences appear to be small. The ensemble-mean-, the CR-, and the BEM-based PQPFs also produce significant increases in forecast reliability compared with the uncalibrated PQPFs for the 2.54-mm threshold.

The SMES performs better over both the SR and the NR, but differences between the two ensemble systems seem to be more evident over the NR, where the maximum reliable probability for the SLAF is around 50% while for the SMES it shifts to 70%, indicating a significant improvement in the resolution for the SMES. This superiority of the superensemble approach may be due to its better estimation of the precipitation probability density function (PDF) over the NR (even more than over the SR) because it includes perturbations in the model physics. It is also possible that the configuration of the WRF model used for the SLAF experiment may not be adequate for rainfall prediction over tropical regions, thus degrading the quality of the PQPF forecast. Given that the SMES uncalibrated forecast resolution is higher than that of the SLAF (the flatter the curve in the reliability diagram, the less resolution it has), there is a smaller impact from calibration on the SMES reliability.
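The bootstrap significance test described above can be sketched as follows: resample paired score differences with replacement (the paper uses 1000 samples) and check whether the 90% confidence interval for the mean difference excludes zero. Function names and the toy scores are illustrative.

```python
# Bootstrap confidence interval for the mean difference of paired scores.
import numpy as np

def bootstrap_diff_ci(score_a, score_b, n_boot=1000, conf=0.90, seed=0):
    """Percentile CI for the mean of (score_a - score_b)."""
    diff = np.asarray(score_a, float) - np.asarray(score_b, float)
    rng = np.random.default_rng(seed)
    # Resample the paired differences with replacement, n_boot times.
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    means = diff[idx].mean(axis=1)
    lo, hi = np.percentile(means, [(1 - conf) / 2 * 100, (1 + conf) / 2 * 100])
    return lo, hi

# Toy BSS values for a calibrated vs. an uncalibrated PQPF:
lo, hi = bootstrap_diff_ci([0.30, 0.28, 0.35, 0.31], [0.20, 0.19, 0.22, 0.25])
significant = not (lo <= 0.0 <= hi)  # difference significant if CI excludes 0
```

In practice each bootstrap sample would resample verification days (or cases), recomputing both scores on the resampled set.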


FIG. 5. BSS as a function of the precipitation threshold (mm) for an uncalibrated forecast (dark gray solid line) and its confidence limits (dark gray dashed line), dynamic calibration (black solid line), static calibration [black dashed lines in (a) and (c) only], ensemble-mean-based PQPF (open circles), and single-model-forecast-based PQPF (triangles) for (a) a 24-h forecast for the SLAF over the NR, (b) a 24-h forecast for the SMES over the NR, (c) a 24-h forecast over the SR for the SLAF, and (d) a 24-h forecast over the SR for the SMES.

The forecasts' frequency distributions are also very different (see the small plots in the upper-left corner of each figure). Over the SR there is a maximum frequency at low probability values (most of the time the forecasts indicate near-zero values of PoP), while over the NR the shape of the curve for the uncalibrated forecast is similar but there is a significant increase in the frequency of forecasts with probability values around 50%. This behavior could be expected since the PoP above 2.54 mm is higher over the NR (tropics) than over the SR (subtropics and midlatitudes), and also the skill of the precipitation forecast is lower over the NR than over the SR, both factors contributing to the loss of resolution over the NR.

As can be seen, the calibration in most cases reduces the number of forecasts with high probability and increases the frequency of values around 50%. As a whole, the effect of the calibration upon the forecast distribution is to reduce their resolution. As explained by Eckel and Walters (1998), the uncalibrated forecasts overestimate higher probabilities. This means that usually, even when all ensemble members agree in indicating the occurrence of an event, there is still a relatively large probability that the event will not occur. Consistently, the calibration acts to reduce this tendency. This effect is evident not only for the SMES- and SLAF-calibrated PQPFs but also for the CR and BEM PQPFs. The results shown in Fig. 4 are representative of the behavior at 48-h forecasts too. In general, the maximum reliable probability decreases with increasing forecast lead time. For example, for the SMES at the 2.54-mm threshold, the maximum reliable forecasted probability is reduced by around 10% over the two regions.

Figure 5 shows that statistically significant improvement can be found in the BSS for the calibrated PQPFs, except for the SMES over the SR. As can be seen, the 5% significance interval (for the uncalibrated forecasts) gets wider as the precipitation threshold increases, indicating that the results become less stable due to the small number of high-precipitation events. The lack of improvement of the calibrated SMES over the BEM PQPFs (open triangles in Figs. 5b and 5d) suggests that the inclusion of members with lower skill in the ensemble could

be degrading the PQPF. In the case of the SLAF ensemble, the PQPF derived from the ensemble shows a significant improvement over the control PQPF, although the overall performance of the WRF ensemble is less than that of the BEM- and the SMES-calibrated PQPFs. The combination of these results suggests that significant improvement can be achieved by selecting the BEM and performing a single-model ensemble based on this particular model. If the improvement of the SLAF technique applied to this model is the same as that observed for the WRF ensemble, then such an ensemble would outperform the SMES over the SR and the NR.

Figure 5 also shows that the BSSs are better for the southern region. The uncalibrated forecasts over the NR exhibit negative values of BSS at lower thresholds for both ensemble systems. The results also show that the PQPF derived from the ensemble mean has the highest BSS at the lowest threshold, while its performance is quite similar to that of the calibrated ensemble PQPF. The CR- and BEM-derived PQPFs (open triangles in Fig. 5) also show good performances. This confirms the potential for single deterministic forecasts in the generation of a PQPF and suggests that forecast improvements introduced by the use of ensemble techniques should be assessed with respect to these low-cost PQPFs, and not just against the standard deterministic yes-no forecasts, which produce far worse results (not shown). This also applies to the measures of skill, as will be discussed later in this paper. The BSS also confirms the results obtained from the reliability diagrams with respect to the low sensitivity of the calibration to the use of the dynamic or the static approach (see Figs. 5a and 5c), with a tendency toward better performance with the latter. This might be because the static calibration uses a larger dataset to construct the rank histogram, which results in more stable rank histograms. Some additional tests have been performed, increasing the number of events considered for the dynamic calibration (i.e., increasing the number of verification observations to 2000 instead of 1000 shows similar behavior; still, differences between the dynamic and the static approaches remain small; not shown). The differences between the static and dynamic calibrations are also small for the case of PQPFs derived from the ensemble mean and the deterministic forecasts (SLAF control and BEM).

As described before, the BR is a combined measure of reliability and resolution. The reliability component of the BR for the different forecasts analyzed in this paper is shown in Fig. 6. As expected, calibration of the forecast produces a significant improvement in the reliability, which is more evident for the SLAF ensemble. This is mainly because the uncalibrated SMES PQPF is more reliable than the uncalibrated SLAF PQPF (cf., e.g., the uncalibrated reliabilities between Figs. 6c and 6d). The improvement achieved with the different calibration methodologies is quite similar; however, some significant differences arise for the SLAF ensemble over the NR and for the lowest threshold over the SR, indicating a slightly better level of performance for the calibration based on the ensemble mean and the control forecast.

With respect to the resolution component of the BR, Fig. 7 shows that calibration has little or no impact on this attribute. This is because the different calibration algorithms were developed to improve reliability. In Fig. 7 some interesting differences between the SMES and the SLAF ensembles arise. For the NR (Figs. 7a and 7b), the best results in terms of resolution are obtained with the SMES ensemble, which also shows an improvement over the BEM. The SLAF ensemble shows less resolution than the SMES and is closer to the resolution of the CR. As has been shown by Ruiz et al. (2006a), the SLAF ensemble shows little perturbation growth over the NR and a weaker correlation between ensemble spread and forecast error. This could indicate that the SLAF technique is not as effective as the superensemble over the NR. There is also a significant difference between the BEM and the SLAF CR over the NR, which could be expected given that the BEM (the Eta Model at CPTEC at 20-km resolution) has better skill than the WRF model (our CR). Over the SR, the resolution of the SLAF ensemble is slightly better than that of the SMES; moreover, the SMES ensemble shows no significant resolution improvement with respect to the BEM, while the SLAF ensemble shows a large improvement with respect to the CR and also a slight improvement with respect to the SMES BEM. This suggests that for the SR the SLAF technique improves forecast resolution, providing a better PQPF. On the other hand, the SMES technique produces more reliable forecasts over this region; however, the SLAF can achieve similar results after calibration. The dissimilar behavior of these ensemble systems could be due to the fact that the SLAF ensemble takes into account the "errors of the day." These kinds of perturbations will have a greater impact in areas where the flow is more affected by westerly waves, as is the case for the SR.

b. Skill of the PQPFs

Model skill was assessed with the ETS-bias combined diagram. Figures 8a and 8b show that the maximum ETS is reached near a bias of 1. The bias is highly dependent on the selected probability threshold (not shown); in the case of lower probability thresholds, for example, the bias increases because the number of times that the event is forecasted increases.
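The two quantities entering an ETS-bias diagram can be computed from the 2x2 contingency table of the yes-no forecast obtained at each probability threshold. A minimal sketch of the standard formulas (Schaefer 1990), with illustrative names:

```python
# Equitable threat score and frequency bias from a yes/no forecast.
import numpy as np

def ets_and_bias(forecast_yes, observed_yes):
    f = np.asarray(forecast_yes, bool)
    o = np.asarray(observed_yes, bool)
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    n = f.size
    # Expected number of hits by chance, given the marginal frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    # Bias: forecasted frequency of the event over its observed frequency.
    bias = (hits + false_alarms) / (hits + misses)
    return ets, bias

# Toy yes/no series (e.g., PQPF above some probability threshold vs. events):
ets, bias = ets_and_bias([True, True, True, False, False, False],
                         [True, True, False, True, False, False])
```

Sweeping the probability threshold from 0 to 1 traces out one ETS-bias curve per PQPF, which is what Figs. 8a and 8b compare.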


FIG. 6. As in Fig. 5 but for the reliability component of the BR.

PQPFs obtained from the ensemble outperformed those derived from the deterministic forecast at 24-h (Fig. 8a) and 48-h (not shown) forecast times. However, if the ETS were computed independently from the bias, the value of both QPFs would have been similar, particularly for lower precipitation thresholds. This is because the ensemble mean has a wet bias at lower precipitation thresholds, and this wet bias reduces its corresponding ETS when used as a deterministic forecast. The ensemble mean skill is similar to that of the calibrated PQPFs, but the control is much poorer. In the case of the SMES ensemble over the SR, the difference in skill between the ensemble mean and the BEM is small. This indicates less improvement by the SMES with respect to a single run, which is in agreement with the results obtained for the ensemble reliability.

From Fig. 8 it can also be seen that the calibration process does not affect model skill, which confirms that the selected methodologies lead to an increase in forecast reliability without changing the skill, as the differences between the ETSs for the calibrated and uncalibrated forecasts are small.

We note that the results so far were obtained by including all available days for each ensemble system, over slightly different time periods (Table 2). The analysis has been repeated using the subset of days where both ensemble systems were available, and the conclusions were essentially the same.

The quality assessment performed for the different PQPFs analyzed in this paper shows that the calibration process of the ensemble-derived PQPFs has little or no impact upon forecast skill; however, an improvement in the forecast reliability has been found. The dynamic approach has a skill level similar to that of the static method but is not as effective as the static approach in generating a reliable PQPF. One of the reasons for this particular behavior might be that the rank histogram used to perform the calibration in the case of the dynamic calibration is derived using a smaller sample. The use of precipitation estimates as a possible tool to solve this problem will be addressed in the next subsection.

c. Impact of using CMORPH data for the calibration

The rain gauge network over South America is very sparse, particularly over the central part of the northern region, where there are few gauges available for verification and forecast calibration (see Fig. 1). In a dynamic framework this means that to reach the number of observations required to generate a stable rank histogram, the training period must be extended. The restrictions


FIG. 7. As in Fig. 5 but for the resolution component of the BR.

imposed by a limited dataset affect not only the length of the training period but also the categories used to classify the degree of the ensemble spread, as well as the number of different weather regimes that can be accounted for. Consequently, it is important to explore the potential of high-resolution, wide-geographical-coverage data like the CMORPH precipitation estimates (Joyce et al. 2004) in the calibration of ensemble forecasts.

FIG. 8. ETS vs bias diagram for the 2.54-mm threshold over the SR for (a) a 24-h SLAF forecast and (b) a 24-h SMES forecast: uncalibrated forecast (gray solid line), dynamic calibration (black solid line), static calibration [black dashed line in (a) only], ensemble-mean PQPF (open circles), deterministic forecast PQPF (triangles), and individual members (crosses).


FIG. 9. Rank histogram using GTS data (gray bars), CMORPH data (black bars), and the difference between them (GTS − CMORPH) (black line) for the 24-h forecast and for the low-spread case over the SR.
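A rank histogram such as the one in Fig. 9 counts, over a training sample, the rank that each verifying observation takes within the sorted ensemble forecasts at the same point. A minimal sketch of that construction; the function name and toy data are illustrative, not the authors' code.

```python
# Rank histogram: for each case, count how many sorted ensemble members lie
# at or below the verifying observation.
import numpy as np

def rank_histogram(member_fcsts, obs):
    """member_fcsts: (n_members, n_cases); obs: (n_cases,).
    Returns counts for ranks 0 .. n_members."""
    members = np.sort(np.asarray(member_fcsts, float), axis=0)
    obs = np.asarray(obs, float)
    ranks = np.sum(members <= obs[None, :], axis=0)  # members at/below obs
    return np.bincount(ranks, minlength=members.shape[0] + 1)

# Toy 3-member ensemble verified at 3 cases.
members = [[1.0, 0.0, 5.0],
           [2.0, 0.5, 6.0],
           [3.0, 1.0, 7.0]]
counts = rank_histogram(members, obs=[0.5, 2.0, 6.5])
```

A flat histogram over the training sample indicates a statistically consistent ensemble; overpopulated extreme ranks indicate under-dispersion or bias, which is the information the calibration exploits.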

Only locations where both GTS and CMORPH data were available were included in the new training dataset obtained for the 22 October-30 November 2006 period. This was done to guarantee that the same amount of data was used to construct the training datasets. Although this is far from the optimal use of CMORPH data, in the sense that much of the CMORPH data is excluded, it is the only way to perform a controlled comparison between the GTS and CMORPH data. Only a static calibration has been performed since it has been shown that both strategies lead to similar results.

As has been documented for other regions (Ebert et al. 2007), CMORPH data exhibit a tendency to overestimate precipitation amounts over South America (e.g., Ruiz et al. 2006b; RRBM), and it is of particular interest to see whether this tendency affects the calibration process. We first explore the impact of using CMORPH or GTS in the construction of the rank histogram, given that the calibration of the PQPFs is based on these rank histograms. Figure 9 shows the rank histograms obtained using the GTS data and the CMORPH data in the calibration process, and the difference between them, for the low-spread subset and for the SLAF ensemble over the SR. The differences are small, less than 10%, with CMORPH data producing higher frequencies at higher ranks for this particular subset, consistent with the wet bias of the CMORPH estimates. Results for the NR and for other ensemble spreads exhibit a similar behavior (not shown).

Given this result, a static calibration has been tested for the SLAF ensemble, and a PQPF derived from the SLAF CR has also been computed using CMORPH as the verifying truth. Figure 10 shows the BSS for the new PQPFs and that obtained with the GTS data and its corresponding 5% confidence limits (as has been used in previous sections) over the NR and the SR. Differences between the GTS and CMORPH calibrations for the SLAF ensemble are only significant over the SR for the 0.254-mm threshold. Above this threshold, GTS seems to perform slightly better, particularly at higher values. Smaller differences are found in the case of PQPFs derived from deterministic forecasts. This figure suggests that although there are conditional biases in the CMORPH estimates over the South American region, their use for PQPF calibration does not produce significant negative impacts. These results encourage a wide range of experiments that should take advantage of the increased amount of information that is available from CMORPH data. This should lead to better calibration options, including more ensemble spread categories and a better


FIG. 10. BSS for the SLAF ensemble over the (a) NR and (b) SR as a function of the precipitation threshold (mm). Shown are the static calibration using GTS (gray continuous line) and its 5% confidence limits (gray dashed lines), the static calibration using CMORPH (black continuous line), and the static calibration of the control forecast using GTS (open circles) and using CMORPH (triangles).

representation of the diverse precipitation regimes that characterize South America.

4. Concluding remarks and future research

The performance levels of several PQPFs, obtained from two alternative ensemble systems with and without calibration and also derived from single-model runs, have been tested. These experiments were designed to explore the strengths and weaknesses of PQPFs over the region and to achieve a better understanding of the regional ensemble forecasting potential over South America.

One of the main conclusions derived from this assessment is that calibration improves forecast quality, leading to more reliable and skillful forecasts. This result seems robust in the sense that it holds for both ensemble systems, and for the northern and southern regions. Also, it has been shown that when using the relation between forecasted accumulated precipitation and PoP, a useful PQPF can be obtained from single-model runs. These derived PQPFs are, essentially, forecasts calibrated using the conditional probability of precipitation occurrence given a certain precipitation forecast, and these forecasts have also significantly outperformed the uncalibrated forecasts.

In agreement with the results for other regions (and other ensemble systems), it has been found that the ensemble mean as a "single" run has significant potential for the generation of valuable PQPFs, since the skill of these forecasts proved to be as good as or better than that of the calibrated ensembles based on rank histograms. This is one key reason to adopt the ensemble strategy to improve forecast quality. The question that arises from this result is how much of the potential of the ensemble mean to generate a PQPF is due to the ensemble information contained in this particular forecast and how much is due to the potential of PQPF generation from a deterministic forecast. To answer this question, a similar calibration process was performed to compute a PQPF from two deterministic forecasts: the control (CR) of the SLAF ensemble and the best ensemble member (BEM) of the SMES. The results show that the performance of these two deterministic forecasts is lower than that of the ensemble mean, considering mainly resolution and forecast skill. It is important to mention that this result has been obtained while taking into consideration forecast biases when assessing forecast skill, through combined ETS-bias diagrams.

Still, the potential of using calibrated deterministic forecasts to generate PQPFs is an important issue and should be taken into account when the ensemble's performance is compared against a single deterministic forecast, particularly when comparing resolution. Future work will be oriented toward the assessment of the potential economic value of PQPFs derived from several strategies.

For the multimodel SMES ensemble there is a lack of improvement in the reliability with respect to the BEM, particularly over the SR. This might be indicating that

the inclusion of lower-skill members reduces the resolution of the ensemble-derived PQPFs. One possible solution to this problem is the application of a calibration algorithm that takes into account the skill of the individual members of the ensemble, such as the one developed by Raftery et al. (2005) based on Bayesian model averaging and applied by McLean et al. (2007) to PQPF calibration.

One of the points raised in this work concerns the use of a dynamic calibration versus a static calibration. The former seems to be more attractive in the sense that it can capture the recent weather regime (i.e., the last 15-20 days) and is less affected by model changes compared with the static calibration, which makes use of forecasts issued in previous periods (e.g., the same month from a year before). No significant differences between either of the strategies have been found in any of the attributes that help to quantify forecast quality. Still, this issue should be further explored using a larger ensemble dataset and a denser precipitation dataset, which would help to reduce the training period and probably would optimize the potential advantages of a dynamic calibration.

With respect to the different ensemble strategies used in this paper, both show similar skill and reliability over the southern (extratropical) region, although the SLAF shows slightly better resolution. However, over the northern (tropical) region, the SMES approach shows better results. The reason may be associated with the fact that the SLAF ensemble does not include perturbations in the physics, whereas the SMES is based on the use of different models, and different physical parameterizations may be particularly important in the deep tropics. This is confirmed by the fact that the SLAF perturbations (ensemble differences) showed little or no growth over the NR during the first part of the forecast, suggesting that the perturbations in the initial condition over this area have little impact on the forecasts (Ruiz et al. 2006a). Small-scale processes (mainly convection) are more active and important in driving the regional atmospheric circulation over the NR; the greater impact from changes in the parameterization of such processes could explain the greater reliability of the SMES over this region.

Also, the configuration of the ARW-WRF model used in this work is not adequate for the northern region. The SMES members based on the Eta Model outperformed the WRF model over the NR, as can be seen in the differences between the SMES BEM and the SLAF CR. Some important biases have been detected in the WRF model forecasts: for example, a cold bias in the PBL, particularly during warm hours, which would reduce the lower levels' available potential energy. Also, a dry bias evident in 48-h precipitation forecasts has been observed. Analysis of the WRF model bias in the tropics and over the southern region is under way.

Finally, we explored the impacts of using CMORPH precipitation estimates upon the reliability and skill of the calibrated PQPFs. The results show only a very slight degradation of the forecast reliability measured through the Brier skill score. This means that more complex calibration strategies, like introducing more categories to better describe the dependence of the rank histogram on ensemble spread and on the precipitation regime, could be explored using the CMORPH precipitation estimates. However, the conditional bias in the CMORPH data should be removed, as in RRBM, to obtain a useful calibration.

It should be mentioned that the PQPF performance scores obtained in this study are comparable to those obtained by several authors over other regions where observations are considerably more abundant, indicating that PQPFs have potential to improve short-range precipitation forecasts over South America. This result is particularly valuable for a region with a sparse observational network, where calibration and verification procedures become less reliable.

Acknowledgments. The authors are thankful to Pedro Leite Silva Dias and MASTER for providing the precipitation forecasts of the SMES. Special thanks to Demerval Soares Moreira for providing assistance in processing SMES forecasts. The provision of the GFS forecasts by NCEP is also acknowledged. This study has been supported by the following projects: ANPCyT PICT 2004 25269, UBACyT X155, CONICET PIP 5417, and GC06-085 from NOAA/OGP/CPPA. The authors are also thankful to the two anonymous reviewers for their invaluable comments that helped to substantially improve this work.

REFERENCES

Atger, F., 2003: Spatial and interannual variability of the reliability of ensemble-based probabilistic forecasts: Consequences for calibration. Mon. Wea. Rev., 131, 1509-1523.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1-3.
Buizza, R., P. L. Houtekamer, Z. Toth, G. Pellerin, M. Wei, and Y. Zhu, 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon. Wea. Rev., 133, 1076-1097.
Chen, F., and J. Dudhia, 2001: Coupling an advanced land surface-hydrology model with the Penn State-NCAR MM5 modeling


system. Part I: Model description and implementation. Mon. Wea. Rev., 129, 569-585.
Dalcher, A., E. Kalnay, and R. N. Hoffman, 1988: Medium range lagged average forecasts. Mon. Wea. Rev., 116, 402-416.
Du, J., S. L. Mullen, and F. Sanders, 1997: Short-range ensemble forecasting of quantitative precipitation. Mon. Wea. Rev., 125, 2427-2459.
Dudhia, J., 1989: Numerical study of convection observed during the Winter Monsoon Experiment using a mesoscale two-dimensional model. J. Atmos. Sci., 46, 3077-3107.
Ebert, E. E., 2001: Ability of a poor man's ensemble to predict the probability and distribution of precipitation. Mon. Wea. Rev., 129, 2461-2480.
——, J. E. Janowiak, and C. Kidd, 2007: Comparison of near-real-time precipitation estimates from satellite observations and numerical models. Bull. Amer. Meteor. Soc., 88, 47-64.
Ebisuzaki, W., and E. Kalnay, 1991: Ensemble experiments with a new lagged average forecasting scheme. WMO Research Activities in Atmospheric and Oceanic Modeling Rep. 15, 308 pp. [Available from WMO, C.P. No. 2300, CH-1211, Geneva, Switzerland.]
Eckel, F. A., and M. K. Walters, 1998: Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Wea. Forecasting, 13, 1132-1147.
Freitas, S. R., K. Longo, M. Silva Dias, P. Silva Dias, R. Chatfield, A. Fazenda, and L. F. Rodrigues, 2006: The coupled aerosol and tracer transport model to the Brazilian developments on the Regional Atmospheric Modeling System: Validation using direct and remote sensing observations. Proc. Eighth Int. Conf. on Southern Hemisphere Meteorology and Oceanography, Foz do Iguaçu, Brazil, INPE, 101-107.
Gallus, W. A., Jr., and M. Segal, 2004: Does increased predicted warm-season rainfall indicate enhanced likelihood of rain occurrence? Wea. Forecasting, 19, 1127-1135.
Grell, G. A., and D. Devenyi, 2002: A generalized approach to parameterizing convection combining ensemble and data assimilation techniques. Geophys. Res. Lett., 29, 1639, doi:10.1029/2002GL015311.
Hamill, T., 1999: Hypothesis tests for evaluating numerical precipitation forecasts. Wea. Forecasting, 14, 155-167.
——, and S. J. Colucci, 1997: Verification of Eta-RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 1312-1327.
——, and ——, 1998: Evaluation of Eta-RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711-724.
——, and J. Juras, 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 2905-2923.
Hou, D., E. Kalnay, and K. K. Droegemeier, 2001: Objective verification of the SAMEX'98 ensemble forecast. Mon. Wea. Rev., 129, 73-91.
Joyce, R. J., J. E. Janowiak, P. A. Arkin, and P. Xie, 2004: CMORPH: A method that produces global precipitation estimates from passive microwave and infrared data at high spatial and temporal resolution. J. Hydrometeor., 5, 487-503.
Kalnay, E., M. Kanamitsu, and W. E. Baker, 1990: Global numerical weather prediction at the National Meteorological Center. Bull. Amer. Meteor. Soc., 71, 1410-1428.
Kinter, J. L., III, and Coauthors, 1997: The COLA Atmosphere-Biosphere general circulation model. Volume 1: Formulation. COLA Tech. Rep. 51, Center for Ocean-Land-Atmosphere Studies, Calverton, MD, 46 pp.
Kousky, V. E., J. E. Janowiak, and R. J. Joyce, 2006: The diurnal cycle of precipitation over South America based on CMORPH. Proc. Eighth Int. Conf. on Southern Hemisphere Meteorology and Oceanography, Foz do Iguaçu, Brazil, INPE, 1113-1116.
Krishnamurti, T. N., C. M. Kishtawal, T. E. LaRow, D. R. Bachiochi, Z. Zhang, C. E. Willford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285, 1548-1550.
McLean, J., A. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209-3220.
Mellor, G. L., and T. Yamada, 1982: Development of a turbulence closure model for geophysical fluid problems. Rev. Geophys. Space Phys., 20, 851-875.
Mendonça, A. M., and J. P. Bonatti, 2006: Experiments with EOF-based perturbation methods to ensemble weather forecasting in middle latitudes. Proc. Eighth Int. Conf. on Southern Hemisphere Meteorology and Oceanography, Foz do Iguaçu, Brazil, Amer. Meteor. Soc. and INPE, 1829-1832.
Mesinger, F., Z. I. Janjic, S. Nickovic, D. Gavrilov, and D. G. Deaven, 1988: The step-mountain coordinate: Model description and performance for cases of Alpine lee cyclogenesis and for a case of Appalachian redevelopment. Mon. Wea. Rev., 116, 1493-1518.
Mlawer, E. J., S. J. Taubman, P. D. Brown, M. J. Iacono, and S. A. Clough, 1997: Radiative transfer for inhomogeneous atmospheres: RRTM, a validated correlated-k model for the longwave. J. Geophys. Res., 102 (D14), 16 663-16 682.
Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73-119.
Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281-293.
Raftery, A., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155-1174.
Ruiz, J. J., A. C. Saulo, and E. Kalnay, 2006a: A regional ensemble forecast system for southeastern South America: Preliminary assessment. Proc. Eighth Int. Conf. on Southern Hemisphere Meteorology and Oceanography, Foz do Iguaçu, Brazil, INPE, 1977-1984.
——, ——, Y. García Skabar, and P. Salio, 2006b: Representation of a mesoscale convective system using the RAMS model (in Spanish). Meteorologica, 31, 13-36.
——, L. Ferreira, and A. C. Saulo, 2007: WRF-ARW sensitivity to different planetary boundary layer parameterizations over South America. Research Activities in Atmospheric and Oceanic Modelling: CAS/JSC Working Group on Numerical Experimentation (WGNE) Blue Book, WMO, 11-12.
Saulo, A. C., and L. Ferreira, 2003: Evaluation of quantitative precipitation forecasts over southern South America. Aust. Meteor. Mag., 52, 81-93.


Schaefer, J. T., 1990: The critical success index as an indicator of warning skill. Wea. Forecasting, 5, 570-575.
Silva Dias, P. L., D. Soares Moreira, and G. D. Neto, 2006: The MASTER Model Ensemble System (MSMES). Proc. Eighth Int. Conf. on Southern Hemisphere Meteorology and Oceanography, Foz do Iguaçu, Brazil, INPE, 1751-1757.
Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the Advanced Research WRF version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp.
Stensrud, D., and N. Yussouf, 2007: Reliable probabilistic quantitative precipitation forecasts from a short-range ensemble forecasting system. Wea. Forecasting, 22, 3-17.
Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 2317-2330.
——, and ——, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297-3319.
Zhang, F., C. Snyder, and R. Rotunno, 2002: Mesoscale predictability of the "surprise" snowstorm of 24-25 January 2000. Mon. Wea. Rev., 130, 1617-1632.
——, M. Odins, and J. W. Nielsen-Gammon, 2006: Mesoscale predictability of an extreme warm-season precipitation event. Wea. Forecasting, 21, 149-166.
Zhu, Y., Z. Toth, R. Wobus, D. Richardson, and K. Mylne, 2002: The economic value of ensemble-based weather forecasts. Bull. Amer. Meteor. Soc., 83, 73-83.
