
1 APRIL 2002    NOTES AND CORRESPONDENCE    793

NOTES AND CORRESPONDENCE

Climate Predictions with Multimodel Ensembles

Viatcheslav V. Kharin and Francis W. Zwiers

Canadian Centre for Climate Modelling and Analysis, Meteorological Service of Canada, Victoria, British Columbia, Canada

(Manuscript received 2 January 2001, in final form 2 November 2001)

ABSTRACT

Several methods of combining individual forecasts from a group of climate models to produce an ensemble forecast are considered. These methods are applied to an ensemble of 500-hPa geopotential height forecasts derived from the Atmospheric Model Intercomparison Project (AMIP) integrations performed by 10 different modeling groups. Forecasts are verified against reanalyses from the European Centre for Medium-Range Weather Forecasts. Forecast skill is measured by means of error variance. In the Tropics, the simple ensemble mean produces the most skillful forecasts. In the extratropics, the regression-improved ensemble mean performs best. The "superensemble" forecast that is obtained by optimally weighting the individual ensemble members does not perform as well as either the simple ensemble mean or the regression-improved ensemble mean. The sample size evidently is too small to estimate reliably the relatively large number of optimal weights required for the superensemble approach.

1. Introduction

The skill of climate predictions is limited by internal atmospheric variability that is largely unpredictable beyond the deterministic predictability limit of about two weeks (e.g., Lorenz 1982). A standard approach that is used to reduce climate noise in model predictions is to average an ensemble of forecasts initiated from different initial conditions (e.g., Kharin et al. 2001, and references therein).

With the availability of climate predictions produced by several dynamical models, multimodel ensemble forecasting has drawn some attention recently. In particular, two recent studies (Krishnamurti et al. 1999, 2000) discuss a superensemble approach, in which a multimodel linear regression technique is used to improve deterministic forecasts locally.

Other studies have also used linear methods to produce better predictions by combining several independent forecasts. For example, Danard et al. (1968) and Thompson (1977) showed that the mean square error of forecasts constructed from a particular linear combination of two independent predictions is less than that of the individual predictions. Fraedrich and Smith (1989) discussed a linear regression method to combine two statistical forecast schemes and applied this method to long-range forecasts of the monthly mean tropical Pacific sea surface temperature. In a recent paper, Derome et al. (2001) discussed a linear multivariate method for blending climate forecasts produced by two dynamical models.

Ensembles of predictions can be used for deterministic or probability forecasting. Doblas-Reyes et al. (2000) investigate the performance of multimodel climate predictions produced by three general circulation models and find that the multimodel approach offers a systematic improvement when using the ensemble to produce probabilistic forecasts. They find that the multimodel ensemble improves skill only marginally when verifying the ensemble mean, however. On the other hand, Krishnamurti et al. (2000) find an apparent systematic improvement in mean square error for a multimodel forecast over that of the individual model forecasts.

With the increasing popularity of multimodel ensemble forecasting, it is important to understand its virtues and limitations. In this paper, we consider deterministic forecasting only. The goal is to evaluate the performance of several forecasts constructed from a multimodel ensemble of forecasts, as measured by forecast error variance. We use monthly 500-hPa geopotential height data from several model simulations of the 1979–88 period produced for the Atmospheric Model Intercomparison Project (AMIP; Gates 1992). These integrations can be viewed as two-tier climate forecasts (Bengtsson et al. 1993) in which the lower boundary condition predictions are error free and in which initial conditions at the beginning of each forecast period are not specified.

The language that is used to describe ensemble-forecasting schemes can be confusing because similar words are used in the literature to describe schemes with several different configurations. The ensemble of forecasts may have been produced with a single dynamical model, or it may consist of a number of forecasts, each produced with a different model. Krishnamurti et al. call the latter a superensemble. However, this nomenclature appropriates a term that more aptly describes ensemble-forecasting schemes in which each of several models is used to produce an ensemble of forecasts. In this paper, we use the term ensemble to denote a collection of forecasts, each of which is produced with a different model.

The plan of the paper is the following. First we describe a general linear regression method that will serve as the basis for constructing several improved forecasts. Results are presented as a function of the size of the multimodel ensemble, and we conclude with a summary.

Corresponding author address: Francis W. Zwiers, Canadian Centre for Climate Modelling and Analysis, University of Victoria, P.O. Box 1700, Stn CSC, Victoria, BC V8W 2Y2, Canada. E-mail: [email protected]

© 2002 American Meteorological Society

2. Multimodel linear regression

Let {X_i(t)}, i = 1, ..., M denote an ensemble of forecasts produced by M models at a fixed location. An arbitrary linear combination of these forecasts is given by

    F(t) = a_0 + \sum_{i=1}^{M} a_i X_i(t).    (1)

The M + 1 coefficients a_i, i = 0, ..., M may be subject to some constraints, as is discussed below. Within the bounds of those constraints, coefficients may be chosen to minimize the mean square error,

    MSE \equiv \overline{(F - Y)^2} = Var(F - Y) + B^2,    (2)

of the multimodel regression forecast F. Here Y represents the verifying observations, the overbar denotes time averaging, and B = \overline{F} - \overline{Y} is the mean bias. Depending on the constraints, several versions of this general regression approach are possible, as will be outlined below. These versions will be identified variously by symbols C, U, or R depending upon the specific constraints and the extent to which the coefficients in Eq. (1) are adjustable.

With no constraints on the coefficients, Eqs. (1)–(2) describe the standard multimodel linear regression technique that is described by Krishnamurti et al. (2000). We denote this forecast by R_all, where the subscript "all" is used to indicate that all coefficients are adjustable and thus that all ensemble members may be weighted differently by the regression. The regression coefficients a_i are found by solving the M + 1 linear equations

    \sum_{i=1}^{M} a_i Cov(X_i, X_k) = Cov(X_k, Y),  k = 1, ..., M,

    a_0 + \sum_{i=1}^{M} a_i \overline{X_i} = \overline{Y},

where Cov(X_i, X_k) and Cov(X_k, Y) are the covariances between the model forecasts and the cross covariances between the model forecasts and the verifying observations Y, respectively.

Some studies (e.g., Krishnamurti et al. 2000) have interpreted the regression coefficients as indicators of the relative model "reliability." However, this interpretation is not generally correct. To appreciate the difficulty of doing so, consider a simple example in which there are two model forecasts X_1 and X_2, one that systematically oversimulates and the other that undersimulates the amplitude of the predictable signal β in the observations Y. In particular, suppose

    Y = β + ε,  X_1 = 0.5β + ε_1,  and  X_2 = 1.5β + ε_2.

Here ε, ε_1, and ε_2 represent the unpredictable internal variability that is present in the observations and in the corresponding model forecasts. For convenience, we assume that these are independent random variables with zero time mean and the same variance σ_ε^2. The forecasts X_1 and X_2 have the same mean square error MSE = 0.5^2 σ_β^2 + 2σ_ε^2, where σ_β^2 is the variance of the predictable signal β. The regression coefficients are given by a_1 = σ_β^2/(5σ_β^2 + 2σ_ε^2) and a_2 = 3a_1. Thus, in this example and in general, equally reliable model forecasts may not necessarily be weighted equally when combined optimally. Hasselmann (1979) discusses this problem in greater generality and points out that the optimal coefficients weight the model forecasts so as to maximize the signal-to-noise ratio. Similar ideas apply to the optimal signal detection problem (e.g., Hasselmann 1997; Zwiers 1999).

An attractive property of the multimodel linear regression forecast R_all is that it is superior to all other linear combinations of the individual forecasts when very large "training" datasets are available for estimating the regression coefficients. In practice, however, the coefficients depend on estimates of the covariances Cov(X_i, X_k) and Cov(X_k, Y) that must be obtained from relatively short datasets. In these circumstances, estimating too many regression coefficients leads to overfitting that causes a degradation in skill (Davis 1976).

Overfitting results in optimistically biased estimates of forecast skill when skill is assessed with the same data that are used to train the regression model. This bias is avoided by assessing skill in verification data that are independent of the training data. Cross validation (Michaelson 1987) is used here, as described subsequently, to increase the amount of verification data.

Overfitting can be reduced by imposing constraints on the coefficients in the regression equation and thereby effectively reducing the number of free parameters that must be estimated from the data. One approach is to set some coefficients to 0, thereby reducing the number of models providing forecasts that enter into Eq. (1).
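As a numerical illustration of the unconstrained fit, the coefficients (a_0, ..., a_M) can be obtained by least squares, which is equivalent to solving the normal equations above. The sketch below is not from the paper: the synthetic training record (a common signal with model-dependent amplitudes, in the spirit of the two-model example) and all variable names are assumptions for illustration only.

```python
import numpy as np

# Sketch of the unconstrained multimodel regression R_all: solve the
# M + 1 normal equations by least squares on a synthetic training record.
# The setup and all names are illustrative, not from the study.
rng = np.random.default_rng(0)
n_years, n_models = 1000, 3           # long "training" record, M = 3 models

beta = rng.normal(size=n_years)       # predictable boundary-forced signal
y = beta + rng.normal(size=n_years)   # verifying observations Y
# each model forecast: scaled signal plus independent "climate noise"
scales = np.array([0.5, 1.0, 1.5])
x = scales * beta[:, None] + rng.normal(size=(n_years, n_models))

# design matrix with an intercept column -> coefficients (a0, a1, ..., aM)
design = np.column_stack([np.ones(n_years), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

forecast = design @ coef              # R_all evaluated on the training sample
mse = np.mean((forecast - y) ** 2)
```

On a long record like this, the fitted forecast has a smaller in-sample error variance than the climatological forecast and than any bias-removed individual forecast; with short records, as the text explains, the estimated coefficients overfit instead.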


However, because the forecasts contain chaotic "noise," reducing the multimodel ensemble size will likely increase the noise variance in the linear combination and thus may cause skill degradation. Other kinds of constraints that effectively reduce the number of coefficients to be estimated may also be imposed. For example, one might assume that the regression coefficients are constant throughout the year. This constraint reduces the number of coefficients that must be estimated per unit of data. It also implies a statistical model that does not take the cyclical behavior of climate data into account, however. Nonetheless, the improved forecast may benefit from relatively more robust coefficient estimates when only short training datasets are available.

Several forecast variants derived from Eq. (1) by imposing various constraints on the coefficients are examined below. Some, such as the climatological forecast, are trivial. Others take advantage of the multimodel ensemble. All are unbiased. The mean bias B in Eq. (2) often contributes substantially to MSE. However, the mean bias can be removed whenever historical model simulations and the corresponding observational records are available, as is the prerequisite for any statistical forecast improvement scheme. We consider the following unbiased forecast variants.

• The trivial climatological forecast C, or zero-anomaly forecast, is obtained by setting the coefficients a_1, ..., a_M to 0. The only coefficient to be estimated is the intercept a_0, which is given by a_0 = \overline{Y}. This forecast is the baseline upon which all other prospective schemes must improve.

• The bias-removed individual forecast U_i, produced by the ith model, is obtained from Eq. (1) by using a_i = 1 and setting all other coefficients to 0, except for the intercept a_0 = −B_i, where B_i = \overline{X_i} − \overline{Y} is the bias of the ith model.

• The regression-improved individual forecast R_i is obtained by linearly regressing the ith forecast against the observations Y. This is equivalent to setting all coefficients in Eq. (1) to 0, except for a_i and a_0. These two coefficients are estimated as a_i = Cov(Y, X_i)/Var(X_i) and a_0 = \overline{Y} − \overline{X_i} a_i. The regression coefficient a_i rescales the forecast to correct systematic errors in simulating the atmospheric response to the lower boundary conditions and to minimize the effect of climate noise in the model forecast on error variance. The intercept a_0 removes the bias from the rescaled forecast.

• The bias-removed multimodel ensemble mean forecast U_EM is obtained by using a_i = 1/M, i = 1, ..., M. The intercept, which is set to a_0 = \overline{Y} − (1/M) \sum_{i=1}^{M} \overline{X_i}, removes the multimodel mean bias. Skill improvements result from the bias removal and from the reduction of the climate noise by ensemble averaging.

• The regression-improved multimodel ensemble mean forecast R_EM is obtained by linearly regressing the multimodel ensemble mean against the observations. This is equivalent to constraining the coefficients a_i = a, i = 1, ..., M, to be equal. The two unknown coefficients a and a_0 are estimated as a = Cov(Y, U_EM)/[M Var(U_EM)] and a_0 = \overline{Y} − a \sum_{i=1}^{M} \overline{X_i}. Skill improvement, relative to that of U_EM, is achieved by rescaling the ensemble mean forecast to reduce the effect of the climate noise in the ensemble mean and to correct systematic error in the boundary-forced response that is common to all models.

• The regression-improved multimodel forecast R_all is obtained by fitting Eq. (1) to the verifying observations with no constraints on the regression coefficients. Skill improvement in this forecast results from bias removal, climate noise reduction due to ensemble averaging, and the rescaling of the individual forecasts.

• The regression-improved multimodel subset forecast R_subset is obtained by linearly regressing a subset of models against the observations. To select the best subset of models we use the Akaike information criterion (AIC; Akaike 1974) given by AIC = N log \overline{(Y − R_subset,K)^2} + 2(K + 1), where N is the sample size, K is the number of models in a subset, and R_subset,K is the regression forecast produced by combining the K models. The general idea is to penalize the mean square error by the number of the coefficients that are estimated. Given M models, the total number of all possible subset combinations is 2^M, including a zero-size subset, that is, the climatological forecast. We use the models for which the AIC is smallest in the training period for constructing a linearly improved forecast in the independent verification period.

These seven forecast variants are summarized in Table 1 for easy reference.

Another potential forecast variant is the ensemble average of the regression-improved individual forecasts, (1/M) \sum_{i=1}^{M} R_i. However, it is not clear that such a forecast is optimal. For example, this forecast variant is asymptotically suboptimal for an ensemble of "perfect-model" forecasts

    Y = β + ε,  and  X_i = β + ε_i,  i = 1, ..., M.

In this case, the regression-improved individual forecast is given by R_i = ρX_i, where ρ is the correlation between X_i and Y, ρ = σ_β^2/(σ_β^2 + σ_ε^2). Thus the ensemble average of the regression-improved individual forecasts asymptotically approximates ρβ with increasing ensemble size M. If the signal-to-noise ratio is small, as is normally the case for monthly and seasonal individual forecasts in midlatitudes, this forecast variant would substantially underestimate the amplitude of the optimal forecast β. As might be anticipated, the ensemble average of regression-improved individual forecasts did not perform as well as the regression-improved ensemble mean R_EM


for an ensemble of AMIP simulations in our preliminary tests. We therefore dropped it from consideration.

TABLE 1. The constraints and the number of adjustable coefficients in each forecast variant. The simplest configurations have 12 adjustable parameters, that is, a different value of a_0 for each month of the year. Here M is the ensemble size. Details are given in the text.

Symbol     Description                              Constraints on a_i, i = 1, ..., M                        No. parameters adjusted
C          Climatological                           a_i = 0, i = 1, ..., M                                   12
U_i        Bias-removed individual                  a_i = 1, a_j = 0, j ≠ i                                  12
R_i        Regression-improved individual           a_j = 0, j ≠ i                                           13
U_EM       Bias-removed ensemble mean               a_i = 1/M, i = 1, ..., M                                 12
R_EM       Regression-improved ensemble mean        a_i = a, i = 1, ..., M                                   13
R_all      Regression-improved multimodel           None                                                     12 + M
R_subset   Regression-improved multimodel subset    a_i = 0, i ∉ subset of size K (objectively determined)   12 + K, 0 ≤ K ≤ M
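The constrained variants of Table 1 can be sketched numerically. The following is an illustrative reconstruction, not code from the study: it evaluates the climatological, bias-removed, and regression-improved variants on one synthetic "perfect-model" training sample at a single grid point, with all function and variable names invented for the sketch.

```python
import numpy as np

# Illustrative sketch of several Table 1 forecast variants on one
# synthetic training sample; names and setup are assumptions.
rng = np.random.default_rng(1)
n, m = 200, 4                                # sample size N, ensemble size M
beta = rng.normal(size=n)                    # common predictable signal
y = beta + rng.normal(size=n)                # observations Y
x = beta[:, None] + rng.normal(size=(n, m))  # "perfect-model" ensemble

def c_forecast(y, x):
    # climatological forecast C: a_i = 0, a_0 = mean(Y)
    return np.full(len(y), y.mean())

def u_i(y, x, i):
    # bias-removed individual U_i: a_i = 1, intercept removes B_i
    return x[:, i] - (x[:, i].mean() - y.mean())

def r_i(y, x, i):
    # regression-improved individual R_i: a_i = Cov(Y, X_i)/Var(X_i)
    a = np.cov(y, x[:, i])[0, 1] / x[:, i].var(ddof=1)
    return y.mean() + a * (x[:, i] - x[:, i].mean())

def u_em(y, x):
    # bias-removed ensemble mean U_EM: a_i = 1/M plus mean-bias removal
    em = x.mean(axis=1)
    return em - (em.mean() - y.mean())

def r_em(y, x):
    # regression-improved ensemble mean R_EM: one shared coefficient a
    em = x.mean(axis=1)
    a = np.cov(y, em)[0, 1] / em.var(ddof=1)
    return y.mean() + a * (em - em.mean())

mse = {f.__name__: np.mean((f(y, x) - y) ** 2)
       for f in (c_forecast, u_em, r_em)}
```

Because R_EM is the least squares fit in the ensemble mean, its in-sample mean square error cannot exceed that of U_EM or of the climatological forecast; the paper's point is that this ordering need not survive out-of-sample verification when the training record is short.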

The performance of the various forecast variants was evaluated with the error variance skill score

    S = 1 − Var(F − Y)/Var(Y).

This score is a natural measure of performance of an unbiased forecast F. It is proportional to the error variance Var(F − Y) (and mean square error) rescaled in such a way that S = 1 for the perfect forecast and S = 0 for the climatological forecast C. Thus, positive (negative) values of S indicate that the forecast is more (less) skillful than the climatological forecast.

3. Results

We used the 10 10-yr AMIP simulations listed in Table 2. Eight of the models are the same as in Krishnamurti et al. (2000). Documentation of the models is found in Phillips (1994). For validation purposes we use 500-hPa geopotential heights Z_500 from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis (Gibson et al. 1997). All data are interpolated onto a 96 × 48 Gaussian grid. The various versions of Eq. (1) are fitted at each point in the grid.

The skill scores are estimated with a cross-validation procedure in which one year of data is repeatedly withheld from the dataset and the regression coefficients are estimated from the retained nine years. The best subset of models in the forecast R_subset is also selected independently of the withheld year. We use monthly data for the whole year and assume that the regression coefficients, except for the intercept a_0, are independent of the annual cycle. Allowing the intercept a_0 to depend on the calendar month in the linear regression technique is equivalent to taking into account the climatological annual cycle that may be present in the data. If the coefficient a_0 were constant, the forecast skill would also include contributions associated with the annual cycle, which should not be considered as very useful.

In principle, the dependence on the annual cycle could be built into the linear regression technique for all regression coefficients. However, this approach is impractical for the relatively short AMIP integrations. Indeed, given nine years of verifying data in the training period and an ensemble of eight models or more, the total number of fitted coefficients in the multimodel regression technique would be equal to, or greater than, the number of data points. Thus, it is imperative to reduce the number of coefficients, for example, by assuming that the regression coefficients for each model are independent of the calendar month.

In the following we show skill scores averaged over two regions, the Tropics (30°S–30°N) and the Pacific–North America sector (PNA; 20°–80°N, 180°–45°W). Boundary forcing accounts for more than one-half of the total variability on seasonal timescales in the Tropics (Zwiers 1996; Rowell and Zwiers 1999; Zwiers et al.

TABLE 2. Selected AMIP models.

Acronym   AMIP group
BMRC      Bureau of Meteorology Research Centre, Melbourne, Australia
CCC       Canadian Centre for Climate Modelling and Analysis, Victoria, British Columbia, Canada
CSIRO     Commonwealth Scientific and Industrial Research Organization, Mordialloc, Australia
ECMWF     European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom
GFDL      Geophysical Fluid Dynamics Laboratory, Princeton, New Jersey
LMD       Laboratoire de Météorologie Dynamique, Paris, France
MPI       Max-Planck-Institut für Meteorologie, Hamburg, Germany
NMC       National Centers for Environmental Prediction, Suitland, Maryland
UGAMP     The U.K. Universities' Global Atmospheric Modelling Programme, Reading, United Kingdom
UKMO      Met Office, Bracknell, Berkshire, United Kingdom


2000). The extratropical atmosphere is much less predictable and has a much lower signal-to-noise ratio.

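The leave-one-year-out cross-validation and the skill score S described above can be sketched as follows for the regression-improved ensemble mean. This is an illustrative sketch only: the synthetic 10-yr record of anomalies at one grid point and all names are assumptions, not data or code from the study.

```python
import numpy as np

# Sketch of leave-one-year-out cross-validation of the error variance
# skill score S = 1 - Var(F - Y)/Var(Y) for a regression-improved
# ensemble mean; the synthetic setup is illustrative only.
rng = np.random.default_rng(2)
n_years, n_models = 10, 5
beta = rng.normal(size=n_years)                 # boundary-forced signal
y = beta + rng.normal(size=n_years)             # observations
x = beta[:, None] + rng.normal(size=(n_years, n_models))
em = x.mean(axis=1)                             # multimodel ensemble mean

forecast = np.empty(n_years)
for k in range(n_years):                        # withhold year k
    keep = np.arange(n_years) != k
    # fit a and a0 on the retained nine years only
    a = np.cov(y[keep], em[keep])[0, 1] / em[keep].var(ddof=1)
    a0 = y[keep].mean() - a * em[keep].mean()
    forecast[k] = a0 + a * em[k]                # predict the withheld year

skill = 1.0 - np.var(forecast - y) / np.var(y)  # cross-validated S
```

Because every year is predicted with coefficients estimated from independent data, this estimate of S is free of the optimistic in-sample bias discussed in section 2; with so short a record it can well come out negative.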
Figure 1 shows cross-validated skill scores S for Z_500 for the individual AMIP models. In the Tropics, the performance of the individual unbiased forecasts U_i varies substantially from one model to another. Skill score values range from about −0.4 to 0.1, with an average value of about 0. Skill scores for the regression-improved forecasts R_i, which are substantially better, are all positive, with an average value of about 0.2. Skill is lower in the PNA sector (Fig. 1, lower panel). The individual regression-improved forecasts, although substantially better than the raw bias-corrected forecasts, still have small negative skill scores, indicating that they are not able to outperform the climatological forecast.

Figure 2 shows cross-validated skill scores for the bias-removed ensemble mean (U_EM) forecast, the regression-improved ensemble mean (R_EM) forecast, the regression-improved multimodel (R_all) forecast, and the regression-improved multimodel subset (R_subset) forecast as a function of the multimodel ensemble size. An ensemble of size M is constructed by using the first M models in Table 2. The results shown in Fig. 2 are not very sensitive to the order in which models are included in the ensemble.

In the Tropics, the regression-improved ensemble mean R_EM has the highest skill scores for ensembles of fewer than five models. The bias-corrected multimodel ensemble mean U_EM becomes a better deterministic forecast in larger ensembles. As the ensemble becomes larger, it is evident that the uncertainty introduced by estimating an additional adjustable coefficient overcomes any gain in skill due to forecast rescaling. The skill of

R_EM and U_EM increases steeply with increasing ensemble size up to about six but then levels off at a value slightly over 0.3 for larger ensembles. The skill of the regression-improved multimodel forecast R_all initially increases with increasing ensemble size but then decreases for the larger ensemble sizes when the number of regression coefficients becomes too large. Overall, skill scores for

R_all are lower than those for R_EM.

FIG. 1. Cross-validated error variance skill score S for Z_500 in the (top) Tropics and (bottom) PNA sector for individual AMIP models. Black bars are skill scores of bias-removed individual forecasts U_i. Gray bars are skill scores of regression-improved individual forecasts R_i.

In the PNA sector, the regression-improved ensemble mean R_EM is the best. It barely outperforms the climatological forecast, however. Note that we use monthly data for the whole year. We expect seasonal means to be more skillful in some seasons. Deviations from the monotonic skill increase as a function of the ensemble size are likely due to sampling variability and differences in the individual model performance. Skill scores of the bias-corrected multimodel ensemble mean U_EM increase nearly monotonically with increasing ensemble size as more and more climate noise is filtered out from the ensemble mean. The performance of the regression-improved multimodel forecast R_all decreases monotonically with increasing ensemble size.

The attempt to select the "best" forecast subset objectively with the AIC was not very successful. The average number of models used for constructing R_subset is indicated by the numbers on the blank bars in Fig. 2. There is some skill improvement for R_subset in the PNA sector over that for R_all, which comes about mainly because fewer, rather than "better," models are used in the linear regression (we found no strong preference for any particular subset of models). The data record apparently is too short to distinguish reliably among the performances of the AMIP models in Table 2. In the Tropics we found some preference for the ECMWF, Geophysical Fluid Dynamics Laboratory (GFDL), Laboratoire de Météorologie Dynamique (LMD), and Max-Planck-Institut für Meteorologie (MPI) models, which were about 2 times as likely to be chosen as the other models. However, the performance of R_subset is about the same as that of R_all and is below that of R_EM. The performance of R_subset is noticeably worse than that of R_all for the same ensemble size as the averaged number of models used for constructing R_subset. The additional flexibility of selecting the best model subset apparently aggravates the overfitting problem. The procedure finds the combination of models that "adapts" best to the available data sample in the training period, at the expense of the accuracy of the fitted regression coefficients. It would appear that the AIC penalty term, 2(K + 1), does not guard adequately against overfitting.

FIG. 2. Cross-validated error variance skill score S for several Z_500 forecast variants in the (top) Tropics and (bottom) PNA sector as a function of the ensemble size. Skill scores are displayed for the bias-removed ensemble mean U_EM (black bars), the regression-improved ensemble mean R_EM (dark gray bars), the regression-improved multimodel forecast R_all (light gray bars), and the regression-improved multimodel subset forecast R_subset for which the best subset of models is determined objectively with AIC (blank bars). The average number of models used for constructing R_subset is indicated by the numbers on the blank bars.

4. Summary

We evaluated the performance of several versions of deterministic unbiased monthly forecasts of Z_500, as measured by the error variance, based on an ensemble of 10 10-yr AMIP integrations. The summary of the results is as follows.

1) In the Tropics, where predictability is relatively high, the regression-improved ensemble mean R_EM performs best for small ensembles and the bias-removed ensemble mean U_EM is better for ensembles with more than six models. For large ensembles, estimating just one regression coefficient to "improve" U_EM degrades skill, apparently because the coefficient estimate is subject to sampling errors.

2) In the extratropics, where atmospheric predictability is low, the regression-improved ensemble mean R_EM yields the deterministic forecast with the smallest error variance. However, this forecast barely outperforms the climatological forecast. There is no substantial skill improvement for ensembles of more than six models.

3) The performance of the regression-improved multimodel (superensemble) forecast R_all is generally not as good as that of the regression-improved ensemble mean R_EM and becomes increasingly poorer for ensembles with many models.

The main reason for the poor performance of the multimodel linear regression technique in this study is overfitting that results in overly optimistic estimates of skill when the number of parameters estimated from the data is large relative to the sample size. In these circumstances, the fitted model performs poorly on independent data because it has adapted itself to the unpredictable variability within the available data in the training period.

REFERENCES

Akaike, H., 1974: A new look at the statistical model identification. IEEE Trans. Auto. Control, 19, 716–723.

Bengtsson, L., U. Schlese, E. Roeckner, M. Latif, T. P. Barnett, and N. Graham, 1993: A two-tiered approach to long-range climate forecasting. Science, 261, 1026–1029.

Danard, M. B., M. M. Holl, and J. R. Clark, 1968: Fields by correlation assembly: A numerical analysis technique. Mon. Wea. Rev., 96, 141–149.

Davis, R. E., 1976: Predictability of sea surface temperature and sea level pressure anomalies over the North Pacific Ocean. J. Phys. Oceanogr., 6, 249–266.

Derome, J., G. Brunet, A. Plante, N. Gagnon, G. J. Boer, F. W. Zwiers, S. Lambert, and H. Ritchie, 2001: Seasonal predictions based on two dynamical models. Atmos.–Ocean, in press.

Doblas-Reyes, F. J., M. Déqué, and J.-P. Piedelievre, 2000: Multimodel spread and probabilistic forecasts in PROVOST. Quart. J. Roy. Meteor. Soc., 126, 2069–2087.


Fraedrich, K., and N. R. Smith, 1989: Combining predictive schemes in long-range forecasting. J. Climate, 2, 291–294.

Gates, W. L., 1992: AMIP: The Atmospheric Model Intercomparison Project. Bull. Amer. Meteor. Soc., 73, 1962–1970.

Gibson, J. K., P. Kalberg, S. Uppala, A. Hernandes, A. Nomura, and E. Serrano, 1997: ECMWF Reanalysis Report Series 1: ERA Description. ECMWF, Reading, United Kingdom, 72 pp.

Hasselmann, K. F., 1979: On the signal-to-noise problem in atmospheric response studies. Meteorology over the Tropical Oceans, D. B. Shaw, Ed., Royal Meteorological Society, 251–259.

——, 1997: Multi-pattern fingerprint method for detection and attribution of climate change. Climate Dyn., 13, 601–612.

Kharin, V. V., F. W. Zwiers, and N. Gagnon, 2001: Skill of seasonal hindcasts as a function of the ensemble size. Climate Dyn., 17, 835–843.

Krishnamurti, T. N., C. M. Kishtawal, T. E. LaRow, D. R. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285, 1548–1550.

——, ——, Z. Zhang, T. E. LaRow, D. R. Bachiochi, C. E. Williford, S. Gadgil, and S. Surendran, 2000: Multimodel ensemble forecasts for weather and seasonal climate. J. Climate, 13, 4196–4216.

Lorenz, E. N., 1982: Atmospheric predictability with a large numerical model. Tellus, 34, 505–513.

Michaelson, J., 1987: Cross-validation in statistical climate forecast models. J. Climate Appl. Meteor., 26, 1589–1600.

Phillips, T. J., 1994: A summary documentation of the AMIP models. PCMDI Rep. 18, Lawrence Livermore National Laboratory, Livermore, CA, 343 pp.

Rowell, D. P., and F. W. Zwiers, 1999: The global distribution of the sources of decadal variability and mechanisms over the tropical Pacific and southern North America. Climate Dyn., 15, 751–772.

Thompson, P. D., 1977: How to improve accuracy by combining independent forecasts. Mon. Wea. Rev., 105, 228–229.

Zwiers, F. W., 1996: Interannual variability and predictability in an ensemble of AMIP climate simulations conducted with the CCC GCM2. Climate Dyn., 12, 825–848.

——, 1999: The detection of climate change. Anthropogenic Climate Change, H. von Storch and G. Flöser, Eds., Springer-Verlag, 161–206.

——, X. L. Wang, and J. Sheng, 2000: The effects of specifying bottom boundary conditions in an ensemble of GCM simulations. J. Geophys. Res., 105, 7295–7315.
