This article appeared in a journal published by Elsevier.

Atmospheric Research 100 (2011) 246–262


Superposition of three sources of uncertainties in operational flood forecasting chains

Massimiliano Zappa (a, ⁎), Simon Jaun (a, b), Urs Germann (c), André Walser (c), Felix Fundel (a)

a Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
b Institute for Atmospheric and Climate Science, ETH Zurich, Switzerland
c Swiss Federal Office of Meteorology and Climatology MeteoSwiss, Switzerland

Article history: Received 29 May 2009; received in revised form 1 December 2010; accepted 3 December 2010

Keywords: Flood forecasting; Uncertainty superposition; Weather radar ensemble; Atmospheric EPS; Model uncertainty; PREVAH; MAP D-PHASE; COST 731

One of the less well-known aspects of operational flood forecasting systems in complex topographic areas is how the uncertainties of their components propagate and superpose when they are fed into a hydrological model. This paper describes an experimental framework for investigating the relative contribution of meteorological forcing uncertainties, initial condition uncertainties and hydrological model parameter uncertainties to hydrological ensemble forecasts. Simulations were done for the Verzasca river basin (186 km²), a representative small-scale basin in southern Switzerland. For seven events between June 2007 and November 2008 it was possible to quantify the uncertainty over a five-day forecast range arising from three inputs: an ensemble numerical weather prediction (NWP) model (COSMO-LEPS, 16 members), the uncertainty in real-time assimilation of weather radar precipitation fields expressed with an ensemble approach (REAL, 25 members), and the equifinal parameter realizations of the adopted hydrological model (PREVAH, 26 members). Combining the three kinds of uncertainty results in a hydrological ensemble of 10,400 members. Analyses of sub-samples of the ensemble provide insight into the contribution of each kind of uncertainty to the total uncertainty. The results confirm our expectations and show that, for the operational simulation of peak-runoff events, the hydrological model uncertainty is less pronounced than the uncertainty obtained by propagating radar precipitation fields (by a factor larger than 4 in our specific setup) and NWP forecasts (by a factor larger than 10) through the hydrological model.

Using precipitation radar ensembles to generate ensembles of initial conditions shows that the uncertainty in initial conditions decays within the first 48 hours of the forecast. We also show that the total spread obtained when superposing two or more sources of uncertainty is larger than the cumulated spread of experiments in which only one uncertainty source is propagated through the hydrological model. The full spread obtained from uncertainty superposition grows non-linearly. © 2010 Elsevier B.V. All rights reserved.

1. Introduction

Operational flood forecasting is an important task for detecting potentially hazardous extreme rainfall-runoff events in time. This is particularly challenging in mountainous areas, where the orography strongly complicates the setup and operational workflow of most components of an end-to-end flood forecasting system. Such systems consist of atmospheric models (e.g. Rotach et al., 2009), hydrological prediction systems (e.g. Zappa et al., 2008), nowcasting tools used for estimating initial conditions (e.g. Germann et al., 2009) and warnings for end-users (Bruen et al., 2010; Frick and Hegg, 2011-this issue).

⁎ Corresponding author. Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Mountain Hydrology and Torrents, Zürcherstrasse 111, CH-8903 Birmensdorf. Tel.: +41 44 739 24 33. E-mail address: [email protected] (M. Zappa).

0169-8095/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.atmosres.2010.12.005


Each component of the system is affected by uncertainties linked to the physical representation of orography, to the parameterization schemes of the models involved and to the limitations of the observing platforms providing real-time data (Zappa et al., 2010). For an integral consideration of uncertainty, three key sources of error have to be considered: a) the uncertainty arising from incomplete process representation, including the error in the estimation of model parameters (Vrugt et al., 2005), b) the uncertainty in the initial conditions and c) the uncertainty of the observed/forecasted hydrometeorological input. This "uncertainty triplet" (Fig. 1) superposes when data are fed into a hydrological model. The integral uncertainty is the result of the interactions of all propagating sources of uncertainty.

In the field of numerical weather prediction, ensemble systems are established as standard tools to estimate and describe prediction uncertainties. Deterministic numerical weather predictions (NWPs) are intrinsically limited by the chaotic nature of the atmospheric dynamics. Already in the 1960s, Lorenz (1963) demonstrated in a seminal study that small errors in the initial conditions of a weather forecast can grow rapidly, leading to highly diverging solutions. In order to estimate predictability, much research has been undertaken to develop probabilistic forecasting methodologies (see the reviews by Ehrendorfer, 1997 and Palmer, 2000). In recent years, several studies have been devoted to regional scales using limited-area ensembles, in particular for forecasting heavy precipitation events (e.g. Stensrud et al., 2000; Walser et al., 2004). Motivated by the reported results, initiatives for operational limited-area ensemble prediction systems (EPSs) have emerged, e.g. the SRNWP-PEPS (Quiby and Denhard, 2003) and COSMO-LEPS (Marsigli et al., 2005). It is nowadays common to apply atmospheric EPS as a forcing in operational flood-forecasting systems (Siccardi et al., 2005; Verbunt et al., 2007; Bartholmes et al., 2009; see Cloke and Pappenberger, 2009 for a review).

One of the advantages that all meteorological ensemble approaches have in common is the simple interface with hydrological impact models. Each member of the ensemble can be fed into the hydrological model to generate a forecast. The spread arising from the outcomes of all members represents the sensitivity of the hydrological system to the meteorological ensemble. Recently, ensemble techniques have been proposed to quantify uncertainties in observing systems (Collier, 2007), such as radar precipitation estimation and nowcasting (e.g. Berenguer et al., 2005; Bowler et al., 2006; Szturc et al., 2008; Lee et al., 2009; Germann et al., 2009), pluviometer-based ensembles (Ahrens and Jaun, 2007; Villarini and Krajewski, 2008; Moulin et al., 2009; Pappenberger et al., 2009), or satellite rainfall retrieval (e.g. Bellerby and Sun, 2005; Clark and Slater, 2006). In addition, observation-based ensembles make it possible to obtain a hydrologically consistent ensemble of initial conditions for simulations coupled with atmospheric EPS.

The hydrological model uncertainty is a further measure that needs to be accounted for and communicated in hydrological forecasting. The problem of parameter estimation and equifinality is not a prerogative of hydrology (Beven, 1993, 2006; Beven and Freer, 2001; Vrugt et al., 2003; Pappenberger and Beven, 2006), but is a common issue in environmental modelling (see Matott et al., 2009 for a review).

This paper describes an experimental flood-forecasting chain emerging from the joint activities of the MAP D-PHASE project (Rotach et al., 2009) and the COST action 731 (Rossa et al., 2011-this issue). A novel aspect of our study is the superposition (or "cascading", Pappenberger et al., 2005) of the "uncertainty triplet" described above. In summary, we will:

- Propagate COSMO-LEPS (Section 2.3) and the radar ensemble fields from REAL (Germann et al., 2009; Section 2.2) through the hydrological model PREVAH (Viviroli et al., 2009a; Section 2.1)
- Estimate the uncertainty of the PREVAH tunable parameters by Monte Carlo sampling and select different parameter sub-samples (Section 3.2)
- Define different experimental settings for superposing the uncertainties from PREVAH, REAL and COSMO-LEPS (Section 3.4)
- Quantify uncertainty and express it as average spread over a forecast period of 120 hours, as defined by the lead-time of COSMO-LEPS forecasts (Section 3.5)

The Swiss Verzasca river basin (186 km², Section 3.1) has been selected as experimental area. It was the authors' main test bed during MAP D-PHASE. Data are available from the beginning of the MAP D-PHASE demonstration period in June 2007.
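The "average spread" named in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the ensemble is held as a NumPy array of shape (members, hours) of simulated runoff in m³ s−1, takes the spread at each hour as the range between the largest and smallest member (the q100 − q0 measure used for Table 4), and averages these hourly ranges over the first 120 hours. The function name `average_spread` is hypothetical.

```python
import numpy as np


def average_spread(ensemble, window_hours=120):
    """Mean hourly ensemble range (q100 - q0) over a forecast window.

    ensemble: 2-D array-like, shape (n_members, n_hours), simulated runoff [m3/s].
    window_hours: number of leading hours evaluated (120 h in this study).
    """
    runs = np.asarray(ensemble, dtype=float)
    if runs.ndim != 2 or runs.shape[1] < window_hours:
        raise ValueError("expected shape (n_members, n_hours) with n_hours >= window_hours")
    window = runs[:, :window_hours]
    # The 100th and 0th percentiles across members are simply the max and the min.
    hourly_range = window.max(axis=0) - window.min(axis=0)
    return float(hourly_range.mean())


# Toy example: three members, four hours, all four hours evaluated.
toy = [[10, 12, 15, 11],
       [9, 14, 20, 12],
       [11, 13, 18, 10]]
print(average_spread(toy, window_hours=4))  # 2.75
```

Using the range rather than, say, an interquantile width keeps every member in play, which matches the q100 − q0 definition but also makes the measure sensitive to single outlying members.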

Fig. 1. Main sources of uncertainties propagating and superposing through a hydrological model in hydrometeorological forecasting chains.
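The superposition sketched in Fig. 1 can be read as a Cartesian product of ensemble member indices. The snippet below is a hypothetical illustration, not the authors' workflow code: it enumerates the combinations with `itertools.product` using the ensemble sizes given in the abstract (25 REAL members, 26 PREVAH parameter sets, 16 COSMO-LEPS members). Radar member × parameter set pairs give the 650 initial states of the REAL/MOD experiment, and forcing each of them with every COSMO-LEPS member gives the 10,400-member FULL ensemble. The constant names are assumptions, as is keeping the parameter set fixed along the chain.

```python
from itertools import product

# Ensemble sizes stated in the paper.
N_REAL = 25   # radar ensemble members (REAL)
N_PARAM = 26  # PREVAH parameter sets (MOD_95%)
N_LEPS = 16   # COSMO-LEPS members

# REAL/MOD: every radar member with every parameter set yields one initial state.
initial_states = list(product(range(N_REAL), range(N_PARAM)))

# LEPS/MOD: every COSMO-LEPS member with every parameter set.
leps_mod = list(product(range(N_LEPS), range(N_PARAM)))

# FULL: each REAL/MOD initial state is forced by each COSMO-LEPS member
# (assumption: the parameter set of the initial state is kept for the forecast).
full = [(real, param, leps)
        for (real, param) in initial_states
        for leps in range(N_LEPS)]

print(len(initial_states), len(leps_mod), len(full))  # 650 416 10400
```

The multiplicative growth of the ensemble is what makes the FULL experiment computationally heavy, and it is why only one parameter sub-sample is carried into the superposition experiments.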


Our main goal is to estimate the different magnitudes of the spread generated by our particular definitions of input uncertainties (REAL and COSMO-LEPS), initial condition uncertainties (REAL for estimating initial conditions before feeding COSMO-LEPS into PREVAH) and hydrological model uncertainties (use of different sets of calibrated parameters). As a further goal we want to identify how spread grows when different sources of uncertainty are superposed.

2. Methods

2.1. The operational hydrological model PREVAH

We adopt the semi-distributed hydrological catchment modelling system PREVAH (Precipitation-Runoff-Evapotranspiration HRU Model; Viviroli et al., 2009a), which has been developed to improve the understanding of the spatial and temporal variability of hydrological processes in catchments with complex topography. A review of previous work with PREVAH is presented in Viviroli et al. (2009a), which also thoroughly introduces the model physics, parameterizations and pre- and post-processing tools. Besides applications investigating water resources in mountainous basins (Zappa et al., 2003; Zappa and Kan, 2007; Koboltschnig et al., 2009), PREVAH has recently been used more and more in quasi-operational hydrological applications and re-forecasts of flooding events in Switzerland. Verbunt et al. (2006) presented an indirect verification of deterministic quantitative precipitation forecasts (QPF) for the river Rhine. Verbunt et al. (2007) and Jaun et al. (2008) presented case studies on coupling PREVAH with the ensemble numerical weather prediction system COSMO-LEPS. Jaun and Ahrens (2009) verify a two-year reforecast experiment of the PREVAH/COSMO-LEPS forecasting chain for the Swiss Rhine basin. Romang et al. (2011) introduce the application of PREVAH for early flood warning in Swiss mesoscale basins. PREVAH is adopted as a "hydrological engine" for superposing three sources of uncertainty (Fig. 1).

2.2. Dealing with uncertainties within operational weather radar systems

In the past decade MeteoSwiss, the Swiss Federal Office of Meteorology and Climatology, developed and implemented a series of sophisticated algorithms to obtain best estimates of surface precipitation rates over Switzerland using a radar network (Germann et al., 2006). In spite of significant improvements, the residual uncertainty is still relatively large. A novel, promising solution to express this residual uncertainty is to generate an ensemble of radar precipitation fields by combining stochastic simulations with detailed knowledge of the radar signal error structure. The method is called REAL, which stands for Radar Ensemble generator designed for usage in the Alps using LU decomposition (Germann et al., 2009).

In REAL, the original (deterministic) radar precipitation field (1×1 km² resolution) is perturbed with a stochastic component, which has the same mean and covariance structure in space and time as the covariance matrix of the radar errors. In a first step, the mean and covariance structure of the radar errors are determined by comparing radar estimates with rain gauge measurements. Radar errors are defined as the logarithm of the ratio between the true (unknown) precipitation values and the radar estimate. This is a reasonable definition given that most radar errors are actually multiplicative (Germann et al., 2006). In a second step REAL generates a number of perturbation fields using singular value decomposition of the radar error covariance matrix, stochastic simulation using the LU decomposition algorithm, and autoregressive filtering. Each ensemble member is a possible realization of the unknown true precipitation field time series given the radar reflectivity measurements and the radar error covariance matrix. For the complete mathematical derivation of REAL we refer to Germann et al. (2009).

A prototype ensemble generator has been implemented as part of MAP D-PHASE and COST-731 and has been running in real time in automatic mode since spring 2007. The ensemble of precipitation field time series from REAL consists of 25 members, is updated operationally every 60 min and is propagated through PREVAH.

2.3. Quantification of uncertainty from ensemble NWP-systems

Early identification of severe long-lasting rainfall events within the next five days is obtained from the Limited-area Ensemble Prediction System of the COnsortium for Small-scale MOdelling, COSMO-LEPS (Marsigli et al., 2005). In the current configuration, COSMO-LEPS provides once a day a 16-member ensemble forecast with 132 hours lead-time for large parts of Europe. COSMO-LEPS is initialized at 12:00 UTC, and the first 12 forecast hours are not used because of misrepresentations during model spin-up. Initial and boundary conditions are taken from the European Centre for Medium-Range Weather Forecasts EPS (Molteni et al., 1996). The horizontal grid-spacing of COSMO-LEPS is 10×10 km², which is rather coarse for the small Verzasca basin, but due to the high computational costs ensemble forecasts with higher resolution are not yet available for the medium range. Six meteorological surface variables (air temperature, precipitation, humidity, wind, sunshine duration derived from cloud cover, and global radiation) are obtained from the ensemble NWP and downscaled for hydrological modelling. The setup adopted for downscaling information from COSMO-LEPS for hydrological applications is the same as presented in Jaun et al. (2008) and relies on bilinear interpolation. Air temperature is adjusted according to elevation by adopting a constant lapse rate of 0.65 °C per 100 m.

3. Experimental design

3.1. Study area

The Verzasca basin has an area of 186 km² up to the main gauge in Lavertezzo (Fig. 2). The basin is located in the southern part of Switzerland and is little affected by human activities. Its elevation range is 490–2870 m a.s.l. Forests (30%), shrub (25%), rocks (20%) and alpine pastures (20%) are the predominant land cover classes. Soils are rather shallow (generally less than 30 cm deep) and the plant-available field capacity is below 5% by volume. The discharge regime is governed by snowmelt in spring and early summer and by heavy rainfall events in fall (Ranzi et al., 2007). The river is
rather prone to flash floods (Wohling et al., 2006) and drains into the "Lago di Vogorno", an artificial reservoir maintained by a private hydropower company.

The hydrological properties of the catchment are derived from gridded maps of elevation, land use, land cover and soil properties (Gurtz et al., 1999), which are available at 100×100 m² resolution. For the present application a resolution of 500×500 m² is generated prior to the delineation of hydrological response units (Viviroli et al., 2009a). The runoff gauging station at the catchment outlet is maintained by the Swiss Federal Office for the Environment (FOEN), which provides data operationally at 10 min resolution. Flood peaks at Lavertezzo may exceed 600 m³ s−1 (~3.2 m³ s−1 km−2). Base flow in winter can be less than 1 m³ s−1.

The operational meteorological forcing is obtained from several sources. MeteoSwiss maintains a network of automatic stations providing a detailed set of meteorological variables with a sampling interval of up to 10 min (Fig. 2). The administration of the Canton of Ticino (UCA Ct. Ticino in Fig. 2) maintains an additional network of pluviometers, which sample precipitation in real time with a temporal resolution of 30 min. One of these is the only automatic pluviometer within the basin. Furthermore, weather radar precipitation fields are available (Section 2.2).

Fig. 2. Situation map of the Verzasca river basin in southern Switzerland including hydrometric (FOEN) and meteorological networks (MeteoSwiss and UCA). Additionally, the location of the Monte Lema weather radar a few kilometers south of the basin is displayed. Graphic elements reproduced by kind authorization of "swisstopo" (JA022265) and BFS GEOSTAT.

3.2. Consideration of hydrological uncertainty

The initial setup and calibration of the hydrological model was based on previous applications in the Verzasca river basin (Wohling et al., 2006; Ranzi et al., 2007). The default calibration used focuses on identifying the single parameter set with the highest performance in simulating average flows and with the smallest volume error between observed and simulated time series (Zappa and Kan, 2007; Viviroli et al., 2009a). Since the target of this study is the quantification of uncertainty propagation in hydrometeorological flood forecasting chains, only seven parameters relevant for surface runoff generation were allowed to change randomly during the Monte Carlo (MC) experiment (Table 1). The identification of these seven sensitive parameters relies on experience (Zappa, 2002), on consideration of the model structure (Gurtz et al., 2003) and on targeted sensitivity studies on flood peak calibration (Viviroli et al., 2009b). Table 1 indicates the basic value of the seven parameters after the default calibration and the ranges allowed for parameter sampling during the MC experiment. Further uncertainties linked to the parameters controlling snow accumulation, snow melting and base-flow have been disregarded. A total of 2527 MC runs were computed for the period 1996–2001, whereby the year 1996 was used only as a spin-up year. Please note that we are not addressing the full predictive uncertainty of the forecasting chain as defined in Draper (1995) and Todini (2009); instead we focus on the parameter uncertainty as obtained by selecting equifinal realizations from the MC experiment, as well as on observation and algorithm uncertainty through the ensemble methods for the NWP and the radar systems (see above). However, for the model chain used, the obtained uncertainty is the best available estimate of the full predictive uncertainty, and our hydrological experiments, which
rely on assessing different sets of model parameters to fit past observations, provide a practicable way to quantify how parameter uncertainty might contribute to the full predictive uncertainty of the system (Fig. 1).

Table 1
Definition of the model parameters allowed to vary in the Monte Carlo runs. The "default" parameters are the result of a standard calibration procedure (Viviroli et al., 2009a). The random sampling of the parameters was limited to values within the interval defined by MCMin and MCMax.

Symbol | Parameter | Unit | Default | MCMin | MCMax
Pcorr (a) | Rainfall adjustment | [%] | 12.8 | 0.0 | 30.0
Scorr (a) | Snow adjustment | [%] | 37.4 | 20.0 | 50.0
BETA | Soil moisture recharge exponent | – | 3.8 | 3.0 | 6.0
SGR | Threshold for surface runoff | mm | 41 | 30 | 50
K0 | Storage coefficient for surface runoff | h | 21 | 10 | 30
K1 | Storage coefficient for interflow | h | 127 | 100 | 150
PERC | Deep percolation | mm h−1 | 0.153 | 0.10 | 0.20

(a) The two parameters controlling the bias adjustment of the precipitation input (rain or snow) are only used if the hydrological model is fed by interpolated pluviometer data. Although the NWP models and the precipitation estimates from the weather radar contain systematic errors, it was decided to avoid bias corrections (Verbunt et al., 2006).

The decision whether a model run is behavioural or not is based on a subjective choice of likelihood function(s) (Beven, 1993; Madsen, 2000, 2003; Viviroli et al., 2009b; Bosshard and Zappa, 2008). As the goal of the modelling experiments is the estimation of flood peaks, two goodness-of-fit measures focused on peak discharge have been computed for each MC realization. As a first measure, the well-known Nash and Sutcliffe (1970) efficiency (NSE) is used:

$$ \mathrm{NSE} = 1 - \frac{\sum_{t=1}^{n} \left( Q_t - q_t \right)^2}{\sum_{t=1}^{n} \left( Q_t - \bar{Q} \right)^2}, \qquad \mathrm{NSE} \in \left] -\infty, 1 \right] \qquad (1) $$

where Q_t is the observed hourly runoff at time step t, \bar{Q} the average observed runoff, q_t the simulated runoff at time step t and n the number of time steps. NSE quantifies the relative improvement of the model compared to the mean of the observations. NSE is particularly adequate for our present application, since it is particularly sensitive to high flows. Its use is less advisable for studies focused on obtaining the best calibrated values for both high and low flows (Legates and McCabe, 1999; Schaefli and Gupta, 2007).

In addition to NSE a second function is used. Lamb (1999) and Viviroli et al. (2009b) introduce and discuss several scores for obtaining tailored parameter sets for flood-peak estimation. One of them is the sum of weighted absolute errors (SWAE), which is defined as:

$$ \mathrm{SWAE} = \sum_{t=1}^{n} \left( Q_t^{a} \left| Q_t - q_t \right| \right), \qquad \mathrm{SWAE} \in \left[ 0, \infty \right[ \qquad (2) $$

A value of a = 1.5 was used, as proposed by Lamb (1999) for the evaluation of peak flow conditions. Behavioural simulations show a lower SWAE.

The 2527 MC runs (Fig. 3) were ranked according to their performance in the defined calibration period. As a compound measure of performance, a weighted product of NSE (weight = 3) and SWAE (weight = 1) was adopted to build a single score L_i:

$$ L_i = \left( \frac{\mathrm{NSE}_i}{\mathrm{NSE}_{AVG}} \right)^{3} \cdot \frac{\mathrm{SWAE}_{AVG}}{\mathrm{SWAE}_i} \qquad (3) $$

An MC realization i having NSE_i above the average NSE_AVG of all realizations and SWAE_i below the average SWAE_AVG of all realizations will be ranked higher than the MC runs showing the opposite behaviour with respect to the average NSE and SWAE. The analysis of the MC runs showed that SWAE varied between 3500 and 6000, while NSE ranged from 0.71 to 0.84 (Fig. 3). Finally, L values between 0.5 and 1.34 were obtained for all runs.

Fig. 3. Dot-plot of the 2527 Monte Carlo realizations for the application of PREVAH in the Verzasca river basin during the calibration period 1996–2001. NSE and SWAE are used to select three sub-samples (99.5%, 95% and 80%) of acceptable parameter sets consisting of 26 realizations each.

Fig. 3 shows a dot-plot of all MC realizations with NSE on the y-axis and SWAE on the x-axis. The obtained pattern allows for a
visual discrimination between realizations with higher and lower performance, the best realizations lying in the upper-left region of the dot-plot. For the analysis in the remaining sections of the paper, three sub-samples of 26 parameter sets each were isolated by ranking all realizations according to L_i. The first sub-sample consists of the best 26 realizations (99.5%; L_i: 1.289–1.339). The second sub-sample collects the 26 sets around the 95% ranking (L_i: 1.238–1.246). The third sub-sample is a selection of 26 runs around the 80% ranking (L_i: 1.152–1.157). Table 2 displays some statistical measures of the three sub-samples of 26 parameter sets. Except for the storage coefficient controlling the generation of interflow, K1, the 26 runs with the highest performance present the lowest standard deviation within the sub-sample for all seven tuneable parameters. The highest variability is computed within the 80% sub-sample.

Table 2
Summary of the three parameter sets of 26 members each, inferred from the Monte Carlo simulations (see text for details). The numbers give the median (Med.) and standard deviation (St.Dev.) of the seven parameters that were randomly varied. "Range" is the ratio between St.Dev. and the width of the interval allowed for each parameter (Table 1).

Symbol | Unit | MOD_99.5% Med./St.Dev./Range | MOD_95% Med./St.Dev./Range | MOD_80% Med./St.Dev./Range
Pcorr | [%] | 11.54/2.8/0.09 | 14.3/5.1/0.17 | 19.3/6.6/0.22
Scorr | [%] | 32.1/7.2/0.24 | 29.6/9.2/0.31 | 33.6/9.5/0.32
BETA | – | 4.6/0.87/0.29 | 4.5/0.82/0.27 | 4.1/1.03/0.34
SGR | mm | 33.2/3.6/0.18 | 39.1/5.1/0.26 | 38.1/5.8/0.29
K0 | h | 12.7/0.97/0.05 | 11.8/2.0/0.1 | 15.6/2.4/0.12
K1 | h | 127/14.5/0.29 | 122/15.6/0.31 | 129/12.8/0.26
PERC | mm h−1 | 0.11/0.013/0.13 | 0.13/0.022/0.22 | 0.14/0.031/0.31

3.3. The selected peak-flow events

All experiments rely on a long-term simulation with PREVAH using the basic parameter calibration (Table 1). Initial conditions for September 1st 2005 (Fig. 4) are generated by a reference run using interpolated observed pluviometer data. This reference run, starting on January 1st 1996, was obtained from an offline meteorological database, which also includes stations that are not available in real time. Starting from September 1st 2005, a second long-term simulation relying only on operationally available data has been run to produce initial conditions for March 1st 2007. This run used as input the precipitation data from the operational pluviometers operated by MeteoSwiss and by the river network administration of the Canton of Ticino (Fig. 2). From March 1st 2007, operational time series of radar QPE and of REAL are also available. The time frame for the implementation of PREVAH in operational mode was chosen in order to have good initial conditions for the MAP D-PHASE demonstration period.

Fig. 4. Design of the seven experiments run for the quantification of uncertainty superposition. The time window for the statistics is defined by the lead time of the COSMO-LEPS (C-LEPS) forecasts.

In the period from March 1st 2007 to November 23rd 2008, seven events with peak runoff ranging between 77 and 541 m³ s−1 have been identified (Table 3). The return period of the highest flood peak in the considered period, on September 7th 2008, is approximately 5 years on the basis of extreme value
statistics of a time series starting in 1990 with an average yearly flood of 385 m³ s−1. The cumulative precipitation in the period preceding the seven events is also indicated in Table 3, both as spatially interpolated pluviometer data (areal precipitation estimate with inverse distance weighting interpolation) and as QPE assimilated from the weather radar. The event on October 29th 2008 occurred after a relatively dry antecedent period, while over 150 mm of rainfall was estimated for the Verzasca basin in the days and weeks preceding the August 22nd 2007 event. The antecedent cumulative precipitation for the November 5th 2008 event includes the precipitation event that triggered the October 29th 2008 peak flow.

Table 3
Accumulated precipitation during the five days preceding the seven peak-flow events investigated. The column "Day-10/-20" gives the date at which initial conditions from a deterministic run are stored in order to trigger the experiments on uncertainty propagation and superposition (Fig. 4). The COSMO-LEPS forecasts used are listed by their initialization date in days before the event (Day-5 to Day-0).

Event (year/month/day) | Peak runoff [m³ s−1] | Day-10/-20 (month/day) | Acc. precipitation until Day-5, pluviometers [mm] | Acc. precipitation until Day-5, radar [mm] | 120-hour COSMO-LEPS forecast initialization (month/day): Day-5 | Day-4 | Day-3 | Day-2 | Day-1 | Day-0
2007/08/22 | 100.7 | 08/01 | 151 | 153 | – | – | 08/19 | 08/20 | 08/21 | –
2008/07/07 | 80.3 | 06/28 | 10 | 38 | – | – | 07/04 | 07/05 | 07/06 | –
2008/07/13 | 163.0 | 06/28 | 87 | 113 | – | – | 07/10 | 07/11 | – | –
2008/08/15 | 76.9 | 07/20 | 45 | 98 | 08/10 | 08/11 | 08/12 | 08/13 | 08/14 | –
2008/09/07 | 541.0 | 08/20 | 28 | 29 | – | – | 09/04 | – | – | –
2008/10/29 | 210.6 | 10/09 | 9 | 4 | – | – | – | 10/27 | 10/28 | 10/29
2008/11/05 | 157.5 | 10/09 | 219 | 186 | – | – | – | 11/03 | 11/04 | –

Table 4
Average ensemble spread (q100 − q0) in m³ s−1 for seven peak-runoff events when adopting different sets of model parameter realizations and either pluviometers (MOD_%/PLUV) or weather radar QPE (MOD_%/RAD) as precipitation forcing.

Event (year/month/day) | Initialization (month/day) | MOD_99.5%/PLUV | MOD_95%/PLUV | MOD_80%/PLUV | MOD_95%/RAD
Members | – | 26 | 26 | 26 | 26
2007/08/22 | 08/20 | 10.3 | 14.2 | 18.5 | 9.3
2008/07/07 | 07/05 | 4.9 | 8.3 | 10.4 | 6.5
2008/07/13 | 07/11 | 5.3 | 7.2 | 10.0 | 9.5
2008/08/15 | 08/12 | 5.9 | 10.3 | 11.7 | 8.3
2008/09/07 | 09/04 | 22.8 | 30.0 | 38.9 | 30.4
2008/10/29 | 10/27 | 9.9 | 14.6 | 19.6 | 10.8
2008/11/05 | 11/03 | 10.8 | 13.9 | 18.8 | 9.9
Average [m³ s−1] | – | 10.0 | 14.1 | 18.3 | 12.1

3.4. The seven experiments towards estimation of uncertainty superposition

The availability of several different data sets of deterministic and probabilistic precipitation measurements and forecasts, together with the identification of sets of hydrological model parameters, allows the computation of uncertainty superposition. Seven different experiments (Fig. 4 and Table 3) have been completed:

1) MOD/PLUV: in this experiment the simulations from March 1st 2007 have been continued until November 25th 2008 with the same configuration used since September 1st 2005. No sources of uncertainty were considered. During the simulation a series of model starting points were stored 10 to 20 days ahead of a major discharge event (see Section 3.3 and Table 3). The timing for saving the initial conditions was chosen in order to guarantee that almost only base-flow contributes to the discharge at initialization and that a minor rainfall event is included in the time span between the model restart point and the peak-flow event. In a second stage, a temporally nested simulation was run starting from the defined initialization date (Day-10/-20, Table 3) until 10 to 15 days after the event. For the nested sub-period, 3 times 26 model runs were performed (Figs. 3 and 4 and Table 2).

2) MOD/RAD: this experiment is identical to the MOD/PLUV experiment, with the only change that the precipitation forcing is obtained from the weather radar (see Section 2.2). Also in this case, model runs for temporally nested sub-periods corresponding to peak-flow events were run accounting for the uncertainty in the determination of the calibrated model parameters. It is important to note here that in the case of simulations forced with radar data (either deterministic or estimated with REAL) no bias in rainfall and snowfall is accounted for. The radar QPE is already corrected for biases during pre-processing (Germann et al., 2006). Therefore, the two parameters of PREVAH controlling such corrections are set to 0% (Table 2).

3) REAL: in this experiment only the uncertainty arising from the weather radar QPE is accounted for. 25 ensemble members from the radar ensemble generator (Section 2.2) are used to force PREVAH. The initial conditions at the initialization of the nested runs are obtained from the MOD/RAD experiment, forced with the deterministic radar QPE since March 1st 2007 (Fig. 4).

4) REAL/MOD: this experiment is the first one where uncertainty superposition is considered. To reduce the computational effort, only one of the 3 parameter sub-sets is accounted for (see Section 4.1), namely the 95% sub-set (Table 2), which includes the 26 model runs ranked between the 94.5% and 95.5% levels among the 2527 MC realizations (see Section 3.2). In detail, for each nested period 25 (REAL) × 26 (MOD_95%) runs were completed in order to estimate the interaction between the radar QPE and the uncertainties of the hydrological model. Also in this case, restart points for the hydrological model were saved for later initialization of probabilistic forecasts with COSMO-LEPS (see below and Table 4).

5) LEPS: in this experiment only the uncertainty arising from feeding PREVAH with the 16 COSMO-LEPS ensemble members is accounted for. The initial conditions at

initialization of COSMO-LEPS forecasts (Table 4 and Fig. 4) are obtained from the model forced with the deterministic radar QPE since March 1st 2007 (Fig. 4). A total of 19 COSMO-LEPS 5-day forecasts for the Verzasca river basin were selected (Table 3). COSMO-LEPS QPF are not bias corrected (Table 2).

6) LEPS/MOD: in this experiment both the uncertainty of the model parameters and that of the NWP forecasts are considered. In detail: for each COSMO-LEPS initialization point 16 (COSMO-LEPS) × 26 (MOD_95%) runs were completed. This gives an ensemble of 416 5-day forecasts.

7) FULL: the final experiment is the combination of REAL/MOD and LEPS/MOD. For each of the 19 COSMO-LEPS ensemble forecasts, 650 different initial conditions are available from the superposition of REAL with the model parameter uncertainty (see above). Thus, 650 (REAL/MOD) × 16 (COSMO-LEPS) runs were computed for all 19 forecasts (Fig. 4 and Table 3). This results in an overall ensemble of 10,400 members for evaluating and quantifying uncertainty superposition through the simultaneous consideration of the uncertainties in the QPE (REAL), in the NWP forecasts (COSMO-LEPS) and in the determination of the parameters of the hydrological model (MOD_95%).

3.5. Quantification of uncertainty

We aim at quantifying the propagation and superposition of uncertainty when forcing PREVAH with different meteorological time series and different configurations of its tunable parameters. In all experiments a time frame of 120 hours is evaluated (Fig. 4). The time frame is defined by the initialization time of the COSMO-LEPS forecast used. We assume that the average spread of the simulated ensemble hydrographs is related to the uncertainty of the experimental settings used. To allow intercomparison between experiments, all statistics have been computed for the same 120-hour period. We take the average of the ensemble quantiles during the 120 hours as an objective measure for quantifying the uncertainty. Prior to the averaging, the quantiles $q^{i}_{\%}$ are determined for each of the 120 hours $i$ being evaluated. Eq. (4) defines the computation of the average quantile for the defined time frame:

$$\bar{q}_{\%} = \frac{1}{n}\sum_{i=1}^{n} q^{i}_{\%}, \qquad n = 120 \text{ time steps} \tag{4}$$

where $\bar{q}_{\%}$ denotes the average quantile of discharge during the n = 120 time steps. It has been computed for the levels 0%, 25%, 50% (the median), 75% and 100%. The average interquartile range (IQR) is obtained by subtracting q25 from q75, while the average range of spread is computed by subtracting q0 from q100.

4. Results

In this section the findings from the different experiments are discussed. The observed runoff hydrograph and the average observed discharge during the events are also plotted and give a subjective indication of the plausibility of the results. The evaluation of long series of operational forecasts with COSMO-LEPS and nowcast runs with REAL, MOD/RAD and MOD/PLUV is not detailed here. Nevertheless, Appendix A and Fig. A1 give a concise summary of the quality of the probabilistic (COSMO-LEPS, REAL) and deterministic (MOD, RAD) simulations during the period June 2007 to November 2008, which includes all selected events and for which a detailed verification report is available (Diezig et al., 2010). The verification indicates that all deterministic and probabilistic meteorological inputs used result in discharge estimates that perform better than climatology. Even if REAL and COSMO-LEPS present similar skill against observations, the following sections will show that the spread of these two sources of ensemble precipitation input may differ considerably for events leading to high discharge.

4.1. Parameter uncertainty

The MOD/PLUV and MOD/RAD experiments have been evaluated by quantifying the average ensemble spread (q100–q0) during the seven events (Table 4). MOD/PLUV was run using each of the three different sub-sets of parameter realizations (Table 2). For MOD/RAD only the results from the 26 realizations of the set MOD_95% are shown. Depending on the intensity of the event (peak-flow) and on the differences in antecedent precipitation (Table 3), different values of spread are obtained for the different events. The largest average ensemble spread (about 30 m3 s−1 for both MOD_95%/PLUV and MOD_95%/RAD) is found during the event leading to the September 7th 2008 peak-flow of 541 m3 s−1.

The application of parameter sub-samples with higher NSE and SWAE results in reduced spread. The average spread resulting from propagating the MOD_99.5% sub-sample is 30% lower than that computed when propagating MOD_95%. The spread obtained by propagating the MOD_80% sub-sample is on average 30% higher than that obtained from MOD_95% (Table 4).

The average spread for the seven events obtained from 26 realizations of PREVAH forced with weather radar QPE (MOD_95%/RAD) is about 14% lower than the corresponding spread of the runs forced with interpolated pluviometer data (MOD_95%/PLUV). Only for the event leading to the July 13th 2008 peak flow is the spread of the weather-radar-driven simulations clearly larger than that of the runs forced with rain-gauge data. This is due to a local convective rainfall event that was not recorded by the pluviometers but resulted in locally very high radar QPE. The main reason for the lower spread with radar QPE than with pluviometer forcing is the effect of bias correction, which is applied to the pluviometer data only. The variation in the bias correction (Table 3) covers both the input and the model uncertainties. This is the way errors in estimating precipitation are currently accounted for. However, there is an important constraint as compared to state-of-the-art observation-based precipitation ensembles (e.g. Ahrens and Jaun, 2007; Moulin et al., 2009; Pappenberger et al., 2009). The hydrological model uses the precipitation bias corrections (Table 2) as a global tunable parameter accounting for different sources of error in the treatment of rain-gauge data: a) direct measurement errors, b) systematic errors due to the choice, location and availability of meteorological stations, and c) uncertainties in the generation of spatially interpolated fields. Additionally the

bias correction parameters also contribute to compensating systematic errors in the estimation of evapotranspiration and other water fluxes by PREVAH (Zappa, 2002; Viviroli et al., 2009a). Methods for generating observation-based ensembles (based both on weather radar and on simulations) focus only on the uncertainties in the gridding of precipitation information and are therefore better suited for the propagation of input uncertainties.

Fig. 5 shows in detail the simulations for the November 5th 2008 event. The related evaluation of the average spread for the 120-hour window starting from November 3rd 2008 00:00 is summarized in Table 4. The spread arising from adopting the three different parameter sub-sets clearly increases when using sets with lower Li during the calibration period. While the shape of the simulated ensembles above and below the median remains very similar among the three cases, the distance between the upper and lower ensemble envelopes grows with decreasing likelihood within the calibration period. As a consequence, the number of observations falling within the uncertainty band drawn by the ensembles increases when using MOD_80% as compared to both MOD_99.5% and MOD_95%. The spread computed when using weather radar information is slightly higher at the start of the event. During the event the spread obtained from radar forcing becomes clearly smaller than that obtained from forcing with interpolated pluviometer data. This is confirmed by the average values of spread during the event (Table 4).

Spreads resulting from this analysis range between 7 and 30 m3 s−1 for the seven investigated events (MOD_95%). From now on we use MOD_95% as the benchmark against which to compare the spreads resulting from the other sources of uncertainty. This decision is taken with the intent of avoiding overfitting (as would be the case when using MOD_99.5% as the benchmark).

Table 5
Average ensemble spread (q100–q0) in m3 s−1 for seven peak-runoff events when adopting different experimental settings for propagating and superposing uncertainty in operational hydrological simulations.

| Event (year/month/day) | Initialization (month/day) | REAL | REAL/MOD | LEPS | LEPS/MOD | FULL |
|---|---|---|---|---|---|---|
| Members | – | 25 | 650 | 16 | 416 | 10,400 |
| 2007/08/22 | 08/20 | 37.2 | 48.9 | 117.0 | 137.0 | 141.0 |
| 2008/07/07 | 07/05 | 24.8 | 33.8 | 84.5 | 100.0 | 105.0 |
| 2008/07/13 | 07/11 | 42.0 | 53.9 | 123.7 | 146.0 | 146.0 |
| 2008/08/15 | 08/12 | 37.9 | 48.6 | 100.0 | 123.0 | 142.0 |
| 2008/09/07 | 09/04 | 167.0 | 216.0 | 288.0 | 328.0 | 338.0 |
| 2008/10/29 | 10/27 | 34.9 | 48.9 | 82.0 | 102.0 | 110.0 |
| 2008/11/05 | 11/03 | 43.3 | 56.3 | 116.0 | 138.0 | 139.0 |
| Average [m3 s−1] | | 55.3 | 72.3 | 130.0 | 153.0 | 160.0 |

4.2. Weather radar uncertainty and superposition with parameter uncertainty

Following the proof-of-concept presented in Germann et al. (2009), PREVAH was run with radar QPE ensembles obtained from REAL. The runs forced by REAL members use the initial conditions of a deterministic run forced by the operational radar QPE of MeteoSwiss until some days ahead of the event (Fig. 4 and Table 4). From that initialization point the procedure described in Section 3.4 (experiment “REAL”) is applied. As for the results presented

Fig. 5. November 5th 2008 event: visualization of the spread obtained by adopting different sets of model parameter realizations for simulations with PREVAH. The observed hydrograph is drawn as a black line. The shaded dark and light grey areas are delimited by the quantiles of the ensemble realizations (q0, q25, q75 and q100). The dashed black line shows the ensemble median (q50). Top left: realizations obtained from pluviometric data and the MOD_99.5% set. Top right: same, but with the MOD_95% set. Bottom left: same for MOD_80%. Bottom right: weather radar data combined with the MOD_95% parameter realization set.
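The shaded envelopes in Fig. 5 are per-time-step ensemble quantiles; Eq. (4) averages such quantiles over the 120-hour window, and the spreads reported in Tables 4 to 6 are differences of these averages. A minimal NumPy sketch, assuming the ensemble is stored as an array of shape (members, time steps) and using NumPy's default linear quantile interpolation (the interpolation scheme used in the study is not stated); the function names are hypothetical:

```python
import numpy as np

def average_quantiles(ensemble, levels=(0, 25, 50, 75, 100)):
    """Eq. (4): ensemble quantiles per time step, averaged over the time frame.

    ensemble: array of shape (n_members, n_steps), e.g. n_steps = 120 hourly values.
    Returns {level: time-averaged quantile} for each level given in percent.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    # Quantiles across members for every time step: shape (len(levels), n_steps)
    per_step = np.percentile(ensemble, levels, axis=0)
    # Average each quantile over the n time steps
    return {level: float(row.mean()) for level, row in zip(levels, per_step)}

def spread_metrics(qbar):
    """Average full spread (q100 - q0) and interquartile range (q75 - q25)."""
    return qbar[100] - qbar[0], qbar[75] - qbar[25]

# Usage with a synthetic 26-member, 120-hour ensemble (values are illustrative only).
rng = np.random.default_rng(0)
ensemble = 20.0 + rng.gamma(shape=2.0, scale=5.0, size=(26, 120))
spread, iqr = spread_metrics(average_quantiles(ensemble))
print(f"spread = {spread:.1f} m3/s, IQR = {iqr:.1f} m3/s")
```

Because averaging is linear, the difference of the averaged q100 and q0 equals the time average of the hourly ensemble ranges, so either order of computation gives the same spread.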

in the previous section, the average ensemble spread of the seven selected events has been computed for a 120-hour time frame (Table 5). Analogously, the REAL/MOD experiment was completed by varying both the REAL member and the calibrated parameter realization from the MOD_95% set, one after the other. The model runs resulted in an average spread ranging between 25 and 167 m3 s−1 for REAL and between 34 and 216 m3 s−1 for REAL/MOD. Compared with the outcomes of MOD_95%/RAD (Table 4), the REAL and REAL/MOD runs present a higher spread by a factor of 4.3 (REAL) and 5.6 (REAL/MOD).

Fig. 6 shows two examples of 5-day ensemble hydrographs obtained for the experiments REAL and REAL/MOD. Contrary to the ensembles shown in Fig. 5, almost all observed values fall within the ensemble envelopes. Only the falling limbs close to the end of the simulation are underestimated by both the REAL and the REAL/MOD ensembles. For the August 22nd 2007 and November 5th 2008 events there is clear evidence that the spread arising from the joint consideration of two sources of uncertainty is higher than that obtained by propagating only the REAL members through the hydrological model. The spread of the REAL/MOD ensemble is 25% to 40% higher than that of the REAL realizations. The average additional spread for the seven events is 17 m3 s−1 (Table 5). Combining the analyses of Tables 4 and 5, the following findings can be stated for the simulations REAL and REAL/MOD:

- The average spread for the seven events stemming from the parameter ensemble is about 12 m3 s−1 (PREVAH forced by deterministic radar QPE and 26 parameter realizations from MOD_95%).
- The coupling of PREVAH with REAL results in hydrograph ensembles with an average spread of over 55 m3 s−1 for the same seven events.
- If both REAL and MOD_95% are applied, an ensemble of 650 members is generated. The obtained average spread in this case is about 72 m3 s−1.

This means that REAL/MOD generates a 6% to 7% larger spread than the sum of the spreads obtained from the experiments MOD_95%/RAD and REAL (67 m3 s−1). This indicates that an amplification of spread occurs through the superposition of two sources of uncertainty. Our particular modelling system is characterized by non-linear responses, mostly explained by conceptual threshold processes in the runoff generation module of PREVAH (Gurtz et al., 2003). At the level of the interquartile range, amplification of spread has been observed in only one of the 19 cases considered (Table 3). Thus only a subset of all considered REAL and MOD combinations triggers a non-linear reaction within the runoff generation module of PREVAH. In all other cases the IQR-spread of REAL/MOD is on average 9% smaller than the cumulative spread of REAL and MOD.

4.3. COSMO-LEPS uncertainty and superposition with parameter uncertainty

As expected, the average spread obtained by propagating NWP forecasts through the hydrological model is much larger than that obtained from the experiments discussed above

Fig. 6. Ensemble hydrographs for the August 22nd 2007 (upper panels) and November 5th 2008 (bottom panels) events as obtained by forcing PREVAH with 25 radar ensemble members (REAL, left panels) and by jointly accounting for both radar and model parameter uncertainty (REAL/MOD, right panels). The observed hydrograph is drawn as a black line. The shaded dark and light grey areas are delimited by the quantiles of the ensemble realizations (q0, q25, q75 and q100). The dashed black line shows the ensemble median (q50).
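Section 4.2 reports that REAL and REAL/MOD exceed the spread of MOD_95%/RAD by factors of 4.3 and 5.6. Recomputing these two factors from Tables 4 and 5 suggests that they are averages of per-event ratios rather than ratios of the seven-event averages; this is our reading, as the text does not state the aggregation. The sketch below computes both variants from the tabulated values only:

```python
# Average spreads q100 - q0 (m3 s-1) for the seven events, from Tables 4 and 5.
mod_rad = [9.3, 6.5, 9.5, 8.3, 30.4, 10.8, 9.9]          # MOD_95%/RAD (Table 4)
real = [37.2, 24.8, 42.0, 37.9, 167.0, 34.9, 43.3]       # REAL (Table 5)
real_mod = [48.9, 33.8, 53.9, 48.6, 216.0, 48.9, 56.3]   # REAL/MOD (Table 5)

def mean(values):
    return sum(values) / len(values)

def mean_of_ratios(a, b):
    """Average over events of the per-event ratio a_i / b_i."""
    return mean([x / y for x, y in zip(a, b)])

def ratio_of_means(a, b):
    """Ratio of the seven-event averages of a and b."""
    return mean(a) / mean(b)

for name, series in [("REAL", real), ("REAL/MOD", real_mod)]:
    print(f"{name}: mean of ratios {mean_of_ratios(series, mod_rad):.1f}, "
          f"ratio of means {ratio_of_means(series, mod_rad):.1f}")
# REAL: mean of ratios 4.3, ratio of means 4.6
# REAL/MOD: mean of ratios 5.6, ratio of means 6.0
```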


(Fig. 7 and Table 4). Propagating LEPS generates average spreads that are about 10 times higher than those of MOD_95%/RAD and 2.3 times higher than those of REAL (Tables 4 and 5). Contrary to the previous experiments, which are always related to an observed precipitation event, the LEPS ensemble (initialized as declared in Table 5) also includes members forecasting very low or no precipitation at all for the respective event (e.g. Fig. 7 for the August 22nd 2007 event, upper panels). The forecast initialized on August 20th 2007 includes a relevant number of members that show no runoff increase at all within the 120 forecast hours. Even the 25% quantile shows a maximum discharge that is only slightly higher than the discharge at initialization time. For this event the whole observed time series falls within the envelope drawn by the LEPS experiment. Unfortunately, the spread is very large, which makes any kind of decision making related to this case almost impossible. Nevertheless, in this specific case a potential end-user taking actions on the basis of the 75% quantile would have been very efficient in his decision making. Further considerations on skill for decision making are only possible after sound verification of long-term time series of consecutive forecasts (e.g., Fundel and Zappa, 2011).

The results from the November 5th 2008 event (lower panels in Fig. 7) show different characteristics. All LEPS ensemble members agree that the first runoff peak is to be expected in the second half of the first forecast day, and that a second (higher) peak will arrive about 60 hours after initialization of the forecast. Potential users focusing on the 75% quantile would probably have over-reacted at the start of the event, but would have been able to cope with the peak on November 5th 2008.

The LEPS/MOD experiments represent a second series of simulations, for which parameter uncertainty is accounted for and superposed on the uncertainty originating from LEPS (right panels in Fig. 7). The average spread of the LEPS/MOD ensemble is 13% to 25% higher than that of the model realizations based on LEPS only. The average additional spread for the seven events is 23 m3 s−1 (Table 5).

In analogy to the joint consideration of Tables 4 and 5 in Section 4.2, the experiments with LEPS and LEPS/MOD allow the following statements:

- The average spread from MOD_95%/RAD is about 12 m3 s−1 (see above).
- The coupling of PREVAH with LEPS generated hydrograph ensembles with an average spread of over 130 m3 s−1 for the seven events considered.
- Applying both LEPS and MOD_95% results in an ensemble of 416 members. The obtained average spread is larger than 150 m3 s−1.

This means that LEPS/MOD generates a 9% to 10% larger spread than the sum of the spreads obtained from the experiments MOD_95%/RAD and LEPS (142 m3 s−1). Also in this case the superposition of the two sources of uncertainty causes an amplification of the full spread. Here, an amplification of spread measured by the interquartile range has been observed in seven cases (Table 3). On average the IQR-spread of LEPS/MOD is 2% smaller than the cumulative spread of LEPS and MOD.

When propagating numerical forecasts from an ensemble prediction system such as COSMO-LEPS through a hydrological model for mesoscale areas such as the Verzasca basin (186 km2), the large mismatch between the basin area and the resolution of the ensemble prediction system (10 × 10 km2 mesh size) has to be kept in mind. Nevertheless, studies with this kind of hydrological ensemble prediction have been

Fig. 7. The same as Fig. 6 but with PREVAH forced by 16 COSMO-LEPS ensemble members (LEPS, left panels) and by jointly accounting for both COSMO-LEPS and model parameter uncertainty (LEPS/MOD, right panels).
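Sections 4.2 and 4.3 compare the spread of a combined experiment with the sum of the spreads of its two components (MOD_95%/RAD plus REAL, or MOD_95%/RAD plus LEPS). A short sketch of that comparison using the per-event values of Tables 4 and 5; averaging the per-event ratios is our assumption about how the reported percentages were aggregated, and the function name is hypothetical:

```python
# Average spreads q100 - q0 (m3 s-1) for the seven events, from Tables 4 and 5.
mod_rad = [9.3, 6.5, 9.5, 8.3, 30.4, 10.8, 9.9]
real = [37.2, 24.8, 42.0, 37.9, 167.0, 34.9, 43.3]
real_mod = [48.9, 33.8, 53.9, 48.6, 216.0, 48.9, 56.3]
leps = [117.0, 84.5, 123.7, 100.0, 288.0, 82.0, 116.0]
leps_mod = [137.0, 100.0, 146.0, 123.0, 328.0, 102.0, 138.0]

def amplification(combined, source, model):
    """Mean over events of combined / (source + model), minus one.

    A positive value means that superposing the two sources yields more
    spread than simply adding their individual spreads.
    """
    ratios = [c / (s + m) for c, s, m in zip(combined, source, model)]
    return sum(ratios) / len(ratios) - 1.0

print(f"REAL/MOD: {amplification(real_mod, real, mod_rad):.0%}")  # 6%
print(f"LEPS/MOD: {amplification(leps_mod, leps, mod_rad):.0%}")  # 9%
```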


Fig. 8. Ensemble hydrographs for the August 15th 2008 event with PREVAH initialized on August 12th 2008 and forced by jointly accounting for both COSMO-LEPS and model parameter uncertainty (LEPS/MOD, left panel) and by accounting for all three sources of uncertainty in the experimental chain (FULL, right panel). Legend as in Figs. 6 and 7.

very popular in the last few years (Cloke and Pappenberger, 2009; Jaun et al., 2008) and have already found application in operational chains. This scale restriction is less problematic for applications in macro-scale basins (Pappenberger et al., 2005; Bartholmes et al., 2009).

4.4. Superposition of three sources of uncertainty

The last experiment combines the initial conditions obtained from the REAL/MOD experiment (650 members) with the 16 ensemble members of COSMO-LEPS (see Section 3.4 and Fig. 4) and thus considers the entire “uncertainty triplet” (Fig. 1). The LEPS/MOD experiment discussed above is extended by additionally perturbing the initial conditions, forcing PREVAH with REAL up to the start of the COSMO-LEPS propagation through PREVAH. By accounting for these additional perturbations, the average spread for the seven events increases by about 4.5%, from 153 (LEPS/MOD) to 160 m3 s−1 (FULL, Table 5). Only the run initialized on August 12th 2008 shows a distinctly higher additional uncertainty (~15% more) in the FULL experiment as compared to the LEPS/MOD experiment (Fig. 8). The FULL ensemble already shows a large spread at initialization, as determined by the antecedent conditions obtained from the REAL/MOD runs. This difference in overall spread gradually diminishes but is still clearly visible at the time of the first runoff peak shortly after 2:00 on August 13th 2008, when the maximum peak-flow of FULL is about 50 m3 s−1 higher than the corresponding LEPS/MOD peak. The difference is also clearly visible in the IQR. The second peak, late in the evening of August 15th 2008, shows a nearly identical shape and range for both FULL and LEPS/MOD. The uncertainty due to the REAL influence on REAL/MOD decays during the first part of the event.

Fig. 9 shows an overview of all 19 FULL experiments, each of them summarizing the spread arising from 10,400 5-day forecasts. In 12 cases the observed average discharge is found within the IQR. Only the experiment with the longest lead time, initialized on August 10th 2008, produced a q100 lower than the observed average discharge during the 120 forecast hours considered. The model run initialized 24 hours later (August 11th 2008) generates an ensemble spread that strongly overestimates the observed value. The correspondent

Fig. 9. Box plots summarizing the average ensemble discharge quantiles of the 19 experiments (lower captions on the x-axis) superposing the uncertainty from three sources. The experiments related to different events (upper captions of the x-axis) are separated by a vertical line crossing the x-axis. The observed average discharge during the 120 hours of each experiment is displayed as a thick horizontal black line. The thick horizontal white line depicts q50 within the box plot drawn by q0, q25, q75 and q100.
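Fig. 9 locates the observed 120-hour average discharge relative to the box plot of time-averaged FULL quantiles; in 12 of the 19 experiments it lies inside the IQR. A minimal sketch of that classification, assuming one experiment is given as an array of simulated discharge with shape (members, time steps) and using NumPy's default quantile interpolation; the function name and the category labels are hypothetical:

```python
import numpy as np

def classify_observation(ensemble, observed):
    """Locate the observed mean discharge relative to the time-averaged quantiles.

    ensemble: simulated discharge, shape (n_members, n_steps).
    observed: observed discharge, shape (n_steps,).
    """
    # Time-averaged q0, q25, q75 and q100 as in Eq. (4)
    q0, q25, q75, q100 = np.percentile(ensemble, [0, 25, 75, 100], axis=0).mean(axis=1)
    obs_mean = float(np.mean(observed))
    if q25 <= obs_mean <= q75:
        return "within IQR"
    if q0 <= obs_mean <= q100:
        return "within range"
    return "outside range"

# Counting experiments with the observation inside the IQR (hypothetical inputs):
# n_inside = sum(classify_observation(e, o) == "within IQR" for e, o in experiments)
```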


Table 6
Attributing the contribution of different sources of uncertainty to the average ensemble spread (q100–q0) and IQR (q75–q25) in m3 s−1 for the November 5th 2008 peak runoff event. The observed value and the corresponding ensemble median (q50) are also summarized. Sub-samples of three experiments are evaluated to estimate the different contributions of MOD, LEPS and REAL to the total experimental uncertainty. Details on the experiments and acronyms are found in Section 3.4.

| Experiment | Filter | Varying | Average of n realizations | Members | Observation [m3 s−1] | q50 [m3 s−1] | q100–q0 [m3 s−1] | q75–q25 [m3 s−1] |
|---|---|---|---|---|---|---|---|---|
| FULL | None | All three | 1 | 10,400 | 62.4 | 67.0 | 139.0 | 68.4 |
| FULL | MOD | REAL & LEPS | 26 | 400 | 62.4 | 65.3 | 125.4 | 62.3 |
| FULL | REAL | MOD & LEPS | 25 | 416 | 62.4 | 66.8 | 136.1 | 67.0 |
| FULL | LEPS | REAL & MOD | 16 | 650 | 62.4 | 71.9 | 15.9 | 10.8 |
| REAL/MOD | None | Both | 1 | 650 | 62.4 | 56.7 | 56.3 | 31.0 |
| REAL/MOD | REAL | MOD | 25 | 26 | 62.4 | 57.7 | 10.1 | 6.9 |
| REAL/MOD | MOD | REAL | 26 | 25 | 62.4 | 55.7 | 45.9 | 25.5 |
| LEPS/MOD | None | Both | 1 | 416 | 62.4 | 66.9 | 138.0 | 67.9 |
| LEPS/MOD | MOD | LEPS | 26 | 16 | 62.4 | 64.4 | 121.8 | 59.6 |
| LEPS/MOD | LEPS | MOD | 16 | 26 | 62.4 | 71.5 | 15.7 | 10.1 |
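The percentages in findings a) to c) of Section 4.5 are ratios between the filtered and the unfiltered spread of the same experiment in Table 6. A short sketch that recomputes them from the tabulated values (the dictionary layout and function name are just one convenient, hypothetical choice):

```python
# Average spread q100 - q0 (m3 s-1) for the November 5th 2008 event, from Table 6.
# Keys: (experiment, filtered source); "None" marks the unfiltered experiment.
table6_spread = {
    ("FULL", "None"): 139.0,
    ("FULL", "MOD"): 125.4,
    ("FULL", "REAL"): 136.1,
    ("FULL", "LEPS"): 15.9,
    ("REAL/MOD", "None"): 56.3,
    ("REAL/MOD", "REAL"): 10.1,
    ("REAL/MOD", "MOD"): 45.9,
    ("LEPS/MOD", "None"): 138.0,
    ("LEPS/MOD", "MOD"): 121.8,
    ("LEPS/MOD", "LEPS"): 15.7,
}

def share_of_total(experiment, filtered_source):
    """Fraction of the unfiltered spread retained when one source is held fixed."""
    return (table6_spread[(experiment, filtered_source)]
            / table6_spread[(experiment, "None")])

for exp, flt in table6_spread:
    if flt != "None":
        print(f"{exp:9s} filter {flt:4s}: {share_of_total(exp, flt):.1%}")
```

For example, holding MOD fixed in FULL retains 90.2% of the spread, and holding REAL fixed in REAL/MOD retains 17.9%, matching findings a) and b).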

runs for the three following days show a gradual reduction in ensemble spread. The reason for the large spread is that some COSMO-LEPS members forecast severe convective precipitation, while others predicted no precipitation at all. Finally, a moderate thunderstorm occurred in the evening of August 11th 2008. REAL also generated a large spread, with the cumulated rainfall of its members for August 11th 2008 ranging between 3 and 40 mm. This explains the large discrepancy in initial conditions observed at the initialization of the LEPS forecasts on August 12th 2008 (see Fig. 8).

4.5. Attributing the contributions to the total uncertainty

The outcome of the three experiments dealing with uncertainty superposition (FULL, REAL/MOD, and LEPS/MOD) can be sorted in order to allocate the contribution of each source of spread to the whole experimental uncertainty. For this analysis we focus on one event only, namely the November 5th 2008 event with COSMO-LEPS forecasts initialized on November 3rd 2008 (see also Figs. 5 to 7). The following procedure was applied:

- Calculate the quantiles of all runs of the experiment (Eq. (4));
- Group in turn all runs sharing the same MOD, REAL or LEPS member (Table 6);
- Average the quantiles of the sub-samples and calculate the corresponding spread metrics (Fig. 10).

The three main findings from Table 6 are:

a) FULL: The 10,400 FULL runs give an average ensemble spread “q100–q0” of 139.0 m3 s−1. There are 400 model runs sharing the same parameter set. This means that we can compute 26 different “q100–q0” values and average them to obtain an integral measure of the spread attributed to the two sources of uncertainty that have been varied in this specific case (REAL & LEPS). In this example the “q100–q0” that cannot be attributed to MOD is 125.4 m3 s−1 (90% of the total spread). When REAL is used

Fig. 10. Box plots summarizing the average ensemble discharge quantiles for the November 5th 2008 event initialized on November 3rd 2008. Three experiments (upper captions of the x-axis) are evaluated as a complete set (“no filter”) and by separating the influence of different sources of uncertainty (“filter MOD/REAL/LEPS”). The observed average discharge during the 120 hours of each experiment is displayed as a thick horizontal black line. The thick horizontal white line depicts q50 within the box plot drawn by q0, q25, q75 and q100.
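The grouping procedure of Section 4.5 can be sketched with NumPy, assuming the runs of an experiment are stored as an array indexed by (REAL member, MOD parameter set, LEPS member, hour), i.e. shape (25, 26, 16, 120) for FULL. Holding the MOD axis fixed then yields 26 groups of 400 runs, as in Table 6. The array layout, the function name and the synthetic effect sizes in the usage example are hypothetical, and only q0 and q100 are computed here:

```python
import numpy as np

def filtered_spread(runs, axis=None):
    """Average q100 - q0 after grouping runs that share a member along `axis`.

    runs: simulated discharge with shape (n_real, n_mod, n_leps, n_steps).
    axis: 0 (REAL), 1 (MOD) or 2 (LEPS) is the source held fixed ("filter");
          None evaluates the complete set ("no filter").
    """
    n_steps = runs.shape[-1]
    if axis is None:
        groups = [runs.reshape(-1, n_steps)]
    else:
        # One group per member of the filtered source, pooling all other runs
        groups = [np.take(runs, k, axis=axis).reshape(-1, n_steps)
                  for k in range(runs.shape[axis])]
    spreads = []
    for group in groups:
        # Eq. (4): per-step minimum and maximum, averaged over the time frame
        q0, q100 = np.percentile(group, [0, 100], axis=0).mean(axis=1)
        spreads.append(q100 - q0)
    # Averaging the groups' quantiles and then differencing equals averaging spreads
    return float(np.mean(spreads))

# Synthetic example with small, hypothetical additive effects per source.
rng = np.random.default_rng(42)
t = np.arange(120)
base = 20.0 + 30.0 * np.exp(-((t - 60) / 15.0) ** 2)   # idealized hydrograph
real_eff = rng.normal(0.0, 2.0, size=(5, 1, 1, 1))
mod_eff = rng.normal(0.0, 5.0, size=(1, 6, 1, 1))
leps_eff = rng.normal(0.0, 20.0, size=(1, 1, 4, 1)) * base / base.max()
runs = base + real_eff + mod_eff + leps_eff                # shape (5, 6, 4, 120)

for label, axis in [("no filter", None), ("filter REAL", 0),
                    ("filter MOD", 1), ("filter LEPS", 2)]:
    print(f"{label:11s}: {filtered_spread(runs, axis):6.1f}")
```

With a real FULL array, the four calls would correspond to the four FULL rows of Table 6.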


as a filter and both MOD and LEPS are varied, almost 98% of the “q100–q0” is obtained; REAL thus contributes only to a very limited extent to the whole ensemble spread. Finally, if the influence of LEPS is averaged, only 11.5% of the FULL spread can be attributed (Table 6). Similar outcomes are observed when looking at the IQR “q75–q25”.

b) REAL/MOD: The REAL/MOD ensemble generates a “q100–q0” of 56.3 m3 s−1. Here the “q100–q0” that cannot be allocated to MOD is 45.9 m3 s−1 (80% of the total spread). When REAL is used as a filter and only MOD is varied, 17.9% of the spread can be allocated (Table 6).

c) LEPS/MOD: The 416 LEPS/MOD realizations result in an average “q100–q0” of 138 m3 s−1. When focusing on the role of changing MOD and averaging the influence of LEPS, “q100–q0” is only 15.6 m3 s−1 (11% of the total spread). If we make a sub-sample that filters the spread of MOD, about 88% of the whole spread of LEPS/MOD can still be allocated (Table 6).

Fig. 10 is a graphic rendering of Table 6 in the form of box plots. All experiments in which LEPS contributes to the spread variation show an average spread close to that of the whole experiment. If only LEPS is propagated, the average spread is 130 m3 s−1 (Table 5). If model uncertainty is also propagated, the average spread increases by about 23 m3 s−1 to 153 m3 s−1. If different initial conditions from REAL are also considered, the additional increase is only 7 m3 s−1 (total: 160 m3 s−1, Table 5). If REAL is used to generate initial conditions only, its influence on the total spread is smaller than that of the hydrological model uncertainty. Using REAL as forcing during the event increases the spread by a factor of about 4.5 (in the specific case of November 5th 2008) compared to the spread that can be attributed to the model parameters. This confirms the outcomes summarized in Tables 4 and 5.

5. Discussion and conclusions

The experimental setup presented in this paper, which accounts for three sources of uncertainty, provides interesting answers to questions linked to uncertainty propagation and superposition in a hydrometeorological forecasting system. The setup showed that the hydrological model (PREVAH) uncertainty is less pronounced than the uncertainty obtained by propagating radar precipitation fields (REAL) and NWP forecasts (COSMO-LEPS) through the hydrological model. For the five-day forecast range and the seven events considered, the average spread of REAL exceeds that of MOD/RAD by a factor larger than four, and the average spread of LEPS exceeds that of MOD/RAD by a factor above ten. Since the Verzasca basin (186 km2) is less than twice the area of a COSMO-LEPS grid cell (10 × 10 km2), there is almost no averaging effect. This contributes to the large spread of the obtained hydrographs when COSMO-LEPS is used. Gallus (2002) warns against using NWP grid-point information for verification against point data. For the Verzasca this is almost the case, since we use the information of only a few COSMO-LEPS grid points to force our impact model and compare its output with observations.

The estimation of PREVAH parameter uncertainty is strongly dependent on the way the parameters have been sampled and ranked. Numerous approaches are possible for this kind of problem (Matott et al., 2009). We are confident that the chosen approach is appropriate to estimate the parameter uncertainty of PREVAH within the presented superposition experiment. Of course, the parameter uncertainty is estimated on the basis of the whole calibration period. Current literature (He et al., 2009; Cullmann and Wriedt, 2008; Pappenberger and Beven, 2004) offers some examples of approaches that try to combine parameter configurations that are successful over the complete data basis with other parameter configurations estimated for single events or series of events.

Amplification of spread is obtained if the combination of LEPS (or REAL) with the parameter realizations triggers a non-linear reaction of the runoff generation module of PREVAH (Gurtz et al., 2003; Viviroli et al., 2009a), which includes a threshold parameter for activating the generation of surface runoff (Table 1). Such a non-linear response needs to be accounted for by hydrological models, since a sudden increase of discharge coefficients has been observed in many basins during long-lasting heavy precipitation events (e.g. Naef et al., 2008). Such threshold processes can also be identified in the form of step structures in flood frequency statistics (e.g. Merz and Blöschl, 2008). In all considered cases we observed an amplification of the full spread, while the corresponding interquartile range is mostly smaller when two error sources are superposed.

By using REAL, input uncertainties are considered for nowcasting. We showed that the simultaneous application of REAL and parameter uncertainties generates ensembles that nicely envelop the observed hydrograph. Besides weather-radar-based approaches, observation-based ensembles with pluviometer data have recently been proposed. Recent studies propose the use of the kriging variance (Ahrens and Jaun, 2007; Moulin et al., 2009; Pappenberger et al., 2009) for estimating the interpolation uncertainty of ground-based precipitation data for hydrological purposes. Jaun (2008) showed that hydrological simulations forced by observation-based ensembles are sensitive to the density and number of stations available: the interpolation uncertainty increases with a decreasing number of representative stations. These restrictions do not apply to REAL, which is able to operationally generate high-resolution observation-based ensembles for hydrology. Nevertheless, observation-based pluviometer ensembles are certainly a feasible way to consider input uncertainty in regions where the weather radar coverage is not adequate. Further efforts are planned in order to implement interpolation-based ensembles within our experimental chain.

The use of weather radar ensembles for generating hydrologically consistent ensembles of initial conditions prior to the propagation of COSMO-LEPS through the hydrological model shows that the uncertainty in initial conditions decays within the first 48 hours of the forecast. The magnitude of the uncertainty attributed to the differences in initial conditions is smaller than the uncertainty attributed to the hydrological model parameters and almost negligible with respect to the spread due to COSMO-LEPS.

The operational implementation of this experiment for the Verzasca river basin would be possible a priori. Realizing a run with all 10,400 ensemble members, including the 650 runs for the determination of initial conditions, requires about 6 hours of CPU time. The application to larger river basins requires a reduction in the number of simulations. The adaptive forecasting

concept proposed by Romanowicz et al. (2006, 2008) could be one approach to estimating which members need to be computed.

Acknowledgments

We thank the Swiss Federal Office for the Environment for providing runoff data from its operational networks. Thanks to the Ufficio dei corsi d'acqua (Canton Ticino) and the Istituto Scienze della Terra (SUPSI) for additional rain-gauge data. This study is part of COST-731 and MAP D-PHASE, and was funded by MeteoSwiss, WSL and the State Secretariat for Education and Research SER (COST 731). The comments of the two reviewers, F. Pappenberger and L. Moulin, and of the Guest Editor, A. Rossa, helped to clarify the paper.

Appendix A

In this appendix we give a concise summary of the verification of the probabilistic (COSMO-LEPS and REAL) and deterministic (MOD and RAD) simulations for the period June 2007 to November 2008, expressed with probabilistic measures of skill. This kind of verification is well established in the atmospheric sciences (Brier, 1950; Wilks, 2006; Weigel et al., 2007; Ahrens and Walser, 2008) and is increasingly popular in the hydrological sciences, both for the analysis of single events and for the verification of long time series (Jaun et al., 2008; Jaun and Ahrens, 2009; Bartholmes et al., 2009; Roulin and Vannitsem, 2005; Roulin, 2007; Laio and Tamea, 2007; Brown et al., 2010).

Fig. A1 shows the relative operating characteristic curves (ROC, Wilks, 2006) of LEPS and REAL for the period June 2007 to November 2008. The ROC points of the deterministic simulations MOD/RAD and MOD/PLUV are also indicated. The analysis has been completed for three different thresholds, each representing a percentile (50%, 75% and 95%) of the observed discharge during these 18 months. Additionally, the Brier Skill Score (BSS, Wilks, 2006) of the ensemble products is reported. For LEPS the analysis has been completed for different lead times (Jaun and Ahrens, 2009); lead times of one and five days are displayed in Fig. A1.

The obtained ROC and BSS show that both REAL and LEPS are skillful for all selected thresholds. When low discharge percentiles are tested (50% and 75%), the BSS of LEPS decreases only slightly between day-one and day-five forecasts. For discharges above 75.8 m³ s⁻¹ (95% percentile), LEPS forecasts with a lead time of one day are more skillful than those with a lead time of five days.

The BSS of REAL is high for the lowest and the highest percentiles considered. For the 75% discharge percentile (17.2 m³ s⁻¹), REAL tends to have an increased rate of false alarms.

MOD/RAD and MOD/PLUV behave similarly to the ensemble products. MOD/RAD has a higher hit rate than MOD/PLUV when the 75% discharge percentile is tested. On the other hand, MOD/PLUV has fewer false alarms than MOD/RAD when discharges above 5.97 m³ s⁻¹ (50% percentile) are verified. An extended objective quantitative verification of the ensemble simulations against observed data will be presented in follow-up studies.

Fig. A1. Relative operating characteristic curves (ROC, Wilks, 2006) of LEPS and REAL for the period June 2007 to November 2008. The ROC for the deterministic simulations MOD/RAD and MOD/PLUV is indicated as a point. ROCs are plotted for three different discharge thresholds corresponding to the 0.5 (left), 0.75 (middle) and 0.9 (right) quantiles.

References

Ahrens, B., Jaun, S., 2007. On evaluation of ensemble precipitation forecasts with observation-based ensembles. Advances in Geosciences 10, 139–144.
Ahrens, B., Walser, A., 2008. Information-based skill scores for probabilistic forecasts. Monthly Weather Review 136 (1), 352–363.
Bartholmes, J.C., Thielen, J., Ramos, M.H., Gentilini, S., 2009. The European flood alert system EFAS — part 2: statistical skill assessment of probabilistic and deterministic operational forecasts. Hydrology and Earth System Sciences 13 (2), 141–153.
Bellerby, T.J., Sun, J.Z., 2005. Probabilistic and ensemble representations of the uncertainty in an IR/microwave satellite precipitation product. Journal of Hydrometeorology 6 (6), 1032–1044.
Berenguer, M., Corral, C., Sanchez-Diezma, R., Sempere-Torres, D., 2005. Hydrological validation of a radar-based nowcasting technique. Journal of Hydrometeorology 6 (4), 532–549.
Beven, K., 1993. Prophecy, reality and uncertainty in distributed hydrological modeling. Advances in Water Resources 16 (1), 41–51.
Beven, K., 2006. On undermining the science? Hydrological Processes 20 (14), 3141–3146.
Beven, K.J., Freer, J., 2001. Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems. Journal of Hydrology 249, 11–29.
Bosshard, T., Zappa, M., 2008. Regional parameter allocation and predictive uncertainty estimation of a rainfall-runoff model in the poorly gauged Three Gorges Area (PR China). Physics and Chemistry of the Earth 33 (17–18), 1095–1104.


Bowler, N.E., Pierce, C.E., Seed, A.W., 2006. STEPS: A probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP. Quarterly Journal Royal Meteorological Society 132 (620), 2127–2155.
Brier, G.W., 1950. Verification of forecasts expressed in terms of probability. Monthly Weather Review 78 (1), 1–3.
Brown, J.D., Demargne, J., Seo, D.J., Liu, Y.Q., 2010. The Ensemble Verification System (EVS): a software tool for verifying ensemble forecasts of hydrometeorological and hydrologic variables at discrete locations. Environmental Modelling and Software 25 (7), 854–872.
Bruen, M., et al., 2010. Visualizing flood forecasting uncertainty: some current European EPS platforms-COST731 working group 3. Atmospheric Science Letters 11 (2), 92–99.
Clark, M.P., Slater, A.G., 2006. Probabilistic quantitative precipitation estimation in complex terrain. Journal of Hydrometeorology 7 (1), 3–22.
Cloke, H.L., Pappenberger, F., 2009. Ensemble flood forecasting: a review. Journal of Hydrology 375 (3–4), 613–626.
Collier, C.G., 2007. Flash flood forecasting: What are the limits of predictability? Quarterly Journal Royal Meteorological Society 133 (622), 3–23.
Cullmann, J., Wriedt, G., 2008. Joint application of event-based calibration and dynamic identifiability analysis in rainfall-runoff modelling: implications for model parametrisation. Journal of Hydroinformatics 10 (4), 301–316.
Diezig, R., Fundel, F., Jaun, S., Vogt, S., 2010. Verification of runoff forecasts by the FOEN and the WSL. In: CHR (Ed.), Advances in Flood Forecasting and the Implications for Risk Management. International Commission for the Hydrology of the Rhine Basin (CHR), Alkmaar, pp. 111–113.
Draper, D., 1995. Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society: Series B: Methodological 57 (1), 45–97.
Ehrendorfer, M., 1997. Predicting the uncertainty of numerical weather forecasts: a review. Meteorologische Zeitschrift 6 (4), 147–183.
Frick, J., Hegg, C., 2011. Can end-users' flood management decision making be improved by information about forecast uncertainty? Atmospheric Research 100, 296–302 (this issue).
Fundel, F., Zappa, M., 2011. Hydrological Ensemble Forecasting in Mesoscale Catchments: Sensitivity to Initial Conditions and Value of Reforecasts. Water Resources Research, in review.
Gallus, W.A., 2002. Impact of verification grid-box size on warm-season QPF skill measures. Weather and Forecasting 17 (6), 1296–1302.
Germann, U., Galli, G., Boscacci, M., Bolliger, M., 2006. Radar precipitation measurement in a mountainous region. Quarterly Journal Royal Meteorological Society 132 (618), 1669–1692.
Germann, U., Berenguer, M., Sempere-Torres, D., Zappa, M., 2009. REAL — ensemble radar precipitation estimation for hydrology in a mountainous region. Quarterly Journal Royal Meteorological Society 135 (639), 445–456.
Gurtz, J., Baltensweiler, A., Lang, H., 1999. Spatially distributed hydrotope-based modelling of evapotranspiration and runoff in mountainous basins. Hydrological Processes 13 (17), 2751–2768.
Gurtz, J., et al., 2003. A comparative study in modelling runoff and its components in two mountainous catchments. Hydrological Processes 17 (2), 297–311.
He, Y., et al., 2009. Tracking the uncertainty in flood alerts driven by grand ensemble weather predictions. Meteorological Applications 16 (1), 91–101.
Jaun, S., 2008. Towards operational probabilistic runoff forecasts, Dissertation No. 17817, ETH Zurich, [available online at http://e-collection.ethbib.ethz.ch/view/eth:41686].
Jaun, S., Ahrens, B., 2009. Evaluation of a probabilistic hydrometeorological forecast system. Hydrology and Earth System Sciences Discussions 6, 1843–1877.
Jaun, S., Ahrens, B., Walser, A., Ewen, T., Schar, C., 2008. A probabilistic view on the August 2005 floods in the upper Rhine catchment. Natural Hazards and Earth System Sciences 8 (2), 281–291.
Koboltschnig, G.R., Schoner, W., Holzmann, H., Zappa, M., 2009. Glaciermelt of a small basin contributing to runoff under the extreme climate conditions in the summer of 2003. Hydrological Processes 23 (7), 1010–1018.
Laio, F., Tamea, S., 2007. Verification tools for probabilistic forecasts of continuous hydrological variables. Hydrology and Earth System Sciences 11 (4), 1267–1277.
Lamb, R., 1999. Calibration of a conceptual rainfall-runoff model for flood frequency estimation by continuous simulation. Water Resources Research 35 (10), 3103–3114.
Lee, C.K., Lee, G., Zawadzki, I., Kim, K.E., 2009. A preliminary analysis of spatial variability of raindrop size distributions during stratiform rain events. Journal of Applied Meteorology and Climatology 48 (2), 270–283.
Legates, D.R., McCabe, G.J., 1999. Evaluating the use of "Goodness-of-Fit" measures in hydrologic and hydroclimatic model validation. Water Resources Research 35, 233–241.
Lorenz, E.N., 1963. Deterministic nonperiodic flow. Journal of Atmospheric Sciences 20 (2), 130–141.
Madsen, H., 2000. Automatic calibration of a conceptual rainfall-runoff model using multiple objectives. Journal of Hydrology 235 (3–4), 276–288.
Madsen, H., 2003. Parameter estimation in distributed hydrological catchment modelling using automatic calibration with multiple objectives. Advances in Water Resources 26 (2), 205–216.
Marsigli, C., Boccanera, F., Montani, A., Paccagnella, T., 2005. The COSMO-LEPS mesoscale ensemble system: validation of the methodology and verification. Nonlinear Processes in Geophysics 12 (4), 527–536.
Matott, L.S., Babendreier, J.E., Purucker, S.T., 2009. Evaluating uncertainty in integrated environmental models: a review of concepts and tools. Water Resources Research 45.
Merz, R., Bloschl, G., 2008. Flood frequency hydrology: 1. temporal, spatial, and causal expansion of information. Water Resources Research 44 (8).
Molteni, F., Buizza, R., Palmer, T.N., Petroliagis, T., 1996. The ECMWF ensemble prediction system: methodology and validation. Quarterly Journal Royal Meteorological Society 122 (529), 73–119.
Moulin, L., Gaume, E., Obled, C., 2009. Uncertainties on mean areal precipitation: assessment and impact on streamflow simulations. Hydrology and Earth System Sciences 13 (2), 99–114.
Naef, F., Schmocker-Fackel, P., Margreth, M., Kienzler, P., Scherrer, S., 2008. Die Häufung der Hochwasser der letzten Jahre. In: Bezzola, G.R., Hegg, C. (Eds.), Ereignisanalyse der Hochwasser 2005—Teil 2, Analyse von Prozessen, Massnahmen und Gefahrengrundlagen in Umwelt. Umwelt-Wissen, p. 429. 08025.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models (1), a discussion of principles. Journal of Hydrology 10, 282–290.
Palmer, T.N., 2000. Predicting uncertainty in forecasts of weather and climate. Reports on Progress in Physics 63 (2), 71–116.
Pappenberger, F., Beven, K.J., 2004. Functional classification and evaluation of hydrographs based on Multicomponent Mapping (Mx). International Journal of River Basin Management 2 (2), 89–100.
Pappenberger, F., Beven, K.J., 2006. Ignorance is bliss: or seven reasons not to use uncertainty analysis. Water Resources Research 42 (5).
Pappenberger, F., et al., 2005. Cascading model uncertainty from medium range weather forecasts (10 days) through a rainfall-runoff model to flood inundation predictions within the European Flood Forecasting System (EFFS). Hydrology and Earth System Sciences 9 (4), 381–393.
Pappenberger, F., Ghelli, A., Buizza, R., Bodis, K., 2009. The skill of probabilistic precipitation forecasts under observational uncertainties within the generalized likelihood uncertainty estimation framework for hydrological applications. Journal of Hydrometeorology 10 (3), 807–819.
Quiby, J., Denhard, M., 2003. SRNWP-DWD Poor-Man Ensemble Prediction System: the PEPS Project. Eumetnet Newsletter, pp. 9–12.
Ranzi, R., Zappa, M., Bacchi, B., 2007. Hydrological aspects of the Mesoscale Alpine Programme: findings from field experiments and simulations. Quarterly Journal Royal Meteorological Society 133 (625), 867–880.
Romang, H., et al., 2011. IFKIS-Hydro — early warning and information system for floods and debris flows. Natural Hazards 56 (2), 509–527 (19).
Romanowicz, R.J., Young, P.C., Beven, K.J., 2006. Data assimilation and adaptive forecasting of water levels in the river Severn catchment, United Kingdom. Water Resources Research 42 (6).
Romanowicz, R.J., Young, P.C., Beven, K.J., Pappenberger, F., 2008. A data based mechanistic approach to nonlinear flood routing and adaptive flood level forecasting. Advances in Water Resources 31 (8), 1048–1056.
Rossa, A., Liechti, K., Zappa, M., Bruen, M., Germann, U., Haase, G., Keil, C., Krahe, P., 2011. The COST 731 Action: A review on uncertainty propagation in advanced hydro-meteorological forecast systems. Atmospheric Research 100, 150–167 (this issue).
Rotach, M.W., et al., 2009. MAP D-PHASE real-time demonstration of weather forecast quality in the Alpine region. Bulletin of the American Meteorological Society 90 (9) 1321-+.
Roulin, E., 2007. Skill and relative economic value of medium-range hydrological ensemble predictions. Hydrology and Earth System Sciences 11 (2), 725–737.
Roulin, E., Vannitsem, S., 2005. Skill of medium-range hydrological ensemble predictions. Journal of Hydrometeorology 6 (5), 729–744.
Schaefli, B., Gupta, H.V., 2007. Do Nash values have value? Hydrological Processes 21 (15), 2075–2080.
Siccardi, F., Boni, G., Ferraris, L., Rudari, R., 2005. A hydrometeorological approach for probabilistic flood forecast. Journal of Geophysical Research, [Atmospheres] 110 (D5).
Stensrud, D.J., Bao, J.W., Warner, T.T., 2000. Using initial condition and model physics perturbations in short-range ensemble simulations of mesoscale convective systems. Monthly Weather Review 128 (7), 2077–2107.
Szturc, J., Osrodka, K., Jurczyk, A., Jelonek, L., 2008. Concept of dealing with uncertainty in radar-based data for hydrological purpose. Natural Hazards and Earth System Sciences 8 (2), 267–279.


Todini, E., 2009. Predictive uncertainty assessment in real time flood forecasting. In: Baveye, P.C., Laba, M., Mysiak, J. (Eds.), Uncertainties in Environmental Modelling and Consequences for Policy Making. NATO Science for Peace and Security Series C—Environmental Security, pp. 205–228.
Verbunt, M., Zappa, M., Gurtz, J., Kaufmann, P., 2006. Verification of a coupled hydrometeorological modelling approach for alpine tributaries in the Rhine basin. Journal of Hydrology 324 (1–4), 224–238.
Verbunt, M., Walser, A., Gurtz, J., Montani, A., Schar, C., 2007. Probabilistic flood forecasting with a limited-area ensemble prediction system: selected case studies. Journal of Hydrometeorology 8 (4), 897–909.
Villarini, G., Krajewski, W.F., 2008. Empirically-based modeling of spatial sampling uncertainties associated with rainfall measurements by rain gauges. Advances in Water Resources 31 (7), 1015–1023.
Viviroli, D., Zappa, M., Gurtz, J., Weingartner, R., 2009a. An introduction to the hydrological modelling system PREVAH and its pre- and post-processing-tools. Environmental Modelling and Software 24 (10), 1209–1222.
Viviroli, D., Zappa, M., Schwanbeck, J., Gurtz, J., Weingartner, R., 2009b. Continuous simulation for flood estimation in ungauged mesoscale catchments of Switzerland — part I: modelling framework and calibration results. Journal of Hydrology 377 (1–2), 191–207.
Vrugt, J.A., Gupta, H.V., Bouten, W., Sorooshian, S., 2003. A shuffled complex evolution metropolis algorithm for optimization and uncertainty assessment of hydrologic model parameters. Water Resources Research 39 (8).
Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W., Verstraten, J.M., 2005. Improved treatment of uncertainty in hydrologic modeling: combining the strengths of global optimization and data assimilation. Water Resources Research 41 (1).
Walser, A., Luthi, D., Schar, C., 2004. Predictability of precipitation in a cloud-resolving model. Monthly Weather Review 132 (2), 560–577.
Weigel, A.P., Liniger, M.A., Appenzeller, C., 2007. Generalization of the discrete brier and ranked probability skill scores for weighted multimodel ensemble forecasts. Monthly Weather Review 135 (7), 2778–2785.
Wilks, D., 2006. Statistical Methods in the Atmospheric sciences, vol. 91 of International Geophysics Series. Elsevier, Amsterdam, The Netherlands.
Wohling, T., Lennartz, F., Zappa, M., 2006. Technical note: updating procedure for flood forecasting with conceptual HBV-type models. Hydrology and Earth System Sciences 10 (6), 783–788.
Zappa, M., 2002. Multiple-response verification of a distributed hydrological model at different spatial scales, Dissertation No. 14895, ETH Zurich, [available online at: http://e-collection.ethbib.ethz.ch/show?type=diss&nr=14895]. pp.
Zappa, M., Kan, C., 2007. Extreme heat and runoff extremes in the Swiss Alps. Natural Hazards and Earth System Sciences 7 (3), 375–389.
Zappa, M., Pos, F., Strasser, U., Warmerdam, P., Gurtz, J., 2003. Seasonal water balance of an Alpine catchment as evaluated by different methods for spatially distributed snowmelt modelling. Nordic Hydrology 34 (3), 179–202.
Zappa, M., et al., 2008. MAP D-PHASE: real-time demonstration of hydrological ensemble prediction systems. Atmospheric Science Letters 9 (2), 80–87.
Zappa, M., et al., 2010. Propagation of uncertainty from observing systems and NWP into hydrological models: COST-731 Working Group 2. Atmospheric Science Letters 11 (2), 83–91.
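The ensemble sizes quoted in the discussion (10,400 members, including 650 runs for the initial conditions) are consistent with a full factorial combination of the 16 COSMO-LEPS members, the 25 REAL members and the 26 PREVAH parameter sets (16 × 25 × 26 = 10,400; 25 × 26 = 650). The minimal Python sketch below enumerates such a combined ensemble. It assumes, as the numbers suggest but the text does not state explicitly, that each initial-condition run corresponds to one REAL member driven through one PREVAH parameter set. The helper names `enumerate_members` and `sub_ensemble` are hypothetical and are not part of PREVAH or any operational chain.

```python
from itertools import product

# Ensemble sizes as stated in the paper
N_LEPS = 16    # COSMO-LEPS members (NWP forcing)
N_REAL = 25    # REAL radar-ensemble members
N_PREVAH = 26  # equifinal PREVAH parameter sets


def enumerate_members(n_leps, n_real, n_prevah):
    """Return every (leps, real, prevah) index triple of the combined ensemble."""
    return list(product(range(n_leps), range(n_real), range(n_prevah)))


def sub_ensemble(members, leps=None, real=None, prevah=None):
    """Keep members matching the fixed indices; None lets that source vary freely."""
    return [
        (m_leps, m_real, m_prevah)
        for (m_leps, m_real, m_prevah) in members
        if (leps is None or m_leps == leps)
        and (real is None or m_real == real)
        and (prevah is None or m_prevah == prevah)
    ]


members = enumerate_members(N_LEPS, N_REAL, N_PREVAH)

# Assumption: one initial-condition run per (REAL member, PREVAH parameter set) pair
ic_runs = sorted({(m_real, m_prevah) for (_, m_real, m_prevah) in members})

print(len(members))                                # 10400
print(len(ic_runs))                                # 650
print(len(sub_ensemble(members, leps=0, real=0)))  # 26: PREVAH spread only
```

Fixing two of the three indices isolates the spread due to the remaining source, which is one possible way of forming sub-samples of the combined ensemble; the authors' actual sub-sampling procedure may differ.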
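The verification measures summarized in Appendix A can be illustrated with a short sketch. It converts an ensemble into the probability of exceeding a discharge threshold taken as a percentile of the observed discharge, then computes the Brier Skill Score and the ROC hit and false-alarm rates. Using sample climatology (the observed base rate) as the BSS reference forecast, issuing a warning when the probability reaches `p_warn`, and all function names are assumptions of this sketch rather than the authors' verification code; the numbers are synthetic and unrelated to the Verzasca results.

```python
import numpy as np


def exceedance_probability(ensemble, threshold):
    """Fraction of members whose discharge exceeds `threshold` at each time step.

    ensemble: array of shape (n_times, n_members), e.g. simulated runoff in m3/s.
    """
    return (np.asarray(ensemble, dtype=float) > threshold).mean(axis=1)


def brier_score(prob, event):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    return float(np.mean((prob - event) ** 2))


def brier_skill_score(prob, event):
    """BSS = 1 - BS / BS_ref, with sample climatology as the (assumed) reference."""
    event = np.asarray(event, dtype=float)
    base_rate = event.mean()
    bs_ref = float(np.mean((base_rate - event) ** 2))
    if bs_ref == 0.0:
        raise ValueError("BSS undefined: the event never or always occurs")
    return 1.0 - brier_score(prob, event) / bs_ref


def _ratio(num, den):
    # Return NaN instead of dividing by zero when a contingency-table row is empty
    return num / den if den > 0 else float("nan")


def roc_point(prob, event, p_warn):
    """(hit rate, false-alarm rate) when a warning is issued for prob >= p_warn."""
    warn = np.asarray(prob, dtype=float) >= p_warn
    event = np.asarray(event, dtype=bool)
    hits = int(np.sum(warn & event))
    misses = int(np.sum(~warn & event))
    false_alarms = int(np.sum(warn & ~event))
    correct_negatives = int(np.sum(~warn & ~event))
    return (_ratio(hits, hits + misses),
            _ratio(false_alarms, false_alarms + correct_negatives))


def roc_curve(prob, event, n_members):
    """ROC points for the warning levels k / n_members, k = 1..n_members."""
    return [roc_point(prob, event, k / n_members) for k in range(1, n_members + 1)]


# Toy example: 4 time steps, 4 members (synthetic numbers, not data from the study)
observed = np.array([2.0, 8.0, 20.0, 80.0])
ensemble = np.array([[1, 2, 3, 4],
                     [5, 7, 15, 16],
                     [15, 18, 21, 25],
                     [60, 70, 80, 90]], dtype=float)
threshold = np.percentile(observed, 50)  # 50% percentile of observed discharge: 14.0
event = observed > threshold
prob = exceedance_probability(ensemble, threshold)

print(brier_skill_score(prob, event))  # 0.75
print(roc_point(prob, event, 0.5))     # (1.0, 0.5)
```

A deterministic simulation such as MOD/RAD or MOD/PLUV can be passed as a one-member ensemble: its exceedance probability is then either 0 or 1, so it yields a single ROC point rather than a curve, matching how these runs appear in Fig. A1.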