<<

wrm_ch07.qxd 8/31/2005 12:18 PM Page 168

7. Concepts in Probability, and Modelling

1. Introduction 169

2. Probability Concepts and Methods 170 2.1. Random Variables and Distributions 170 2.2. Expectation 173 2.3. Quantiles, Moments and Their Estimators 173 2.4. L-Moments and Their Estimators 176

3. Distributions of Random Events 179 3.1. Parameter Estimation 179 3.2. Model Adequacy 182 3.3. Normal and Lognormal Distributions 186 3.4. Gamma Distributions 187 3.5. Log-Pearson Type 3 Distribution 189 3.6. Gumbel and GEV Distributions 190 3.7. L-Moment Diagrams 192

4. Analysis of Censored Data 193

5. Regionalization and Index-Flood Method 195

6. Partial Duration Series 196

7. Stochastic Processes and 197 7.1. Describing Stochastic Processes 198 7.2. Markov Processes and Markov Chains 198 7.3. Properties of Time-Series Statistics 201

8. Synthetic Streamflow Generation 203 8.1. Introduction 203 8.2. Streamflow Generation Models 205 8.3. A Simple Autoregressive Model 206 8.4. Reproducing the Marginal Distribution 208 8.5. Multivariate Models 209 8.6. Multi-Season, Multi-Site Models 211 8.6.1. Disaggregation Models 211 8.6.2. Aggregation Models 213

9. Stochastic 214 9.1. Generating Random Variables 214 9.2. River Basin Simulation 215 9.3. The Simulation Model 216 9.4. Simulation of the Basin 216 9.5. Interpreting Simulation Output 217

10. Conclusions 223

11. References 223 wrm_ch07.qxd 8/31/2005 12:18 PM Page 169

169

7 Concepts in Probability, Statistics and Stochastic Modelling

Events that cannot be predicted precisely are often called random. Many if not most of the inputs to, and processes that occur in, water resources are to some extent random. Hence, so too are the outputs or predicted impacts, and even people’s reactions to those outputs or impacts. To ignore this or is to ignore reality. This chapter introduces some of the commonly used tools for dealing with uncertainty in water resources planning and management. Subsequent chapters illustrate how these tools are used in various types of optimization, simulation and statistical models for impact prediction and evaluation.

1. Introduction of the . If expected or median values of uncertain parameters or variables are used in a deterministic model, Uncertainty is always present when planning, developing, the planner can then assess the importance of uncertainty managing and operating water resources systems. It arises by means of sensitivity analysis, as is discussed later in this because many factors that affect the performance of water and the two subsequent chapters. resources systems are not and cannot be known with Replacement of uncertain quantities by either expected, certainty when a system is planned, designed, built, median or worst-case values can grossly affect the evalua- managed and operated. The success and performance of tion of project performance when important parameters each component of a system often depends on future are highly variable. To illustrate these issues, consider meteorological, demographic, economic, social, technical, the evaluation of the recreation potential of a reservoir. and political conditions, all of which may influence Table 7.1 shows that the elevation of the water surface future benefits, costs, environmental impacts, and social varies over time depending on the inflow and demand for acceptability. Uncertainty also arises due to the stochastic water. The table indicates the pool levels and their associ- nature of meteorological processes such as evaporation, ated probabilities as well as the expected use of the recre- ation facility with different pool levels. rainfall and temperature. Similarly, future populations of – towns and cities, per capita water-usage rates, irrigation The average pool level L is simply the sum of each patterns and priorities for water uses, all of which affect possible pool level times its probability, or – water demand, are never known with certainty. L 10(0.10) 20(0.25) 30(0.30) There are many ways to deal with uncertainty. One, and 40(0.25) 50(0.10) 30 (7.1) perhaps the simplest, approach is to replace each uncertain This pool level corresponds to 100 visitor-days per day: quantity either by its average (i.e., its mean or expected – VD(L) 100 visitor-days per day (7.2) value), its median, or by some critical (e.g., ‘worst-case’) value, and then proceed with a deterministic approach. Use A worst-case analysis might select a pool level of ten as a of expected or median values of uncertain quantities may be critical value, yielding an estimate of system performance adequate if the uncertainty or variation in a quantity is rea- equal to 100 visitor-days per day: sonably small and does not critically affect the performance VD(Llow) VD(10) 25 visitor-days per day (7.3) wrm_ch07.qxd 8/31/2005 12:18 PM Page 170

170 Water Resources Systems Planning and Management

adequate representations of the data. Sections 4, 5 and 6 expand upon the use of these mathematical models, and possible probability recreation potential E021101a pool levels of each level in visitor-days per day discuss alternative parameter estimation methods. for reservoir with Section 7 presents the basic ideas and concepts of different pool levels the stochastic processes or time series. These are used to model streamflows, rainfall, temperature or other 10 0.10 25 20 0.25 75 phenomena whose values change with time. The section 30 0.30 100 contains a description of Markov chains, a special type of 40 0.25 80 used in many stochastic optimization 50 0.10 70 and simulation models. Section 8 illustrates how synthetic flows and other time-series inputs can be generated for Table 7.1. Data for determining reservoir recreation potential. stochastic . is introduced with an example in Section 9. Many topics receive only brief treatment in this intro- ductory chapter. Additional information can be found in Neither of these values is a good approximation of the applied statistical texts or book chapters such as Benjamin average visitation rate, that is and Cornell (1970), Haan (1977), Kite (1988), Stedinger –– VD 0.10 VD(10) 0.25 VD(20) 0.30 VD(30) et al. (1993), Kottegoda and Rosso (1997), and Ayyub 0.25 VD(40) 0.10 VD(50) and McCuen (2002). 0.10(25) 0.25(75) 0.30(100) 0.25(80) 0.10(70) (7.4) 2. Probability Concepts and 78.25 visitor-days per day Methods –– Clearly, the average visitation rate, VD 78.25, the visitation rate corresponding to the average pool level This section introduces the basic concepts and definitions – VD(L) 100, and the worst-case assessment VD(Llow) used in analyses involving probability and statistics. These 25, are very different. concepts are used throughout this chapter and later Using only average values in a complex model can chapters in the book. produce a poor representation of both the average performance and the possible performance range. When 2.1. Random Variables and Distributions important quantities are uncertain, a comprehensive analysis requires an evaluation of both the expected The basic concept in is that of the performance of a project and the risk and possible . By definition, the value of a random magnitude of project failures in a physical, economic, variable cannot be predicted with certainty. It depends, ecological and/or social sense. at least in part, on the outcome of a chance event. This chapter reviews many of the methods of proba- Examples are: (1) the number of years until the flood bility and statistics that are useful in water resources stage of a river washes away a small bridge; (2) the num- planning and management. Section 2 is a condensed ber of times during a reservoir’s life that the level of the summary of the important concepts and methods of pool will drop below a specified level; (3) the rainfall probability and statistics. These concepts are applied in depth next month; and (4) next year’s maximum flow at this and subsequent chapters of this book. Section 3 a gauge site on an unregulated stream. The values of all presents several probability distributions that are often of these random events or variables are not knowable used to model or describe the distribution of uncertain before the event has occurred. Probability can be used quantities. The section also discusses methods for fitting to describe the likelihood that these random variables these distributions using historical information, and will equal specific values or be within a given range of methods of assessing whether the distributions are specific values. wrm_ch07.qxd 8/31/2005 12:18 PM Page 171

Concepts in Probability, Statistics and Stochastic Modelling 171

The first two examples illustrate discrete random The value of the cumulative distribution function FX(x) variables, random variables that take on values that are for a discrete random variable is the sum of the probabil-

discrete (such as positive integers). The second two exam- ities of all xi that are less than or equal to x. ples illustrate continuous random variables. Continuous FxXXi() ∑ p () x (7.10) random variables take on any values within a specified xxi range of values. A property of all continuous random variables is that the probability that the value of any of The right half of Figure 7.1 illustrates the cumulative those random variables will equal some specific number – distribution function (upper) and the probability mass any specific number – is always zero. For example, the function pX(xi) (lower) of a discrete random variable. probability that the total rainfall depth in a month will be The probability density function fX(x) (lower left plot in exactly 5.0 cm is zero, while the probability that the total Figure 7.1) for a continuous random variable X is the rainfall will lie between 4 and 6 cm could be nonzero. analogue of the probability mass function (lower right Some random variables are combinations of continuous plot in Figure 7.1) of a discrete random variable X. The and discrete random variables. probability density function, often called the pdf, is the Let X denote a random variable and x a possible value of the cumulative distribution function so that: of that random variable X. Random variables are generally dFx() fx()X 0 (7.11) denoted by capital letters, and particular values they take X dx on by lowercase letters. For any real-valued random It is necessary to have variable X, its cumulative distribution function FX(x), often denoted as just the cdf, equals the probability that the value fx()1 ∫ X (7.12) of X is less than or equal to a specific value or threshold x: FX(x) Pr[X x] (7.5) Equation 7.12 indicates that the area under the probabil- ity density function is 1. If a and b are any two constants, This cumulative distribution function F (x) is a non- X the cumulative distribution function or the density decreasing function of x because function may be used to determine the probability that X δ δ Pr[X x] Pr[X x ] for 0 (7.6) is greater than a and less than or equal to b where In addition, b Pr[aXb ] FbFa ( ) ( ) fxx ( )d XX∫ X (7.13) limFxX ( ) 1 (7.7) a x→ and The area under a probability density function specifies the relative frequency with which the value of a continuous limFxX ( ) 0 (7.8) random variable falls within any specified range of values, x→ that is, any interval along the horizontal axis. The first limit equals 1 because the probability that X Life is seldomly so simple that only a single quantity is takes on some value less than infinity must be unity; the uncertain. Thus, the joint of two second limit is zero because the probability that X takes or more random variables can also be defined. If X and Y on no value must be zero. are two continuous real-valued random variables, their The left half of Figure 7.1 illustrates the cumulative joint cumulative distribution function is: distribution function (upper) and its derivative, the prob- FxyXY (, ) Pr[ Xx and Yy ] ability density function, fX(x), (lower) of a continuous x y random variable X. ∫ ∫ fuvuvXY (, )dd (7.14) If X is a real-valued discrete random variable that takes − − on specific values x1, x2, … , then the probability mass If two random variables are discrete, then function p (x ) is the probability X takes on the value x . X i i FxyXY(, )∑ ∑ pxy XY ( i , i ) (7.15) pX(xi) Pr[X xi] (7.9) xxi yyi wrm_ch07.qxd 8/31/2005 12:18 PM Page 172

172 Water Resources Systems Planning and Management

1.0 a 1.0 b E020527a

) )

x x

( (

X X

F Figure 7.1. Cumulative F distribution and probability density or mass functions of random variables: (a) continuous distributions; (b) discrete distributions. 0.0 0.0 possible values of a random variable X xxpossible values of a random variable X

1.0 1.0

) )

x x

( (

X X

F F

0.0 0.0 possible values of a random variable X x possible values of a random variable X x

where the joint probability mass function is: Other useful concepts are those of the marginal and conditional distributions. If X and Y are two random variables pXY(xi, yi) Pr[X xi and Y yi] (7.16) whose joint cumulative distribution function FXY(x, y) has If X and Y are two random variables, and the distribution been specified, then FX(x), the marginal cumulative distribu- of X is not influenced by the value taken by Y, and vice tion of X, is just the cumulative distribution of X ignoring Y. versa, then the two random variables are said to be inde- The marginal cumulative distribution function of X equals pendent. For two independent random variables X and Y, FX(x) Pr[X x] limFxyXY ( , ) (7.21) the joint probability that the random variable X will be y→ between values a and b and that the random variable Y where the limit is equivalent to letting Y take on any will be between values c and d is simply the product of value. If X and Y are continuous random variables, the those separate probabilities. marginal density of X can be computed from Pr[a X b and c Y d] Pr[a X b] Pr[c Y d] (7.17) fxXXY()∫ f (,) xyyd (7.22) This applies for any values a, b, c, and d. As a result, The conditional cumulative distribution function is the F (x, y) F (x)F (y) (7.18) XY X Y cumulative distribution function for X given that Y has which implies for continuous random variables that taken a particular value y. Thus the value of Y may have fXY(x, y) fX(x)fY(y) (7.19) been observed and one is interested in the resulting conditional distribution for the so far unobserved value of and for discrete random variables that X. The conditional cumulative distribution function for pXY(x, y) pX(x)pY(y) (7.20) continuous random variables is given by wrm_ch07.qxd 8/31/2005 12:18 PM Page 173

Concepts in Probability, Statistics and Stochastic Modelling 173

x If X and Y are independent, the expectation of the product ∫ fsysXY (, )d of a function g() of X and a function h() of Y is the ∞ FXY| (|)xy Pr[ X xY | y ] (7.23) product of the expectations: fyY () E[g(X) h(Y)] E[g(X)] E[h(Y)] (7.30) where the conditional density function is This follows from substitution of Equations 7.19 and 7.20 fxyXY (, ) fxyXY| (|) (7.24) into Equation 7.29. fyY () For discrete random variables, the probability of observ- 2.3. Quantiles, Moments and Their Estimators ing X x, given that Y y equals While the cumulative distribution function provides a pxyXY (, ) pXY| (|)xy (7.25) complete specification of the properties of a random py() Y variable, it is useful to use simpler and more easily under- These results can be extended to more than two random stood measures of the central tendency and range of variables. Kottegoda and Rosso (1997) provide more detail. values that a random variable may assume. Perhaps the simplest approach to describing the distribution of a 2.2. Expectation random variable is to report the value of several quantiles. The pth quantile of a random variable X is the smallest

Knowledge of the probability density function of a value xp such that X has a probability p of assuming a continuous random variable, or of the probability mass value equal to or less than xp: function of a discrete random variable, allows one to Pr[X xp] p Pr[X xp] (7.31) calculate the expected value of any function of the random variable. Such an expectation may represent the average Equation 7.31 is written to insist if at some value xp, the rainfall depth, average temperature, average demand cumulative probability function jumps from less than p to shortfall or expected economic benefits from system more than p, then that value xp will be defined as the pth operation. If g is a real-valued function of a continuous quantile even though FX(xp) p. If X is a continuous random variable X, the expected value of g(X) is: random variable, then in the region where fX(x) 0, the quantiles are uniquely defined and are obtained by solution of Ed[()]gX∫ gxf ()X () x x (7.26) FX(xp) p (7.32) whereas for a discrete random variable Frequently reported quantiles are the median x0.50 and E[gX ( )]∑ gx (iXi ) p ( x ) (7.27) the lower and upper quartiles x0.25 and x0.75. The median i describes the location or central tendency of the distribu- The expectation operator,E[], has several important prop- tion of X because the random variable is, in the continu- erties. In particular, the expectation of a linear function of ous case, equally likely to be above as below that value. X is a linear function of the expectation of X. Thus, if a The interquartile range [x0.25, x0.75] provides an easily and b are two non-random constants, understood description of the range of values that the random variable might assume. The pth quantile is also E[a bX] a bE[X] (7.28) the 100 p percentile. The expectation of a function of two random variables is In a given application – particularly when safety is of given by concern – it may be appropriate to use other quantiles. In floodplain management and the design of flood control E[(gXY , )]∫ ∫ gxyf (, )XY (, xy )dd x y structures, the 100-year flood x0.99 is a commonly selected design value. In water quality management, a river’s or minimum seven-day-average low flow expected once in E[(gXY , )]∑∑ gx (ii , y )ppxyXY(, i i ) (7.29) ten years is commonly used in the United States as the i j wrm_ch07.qxd 8/31/2005 12:18 PM Page 174

174 Water Resources Systems Planning and Management

critical planning value: Here the one-in-ten year value is comparing the relative variability of the flow in rivers of the 10th percentile of the distribution of the annual min- different sizes, or of rainfall variability in different regions ima of the seven-day average flows. when the random variable is strictly positive. λ The natural sample estimate of the median x0.50 is The third moment about the mean, denoted X, meas- the median of the sample. In a sample of size n where ures the asymmetry, or skewness, of the distribution: … x(1) x(2) x(n) are the observations ordered by λ µ 3 X E[(X X) ] (7.37) magnitude, and for a non-negative integer k such that n 2k (even) or n 2k 1 (odd), the sample estimate Typically, the dimensionless coefficient of skewness γ λ of the median is X is reported rather than the third moment X. The coefficient of skewness is the third moment rescaled by  xnk()k1 for 21 the cube of the standard deviation so as to be dimension- xˆ0.50  1   2 xx()kk (1 ) for nk 2 (7.33) less and hence unaffected by the scale of the random variable: Sample estimates of other quantiles may be obtained by λ using x as an estimate of x for q i/(n 1) and then γ X (i) q X σ 3 (7.38) X interpolating between observations to obtain xˆp for the desired p. This only works for 1/(n 1) p n/(n 1) Streamflows and other natural phenomena that are neces- and can yield rather poor estimates of xp when (n 1)p sarily non-negative often have distributions with positive is near either 1 or n. An alternative approach is to skew coefficients, reflecting the asymmetric shape of their fit a reasonable distribution function to the observa- distributions. tions, as discussed in Section 3, and then estimate When the distribution of a random variable is not xp using Equation 7.32, where FX(x) is the fitted known, but a set of observations {x1,…,xn} is available, distribution. the moments of the unknown distribution of X can be Another simple and common approach to describing a estimated based on the sample values using the following distribution’s centre, spread and shape is by reporting the equations. The sample estimate of the mean: moments of a distribution. The first moment about the n origin is the mean of X and is given by XXn∑ i/ (7.39a) i1 µ XXEd[]Xxfxx∫ () (7.34) The sample estimate of the variance: Moments other than the first are normally measured 1 n σˆ 2 S22∑ ()XX X Xi (7.39b) about the mean. The second moment measured about the ()n 1 i1 σ 2 mean is the variance, denoted Var(X) or X, where: The sample estimate of skewness: σµ22Var(XX )E [( ) ] XX (7.35) n n λˆ ∑ ()XX 3 σ X i (7.39c) The standard deviation X is the square root of the vari- ()()nn12i1 µ ance. While the mean X is a measure of the central value σ The sample estimate of the coefficient of variation: of X, the standard deviation X is a measure of the spread of the distribution of X about µ . ˆ X CVXX S/ X (7.39d) Another measure of the variability in X is the coefficient of variation, The sample estimate of the coefficient of skewness: σ γˆ λˆ 3 CV X X X /SX (7.39e) X µ (7.36) X The sample estimate of the mean and variance are often _ 2 The coefficient of variation expresses the standard denoted as x and sx where the lower case letters are deviation as a proportion of the mean. It is useful for used when referring to a specific sample. All of these wrm_ch07.qxd 8/31/2005 12:18 PM Page 175

Concepts in Probability, Statistics and Stochastic Modelling 175

sample estimators provide only estimates of actual or true values. Unless the sample size n is very large, the difference between the estimators and the true values of date discharge date discharge E021101b µσλ2 γ 3 3 XXX,,,CV X , and Xmay be large. In many ways, the m /s m /s field of statistics is about the precision of estimators of different quantities. One wants to know how well the 1930 410 1951 3070 mean of twenty annual rainfall depths describes the true 1931 1150 1952 2360 expected annual rainfall depth, or how large the differ- 1932 899 1953 1050 ence between the estimated 100-year flood and the true 1933 420 1954 1900 100-year flood is likely to be. 1934 3100 1955 1130 As an example of the calculation of moments, consider 1935 2530 1956 674 the flood data in Table 7.2. These data have the following 1936 758 1957 683 sample moments: 1937 1220 1958 1500 _ x 1549.2 1938 1330 1959 2600 1939 1410 1960 3480 s 813.5 X 1940 3100 1961 1430 CVX 0.525 1941 2470 1962 809 γ ˆX 0.712 1942 929 1963 1010 1943 586 1964 1510 As one can see, the data are positively skewed and have a 1944 450 1965 1650 relatively large coefficient of variance. 1946 1040 1966 1880 When discussing the accuracy of sample estimates, two 1947 1470 1967 1470 quantities are often considered, bias and variance. An 1948 1070 1968 1920 θˆ θ estimator of a known or unknown quantity is a function 1949 2050 1969 2530 of the observed values of the random variable X, say in n 1950 1430 1970 1490 different time periods, X1,…,Xn, that will be available to * Value for 1945 is missing. θ θˆ θˆ estimate the value of ; may be written [X1, X2, … , Xn] to emphasize that θˆ itself is a random variable. Its value Table 7.2. Annual maximum discharges on Magra River, Italy, depends on the sample values of the random variable that at Calamazza, 1930 – 70*. will be observed. An estimator θˆ of a quantity θ is biased if E[θˆ]θ and unbiased if E[θˆ]θ. The quantity {E[θˆ]θ}is σ generally called the bias of the estimator. E[SXX ] (7.42) An unbiased estimator has the property that its The second important statistic often used to assess the expected value equals the value of the quantity to be accuracy of an estimator θˆ is the variance of the estimator estimated. The sample mean is an unbiased estimate of Var θˆ, which equals E{(θˆE[θˆ])2}. For the mean of a set the population mean µ because X of independent observations, the variance of the sample  n  n mean is: 11 µ EE[]X  ∑∑Xi  E[]Xi X (7.40) n i11 n i σ 2 Var(X) X (7.43) 2 n The estimator SX of the variance of X is an unbiased σ2 σ estimator of the true variance X for independent obser- It is common to call x / n the standard error of xˆ rather vations (Benjamin and Cornell, 1970): than its standard deviation. The standard error of an  22 σ average is the most commonly reported measure of its ESXX (7.41) precision. However, the corresponding estimator of the standard The bias measures the difference between the average σ deviation, SX, is in general a biased estimator of X because value of an estimator and the quantity to be estimated. wrm_ch07.qxd 8/31/2005 12:18 PM Page 176

176 Water Resources Systems Planning and Management

γ 2 The variance measures the spread or width of the estima- expected value of ˆX is 0.45; its variance equals (0.37) , tor’s distribution. Both contribute to the amount by its standard deviation is squared. Using Equation 7.44, the γ which an estimator deviates from the quantity to be mean square error of ˆX is: estimated. These two errors are often combined into the γˆ 22 mean square error. Understanding that θ is fixed and the MSE( X) (.045 050 . ) (. 037 ) estimator θˆ is a random variable, the mean squared error 0.. 0025 0 1369 0.139 Х 0.14 (7.46) is the expected value of the squared distance (error) γ γˆ between θ and its estimator θˆ: An unbiased estimate of X is simply (0.50/0.45) X. Here the estimator provided by Equation 7.39e has been scaled MSE(θˆ) E[(θˆ θ)2] E{[θˆ E(θˆ)] [E(θˆ) θ]}2 to eliminate bias. This unbiased estimator has a mean [Bias]2 Var(θˆ) (7.44) squared error of:

where [Bias] is E(θˆ) θ. 2  050. γˆ   050.   MSE X  (.050 050 . )2   (.037 ) Equation 7.44 shows that the MSE, equal to the expected  048.   045.   θˆ average squared deviation of the estimator from the true 0.. 169Х 0 17 (7.47) value of the parameter θ, can be computed as the bias γˆ squared plus the variance of the estimator. MSE is a con- The mean square error of the unbiased estimator of X is venient measure of how closely θˆ approximates θ because larger than the mean square error of the biased estimate. γˆ it combines both bias and variance in a logical way. Unbiasing X results in a larger mean square error for Estimation of the coefficient of skewness γ provides a all the cases listed in Table 7.3 except for the normal X γ good example of the use of the MSE for evaluating the distribution for which X 0, and the gamma distribu- γ total deviation of an estimate from the true population tion with X 3.00. γ γ As shown here for the skew coefficient, biased estima- value. The sample estimate ˆX of X is often biased, has a large variance, and its absolute value was shown by Kirby tors often have smaller mean square errors than unbiased (1974) to be bounded by the square root of the sample estimators. Because the mean square error measures the size n: total average deviation of an estimator from the quantity being estimated, this result demonstrates that the strict or γˆ | X | n (7.45) unquestioning use of unbiased estimators is not advisable. Additional information on the distribution of The bounds do not depend on the true skew, γ . However, X quantiles and moments is contained in Stedinger et al. the bias and variance of γˆ do depend on the sample size X (1993). and the actual distribution of X. Table 7.3 contains the expected value and standard deviation of the estimated γˆ coefficient of skewness X when X has either a normal 2.4. L-Moments and Their Estimators γ distribution, for which X 0, or a gamma distribution γ with X 0.25, 0.50, 1.00, 2.00, or 3.00. These values are L-moments are another way to summarize the statistical adapted from Wallis et al. (1974 a,b) who employed properties of hydrological data based on linear combina- moment estimators slightly different than those in tions of the original observations (Hosking, 1990). Equation 7.39. Recently, hydrologists have found that regionalization γ γ Х For the , E[ ˆ] 0 and Var [ ˆX] 5/n. methods (to be discussed in Section 5) using L-moments In this case, the skewness estimator is unbiased but highly are superior to methods using traditional moments variable. In all the other cases in Table 7.3, the skewness (Hosking and Wallis, 1997; Stedinger and Lu, 1995). estimator is biased. L-moments have also proved useful for construction of To illustrate the magnitude of these errors, consider the goodness-of-fit tests (Hosking et al., 1985; Chowdhury γ mean square error of the skew estimator ˆX calculated from et al., 1991; Fill and Stedinger, 1995), measures of a sample of size 50 when X has a gamma distribution with regional homogeneity and distribution selection methods γ X 0.50, a reasonable value for annual streamflows. The (Vogel and Fennessey, 1993; Hosking and Wallis, 1997). wrm_ch07.qxd 8/31/2005 12:18 PM Page 177

Concepts in Probability, Statistics and Stochastic Modelling 177

Table 7.3. Sampling properties of

01c coefficient of skewness estimator.

γ E0211 expected value of X Source: Wallis et al. (1974b) who only divided by n in the estimators of the moments, whereas in Equations 7.39b and 7.39c, we use the generally-adopted distribution sample size coefficients of 1/(n – 1) and n/(n – 1)(n – 2) for the of X 10 20 50 80 variance and skew.

γ normal X = 0 0.00 0.00 0.00 0.00 gamma γX = 0.25 0.15 0.19 0.23 0.23 γX = 0.50 0.31 0.39 0.45 0.47 γX = 1.00 0.60 0.76 0.88 0.93 γ X = 2.00 1.15 1.43 1.68 1.77 γX = 3.00 1.59 1.97 2.32 2.54

upper bound on skew 3.16 4.47 7.07 8.94

^ standard deviation of γX

distribution sample size of X 10 20 50 80

normal γX = 0 0.69 0.51 0.34 0.26 γ gamma X = 0.25 0.69 0.52 0.35 0.28 γ X = 0.50 0.69 0.53 0.37 0.31 γX = 1.00 0.70 0.57 0.44 0.38 γX = 2.00 0.72 0.68 0.62 0.57 γX = 3.00 0.74 0.76 0.77 0.77

λ The first L-moment designated as 1 is simply the Sample L-moment estimates are often computed using arithmetic mean: intermediate statistics called probability weighted moments th λ (PWMs). The r probability weighted moment is defined as: 1 E[X] (7.48) β r th r E{X[F(X)] } (7.52) Now let X(i|n) be the i largest observation in a sample of size n (i n corresponds to the largest). Then, for any where F(X) is the cumulative distribution function of X. λ distribution, the second L-moment, 2, is a description Recommended (Landwehr et al., 1979; Hosking and β of scale based upon the expected difference between two Wallis, 1995) unbiased PWM estimators, br, of r are randomly selected observations: computed as: λ (1/2) E[X X ] (7.49) 2 (2|1) (1|2) bX0 Similarly, L-moment measures of skewness and kurtosis use 1 n b ∑ ()jX 1 three and four randomly selected observations, respectively. 1 ()j nn()1 j2 λ (1/3) E[X 2X X ] (7.50) 3 (3|3) (2|3) (1|3) 1 n b ∑ ()()jj 12X λ 2 ()j (7.53) 4 (1/4) E[X(4|4) 3X(3|4) 3X(2|4) X(1|4)] (7.51) nn()()12 n j3 wrm_ch07.qxd 8/31/2005 12:18 PM Page 178

178 Water Resources Systems Planning and Management

These are examples∼ of the general formula for computing particularly useful for comparing symmetric distributions β estimators br of r. that have a skewness coefficient of zero. Table 7.4 provides definitions of the traditional coefficient of variation, coeffi- 1 n  j 11 n  cient of skewness and coefficient of kurtosis, as well as b ∑ X r   ()j ΋  the L-moment, L-coefficient of variation, L-coefficient of n jr1 r r skewness and L-coefficient of kurtosis. 1 n  j 1  n  ∑ X The flood data in Table 7.2 can be used to provide an   ()j ΋  (7.54) r 1 jr1 r r 1 example of L-moments. Equation 7.53 yields estimates of the first three Probability Weighted Moments: for r 1, … , n 1. b 1,549.20 L-moments are easily calculated in terms of PWMs 0 using: b1 1003.89 λ β b2 759.02 (7.56) 1 0 _ Recall that b is just the sample average x. The sample λ 2β β 0 2 1 0 L-moments are easily calculated using the probability λ β β β 3 6 2 6 1 0 weighted moments. One obtains: λ β β β β λˆ 4 20 3 30 2 12 1 0 (7.55) 1 b0 1,549 λˆ Wang (1997) provides formulas for directly calculating 2 2b1 b0 458 λ λˆ L-moment estimators of r. Measures of the coefficient 3 6b2 6b1 b0 80 (7.55) of variation, skewness and kurtosis of a distribution can be Thus, the sample estimates of the L-coefficient of computed with L-moments, as they can with traditional variation, t , and L-coefficient of skewness, t , are: product moments. Where skew primarily measures the 2 3 asymmetry of a distribution, the kurtosis is an additional t2 0.295 measure of the thickness of the extreme tails. Kurtosis is t3 0.174 (7.58)

Table 7.4. Definitions of dimensionless product-moment and L-moment ratios.

01d

namecommon symbol definition E0211

product-moment ratios

coefficient of variation CVX σX / µX 3 3 skewness γX E [ (X -µX) ] / σ X kurtosis κ X E [ (X -µX)4 ] / σ X 4

L-moment ratios

L-coefficient of

variation * L-CV, τ2 λ2 / λ1

skewness L-skewness, τ3 λ3 / λ2

kurtosis L-kurtosis, τ4 λ4 / λ2 ττ * Hosking and Wallis (1997) use instead of2 to represent the L-CV ratio wrm_ch07.qxd 8/31/2005 12:18 PM Page 179

Concepts in Probability, Statistics and Stochastic Modelling 179

3. Distributions of Random Events assume and their likelihood. For example, by using the empirical distribution, one implicitly assumes that no values larger or smaller than the sample maximum A frequent task in water resources planning is the or minimum can occur. For many situations, this is development of a model of some probabilistic or stochastic unreasonable. phenomena such as streamflows, flood flows, rainfall, tem- • Often one needs to estimate the likelihood of extreme peratures, evaporation, sediment or nutrient loads, nitrate events that lie outside the range of the sample (either or organic compound concentrations, or water demands. in terms of x values or in terms of frequency). Such This often requires one to fit a probability distribution extrapolation makes little sense with the empirical function to a set of observed values of the random variable. distribution. Sometimes, one’s immediate objective is to estimate a • In many cases, one is not interested in the values of a particular quantile of the distribution, such as the 100-year random variable X, but instead in derived values of flood, 50-year six-hour-rainfall depth, or the minimum variables Y that are functions of X. This could be a seven-day-average expected once-in-ten-year flow. Then the performance function for some system. If Y is the fitted distribution can supply an estimate of that quantity. In performance function, interest might be primarily in a stochastic simulation, fitted distributions are used to its mean value E[Y], or the probability some standard generate possible values of the random variable in question. is exceeded, Pr{Y standard}. For some theoretical Rather than fitting a reasonable and smooth mathe- X-distributions, the resulting Y-distribution may matical distribution, one could use the empirical distribu- be available in closed form, thus making the analysis tion represented by the data to describe the possible rather simple. (The normal distribution works with values that a random variable may assume in the future linear models, the lognormal distribution with product and their frequency. In practice, the true mathematical models, and the gamma distribution with queuing form for the distribution that describes the events is not systems.) known. Moreover, even if it was, its functional form may have too many parameters to be of much practical use. This section provides a brief introduction to some Thus, using the empirical distribution represented by the useful techniques for estimating the parameters of prob- data itself has substantial appeal. ability distribution functions and for determining if a Generally, the free parameters of the theoretical distri- fitted distribution provides a reasonable or acceptable bution are selected (estimated) so as to make the fitted model of the data. Sub-sections are also included on distribution consistent with the available data. The goal is families of distributions based on the normal, gamma to select a physically reasonable and simple distribution and generalized-extreme-value distributions. These three to describe the frequency of the events of interest, to families have found frequent use in water resources estimate that distribution’s parameters, and ultimately to planning (Kottegoda and Rosso, 1997). obtain quantiles, performance indices and risk estimates of satisfactory accuracy for the problem at hand. Use of a theoretical distribution has several advantages over use of 3.1. Parameter Estimation the empirical distribution: Given a set of observations to which a distribution is to • It presents a smooth interpretation of the empirical be fit, one first selects a distribution function to serve as distribution. As a result quantiles, performance indices a model of the distribution of the data. The choice of a and other statistics computed using the fitted distribu- distribution may be based on experience with data of tion should be more accurate than those computed that type, some understanding of the mechanisms giving with the empirical distribution. rise to the data, and/or examination of the observations • It provides a compact and easy-to-use representation themselves. One can then estimate the parameters of the of the data. chosen distribution and determine if the fitted distribu- • It is likely to provide a more realistic description of tion provides an acceptable model of the data. A model is the range of values that the random variable may generally judged to be unacceptable if it is unlikely that wrm_ch07.qxd 8/31/2005 12:18 PM Page 180

180 Water Resources Systems Planning and Management

n one could have observed the available data were they µσ L ln∏ fx [(i | , )] actually drawn from the fitted distribution. i1 In many cases, good estimates of a distribution’s n µσ parameters are obtained by the maximum-likelihood- ∑ lnfx (i | , ) estimation procedure. Give a set of n independent i 1 n observations {x , … , x } of a continuous random variable 1 n ∑ ln()xni 2π ln(σ ) X, the joint probability density function for the observa- i1 tions is: 1 n ∑ []ln()x µ 2 σ 2 i (7.61) … θ 2 i1 fxxXXX123,,,,… Xn (,1 , n |) = θθ⋅ θ fxXX(|12)) fx (| fx Xn (| ) (7.59) The maximum can be obtained by equating to zero the partial derivatives ∂L/∂µ and ∂L/∂σ whereby one obtains: where θ is the vector of the distribution’s parameters. θ The maximum likelihood estimator of is that ∂L 1 n θ ∑ µ vector which maximizes Equation 7.59 and thereby 0 2 [ln(xi ) ] ∂µσ makes it as likely as possible to have observed the values i 1 {x , … , x }. ∂Ln1 n 1 n 0∑[ln(x ) µ ]2 (7.62) ∂σσσ3 i Considerable work has gone into studying the i1 properties of maximum likelihood parameter estimates. Under rather general conditions, asymptotically the These equations yield the estimators estimated parameters are normally distributed, unbiased n µˆ 1 and have the smallest possible variance of any asymptoti- ∑ ln(xi ) n cally unbiased estimator (Bickel and Doksum, 1977). i 1 n These, of course, are asymptotic properties, valid for large σˆ 2 1 µˆ 2 ∑[ln(xi ) ] (7.63) sample sizes n. Better estimation procedures, perhaps n i1 yielding biased parameter estimates, may exist for small The second-order conditions for a maximum are met and sample sizes. Stedinger (1980) provides such an these values maximize Equation 7.59. It is useful to note example. Still, maximum likelihood procedures are that if one defines a new random variable Y ln(X), then recommended with moderate and large samples, even the maximum likelihood estimators of the parameters though the iterative solution of nonlinear equations is µ and σ2, which are the mean and variance of the Y often required. distribution, are the sample estimators of the mean and An example of the maximum likelihood procedure variance of Y: for which closed-form expressions for the parameter _ estimates are obtained is provided by the lognormal µˆ y σ 2 2 distribution. The probability density function of a lognor- ˆ [(n 1)/n]SY (7.64) mally distributed random variable X is: The correction [(n – 1)/n] in this last equation is often   neglected. 1 1 µ 2 fxX () exp [ln()x ] (7.60) x 2πσ 2  2σ 2  The second commonly used parameter estimation procedure is the method of moments. The method of Here, the parameters µ and σ2 are the mean and variance moments is often a quick and simple method for obtaining of the logarithm of X, and not of X itself. parameter estimates for many distributions. For a distribu- Maximizing the logarithm of the joint density for tion with m 1, 2 or 3 parameters, the first m moments of

{x1,…,xn} is more convenient than maximizing the the postulated distribution in Equations 7.34, 7.35 and joint probability density itself. Hence, the problem can 7.37 are equated to the estimates of those moments be expressed as the maximization of the log-likelihood calculated using Equations 7.39. The resulting nonlinear function equations are solved for the unknown parameters. wrm_ch07.qxd 8/31/2005 12:18 PM Page 181

Concepts in Probability, Statistics and Stochastic Modelling 181

For the lognormal distribution, the mean and variance inference employs the likelihood function to represent the of X as a function of the parameters µ and σ are given by information in the data. That information is augmented with a prior distribution that describes what is known   µµσ1 2 about constraints on the parameters and their likely X exp  2 values beyond the information provided by the recorded σµσσ222 X exp(21 )[exp ( ) ] (7.65) data available at a site. The likelihood function and the prior probability density function are combined to obtain _ µ 2 σ2 µ Substituting x for X and sX for X and solving for and the probability density function that describes the poste- σ 2 one obtains rior distribution of the parameters:

fθ(θ | x , x , … , x ) ∝ σˆ 2 22 1 2 n ln()1 sxX / θ ξ θ fX(x1, x2, … , xn | ) ( ) (7.70)   x 1 ∝ ξ θ µˆ ln  ln x σˆ 2 (7.66) The symbol means ‘proportional to’ and ( ) is the 22  1 sxX /  2 probability density function for the prior distribution for θ (Kottegoda and Rosso, 1997). Thus, except for The data in Table 7.2 provide an illustration of both a constant of proportionality, the probability density fitting methods. One can easily compute the sample mean function describing the posterior distribution of the and variance of the logarithms of the flows to obtain parameter vector θ is equal to the product of the likeli- θ µˆ 7.202 hood function fX(x1, x2, … , x n | ) and the probability ξ θ θ σˆ 2 0.3164 (0.5625)2 (7.67) density function for the prior distribution ( ) for . Advantages of the Bayesian approach are that it allows Alternatively, the sample mean and variance of the flows the explicit modelling of uncertainty in parameters themselves are (Stedinger, 1997; Kuczera, 1999) and provides a theoret- _ x 1549.2 ically consistent framework for integrating systematic flow records with regional and other hydrological infor- s 2 661,800 (813.5)2 (7.68) X mation (Vicens et al., 1975; Stedinger, 1983; Kuczera, Substituting those two values in Equation 7.66 yields 1983). Martins and Stedinger (2000) illustrate how a µˆ 7.224 prior distribution can be used to enforce realistic constraints upon a parameter as well as providing a σ 2 0.2435 (0.4935)2 (7.69) X description of its likely values. In their case, use of a Method of moments and maximum likelihood are just prior of the shape parameter κ of a generalized extreme two of many possible estimation methods. Just as method value (GEV) distribution (discussed in Section 3.6) of moments equates sample estimators of moments to allowed definition of generalized maximum likelihood population values and solves for a distribution’s parame- estimators that, over the κ-range of interest, performed ters, one can simply equate L-moment estimators to substantially better than maximum likelihood, moment, population values and solve for the parameters of a and L-moment estimators. distribution. The resulting method of L-moments has While Bayesian methods have been available for received considerable attention in the hydrological decades, the computational challenge posed by the literature (Landwehr et al., 1978; Hosking et al., 1985; solution of Equation 7.70 has been an obstacle to their Hosking and Wallis, 1987; Hosking, 1990; Wang, 1997). use. Solutions to Equation 7.70 have been available for It has been shown to have significant advantages when special cases such as normal data, and binomial and used as a basis for regionalization procedures that will be Poisson samples (Raiffa and Schlaifer, 1961; Benjamin discussed in Section 5 (Lettenmaier et al., 1987; Stedinger and Cornell, 1970; Zellner, 1971). However, a new and and Lu, 1995; Hosking and Wallis, 1997). very general set of Monte Carlo (MCMC) Bayesian procedures provide another approach that procedures (discussed in Section 7.2) allows numerical is related to maximum likelihood estimation. Bayesian computation of the posterior distributions of parameters wrm_ch07.qxd 8/31/2005 12:18 PM Page 182

182 Water Resources Systems Planning and Management

for a very broad class of models (Gilks et al., 1996). As a the fitted model (using graphs or tables) to rigorous result, Bayesian methods are now becoming much more statistical tests. Some of the early and simplest methods popular and are the standard approach for many difficult of parameter estimation were graphical techniques. problems that are not easily addressed by traditional Although quantitative techniques are generally more methods (Gelman et al., 1995; Carlin and Louis, 2000). accurate and precise for parameter estimation, graphical The use of Monte Carlo Bayesian methods in flood presentations are invaluable for comparing the fitted frequency analysis, rainfall–runoff modelling, and evalua- distribution with the observations for the detection of tion of environmental pathogen concentrations are systematic or unexplained deviations between the two. illustrated by Wang (2001), Bates and Campbell (2001) The observed data will plot as a straight line on probabil- and Crainiceanu et al. (2002), respectively. ity graph paper if the postulated distribution is the true Finally, a simple method of fitting flood frequency distribution of the observation. If probability graph paper curves is to plot the ordered flood values on special does not exist for the particular distribution of interest, probability paper and then to draw a line through the more general techniques can be used.

data (Gumbel, 1958). Even today, that simple method Let x(i) be the ith largest value in a set of observed … is still attractive when some of the smallest values are values {xi} so that x(1) x(2) x(n). The random zero or unusually small, or have been censored as will variable X(i) provides a reasonable estimate of the pth be discussed in Section 4 (Kroll and Stedinger, 1996). quantile xp of the true distribution of X for p i/(n 1). Plotting the ranked annual maximum series against a In fact, when one considers the cumulative probability Ui probability scale is always an excellent and recommended associated with the random variable X(i), Ui FX(X(i)), and way to see what the data look like and for determining if the observations X(i) are independent, then the Ui have whether or not a fitted curve is consistent with the data a beta distribution (Gumbel, 1958) with probability (Stedinger et al., 1993). density function: Statisticians and hydrologists have investigated which n! fu() uuini1()101 u of these methods most accurately estimates the parameters Ui ()!()!in11 themselves or the quantiles of the distribution. One also (7.71) needs to determine how accuracy should be measured. This beta distribution has mean Some studies have used average squared deviations, some i have used average absolute weighted deviations with E[U ] i (7.72a) different weights on under and over-estimation, and some n 1 have used the squared deviations of the log-quantile and variance estimator (Slack et al., 1975; Kroll and Stedinger, 1996). in() i 1 In almost all cases, one is also interested in the bias of an Var(Ui ) (7.72b) ()()nn122 estimator, which is the average value of the estimator minus the true value of the parameter or quantile being A good graphical check of the adequacy of a fitted distribu-

estimated. Special estimators have been developed to tion G(x) is obtained by plotting the observations x(i) versus compute design events that on average are exceeded with G–1[i/(n 1)] (Wilk and Gnanadesikan, 1968). Even if G(x)

the specified probability and have the anticipated risk of equalled to an exact degree the true X-distribution FX[x], the being exceeded (Beard, 1960, 1997; Rasmussen and plotted points would not fall exactly on a 45° line through

Rosbjerg, 1989, 1991a,b; Stedinger, 1997; Rosbjerg and the origin of the graph. This would only occur if FX[x(i)] Madsen, 1998). exactly equalled i/(n 1), and therefore each x(i) exactly –1 equalled FX [i/(n 1)]. 3.2. Model Adequacy An appreciation for how far an individual observation –1 x(i) can be expected to deviate from G [i/(n 1)] can be –1 (0.75) –1 (0.25) After estimating the parameters of a distribution, some obtained by plotting G [ui ] and G [ui ], where (0.75) (0.25) check of model adequacy should be made. Such checks ui and ui are the upper and lower quartiles of the vary from simple comparisons of the observations with distribution of Ui obtained from integrating the probability wrm_ch07.qxd 8/31/2005 12:18 PM Page 183

Concepts in Probability, Statistics and Stochastic Modelling 183

density function in Equation 7.71. The required incomplete

beta function is also available in many software packages, 4000

/sec)

including Microsoft Excel. Stedinger et al. (1993) show 3 that u(1) and (1 u(n)) fall between 0.052/n and 3(n 1) with a probability of 90%, thus illustrating the great uncer- 3000 upper 90% confidence tainty associated with the cumulative probability of the interval for all points smallest value and the exceedance probability of the largest value in a sample. Figures 7.2a and 7.2b illustrate the use of this 2000 quantile–quantile plotting technique by displaying the

and Kolmogorov-Smirnov bounds (m results of fitting a normal and a lognormal distribution

(i) to the annual maximum flows in Table 7.2 for the Magra X 1000 lower 90% confidence interval River, Italy, at Calamazza for the years 1930 – 70. The for all points observations of X(i), given in Table 7.2, are plotted on –1 the vertical axis against the quantiles G [i/(n 1)] on the observed values horizontal axis. 0 0 1000 2000 3000 4000 A probability plot is essentially a scatter plot of the quantiles of fitted normal distribution G-1[ i /(n+1)] m3/sec) sorted observations X versus some approximation of

(i) E020527d –1 their expected or anticipated value, represented by G (pi), where, as suggested, pi i/(n 1). The pi values are called plotting positions. A common alternative to i/(n 1) is (i 0.5)/n, which results from a probabilistic interpreta- 4000

/sec) tion of the empirical distribution of the data. Many reason- 3 able plotting position formulas have been proposed based –1 upon the sense in which G (pi) should approximate 3000 upper 90% confidence X(i). The Weibull formula i/(n 1) and the Hazen formula interval for all points (i 0.5)/n bracket most of the reasonable choices. Popular formulas are summarized by Stedinger et al. (1993), who also discuss the generation of probability plots for many 2000 distributions commonly employed in hydrology.

and Kolmogorov-Smirnov bounds (m and Kolmogorov-Smirnov

(i)

Rigorous statistical tests are available for trying to X

determine whether or not it is reasonable to assume that a 1000 lower 90% confidence interval given set of observations could have been drawn from for all points a particular family of distributions. Although not the most powerful of such tests, the Kolmogorov–Smirnov test observed values 0 provides bounds within which every observation should lie 0 1000 2000 3000 4000 if the sample is actually drawn from the assumed quantiles of fitted lognormal distribution G -1[i/(n+1)] (m3/sec)

E020527c distribution. In particular, for G FX, the test specifies that

Figure 7.2. Plots of annual maximum discharges of       11i i 1 ∀ α Magra River, Italy, versus quantiles of fitted (a) normal and Pr G  CXGαα ()i  Ci  1   n   n   (b) lognormal distributions. (7.73)

where Cα is the critical value of the test at significance level α. Formulas for Cα as a function of n are contained in Table 7.5 for three cases: (1) when G is completely wrm_ch07.qxd 8/31/2005 12:18 PM Page 184

184 Water Resources Systems Planning and Management

specified independent of the sample’s values; (2) when The Kolmogorov–Smirnov test conveniently provides G is the normal distribution and the mean and variance bounds within which every observation on a probability _ 2 are estimated from the sample with x and sX; and (3) plot should lie if the sample is actually drawn from the when G is the and the scale assumed distribution, and thus is useful for visually _ parameter is estimated as 1/(x). Chowdhury et al. (1991) evaluating the adequacy of a fitted distribution. However, provide critical values for the Gumbel and generalized it is not the most powerful test available for estimating extreme value (GEV) distributions (Section 3.6) with which distribution a set of observations is likely to have known shape parameter κ. For other distributions, the been drawn from. For that purpose, several other more values obtained from Table 7.5 may be used to construct analytical tests are available (Filliben, 1975; Hosking, approximate simultaneous confidence intervals for 1990; Chowdhury et al., 1991; Kottegoda and Rosso,

every X(i). 1997). Figures 7.2a and b contain 90% confidence intervals The Probability Plot Correlation test is a popular and for the plotted points constructed in this manner. For powerful test of whether a sample has been drawn from a the normal distribution, the critical value of Cα equals postulated distribution, though it is often weaker than alter- 0./(. 819nn 0 01 0 ./) 85 , where 0.819 corresponds native tests at rejecting thin-tailed alternatives (Filliben, to α 0.10. For n 40, one computes Cα 0.127. As 1975; Fill and Stedinger, 1995). A test with greater power can be seen in Figure 7.2a, the annual maximum flows has a greater probability of correctly determining that a are not consistent with the hypothesis that they were sample is not from the postulated distribution. The drawn from a normal distribution; three of the observa- Probability Plot Correlation Coefficient test employs the

tions lie outside the simultaneous 90% confidence inter- correlation r between the ordered observations x(i) and the –1 vals for all the points. This demonstrates a statistically corresponding fitted quantiles wi G (pi), determined by significant lack of fit. The fitted normal distribution plotting positions pi for each x(i). Values of r near 1.0 suggest underestimates the quantiles corresponding to small and that the observations could have been drawn from the fitted large probabilities while overestimating the quantiles in distribution: r measures the linearity of the probability plot _ an intermediate range. In Figure 7.2b, deviations between providing a quantitative assessment of fit. If x denotes the _ the fitted lognormal distribution and the observations average value of the observations and w denotes the average

can be attributed to the differences between FX(x(i)) and value of the fitted quantiles, then i/(n 1). Generally, the points are all near the 45° line ∑ ()()xxww()ii through the origin, and no major systematic deviations r (7.74)  2205 . are apparent. ()∑()()xx()ii∑ ww

Table 7.5. Critical values of

Kolmogorov–Smirnov statistic as a function 1101e

of sample size n (after Stephens, 1974). significance level α E02 0.150 0.100 0.050 0.025 0.010

Fx completely specified: Cα ( n + 0.12 + 0.11 / n) 1.138 1.224 1.358 1.480 1.628

Fx normal with mean and variance estimated as x and sx2 Cα ( n + 0.01 + 0.85 / n) 0.775 0.819 0.895 0.995 1.035

Fx exponential with scale parameter b estimated as 1 / (x) (Cα + 0.2 /n ) ( n + 0.26 + 0.5 / n ) 0.926 0.990 1.094 1.190 1.308

values of Cα are calculated as follows: for case 2 with α = 0.10,Cα = 0.819 / ( n - 0.01 + 0.85 / n ) wrm_ch07.qxd 8/31/2005 12:18 PM Page 185

Concepts in Probability, Statistics and Stochastic Modelling 185

Table 7.6 provides critical values for r for the normal distribution, or the logarithms of lognormal variates, based upon the Blom plotting position that has significance level E021101f pi (i 3/8)/(n 1/4). Values for the Gumbel distribu- n 0.10 0.05 0.01 tion are reproduced in Table 7.7 for use with the 10 0.9347 0.9180 0.8804 Gringorten plotting position pi (i 0.44)/(n 0.12). The table also applies to logarithms of Weibull variates 15 0.9506 0.9383 0.9110 (Stedinger et al., 1993). Other tables are available 20 0.9600 0.9503 0.9290

for the GEV (Chowdhury et al., 1991), the Pearson 30 0.9707 0.9639 0.9490 type 3 (Vogel and McMartin, 1991), and exponential 40 0.9767 0.9715 0.9597 and other distributions (D’Agostino and Stephens, 1986). 50 0.9807 0.9764 0.9664 Recently developed L-moment ratios appear to provide 60 0.9835 0.9799 0.9710 goodness-of-fit tests that are superior to both the 75 0.9865 0.9835 0.9757 Kolmogorov–Smirnov and the Probability Plot Correlation 100 0.9893 0.9870 0.9812 test (Hosking, 1990; Chowdhury et al., 1991; Fill and 300 0.99602 0.99525 0.99354

Stedinger, 1995). For normal data, the L-skewness estima- 1,000 0.99854 0.99824 0.99755 τ τ tor ˆ3 (or t3) would have mean zero and Var ˆ3 (0.1866 0.8/n)/n, allowing construction of a powerful test of normality against skewed alternatives using the normally Table 7.6. Lower critical values of the probability plot distributed statistic correlation test statistic for the normal distribution using pi (i 3/8)/(n 1/4) (Vogel, 1987). Zt 3 /(.0 1866 0 ./)/ 8 nn (7.75)

with a reject region |Z| zα/2. Chowdhury et al. (1991) derive the sampling variance τ τ significance level E021101g of the L-CV and L-skewness estimators ˆ2 and ˆ3 as a function of κ for the GEV distribution. These allow n 0.10 0.05 0.01 construction of a test of whether a particular data set is consistent with a GEV distribution with a regionally 10 0.9260 0.9084 0.8630 estimated value of κ, or a regional κ and a regional 20 0.9517 0.9390 0.9060 coefficient of variation, CV. Fill and Stedinger (1995) 30 0.9622 0.9526 0.9191 τ show that the ˆ3 L-skewness estimator provides a test for 40 0.9689 0.9594 0.9286 the Gumbel versus a general GEV distribution using the 50 0.9729 0.9646 0.9389

normally distributed statistic 60 0.9760 0.9685 0.9467

τ 70 0.9787 0.9720 0.9506 Z ( ˆ3 0.17)/ (.0 2326 0 . 70 /nn )/ (7.76) 80 0.9804 0.9747 0.9525 with a reject region |Z| zα/2. 100 0.9831 0.9779 0.9596 The literature is full of goodness-of-fit tests. 300 0.9925 0.9902 0.9819

Experience indicates that among the better tests there 1,000 0.99708 0.99622 0.99334 is often not a great deal of difference (D’Agostino and Stephens, 1986). Generation of a probability plot is most often a good idea because it allows the modeller Table 7.7. Lower critical values of the probability plot to see what the data look like and where problems correlation test statistic for the Gumbel distribution using occur. The Kolmogorov–Smirnov test helps the eye pi (i 0.44)/(n 0.12) (Vogel, 1987). wrm_ch07.qxd 8/31/2005 12:18 PM Page 186

186 Water Resources Systems Planning and Management

interpret a probability plot by adding bounds to a graph, for x 0 and µ ln(η). Here η is the median of the illustrating the magnitude of deviations from a straight X-distribution. line that are consistent with expected variability. One A lognormal random variable takes on values in the can also use quantiles of a beta distribution to illustrate range [0, ∞]. The parameter µ determines the scale of the possible error in individual plotting positions, the X-distribution whereas σ 2 determines the shape of particularly at the extremes where that uncertainty is the distribution. The mean and variance of the lognormal largest. The probability plot correlation test is a popular distribution are given in Equation 7.65. Figure 7.3 and powerful goodness-of-fit statistic. Goodness-of-fit illustrates the various shapes that the lognormal probability tests based upon sample estimators of the L-skewness density function can assume. It is highly skewed with τ σ ˆ3 for the normal and Gumbel distribution provide a thick right hand tail for 1, and approaches a simple and useful tests that are not based on a proba- symmetric normal distribution as σ → 0. The density func- bility plot. tion always has a value of zero at x 0. The coefficient of variation and skew are: σ 2 1/2 3.3. Normal and Lognormal Distributions CVX [exp( ) – 1] (7.79) γ 3CV CV3 (7.80) The normal distribution and its logarithmic transforma- X X X tion, the lognormal distribution, are arguably the most The maximum likelihood estimates of µ and σ 2 are given widely used distributions in science and . in Equation 7.63 and the moment estimates in Equation The probability density function of a normal random 7.66. For reasonable-sized samples, the maximum likeli- variable is hood estimates generally perform as well or better than the moment estimates (Stedinger, 1980).   1 1 µ 2 fxX () exp ( x ) The data in Table 7.2 were used to calculate the 2  σ 2  2πσ 2 parameters of the lognormal distribution that would (7.77) for ∞∞X describe these flood flows. The results are reported in Equation 7.67. The two-parameter maximum likelihood where µ and σ 2 are equivalent to µ and σ 2, the mean X X and method of moments estimators identify parameter and variance of X. Interestingly, the maximum likelihood estimates for which the distribution skewness coefficients estimators of µ and σ 2 are almost identical to the moment _ 2 estimates x and sX. The normal distribution is symmetric about its µ ∞ ∞ mean X and admits values from to . Thus, it is ) x 2.5 not always satisfactory for modelling physical phenomena f(

such as streamflows or pollutant concentrations, which 2.0 are necessarily non-negative and have skewed distribu-

tions. A frequently used model for skewed distributions is 1.5 the lognormal distribution. A random variable X has a lognormal distribution if the natural logarithm of X, ln(X), 1.0 σ = 0.2 has a normal distribution. If X is lognormally distributed, σ = 0.5 then by definition ln(X) is normally distributed, so that 0.5 σ = 1.5 the density function of X is 0.0   0.0 0.5 1.01 .5 2.0 2.5 3.0 3.5 1 1 µ 2 dln()x x fxX () exp [1 nx () ] πσ 2  σ 2  2 2 dx E020527e 1  1   η 2  Figure 7.3. Lognormal probability density functions with exp 2 [[(/)]ln x x 2πσ 2  2σ  (7.78) various standard deviations σ. wrm_ch07.qxd 8/31/2005 12:18 PM Page 187

Concepts in Probability, Statistics and Stochastic Modelling 187

are 2.06 and 1.72, which is substantially greater than the mean and variance of ln(X – τˆ) which yields the sample skew of 0.712. hybrid maximum likelihood estimates µˆ 7.605, A useful generalization of the two-parameter lognormal σˆ 2 0.1407 (0.3751)2 and again τˆ 600.1. distribution is the shifted lognormal or three-parameter log- The two sets of estimates are surprisingly close in this normal distribution obtained when ln(X τ) is described instance. In this second case, the fitted distribution has a by a normal distribution, and X τ. Theoretically, τ should coefficient of skewness of 1.22. be positive if, for physical reasons, X must be positive; Natural logarithms have been used here. One could practically, negative values of τ can be allowed when the have just as well use base 10 common logarithms to resulting probability of negative values of X is sufficiently estimate the parameters; however, in that case the relation- small. ships between the log-space parameters and the real-space Unfortunately, maximum likelihood estimates of the moments change slightly (Stedinger et al., 1993, Equation. parameters µ, σ2, and τ are poorly behaved because of irreg- 18.2.8). ularities in the likelihood function (Giesbrecht and Kempthorne, 1976). The method of moments does fairly well when the skew of the fitted distribution is reasonably 3.4. Gamma Distributions small. A method that does almost as well as the moment The gamma distribution has long been used to model method for low-skew distributions, and much better for many natural phenomena, including daily, monthly and τ highly skewed distributions, estimates by: annual streamflows as well as flood flows (Bobée and 2 Ashkar, 1991). For a gamma random variable X, τˆ xx()1 ()n xˆ050. xx2 xˆ (7.81) β ()1050 ()n . fx()()ββ xαβ1ex x 0 X Γ()α provided that x x – 2xˆ 0, where x and x (1) (n) 0.50 (1) (n) α are the smallest and largest observations and xˆ is the µ 0.50 X β sample median (Stedinger, 1980; Hoshi et al., 1984). If α x x 2xˆ 0, then the sample tends to be neg- σ 2 (1) (n) 0.50 X atively skewed and a three-parameter lognormal distribu- β 2 tion with a lower bound cannot be fit with this method. 2 γ 20CV for β (7.83) Good estimates of µ and σ 2 to go with τˆ in Equation 7.81 XXα are (Stedinger, 1980): The , Γ(α), for integer α is (α 1)!.  τˆ  The parameter α 0 determines the shape of the µˆ  x  ln distribution; β is the scale parameter. Figure 7.4 illustrates  1sx22/(τˆ )  X the different shapes that the probability density function for  s2  a gamma variable can assume. As α →∞, the gamma σˆ 2 ln 1 X   ()x τˆ 2  (7.82) distribution approaches the symmetric normal distribu- tion, whereas for 0 α 1, the distribution has a highly For the data in Table 7.2, Equations 7.81 and 7.82 yield asymmetric J-shaped probability density function whose the hybrid moment-of-moments estimates of µˆ 7.606, value goes to infinity as x approaches zero. σˆ 2 0.1339 (0.3659)2 and τˆ 600.1 for the three- The gamma distribution arises naturally in many parameter lognormal distribution. problems in statistics and hydrology. It also has a very This distribution has a coefficient of skewness of 1.19, reasonable shape for such non-negative random variables which is more consistent with the sample skewness as rainfall and streamflow. Unfortunately, its cumulative estimator than were the values obtained when a two- distribution function is not available in closed form, parameter lognornal distribution was fit to the data. except for integer α, though it is available in many Alternatively, one can estimate µ and σ 2 by the sample software packages including Microsoft Excel. The gamma wrm_ch07.qxd 8/31/2005 12:18 PM Page 188

188 Water Resources Systems Planning and Management

For the two-parameter gamma distribution,

)

x 2.5 f( 2 αˆ ()x 2 2.0 sX βˆ x 1.5 2 (7.85) sX 1.0 α= 0.50 Again, the flood record in Table 7.2 can be used to α= 1.00 α= 2.00 illustrate the different estimation procedures. Using 0.5 α= 8.00 the first three sample moments, one would obtain for the three-parameter gamma distribution the parameter 0.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 estimates x

E020527f τˆ 735.6 αˆ Figure 7.4. The gamma distribution function for various 7.888 values of the shape parameter α. βˆ 0.003452 1/427.2 Using only the sample mean and variance yields the family includes a very special case: the exponential method of moment estimators of the parameters of the τ distribution is obtained when α 1. two-parameter gamma distribution ( 0), The gamma distribution has several generalizations αˆ 3.627 (Bobée and Ashkar, 1991). If a constant τ is subtracted βˆ from X so that (X τ) has a gamma distribution, the 0.002341 1/427.2 distribution of X is a three-parameter gamma. This is also The fitted two-parameter gamma distribution has a called a Pearson type 3 distribution, because the resulting coefficient of skewness γ of 1.05, whereas the fitted three- distribution belongs to the third type of distributions parameter gamma reproduces the sample skew of 0.712. suggested by the statistician Karl Pearson. Another As occurred with the three-parameter lognormal distribu- variation is the log Pearson type 3 distribution obtained tion, the estimated lower bound for the three-parameter by fitting the logarithms of X with a Pearson type 3 gamma distribution is negative (τˆ –735.6), resulting distribution. The log Pearson distribution is discussed in a three-parameter model that has a smaller skew further in the next section. coefficient than was obtained with the corresponding The method of moments may be used to estimate the two-parameter model. The reciprocal of βˆ is also parameters of the gamma distribution. For the three- reported. While βˆ has inverse x-units, 1/βˆ is a natural parameter gamma distribution, scale parameter that has the same units as x and thus can be easier to interpret.  s  τˆ x 2 X  Studies by Thom (1958) and Matalas and Wallis  γˆ  X (1973) have shown that maximum likelihood parameter σˆ 4 estimates are superior to the moment estimates. For the γ 2 ()ˆX two-parameter gamma distribution, Greenwood and Durand (1960) give approximate formulas for the ˆ 2 β maximum likelihood estimates (also Haan, 1977). s γˆ (7.84) XX However, the maximum likelihood estimators are often _ 2 γ where x, sX, and X are estimates of the mean, variance, not used in practice because they are very sensitive to the and coefficient of skewness of the distribution of X smallest observations that sometimes suffer from (Bobée and Robitaille, 1977). measurement error and other distortions. wrm_ch07.qxd 8/31/2005 12:18 PM Page 189

Concepts in Probability, Statistics and Stochastic Modelling 189

When plotting the observed and fitted quantiles of a gamma distribution, an approximation to the inverse of 3.5 f (x) the distribution function is often useful. For |γ | 3, the 3.0 Wilson–Hilferty transformation 2.5 γ = +2 2.0 γ = +1  2 3  2  γγx  2 γ = 0 x µσ 1 N   1.5 γ G γ   γ  = - 1  636  (7.86) 1.0 γ = - 2 0.5 gives the quantiles xG of the gamma distribution in terms 0.0 of x , the quantiles of the standard-normal distribution. 0.0 0.5 1.0 1.5 2.0 2.5 3.03.5 4.0 N x Here µ, σ, and γ are the mean, standard deviation,

E020527g and coefficient of skewness of xG. Kirby (1972) and Chowdhury and Stedinger (1991) discuss this and other Figure 7.5. Log-Pearson type 3 probability density functions more complicated but more accurate approximations. for different values of coefficient of skewness γ. Fortunately the availability of excellent approximations of the gamma cumulative distribution function and its β inverse in Microsoft Excel and other packages has bound ( 0): as a result the LP3 distribution has a γ reduced the need for such simple approximations. similar shape. The space with 1 may be more realistic for describing variables whose probability density function becomes thinner as x takes on large values. For γ 3.5. Log-Pearson Type 3 Distribution 0, the two-parameter lognormal distribution is obtained The log-Pearson type 3 distribution (LP3) describes a as a special case. random variable whose logarithms have a Pearson type 3 The LP3 distribution has mean and variance α distribution. This distribution has found wide use in  β  µ ξ modelling flood frequencies and has been recommended X e    β 1 for that purpose (IACWD, 1982). Bobée (1975) and   αα  2  Bobée and Ashkar (1991) discuss the unusual shapes that ξ  β β  σ 2 e2   X  β   β  this hybrid distribution may take allowing negative  21    values of β. The LP3 distribution has a probability density foorββ 20,. or (7.88) function given by β For 0 β 2, the variance is infinite. ||βξα1 βξ fxX() [ (ln(xx ) )] exp{ (ln( ) )} These expressions are seldom used, but they do x()α reveal the character of the distribution. Figures 7.6 and (7.87) 7.7 provide plots of the real-space coefficient of skew- with α 0, and β either positive or negative. For β 0, ness and coefficient of variation of a log-Pearson type 3 ξ β σ values are restricted to the range 0 x exp( ). For variate X as a function of the standard deviation Y ξ γ 0, values have a lower bound so that exp( ) X. Figure and coefficient of skew Y of the log-transformation σ γ 7.5 illustrates the probability density function for the LP3 Y ln(X). Thus the standard deviation Y and skew Y γ γ distribution as a function of the skew of the P3 distribu- of Y are in log space. For Y 0, the log-Pearson type 3 σ tion describing ln(X), with lnX 0.3. The LP3 density distribution reduces to the two-parameter lognormal function for |γ | 2 can assume a wide range of shapes distribution discussed above, because in this case Y has with both positive and negative skews. For |γ | 2, the a normal distribution. For the lognormal distribution, σ log-space P3 distribution is equivalent to an exponential the standard deviation Y serves as the sole shape distribution function, which decays exponentially as x parameter, and the coefficient of variation of X for small β σ σ moves away from the lower bound ( 0) or upper Y is just Y. Figure 7.7 shows that the situation is more wrm_ch07.qxd 8/31/2005 12:18 PM Page 190

190 Water Resources Systems Planning and Management

0.60 log-space standard 12 real-space coefficient 0.50 deviation σY of skewness γX 10 0.40 0.60 1.6 real-space coefficient 0.30 of variation CV 8 0.50 1.4 X 0.20 0.10 0.40 Y 6 0.30 σ 1.2 0.05 0.20 1.0 4 0.10 0.8

log-space standard 0.05 deviation 2 0.6 0 0.4

-2 0.2 -1.6 -1.2 -0.8 -0.4 0.0 0.4 0.81.2 1.6 0.0 -1.6 -1.2 -0.8 -0.4 0.0 0.4 0.81.2 1.6 log-space coefficient of skewness γY log-space coefficient of skewness γY

E020527h

E020527j Figure 7.6. Real-space coefficient of skewness γ for LP3 X Figure 7.7. Real-space coefficient of variation CV for LP3 distributed X as a function of log-space standard deviation X distributed X as a function of log-space standard deviation σ σ and coefficient of skewness γ where Y ln(X). Y Y Y γ and coefficient of skewness Y where Y ln(X).

complicated for the LP3 distribution. However, for σ small Y, the coefficient of variation of X is approxi- estimated by a weighted average of the at-site sample σ mately Y. skewness coefficient and a regional estimate of the Again, the flood flow data in Table 7.2 can be used to skewness coefficient. Bulletin 17B also includes tables of illustrate parameter estimation. Using natural logarithms, frequency factors, a map of regional skewness estimators, one can estimate the log-space moments with the stan- checks for low outliers, confidence interval formula, a dard estimators in Equations 7.39 that yield: discussion of expected probability and a weighted- µˆ 7.202 moments estimator for historical data. σˆ 0.5625 γˆ 0.337 3.6. Gumbel and GEV Distributions For the LP3 distribution, analysis generally focuses on The annual maximum flood is the largest flood flow the distribution of the logarithms Y ln(X) of the during a year. One might expect that the distribution of flows, which would have a Pearson type 3 distribution annual maximum flood flows would belong to the set of µ σ γ with moments Y, Y and Y (IACWD, 1982; Bobée extreme value distributions (Gumbel, 1958; Kottegoda and Ashkar, 1991). As a result, flood quantiles are calcu- and Rosso, 1997). These are the distributions obtained in lated as the limit, as the sample size n becomes large, by taking the µ σ γ largest of n independent random variables. The Extreme xp exp{ Y YKp[ Y]} (7.89) Value (EV) type I distribution, or Gumbel distribution, γ where Kp[ Y] is a frequency factor corresponding to has often been used to describe flood flows. It has the γ cumulative probability p for skewness coefficient Y. cumulative distribution function: (K [γ ] corresponds to the quantiles of a three-parameter p Y ξ α gamma distribution with zero mean, unit variance, and FX(x) exp{ exp[ (x )/ ]} (7.90) γ skewness coefficient Y.) where ξ is the location parameter. It has a mean and Since 1967 the recommended procedure for flood variance of frequency analysis by federal agencies in the United States µ ξ α X 0.5772 has used this distribution. Current guidelines in Bulletin γ σ 2 π2α 2 Х α 2 17B (IACWD, 1982) suggest that the skew Y be X /6 1.645 (7.91) wrm_ch07.qxd 8/31/2005 12:18 PM Page 191

Concepts in Probability, Statistics and Stochastic Modelling 191

0.20 0.04

f(x)

f(x)

0.03 0.15

0.02 κ =-0.3 κ =-0.1 0.10 κ = 0.1 0.01 κ =-0.3 κ =-0.1 0.05 κ = 0.1 0.00 12 17 22 27 x

E020527m 0.00 010152025305 Figure 7.9. Right-hand tails of GEV distributions shown in x Figure 7.8.

E020527k

Figure 7.8. GEV density distributions for selected shape parameter κ values. thicker for κ 0, and thinner for κ 0, as shown in Figures 7.8 and 7.9. Its skewness coefficient has a fixed value equal to The parameters of the GEV distribution are easily γ X 1.1396. computed using L-moments and the relationships The generalized extreme value (GEV) distribution is (Hosking et al. (1985): a general mathematical expression that incorporates the κ 2 type I, II, and III extreme value (EV) distributions for 7.8590c 2.9554c α κλ Γ κ κ maxima (Gumbel, 1958; Hosking et al., 1985). In recent 2/[ (1 )(1 2 )] (7.94) years it has been used as a general model of extreme ξ λ α κ Γ κ events including flood flows, particularly in the context of 1 ( / )[ (1 ) 1] regionalization procedures (NERC, 1975; Stedinger and where Lu, 1995; Hosking and Wallis, 1997). The GEV distribu- λ / λ λ tion has the cumulative distribution function: c 2 2 ( 3 3 2) ln(2)/ln(3) [2/(τ 3)] ln(2)/ln(3) κ ξ α 1/κ κ 3 FX(x) exp{ [1 (x )/ ] } for 0 (7.92) As one can see, the estimator of the shape parameter κ κ From Equation 7.92, it is clear that for 0 (the typical will depend only upon the L-skewness estimator τˆ . The ξ α/κ 3 case for floods), x must exceed , whereas for estimator of the scale parameter α will then depend κ 0, x must be no greater than ξ α/κ (Hosking and κ λ on the estimate of and of 2. Finally, one must also use Wallis, 1987). The mean, variance, and skewness coeffi- the sample mean λ (Equation 7.48) to determine the κ 1 cient are (for 1/3): estimate of the location parameter ξ. µ ξ α κ Γ κ Using the flood data in Table 7.2 and the sample X ( / ) [1 (1 )], L-moments computed in Section 2, one obtains first σ 2 α κ 2 Γ κ Γ κ 2 ˆ X ( / ) { (1 2 ) [ (1 )] } (7.93) c0.000896 which yields κˆ0.007036, ξ 1,165.20 and αˆ 657.29. γ (Sign κ){Γ(1 3κ) 3Γ(1 κ) Γ(1 2κ) X The small value of the fitted κ parameter means that Γ κ 3 Γ κ Γ κ 2 3/2 2[ (1 )] }/{ (1 2 ) [ (1 )] } the fitted distribution is essentially a Gumbel distribution. where Γ(1κ) is the classical gamma function. The Again, ξ is a location parameter, not a lower bound, so Gumbel distribution is obtained when κ 0. For |κ | its value resembles a reasonable x value. 0.3, the general shape of the GEV distribution is similar Madsen et al. (1997a) show that moment estimators to the Gumbel distribution, though the right-hand tail is can provide more precise quantile estimators. Martins and wrm_ch07.qxd 8/31/2005 12:18 PM Page 192

192 Water Resources Systems Planning and Management

Stedinger (2001b) found that with occasional uninforma- estimators were used to generate an initial solution. tive samples, the MLE estimator of κ could be entirely Numerical optimization of the likelihood function in unrealistic resulting in absurd quantile estimators. Equation 7.96 yielded the maximum likelihood estima- However the use of a realistic prior distribution on κ tors of the GEV parameters: yielded generalized maximum likelihood estimators κˆ 0.0359, ξˆ 1165.4 and αˆ 620.2. (GLME) that performed better than moment and L-moment estimators over the range of κ of interest. Similarly, use of the geophysical prior (Equation 7.95) The generalized maximum likelihood estimators yielded the generalized maximum likelihood estimators (GMLE) are obtained by maximizing the log-likelihood κˆ 0.0823, ξˆ 1150.8 and αˆ 611.4. Here the function, augmented by a prior density function on κ. record length of forty years is too short to define reliably A prior distribution that reflects general world-wide the shape parameter κ so that result of using the prior is geophysical experience and physical realism is in the form to pull κ slightly toward the mean of the prior. The other of a beta distribution: two parameters adjust accordingly. π(κ) Γ(p) Γ(q) (0.5 κ)p–1 (0.5 κ) q–1/Γ(pq) (7.95) 3.7. L-Moment Diagrams

for 0.5 κ 0.5 with p 6 and q 9. Moreover, Section 3 presented several families of distributions. The this prior assigns reasonable probabilities to the values of L-moment diagram in Figure 7.10 illustrates the relation- κ within that range. For κ outside the range 0.4 to τ τ ships between the L-kurtosis ( 4) and L-skewness ( 3) for a 0.2 the resulting GEV distributions do not have density number of distributions often used in hydrology. It shows functions consistent with flood flows and rainfall (Martins that distributions with the same coefficient of skewness and Stedinger, 2000). Other estimators implicitly have still differ in the thickness of their tails. This thickness similar constraints. For example, L-moments restricts is described by their kurtosis. Tail shapes are important if κ to the range κ 1, and the method of moments an analysis is sensitive to the likelihood of extreme events. estimator employs the sample standard deviation so that The normal and Gumbel distributions have a fixed κ 0.5. Use of the sample skew introduces the shape and thus are presented by single points that fall on constraint that κ 0.3. Then given a set of independ- the Pearson type 3 (P3) curve for γ 0, and the general- ent observations {x1,…, xn} drawn for a GEV distribu- ized extreme value (GEV) curve for κ 0, respectively. tion, the generalized likelihood function is: The L-kurtosis/L-skewness relationships for the two- parameter and three-parameter gamma or P3 distribu- ln{L (ξακ , , |xx ,… , )} n ln( α ) 1 n tions are identical, as they are for the two-parameter and n  1   ∑  1 ln()()yy 1κ ln[π (κ )] three-parameter lognormal distributions. This is because κ  ii i1   the addition of a location parameter does not change with the range of fundamental shapes that can be generated. κα ξ However, for the same skewness coefficient, the lognor- yxi [(/)()]1 i (7.96) mal distribution has a larger kurtosis than the gamma or

For feasible values of the parameters, yi is greater than 0 P3 distribution, and thus assigns larger probabilities to (Hosking et al., 1985). Numerical optimization of the the largest events. generalized likelihood function is often aided by the As the skewness of the lognormal and gamma distribu- ε additional constraint that min{y1,…,yn} for some tions approaches zero, both distributions become normal small ε 0 so as to prohibit the search generating infea- and their kurtosis/skewness relationships merge. For the sible values of the parameters for which the likelihood same L-skewness, the L-kurtosis of the GEV distribution is function is undefined. The constraint should not be bind- generally larger than that of the lognormal distribution. ing at the final solution. For positive κ yielding almost symmetric or even nega- The data in Table 7.2 again provide a convenient data tively skewed GEV distributions, the GEV has a smaller set for illustrating parameter estimators. The L-moment kurtosis than the three-parameter lognormal distribution. wrm_ch07.qxd 8/31/2005 12:18 PM Page 193

Concepts in Probability, Statistics and Stochastic Modelling 193

specified range of values are not specifically reported (David, 1981). For example, in water quality investiga- 0.8 Normal Gumbel P3 GEV tions many constituents have concentrations that are LN Pareto l-kurtosis 0.6 reported as T, where T is a reliable detection threshold (MacBerthouex and Brown, 2002). Thus the concentra- 0.4 tion of the water quality variable of interest was too small to be reliably measured. Likewise, low-flow observations 0.2 and rainfall depths can be rounded to or reported as zero. Several approaches are available for analysis of censored 0.0 -0.2 0.0 0.2 0.4 0.6 0.8 data sets, including probability plots and probability-plot l-skewness regression, conditional probability models and maximum E020527b likelihood estimators (Haas and Scheff, 1990; Helsel, Figure 7.10. Relationships between L-skewness and 1990; Kroll and Stedinger, 1996; MacBerthouex and L-kurtosis for various distributions. Brown, 2002). Historical and physical paleoflood data provide another example of censored data. Before the beginning The latter can be negatively skewed when the log normal of a continuous measurement program on a stream or τ location parameter is used as an upper bound. river, the stages of unusually large floods can be Figure 7.10 also includes the three-parameter general- estimated on the basis of the memories of people who ized Pareto distribution, whose cdf is: have experienced these events and/or physical markings κ ξ α 1/κ in the watershed (Stedinger and Baker, 1987). Annual FX(x) 1 [1 (x )/ ] (7.97) maximum floods that were not unusual were not (Hosking and Wallis, 1997). For κ 0 it corresponds recorded nor were they remembered. These missing data to the exponential distribution (gamma with α 1). are censored data. They cover periods between occa- This point is where the Pareto and P3 distribution sional large floods that have been recorded or that have L-kurtosis/L-skewness lines cross. The Pareto distribution left some evidence of their occurrence (Stedinger and becomes increasing more skewed for κ 0, which is the Cohn, 1986). range of interest in hydrology. The generalized Pareto The discussion below addresses probability-plot distribution with κ 0 is often used to describe peaks- methods for use with censored data. Probability-plot over-a-threshold and other variables whose probability methods have a long history of use for this purpose density function has its maximum at their lower bound. because they are relatively simple to use and to under- In that range for a given L-skewness, the Pareto distribu- stand. Moreover, recent research has shown that they tion always has a larger kurtosis than the gamma distri- are relatively efficient when the majority of values are bution. In these cases the α parameter for the gamma observed, and unobserved values are known only to be distribution would need to be in the range 0 α 1, so below (or above) some detection limit or perception that both distributions would be J-shaped. threshold that serves as a lower (or upper) bound. In such As shown in Figure 7.10, the GEV distribution has a cases, probability-plot regression estimators of moments thicker right-hand tail than either the gamma/Pearson and quantiles are as accurate as maximum likelihood type 3 distribution or the lognormal distribution. estimators. They are almost as good as estimators computed with complete samples (Helsel and Cohn, 1988; Kroll and Stedinger, 1996). 4. Analysis of Censored Data Perhaps the simplest method for dealing with censored data is adoption of a conditional probability model. Such There are many instances in water resources planning models implicitly assume that the data are drawn from where one encounters censored data. A data set is one of two classes of observations: those below a single censored if the values of observations that are outside a threshold, and those above the threshold. This model is wrm_ch07.qxd 8/31/2005 12:18 PM Page 194

194 Water Resources Systems Planning and Management

appropriate for simple cases where censoring occurs And if a sample mean, sample variance or quantiles are because small observations are recorded as ‘zero,’ as often needed, then the distribution defined by the probability happens with low-flow, low pollutant concentration, plot is used to fill in the missing (censored) observations and some flood records. The conditional probability so that standard estimators of the mean, standard

model introduces an extra parameter P0 to describe deviation and of quantiles can be employed. Such fill-in the probability that an observation is ‘zero’. If r of a procedures are efficient and relatively robust for fitting total of n observations were observed because they a distribution and estimating various statistics with exceeded the threshold, then P0 is estimated as (n r)/n. censored water quality data when a modest number of the A continuous distribution GX(x) is derived for the smallest observations are censored (Helsel, 1990; Kroll strictly positive ‘non-zero’ values of X. Then the and Stedinger, 1996). parameters of the G distribution can be estimated Unlike the conditional probability approach, here the

using any procedure appropriate for complete uncen- below threshold probability P0 is linked with the selected sored samples. The unconditional cdf for any value x 0, probability distribution for the above-threshold observa- is then tions. The observations below the threshold are censored but are in all other respects envisioned as coming from FX(x) P0 ( 1 P0) G(x) (7.98) the same distribution that is used to describe the observed This model completely decouples the value of P from the 0 above-threshold values. parameters that describe the G distribution. When water quality data are well described by a Section 7.2 discusses probability plots and plotting … lognormal distribution, available values ln[X(1)] positions useful for graphical displays of data to allow ln[X ] can be regressed upon F 1[p ] µ σF 1[p ] a visual examination of the empirical frequency curve. (r) i i for i 1, … , r, where the r largest observations in a Suppose that among n samples a detection limit is sample of size n are available. If regression yields constant exceeded by the observations r times. The natural estima- m and slope s corresponding to the first and second tor of the exceedance probability P of the perception 0 population moments µ and σ, a good estimator of the pth threshold is again (n r)/n. If the r values that exceeded quantile is the threshold are indexed by i 1, … , r, wherein x is (r) the largest observation, reasonable plotting positions xp exp[m szp] (7.101) th within the interval [P0 , 1] are: wherein zp is the p quantile of the standard normal distribution. To estimate sample means and other pi P0 (1 P0) [(i a)/(r 1 2a)] (7.99) statistics one can fill in the missing observations with where a defines the plotting position that is used; a 0 is reasonable (Hirsch and Stedinger, 1987). Helsel and x(j) exp{y(j)} for j 1, … , (n r) (7.102) Cohn (1988) show that reasonable choices for a generally where make little difference. Both papers discuss development 1 y(j) m sF {P0[(j a)/(n r 1 2a)]} (7.103) of plotting positions when there are different thresholds, as occurs when the analytical precision of instrumenta- Once a complete sample is constructed, standard estima- tion changes over time. If there are many exceedances of tors of the sample mean and variance can be calculated, as can medians and ranges. By filling in the missing small the threshold so that r (1 2a), pi is indistinguish- able from observations, and then using complete-sample estimators of statistics of interest, the procedure is relatively insensi- ′ pi [i (n r) a]/(n 1 2a) (7.100) tive to the assumption that the observations actually have where again, i 1, … , r. These values correspond to the a lognormal distribution. plotting positions that would be assigned to the largest r Maximum likelihood estimators are quite flexible, and observations in a complete sample of n values. are more efficient than plotting-position methods when The idea behind the probability-plot regression the values of the observations are not recorded because estimators is to use the probability plot for the observed they are below or above the perception threshold (Kroll data to define the parameters of the whole distribution. and Stedinger, 1996). Maximum likelihood methods wrm_ch07.qxd 8/31/2005 12:18 PM Page 195

Concepts in Probability, Statistics and Stochastic Modelling 195

allow the observations to be represented by exact values, quality of measurement procedures. Numerical methods ranges and various thresholds that either were or were can be used to identify the parameter vector that maxi- not exceeded at various times. This can be particularly mizes the likelihood function for the data available. important with historical flood data sets because the magnitudes of many historical floods are not recorded precisely, and it may be known that a threshold was 5. Regionalization and Index-Flood never crossed or was crossed at most once or twice in a Method long period (Stedinger and Cohn, 1986; Stedinger, 2000; O’Connell et al., 2002). Unfortunately, maximum likeli- Research has demonstrated the potential advantages hood estimators for the LP3 distribution have proven of ‘index flood’ procedures (Lettenmaier et al., 1987; to be problematic. However, recently developed expected Stedinger and Lu, 1995; Hosking and Wallis, 1997; moment estimators seem to do as well as maximum Madsen, and Rosbjerg, 1997a). The idea behind the likelihood estimators with the LP3 distribution (Cohn index-flood approach is to use the data from many et al., 1997, 2001; Griffs et al., 2004). hydrologically ‘similar’ basins to estimate a dimensionless While often a computational challenge, maximum flood distribution (Wallis, 1980). Thus this method likelihood estimators for complete samples, and samples ‘substitutes space for time’ by using regional information with some observations censored, pose no conceptual to compensate for having relatively short records at each challenge. One need only write the maximum likelihood site. The concept underlying the index-flood method is function for the data and proceed to seek the parameter that the distributions of floods at different sites in a values that maximize that function. Thus if F(x | θ) and ‘region’ are the same except for a scale or index-flood f(x | θ) are the cumulative distribution and probability parameter that reflects the size, rainfall and runoff density functions that should describe the data, and θ characteristics of each watershed. Research is revealing is the vector of parameters of the distribution, then for when this assumption may be reasonable. Often a more

the case described above wherein x1, … , xr are r of n sophisticated multi-scaling model is appropriate (Gupta observations that exceeded a threshold T, the likelihood and Dawdy, 1995a; Robinson and Sivapalan, 1997). function would be (Stedinger and Cohn, 1986): Generally the mean is employed as the index flood. th L(θ | r, n, x , … , x ) The problem of estimating the p quantile xp is then 1 r µ θ (nr) θ θ … θ reduced to estimation of the mean, x, for a site and the F(T | ) f(x1 | )f(x2 | ) f(xr | ) (7.104) µ th ratio xp/ x of the p quantile to the mean. The mean can Here, (n r) observations were below the threshold T, often be estimated adequately with the record available at and the probability an observation is below T is F(T | θ) a site, even if that record is short. The indicated ratio is which then appears in Equation 7.104 to represent that estimated using regional information. The British Flood observation. In addition, the specific values of the r obser- Studies Report (NERC, 1975) calls these normalized flood

vations x1, … , xr are available, where the probability an distributions “growth curves”. δ δ observation is in a small interval of width around xi is Key to the success of the index-flood approach is θ f(xi | ). Thus strictly speaking the likelihood function also identification of sets of basins that have similar coeffi- includes a term δr. Here what is known of the magnitude cients of variation and skew. Basins can be grouped of all of the n observations is included in the likelihood geographically, as well as by physiographic characteristics function in the appropriate way. If all that were known of including drainage area and elevation. Regions need not some observation was that it exceeded a threshold M, then be geographically contiguous. Each site can potentially be that value should be represented by a term [1 F(M | θ)] assigned its own unique region consisting of sites with in the likelihood function. Similarly, if all that were known which it is particularly similar (Zrinji and Burn, 1994), or was that the value was between L and M, then a term regional regression equations can be derived to compute [F(M | θ) F(L | θ)] should be included in the likelihood normalized regional quantiles as a function of a site’s function. Different thresholds can be used to describe physiographic characteristics and other statistics (Fill and different observations corresponding to changes in the Stedinger, 1998). wrm_ch07.qxd 8/31/2005 12:18 PM Page 196

196 Water Resources Systems Planning and Management

Clearly the next step for regionalization procedures, with a regional estimator of the shape parameter for a such as the index-flood method, is to move away from GEV distribution. In many cases this would be roughly estimates of regional parameters that do not depend upon equivalent to fitting a Gumbel distribution corresponding basin size and other physiographic parameters. Gupta et al. to a shape parameter κ 0. Gabriele and Arnell (1991) (1994) argue that the basic premise of the index-flood develop the idea of having regions of different sizes for method – that the coefficient of variation of floods is different parameters. For realistic hydrological regions, relatively constant – is inconsistent with the known these and other studies illustrate the value of regionaliz- relationships between the coefficient of variation (CV) ing estimators of the shape, and often the coefficient of and drainage area (see also Robinson and Sivapalan, variation of a distribution. 1997). Recently, Fill and Stedinger (1998) built such a relationship into an index-flood procedure by using a regression model to explain variations in the normalized 6. Partial Duration Series quantiles. Tasker and Stedinger (1986) illustrated how one might relate log-space skew to physiographic basin Two general approaches are available for modelling characteristics (see also Gupta and Dawdy, 1995b). flood and precipitation series (Langbein, 1949). An Madsen and Rosbjerg (1997b) did the same for a regional annual maximum series considers only the largest event model of κ for the GEV distribution. In both studies, only in each year. A partial duration series (PDS) or peaks- a binary variable representing ‘region’ was found useful in over-threshold (POT) approach includes all ‘independent’ explaining variations in these two shape parameters. peaks above a truncation or threshold level. An objection Once a regional model of alternative shape parameters to using annual maximum series is that it employs only is derived, there may be some advantage to combining the largest event in each year, regardless of whether the such regional estimators with at-site estimators employing second-largest event in a year exceeds the largest events of an empirical Bayesian framework or some other weighting other years. Moreover, the largest annual flood flow in a schemes. For example, Bulletin 17B recommends weigh dry year in some arid or semi-arid regions may be zero, or at-site and regional skewness estimators, but almost cer- so small that calling them floods is misleading. When tainly places too much weight on the at-site values considering rainfall series or pollutant discharge events, (Tasker and Stedinger, 1986). Examples of empirical one may be interested in modelling all events that occur Bayesian procedures are provided by Kuczera (1982), within a year that exceed some threshold of interest. Madsen and Rosbjerg (1997b) and Fill and Stedinger Use of a partial duration series framework avoids such (1998). Madsen and Rosbjerg’s (1997b) computation of a problems by considering all independent peaks that exceed κ-model with a New Zealand data set demonstrates how a specified threshold. Furthermore, one can estimate annual important it can be to do the regional analysis carefully, exceedance probabilities from the analysis of partial taking into account the cross-correlation among concur- duration series. Arguments in favour of partial duration rent flood records. series are that relatively long and reliable records are often When one has relatively few data at a site, the index- available, and if the arrival rate for peaks over the threshold flood method is an effective strategy for deriving flood is large enough (1.65 events/year for the Poisson-arrival frequency estimates. However, as the length of the with exponential-exceedance model), partial duration series available record increases, it becomes increasingly advan- analyses should yield more accurate estimates of extreme tageous to also use the at-site data to estimate the coeffi- quantiles than the corresponding annual-maximum fre- cient of variation as well. Stedinger and Lu (1995) found quency analyses (NERC, 1975; Rosbjerg, 1985). However, that the L-moment/GEV index-flood method did quite when fitting a three-parameter distribution, there seems to well for ‘humid regions’ (CV Ϸ 0.5) when n 25, and for be little advantage from the use of a partial duration series semi-arid regions (CV Ϸ 1.0) for n 60, if reasonable approach over an annual maximum approach. This is true care is taken in selecting the stations to be included even when the partial duration series includes many more in a regional analysis. However, with longer records, it peaks than the maximum series because both contain the became advantageous to use the at-site mean and L-CV same largest events (Martins and Stedinger, 2001a). wrm_ch07.qxd 8/31/2005 12:18 PM Page 197

Concepts in Probability, Statistics and Stochastic Modelling 197

A drawback of partial duration series analyses is that the GPD is the two-parameter exponential distribution one must have criteria to identify only independent with κ 0. Method of moment estimators work relatively peaks (and not multiple peaks corresponding to the same well (Rosbjerg et al., 1992). event). Thus, such analysis can be more complicated than Use of a generalized Pareto distribution for G(x) with analyses using annual maxima. Partial duration models, a Poisson arrival model yields a GEV distribution for

perhaps with parameters that vary by season, are often the annual maximum series greater than x0 (Smith, 1984; used to estimate expected damages from hydrological Stedinger et al., 1993; Madsen et al., 1997a). The Poisson- events when more than one damage-causing event can Pareto and Poisson-GPD models provide very reasonable occur in a season or within a year (North, 1980). descriptions of flood risk (Rosbjerg et al., 1992). They A model of a partial duration series has at least two have the advantage that they focus on the distribution of components: first, one must model the arrival rate of events the larger flood events, and regional estimates of the GEV larger than the threshold level; second, one must model distribution’s shape parameter κ from annual maximum the magnitudes of those events. For example, a Poisson and partial duration series analyses can be used inter- distribution has often been used to model the arrival of changeably. events, and an exponential distribution to describe the Madsen and Rosbjerg (1997a) use a Poisson-GPD magnitudes of peaks that exceed the threshold. model as the basis of a partial duration series index-flood There are several general relationships between the procedure. Madsen et al. (1997b) show that the estima- probability distribution for annual maximum and the fre- tors are fairly efficient. They pooled information from quency of events in a partial duration series. For a partial many sites to estimate the single shape parameter κ and duration series model, let λ be the average arrival rate of the arrival rate where the threshold was a specified flood peaks greater than the threshold x0 and let G(x) be percentile of the daily flow duration curve at each site. the probability that flood peaks, when they occur, are less Then at-site information was used to estimate the mean than x x0, and thus those peaks fall in the range [x0, x]. above-threshold flood. Alternatively, one could use the The annual exceedance probability for a flood, denoted at-site data to estimate the arrival rate as well. 1/Ta, corresponding to an annual return period Ta, is related to the corresponding exceedance probability q [1 – G(x)] for level x in the partial duration series by 7. Stochastic Processes and e Time Series λ 1/Ta 1 – exp{ qe} 1 – exp{ 1/Tp} (7.105) λ Many important random variables in water resources are where Tp 1/( qe) is the average return period for level x in the partial duration series. functions whose values change with time. Historical Many different choices for G(x) may be reasonable. In records of rainfall or streamflow at a particular site are a particular, the generalized Pareto distribution (GPD) is a sequence of observations called a time series. In a time simple distribution useful for describing floods that exceed series, the observations are ordered by time, and it is a specified lower bound. The cumulative distribution generally the case that the observed value of the random function for the generalized three-parameter Pareto variable at one time influences one’s assessment of the distribution is: distribution of the random variable at later times. This means that the observations are not independent. Time κ ξ α 1/κ FX(x) 1 [1 (x )/ ] (7.106) series are conceptualized as being a single observation of a stochastic process, which is a generalization of the with mean and variance concept of a random variable. µ ξ α κ κ X /(1 ) This section has three parts. The first presents the con- σ 2 α 2/[(1κ)2(12κ)] (7.107) cept of stationarity and the basic statistics generally used X to describe the properties of a stationary stochastic where for κ 0, ξ x ∞, whereas for κ 0, ξ x process. The second presents the definition of a Markov ξ α/κ (Hosking and Wallis, 1987). A special case of process and the Markov chain model. Markov chains are wrm_ch07.qxd 8/31/2005 12:18 PM Page 198

198 Water Resources Systems Planning and Management

a convenient model for describing many phenomena, and T µˆ 1 ∑ are often used in synthetic flow and rainfall generation X X Xt (7.111) T t=1 and optimization models. The third part discusses the sampling properties of statistics used to describe the T σ 2 1 ∑ 2 characteristics of many time series. X ()XXt (7.112) T t1

while the autocorrelations ρ (k) for any time lag k can be 7.1. Describing Stochastic Processes X estimated as (Jenkins and Watts, 1968)

A random variable whose value changes through time Tk according to probabilistic laws is called a stochastic ∑ ()()xxxxtk t ρ process. An observed time series is considered to be one ˆX(k) t 1 rk T realization of a stochastic process, just as a single 2 ∑()xxt (7.113) observation of a random variable is one possible value the t1 random variable may assume. In the development here, a stochastic process is a sequence of random variables {X(t)} The sampling distribution of these estimators depends on ordered by a discrete time variable t 1, 2, 3, … the correlation structure of the stochastic process giving The properties of a stochastic process must generally rise to the time series. In particular, when the observa- be determined from a single time series or realization. tions are positively correlated, as is usually the case in To do this, several assumptions are usually made. First, natural streamflows or annual benefits in a river basin _ σˆ 2 one generally assumes that the process is stationary. This simulation, the variances of the estimated x and X are means that the probability distribution of the process is larger than would be the case if the observations were not changing over time. In addition, if a process is strictly independent. It is sometimes wise to take this inflation stationary, the joint distribution of the random variables into account. Section 7.3 discusses the sampling distribu- tion of these statistics. X(t1),…,X(tn) is identical to the joint distribution of All of this analysis depends on the assumption of X(t1 t),…,X(tn t) for any t; the joint distribution stationarity, for only then do the quantities defined in depends only on the differences ti tj between the times of occurrence of the events. Equations 7.108 to 7.110 have the intended meaning. For a stationary stochastic process, one can write the Stochastic processes are not always stationary. Agricultural mean and variance as and urban development, deforestation, climatic variability and changes in regional resource management can alter µ X E[X(t)] (7.108) the distribution of rainfall, streamflows, pollutant concen- and trations, sediment loads and groundwater levels over time. If a stochastic process is not essentially stationary over σ 2 Var[X(t)] (7.109) the time span in question, then statistical techniques that Both are independent of time t. The autocorrelations, the rely on the stationary assumption do not apply and the correlation of X with itself, are given by problem generally becomes much more difficult. ρ σ 2 X (k) Cov[X(t), X(t k)]/ X (7.110) 7.2. Markov Processes and Markov Chains for any positive integer time lag k. These are the statistics most often used to describe stationary stochastic A common assumption in many stochastic water processes. resources models is that the stochastic process X(t) is When one has available only a single time series, it is a Markov process. A first-order Markov process has the µ σ 2 ρ necessary to estimate the values of X, X, and X(k) from property that the dependence of future values of the values of the random variable that one has observed. The process on past values depends only on the current value mean and variance are generally estimated essentially as and not on previous values or observations. In symbols they were in Equation 7.39. for k 0, wrm_ch07.qxd 8/31/2005 12:18 PM Page 199

Concepts in Probability, Statistics and Stochastic Modelling 199

FX[X(t k) | X(t), X(t 1), X(t 2), …] possible streamflows) with unconditional probabilities p where FX[X(t k) | X(t)] (7.114) i n For Markov processes, the current value summarizes the ∑ pi 1 (7.115) state of the processes. As a consequence, the current value i1 of the process is often referred to as the state. This makes physical sense as well when one refers to the state or level It is frequently the case that the value of Qy1 is not of an aquifer or reservoir. independent of Qy. A Markov chain can model such A special kind of Markov process is one whose dependence. This requires specification of the transition state X(t) can take on only discrete values. Such a process probabilities pij, is called a Markov chain. Often in water resources pij Pr[Qy1 qj | Qy qi] (7.116) planning, continuous stochastic processes are approxi- mated by Markov chains. This is done to facilitate the A transition probability is the conditional probability that construction of simple stochastic models. This section the next state is qj, given that the current state is qi. The presents the basic notation and properties of Markov transition probabilities must satisfy chains. n Consider a stream whose annual flow is to be ∑ piij 1 for all (7.117) j1 represented by a discrete random variable. Assume that the distribution of streamflows is stationary. In the Figure 7.11 shows a possible set of transition probabilities

following development, the continuous random variable in a matrix. Each element pij in the matrix is the probabil- representing the annual streamflows (or some other ity of a transition from streamflow qi in one year to process) is approximated by a random variable Qy in streamflow qj in the next. In this example, a low flow year y, which takes on only n discrete values qi (each tends to be followed by a low flow, rather than a high value representing a continuous range or interval of flow, and vice versa. j j 1 2 P P

Figure 7.11. Matrix (above) and histograms (below) of streamflow transition probabilities showing probability

0.4 0.3 0.2 0.1 0.2 0.4 0.3 0.1 of streamflow qj (represented by index j) in year y1 given

j j streamflow qi (represented by 3 4

P P index i) in year y.

0.1 0.3 0.4 0.2 0.0 0.2 0.3 0.5 1234 1234 j j

E020527q wrm_ch07.qxd 8/31/2005 12:18 PM Page 200

200 Water Resources Systems Planning and Management

Let P be the transition matrix whose elements are pij. For a Markov chain, the transition matrix contains all year P y P y P y P y E021101h the information necessary to describe the behaviour of 1 2 3 4 y y the process. Let pi be the probability that the streamflow 0.000 1.000 0.000 0.000 y + 1 0.200 0.400 0.300 0.100 Qy is qi (in state i) in year y. Then the probability that y + 2 0.190 0.330 0.310 0.170 Q q is the sum of the products of the probabilities y 1 j y y + 3 0.173 0.316 0.312 0.199 pi that Qy qi times the probability pij that the next state y + 4 0.163 0.312 0.314 0.211 Qy1 is qj given that Qy qi. In symbols, this relationship y + 5 0.159 0.310 0.315 0.216 is written: y + 6 0.157 0.309 0.316 0.218 y n + 7 0.156 0.309 0.316 0.219 yy1 y . . . y y y + 8 0.156 0.309 0.316 0.219 pppppppppj 1 1 j 2 2 jnnj ∑ i ij (7.118) y i1 + 9 0.156 0.309 0.316 0.219 Letting py be the row vector of state resident probabilities Θ y yΙ pi , … , pn , this relationship may be written Table 7.8. Successive streamflow probabilities based on transition probabilities in Figure 7.11. p(y1) p(y)P (7.119) To calculate the probabilities of each streamflow state in year y 2, one can use p(y 1) in Equation 7.119 to obtain where p is the row vector of unconditional probabilities p(y 2) p(y 1)P or p(y 2) pyP2 (p1, … , pn). For the example in Table 7.8, the probability Continuing in this manner, it is possible to compute vector p equals (0.156, 0.309, 0.316, 0.219). the probabilities of each possible streamflow state for The steady-state probabilities for any Markov years y 1, y 2, y 3, … , y k, … as chain can be found by solving simultaneous Equation 7.122 for all but one of the states j together with the (yk) y k p p P (7.120) constraint

Returning to the four-state example in Figure 7.11, n assume that the flow in year y is in the interval repre- ∑ pi 1 (7.123) i1 sented by q2. Hence in year y the unconditional stream- y y flow probabilities pi are (0, 1, 0, 0). Knowing each pi , Annual streamflows are seldom as highly correlated y1 the probabilities pj corresponding to each of the four as the flows in this example. However, monthly, weekly streamflow states can be determined. From Figure 7.11, and especially daily streamflows generally have high y1 the probabilities pj are 0.2, 0.4, 0.3 and 0.1 for serial correlations. Assuming that the unconditional j 1, 2, 3 and 4, respectively. The probability vectors for steady-state probability distributions for monthly nine future years are listed in Table 7.8. streamflows are stationary, a Markov chain can be As time progresses, the probabilities generally reach defined for each month’s streamflow. Since there are twelve limiting values. These are the unconditional or steady-state months in a year, there would be twelve transition probabilities. The quantity pi has been defined as the matrices, the elements of which could be denoted as t t1 unconditional probability of qi. These are the steady-state p . Each defines the probability of a streamflow q ij j probabilities which p(y k) approaches for large k. It is clear t in month t 1, given a streamflow qi in month t. The from Table 7.8 that as k becomes larger, Equation 7.118 steady-state stationary probability vectors for each month becomes can be found by the procedure outlined above, except n that now all twelve matrices are used to calculate all ppp ∑ jiij (7.121) twelve steady-state probability vectors. However, once i1 the steady-state vector p is found for one month, the or in vector notation, Equation 7.119 becomes others are easily computed using Equation 7.120 with t p pP (7.122) replacing y. wrm_ch07.qxd 8/31/2005 12:18 PM Page 201

Concepts in Probability, Statistics and Stochastic Modelling 201

7.3. Properties of Time-Series Statistics

The statistics most frequently used to describe the sample correlation of consecutive observations E021101j distribution of a continuous-state stationary stochastic size n p = 0.0 p = 0.3 p = 0.6 process are the sample mean, variance and various 25 0.050 0.067 0.096 autocorrelations. Statistical dependence among the obser- 50 0.035 0.048 0.069 vations, as is frequently found in time series, can have a 100 0.025 0.034 0.050 marked effect on the distribution of these statistics. This part of Section 7 reviews the sampling properties of _ σ ρ ρ k these statistics when the observations are a realization Table 7.9. Standard error of X when x 0.25 and X(k) . of a stochastic process. The sample mean σρρρ2  21[(n ) ( 1n )] n Var(X )X 1  (7.128) 1 ρ 2 X ∑ Xi (7.124) nn ()1  n i1 σ 2 ρ Substitution of the sample estimates for X and X(1) in when viewed as a random variable is an unbiased estimate the equation above often yields a more realistic estimate µ _ of the mean of the process X, because 2 of the variance of X than does the estimate sX/n if the k n correlation structure ρ (k) ρ is reasonable; otherwise, 1 µ X EE[][]X ∑ Xi X (7.125) Equation 7.126 may be employed. Table 7.9 illustrates n i1 the effect of correlation among the Xt values on the ρ However, correlation among the Xi’s, so that X(k) _ 0 for standard error of their mean, equal to the square root of k 0, affects the variance of the estimated mean X. the variance in Equation 7.126. The properties of the estimate of the variance of X, µ 2 Var()XXE[( X)] 1 n 1  n n  σˆ 2 v22∑()XX (7.129) ∑∑ µµ X Xt 2 E ()()XXtXsX n t1 n t1 s1  are also affected by correlation among the X ’s. Here v σ 2  n 1 k  t X 112∑  1  ρ ()k  rather than s is used to denote the variance estimator,   X  (7.126) n  k1 n  because n is employed in the denominator rather than _ n 1. The expected value of v2 becomes σ 2 x The variance of X, equal to x /n for independent obser- vations, is inflated by the factor within the brackets. For  12n 1 k  22σρ ∑  ρ (k) 0, as is often the case, this factor is a nondecreas- E[vXX ] 1 1  X()k (7.130) X _  nnk1 n  ing function of n, so that the variance of X is inflated by a 2 ρ factor whose importance does not decrease with increas- The bias in vX depends on terms involving X(1) through ρ 2 ing sample size. This is an important observation, because X(n 1). Fortunately, the bias in vX decreases with n it means that the average of a correlated time series will be and is generally unimportant when compared to its less precise than the average of a sequence of independ- variance. ent random variables of the same length with the same Correlation among the Xt’s also affects the variance of 2 variance. vX. Assuming that X has a normal distribution (here the 2 A common model of stochastic series has variance of vX depends on the fourth moment of X), the variance of v2 for large n is approximately (Kendall and ρ (k) [ρ (1)]k ρk (7.127) X X X Stuart, 1966, Sec. 48.1). This correlation structure arises from the autoregressive σ 4  ∞  Markov model discussed at length in Section 8. For this Var()v2 Х 212X  ∑ ρ2 ()k  (7.131) X X  correlation structure n  k1  wrm_ch07.qxd 8/31/2005 12:18 PM Page 202

202 Water Resources Systems Planning and Management

ρ ρk where for X(k) , Equation 7.131 becomes

101k

σ 4  ρ2  sample correlation of consecutive observations E021 2 Х X 1 Var()vX 2   size n p = 0.0 p = 0.3 p = 0.6 n 1 ρ2  (7.132) _ 25 0.28 0.31 0.41 Like the variance of X, the variance of v2 is inflated by a X 50 0.20 0.22 0.29 factor whose importance does not decrease with n. This 100 0.14 0.15 0.21 is illustrated by Table 7.10 which gives the standard 2 σ 2 deviation of vX divided by the true variance X as a function of n and ρ when the observations have a normal ΄ 2 σ 2΅ Table 7.10. Standard deviation of vX/ X when observations ρ ρk ρ ρk distribution and X(k) . This would be the coefficient have a normal distribution and X(k) . 2 of variation of vX were it not biased. A fundamental problem of time-series analyses is the estimation or description of the relationship among the random variable values at different times. The However, for normally distributed Xt and large n (Kendall statistics used to describe this relationship are the and Stuart, 1966), autocorrelations. Several estimates of the autocorrelations Х 1 ρρ2 ρ have been suggested. A simple and satisfactory estimate Var(rkXXX ) ∑ [ ()llklk ( ) ( ) n recommended by Jenkins and Watts (1968) is: l 4ρρρ()klkl () ( ) 2ρρ22()kl () (7.135) nk XXX XX] ∑ ()()xxx x ttk ρ ρˆ t1 If X(k) is essentially zero for k q, then the simpler X(k) r (7.133) k n expression (Box et al., 1994) 2 ∑()xt x t1 q 1   Var(r ) Х 12 ∑ ρ2 ()l  Here r is the ratio of two sums where the numerator kX (7.136) k n  t1  contains n k terms and the denominator contains n terms. The estimate rk is biased, but unbiased estimates is valid for rk corresponding to k q. Thus for large n, frequently have larger mean square errors (Jenkins and Var(rk) l/n and values of rk will frequently be outside ρ Watts, 1968). A comparison of the bias and variance of r1 the range of 1.65/ n, even though X(k) may be zero. ρ ρk is provided by the case when the Xt’s are independent If X(k) , Equation 7.136 reduces to normal variates. Then (Kendall and Stuart, 1966) 1 ()()llρρ22k  1 Var(r )Х  2kρ2k  (7.137) E[]r1 (7.134a) k 2 n n  l ρ  and In particular for rl, this gives ()n 2 2 1 Х 1 Var(r1 ) 2 (7.134b) Х ρ2 nn()1 n Var(r1 ) ()1 (7.138) n For n 25, the expected value of r1 is 0.04 rather Approximate values of the standard deviation of rl for than the true value of zero; its standard deviation is 0.19. different values of n and ρ are given in Table 7.11. 2 This results in a mean square error of (E[r1]) Var(r1) The estimates of rk and rkj are highly correlated for 0.0016 0.0353 0.0369. Clearly, the variance of small j; this causes plots of rk versus k to exhibit slowly ρ r1 is the dominant term. varying cycles when the true values of X(k) may be zero. For Xt values that are not independent, exact expres- This increases the difficulty of interpreting the sample sions for the variance of rk generally are not available. autocorrelations. wrm_ch07.qxd 8/31/2005 12:18 PM Page 203

Concepts in Probability, Statistics and Stochastic Modelling 203

1m synthetic streamflow

sample correlation of consecutive observations E02110 and other sequences size n p = 0.0 p = 0.3 p = 0.6 simulation model system of river basin system performance 25 0.20 0.19 0.16 50 0.14 0.13 0.11 future demands and economic data 100 0.10 0.095 0.080 system design

p and operating policy 7 52 20 E0

Table 7.11. Approximate standard deviation of r when 1 Figure 7.12. Structure of a simulation study, indicating the observations have a normal distribution and ρ (k) ρk. X transformation of a synthetic streamflow sequence, future demands and a system design and operating policy into system performance statistics. 8. Synthetic Streamflow Generation

8.1. Introduction with projections of future demands and other economic data to determine how different system designs and This section is concerned primarily with ways of generat- operating policies might perform. ing sample data such as streamflows, temperatures and Use of only the historical flow or rainfall record in rainfall that are used in water resource systems simulation water resource studies does not allow for the testing of studies (e.g., as introduced in the next section). The mod- alternative designs and policies against the range of els and techniques discussed in this section can be used sequences that are likely to occur in the future. We can be to generate any number of quantities used as inputs to very confident that the future historical sequence of flows simulation studies. For example Wilks (1998, 2002) dis- will not be the historical one, yet there is important infor- cusses the generation of wet and dry days, rainfall depths mation in that historical record. That information is not on wet days and associated daily temperatures. The dis- fully used if only the historical sequence is simulated. cussion here is directed toward the generation of stream- Fitting continuous distributions to the set of historical flows because of the historical development and frequent flows and then using those distributions to generate other use of these models in that context (Matalas and Wallis, sequences of flows, all of which are statistically similar 1976). In addition, they are relatively simple compared to and equally weighted, gives one a broader range of inputs more complete daily weather generators and many other to simulation models. Testing designs and policies against applications. Generated streamflows have been called that broader range of flow sequences that could occur synthetic to distinguish them from historical observations more clearly identifies the variability and range of possi- (Fiering, 1967). The activity has been called stochastic ble future performance indicator values. This in turn hydrological modelling. More detailed presentations can should lead to the selection of more robust system designs be found in Marco et al. (1989) and Salas (1993). and policies. River basin simulation studies can use many sets of The use of synthetic streamflows is particularly useful streamflow, rainfall, evaporation and/or temperature for water resources systems having large amounts of sequences to evaluate the statistical properties of the over-year storage. Use of only the historical hydrological performance of alternative water resources systems. For record in system simulation yields only one time history this purpose, synthetic flows and other generated quanti- of how the system would operate from year to year. In ties should resemble, statistically, those sequences that are water resources systems with relatively little storage, so likely to be experienced during the planning period. that reservoirs and/or groundwater aquifers refill almost Figure 7.12 illustrates how synthetic streamflow, rainfall every year, synthetic hydrological sequences may not be and other stochastic sequences are used in conjunction needed if historical sequences of a reasonable length are wrm_ch07.qxd 8/31/2005 12:18 PM Page 204

204 Water Resources Systems Planning and Management

available. In this second case, a twenty-five-year historical precision, or is it sufficient just to model the n-year output record provides twenty-five descriptions of the possible series that results from simulation of the historical series? within-year operation of the system. This may be suffi- The answer seems to depend upon how well behaved cient for many studies. the input and output series are. If the simulation model is Generally, use of stochastic sequences is thought to linear, it does not make much difference. If the simulation improve the precision with which water resources system model were highly non-linear, then modelling the input performance indices can be estimated, and some studies series would appear to be advisable. Or if one is develop- have shown this to be the case (Vogel and Shallcross, ing reservoir operating policies, there is a tendency to 1996; Vogel and Stedinger, 1988). In particular, if system make a policy sufficiently complex to deal very well with operation performance indices have thresholds and sharp the few droughts in the historical record, giving a false breaks, then the coarse descriptions provided by histori- sense of security and likely misrepresenting the probabil- cal series are likely to provide relative inaccurate estimates ity of system performance failures. of the expected values of such statistics. For example, Another situation where stochastic data generating suppose that shortages only invoke non-linear penalties models are useful is when one wants to understand the on average one year in twenty. Then in a sixty-year impact on system performance estimates of the parameter simulation there is a 19% probability that the penalty will uncertainty stemming from short historical records. In be invoked at most once, and an 18% probability it will that case, parameter uncertainty can be incorporated into be invoked five or more times. Thus the calculation of the streamflow generating models so that the generated annual average value of the penalty would be highly unre- sequences reflect both the variability that one would liable unless some smoothing of the input distributions is expect in flows over time as well as the uncertainty of allowed, associated with a long simulation analysis. the parameter values of the models that describe that On the other hand, if one is only interested in the variability (Valdes et al., 1977; Stedinger and Taylor, mean flow, or average benefits that are mostly a linear 1982a,b; Stedinger et al., 1985; Vogel and Stedinger, function of flows, then use of stochastic sequences will 1988). probably add little information to what is obtained simply If one decides to use a stochastic data generator, the by simulating the historical record. After all, the fitted challenge is to use a model that appropriately describes models are ultimately based on the information provided the important relationships, but does not attempt to in the historical record, and their use does not produce reproduce more relationships than are justified or can be new information about the hydrology of the basin. estimated with available data sets. If in a general sense one has available n years of record, Two basic techniques are used for streamflow genera- the statistics of that record can be used to build a stochastic tion. If the streamflow population can be described by a model for generating thousands of years of flow. These stationary stochastic process (a process whose parameters synthetic data can then be used to estimate more exactly do not change over time), and if a long historical stream- the system performance, assuming, of course, that the flow record exists, then a stationary stochastic streamflow flow-generating model accurately represents nature. But model may be fit to the historical flows. This statistical the initial uncertainty in the model parameters resulting model can then generate synthetic sequences that from having only n years of record would still remain describe selected characteristics of the historical flows. (Schaake and Vicens, 1980). Several such models are discussed below. An alternative is to run the historical record (if it is The assumption of stationarity is not always plausible, sufficiently complete at every site and contains no gaps of particularly in river basins that have experienced marked missing data) through the simulation model to generate n changes in runoff characteristics due to changes in land years of output. That output series can be processed to cover, land use, climate or the use of groundwater during produce estimates of system performance. So the question the period of flow record. Similarly, if the physical is the following: Is it better to generate multiple input characteristics of a basin change substantially in the series based on uncertain parameter values and use those future, the historical streamflow record may not provide to determine average system performance with great reliable estimates of the distribution of future unregulated wrm_ch07.qxd 8/31/2005 12:18 PM Page 205

Concepts in Probability, Statistics and Stochastic Modelling 205

flows. In the absence of the stationarity of streamflows or specifying that the parameters of the fitted model should a representative historical record, an alternative scheme is be determined using the observed sample moments, or to assume that precipitation is a stationary stochastic their unbiased counterparts. Other parameter estimation process and to route either historical or synthetic precip- techniques, such as maximum likelihood estimators, are itation sequences through an appropriate rainfall–runoff often more efficient. model of the river basin. Definition of resemblance in terms of moments can also lead to confusion over whether the population 8.2. Streamflow Generation Models parameters should equal the sample moments, or whether the fitted model should generate flow sequences whose The first step in the construction of a statistical stream- sample moments equal the historical values. The two flow generating model is to extract from the historical concepts are different because of the biases (as discussed streamflow record the fundamental information about the in Section 7) in many of the estimators of variances and joint distribution of flows at different sites and at different correlations (Matalas and Wallis, 1976; Stedinger, 1980, times. A streamflow model should ideally capture what 1981; Stedinger and Taylor, 1982a). is judged to be the fundamental characteristics of the For any particular river basin study, one must joint distribution of the flows. The specification of what determine what streamflow characteristics need to be characteristics are fundamental is of primary importance. modelled. The decision should depend on what character- One may want to model as closely as possible the istics are important to the operation of the system being true marginal distribution of seasonal flows and/or the studied, the available data, and how much time can be marginal distribution of annual flows. These describe spared to build and test a stochastic model. If time permits, both how much water may be available at different times it is good practice to see if the simulation results are in fact and also how variable is that water supply. Also, model- sensitive to the generation model and its parameter values ling the joint distribution of flows at a single site in by using an alternative model and set of parameter values. different months, seasons and years may be appropriate. If the model’s results are sensitive to changes, then, as The persistence of high flows and of low flows, often always, one must exercise judgement in selecting the described by their correlation, affects the reliability with appropriate model and parameter values to use. which a reservoir of a given size can provide a given This section presents a range of statistical models for yield (Fiering, 1967; Lettenmaier and Burges, 1977a, the generation of synthetic data. The necessary sophisti- 1977b; Thyer and Kuczera, 2000). For multi-component cation of a data-generating model depends on the reservoir systems, reproduction of the joint distribution of intended use of the generated data. Section 8.3 below flows at different sites and at different times will also be presents the simple autoregressive Markov model for important. generating annual flow sequences. This model alone is Sometimes, a streamflow model is said to resemble too simple for many practical studies, but is useful for statistically the historical flows if it produces flows with illustrating the fundamentals of the more complex models the same mean, variance, skew coefficient, autocorrela- that follow. It seems, therefore, worth some time explor- tions and/or cross-correlations as were observed in the ing the properties of this basic model. historical series. This definition of statistical resemblance Subsequent sections discuss how flows with any is attractive because it is operational and only requires an marginal distribution can be produced, and present analyst to only find a model that can reproduce the models for generating sequences of flows that can observed statistics. The drawback of this approach is that reproduce the persistence of historical flow sequences. it shifts the modelling emphasis away from trying to find Other parts of this section present models for generating a good model of marginal distributions of the observed concurrent flows at several sites and for generating flows and their joint distribution over time and over seasonal or monthly flows that preserve the characteristics space, given the available data, to just reproducing of annual flows. More detailed discussions for those wish- arbitrarily selected statistics. Defining statistical resem- ing to study synthetic streamflow models in greater depth blance in terms of moments may also be faulted for can be found in Marco et al. (1989) and Salas (1993). wrm_ch07.qxd 8/31/2005 12:18 PM Page 206

206 Water Resources Systems Planning and Management

8.3. A Simple Autoregressive Model

Q y + 1 E [Q y + |Q y =qy ] A simple model of annual streamflows is the autoregres- 1

sive Markov model. The historical annual flows qy are thought of as particular values of a stationary stochastic µ+ρ q − µ process Qy. The generation of annual streamflows and [ ( y )] other variables would be a simple matter if annual flows σ 1 - ρ 2 were independently distributed. In general, this is not the case and a generating model for many phenomena should capture the relationship between values in different years or in other time periods. A common and reasonable 1 ρ

y+ assumption is that annual flows are the result of a first- 1 order Markov process (as discussed in Section 7.2).

Assume for now that annual streamflows are normally flow in year qy distributed. In some areas the distribution of annual flows Q y is in fact nearly normal. Streamflow models that produce flow in year y non-normal streamflows are discussed as an extension of

E041011a this simple model. The joint normal density function of two streamflows Figure 7.13. Conditional distribution of Qy1 given Qy qy for µ σ 2 two normal random variables. Qy and Qw in years y and w having mean , variance , and year-to-year correlation ρ between flows is ρ 1 the larger the absolute value of the correlation between fq,q()yw 2205. 21πσρ() the flows, the smaller the conditional variance of Qy1,  µρµ22 µ µ which in this case does not depend at all on the value qy. ()qqqqyyww2 ()() () exp   Synthetic normally distributed streamflows that have 2σ 222()1 ρ   mean µ, variance σ 2, and year-to-year correlation ρ, are (7.139) produced by the model The joint normal distribution for two random variables µρ µ σ ρ2 with the same mean and variance depend only on their QQyyy+1 ()V 1 (7.141) common mean µ, variance σ 2, and the correlation ρ between the two (or equivalently the covariance ρσ 2). where Vy is a standard normal random variable, meaning The sequential generation of synthetic streamflows that it has zero mean, E[Vy] 0, and unit variance,  2  requires the conditional distribution of the flow in EVy  1. The random variable Vy is added here to one year given the value of the flows in previous years. provide the variability in Qy1 that remains even after Qy However, if the streamflows are a first-order (lag 1) is known. By construction, each Vy is independent of past Markov process, then the distribution of the flow in year flows Qw where w y, and Vy is independent of Vw for y 1 depends entirely on the value of the flow in year y. w y. These restrictions imply that In addition, if the annual streamflows have a multivariate E[VwVy] 0 for w y (7.142) normal distribution, then the conditional distribution of and Qy1 is normal with mean and variance µ | µ ρ µ E[(Qw )Vy] 0 for w y (7.143) E[Qy1 Qy qy] (qy ) (7.140) | σ 2 ρ2 Clearly, Qy1 will be normally distributed if both Qy and Var(Qy1 Qy qy) (1 ) Vy are normally distributed because sums of independent where qy is the value of the random variable Qy in year y. normally distributed random variables are normally This relationship is illustrated in Figure 7.13. Notice that distributed. wrm_ch07.qxd 8/31/2005 12:18 PM Page 207

Concepts in Probability, Statistics and Stochastic Modelling 207

It is a straightforward procedure to show that this Because Vy is independent of Qy (Equation 7.143), the basic model indeed produces streamflows with the speci- second term on the right-hand side of Equation 7.148 fied moments, as demonstrated below. vanishes. Hence the unconditional variance of Q satisfies Using the fact that E[Vy] 0, the conditional mean of µ 2 ρ2 µ 2 σ 2 ρ2 E[(Qy1 ) ] E[(Qy ) ] (1 ) Q given that Q equals q is y 1 y y (7.149) µρ µ σ ρ2 E[Qyy+1 | qqV]E[ ( y ) y1 ] Assuming that Qy1 and Qy have the same variance yields µρ µ ()qy (7.144) E[(Q µ)2] σ 2 (7.150) 2 so that the unconditional variance is σ 2, as required. Since E[Vy] Var[Vy] 1, the conditional variance of Again, if one does not want to assume that Q and Q Qy1 is y 1 y have the same variance, a recursive argument can be 2 σ 2 Var[|]QQQyy111qqq E[{ y E[|]}|] yyy adopted to demonstrate that if Q1 has variance , then σ 2 µρ µ Qy for y 1 has variance . E[{ (qy ) The covariance of consecutive flows is another σρµρµ2 2 Vqyy1 [ ( )]} important issue. After all, the whole idea of building these σρσ2 22 ρ2 time-series models is to describe the year-to-year correla- E[Vy 1 ] (1 ) tion of the flows. Using Equation 7.141 one can show that (7.145) the covariance of consecutive flows must be Thus, this model produces flows with the correct condi- µµρ µ tional mean and variance. E[(QQyy1 )( )] E{[ ( Q y ) σρ− 2 µ To compute the unconditional mean of Qy1 one first Vyy1 ](Q )} takes the expectation of both sides of Equation 7.141 to ρ µρσ22 E[[(Qy ) ] obtain (7.151)

µρ µ σ ρ2 µ E[QQyyy1 ] (E[ ] ) E[V ] 1 where E[(Qy )Vy] 0 because Vy and Qy are inde- (7.146) pendent (Equation 7.143). Over a longer time scale, another property of this model where E[Vy] 0. If the distribution of streamflows is is that the covariance of flows in year y and y k is independent of time so that for all y, E[Qy1] E[Qy] k 2 E[(Q µ)(Q µ)] ρ σ (7.152) E[Q], it is clear that (1 ρ) E[Q] (1 ρ) µ or y k y This equality can be proven by induction. It has already µ E[Q] (7.147) been shown for k 0 and 1. If it is true for k j 1, then µ Alternatively, if Qy for y 1 has mean , then Equation µµρ µ µ E[(QQyj )( y )] E{[ ( Q y j 1 ) 7.146 indicates that Q2 will have mean . Thus repeated σρ2 µ application of the Equation 7.146 would demonstrate Vy j 1 1 ](Q y )} µ ρµ µ that all Qy for y 1 have mean . E[(QQyy )](j 1 )] The unconditional variance of the annual flows can j1 22j ρρ[] σ ρσ (7.153) be derived by squaring both sides of Equation 7.141 to µ obtain where E[(Qy )Vyj1] 0 for j 1. Hence Equation 7.152 is true for any value of k. µρµσρ2 22 E[(QQyyy1 ) ] E[{ ( ) V 1 }] It is important to note that the results in Equations ρµ2 2 ρσ ρ2 7.144 to 7.152 do not depend on the assumption that the E[(Qy ) ] 2 1 random variables Qy and Vy are normally distributed. µσρ 222 E[(Q yy )VV ] (1 )E[] yThese relationships apply to all autoregressive Markov (7.148) processes of the form in Equation 7.141 regardless of the wrm_ch07.qxd 8/31/2005 12:18 PM Page 208

208 Water Resources Systems Planning and Management

distributions of Qy and Vy. However, if the flow Qy in year The alternative and generally preferred method is to y 1 is normally distributed with mean µ and variance generate normal random variables and then transform σ 2 , and if the Vy are independent normally distributed these variates to streamflows with the desired marginal random variables with mean zero and unit variance, then distribution. Common choices for the distribution of the generated Qy for y 1 will also be normally distrib- streamflows are the two-parameter and three-parameter µ σ 2 uted with mean and variance . The next section lognormal distributions or a gamma distribution. If Qy is considers how this and other models can be used to a lognormally distributed random variable, then generate streamflows that have other than a normal τ Qy exp(Xy) (7.156) distribution. where Xy is a normal random variable. When the lower τ bound is zero, Qy has a two-parameter lognormal 8.4. Reproducing the Marginal Distribution distribution. Equation 7.156 transforms the normal variates X into lognormally distributed streamflows. The Most models for generating stochastic processes deal y transformation is easily inverted to obtain directly with normally distributed random variables. τ τ Unfortunately, flows are not always adequately described Xy ln(Qy ) for Qy (7.157) by the normal distribution. In fact, streamflows and many where Q must be greater than its lower bound τ. other hydrological data cannot really be normally distrib- y The mean, variance, skewness of X and Q are related uted because of the impossibility of negative values. In y y by the formulas (Matalas, 1967) general, distributions of hydrological data are positively skewed, having a lower bound near zero and, for practi-   µτ µ 1 σ2 Q exp  cal purposes, an unbounded right-hand tail. Thus they  X 2 X  look like the gamma or lognormal distribution illustrated σµσσ222 exp()() 2[] exp 1 in Figures 7.3 and 7.4. Q X XX exp()33σσ22 exp () 2 The asymmetry of a distribution is often measured by γ XX Q σ 232 / its coefficient of skewness. In some streamflow models, []exp()X 1 the skew of the random elements V is adjusted so that φφ3 φ σ 2 1/2 y 3whereexp[ ( X) 1] (7.158) the models generate flows with the desired mean, y u variance and skew coefficient. For the autoregressive If normal variates Xs and Xy are used to generate lognor- y u Markov model for annual flows mally distributed streamflows Qs and Qy at sites s and u, ρ then the lag-k correlation of the Qy’s, denoted Q(k; s, u), µρµσρ3 − 2 3 E[(QQyy1 ) ] E[] ( ) Vy 1 is determined by the lag-k correlation of the X variables, ρµ3 3 denoted ρ (k; s, u), and their variances σ 2(s) and σ 2(u), E[(Qy ) ] X x x σρ3 2 3/2 3 where (1 ) E[]Vy (7.154) exp[ρσσ (ksu ; , ) ( s ) ( u )]1 so that ρ (;,ksu ) XXX Q σ 212 / σ 212 / {[expX (s ) ]}{1 exp[[]}X ()u 1 µ 3 ρ23/ 2 E[(Q ) ] (1 ) 3 γ E[]Vy (7.159) Q σ 3 ρ3 (7.155) 1 s The correlations of the Xy can be adjusted, at least in 3 By appropriate choice of the skew of Vy, E[Vy], the theory, to produce the observed correlations among the s desired skew coefficient of the annual flows can be Qy variates. However, more efficient estimates of the true s produced. This method has often been used to generate correlation of the Qy values are generally obtained by s flows that have approximately a gamma distribution by transforming the historical flows qy into their normal s s τ using Vy’s with a gamma distribution and the required equivalentxqy ln(y ) and using the historical s ρ skew. The resulting approximation is not always adequate correlations of these Xy values as estimators of X(k; s, u) (Lettenmaier and Burges, 1977a). (Stedinger, 1981). wrm_ch07.qxd 8/31/2005 12:18 PM Page 209

Concepts in Probability, Statistics and Stochastic Modelling 209

precipitation and temperature include such tendencies by 2.5 parameters employing a Markov chain description of the occurrence µ µ Q = 1.00 X = 0.030 σ σ of wet and dry days (Wilks, 1998). Q = 0.25 X = 0.246 ρρ Q = 0.50 X = 0.508 2.0 90% probability 8.5. Multivariate Models range for Qy+1 If long concurrent streamflow records can be constructed 1.5

1 at the several sites at which synthetic streamflows are

y + desired, then ideally a general multi-site streamflow model

Q 1.0 E could be employed. O’Connell (1977), Ledolter (1978), 1 [Q y + 1 |q y ] Salas et al. (1980) and Salas (1993) discuss multivariate

y + models and parameter estimation. Unfortunately, identifi- 0.5 cation of the most appropriate model structure is very

flow in year difficult for general multivariate models.

0.0 This section illustrates how the basic univariate 0.0 0.5 1.0 1.5 2.0 2.5 annual-flow model in Section 8.3 can be generalized to the y flow in year Q y multivariate case. This exercise reveals how multivariate

E041011b models can be constructed to reproduce specified variances Figure 7.14. Conditional mean of Qy1 given Qy qy and 90% and covariances of the flow vectors of interest, or some probability range for the value of Qy1. transformation of those values. This multi-site generaliza- tion of the annual AR(1) or autoregressive Markov model follows the approach taken by Matalas and Wallis (1976). Some insight into the effect of this logarithmic This general approach can be further extended to generate transformation can be gained by considering the resulting multi-site/multi-season modelling procedures, as is done in model for annual flows at a single site. If the normal Section 8.6, employing what have been called disaggrega- variates follow the simple autoregressive Markov model tion models. However, while the size of the model matrices

2 and vectors increases, the model is fundamentally the same XXV µρ ( µ) σ1 ρ y 1 X yyX X (7.160) from a mathematical viewpoint. Hence this section starts with the simpler case of an annual flow model. then the corresponding Q follow the model (Matalas, y For simplicity of presentation and clarity, vector notation 1967) is employed. Let Z ΘZ1,…, ZnΙT be the column vector of τ µ ρ τ ρx y y y Qy1 Dy{exp[ x(1 x)]}(Qy ) (7.161) transformed zero-mean annual flows at sites s 1, 2, … , n, where so that ρσ2 1/2 s DVy exp[(1X )X y ] (7.162) E[Z y ] 0 (7.163) Θ 1 nΙT The conditional mean and standard deviation of Qy1 In addition, let Vy Vy ,…,Vy be a column vector τ ρx s given that Qy qy now depends on (qy ) . Because of standard-normal random variables, where Vy is r the conditional mean of Qy1 is no longer a linear function independent of Vw for (r, w) (s, y) and independent r of qy, (as shown in Figure 7.14), the streamflows are said of past flows Zw where y w. The assumption that to exhibit differential persistence: low flows are now more the variables have zero mean implicitly suggests that the likely to follow low flows than high flows are to follow mean value has already been subtracted from all the high flows. This is a property often attributed to real variables. This makes the notation simpler and eliminates streamflow distributions. Models can be constructed the need to include a constant term in the models. With to capture the relative persistence of wet and dry all the random variables having zero mean, one can focus periods (Matalas and Wallis, 1976; Salas, 1993; Thyer on reproducing the variances and covariances of the and Kuczera, 2000). Many weather generators for vectors included in a model. wrm_ch07.qxd 8/31/2005 12:18 PM Page 210

210 Water Resources Systems Planning and Management

A sequence of synthetic flows can be generated by the scalar case where E[aZy b] aE[Zy] b for fixed model constants a and b. Substituting S and S for the appropriate expectations Z AZ BV (7.164) 0 1 y 1 y y in Equation 7.169 yields where A and B are (n × n) matrices whose elements are S AS or A S S–1 (7.171) chosen to reproduce the lag 0 and lag 1 cross-covariances 1 0 1 0 of the flows at each site. The lag 0 and lag 1 covariances A relationship to determine the matrix B is obtained by and cross-covariances can most economically be manipu- multiplying both sides of Equation 7.164 by its own

lated by use of the two matrices S0 and S1. The lag-zero transpose (this is equivalent to squaring both sides of covariance matrix, denoted S0, is defined as the scalar equation a b) and taking expectations to obtain T S0 E[]ZZy y (7.165) T T T T T E[Zy1 Zy1] E[AZyZyA ] E[AZyVyB ] T T T and has elements E[BVyZyA ] E[BVyVyB ] (7.172) i j S0 (ij , ) E[ZZyy] (7.166) The second and third terms on the right-hand side of Equation 7.172 vanish because the components of Zy and T The lag-one covariance matrix, denoted S1, is defined as Vy are independent and have zero mean. E[VyVy] equals the identity matrix because the components of Vy are S E[Zy1 ZT ] 1 y (7.167) independently distributed with unit variance. Thus T T and has elements S0 AS0A BB (7.173) i j Solving for the B matrix, one finds that it should satisfy S(1 i, j ) E[ZZy1 y] (7.168) the quadratic equation T T –1 T The covariances do not depend on y because the stream- BB S0 AS0A S0 S1 S0 S1 (7.174) flows are assumed to be stationary. The last equation results from substitution of the rela- Matrix S contains the lag 1 covariances and lag 1 cross- 1 tionship for A given in Equation 7.171 and the fact that covariances. S is symmetric because the cross-covariance 0 S is symmetric; hence, S –1 is symmetric. S (i, j) equals S (j, i). In general, S is not symmetric. 0 0 0 0 1 It should not be too surprising that the elements of The variance–covariance equations that define the B are not uniquely determined by Equation 7.174. The values of A and B in terms of S and S are obtained by 0 1 components of the random vector V may be combined in manipulations of Equation 7.164. Multiplying both sides y many ways to produce the desired covariances as long as of that equation by ZT and taking expectations yields y B satisfies Equation 7.174. A lower triangular matrix that TTT satisfies Equation 7.174 can be calculated by Cholesky E[ZZy1 y]]] E[AB ZZ yy E[ VZ yy (7.169) decomposition (Young, 1968; Press et al., 1986). The second term on the right-hand side vanishes because Matalas and Wallis (1976) call Equation 7.164 the lag-

the components of Zy and Vy are independent. Now the 1 model. They do not call it the Markov model because T first term in Equation 7.169, E[AZy Zy], is a matrix whose the streamflows at individual sites do not have the (i, j)th element equals covariances of an autoregressive Markov process given in Equation 7.152. They suggest an alternative model  n  n k j k j for what they call the Markov model. It has the same EE∑∑aikZZy y  aik[] ZZ y y (7.170) k11 k structure as the lag-1 model except it does not preserve the lag-1 cross-covariances. By relaxing this requirement, The matrix with these elements is the same as the matrix they obtain a simpler model with fewer parameters ΄ T΅ AE Zy Zy . that generates flows that have covariances of an autore- Hence, A – the matrix of constants – can be pulled gressive Markov process at each site. In their Markov through the expectation operator just as is done in the model, the new A matrix is simply a diagonal matrix, wrm_ch07.qxd 8/31/2005 12:18 PM Page 211

Concepts in Probability, Statistics and Stochastic Modelling 211

whose diagonal elements are the lag-1 correlations of generation of synthetic flows that reproduce statistics flows at each site: both at the annual and at the seasonal level. Subsequent A diag[ρ(1; i, i)] (7.175) improvements and variations are described by Stedinger and Vogel (1984), Maheepala and Perera (1996), ρ where (1; i, i) is the lag-one correlation of flows at site i. Koutsoyiannis and Manetas (1996) and Tarboton et al. The corresponding B matrix depends on the new A (1998). matrix and S0, where as before Disaggregation models can be used for either multi- T T BB S0 AS0A (7.176) season single-site or multi-site streamflow generation. They represent a very flexible modelling framework for The idea of fitting time-series models to each site dealing with different time or spatial scales. Annual flows separately and then correlating the innovations in those for the several sites in question or the aggregate total separate models to reproduce the cross-correlation annual flow at several sites can be the input to the model between the series is a very general and useful modelling (Grygier and Stedinger, 1988). These must be generated idea that has seen a number of applications with different by another model, such as those discussed in the previous time-series models (Matalas and Wallis, 1976; Stedinger sections. These annual flows or aggregated annual flows et al., 1985; Camacho et al., 1985; Salas, 1993). are then disaggregated to seasonal values. Θ 1 NΙT Let Zy Zy,… , Zy be the column vector of N trans- 8.6. Multi-Season, Multi-Site Models formed normally distributed annual or aggregate annual Θ 1 In most studies of surface water systems it is necessary to flows for N separate sites or basins. Next, let Xy X1y,…, 1 2 2 n n ΙT consider the variations of flows within each year. XTy, X1y,…, XTy,…, X1y,…, XTy be the column vector of s Streamflows in most areas have within-year variations, nT transformed normally distributed seasonal flows Xty for exhibiting wet and dry periods. Similarly, water demands season t, year y, and site s 1, … , n. s for irrigation, municipal and industrial uses also vary, and Assuming that the annual and seasonal series, Zy and s the variations in demand are generally out of phase with Xty, have zero mean (after the appropriate transforma- the variation in within-year flows; more water is usually tion), the basic disaggregation model is desired when streamflows are low, and less is desired Xy AZy BVy (7.177) when flows are high. This increases the stress on water delivery systems and makes it all the more important that where Vy is a vector of nT independent standard normal time-series models of streamflows, precipitation and random variables, and A and B are, respectively, nT × N other hydrological variables correctly reproduce the sea- and nT × nT matrices. One selects values of the elements sonality of hydrological processes. of A and B to reproduce the observed correlations among This section discusses two approaches to generating the elements of Xy and between the elements of Xy and Zy. within-year flows. The first approach is based on the disag- Alternatively, one could attempt to reproduce the gregation of annual flows produced by an annual flow observed correlations of the untransformed flows as generator to seasonal flows. Thus the method allows for opposed to the transformed flows, although this is not reproduction of both the annual and seasonal characteris- always possible (Hoshi et al., 1978) and often produces tics of streamflow series. The second approach generates poorer estimates of the actual correlations of the flows seasonal flows in a sequential manner, as was done for the (Stedinger, 1981). generation of annual flows. Thus the models are a direct The values of A and B are determined using the T T T generalization of the annual flow models already discussed. matrices Szz E[ZyZy ], Sxx E[XyXy ], Sxz E[XyZy ], and T Szx E[ZyXy ] where Szz was called S0 earlier. Clearly, T 8.6.1. Disaggregation Models Sxz Szx. If Sxz is to be reproduced, then by multiplying T Equation 7.177 on the right by Zy and taking expectations, The disaggregation model proposed by Valencia and one sees that A must satisfy Schaake (1973) and extended by Mejia and Rousselle T T (1976) and Tao and Delleur (1976) allows for the E[XZyy]] E[A ZZ yy (7.178) wrm_ch07.qxd 8/31/2005 12:18 PM Page 212

212 Water Resources Systems Planning and Management

or The disaggregation model has substantial data require- ments. When the dimension of Z is n and the dimension Sxz ASzz (7.179) y of the generated vector Xy is m, the A matrix has mn Solving for the coefficient matrix A one obtains elements. The lower diagonal B matrix and the symmetric 1 A SSxz zz (7.180) Sxx matrix, upon which it depends, each have m(m 1)/2 nonzero or non-redundant elements. For example, when To obtain an equation that determines the required values disaggregating two aggregate annual flow series to in the matrix B, one can multiply both sides of Equation monthly flows at five sites, n 2 and m 12 5 60; 7.177 by their transpose and take expectations to obtain thus, A has 120 elements while B and Sxx each have 1,830 T T Sxx ASzzA BB (7.181) nonzero or non-redundant parameters. As the number of sites included in the disaggregation increases, the size Thus, to reproduce the covariance matrix Sxx, the B matrix must satisfy of Sxx and B increases rapidly. Very quickly the model can become overly parameterized, and there will be BBT S AS AT (7.182) xx zz insufficient data to estimate all parameters (Grygier and Equations 7.180 and 7.182 for determining A and B are Stedinger, 1988). completely analogous to Equations 7.171 and 7.174 for the In particular, one can think of Equation 7.177 as A and B matrices of the lag 1 models developed earlier. a series of linear models generating each monthly flow T k However, for the disaggregation model as formulated, BB , Xty for k 1, t 1, … , 12; k 2, t 1, … , 12 up to and hence the matrix B, can actually be singular or nearly k n, t 1, … , 12 that reproduces the correlation of k k so (Valencia and Schaake, 1973). This occurs because the each Xty with all n annual flows, Zy, and all previously real seasonal flows sum to the observed annual flows. Thus generated monthly flows. Then when one gets to the last given the annual flow at a site and all but one (T 1) of the flow in the last month, the model will be attempting to seasonal flows, the value of the unspecified seasonal flow reproduce n (12n 1) 13n 1 annual to monthly can be determined by subtraction. and cross-correlations. Because the model implicitly s If the seasonal variables Xty correspond to non-linear includes a constant, this means one needs k* 13n s T transformations of the actual flows Qty, then BB is gener- years of data to obtain a unique solution for this ally sufficiently non-singular that a B matrix can be critical equation. For n 3, k* 39. One could say obtained by Cholesky decomposition. On the other hand, that with a record length of forty years, there would s when the model is used to generate values of Xty to be be only one degree of freedom left in the residual s transformed into synthetic flows Qty, the constraint that model error variance described by B. That would be these seasonal flows should sum to the given value of the unsatisfactory. annual flow is lost. Thus the generated annual flows (equal When flows at many sites or in many seasons are to the sums of the seasonal flows) will deviate from the required, the size of the disaggregation model can be values that were to have been the annual flows. Some reduced by disaggregation of the flows in stages. Such distortion of the specified distribution of the annual flows condensed models do not explicitly reproduce every results. This small distortion can be ignored, or each year’s season-to-season correlation (Lane, 1979; Stedinger and seasonal flows can be scaled so that their sum equals the Vogel, 1984; Gryier and Stedinger, 1988; Koutsoyiannis specified value of the annual flow (Grygier and Stedinger, and Manetas, 1996), Nor do they attempt to reproduce 1988). The latter approach eliminates the distortion in the the cross-correlations among all the flow variates at the distribution of the generated annual flows by distorting the same site within a year (Lane, 1979; Stedinger et al., distribution of the generated seasonal flows. Koutsoyiannis 1985). Contemporaneous models, like the Markov and Manetas (1996) improve upon the simple scaling model developed earlier in Section 8.5, are models algorithm by including a step that rejects candidate vectors developed for individual sites whose innovation vectors

Xy if the required adjustment is too large, and instead gen- Vy have the needed cross-correlations to reproduce the erates another vector Xy. This reduces the distortion in the cross-correlations of the concurrent flows (Camacho monthly flows that results from the adjustment step. et al., 1985), as was done in Equation 7.176. Grygier and wrm_ch07.qxd 8/31/2005 12:18 PM Page 213

Concepts in Probability, Statistics and Stochastic Modelling 213

Stedinger (1991) describe how this can be done for a For an aggregation approach to be attractive, it is condensed disaggregation model without generating necessary to use a model with greater persisitence inconsistencies. than the Thomas–Fiering model. A general class of time-series models that allow reproduction of different correlation structures are the Box–Jenkins Autoregressive- 8.6.2. Aggregation Models Moving average models (Box et al., 1994). These models One can start with annual or seasonal flows, and break are presented by the notation ARMA(p,q) for a model them down into flows in shorter periods representing which depends on p previous flows, and q extra

months or weeks. Alternatively one can start with a model innovations Vty. For example, Equation 7.141 would be that describes the shortest time step flows. This latter called an AR(1) or AR(1,0) model. A simple ARMA(1,1) approach has been referred to as aggregation to distin- model is guish it from disaggregation. ZZVV φθ⋅⋅+ One method for generating multi-season flow tttt11 11 (7.184) sequences is to convert the time series of seasonal flows The correlations of this model have the values Qty into a homogeneous sequence of normally distributed ρ (1 θ φ )(φ θ )/(1 θ 2 2φ θ ) (7.185) zero-mean unit-variance random variables Zty. These can 1 1 1 1 1 1 1 1 then be modelled by an extension of the annual flow gen- for the first lag. For i 1 erators that have already been discussed. This transfor- ρφρ i−1 mation can be accomplished by fitting a reasonable i 1 (7.186) marginal distribution to the flows in each season so as to For φ values near and 0 θ φ , the autocorrelations be able to convert the observed flows qs into their trans- 1 1 ty ρ can decay much slower than those of the standard formed counterparts Zs , and vice versa. Particularly when k ty AR(1) model. shorter streamflow records are available, these simple The correlation function ρ of general ARMA(p,q) approaches may yield a reasonable model of some streams k model, for some studies. However, they implicitly assume that the standardized series is stationary, in the sense that the p q ∑∑φθ⋅⋅ season-to-season correlations of the flows do not depend ZZVVtitit111jt1 j (7.187) i1 j1 on the seasons in question. This assumption seems highly questionable. is a complex function that must satisfy a number of This theoretical difficulty with the standardized series conditions to ensure the resultant model is stationary and can be overcome by introducing a separate streamflow invertible (Box et al., 1994). model for each month. For example, the classic ARMA(p,q) models have been extended to describe Thomas–Fiering model (Thomas and Fiering, 1970) of seasonal flow series by having their coefficients depend monthly flows may be written upon the season – these are called periodic autoregressive- moving average models, or PARMA. Salas and Obeysekera ββ2 ZZty1, tty1 tty V (7.183) (1992), Salas and Fernandez (1993), and Claps et al., (1993) discuss the conceptual basis of such stochastic

where the Zty’s are standard normal random variables streamflow models. For example, Salas and Obeysekera β corresponding to the streamflow in season t of year y, t (1992) found that low-order PARMA models, such as is the season-to-season correlation of the standardized a PARMA(2,1), arise from reasonable conceptual repre-

flows, and Vty are independent standard normal random sentations of persistence in rainfall, runoff and ground- variables. The problem with this model is that it often fails water recharge and release. Claps et al. (1993) observe to reproduce the correlation among non-consecutive that the PARMA(2,2) model, which may be needed if months during a year and thus misrepresents the risk one wants to preserve year-to-year correlation, poses a of multi-month and multi-year droughts (Hoshi et al., parameter estimation challenge that is almost unmanage- 1978). able (see also Rasmussen et al., 1996). The PARMA (1,1) wrm_ch07.qxd 8/31/2005 12:18 PM Page 214

214 Water Resources Systems Planning and Management

model is more practical and easy to extend to the multi- number of runs to be made must also be determined. variate case (Hirsch, 1979; Stedinger et al., 1985; Salas, These considerations are discussed in more detail by 1993; Rasmussen et al., 1996). Experience has shown Fishman (2001) and in other books on simulation. The that PARMA(1,1) models do a better job of reproducing use of stochastic simulation and the analysis of the output the correlation of seasonal flows beyond lag 1 than does a of such models are introduced here primarily in the Thomas–Fiering PAR(1,0) model (see for example, context of an example to illustrate what goes into a Bartolini and Salas, 1993). stochastic simulation model and how one can deal with the information that is generated.

9. Stochastic Simulation 9.1. Generating Random Variables

This section introduces stochastic simulation. Much more Included in any stochastic simulation model is some detail on simulation is contained in later chapters. As provision for the generation of sequences of random discussed in Chapter 3, simulation is a flexible and widely numbers that represent particular values of events such as used tool for the analysis of complex water resources rainfall, streamflows or floods. To generate a sequence of systems. Simulation is trial and error. One must define the values for a random variable, the probability distribution system being simulated, both its design and operating for the random variable must be specified. Historical data policy, and then simulate it to see how it works. If the and an understanding of the physical processes are used purpose is to find the best design and operating policy, to select appropriate distributions and to estimate their many such alternatives must be simulated and their results parameters (as discussed in Section 7.2). must be compared. When the number of alternatives to Most computers have algorithms for generating simulate becomes too large for the time and money avail- random numbers uniformly distributed (equally likely) able for such analyses, some kind of preliminary screening, between zero and one. This uniform distribution of perhaps using optimization models, may be justified. This random numbers is defined by its cdf and pdf; use of optimization for preliminary screening – that is, for F (u) 0 for u 0, eliminating alternatives prior to a more detailed simulation U – is discussed in Chapters 3, 4 and later chapters. u for 0 u 1 As with optimization models, simulation models and may be deterministic or stochastic. One of the most useful 1ifu 1 (7.188) tools in water resources systems planning is stochastic so that simulation. While optimization can be used to help define reasonable design and operating policy alternatives fU(u) 1if0 u 1 and 0 otherwise (7.189) to be simulated, simulations can better reveal how each These uniform random variables can then be transformed such alternative will perform. Stochastic simulation of com- into random variables with any desired distribution. If

plex water resources systems on digital computers provides FQ(qt) is the cumulative distribution function of a random planners with a way to define the probability distributions variable Qt in period t, then Qt can be generated using the of multiple performance indices of those systems. inverse of the distribution. When simulating any system, the modeller designs Q F –1[U ] (7.190) an experiment. Initial flow, storage and water quality t Q t conditions must be specified if these are being simulated. Here Ut is the uniform random number used to generate For example, reservoirs can start full, empty or at random Qt. This is illustrated in Figure 7.15. representative conditions. The modeller also determines Analytical expressions for the inverse of many what data are to be collected on system performance and distributions, such as the normal distribution, are operation, and how they are to be summarized. The not known, so special algorithms are employed to length of time the simulation is to be run must be efficiently generate deviates with these distributions specified and, in the case of stochastic simulations, the (Fishman, 2001). wrm_ch07.qxd 8/31/2005 12:18 PM Page 215

Concepts in Probability, Statistics and Stochastic Modelling 215

1 year water E021101n demand ( x 107 m3 /yr) ut = FQ (qt ) ut

1 3.0 2 3.2 -1 3 3.4 FQ ()ut 4 3.6 F (q ) Q t 5 3.8 6 4.0 0 7 4.1 0 qt particular value of Qt 8 4.2

E020527n 9 4.3 Figure 7.15. The probability distribution of a random variable 10 4.3 can be inverted to produce values of the random variable. 11 4.4 12 4.4 9.2. River Basin Simulation 13 4.4 An example will demonstrate the use of stochastic 14 4.4 simulation in the design and analysis of water resources 15 4.5 systems. Assume that farmers in a particular valley have 16 4.5 been plagued by frequent shortages of irrigation water. 17 4.5 They currently draw water from an unregulated river to 18 4.5 which they have water rights. A government agency has 19 4.5 proposed construction of a moderate-size dam on the 20 4.5 river upstream of the points where the farmers withdraw water. The dam would be used to increase the quantity and reliability of irrigation water available to the farmers Table 7.12. Projected water demand for irrigation water. during the summer growing season. After preliminary planning, a reservoir with an active capacity of 4 107 m3 has been proposed for a natural dam site. It is anticipated that, because of the increased reliability and availability of irrigation water, the quantity Using the techniques discussed in the previous of water desired will grow from an initial level of 3 section, a Thomas–Fiering model is used to generate 107 m3/yr after construction of the dam to 4 107 m3/yr twenty-five lognormally distributed synthetic streamflow within six years. After that, demand will grow more sequences. The statistical characteristics of the synthetic slowly to 4.5 107 m3/yr – the estimated maximum flows are those listed in Table 7.14. Use of only the reliable yield. The projected demand for summer irriga- forty-five-year historic flow sequence would not allow tion water is shown in Table 7.12. examination of the system’s performance over the large A simulation study can evaluate how the system will range of streamflow sequences, which could occur during be expected to perform over a twenty-year planning the twenty-year planning period. Jointly, the synthetic period. Table 7.13 contains statistics that describe the sequences should be a description of the range of inflows hydrology at the dam site. The estimated moments are that the system might experience. A larger number of computed from the forty-five-year historic record. sequences could be generated. wrm_ch07.qxd 8/31/2005 12:18 PM Page 216

216 Water Resources Systems Planning and Management

Table 7.13. Characteristics of the river flow.

E021101p

winter summer annual mean flow 4.0 2.5 6.5 x 10 7 m3

standard deviation 1.5 1.0 2.3 x 10 7 m3

correlation of flows: winter with following summer 0.65 summer with following winter 0.60

9.3. The Simulation Model

E021029c The simulation model is composed primarily of continu- K K = - 1 ity constraints and the proposed operating policy. y+ Q 2y S 1, + The volume of water stored in the reservoir at the S 2y = R 2y beginning of seasons 1 (winter) and 2 (summer) in year reservoir release reservoir fills y are denoted by S1y and S2y. The reservoir’s winter = O 1 y+ 2y , Q operating policy is to store as much of the winter’s inflow S 1 + S 2y 1 Q1y as possible. The winter release R1y is determined by reservoir empties = the rule R 2y 1

K reservoir capacity water available S + Q SKSRKQQif 2y 2y  11yy 11 yymin R1y RKSRminif 11yyQ min 0  Figure 7.16. Summer reservoir operating policy. The shaded S11yyQ y otherwise (7.191) area denotes the feasible region of reservoir releases. 7 3 where K is the reservoir capacity of 4 10 m and Rmin is 0.50 107 m3, the minimum release to be made if possible. The volume of water in storage at the beginning 9.4. Simulation of the Basin of the year’s summer season is The question to be addressed by this simulation study S2y S1y Q1y R1y (7.192) is how well the reservoir will meet the farmers’ water The summer release policy is to meet each year’s requirements. Three steps are involved in answering this projected demand or target release Dy, if possible, so question. First, one must define the performance criteria that or indices to be used to describe the system’s perform- ance. The appropriate indices will, of course, depend on R2y S2y Q2y K if S2y Q2y Dy K the problem at hand and the specific concerns of the users Dy if 0 S2y Q2y Dy K and managers of a water resources system. In this exam- ple of a reservoir-irrigation system, several indices will be S2y Q2y otherwise (7.193) used relating to the reliability with which target releases This operating policy is illustrated in Figure 7.16. are met and the severity of any shortages. The volume of water in storage at the beginning of the The second step is to simulate the proposed system to next winter season is evaluate the specified indices. For our reservoir-irrigation S1,y1 S2y Q2y R2y (7.194) system, the reservoir’s operation was simulated twenty-five wrm_ch07.qxd 8/31/2005 12:18 PM Page 217

Concepts in Probability, Statistics and Stochastic Modelling 217

times using the twenty-five synthetic streamflow sequences, the earlier years. Starting in years 14 and after, failures each twenty years in length. Each of the twenty simulated occurred more frequently because of the higher demand years consisted of first a winter and then a summer season. placed on the system. Thus one has a sense of how the At the beginning of the first winter season, the reservoir was reliability of the target releases changes over time. taken to be empty (S1y 0 for y 1) because construction would just have been completed. The target release or 9.5. Interpreting Simulation Output demand for water in each year is given in Table 7.13. The third and final step, after simulating the system, is Table 7.14 contains several summary statistics of the to interpret the resulting information so as to gain an twenty-five simulations. Column 2 of the table contains understanding of how the system might perform both the average failure frequency in each simulation, which with the proposed design and operating policy and with equals the number of years the target release was not met modifications in either the system’s design or its operating divided by twenty, the number of years simulated. At the policy. To see how this may be done, consider the opera- bottom of column 2 and the other columns are several tion of our example reservoir-irrigation system. statistics that summarize the twenty-five values of the

The reliability py of the target release in year y is the different performance indices. The sample estimates of the probability that the target release Dy is met or exceeded in mean and variance of each index are given as one way of that year: summarizing the distribution of the observations. Another py Pr[R2y Dy] (7.195) approach is specification of the sample median, the approximate inter-quartile range x x , and/or the The system’s reliability is a function of the target release (6) (20) range x x of the observations, where x is the ith D , the hydrology of the river, the size of the reservoir and (1) (25) (i) y largest observation. Either set of statistics could be used to the operating policy of the system. In this example, the describe the centre and spread of each index’s distribution. reliability also depends on the year in question. Figure Suppose that one is interested in the distribution of the 7.17 shows the total number of failures that occurred in system’s failure frequency or, equivalently, the reliability each year of the twenty-five simulations. In three of these, with which the target can be met. Table 7.14 reports that the reservoir did not contain sufficient water after the the mean failure rate for the twenty-five simulations is initial winter season to meet the demand the first sum- 0.084, implying that the average reliability over the mer. After year 1, few failures occur in years 2 through 9 twenty-year period is 1 0.084 0.916, or 92%. The because of the low demand. Surprisingly few failures median failure rate is 0.05, implying a median reliability occur in years 10 and 13, when demand has reached its of 95%. These are both reasonable estimates of the centre peak; this is because the reservoir was normally full at the of the distribution of the failure frequency. Note that the beginning of this period as a result of lower demand in actual failure frequency ranged from 0 (seven times) to 0.30. Thus the system’s reliability ranged from 100% to as low as 70%, 75% and 80% in runs 17, 8, and 11, number of failures in each year respectively. Obviously, the farmers are interested not 5 only in knowing the mean failure frequency but also the 4 range of failure frequencies they are likely to experience. 3 If one knew the form of the distribution function 2 of the failure frequency, one could use the mean and

1 standard deviation of the observations to determine an interval within which the observations would fall with 0 24 68 10 12 14 1618 20 some pre-specified probability. For example, if the year of simulation observations are normally distributed, there is a 90% E021029m probability that the index falls within the interval µ σ Figure 7.17. Number of failures in each year of twenty-five x 1.65 x. Thus, if the simulated failure rates are twenty-year simulations. normally distributed, then there is about a 90% probability wrm_ch07.qxd 8/31/2005 12:18 PM Page 218

218 Water Resources Systems Planning and Management

Table 7.14. Results of 25 twenty-year q simulations.

frequency total average E021101 of failure to meet: shortage deficit, simulation 80% of TS AD number, i target target x10 7 m3

1 0.10 0.0 1.25 0.14 2 0.15 0.05 1.97 0.17 3 0.10 0.05 1.79 0.20 4 0.10 0.05 1.67 0.22 5 0.05 0.0 0.21 0.05 6 0.0 0.0 0.00 0.00 7 0.15 0.05 1.29 0.10 8 0.25 0.10 4.75 0.21 9 0.0 0.0 0.00 0.00 10 0.10 0.0 0.34 0.04 11 0.20 0.0 1.80 0.11 12 0.05 0.05 1.28 0.43 13 0.05 0.0 0.53 0.12 14 0.10 0.0 0.88 0.11 15 0.15 0.05 1.99 0.15 16 0.05 0.0 0.23 0.05 17 0.30 0.05 2.68 0.10 18 0.10 0.0 0.76 0.08 19 0.0 0.0 0.00 0.00 20 0.0 0.0 0.00 0.00 21 0.0 0.0 0.00 0.00 22 0.05 0.05 1.47 0.33 23 0.0 0.0 0.00 0.00 24 0.0 0.0 0.00 0.00 25 0.05 0.0 0.19 0.04 meanx 0.084 0.020 1.00 0.106 standard deviation of values; sx 0.081 0.029 1.13 0.110 median0.05 0.00 0.76 0.10 approximate interquartile range; x(6) -x(20) 0.0 - 0.15 0.0 - 0.05 0.0 - 1.79 0.0 - 0.17 range; x()1 -x(25) 0.0 - 0.30 0.0 - 0.10 0.0 - 4.75 0.0 - 0.43 wrm_ch07.qxd 8/31/2005 12:18 PM Page 219

Concepts in Probability, Statistics and Stochastic Modelling 219

that the actual failure rate observed in any simulation is the approximation to a normal distribution may be _ within the interval x 1.65s . In our case this interval sufficiently good to obtain a rough estimate of how x _ would be [0.084 1.65(0.081), 0.084 1.65(0.081)] close the average frequency of failure x is likely to be to µ α µ [ 0.050, 0.218]. Clearly, the failure rate cannot be less x. A 100(1 2 )% confidence interval for x is, approx- than zero, so this interval makes little sense in our example. imately, A more reasonable approach to describing the sx µ sx distribution of a performance index whose probability dis- xtααx xt n n tribution function is not known is to use the observations or themselves. If the observations are of a continuous random     variable, the interval x x provides a reasonable 0. 081 µ 0. 081 (i) (n 1 i) 0.084 ttαα  x 0. 084   (7.198) estimate of an interval within which the random variable  25   225  falls with probability If α 0.05, then using a normal distribution tα 1.65 ni1 i ni12 and Equation 7.118 becomes 0.057 µ 0.11. P (7.196) x n 11n n 1 Hence, based on the simulation output, one can be about 90% sure that the true mean failure frequency lies In our example, the range x(1) x(25) of the twenty-five between 5.7% and 11%. This corresponds to a reliability observations is an estimate of an interval in which a of between 89% and 94%. By performing additional continuous random variable falls with probability simulations to increase the size of n, the width of this (25 1 2)/(25 1) 92%, while x(6) x(20) corre- confidence interval can be decreased. However, this sponds to probability (25 1 2 6)/(25 1) 54%. increase in accuracy may be an illusion because the Table 7.14 reports that, for the failure frequency, uncertainty in the parameters of the streamflow model x(1) x(25) equals 0 0.30, while x(6) x(20) equals 0 0.15. has not been incorporated into the analysis. Reflection on how the failure frequencies are calculated Failure frequency or system reliability describes only reminds us that the failure frequency can only take on one dimension of the system’s performance. Table 7.14 the discrete, non-negative values 0, 1/20, 2/20, … , contains additional information on the system’s perform- 20/20. Thus, the random variable X cannot be less than ance related to the severity of shortages. Column 3 lists zero. Hence, if the lower endpoint of an interval is zero, the frequencies with which the shortage exceeded 20% of as is the case here, then 0 x(k) is an estimate of an inter- that year’s demand. This occurred in approximately 2% val within which the random variable falls with a proba- of the years, or in 24% of the years in which a failure bility of at least k/(n 1). For k equal to 20 and 25, the occurred. Taking another point of view, failures in excess corresponding probabilities are 77% and 96%. of 20% of demand occurred in nine out of twenty-five, or Often, the analysis of a simulated system’s perform- in 36% of the simulation runs. ance centres on the average value of performance indices, Columns 4 and 5 of Table 7.14 contain two other such as the failure rate. It is important to know the indices that pertain to the severity of the failures. The total accuracy with which the mean value of an index approx- shortfall, TS, in Column 4 is calculated as the sum of the imates the true mean. This is done by the construction of positive differences between the demand and the release confidence intervals. A confidence interval is an interval in the summer season over the twenty-year period. that will contain the unknown value of a parameter with ∑ a specified probability. Confidence intervals for a mean TS [D2y R2y] are constructed using the t statistic, y where x µ x [Q] Q if Q 0; 0 otherwise (7.199) t (7.197) snx / The total shortfall equals the total amount by which the which, for large n, has approximately a standard normal target release is not met in years in which shortages distribution. Certainly, n 25 is not very large, but occur. wrm_ch07.qxd 8/31/2005 12:18 PM Page 220

220 Water Resources Systems Planning and Management

Related to the total shortfall is the average deficit. The to have the expected effect on the system’s operation. With deficit is defined as the shortfall in any year divided by the new policy, only six severe shortages in excess of 20% the target release in that year. The average deficit, AD, is of demand occur in the twenty-five twenty-year simula- tions, as opposed to ten such shortages with the original 20 DR  policy. In addition, these severe shortages are all less severe 1 ∑  22yy AD (7.200) than the corresponding shortages that occur with the same m y=1 D2 y streamflow sequence when the original policy is followed. where m is the number of failures (deficits) or nonzero The decrease in the severity of shortages is obtained at terms in the sum. a price. The overall failure frequency has increased from Both the total shortfall and the average deficit measure__ 8.4% to 14.2%. However, the latter value is misleading the severity of shortages. The mean total shortfall TS, because in fourteen of the twenty-five simulations, a equal to 1.00 for the twenty-five simulation runs, is a dif- failure occurs in the first simulation year with the new ficult number to interpret. While no shortage occurred in policy, whereas only three failures occur with the original seven runs, the total shortage was 4.7 in run 8, in which policy. Of course, these first-year failures occur because the shortfall in two different years exceeded 20% of the the reservoir starts empty at the beginning of the first target. The median of the total shortage values, equal to winter and often does not fill that season. Ignoring these 0.76, is an easier number to interpret in that one knows first-year failures, the failure rates with the two policies that half the time the total shortage was greater and half over the subsequent nineteen years are 8.2% and 12.0%. the time less than this value. __ Thus the frequency of failures in excess of 20% of The mean average deficit AD is 0.106, or 11%. demand is decreased from 2.0% to 1.2% by increasing the However, this mean includes an average deficit of zero in frequency of all failures after the first year from 8.2% the seven runs in which no shortages occurred. The to 12.0%. Reliability decreases, but so does vulnerability. average deficit in the eighteen runs in which shortages If the farmers are willing to put up with more frequent occurred is (11%)(25/18) 15%. The average deficit in minor shortages, then it appears that they can reduce individual simulations in which shortages occurred their risk of experiencing shortages of greater severity. ranges from 4% to 43%, with a median of 11.5%. The preceding discussion has ignored the statistical After examining the results reported in Table 7.14, the issue of whether the differences between the indices farmers might determine that the probability of a shortage obtained in the two simulation experiments are of suffi- exceeding 20% of a year’s target release is higher than cient statistical reliability to support the analysis. If care is they would like. They can deal with more frequent minor not taken, observed changes in a performance index from shortages, not exceeding 20% of the target, with little one simulation experiment to another may be due to sam- economic hardship, particularly if they are warned at pling fluctuations rather than to modifications of the the beginning of the growing season that less than the water resource system’s design or operating policy. targeted quantity of water will be delivered. Then they can As an example, consider the change that occurred

curtail their planting or plant crops requiring less water. in the frequency of shortages. Let X1i and X2i be the In an attempt to find out how better to meet the simulated failure rates using the ith streamflow sequence farmers’ needs, the simulation program was re-run with with the original and modified operating policies. The the same streamflow sequences and a new operating policy random variables Yi X1i X2i for i equal 1 through 25 in which only 80% of the growing season’s target release is are independent of each other if the streamflow sequences provided (if possible) if the reservoir is less than 80% full are generated independently, as they were. at the end of the previous winter season. This gives the One would like to confirm that the random variable Y farmers time to adjust their planting schedules and may tends to be negative more often than it is positive, and increase the quantity of water stored in the reservoir to be hence, that policy 2 indeed results in more failures overall. used the following year if the drought persists. A direct test of this theory is provided by the sign test. Of As the simulation results with the new policy in the twenty-five paired simulation runs, yi 0 in twenty- Table 7.15 demonstrate, this new operating policy appears one cases and yi 0 in four cases. We can ignore the times wrm_ch07.qxd 8/31/2005 12:18 PM Page 221

Concepts in Probability, Statistics and Stochastic Modelling 221

Table 7.15. Results of 25 twenty-year simulations with modified operating policy to frequency total average E021101r avoid severe shortages. of failure to meet: shortage deficit, simulation 80% of TS AD number, i target target x10 7 m3

1 0.10 0.0 1.80 0.20 2 0.30 0.0 4.70 0.20 3 0.25 0.0 3.90 0.20 4 0.20 0.05 3.46 0.21 5 0.10 0.0 1.48 0.20 6 0.05 0.0 0.60 0.20 7 0.20 0.0 3.30 0.20 8 0.25 0.10 5.45 0.26 9 0.05 0.0 0.60 0.20 10 0.20 0.0 3.24 0.20 11 0.25 0.0 3.88 0.20 12 0.10 0.05 1.92 0.31 13 0.10 0.0 1.50 0.20 14 0.15 0.0 2.52 0.20 15 0.25 0.05 3.76 0.18 16 0.10 0.0 1.80 0.20 17 0.30 0.0 5.10 0.20 18 0.15 0.0 2.40 0.20 19 0.0 0.0 0.0 0.0 20 0.05 0.0 0.76 0.20 21 0.10 0.0 1.80 0.20 22 0.10 0.05 2.37 0.26 23 0.05 0.0 0.90 0.20 24 0.05 0.0 0.90 0.20 25 0.10 0.0 1.50 0.20 meanx 0.142 0.012 2.39 0.201 standard deviation of values; sx 0.087 0.026 1.50 0.050 median0.10 0.00 1.92 0.20 approximate interquartile range; x(6) -x(20) 0.05 - 0.25 0.0 - 0.0 0.90 - 3.76 0.20 - 0.20 range; x()1 -x(25) 0.0 - 0.30 0.0 - 0.10 0.0 - 5.45 0.0 - 0.31 wrm_ch07.qxd 8/31/2005 12:18 PM Page 222

222 Water Resources Systems Planning and Management

when yi 0. Note that if yi 0 and yi 0 were equally 0. 0580 t 725. (7.204) likely, then the probability of observing yi 0 in all 0./ 0400 25 –21 –7 twenty-one cases when yi 0 is 2 or 5 10 . This is The probability of observing a value of t equal to 7.25 exceptionally strong proof that the new policy has or smaller is less than 0.1% if n is sufficiently large that t increased the failure frequency. is normally distributed. Hence it appears very improbable A similar analysis can be made of the frequency with µ that Y equals zero. which the release is less than 80% of the target. Failure This example provides an illustration of the advantage of frequencies differ in the two policies in only four of the using the same streamflow sequences when simulating both twenty-five simulation runs. However, in all four cases policies. Suppose that different streamflow sequences were where they differ, the new policy resulted in fewer severe used in all the simulations. Then the expected value of Y failures. The probability of such a lopsided result, were it would not change, but its variance would be given by equally likely that either policy would result in a lower µµ2 frequency of failures in excess of 20% of the target, is Var(YXX ) E[12 ( 12 )] –4 µµ2 2 0.0625. This is fairly strong evidence that the new E[(XX11 ) ] 2E[( 11 ) µ µ 2 policy indeed decreases the frequency of severe failures. ()X22]]E[(X22 )] Another approach to this problem is to ask if the σσ222,Cov(XX ) (7.205) _ _ xx1212 difference between the average failure rates x1 and x2 is µ µ statistically significant; that is, can the difference between X1 where Cov(X1, X2) E[(X1 1)(X2 2)] and is the and X2 be attributed to the fluctuations that occur in the covariance of the two random variables. The covariance average of any finite set of random variables? In this example between X1 and X2 will be zero if they are independently the significance of the difference between the two means can distributed, as they would be if different randomly gener- be tested using the random variable Yi defined as X1i X2i ated streamflow sequences were used in each simulation. for i equal 1 through 25. The mean of the observed y ’s is Estimating σ 2 and σ 2 by their sample estimates, an i x1 x2 25 estimate of what the variance of Y would be if Cov(X1, X2) 1 yxxxx∑()12ii12 were zero is 25 i−1 0.. 084 0 142 0 . 058 (7.201) σˆ2 σσ22 (.0 081 ) 2 (. 0 087 ) 2 (. 0 119 ) 2 Y xx12 (7.206) and their variance is The actual sample estimate σ equals 0.040; if independ- 1 25 Y sxxy2 ∑()(.)2 0 0400 2 (7.202) ent streamflow sequences are used in all simulations, σ y 25 12ii Y i1 will take a value near 0.119 rather than 0.040 (Equation µ Now, if the sample size n, equal to 25 here, is sufficiently 7.202). A standard deviation of 0.119, with y 0, yields large, then t defined by a value of the t test statistic µ µ y Y y Y t (7.203) t 244. (7.207) snY / 0./ 119 25 has approximately a standard normal distribution. The If t is normally distributed, the probability of observing a closer the distribution of Y is to that of the normal value less than 2.44 is about 0.8%. This illustrates that use distribution, the faster the convergence of the distribution of the same streamflow sequences in the simulation of both of t is to the standard normal distribution with increasing policies allows one to better distinguish the differences in

n. If X1i – X2i is normally distributed, which is not the case the policies’ performance. By using the same streamflow here, then each Yi has a normal distribution and t has sequences, or other random inputs, one can construct a Student’s t-distribution. simulation experiment in which variations in performance µ If E[X1i] E[X2i], then Y equals zero, and upon sub- caused by different random inputs are confused as little as _ 2 stituting the observed values of y and sY into Equation possible with the differences in performance caused by 7.123, one obtains changes in the system’s design or operating policy. wrm_ch07.qxd 8/31/2005 12:18 PM Page 223

Concepts in Probability, Statistics and Stochastic Modelling 223

10. Conclusions unclear as to whether or not society knows exactly what to do with such information. Nevertheless, there seems to This chapter has introduced statistical concepts that be an increasing demand from stakeholders involved in analysts use to describe the randomness or uncertainty planning processes for information related to the uncer- of their data. Most of the data used by water resources tainty associated with the impacts predicted by models. systems analysts is uncertain. This uncertainty comes The challenge is not only to quantify that uncertainty, but from not understanding as well as we would like how also to communicate it in effective ways that inform, and our water resources systems (including their ecosystems) not confuse, the decision-making process. function as well as not being able to forecast, perfectly, the future. It is that simple. We do not know the exact amounts, qualities and their distributions over space and 11. References time of either the supplies of water we manage or the water demands we try to meet. We also do not know the bene- AYYUB, B.M. and MCCUEN, R.H. 2002. Probability, fits and costs, however measured, of any actions we take statistics, and reliability for engineers and scientists. Boca to manage both water supply and water demand. Raton, Chapman and Hill, CRC Press. The chapter began with an introduction to probability BARTOLINI, P. and SALAS, J. 1993. Modelling of stream- concepts and methods for describing random variables flow processes at different time scales. Water Resources and parameters of their distributions. It then reviewed Research, Vol. 29, No. 8, pp. 2573–87. some of the commonly used probability distributions and how to determine the distributions of sample data, how BATES, B.C. and CAMPBELL, E.P. 2001. A Markov chain to work with censored and partial duration series data, Monte Carlo scheme for parameter estimation and methods of regionalization, stochastic processes and inference in conceptual rainfall–runoff modelling. Water time-series analyses. Resources Research, Vol. 37, No. 4, pp. 937–47. The chapter concluded with an introduction to a range BEARD, L.R. 1960. Probability estimates based on small of univariate and multivariate stochastic models that normal-distribution samples. Journal of Geophysical are used to generate stochastic streamflow, precipitation Research, Vol. 65, No. 7, pp. 2143–8. depths, temperatures and evaporation. These methods are used to generate temporal and spatial stochastic process BEARD, L.R. 1997. Estimating flood frequency and aver- that serve as inputs to stochastic simulation models for age annual damages. Journal of Water Resources Planning system design, for system operations studies, and for and Management, Vol. 123, No. 2, pp. 84–8. evaluation of the reliability and precision of different BENJAMIN, J.R. and CORNELL, C.A. 1970. Probability, estimation algorithms. The final section of this chapter statistics and decisions for civil engineers. New York, provides an example of stochastic simulation, and the use McGraw-Hill. of statistical methods to summarize the results of such BICKEL, P.J. and DOKSUM, K.A. 1977. Mathematical simulations. statistics: basic ideas and selected topics. San Francisco, This is merely an introduction to some of the statisti- cal tools available for use when dealing with uncertain Holden-Day. data. Many of the concepts introduced in this chapter will BOBÉE, B. 1975. The log Pearson type 3 distribution and be used in the chapters that follow on constructing and its applications in hydrology. Water Resources Research, implementing various types of optimization, simulation Vol. 14, No. 2, pp. 365–9. and statistical models. The references cited in the BOBÉE, B. and ASHKAR, F. 1991. The gamma distribution reference section provide additional and more detailed and derived distributions applied in hydrology. Littleton information. Colo., Water Resources Press. Although many of the methods presented in this and in some of the following chapters can describe many of BOBÉE, B. and ROBITAILLE, R. 1977. The use of the characteristics and consequences of uncertainty, it is the Pearson type 3 distribution and log Pearson type 3 wrm_ch07.qxd 8/31/2005 12:18 PM Page 224

224 Water Resources Systems Planning and Management

distribution revisited. Water Resources Research, Vol. 13, DAVID, H.A. 1981. Order statistics, 2nd edition. New No. 2, pp. 427–43. York, Wiley. BOX, G.E.P.; JENKINS, G.M. and RISINSEL, G.C. 1994. FIERING, M.B. 1967. Streamflow synthesis. Cambridge, Times series analysis: forecasting and control, 3rd Edition. Mass., Harvard University Press. New Jersey, Prentice-Hall. FILL, H. and STEDINGER, J. 1995. L-moment and CAMACHO, F.; MCLEOD, A.I. and HIPEL, K.W. 1985. PPCC goodness-of-fit tests for the Gumbel distribution Contemporaneous autoregressive-moving average (CARMA) and effect of autocorrelation. Water Resources Research, modelling in water resources. Water Resources Bulletin, Vol. 31, No. 1, pp. 225–29. Vol. 21, No. 4, pp. 709–20. FILL, H. and STEDINGER, J. 1998. Using regional CARLIN, B.P. and LOUIS, T.A. 2000. Bayes and empirical regression within index flood procedures and an Bayes methods for data analysis, 2nd Edition. New York, empirical Bayesian estimator. Journal of Hydrology, Chapman and Hall, CRC. Vol. 210, Nos 1–4, pp. 128–45. CHOWDHURY, J.U. and STEDINGER, J.R. 1991. FILLIBEN, J.J. 1975. The probability plot correlation Confidence intervals for design floods with estimated test for normality. Technometrics, Vol. 17, No. 1, skew coefficient. Journal of Hydraulic Engineering, pp. 111–17. Vol. 117, No. 7, pp. 811–31. FISHMAN, G.S. 2001. Discrete-event simulation: modelling, CHOWDHURY, J.U.; STEDINGER, J.R. and LU, L.H. 1991. programming, and analysis. Berlin, Springer-Verlag. Goodness-of-fit tests for regional GEV flood distributions. Water Resources Research, Vol. 27, No. 7, pp. 1765–76. GABRIELE, S. and ARNELL, N. 1991. A hierarchical approach to regional flood frequency analysis. Water CLAPS, P. 1993. Conceptual basis for stochastic models Resources Research, Vol. 27, No. 6, pp. 1281–9. of monthly streamflows. In: J.B. Marco, R. Harboe and J.D. Salas (eds). Stochastic hydrology and its use in water GELMAN, A.; CARLIN, J.B.; STERN, H.S. and RUBIN, D.B. resources systems simulation and optimization, Dordrecht, 1995. Bayesian data analysis. Boca Raton, Chapman and Kluwer Academic, pp. 229–35. Hall, CRC. CLAPS, P.; ROSSI, F. and VITALE, C. 1993. Conceptual- GILKS, W.R.; RICHARDSON, S. and SPIEGELHALTER, stochastic modelling of seasonal runoff using autoregressive D.J. (eds). 1996. Markov chain Monte Carlo in practice. moving average models and different scales of aggregation. London and New York, Chapman and Hall. Water Resources Research, Vol. 29, No. 8, pp. 2545–59. GIESBRECHT, F. and KEMPTHORNE, O. 1976. Maximum COHN, T.A.; LANE, W.L. and BAIER, W.G. 1997. An likelihood estimation in the three-parameter log normal algorithm for computing moments-based flood quantile distribution. Journal of the Royal Statistical Society B, Vol. 38, estimates when historical flood information is available. No. 3, pp. 257–64. Water Resources Research, Vol. 33, No. 9, pp. 2089–96. GREENWOOD, J.A. and DURAND, D. 1960. Aids for COHN, C.A.; LANE, W.L. and STEDINGER, J.R. 2001. fitting the gamma distribution by maximum likelihood. Confidence intervals for EMA flood quantile estimates. Technometrics, Vol. 2, No. 1, pp. 55–65. Water Resources Research, Vol. 37, No. 6, pp. 1695–1706. GRIFFS, V.W.; STEDINGER, J.R. and COHN, T.A. 2004. CRAINICEANU, C.M.; RUPPERT, D.; STEDINGER, LP3 quantile estimators with regional skew information J.R. and BEHR, C.T. 2002. Improving MCMC mixing for a and low outlier adjustments. Water Resources Research, GLMM describing pathogen concentrations in water supplies, in Vol. 40, forthcoming. case studies in Bayesian analysis. New York, Springer-Verlag. GRYGIER, J.C. and STEDINGER, J.R. 1988. Condensed D’AGOSTINO, R.B. and STEPHENS, M.A. 1986. disaggregation procedures and conservation corrections. Goodness-of-fit procedures. New York, Marcel Dekker. Water Resources Research, Vol. 24, No. 10, pp. 1574–84. wrm_ch07.qxd 8/31/2005 12:18 PM Page 225

Concepts in Probability, Statistics and Stochastic Modelling 225

GRYGIER, J.C. and STEDINGER, J.R. 1991. SPIGOT: a and first-order approximations. Journal of Hydrology, synthetic flow generation software package, user’s manual and Vol. 71, Nos 1–2, pp. 1–30. technical description, version 2.6. Ithaca, N.Y., School of HOSKING, J.R.M. 1990. L-moments: analysis and Civil and Environmental Engineering, Cornell University. estimation of distributions using linear combinations of GUMBEL, E.J. 1958. Statistics of extremes. New York, order statistics. Journal of Royal Statistical Society, B, Columbia University Press. Vol. 52, No. 2, pp. 105–24. GUPTA, V.K. and DAWDY, D.R. 1995a. Physical inter- HOSKING, J.R.M. and WALLIS, J.R. 1987. Parameter pretation of regional variations in the scaling exponents of and quantile estimation for the generalized Pareto distri- flood quantiles. Hydrological Processes, Vol. 9, Nos. 3–4, bution. Technometrics, Vol. 29, No. 3, pp. 339–49. pp. 347–61. HOSKING, J.R.M. and WALLIS, J.R. 1995. A comparison GUPTA, V.K. and DAWDY, D.R. 1995b. Multiscaling and of unbiased and plotting-position estimators of skew separation in regional floods. Water Resources L-moments. Water Resources Research, Vol. 31, No. 8, Research, Vol. 31, No. 11, pp. 2761–76. pp. 2019–25. GUPTA, V.K.; MESA, O.J. and DAWDY, D.R. 1994. HOSKING, J.R.M. and WALLIS, J.R. 1997. Regional Multiscaling theory of flood peaks: regional quantile analy- frequency analysis: an approach based on L-moments. sis. Water Resources Research, Vol. 30, No. 12, pp. 3405–12. Cambridge, Cambridge University Press. HAAN, C.T. 1977. Statistical methods in hydrology. Ames, HOSKING, J.R.M.; WALLIS, J.R. and WOOD, E.F. 1985. Iowa, Iowa State University Press. Estimation of the generalized extreme-value distribution by the method of probability weighted moments. HAAS, C.N. and SCHEFF, P.A. 1990. Estimation of Technometrics, Vol. 27, No. 3, pp. 251–61. averages in truncated samples. Environmental Science and , Vol. 24, No. 6, pp. 912–19. IACWD (Interagency Advisory Committee on Water Data). 1982. Guidelines for determining flood flow frequency, HELSEL, D.R. 1990. Less than obvious: statistical treat- Bulletin 17B. Reston, Va., US Department of the ment of data below the detection limit. Environ. Sci. and Interior, US Geological Survey, Office of Water Data Technol., Vol. 24, No. 12, pp. 1767–74. Coordination. HELSEL, D.R. and COHN, T.A. 1988. Estimation of JENKINS, G.M. and WATTS, D.G. 1968. Spectral Analysis descriptive statistics for multiple censored water and its Applications. San Francisco, Holden-Day. quality data. Water Resources Research, Vol. 24, No. 12, pp. 1997–2004. KENDALL, M.G. and STUART, A. 1966. The advanced theory of statistics, Vol. 3. New York, Hafner. HIRSCH, R.M. 1979. Synthetic hydrology and water supply reliability. Water Resources Research, Vol. 15, KIRBY, W. 1974. Algebraic boundness of sample statis- No. 6, pp. 1603–15. tics. Water Resources Research, Vol. 10, No. 2, pp. 220–2. HIRSCH, R.M. and STEDINGER, J.R. 1987. Plotting KIRBY, W. 1972. Computer oriented Wilson–Hilferty positions for historical floods and their precision. Water transformation that preserves the first 3 moments and Resources Research, Vol. 23, No. 4, pp. 715–27. lower bound of the Pearson Type 3 distribution. Water Resources Research, Vol. 8, No. 5, pp. 1251–4. HOSHI, K.; BURGES, S.J. and YAMAOKA, I. 1978. Reservoir design capacities for various seasonal operational KITE, G.W. 1988. Frequency and risk analysis in hydrology. hydrology models. Proceedings of the Japanese Society of Civil Littleton, Colo. Water Resources Publications. Engineers, No. 273, pp. 121–34. KOTTEGODA, M. and ROSSO, R. 1997. Statistics, proba- HOSHI, K.; STEDINGER, J.R. and BURGES, S. 1984. bility, and reliability for civil and environmental engineers. Estimation of log normal quantiles: Monte Carlo results New York, McGraw-Hill. wrm_ch07.qxd 8/31/2005 12:18 PM Page 226

226 Water Resources Systems Planning and Management

KOUTSOYIANNIS, D. and MANETAS, A. 1996. Simple LETTENMAIER, D.P.; WALLIS, J.R. and WOOD, E.F. disaggregation by accurate adjusting procedures. Water 1987. Effect of regional heterogeneity on flood frequency Resources Research, Vol. 32, No. 7, pp. 2105–17. estimation. Water Resources Research, Vol. 23, No. 2, pp. 313–24. KROLL, K. and STEDINGER, J.R. 1996. Estimation of moments and quantiles with censored data. Water MACBERTHOUEX, P. and BROWN, L.C. 2002. Statistics Resources Research, Vol. 32, No. 4, pp. 1005–12. for environmental engineers, 2nd Edition. Boca Raton, Fla., Lewis, CRC Press. KUCZERA, G. 1999. Comprehensive at-site flood frequency analysis using Monte Carlo Bayesian inference. MADSEN, H.; PEARSON, C.P.; RASMUSSEN, P.F. and Water Resources Research, Vol. 35, No. 5, pp. 1551–7. ROSBJERG, D. 1997a. Comparison of annual maximum series and partial duration series methods for modelling KUCZERA, G. 1983. Effects of sampling uncertainty extreme hydrological events 1: at-site modelling. Water and spatial correlation on an empirical Bayes procedure Resources Research, Vol. 33, No. 4, pp. 747–58. for combining site and regional information. Journal of Hydrology, Vol. 65, No. 4, pp. 373–98. MADSEN, H.; PEARSON, C.P. and ROSBJERG, D. 1997b. Comparison of annual maximum series and partial duration KUCZERA, G. 1982. Combining site-specific and regional series methods for modelling extreme hydrological events 2: information: an empirical Bayesian approach. Water regional modelling. Water Resources Research, Vol. 33, Resources Research, Vol. 18, No. 2, pp. 306–14. No. 4, pp. 759–70. LANDWEHR, J.M.; MATALAS, N.C. and WALLIS, J.R. MADSEN, H. and ROSBJERG, D. 1997a. The partial 1978. Some comparisons of flood statistics in real and log duration series method in regional index flood modelling. space. Water Resources Research, Vol. 14, No. 5, pp. 902–20. Water Resources Research, Vol. 33, No. 4, pp. 737–46. LANDWEHR, J.M.; MATALAS, N.C. and WALLIS, J.R. MADSEN, H. and ROSBJERG, D. 1997b. Generalized 1979. Probability weighted moments compared with least squares and empirical Bayesian estimation in some traditional techniques in estimating Gumbel regional partial duration series index-flood modelling. parameters and quantiles. Water Resources Research, Water Resources Research, Vol. 33, No. 4, pp. 771–82. Vol. 15, No. 5, pp. 1055–64. MAHEEPALA, S. and PERERA, B.J.C. 1996. Monthly LANE, W. 1979. Applied stochastic techniques (users man- hydrological data generation by disaggregation. Journal of ual). Denver, Colo., Bureau of Reclamation, Engineering Hydrology, No. 178, 277–91. and Research Center, December. MARCO, J.B.; HARBOE, R. and SALAS, J.D. (eds). 1989. LANGBEIN, W.B. 1949. Annual floods and the partial Stochastic hydrology and its use in water resources duration flood series, EOS. Transactions of the American systems simulation and optimization. NATO ASI Series. Geophysical Union, Vol. 30, No. 6, pp. 879–81. Dordrecht, Kluwer Academic. LEDOLTER, J. 1978. The analysis of multivariate time MARTINS, E.S. and STEDINGER, J.R. 2000. Generalized series applied to problems in hydrology. Journal of maximum likelihood GEV quantile estimators for Hydrology, Vol. 36, No. 3–4, pp. 327–52. hydrological data. Water Resources Research, Vol. 36, No. 3, LETTENMAIER, D.P. and BURGES, S.J. 1977a. Operational pp. 737–44. assessment of hydrological models of long-term persistence. MARTINS, E.S. and STEDINGER, J.R. 2001a. Historical Water Resources Research, Vol. 13, No. 1, pp. 113–24. information in a GMLE-GEV framework with partial duration and annual maximum series. Water Resources LETTENMAIER, D.P. and BURGES, S.J. 1977b. An Research, Vol. 37, No. 10, pp. 2551–57. operational approach to preserving skew in hydrological models of long-term persistence. Water Resources MARTINS, E.S. and STEDINGER, J.R. 2001b. Research, Vol. 13, No. 2, pp. 281–90. Generalized maximum likelihood Pareto-Poisson flood wrm_ch07.qxd 8/31/2005 12:18 PM Page 227

Concepts in Probability, Statistics and Stochastic Modelling 227

risk analysis for partial duration series. Water Resources RASMUSSEN, P.F. and ROSBJERG, D. 1991b. Prediction Research, Vol. 37, No. 10, pp. 2559–67. uncertainty in seasonal partial duration series. Water Resources Research, Vol. 27, No. 11, pp. 2875–83. MATALAS, N.C. 1967. Mathematical assessment of synthetic hydrology. Water Resources Research, Vol. 3, RASMUSSEN, R.F.; SALAS, J.D.; FAGHERAZZI, L.; No. 4, pp. 937–45. RASSAM.; J.C. and BOBÉE, R. 1996. Estimation and validation of contemporaneous PARMA models for MATALAS, N.C. and WALLIS, J.R. 1973. Eureka! It fits a streamflow simulation. Water Resources Research, Vol. 32, Pearson type 3 distribution. Water Resources Research, No. 10, pp. 3151–60. Vol. 9, No. 3, pp. 281–9. ROBINSON, J.S. and SIVAPALAN, M. 1997. Temporal MATALAS, N.C. and WALLIS, J.R. 1976. Generation of scales and hydrological regimes: implications for flood synthetic flow sequences. In: A.K. Biswas (ed.), Systems frequency scaling. Water Resources Research, Vol. 33, approach to water management. New York, McGraw-Hill. No. 12, pp. 2981–99. MEJIA, J.M. and ROUSSELLE, J. 1976. Disaggregation ROSBJERG, D. 1985. Estimation in Partial duration series models in hydrology revisited. Water Resources Research, with independent and dependent peak values. Journal of Vol. 12, No. 2, pp. 185–6. Hydrology, No. 76, pp. 183–95. NERC (Natural Environmental Research Council) ROSBJERG, D. and MADSEN, H. 1998. Design with 1975. Flood studies report, Vol. 1: hydrological studies. uncertain design values, In: H. Wheater and C. Kirby London. (eds), Hydrology in a changing environment, New York, NORTH, M. 1980. Time-dependent stochastic model of Wiley. Vol. 3, pp. 155–63. floods. Journal of the Hydraulics Division, ASCE, Vol. 106, ROSBJERG, D.; MADSEN, H. and RASMUSSEN, P.F. No. HY5, pp. 649–65. 1992. Prediction in partial duration series with general- O’CONNELL, D.R.H.; OSTENAA, D.A.; LEVISH, D.R. ized Pareto-distributed exceedances. Water Resources and KLINGER, R.E. 2002. Bayesian flood frequency Research, Vol. 28, No. 11, pp. 3001–10. analysis with paleohydrological bound data. Water SALAS, J.D. 1993. Analysis and modelling of hydrological Resources Research, Vol. 38, No. 5, pp. 161–64. time series. In: D. Maidment (ed.), Handbook of hydrology, O’CONNELL, P.E. 1977. ARMA models in synthetic Chapter 17. New York, McGraw-Hill. hydrology. In: T.A. Ciriani, V. Maione and J.R. Wallis SALAS, J.D.; DELLEUR, J.W.; YEJEVICH, V. and (eds), Mathematical models for surface water hydrology. LANE, W.L. 1980. Applied modelling of hydrological time New York, Wiley. series. Littleton, Colo., Water Resources Press Publications. PRESS, W.H.; FLANNERY, B.P.; TEUKOLSKY, S.A. and SALAS, J.D. and FERNANDEZ, B. 1993. Models for data VETTERLING, W.T. 1986. Numerical recipes: the art generation in hydrology: univariate techniques. In: of scientific computing. Cambridge, UK, Cambridge J.B. Marco, R. Harboe and J.D. Salas (eds), Stochastic University Press. hydrology and its use in water resources systems simulation RAIFFA, H. and SCHLAIFER, R. 1961. Applied statistical and optimization, Dordrecht, Kluwer Academic. pp. 76–95. decision theory. Cambridge, Mass., MIT Press. SALAS, J.D. and OBEYSEKERA, J.T.B. 1992. Conceptual RASMUSSEN, P.F. and ROSBJERG, D. 1989. Risk estima- basis of seasonal streamflow time series. Journal of tion in partial duration series. Water Resources Research, Hydraulic Engineering, Vol. 118, No. 8, pp. 1186–94. Vol. 25, No. 11, pp. 2319–30. SCHAAKE, J.C. and VICENS, G.J. 1980. Design length RASMUSSEN, P.F. and ROSBJERG, D. 1991a. Evaluation of water resource simulation experiments. Journal of of risk concepts in partial duration series. Stochastic Water Resources Planning and Management, Vol. 106, No. 1, Hydrology and Hydraulics, Vol. 5, No. 1, pp. 1–16. pp. 333–50. wrm_ch07.qxd 8/31/2005 12:18 PM Page 228

228 Water Resources Systems Planning and Management

SLACK, J.R.; WALLIS, J.R. and MATALAS, N.C. 1975. validation. Water Resources Research, Vol. 18, No. 4, On the value of information in flood frequency analysis. pp. 919–24. Water Resources Research, Vol. 11, No. 5, pp. 629–48. STEDINGER, J.R. and TAYLOR, M.R. 1982b. Synthetic SMITH, R.L. 1984. Threshold methods for sample streamflow generation. Part 2: effect of parameter extremes. In: J. Tiago de Oliveira (ed.), Statistical extremes uncertainty. Water Resources Research, Vol. 18, No. 4, and applications, Dordrecht, D. Reidel. pp. 621–38. pp. 919–24. STEDINGER, J.R. 1980. Fitting log normal distributions STEDINGER, J.R. and VOGEL, R. 1984. Disaggregation to hydrological data. Water Resources Research, Vol. 16, procedures for the generation of serially correlated flow No. 3, pp. 481–90. vectors. Water Resources Research, Vol. 20, No. 1, pp. 47–56. STEDINGER, J.R. 1981. Estimating correlations in STEDINGER, J.R.; VOGEL, R.M. and FOUFOULA- multivariate streamflow models. Water Resources Research, GEORGIOU, E. 1993. Frequency analysis of extreme Vol. 17, No. 1, pp. 200–08. events, In: D. Maidment (ed.), Handbook of hydrology, Chapter 18. New York, McGraw-Hill. STEDINGER, J.R. 1983. Estimating a regional flood frequency distribution. Water Resources Research, Vol. 19, STEPHENS, M. 1974. Statistics for Goodness of Fit, Journal No. 2, pp. 503–10. of the American Statistical Association, Vol. 69, pp. 730–37. STEDINGER, J.R. 1997. Expected probability and annual TAO, P.C. and DELLEUR, J.W. 1976. Multistation, multi- damage estimators. Journal of Water Resources Planning year synthesis of hydrological time series by disaggregation. and Management, Vol. 123, No. 2, pp. 125–35. [With Water Resources Research, Vol. 12, No. 6, pp. 1303–11. discussion, Leo R. Beard, 1998. Journal of Water Resources TARBOTON, D.G.; SHARMA, A. and LALL, U. 1998. Planning and Management, Vol. 124, No. 6, pp. 365–66.] Disaggregation procedures for stochastic hydrology based STEDINGER, J.R. 2000. Flood frequency analysis and on nonparametric density estimation. Water Resources statistical estimation of flood risk. In: E.E. Wohl (ed.), Research, Vol. 34, No. 1, pp. 107–19. Inland flood hazards: human, riparian and aquatic communi- TASKER, G.D. and STEDINGER, J.R. 1986. Estimating ties, Chapter 12. Stanford, UK, Cambridge University Press. generalized skew with weighted least squares regression. STEDINGER, J.R. and BAKER, V.R. 1987. Surface water Journal of Water Resources Planning and Management, hydrology: historical and paleoflood information. Reviews Vol. 112, No. 2, pp. 225–37. of Geophysics, Vol. 25, No. 2, pp. 119–24. THOM, H.C.S. 1958. A note on the gamma distribution. STEDINGER, J.R. and COHN, T.A. 1986. Flood Monthly Weather Review, Vol. 86, No. 4, 1958. pp. 117–22. Frequency analysis with historical and paleoflood THOMAS, H.A., JR. and FIERING, M.B. 1962. information. Water Resources Research, Vol. 22, No. 5, Mathematical synthesis of streamflow sequences for pp. 785–93. the analysis of river basins by simulation. In: A. Maass, STEDINGER, J.R. and LU, L. 1995. Appraisal of regional M.M. Hufschmidt, R. Dorfman, H.A. Thomas, Jr., S.A. and index flood quantile estimators. Stochastic Hydrology Marglin and G.M. Fair (eds), Design of water resources and Hydraulics, Vol. 9, No. 1, pp. 49–75. systems. Cambridge, Mass., Harvard University Press. STEDINGER, J.R.; PEI, D. and COHN, T.A. 1985. A THYER, M. and KUCZERA, G. 2000. Modelling disaggregation model for incorporating parameter long-term persistence in hydroclimatic time series using uncertainty into monthly reservoir simulations. Water a hidden state Markov model. Water Resources Research, Resources Research, Vol. 21, No. 5, pp. 665–7. Vol. 36, No. 11, pp. 3301–10. STEDINGER, J.R. and TAYLOR, M.R. 1982a. Synthetic VALDES, J.R.; RODRIGUEZ, I. and VICENS, G. 1977. streamflow generation. Part 1: model verification and Bayesian generation of synthetic streamflows 2: the wrm_ch07.qxd 8/31/2005 12:18 PM Page 229

Concepts in Probability, Statistics and Stochastic Modelling 229

multivariate case. Water Resources Research, Vol. 13, Catania, Italy, Fondazione Politecnica del Mediterraneo. No. 2, pp. 291–95. pp. 3–36. VALENCIA, R. and SCHAAKE, J.C., Jr. 1973. WALLIS, J.R.; MATALAS, N.C. and SLACK, J.R. 1974a. Disaggregation processes in stochastic hydrology. Water Just a Moment! Water Resources Research, Vol. 10, No. 2, Resources Research, Vol. 9, No. 3, pp. 580–5. pp. 211–19. VICENS, G.J.; RODRÍGUEZ-ITURBE, I. and SCHAAKE, WALLIS, J.R.; MATALAS, N.C. and SLACK, J.R. 1974b. J.C., Jr. 1975. A Bayesian framework for the use of Just a moment! Appendix. Springfield, Va., National regional information in hydrology. Water Resources Technical Information Service, PB-231 816. Research, Vol. 11, No. 3, pp. 405–14. WANG, Q. 2001. A. Bayesian joint probability approach VOGEL, R.M. 1987. The probability plot correlation for flood record augmentation. Water Resources Research, coefficient test for the normal, lognormal, and Gumbel Vol. 37, No. 6, pp. 1707–12. distributional hypotheses. Water Resources Research, WANG, Q.J. 1997. LH moments for statistical analysis of Vol. 22, No. 4, pp. 587–90. extreme events. Water Resources Research, Vol. 33, No. 12, VOGEL, R.M. and FENNESSEY, N.M. 1993. L-moment pp. 2841–48. diagrams should replace product moment diagrams. WILK, M.B. and GNANADESIKAN, R. 1968. Probability Water Resources Research, Vol. 29, No. 6, pp. 1745–52. plotting methods for the analysis of data. Biometrika, Vol. 55, No. 1, pp. 1–17. VOGEL, R.M. and MCMARTIN, D.E. 1991. Probability plot goodness-of-fit and skewness estimation procedures WILKS, D.S. 2002. Realizations of daily weather in fore- for the Pearson type III distribution. Water Resources cast seasonal climate. Journal of Hydrometeorology, No. 3, Research, Vol. 27, No. 12, pp. 3149–58. pp. 195–207. VOGEL, R.M. and SHALLCROSS, A.L. 1996. The WILKS, D.S. 1998. Multi-site generalization of a daily moving blocks bootstrap versus parametric time series stochastic precipitation generation model. Journal of models. Water Resources Research, Vol. 32, No. 6, Hydrology, No. 210, pp. 178–91. pp. 1875–82. YOUNG, G.K. 1968. Discussion of ‘Mathematical assess- VOGEL, R.M. and STEDINGER, J.R. 1988. The value of ment of synthetic hydrology’ by N.C. Matalas and reply. stochastic streamflow models in over-year reservoir Water Resources Research, Vol. 4, No. 3, pp. 681–3. design applications. Water Resources Research, Vol. 24, ZELLNER, A. 1971. An introduction to Bayesian No. 9, pp. 1483–90. inference in econometrics, New York, Wiley. WALLIS, J.R. 1980. Risk and in the evalua- ZRINJI, Z. and BURN, D.H. 1994. Flood Frequency tion of flood events for the design of hydraulic structures. analysis for ungauged sites using a region of influence In: E. Guggino, G. Rossi and E. Todini (eds), Piene e Siccita, approach. Journal of Hydrology, No. 13, pp. 1–21.