L-Moments Based Assessment of a Mixture Model for Frequency Analysis of Rainfall Extremes

Advances in Geosciences, 2, 331–334, 2006 SRef-ID: 1680-7359/adgeo/2006-2-331 Advances in European Geosciences Union Geosciences © 2006 Author(s). This work is licensed under a Creative Commons License. L-moments based assessment of a mixture model for frequency analysis of rainfall extremes V. Tartaglia, E. Caporali, E. Cavigli, and A. Moro Dipartimento di Ingegneria Civile, Universita` degli Studi di Firenze, Italy Received: 2 December 2004 – Revised: 2 October 2005 – Accepted: 3 October 2005 – Published: 23 January 2006 Abstract. In the framework of the regional analysis of hy- events in Tuscany (central Italy), a statistical methodology drological extreme events, a probabilistic mixture model, for parameter estimation of a probabilistic mixture model is given by a convex combination of two Gumbel distributions, here developed. The use of a probabilistic mixture model in is proposed for rainfall extremes modelling. The papers con- the regional analysis of rainfall extreme events, comes from cerns with statistical methodology for parameter estimation. the hypothesis that rainfall phenomenon arises from two in- With this aim, theoretical expressions of the L-moments of dependent populations (Becchi et al., 2000; Caporali et al., the mixture model are here defined. The expressions have 2002). The first population describes the basic components been tested on the time series of the annual maximum height corresponding to the more frequent events of low magnitude, of daily rainfall recorded in Tuscany (central Italy). controlled by the local morpho-climatic conditions. The sec- ond population characterizes the outlier component of more intense and rare events, which is here considered due to the aggregation at the large scale of meteorological phenomena. 1 Introduction As the rare component is due to the large scale state of en- ergy and to the morpho-climatic conditions, it is assumed In hydrology the most frequent use of statistics has been that that any regionalization procedure can be reduced to the use of frequency analysis (Haan, 1994). This applies in particu- of parameters of the basic or ordinary component. lar when the magnitude of an event with a given return period The contribution of the basic and the outlier components or the rarity of a hydrological event, has to be estimated. Of- is presented as a mixture model, i.e. a convex combination ten the common use of the asymptotic extreme value type I, of two independent populations having Gumbel distribution. or Gumbel, distribution does not represent an acceptable fit- Linearity of “sum” – operator leaves separated the contribu- ting of the model to the observations of rare frequency, i.e. tions of the two populations, and it allows to evaluate sepa- it is not able to model the upper tail of the hydrological ex- rately their statistical parameters (Becchi et al., 2000). treme events. This limit has been overcome, for example, Being very suitable for frequency analysis of extreme val- by the Two Components Extreme Value (TCEV) distribution ues, the L-moments method is proposed for parameter es- (Rossi et al., 1984). Observing the presence of outliers data, timation (Hosking and Wallis, 1993). The L-moments are the TCEV shows, in fact, a better fitting to the data sam- in fact expectations of linear combinations of order statis- ple skewness (Matalas et al., 1975), but it needs of a larger tics (Hosking, 1990) and are more robust to the data outliers sample in order to obtain a robust estimation of the param- and virtually unbiased for small samples. Moreover the L- eters. For this reason such kind of distributions are often moments have the very important advantage, over the con- used on regional basis. The assumption that a region is a ventional moments, of being less affected from the effects of set of gauging sites whose hydrologic events frequency be- sampling variability being linear functions of the data. haviour is homogeneous, in some quantifiable manner (Cun- In the following the probabilistic model and the parame- nane, 1988), allows to estimate hydrologic variables quin- ter estimation method are described. A procedure to find an tiles even in ungauged sites. In Italy a regionalized procedure estimation of the outlier percentage is also investigated. have been developed in the VaPI (Valutazione delle Piene in Italia) project, based on the TCEV distribution (Rossi et al., 1992). In the framework of the regional analysis of rainfall Correspondence to: E. Caporali ([email protected]fi.it) 332 V. Tartaglia et al.: L-moments assessment for frequency analysis of rainfall extremes 2 Model development and parameter estimation if s=0 is expressed by: θ · [ln λ + γ + ln(r + 1)] The probability model is defined by the following convex I (i, j, r, s) = i i (10) combination or mixture of two Gumbel distributions. r + 1 x x sλj = − · − − + · − − if s>0 and θ θ <1 is expressed by: FX(x) (1 α) exp 31 exp α exp 32 exp [(r+1)λ ] i / j θ1 θ2 i (1) " ∞ j j # θi X (−1) λ j I (i, j, r, s) = ln ((r + 1) · λi) · · 0 + 1 + µ µ r + 1 j! θ with : 3 = exp − 1 and 3 = exp − 2 (2) j=0 1 θ 2 θ 1 2 j ∞ j j 0 + 1 θi X (−1) λ θ where µ1, µ2 are the modal values and θ1, θ2 are propor- + − · −γ − (11) r + 1 j! j + tional to the standard deviations of the two Gumbel distribu- j=0 θ 1 tions appearing in the mixture. The indexes 1 and 2 indicate respectively the basic and the outlier component. Using the α θj sλj here θ= , λ= 1 θ and γ is the Euler’s constant; θi [(r+1)λi ] / proportion, the model underlines the contribution of the two sλ if s>0 and j ≥1 is expressed by: independent components to the rainfall extreme values. θi θj [(r+1)λi ] / The fitting of the mixture model to the time series of rain- " " ∞ ## fall extreme values is evaluated comparing the L-moments of λ 1 X (−1)j λj j I (i, j, r, s) = · ln (s · λ ) · θ · 1 − · 0 + 1 + the at-site data (Hosking, 1990) with their theoretical expres- r + i j λ j! θ 1 j= sions. 0 " 2 ∞ j j # The theoretical expressions of the L-moments are defined λ θj X (−1) λ h + i − · · · −γ − θ 0 j 1 (12) as linear combinations of the Probability Weighted Moments r + 1 θ j! j+1 θ i j=0 (PWM) (Greenwood et al., 1979). The first four moments are: = θi = (r+1)λi here θ θ and λ 1/θ and γ is the Euler’s constant. j (sλj ) λ1 = β0; λ = 2β − β , 2 1 0 3 Application and preliminary conclusion λ3 = 6β2 − 6β1 + β0; λ4 = 20β3 − 30β2 + 12β1 + β0 (3) The analysis of the capability of the mixture model to reproduce the L-moments of the sample values has been carried where: out using L-kurtosis (τ4=λ4/λ2) and L-skewness (τ3=λ3/λ2), +∞ +∞ Z Z evaluated from Eqs. (3) and (5)–(8). i i βi = x · FX(x) dF (x) = x · FX(x) · f (x)dx (4) The study has been developed using time series, more than −∞ −∞ 30 years long, of annual maximum of daily rainfall height, over a territory of about 22 000 km2 in Tuscany. In particular is the i-th PWM given by: 201 time series, from 1916 to 1996, having a low spatial cor- relation, have been extracted from a set of 584 rain gauges. β0 = (1 − α) · θ1 · ln λ1 + α · θ2 · ln λ2 + γ [(1 − α) · θ1 + α · θ2] (5) Recently the database has been updated to the year 2000, and 307 time series have been extracted from a set of 880 rain β = (1 − α)2 · I(1, 2, 1, 0) + (1 − α)α · I(1, 2, 0, 1) 1 gauges. 2 + (1 − α) α · I(2, 1, 0, 1) + α · I(2, 1, 1, 0) (6) The analysis has been worked out on both the data sets and the L-moments, i.e. the L-kurtosis and the L-skewness, of the 3 2 β2 = (1 − α) · I(1, 2, 2, 0) + 2(1 − α) α · I(1, 2, 1, 1) sample have been compared with the corresponding theoret- +(1 − α)α2 · I(1, 2, 0, 2) + (1 − α)2α · I(2, 1, 0, 2) ical ones (Fig. 1). The comparison shows that the mixture model is able to reproduce the L-moments of the sample se- +2(1 − α)α2 · I(2, 1, 1, 1) + α3 · I(2, 1, 2, 0) (7) ries on a wide range of values. 4 3 β3 = (1 − α) · I(1, 2, 3, 0) + 3(1 − α) α · I(1, 2, 2, 1) +3(1 − α)2α2 · I(1, 2, 1, 2) + (1 − α)α3 · I(1, 2, 0, 3) 4 Future developments +( − α)3α · I( , , , ) + ( − α)2α2 · I( , , , ) 1 2 1 0 3 3 1 2 1 1 2 4.1 Characterization of the outliers percentage +3(1 − α)α3 · I(2, 1, 2, 1) + α4 · I(2, 1, 3, 0) (8) With the aim to develop a regionalization procedure for the where: parameters, based on the identification of homogeneous ar- +∞ Z eas with respect to the basic component of the rainfall phe- r s I (i, j, r, s) = x · Fi (x) · Fj (x) · fidx (9) nomena (Gabriele and Arnell, 1991), the spatial distribution −∞ of the outliers, in the studied region, has been investigated. range. Each site is classified with the number of outliers per total number of data. The ratio of the outliers with the total number of data is the α proportion of the specific measurement site.

L-Moments Based Assessment of a Mixture Model for Frequency Analysis of Rainfall Extremes

Lesson 7: Measuring Variability for Skewed Distributions (Interquartile Range)

Normal Probability Distribution

2018 Statistics Final Exam Academic Success Programme Columbia University Instructor: Mark England

Estimation of Quantiles and the Interquartile Range in Complex Surveys Carol Ann Francisco Iowa State University

Hand-Book on STATISTICAL DISTRIBUTIONS for Experimentalists

Descriptive Statistics

Measures of Variability

Measures of Spread

Chapter 4 – Analyzing Skewed Quantitative Data Introduction: in Chapter 3, We Focused on Analyzing Bell Shaped (Normal) Data, but Many Data Sets Are Not Bell Shaped

Outing the Outliers – Tails of the Unexpected

Problem Max. Points Your Points 1-10 10 11 10 12 3 13 4 14 18 15 8 16 7 17 14 Total 75

For Questions 1-4, Consider the Following Data Set: 4 3 6 2 8 8