(or how to go beyond of the range data)

Sept 2007, Romania

Philippe Naveau

Laboratoire des Sciences du Climat et l’Environnement (LSCE) Gif-sur-Yvette, France

Katz et al., Statistics of extremes in hydrology, Advances in Water Resources 25 (2002) 1287-1304 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Extreme quotes

1 “Man can believe the impossible, but man can never believe the improbable” Oscar Wilde (Intentions, 1891)

2 “Il est impossible que l’improbable n’arrive jamais” Emil Julius Gumbel (1891-1966)

Extreme events ? ... a probabilistic concept linked to the tail behavior : low frequency of occurrence, large uncertainty and sometimes strong amplitude. Region of interest Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Important issues in Extreme Value Theory

Applied statistics

An asymptotic probabilistic concept Non-stationarity A statistical modeling approach Identifying clearly assumptions Univariate Multivariate Non-parametric Parametric Assessing uncertainties Goodness of fit and model selection Independence

Theoritical probability Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Outline

1 Motivation Heavy rainfalls Three applications

2 Univariate EVT Asymptotic result Historical perspective GPD Parameters estimation Brief summary of univariate iid EVT

3 Non-stationary extremes Spatial interpolation of return levels Downscaling of heavy rainfalls

4 Spatial extremes Assessing spatial dependences among maxima

5 Conclusions

II Main Part Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Why heavy rainfalls are important in geosciences ?

”It is very likely that hot extremes, heat waves, and heavy precipitation events will continue to become more frequent” and that “precipitation is highly variable spatially and temporally”

The policymakers summary of the 2007 Intergovernmental Panel on Climate Change Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Random variable types

Climate : maxima or mimina (daily, monthly, annually), dry spells, etc Hydrology : return levels.

a quantile estimation pb : how to find zp such that P(Z > zp) = p =⇒ Exceedances Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Return levels and return periods

A return level with a return period of T = 1/p years is a high threshold zp whose probability of exceedance is p. E.g., p = 0.01 ⇒ T = 100 years. Return level interpretations Waiting time : Average waiting time until next occurrence of event is T years Number of events : Average number of events occurring within a T -year time period is one Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Our main of interest : precipitation

1 Relevant parameter in meteorology and climatology

2 Highly stochastic nature compared to other meteorological parameters

era40 200 cccma3.1/t47[5] giss er miroc3.2/hires cccma3.1/t63 inm cm3.0 ncep2 miroc3.2/medres[3] cnrm cm3 ipsl cm4[2] era15 mpi echam5 100 echo g[3] mri cgcm2.3.2[5] gfdl cm2.0 ncar ccsm3[6]

1 gfdl cm2.1 ncar pcm1[4] !

y 50 giss aom a d

m m 20 ncep1

10 ncep2 ncep1 era40 era15 P20, 1981!2000 5 60S 30S 0E 30N 60N 100 Kharin andcccma3.1/t47[5] Zwiers,giss Journal er of Climate 2007era40 miroc3.2/hires ncep2, P20 (1981-2000) cccma3.1/t63 inm cm3.0 era15 miroc3.2/medres[3] 50 cnrm cm3 ipsl cm4[2] mpi echam5 echo g[3] mri cgcm2.3.2[5] gfdl cm2.0 ncar ccsm3[6]

1 gfdl cm2.1 ncar pcm1[4] !

y 20 giss aom a d cmap ncep1 m 10 m

5 ncep2 ncep1 5 cmap era40 era15 P20, 1981!2000 2 60S 30S 0E 30N 60N 200 1981!2000

100

1 50 ! y a d

m

m 20

10 P20 ncep2 era15 P5 5 20 era40 cmap ASI LND STR AFR ANT SAS GLB NTR SHE AUS NHE EUR ARC TRO SAM OCN NAM global regions zonal bands land only regions

Fig. 6 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Heavy rainfall distributions

The problems at hand Classical distributions (Gamma, Weibull, Stretched-exponential, . . .) not satisfying for extremes EVT not adequate low and medium precipitation

Our main question How to go beyond the univariate site-per-site modeling and to take into account the spatial pairwise dependence among sites ? Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Three applications

Measuring the spatial dependence among maxima (Max-stable processes) : Vannitsem & Naveau (2007), Schlather & Tawn (2003), de Haan & Pereira (2005) Spatial Interpolation of return levels in Colorado (Hierarchical Bayesian models) : Cooley, Nychka and Naveau (2007), Coles & Tawn (1996). Downscaling extremes over Illinois (latent processes) : Vrac and Naveau (2007) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Two toy examples

Annual maximum peak flow Crete ice core 1292 R.W. Katz et al. / Advances in Water Resources 25 (2002) 1287–1304 Potomac river (cfs)Data = crete Greenland (ecm) 3.1.1. Poisson–GP model Arising as an approximation for the distribution of 1000 excesses above a high threshold, the cumulative distri-

800 bution and quantile functions for the GP are given by:

600 1=c F x; rÃ; c 1 1 c x=rà À ; ð Þ ¼ À ½ þ ð ފ

400 rà > 0; 1 c x=rà > 0; 4 þ ð Þ ð Þ 1 c

200 F À 1 p; rÃ; c rÃ=c pÀ 1 ; 0 < p < 1: ð À Þ ¼ ð Þð À Þ

0 Here rà and c are the scale and shape parameters, re-

0.8 spectively. The interpretation of the shape parameter c is Time 0.4 equivalent to that for the GEV distribution (e.g., if

Posterior proba c > 0, then the GP distribution is heavy tailed). By 0.0 convention, c 0 refers to the limiting case obtained as 600 800 1000 ¼1200 1400 1600 1800 2000 c 0 in Eq. (4) Timeof the exponential distribution (i.e., an Fig. 3. Annual peak flow of Potomac River at Point of Rocks, MD, unbou! nded, thin tail). USA, 1895–2000. Let X1; X2; . . . ; Xn, denote a time series (assumed, for now, to be independent and identically distributed) whose high extreme values are of interest. The Poisson– GP model consists of two components (Chapter 4 in [15,25], Chapter 5 in [77]): (i) the occurrences of ex- ceedances of some high threshold u (i.e., Xi > u, for some i) are generated by a Poisson process (with rate parameter k); and (ii) the excesses over threshold u (i.e., Xi u, for some i) have a GP distribution (with scale andÀ shape parameters, rà and c). The scale parameter rà of the GP distribution differs from that for the GEV by an amount depending on the threshold u (see Eq. (A.3) in Appendix A). As previously mentioned, the assumption of inde- pendence can be relaxed by dealing with cluster maxima instead of all exceedances, and one way to relax the as- sumption of identical distribution is by letting the pa- rameters of the Poisson–GP model depend on covariates (e.g., annual or diurnal cycles). Fig. 4. Q–Q plot for fit of GEV distribution to annual peak flow of Potomac River (line of equality indicates perfect fit). 3.1.2. Point process approach Among others, Smith [85] developed the statistical theory needed to apply the point process approach to (Fig. 4) indicates that the fit is reasonably adequate, the statistics of extremes. In essence, this approach in- even in the upper tail. In Section 5.2.2, another annual volves representing the two components of the Poisson– peak flow time series will be analyzed for which the fit of GP model (i.e., the occurrence of exceedances and the the GEV distribution does not appear to be acceptable. excesses over a high threshold) jointly as a two-dimen- sional nonhomogenous Poisson process (one dimension is time, the other the excess values). In this way, features 3. Methodological developments of the GEV distribution for block maxima and the POT approach can be combined. In particular, the GEV 3.1. Theoretical framework distribution can be indirectly fitted via the POT method, but still in terms of the GEV parameterization. In this Underlying the POT method is a formal statistical way, the scale parameter r is invariant with respect to model, consisting of a Poisson process for the occur- the choice of threshold u, the extension to time-depen- rence of an exceedance of a high threshold and a gen- dent parameters (e.g., covariates) is immediate, and even eralized Pareto (GP) distribution for the excess over the thresholds that vary with time (e.g., because of annual threshold (termed ‘‘Poisson–GP model’’). A basic ref- cycles or trends) are permissible (Chapter 7 in [15]). erence for the point process representation of extremes is Suppose that it is desired to fit the GEV distribution, Leadbetter et al. (Chapter 5 in [59], also see Chapter 7 in with parameters l, r, and c, for the maximum over some [15]). time period denoted by 1=h. In other words, the time Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Maxima DistributionGumbel (1891-1966) Weibull (1887-1979) Fr´echet (1878-1973)

Distribution du maximum

Normal density Gumbel density ⇒ ⇐

Uniform density Weibull density ⇒ ⇐

Cauchy density Fr´echet density ⇒ ⇐

n = 50 n = 100

Extr^emes? Mesurer Interpoler R´egionaliser 6 ⇓ ↓ ↑ ⇑ Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Max-stability

Let Mn = max(X1,..., Xn) with Xi iid with distribution F.

Problem : find an and bn for a given F such that „ « Mn − an n P < x = F (anx + bn) = F(x) bn

Home work A Unit-Frechet` F(x) = exp(−1/x) for x > 0. Then an = 1 & bn = 0

Gumbel F(x) = exp(− exp(−x)) for all real x. Then an = 1 & bn = − log n

α 1/α Weibull F(x) = exp(−|x| ) for x < 0 (1 otherwise). Then an = n & bn = 0 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Max-stability

Let Mn = max(X1,..., Xn) with Xi iid with distribution F.

Problem : find an and bn for a given F such that „ « Mn − an n lim P < x = lim F (anx + bn) = F(x) n→∞ bn n→∞

Home work B Exponential F(x) = 1 − exp(−x) for x > 0. Then an = 1 & bn = log n

Uniform F(x) = x for 0 < x < 1. Then an = 1/n & bn = 1 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Generalized Extreme Value (GEV) distribution

„ «  −1/ξff Mn − an h “ x − µ ”i P < x ∼ GEV(x) = exp − 1 + ξ bn σ +

m-M01234xd21.0.1.2.3.4ensity

.4

0

.3

0

ty

si

n

.2

e

0

d

.1

0

.0

0

-2 -1 0 1 2 3 4 x

Home work C : show that a GEV is max-stable Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Historical perspective

Gumbel (1891-1966) Weibull (1887-1979) Frechet´ (1878-1973)

Emil Gumbel was born and trained as a statistician in Germany, forced to move to France and then the U.S. because of his pacifist and socialist views. He was a pioneer in the application of extreme value theory, particularly to climate and hydrology. Waloddi Weibull was a Swedish engineer famous for his pioneering work on reliability, providing a statistical treatment of fatigue, strength, and lifetime. Maurice Frechet was a French mathematician who made major contributions to pure mathematics as well as probability and statistics. He also collected empirical examples of heavy-tailed distributions.

Other important names : Fisher and Tippet (1928), Gnedenko (1943), etc Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

An active statistical and probabilistic field Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

GEV and return levels

 h “ x − µ ”i−1/ξff GEV(x) = exp − 1 + ξ σ +

Computing the return level zp such that GEV(zp) = 1 − p

−1 zp = GEV (1 − p)

σ ` −ξ ´ Hence, zp = µ + ξ [− ln(1 − p)] − 1] Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

GEV and return levels estimation

σ “ ” z = µ + [− ln(1 − p)]−ξ − 1] p ξ

Estimating the return level zp

“ ˆ ” zˆ =ˆµ + σˆ [− ln(1 − p)]−ξ − 1] p ξˆ

Estimating the GEV parameters estimates (ˆµ, σˆ, ξˆ) Maximum likelihood estimation Methods of moments type (PWM and GPWM) Exhaustive tail-index approaches Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

GEV and return levels estimation

σˆ “ −ξˆ ” zˆp =ˆµ + [− ln(1 − p)] − 1] ξˆ

Maximum likelihood estimates of (µˆ, σˆ, ξˆ)t Asymptotically distributed as a multivariate Gaussian vector with mean θ = (ˆµ, σˆ, ξˆ)t and covariance matrix that is the inverse of the expected information matrix whose elements are equal

„ ∂2 log l(θ) « E − ∂θi ∂θj where l(θ) is the likelihood function of the GEV distributed sample 1292 R.W. Katz et al. / Advances in Water Resources 25 (2002) 1287–1304

3.1.1. Poisson–GP model Arising as an approximation for the distribution of excesses above a high threshold, the cumulative distri- bution and quantile functions for the GP are given by:

1=c F x; rÃ; c 1 1 c x=rà À ; ð Þ ¼ À ½ þ ð ފ rà > 0; 1 c x=rà > 0; 4 þ ð Þ ð Þ 1 c F À 1 p; rÃ; c rÃ=c pÀ 1 ; 0 < p < 1: ð À Þ ¼ ð Þð À Þ Motivation Univariate EVT Non-stationary extremes Spatial extremes ConclusionsHere rà and c are the scale and shape parameters, re- spectively. The interpretation of the shape parameter c is equivalent to that for the GEV distribution (e.g., if c > 0, then the GP distribution is heavy tailed). By Our first toy example convention, c 0 refers to the limiting case obtained as ¼ c 0 in Eq. (4) of the exponential distribution (i.e., an Fig. 3. Annual peak flow of Potomac River at Point of Rocks, MD, unbou! nded, thin tail). USA, 1895–2000. Let X1; X2; . . . ; Xn, denote a time series (assumed, for 1292 R.W. Katz et al. / Advances in Water Resources 25 (2002) 1287–1304 now, to be independent and identically distributed) whose high extreme values are of interest. The Poisson– 3.1.1. Poisson–GP model GP model consists of two components (Chapter 4 in [15,25], Chapter 5 in [77]): (i) the occurrences of ex- Arising as an approximation for the distribution of ceedances of some high threshold u (i.e., Xi > u, for some excesses above a high threshold, the cumulativei) are generatdistri-ed by a Poisson process (with rate parameter bution and quantile functions for the GP arek);givenand (ii)by:the excesses over threshold u (i.e., Xi u, for some i) have a GP distribution (with scale andÀ shape 1=c F x; rÃ; c 1 1 c x=rà À ; parameters, rà and c). The scale parameter rà of the GP ð Þ ¼ À ½ þ ð ފ distribution differs from that for the GEV by an amount rà > 0; 1 c x=rà > 0; 4 þ ð Þ dependingðonÞ the threshold u (see Eq. (A.3) in Appendix 1 c F À 1 p; rÃ; c rÃ=c pÀ 1 ; 0 < p 0, then the GP distribution is heavy tailed). By Fig. 4. Q–Qconventiplot for fiton,of GEVc distribution0 refers totoannualthe limitingpeak flow ofcase obtained as Potomac River (line of equality¼indicates perfect fit). 3.1.2. Point process approach c 0 in Eq. (4) of the exponential distribution (i.e., an ! Among others, Smith [85] developed the statistical Fig. 3. Annual peak flow of Potomac River at Point of Rocks, MD, unbounded, thin tail). theory needed to apply the point process approach to (Fig. 4) indicates that the fit is reasonably adequate, USA,ξˆ1895–20= .00. . ξ = the statistics of extremes. In essence, this approach in- 0 191 with a P-value of 0 002 for likelihoodeven in the Letuppe ratioXr 1tail.; testX2;In. .Secti of. ; Xonn, den5.2.02,oteanothera timeanseriesnual (assumed, for volves representing the two components of the Poisson– peak flownow,time serito esbewillindependenbe analyzed fort andwhichidenticallthe fit of y distributed) GP model (i.e., the occurrence of exceedances and the the GEVwhosedistributhighion doeextres notmeapvaluespear toarebe acceptaof interble.est. The Poisson– excesses over a high threshold) jointly as a two-dimen- GP model consists of two components (Chsionalapternonho4 inmogenous Poisson process (one dimension [15,25], Chapter 5 in [77]): (i) the occurrenceis time,s oftheex-other the excess values). In this way, features 3. Methodceedancesological develoof somepmentshigh threshold u (i.e., Xi > ofu,theforGEsomeV distribution for block maxima and the POT i) are generated by a Poisson process (with rateapproachparametercan be combined. In particular, the GEV 3.1. Theoretical framework distribution can be indirectly fitted via the POT method, k); and (ii) the excesses over threshold u (i.e., X u, for buti Àstill in terms of the GEV parameterization. In this Underlyingsome thei) havePOT amethodGP distis aributionformal statistica(with lscale way,andtheshapescale parameter r is invariant with respect to model, conparameters,sisting of arPoià anssond c).processThe scalefor theparameoccur-ter rÃtheofchoicethe GPof threshold u, the extension to time-depen- rence ofdistan ributexceedanceion diofffersa highfromthresthatholdforanthed a GEVgen- bydeanntamountparameters (e.g., covariates) is immediate, and even eralized deParetopending(GP) ondistributionthe thresforholdthe euxcess(seeoverEq.the(A.3) inthresAppendixholds that vary with time (e.g., because of annual thresholdA).(termedAs previous‘‘Poisson–Gly menP model’tioned,’). Athebasicassumref- ptioncyclesof orinde-trends) are permissible (Chapter 7 in [15]). erence for the point process representation of extremes is Suppose that it is desired to fit the GEV distribution, Leadbetteperndenceet al. (Chaptercan be5 irelaxn [59],edalsobyseedealingChapterwith7 in clusterwithmaxiparametersma l, r, and c, for the maximum over some [15]). instead of all exceedances, and one way to relaxtime theperiodas-denoted by 1=h. In other words, the time sumption of identical distribution is by letting the pa- rameters of the Poisson–GP model depend on covariates (e.g., annual or diurnal cycles). Fig. 4. Q–Q plot for fit of GEV distribution to annual peak flow of Potomac River (line of equality indicates perfect fit). 3.1.2. Point process approach Among others, Smith [85] developed the statistical theory needed to apply the point process approach to (Fig. 4) indicates that the fit is reasonably adequate, the statistics of extremes. In essence, this approach in- even in the upper tail. In Section 5.2.2, another annual volves representing the two components of the Poisson– peak flow time series will be analyzed for which the fit of GP model (i.e., the occurrence of exceedances and the the GEV distribution does not appear to be acceptable. excesses over a high threshold) jointly as a two-dimen- sional nonhomogenous Poisson process (one dimension is time, the other the excess values). In this way, features 3. Methodological developments of the GEV distribution for block maxima and the POT approach can be combined. In particular, the GEV 3.1. Theoretical framework distribution can be indirectly fitted via the POT method, but still in terms of the GEV parameterization. In this Underlying the POT method is a formal statistical way, the scale parameter r is invariant with respect to model, consisting of a Poisson process for the occur- the choice of threshold u, the extension to time-depen- rence of an exceedance of a high threshold and a gen- dent parameters (e.g., covariates) is immediate, and even eralized Pareto (GP) distribution for the excess over the thresholds that vary with time (e.g., because of annual threshold (termed ‘‘Poisson–GP model’’). A basic ref- cycles or trends) are permissible (Chapter 7 in [15]). erence for the point process representation of extremes is Suppose that it is desired to fit the GEV distribution, Leadbetter et al. (Chapter 5 in [59], also see Chapter 7 in with parameters l, r, and c, for the maximum over some [15]). time period denoted by 1=h. In other words, the time Improve Flood From a time series to. . . Quantile Estimates by Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Mathieu Ribatet ! ! Peak overAn Thresholdnual M (POT)axima Time series ⇒ 1 obs/year Intro ! POT ! Time series ⇒ λ obs/year Motivations Extreme Value Theory ! Markovian ! Time series ⇒ all exceedances Improve Inferences Modelling All Exceedances Justification Theory

First (Few) Results Test for Asymptotic Dependence Comparison Between All Markovian Models Inference on Flood Duration

Conclusions and Perspectives

Some References

11/29 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Thresholding : the Generalized Pareto Distribution (GPD)

„ ξ y «−1/ξ P{R−u > y|R > u} = 1 + σu +

Vilfredo Pareto : 1848-1923

Born in France and trained as an engineer in Italy, he turned to the social sciences and ended his career in Switzerland. He formulated the power-law distribution (or ”Pareto’s Law”), as a model for how income or wealth is distributed across society. Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Generalized Pareto Distribution (GPD)

“ ”−1/ξ {R − u > y|R > u} = 1 + ξ y P σu

Parameters u = predetermined threshold

σu = scale parameter to be estimated ξ = shape parameter to be estimated

Advantages & Practical issues Flexibility to describe three different types of tail behavior More data are kept for the statistical inference Problem of threshold selection Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

GPD

„ ξ y «−1/ξ P{R − u > y|R > u} = 1 + σu +

Special cases (home work D)

Unit-Frechet` F(x) = exp(−1/x) for x > 0. Then σu = 1 and ξ = −1/α

Exponential F(x) = 1 − exp(−x) for x > 0. Then σu = 1 and ξ = 0

Uniform F(x) = x for 0 < x < 1. Then σu = 1 and ξ = −1

Stability property (home work E)

If the exceedance (R − u|R > u) follows a GPD(σu,ξ ) then the higher exceedance (R − v|R > v) also follows GPD(σu + (v − u)ξ,ξ ) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

GPD : “From Bounded to Heavy tails” 5 . 1 0 . 1

!=-0.5 5 . 0 0 . 0 7

6 Index 5 4 3 2 1 !=0.0 0 0

Index 2 5 1 0 1 5

!=0.5 0

0 50 100 150 200 250 300 Index Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Estimating the GPD parameters estimates (ˆσu, ξˆ) Maximum likelihood estimation Methods of moments type (PWM and GPWM) Exhaustive tail-index approaches

Taking advantages of the stability property Mean Excess function σ + uξ (R − u|R > u) = u E 1 − ξ the scale parameter varies linearly in the threshold u the shape parameter ξ is fixed wrt the threshold u Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

GPD diagnostics & models selection for our Crete data

Probability plot Quantile Plot

400 400 !! ! ! ! ! 350 350

0.8 !!

300 300 !! !! ! 600

250 250 ! ! ! model 0.4 ! empirical 200 200 Mean Excess Mean Excess !! !! !! ! !! 150 150 ! !! !! !!!!! ! 200 ! 100 100 0.0

50 50 0.2 0.6 1.0 200 400 600 800 150 160 170 180 190 200 150 160 170 180 190 200 u empiricalu model

ξˆ = 0.56 (0.37) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

GPD

GPD return level zp

» –−ξ ! σu p zp = u + − 1 ξ P(R > u)

Estimating the return level zp

» –−ξˆ ! σˆu p × n zˆp = u + − 1 ξˆ Nu Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

A first summary

Applied statistics So far, we have assumed an idd model Non-stationarity Asymptotic probability suggests GEV and GPD Univariate Multivariate Maximum likelihood approach Non-parametric provides asymptotic parameters Parametric and return levels uncertainties Goodness of fit and model Independence selection

Theoritical probability Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Outline

1 Motivation Heavy rainfalls Three applications

2 Univariate EVT Asymptotic result Historical perspective GPD Parameters estimation Brief summary of univariate iid EVT

3 Non-stationary extremes Spatial interpolation of return levels Downscaling of heavy rainfalls

4 Spatial extremes Assessing spatial dependences among maxima

5 Conclusions

II Main Part Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Daily precipitation (April-October, 1948-2001, 56 stations) 41

Ft. Collins● 40

Denver● Breckenridge ● Limon●

Grand● Junction 39 Colo ●Spgs latitude

Pueblo● 38 37

−109 −108 −107 −106 −105 −104 −103 −102

longitude Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Precipitation in Colorado’s front range

Data 56 weather stations in Colorado (semi-arid and mountainous region) Daily precipitation for the months April-October Time span = 1948-2001 Not all stations have the same number of data points Precision : 1971 from 1/10th of an inche to 1/100

D. Cooley, D. Nychka and P. Naveau, (2007). Bayesian Spatial Modeling of Extreme Precipitation Return Levels. Journal of The American Statistical Association (in press). Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Our main assumptions

Process layer : The scale and shape GPD parameters (ξ(x), σ(x)) are random fields and result from an unobservable latent spatial process Conditional independence : precipitation are independent given the GPD parameters

Our main variable change

σ(x) = exp(φ(x)) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Hierarchical Bayesian Model with three levels

P(process, parameters|data) ∝ P(data|process, parameters) ×P(process|parameters) ×P(parameters) Process level : the scale and shape GPD parameters (ξ(x), σ(x)) are hidden random fields Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Our three levels

A) Data layer := P(data|process, parameters) =

„ «−1/ξi ξi y Pθ{R(xi) − u > y|R(xi) > u} = 1 + exp φi B) Process layer := P(process|parameters):

e.g. φi = α0 + α1 × elevationi + MVN (0,β 0 exp(−β1||x k − x j ||))

 ξmoutains, if x i ∈ mountains and ξi = ξplains, if x i ∈ plains

C) Parameters layer (priors) := P(parameters): e.g. (ξmoutains,ξ plains) Gaussian distribution with non-informative mean and variance Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

BayesianNo hierarchicaltre m modelingod`ele Bayesien hi´erarchique

Priors - α0 + α1 elev ξmoutains  Priors

@ @R © σ - zx  ξ

 @I 6 @

- β exp( β . ) P (R(x) > u) ξ  Priors 0 − 1|| || plains Priors

6 Priors

Extr^emes? Mesurer Interpoler R´egionaliser 16 ⇓ ↓ ↑ ⇑ JASA jasa v.2004/12/09 Prn:12/10/2006; 18:07 F:jasaap05380r.tex; (R) p. 10

Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions 10 Journal of the American Statistical Association, ???? 0

1 the same manner as φ when modeled spatially.ModelThe exceedance selection Table 1. Several of the Different GPD Hierarchical Models Tested and 60 Their Corresponding DIC Scores 2 rate model is handled analogously. 61 3 We ran three parallel chains for each model. Each simula- 62 Baseline model D¯ pD DIC 4 tion consisted of 20,000 iterations, the first 2,000 iterations 63 Model 0: φ φ 73,595.5 2.0 73,597.2 5 were considered to be burn-in time. Of the remaining itera- ξ = ξ 64 = 6 tions, every 10th iteration was kept to reduce dependence. We 65 Models in latitude/longitude space D pD DIC 7 used the criterion R as suggested by Gelman (1996) to test for ¯ 66 ˆ 8 convergence and assumed that values below the suggested crit- Model 1: φ α0 %φ 73,442.0 40.9 73,482.9 67 ξ = ξ + 9 ical value of 1.1 imply convergence. For all parameters of all = 68 Model 2: φ α0 α1(msp) %φ 73,441.6 40.8 73,482.4 10 models, the value of R is below 1.05 unless otherwise noted in ξ = ξ + + 69 ˆ = 11 Section 4.1. Model 3: φ α0 α1(elev) %φ 73,443.0 35.5 73,478.5 70 ξ = ξ + + 12 = 71 3.4 Spatial Interpolation and Inference Model 4: φ α0 α1(elev) α2(msp) %φ 73,443.7 35.0 73,478.6 13 ξ = ξ + + + 72 = 14 Our goal is to estimate the posterior distribution for the return 73 Models in climate space D¯ pD DIC 15 level for every location in the study region. From (3), zr(x) is a 74 Model 5: φ α0 %φ 73,437.1 30.4 73,467.5 16 function of the GPD parameters φ(x), ξ(x), and the (indepen- ξ = ξ + 75 = 17 dent) exceedance rate parameter ζ(x); thus, it is sufficient to Model 6: φ α0 α1(elev) %φ 73,438.8 28.3 73,467.1 76 ξ = ξ + + 18 estimate the posteriors of these processes. Our method allows = 77 Model 7: φ α0 %φ 73,437.5 28.8 73,466.3 19 us to draw samples from these distributions, which in turn can = + 78 ξ ξ mtn, ξ = plains 20 be used to produce draws from zr(x). Model 8: φ α0 α1(elev) %φ 73,436.7 30.3 73,467.0 79 = + + ξ ξ mtn, ξ 21 To illustrate our interpolation method, consider the log- = plains 80 Model 9: φ α0 %φ 73,433.9 54.6 73,488.5 22 transformed GPD scale parameter of the exceedance model. We = + 81 ξ ξ %ξ = + 23 begin with values for φ, αφ, and βφ from which we need to 82 NOTE: Models in the climate space had better scores than models in the longitude/latitude 24 interpolate the value of φ(x). We have assumed that the para- 83 space. % MVN(0, &), where [σ ]i, j β , 0 exp( β , 1 xi xj ). · ∼ = · − · # − # 25 meters αφ and βφ, respectively, determine the mean and co- 84 26 variance structure of the Gaussian process for φ(x). Using the 85 27 values of αφ and βφ, we are able to draw from the conditional φ(x) is modeled as in Section 3.1 and ξ(x) is modeled as a 86 28 distribution for φ(x) given the current values of φ. Doing this single value throughout the region. We allow the mean of the 87 29 for each iteration of the MCMC algorithm provides draws from scale parameter to be a linear function of elevation and/or MSP 88 30 the posterior distribution of φ(x). (Models 2, 3, and 4). To our surprise, we find that elevation 89 31 We do the same for the exceedance rate parameter ζ(x) and outperforms MSP as a covariate and, in fact, adding MSP does 90 32 for the GPD shape parameter ξ(x) if it is modeled spatially. not improve the model over including elevation alone. Unfor- 91 33 Pointwise means are used as point estimates for each of the pa- tunately, the maps produced by these simple models in the 92 34 rameters (Fig. 7). The entire collection of draws from the pos- traditional space seem to inadequately describe the extreme 93 35 terior distributions of φ(x), ξ(x), and ζ(x) are used to produce precipitation. For example, the point estimate maps for φ(x) 94 36 draws from the return level posterior distribution. The point- show relatively high values around the cities of Boulder and 95 37 wise quantiles and pointwise means of the posterior draws are Fort Collins but do not show similar values for the stationless 96 38 used for the return level maps (Figs. 8 and 9). region between the cities despite that it has a similar climate 97 and geography. 39 4. RESULTS 98 40 When we perform the analysis for the climate space, we ob- 99 41 4.1 Model Selection and Map Results tain better results. Both the model fit score and the effective 100 42 As in a regression study, we test both the threshold ex- number of parameters are lower in the climate space, yield- 101 43 ceedance and the exceedance rate models with different covari- ing lower DIC scores for corresponding models (e.g., Models 102 44 ates. To assess model quality, we use the deviance information 1 and 5 or Models 3 and 6). However, in the climate space, 103 45 criterion (DIC) (Spiegelhalter, Best, Carlin, and van der Linde adding elevation (or MSP) as a covariate does not seem to im- 104 46 2002) as a guide. The DIC produces a measure of model fit D prove the model as these covariates are already integrated into 105 47 ¯ the analysis as the locations’ coordinates. Most important, when 106 and a measure of model complexity pD and sums them to get 48 an overall score (lower is better). As the DIC scores result from the points are translated back to the original space, we obtain 107 49 the realizations of an MCMC run, there is some randomness parameter estimate maps that seem to better agree with the ge- 108 50 in them, and, therefore, nested models do not always have im- ography. 109 51 proved fits. We do not solely rely on the DIC to choose the most We then begin to add complexity to the shape parameter ξ(x). 110 52 appropriate model. Because our project is product oriented (i.e., Allowing the mountain stations and plains stations to have 111 53 we want to produce a map), we also considered the statistical separate shape parameter values slightly improves model fit 112 54 and climatological characteristics of each model’s map, as well (Model 7), but a fully spatial model for ξ(x) does not improve 113 55 as their uncertainty measures. model fit enough to warrant the added complexity (Model 9). 114 56 We first discuss the model for threshold exceedances. Ta- Model 7 is chosen as the most appropriate model tested based 115 57 ble 1 shows the models tested and their corresponding DIC not only on its DIC score but also on the posteriors for the 116 58 scores. We begin developing models in the traditional lat- parameters ξmtn and ξplains. Selected posterior densities from 117 59 itude/longitude space and start with simple models where Model 7 are plotted in Figure 6. The left plot in Figure 7 shows 118 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Return levels posterior mean

Ft. Collins Ft. Collins ! Greeley ! Greeley ! !

Boulder 8 Boulder 8 ! ! 40 40 Denver Denver ! !

7 7

39 Colo Spgs 39 Colo Spgs

latitude ! latitude !

6 6 Pueblo Pueblo ! ! 38 38

5 5 37 37 !106.0 !105.0 !106.0 !105.0

longitude longitude Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Posterior quantiles of return levels (.025, .975)

Ft. Collins Ft. Collins ! ! ! ! ! Greeley ! Greeley ! 10 10 4.0 ! ! ! !! !

! ! Boulder Boulder

9 9 ! ! ! ! ! 40 40 40 3.5 Denver Denver ! ! ! ! ! ! ! ! ! !! ! ! 8 8 ! ! ! ! 3.0

! ! ! ! ! ! 7 7 ! !

39 Colo Spgs 39 Colo Spgs 39 !

latitude latitude latitude ! ! ! !! ! ! ! ! ! 2.5 ! 6 6

Pueblo Pueblo ! ! ! ! 2.0 5 5

38 38 38 ! !

!

! 4 4 ! 1.5

! ! 37 37 37 !106.0 !105.5 !105.0 !104.5 !106.0 !105.5 !105.0 !104.5 !106.0 !105.5 !105.0 !104.5

longitude longitude longitude Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Downscaling of rainfalls

Vrac and Naveau, (2007). Stochastic downscaling of precipitation : From dry events to heavy rainfalls. Water Resource Research (in press) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Our data

Local scale : Rt = Daily precipitation recorded at 37 stations 1980-1999 (DJF)

Large scale : X t = NCEP geopotential height, Q and DT at 850mb

Weather regimes : St = Four regimes of precipitation

Our objective :

What is the precipitation of Rt given the large and regional scale characteristics, X t and St ?

Subsidiary questions : What is the precipitation distribution at a given site ? What are meaningful regional patterns ? How to connect the different scales ?

Our strategy A GPD latent process (hidden markov process + logistic model) that depends on X t et de St Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Illinois rainfall patterns

Precipitation pattern 1 Precipitation pattern 2 '( '( '" '" '# '# %! %! %$ %$ %& %&

!!" !!# !$! !$$ !!" !!# !$! !$$

Precipitation pattern 3 Precipitation pattern 4 '( '( '" '" '# '# %! %! %$ %$ %& %&

!!" !!# !$! !$$ !!" !!# !$! !$$ Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

How to switch from one precipitation pattern to the other ?

Markov chains

P(St = j|St−1 = i) ∝ γij ..., but these transition probabilities are independent of the atmospheric variables X t like Q

Non-homogeneous Markov chains

» 1 – P(S = j|S = i, X ) ∝ γ exp − (X − µ )Σ−1(X − µ )0 t t−1 t ij 2 t ij t ij where Σ = atmospheric variables covariance

µij = atmospheric variables means Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Precipitation density : non-homogeneous Mixture Model

1.0 1.0 Mixture Weibull 0.8 GPD 0.01

0.6

0.0001 0.4

1e-06 0.2 Mixture Weibull GPD 0.0 1e-08 0.0 1.0 2.0 3.0 4.0 5.0 0.0 1.0 2.0 3.0 4.0 5.0

Mixture Distribution GPD and Gamma (Weibull after Frigessi et al. (2003))

1 “` ´ ” fmix(r) = 1 − wµ,τ (r) · fΓ(α,β)(r) + wµ,τ (r) · fG(σ,ξ)(r, u = 0) Z | {z } | {z } | {z } | {z } Gamma weight Gamma pdf GPD weight GPD Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Weight Function 1.0 0.8 0.6 GAMMA GPD 0.4 Weight function [1] 0.2 0.0

0 1 2 3 4 5 6 Precipitation [cm]

Weight Function Dynamic mixture model for unsupervised tail estimation without threshold selection (Frigessi et al., 2002) 1 1 “ r − µ ” w (r) = + arctan m,τ 2 π τ Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

QQ plots for the Spartan station

(a) (b)

(c) (d)

Figure 5: QQplots of precipitation patterns 2 and 3 for station “Sparta”, for function hβ in

(11) as a gamma distribution in (a) and (b) and hβ as a mixture (5) in (c) and (d). Units

are cm.

43 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Model selection

(0) gamma + GPD : parameters vary with location and precipitation pattern, (i) only gamma : parameters vary with location and precipitation pattern, (ii) gamma + GPD with one ξ parameter per pattern (iii) same as (ii) with τ set to be equal to 0, (iv) gamma + GPD with one common ξ for all stations and all patterns, (v) same as (iv) with τ set to be equal to 0. (iii)∗ same as model (iii) but only gamma distributions for pattern 1. Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Model selection

Station Model (0) Model (i) Model (ii) Model (iii) Model (iv) Model (v) Model (iii)∗

p = 24n p = 8n p = 20n + 4 p = 16n + 4 p = 20n + 1 p = 16n + 1 p = 12n + 5

Aledo AIC=-796.52 AIC=-816.58 AIC=-795.76 AIC=-809.79 AIC=-819.46 AIC=-816.18 AIC=-816.79

Aurora AIC=-1137.47 AIC=-1149.99 AIC=-1256.53 AIC=-1293.89 AIC=-1358.48 AIC=-1152.51 AIC=-1299.89

Fairfield AIC=14.36 AIC=103.07 AIC=22.45 AIC=22.37 AIC=-76.81 AIC=-10.21 AIC=16.37 3

7 Sparta AIC=277.10 AIC=372.92 AIC=235.65 AIC=228.35 AIC=231.91 AIC=251.44 AIC=222.35

Windsor AIC=-1014.80 AIC=-920.68 AIC=-1016.25 AIC=-1017.59 AIC=-1069.99 AIC=-1028.91 AIC=-1023.59

All five stations AIC=-4433.18 AIC=-4422.27 AIC=-4479.50 AIC=-4515.13 AIC=-4425.06 AIC=-4423.78 AIC=-4553.13

Table 3: Akaike Information Criterion (AIC) values obtained for our five selected weather stations and for our seven models.

The bold values correspond to the optimal criterion per row. Below each model’s name, the number p of parameters for n

stations is provided. Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Spatial Statistics for Maxima 40 3 30 2 How to describe the spatial 1 y 20 dependence as a function of

0 the distance between two

10 points ? −1

10 20 30 40

x Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Spatial Statistics for Maxima

● ● ● ●

40 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

30 ● ● ● ● ● ● ● ● ● ● ● ● ● ●

y ● ● ● How to perform

20 ● ● ● ● ● ● spatial interpolation of ● ● ● ● ● ● ● ● extreme events ? ● ● ● ●

10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0

10 20 30 40

x Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Spatial Statistics for Maxima

A few Approaches for modeling spatial extremes Max-stable processes : Adapting asymptotic results for multivariate extremes Schlather & Tawn (2003), Naveau et al. (2007), de Haan & Pereira (2005) Bayesian or latent models : spatial structure indirectly modeled via the EVT parameters distribution Coles & Tawn (1996), Cooley et al. (2005) Linear filtering : Auto-Regressive spatio-temporal heavy tailed processes, Davis and Mikosch (2007) Gaussian anamorphosis : Transforming the field into a Gaussian one Wackernagel (2003) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Max-stable processes

Max-stability in the univariate case with an unit-Frechet´ margin

F t (tx) = F(x), for F(x) = exp(−1/x)

Max-stability in the multivariate case with unit-Frechet´ margins

t F (tx1,..., txm) = F(x1,..., txm), for Fi (xi ) = exp(−1/xi ) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

A central question

P [M(x) < u, M(x + h) < v] =?? Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Bivariate case for Maxima from an asymptotic point of view

If one assumes unit Frechet´ margins then the distribution of the vector (M(x), M(x + h)) goes to

F(u, v) = exp [−Vh(u, v)]

where Z 1 „ w 1 − w « Vh(u, v) = 2 max , dHh(w) 0 u v R 1 with Hh(.) a distribution function on [0, 1] such that 0 w dHh(w) = 0.5.

Home work : check that F(u, v) is bivariate max-stable Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Bivariate case (M(x), M(x + h))

Complex non-parametric structure

Z 1 w 1 − w  Vh(u, v) = 2 max , dHh(w) 0 u v

How to estimate Vh(u, v) ? Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Geostatistics : Variograms

Complex non-parametric structure ●

● 1.0

● ● ● ● 1 2 0.8 γ(h) = E|Z(x + h) − Z(x)| ● 2 ● ● ● 0.6

● semivariance Finite if light tails 0.4 ● Capture all spatial ● 0.2 structure if {Z(x)}

0.0 Gaussian fields 0.0 0.2 0.4 0.6 0.8 but not well adapted for distance extremes Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

A madogram type

1 ν = |F(M(x + h)) − F(M(x))| h 2E

Properties Defined for light & heavy tails

nice link with EVT but only gives Vh(1, 1) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

1 νh = 2 E |F(M(x + h)) − F(M(x))|

simulated fields Madogram

40 ●

0.8 ●

● 3 ● 30 0.6 2

● 1 y 0.4 20 ● ● ● ● ● ● ●

0 ● estimated madogram ● ●

0.2 ● 10 ● ● −1 ● 0.0

10 20 30 40 1 4 6 8 10 12 14 16 18 20

x distance Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

The λ−madogram

1 ν (λ) = F λ(M(x + h)) − F 1−λ(M(x)) h 2E

Properties Defined for light & heavy tails Called a λ-Madogram Nice links with extreme value theory

νh(0) = νh(1) = 0.25 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

A fundamental relationship

Home work

Vh(λ, 1 − λ) 3 νh(λ) = − c(λ), with c(λ) = 1 + Vh(λ, 1 − λ) 2(1 + λ)(2 − λ)

Conversely, c(λ) + νh(λ) Vh(λ, 1 − λ) = 1 − c(λ) − νh(λ) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

The λ−madogram Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

30-year maxima of daily precipitation in Bourgogne

146 stations of maxima of daily precipitation over 1970-1999 in Bourgogne Pairwise spatial dependences of precipitation extremes over Belgium Ste´phane Vannitsem and Philippe Naveau Institut Royal Me´te´orologique de Belgique, Laboratoire des Sciences du Climat et de l’Environnement, IPSL-CNRS, France (Contact: [email protected]) variables defined by the empirical marginal distributions as 1 Aim 55 stations of the Climatological network 0.19 0.19 The practical use of the extreme value theory is usually based on the univariate asymptotic distri- 0.18 0.18 butions known as the Generalized Extreme Value distributions (GEV). Although this information 0.17 0.17 0.16 0.16 could be very useful when interested in the occurrence of extremes at the local level, it does 0.15 0.15 not provide the complete information on the joint occurrence of extremes in space, except in the (1) 0.14 0.14 0.13 0.13 case of independence between extremes. This type of information can however be of crucial im- Madogram 0.12 0.12 portance when the occurrence of a specific event (e.g. a river flood) depends on extreme events where the indices and indicate the position of the two series. To evaluate the dependence for 0.11 0.11 0.1 0.1 (e.g. precipitation) that have occurred concommitantly at different places. The importance of the this bivariate process, Coles et al (1999) propose a measure of the form 0.09 0.09 0.08 0.08 spatial features on the estimate of return times of extreme events over specific regions has been 0 50 100 150 200 250 300 also extensively discussed in Buishand (1984) and Coles and Tawn (1990). (2) distance (km) Multivariate as well as spatial approaches have been developped in order to model the joint Figure 3: occurrence of extreme events. The specific choice of a model depends however on the depen- For independent variables, , and for full dependence, . The extremal If one assumes that reaches a plateau for large, this plateau provide an estimate of dence between the extremes. Several techniques have been developed in order to evaluate this coefficient is defined as the extremal coefficient. One can therefore look at the variations of this estimate as a function dependence, two of which are compared in the present work to evaluate the spatial dependence of the distance between pairs. This is what is shown on the right panel of Fig. 2. The function (3) of precipitation extremes. To this aim, an ensemble of stations forming two different networks crosses the 0 line corresponding to the absence of dependences around 200 km. However one covering the Belgian territory are used. The second approach consists to evaluate the extremal dependence by using the madogram of the difficulty with this quantity is again the large uncertainty on the estimator of . that was found to provide a direct estimate of the extremal coefficient under the assumption of 19 stations, time of aggregation: 1 hour 19 stations, time of aggregation: 6 hours max-stability. In this case the extremal coeeficient is given by 0.19 0.19 0.22 0.22 2 The Observational networks 0.21 0.21 0.18 0.18 0.2 0.2 In the climatological network the precipitation accumulated daily from 8 AM up to 8 AM of the next 0.19 0.19 0.17 0.17 day is recorded. It is now composed of 269 stations. This network has of course experienced (4) 0.18 0.18 a lot of changes such as displacement or withdrawal of stations. To get a quite homogeneous 0.16 0.16 0.17 0.17

Madogram Madogram 0.16 0.16 0.15 0.15 network, we focus here on a subset of stations which have not experienced such displacements where 0.15 0.15 0.14 0.14 and covering the same period of time from 1951 to 2005. We end up with 55 stations covering 0.14 0.14 0.13 0.13 the whole belgian territory. The stars in Fig.1 indicate the locations of these stations. (5) 0.13 0.13 0.12 0.12 0 50 100 150 200 250 0 50 100 150 200 250 The hydrometeorological network is composed of 19 stations, of which 18 have been installed in distance (km) distance (km) 1968. It records the amount of precipitation at a very high frequency (every 10 minutes). Clearly is the estimator of the normalized madogram, which converges asymptotically to a gaussian dis- Figure 4: this network is less dense than the climatological network and some stations display long periods tribution with mean and variance with (see The second measure presented in Section 3 displays a smaller uncertainty on the extremal co- without data. Despite the less good coverage and maintenance of the stations, this network can Cooley, 2005). efficient estimate. Note that when using this approach, one assumes implicitely that the process provide useful information on the spatial properties of the extremes for smaller durations of the An extension of the previous approach has been recently developed by Naveau et al (2007) is falling into the class of bivariate extreme value distributions. In view of the discussion above, precipitation. These stations are marked by a plus in Figure 1. in which the extremes corresponding to different quantiles are compared. It is based on the some caution should be kept in mind. Figure 3 displays the madogram as well as the con- For both networks, the annual maxima have been selected for different period of accumulation of -madogram whose relation is fidence interval for the stations of the climatological network. The value associated with the the precipitations (the duration) and the spatial properties of the extremes have been explored. independence is 1/6. The result suggests that some dependences are present up to 200 km Before exploring these properties, a statistical test for a change in the mean of the time series (6) in quite good agreement with the estimate base on . Figure 4 displays the results ob- composing the network has been applied in order to partially check their stationarity. The ex- tained with the hydrometeorological network for durations of 1 hour (left panel) and 6 hours (right tremes for durations from 10 minutes up to about 1 day, display a change in the mean at the 5 panel). The spatial dependence is considerably reduced in this case (up to 50 km for a duration confidence level for the time series located close to the coastal zone. For longer durations, the which allows to evaluate for different quantile values, of 1 hour). change in the mean affects the other zones of the country. . This measure will also be briefly discussed in the following. Let us Motivationnow turn to theUnivariate EVT . Non-stationaryFigure 5 extremesdisplays the Spatial-madog extremesr am as a Conclusionsfunction of for The analysis of the spatial dependences has been performed on the set of stationary, as well as different distances h. The continuous line corresponds to the case for which the extremes are on the ensemble of stations of the networks. In both cases (tested for small durations), the results independent54-year, w maximahile the das of dailyhed precipitationline to the full independenc Belgium e. The empirical evaluations are falling are close. We will therefore present the statistics of the spatial dependences for the ensemble of 4 Application to the Belgian networks between these two asymptotic solutions and progressively converge to the independent solution stations. Figure 3 displays the value of as a function of u for pairs whose distances fall into the for increasing distances. interval 50-70 km. The quantile value interval has been divided into 10 bins for which 55 stations of the Climatological network is evaluated. Clearly, this function (symbols) decrease slowly as a function of u up to a plateau 0.25 beyond about a quantile of . Note that the two curves refer to the 95 confidence interval 0.2 as obtained with the delta method (Coles et al 1999). 0.15

55 stations of the climatological network 55 stations of the climatological network 0.1 0.4 0.4 0.8 0.8 l-madogram

0.35 50-70 km 0.35 0.6 0.6 0.05 0.3 0.3 0.4 0.4 0 0.25 0.25 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.2 0.2 l 0.2 0.2 Xi(u) Xi(0.8) 0 0 0-10 km independence 0.15 0.15 30-50 km full dependence -0.2 -0.2 0.1 0.1 70-90 km full dependence 130-150 km Figure 1: The two precipitation networks used in this study 0.05 0.05 -0.4 -0.4 Figure 5: 0 0 -0.6 -0.6 55 stations of precipitation maxima over 1951-2005 in Belgium 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 50 100 150 200 250 300 ********* u distance (km) Work supported by the NSF-GMC (ATM-0327936) grant and the European commission project NEST No 12975. 3 Spatial dependences: theoretical aspects Figure 2: Several techniques have been developed to handle the spatial dependence between extremes, For an extremal process falling into the class of bivariate extreme value distribution (Coles, 2001), two of which will be discussed here in the context of the analysis of precipitation. The first the value of is constant as a function of . In the present case, the decrease as a func- References [1] Buishand A. Bivariate extreme-value data and the station-year method. J. Hydrol., Vol. 69, 1984. approach is based on the notion of copula (see Coles, 2001), which relates the marginal distribu- tion of the quantile value suggests that the annual maxima do not fall into this category, except tions and the multivariate distribution as for large quantile values for which an apparent convergence toward a plateau is found. We can [2] Coles S.G. An introduction to statistical modeling of extreme values. Springer Series in Statistics, 208 p, 2001. If the marginal distributions are continuous, this function C is unique. It can be therefore only expect that the asymptotic theory is only valid for very large extreme values. To [3] Coles S.G., J. Heffernan and J.A. Tawn Dependence measures for multivariate extremes. Extremes, Vol 2, 339-365. obtained through a change of variables from to the new variables check this point, an analysis of the behavior of for extremes selected on longer time . C is the multivariate distribution of the variables, . Note that the new variables [4] Coles S.G. and J.A. Tawn Statistics of coastal flood prevention. Phil. Trans. R. Soc. Lond. A, Vol. 332, 457, 1990. windows (2, 3, ... years) would be necessary. Unfortunately, this approach will increase sub- are uniformly distributed between 0 and 1. In the present paper we focus on the bivariate [5] Cooley D. Statistical analysis of extremes motivated by weather and climate studies: applied and theoretical stantially the uncertainty with which the evaluation of is made, since it depends on the distributions since they already provide important information on the spatial dependences. In- advances. PhD thesis, University of Colorado, 110p, 2005. number of extremal events used. stead of looking at the properties of the original variables, one analyzes the properties of the new [6] Naveau P., A. Guillou, D. Cooley and J. Biebolt. Modeling pairwise dependence of maxima in space. In press in ...... Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions

Conclusions

Applied statistics

An asymptotic probabilistic concept Non-stationarity A statistical modeling approach Identifying clearly assumptions Univariate Multivariate Non-parametric Parametric Assessing uncertainties Goodness of fit and model selection Independence

Theoritical probability