Extreme Value Theory (Or How to Go Beyond of the Range Data)
Total Page:16
File Type:pdf, Size:1020Kb
Extreme Value Theory (or how to go beyond of the range data) Sept 2007, Romania Philippe Naveau Laboratoire des Sciences du Climat et l’Environnement (LSCE) Gif-sur-Yvette, France Katz et al., Statistics of extremes in hydrology, Advances in Water Resources 25 (2002) 1287-1304 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Extreme quotes 1 “Man can believe the impossible, but man can never believe the improbable” Oscar Wilde (Intentions, 1891) 2 “Il est impossible que l’improbable n’arrive jamais” Emil Julius Gumbel (1891-1966) Extreme events ? ... a probabilistic concept linked to the tail behavior : low frequency of occurrence, large uncertainty and sometimes strong amplitude. Region of interest Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Important issues in Extreme Value Theory Applied statistics An asymptotic probabilistic concept Non-stationarity A statistical modeling approach Identifying clearly assumptions Univariate Multivariate Non-parametric Parametric Assessing uncertainties Goodness of fit and model selection Independence Theoritical probability Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Outline 1 Motivation Heavy rainfalls Three applications 2 Univariate EVT Asymptotic result Historical perspective GPD Parameters estimation Brief summary of univariate iid EVT 3 Non-stationary extremes Spatial interpolation of return levels Downscaling of heavy rainfalls 4 Spatial extremes Assessing spatial dependences among maxima 5 Conclusions II Main Part Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Why heavy rainfalls are important in geosciences ? ”It is very likely that hot extremes, heat waves, and heavy precipitation events will continue to become more frequent” and that “precipitation is highly variable spatially and temporally” The policymakers summary of the 2007 Intergovernmental Panel on Climate Change Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Random variable types Climate : maxima or mimina (daily, monthly, annually), dry spells, etc Hydrology : return levels. a quantile estimation pb : how to find zp such that P(Z > zp) = p =) Exceedances Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Return levels and return periods A return level with a return period of T = 1=p years is a high threshold zp whose probability of exceedance is p. E.g., p = 0.01 ) T = 100 years. Return level interpretations Waiting time : Average waiting time until next occurrence of event is T years Number of events : Average number of events occurring within a T -year time period is one Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Our main random variable of interest : precipitation 1 Relevant parameter in meteorology and climatology 2 Highly stochastic nature compared to other meteorological parameters era40 200 cccma3.1/t47[5] giss er miroc3.2/hires cccma3.1/t63 inm cm3.0 ncep2 miroc3.2/medres[3] cnrm cm3 ipsl cm4[2] era15 mpi echam5 100 echo g[3] mri cgcm2.3.2[5] gfdl cm2.0 ncar ccsm3[6] 1 gfdl cm2.1 ncar pcm1[4] ! y 50 giss aom a d m m 20 ncep1 10 ncep2 ncep1 era40 era15 P20, 1981!2000 5 60S 30S 0E 30N 60N 100 Kharin andcccma3.1/t47[5] Zwiers,giss Journal er of Climate 2007era40 miroc3.2/hires ncep2, P20 (1981-2000) cccma3.1/t63 inm cm3.0 era15 miroc3.2/medres[3] 50 cnrm cm3 ipsl cm4[2] mpi echam5 echo g[3] mri cgcm2.3.2[5] gfdl cm2.0 ncar ccsm3[6] 1 gfdl cm2.1 ncar pcm1[4] ! y 20 giss aom a d cmap ncep1 m 10 m 5 ncep2 ncep1 5 cmap era40 era15 P20, 1981!2000 2 60S 30S 0E 30N 60N 200 1981!2000 100 1 50 ! y a d m m 20 10 P20 ncep2 era15 P5 5 20 era40 cmap ASI LND STR AFR ANT SAS GLB NTR SHE AUS NHE EUR ARC TRO SAM OCN NAM global regions zonal bands land only regions Fig. 6 Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Heavy rainfall distributions The problems at hand Classical distributions (Gamma, Weibull, Stretched-exponential, . .) not satisfying for extremes EVT not adequate low and medium precipitation Our main question How to go beyond the univariate site-per-site modeling and to take into account the spatial pairwise dependence among sites ? Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Three applications Measuring the spatial dependence among maxima (Max-stable processes) : Vannitsem & Naveau (2007), Schlather & Tawn (2003), de Haan & Pereira (2005) Spatial Interpolation of return levels in Colorado (Hierarchical Bayesian models) : Cooley, Nychka and Naveau (2007), Coles & Tawn (1996). Downscaling extremes over Illinois (latent processes) : Vrac and Naveau (2007) Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Two toy examples Annual maximum peak flow Crete ice core 1292 R.W. Katz et al. / Advances in Water Resources 25 (2002) 1287–1304 Potomac river (cfs)Data = crete Greenland (ecm) 3.1.1. Poisson–GP model Arising as an approximation for the distribution of 1000 excesses above a high threshold, the cumulative distri- 800 bution and quantile functions for the GP are given by: 600 1=c F x; rÃ; c 1 1 c x=rà À ; ð Þ ¼ À ½ þ ð Þ 400 rà > 0; 1 c x=rà > 0; 4 þ ð Þ ð Þ 1 c 200 F À 1 p; rÃ; c rÃ=c pÀ 1 ; 0 < p < 1: ð À Þ ¼ ð Þð À Þ 0 Here rà and c are the scale and shape parameters, re- 0.8 spectively. The interpretation of the shape parameter c is Time 0.4 equivalent to that for the GEV distribution (e.g., if Posterior proba c > 0, then the GP distribution is heavy tailed). By 0.0 convention, c 0 refers to the limiting case obtained as 600 800 1000 ¼1200 1400 1600 1800 2000 c 0 in Eq. (4) Timeof the exponential distribution (i.e., an Fig. 3. Annual peak flow of Potomac River at Point of Rocks, MD, unbou! nded, thin tail). USA, 1895–2000. Let X1; X2; . ; Xn, denote a time series (assumed, for now, to be independent and identically distributed) whose high extreme values are of interest. The Poisson– GP model consists of two components (Chapter 4 in [15,25], Chapter 5 in [77]): (i) the occurrences of ex- ceedances of some high threshold u (i.e., Xi > u, for some i) are generated by a Poisson process (with rate parameter k); and (ii) the excesses over threshold u (i.e., Xi u, for some i) have a GP distribution (with scale andÀ shape parameters, rà and c). The scale parameter rà of the GP distribution differs from that for the GEV by an amount depending on the threshold u (see Eq. (A.3) in Appendix A). As previously mentioned, the assumption of inde- pendence can be relaxed by dealing with cluster maxima instead of all exceedances, and one way to relax the as- sumption of identical distribution is by letting the pa- rameters of the Poisson–GP model depend on covariates (e.g., annual or diurnal cycles). Fig. 4. Q–Q plot for fit of GEV distribution to annual peak flow of Potomac River (line of equality indicates perfect fit). 3.1.2. Point process approach Among others, Smith [85] developed the statistical theory needed to apply the point process approach to (Fig. 4) indicates that the fit is reasonably adequate, the statistics of extremes. In essence, this approach in- even in the upper tail. In Section 5.2.2, another annual volves representing the two components of the Poisson– peak flow time series will be analyzed for which the fit of GP model (i.e., the occurrence of exceedances and the the GEV distribution does not appear to be acceptable. excesses over a high threshold) jointly as a two-dimen- sional nonhomogenous Poisson process (one dimension is time, the other the excess values). In this way, features 3. Methodological developments of the GEV distribution for block maxima and the POT approach can be combined. In particular, the GEV 3.1. Theoretical framework distribution can be indirectly fitted via the POT method, but still in terms of the GEV parameterization. In this Underlying the POT method is a formal statistical way, the scale parameter r is invariant with respect to model, consisting of a Poisson process for the occur- the choice of threshold u, the extension to time-depen- rence of an exceedance of a high threshold and a gen- dent parameters (e.g., covariates) is immediate, and even eralized Pareto (GP) distribution for the excess over the thresholds that vary with time (e.g., because of annual threshold (termed ‘‘Poisson–GP model’’). A basic ref- cycles or trends) are permissible (Chapter 7 in [15]). erence for the point process representation of extremes is Suppose that it is desired to fit the GEV distribution, Leadbetter et al. (Chapter 5 in [59], also see Chapter 7 in with parameters l, r, and c, for the maximum over some [15]). time period denoted by 1=h. In other words, the time Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Maxima DistributionGumbel (1891-1966) Weibull (1887-1979) Fr´echet (1878-1973) Distribution du maximum Normal density Gumbel density ⇒ ⇐ Uniform density Weibull density ⇒ ⇐ Cauchy density Fr´echet density ⇒ ⇐ n = 50 n = 100 Extr^emes? Mesurer Interpoler R´egionaliser 6 ⇓ ↓ ↑ ⇑ Motivation Univariate EVT Non-stationary extremes Spatial extremes Conclusions Max-stability Let Mn = max(X1;:::; Xn) with Xi iid with distribution F. Problem : find an and bn for a given F such that „ « Mn − an n P < x = F (anx + bn) = F(x) bn Home work A Unit-Frechet` F(x) = exp(−1=x) for x > 0. Then an = 1 & bn = 0 Gumbel F(x) = exp(− exp(−x)) for all real x.