Statistical Extreme Value Theory (EVT) Part II

Eric Gilleland Research Applications Laboratory 21 July 2014, Trieste, Italy Peaks Over Threshold (POT)

Poisson distribution applicable for modeling the frequency of exceeding a high threshold (i.e., low probability, rare, event).

What about the intensity of values that exceed the threshold?

UCAR Confidential and Proprietary. © 2008, University Corporation for Atmospheric Research. All rights reserved.

Peaks Over Threshold (POT)

• X

• Let Y = X – u, conditional on X > u, where u is a high threshold. • Model the “excesses” over the threshold. • Y has an approximate generalized Pareto (GP) distribution for high u with cdf:

−1/ξ ⎡ y ⎤ y H (y;σ (u),ξ) = 1− ⎢1+ ξ ⎥ , y > 0, ξ > 0 ⎣ σ (u)⎦ σ (u) σ (u) > 0

UCAR Confidential and Proprietary. © 2008, University Corporation for Atmospheric Research. All rights reserved.

Peaks Over Threshold (POT)

Three types of Extreme Beta df; Value df’s 1.2 bounded Beta Exponential upper tail 1.0 Pareto

0.8 Exponential; 0.6

density light tail 0.4

0.2 Pareto; heavy tail 0.0

0 2 4 6 8 10

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT)

• ξ < 0 yields the Beta cdf (bounded upper tail) upper bound at: u – σ(u) / ξ

• ξ = 0 yields the exponential cdf (“light” upper tail)

• ξ > 0 yields the Pareto cdf (“heavy” upper tail)

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT) Connection between GEV and GP families

• Maximum Mn ≤ u if no Xi > u, i = 1, 2, …, n

• “Memoryless” property of exponential distribution Pr{ Y > y + y’ | Y > y’ } = Pr{ Y > y }

of GP distribution • Lose memoryless property (need to rescale)

• If Y = X – u (X > u) has exact GP distribution with parameters σ(u) > 0 and ξ, then excess over a higher threshold u’ > u follows the GP distribution with parameters σ(u’) > 0 and ξ, where

σ(u’) = σ(u) + ξ(u’ – u), u’ > u

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT) 60 40 20 Hurricane Damage (billion USD) 0

1930 1940 1950 1960 1970 1980 1990 Year

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaksfevd(x = Dam, Over data = damage, thresholdThreshold = 6, type = "GP", time.units = "2/year")(POT)

● ●

70 1−1 line regression line 95% confidence bands 80 50 60 ● 30 40 ●

Empirical Quantiles ● ● ● ● 20 ● ● ● ● ●●● ●●● ●● ●● 10 ● ●●● ●●●●● ●●

10 15 20 25 30 35 Data Quantiles from Model Simulated 10 20 30 40 50 60 70

Model Quantiles Dam( > 6) Empirical Quantiles

Empirical Modeled 0.15 300 0.10

100 ● Density ●

Return Level ●●●●●●●●●●●●●● ● ● 0.05 100 − 0.00 0 20 40 60 2 5 10 50 200 500

N = 18 Bandwidth = 2.289 Return Period (years)

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT)

95% lower CI Estimate 95% upper CI

σ(u = 6 billion USD) 1.03 billion USD 4.59 billion USD 8.15 billion USD

ξ -0.15 0.51 1.18

100-year return level 0.65 billion USD 43.64 billion USD 86.64 billion USD (billion USD)

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): GP return levels

• First need quantile of GP cdf

-1 • xp = H (1 – p; σ(u), ξ)

• = (σ(u) / ξ) × (p-ξ – 1), 0 < p < 1

• Complication: must account for the rate, ζ, of exceeding the threshold for interpretability, as well

as the number of events per year, ny.

• replace p-1 with m = return period of interest×ny×ζ

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): GP return levels

Hurricane example

Lack of structure in data: some years have no hurricanes, some one, some two, etc.

Reasonable to use an average number per year in place of ny.

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Threshold Selection

• The Bias-Variance Trade-Off

• Want a high threshold to obtain better GP approximation

• Want a lower threshold for more reliable estimation (more data!)

• Difficult to automate selection

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Threshold Selection

Invariance of GP above threshold

• ξ does not change

• σ(u) is a function of the threshold

• The modification, σ* = σ(u) – ξu, is no longer a function of the threshold, and does not change

• Check for stability in parameter estimates as the threshold varies.

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT):

Thresholdthreshrange.plot(x = damage$Dam,Selection r = c(2, 7)) 10 ● ● ● ●

5 ● ● ● ● ● ● 0 5 − reparameterized scale reparameterized

2 3 4 5 6 7 1.0

● ● ● ●

0.5 ●

shape ● ● ● ● ● 0.0

2 3 4 5 6 7

Threshold

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Dependence in Threshold Excess Data

• Remove dependence (e.g., decluster)

• Runs declustering perhaps the simplest

• Model the dependence (e.g., through bivariate extreme value models)

• Concept of “extremal index”

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Dependence in Threshold Excess Data

Extremal Index, θ, 0 < θ ≤ 1

Mean cluster size ≈ 1 / θ

θ = 1 implies a lack of clustering at high levels, and degree of clustering at high levels increases as θ decreases

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Dependence in Threshold Excess Data

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Dependence in Threshold Excess

Full time Data range is 1 August θ ≈ 0.08 1961 to 28 1500 September r = 29 2010 1000 164 clusters 500 Red River flow (Port Royal, Texas) Royal, (Port flow Red River 0

05/62 11/62 06/63 12/63 Time (days)

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Dependence in Threshold Excess Data

1500 θ ≈ 0.85 1000 500 Red River flow (Port Royal, Texas) Royal, (Port flow Red River 0

05/62 11/62 06/63 12/63 Time (days)

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Modeling Frequency and Intensity

Poisson-GP Model

• Poisson process for exceeding a high threshold

• Event: Xt > u • Rate parameter: λ • Number of events in [0, T] has with parameeter λT

• GP distribution for excess over threshold

• Excess Yt = Xt – u given Xt > u • Scale and Shape parameters

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Modeling Frequency and Intensity

Point Process (PP) representation

• Subsumes Poisson-GP model • GEV parameterization (connection to GEV) • Can relate parameters of GEV(µ, σ, ξ) to those of the PP(λ, σ(u), ξ) • Shape parameter, ξ, is identical • ln λ = -(1/ξ) ln(1 + ξ(u – µ) / σ) • σ(u) = σ + ξ(u – µ) • Need time scale, h, to take account of the block size for GEV (e.g., h ≈ 1/365.25 for daily data and annual blocks)

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Modeling Frequency and Intensity

Point Process (PP) vs Poisson-GP

• Approaches are equivalent • Poisson-GP model • Can obtain the same parameter estimates indirectly through the relationships between the two parameterizations • Point Process • More convenient to quantify total uncertainty • Easier to interpret (eliminates dependence of parameters on the threshold)

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Modeling Frequency and Intensity 3 2 simulated data simulated 1 0

0 200 400 600 800 1000

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Modeling Frequency and Intensity

fevd(x = y, threshold = 1, type = "PP")

● 1−1 line ● 3.5 regression line

3.5 95% confidence bands 3.0

● ● 3.0 ● ● ● 2.5

● ● ●

● ● ● 2.5 2.0 ● ●●●

● ●●● ● ●●● ● 1.5 ●●●● Empirical Quantiles 2.0 ●

● Observed Z_k Values ● ● ● ●●●● ●● ●●●● ●● ● ●● ●● ●● ● 1.0 ●●●● ●●●●● ●● ●●● 1.5 ● ●●● ● ●●●● ●● ●●●●●● ●●●●●● ●●●●●● ●● ●●●●●● ●●●● 0.5 ●●●● ●●●●●● ●●● ●●●● ●●●●●● ●●●● ●●●● ●●● ●● ● 1.0

1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 Expected Values Model Quantiles Under exponential(1) Intensity Frequency

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Modeling Frequency and Intensity

Poisson-GP estimation method

95% lower CI Estimate 95% upper CI λ 0.07 0.08 0.10 σ(u) 0.28 0.40 0.53 ξ -0.16 0.07 0.29

PP estimation method

95% lower CI Estimate 95% upper CI µ 2.06 2.53 3.01 σ 0.20 0.51 0.82 ξ -0.16 0.07 0.30

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

Peaks Over Threshold (POT): Modeling Frequency and Intensity

Verify that the two sets of parameter estimates are consistent ln λ = -(1 / ξ) ln(1 + ξ(u – µ) / σ ≈ 0.009

λ ≈ 29.93 (estimated on an annual maximum scale)

0.08 * 365.25 ≈ 29.22

σ(u) = σ + ξ(u – µ) ≈ 0.4

UCARUCAR ConfidentialConfidential andand Proprietary.Proprietary. ©© 2014,2008, UniversityUniversity CorporationCorporation forfor AtmosphericAtmospheric Research.Research. AllAll rightsrights reserved.reserved.

POT vs Block Maxima ● 4

3 ● ●

● ● 2 simulated data simulated

1 ● 0

0 20 40 60 80 100 120

UCAR Confidential and Proprietary. © 2008, University Corporation for Atmospheric Research. All rights reserved.

Conclusion of Part II: Questions?

UCAR Confidential and Proprietary. © 2008, University Corporation for Atmospheric Research. All rights reserved.