Towards a multivariate Extreme Value Theory
Samuel Hugueny
Institute of Biomedical Engineering
Life Sciences Interface
Doctoral Training Centre
University of Oxford
March 20, 2009
Contents
1 Classical EVT results
  1.1 Fisher-Tippett Theorem
  1.2 Maximum Domains of Attraction
    1.2.1 Tail-equivalence
    1.2.2 Maximum domain of attraction of the Fréchet distribution
    1.2.3 Maximum domain of attraction of the Weibull distribution
    1.2.4 Maximum domain of attraction of the Gumbel distribution
2 Univariate Gaussian distribution
3 Univariate one-sided Gaussian distribution
4 Probability of probabilities
  4.1 Sampling in the data space is equivalent to sampling in the image probability space
  4.2 Univariate standard Gaussian distribution
  4.3 Multivariate standard Gaussian distribution
  4.4 Gaussian Distributions with Generic Mean and Covariance Matrix
5 Extreme Value Distribution for the standard bivariate Gaussian distribution
  5.1 EVD of minima for G
  5.2 EVD for minima of F
6 Extreme Value Distribution for a generic bivariate Gaussian distribution
7 Extreme Value Distribution for the standard n-dimensional Gaussian distribution
  7.1 Cumulative distribution function for the standard n-dimensional Gaussian distribution
8 Notations
1 Classical EVT results
This section collects useful classical EVT results, taken from [1] and adapted so that the notation is consistent throughout this document. In particular, the number of samples from which an extremum is drawn is called n throughout [1]. Here, n will be the dimension of the data space, and the number of samples from which extrema are drawn will be m.
Furthermore, in [1], Embrechts denotes the scale and location parameters of extreme value distributions by c_n and d_n, whereas Roberts denotes them σ_m and µ_m, respectively, in [2] and [3]. We choose to denote them c_m and d_m, respectively.
1.1 Fisher-Tippett Theorem
With our notations:

Theorem 1. (Fisher-Tippett theorem - Theorem 3.2.3 in [1], p.121)
Let (X_m) be a sequence of iid rvs. If there exist norming constants d_m ∈ R, c_m > 0 and some non-degenerate distribution function H such that
\[
c_m^{-1}(M_m - d_m) \xrightarrow{d} H, \tag{1.1}
\]
then H belongs to the type of one of the following three distribution functions:
\[
\text{Type I (Gumbel):}\quad \Lambda(x) = \exp\{-e^{-x}\}, \quad x \in \mathbb{R}.
\]
\[
\text{Type II (Fréchet):}\quad \Phi_\alpha(x) = \begin{cases} 0, & x \le 0 \\ \exp\{-x^{-\alpha}\}, & x > 0 \end{cases} \qquad \alpha > 0.
\]
\[
\text{Type III (Weibull):}\quad \Psi_\alpha(x) = \begin{cases} \exp\{-(-x)^{\alpha}\}, & x \le 0 \\ 1, & x > 0 \end{cases} \qquad \alpha > 0.
\]

Definition 1. (Extreme Value distribution and extremal random variables - Definition 3.2.6 in [1], p.124)
The distribution functions Λ, Φ_α, Ψ_α as presented in Theorem 1 are called standard extreme value distributions, and the corresponding random variables standard extremal random variables. Distribution functions of the types of Λ, Φ_α, Ψ_α are called extreme value distributions; the corresponding random variables, extremal random variables.
The standard extremal random variables satisfy the following relations in distribution, where M_m is the maximum of m iid copies of the standard extremal random variable X:
\[
\text{Gumbel:}\quad M_m \stackrel{d}{=} X + \log m
\]
\[
\text{Fréchet:}\quad M_m \stackrel{d}{=} m^{1/\alpha} X
\]
\[
\text{Weibull:}\quad M_m \stackrel{d}{=} m^{-1/\alpha} X
\]
Notes
• An extreme value distribution depends only on m, the number of samples from which the extrema are taken, and the parameters of the generative distribution.
• Embrechts refers to theorem 1 as being ‘the basis of classical extreme value theory’.
• The Weibull distribution in theorem 1 is sometimes referred to as the ‘inverse Weibull distribution’. It is obtained from the more common Weibull distribution by reversing the direction of the x-axis of the probability density function.
1.2 Maximum Domains of Attraction
Definition 2. (Maximum domain of attraction - Definition 3.3.1 in [1], p.128) We say that the random variable X (the distribution function F of X, the distribution of X) belongs to the maximum domain of attraction of the extreme value distribution H if there exist constants d_m ∈ R and c_m > 0 such that
\[
\frac{1}{c_m}(M_m - d_m) \xrightarrow{d} H. \tag{1.2}
\]
We write X ∈ MDA(H) (F ∈ MDA(H)).

Proposition 1. (Characterisation of MDA(H) - Proposition 3.3.2 in [1], p.129) The distribution function F belongs to the maximum domain of attraction of the extreme value distribution H with norming constants d_m ∈ R, c_m > 0, if and only if
\[
\lim_{m \to \infty} m \bar{F}(c_m x + d_m) = -\ln H(x), \quad x \in \mathbb{R}. \tag{1.3}
\]
When H(x) = 0, the limit is interpreted as ∞.
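Proposition 1 can be checked directly in a simple case. The Python sketch below is our own illustration (not from [1]): for the standard exponential distribution, where \bar{F}(x) = e^{-x}, the choice c_m = 1 and d_m = \ln m gives m\bar{F}(x + \ln m) = e^{-x} = -\ln\Lambda(x) exactly, for every m with \ln m + x > 0.

```python
import math

def exp_tail(x):
    # survival function of the standard exponential: F_bar(x) = exp(-x) for x >= 0
    return math.exp(-x) if x >= 0 else 1.0

def scaled_tail(m, x):
    # left-hand side of (1.3) with the norming constants c_m = 1, d_m = ln(m)
    return m * exp_tail(x + math.log(m))

def gumbel_limit(x):
    # right-hand side of (1.3) for H = Lambda: -ln(Lambda(x)) = exp(-x)
    return math.exp(-x)

for m in (10, 1000, 10**6):
    for x in (0.0, 0.5, 2.0):
        assert abs(scaled_tail(m, x) - gumbel_limit(x)) < 1e-12
print("exponential distribution satisfies the MDA(Lambda) characterisation")
```

Here the convergence is not merely asymptotic: the two sides agree for every finite m, which is why the exponential is the textbook entry point to the Gumbel domain.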
1.2.1 Tail-equivalence

Definition 3. (Tail-equivalence - Definition 3.3.3 in [1], p.129) Two distribution functions F and G are called tail-equivalent if they have the same right endpoint, i.e. if x_F = x_G, and if there exists some constant 0 < c < ∞ such that
\[
\lim_{x \uparrow x_F} \frac{\bar{F}(x)}{\bar{G}(x)} = c.
\]
We write F ∼_t G.
1.2.2 Maximum domain of attraction of the Fréchet distribution
Theorem 2. (Maximum domain of attraction of Φα - Theorem 3.3.7 in [1], p.131)
The distribution function F belongs to the maximum domain of attraction of Φ_α, α > 0, if and only if there exists some slowly varying function L such that \bar{F}(x) = x^{-\alpha} L(x).

If F ∈ MDA(Φ_α), then
\[
\frac{1}{c_m}(M_m - d_m) \xrightarrow{d} \Phi_\alpha, \tag{1.4}
\]
where the norming constants can be chosen as d_m = 0 and c_m = (1/\bar{F})^{\leftarrow}(m).
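As a sanity check of Theorem 2 (our own example, not from [1]), take the Pareto distribution F(x) = 1 - x^{-\alpha} for x ≥ 1, whose tail is exactly x^{-\alpha} (so L ≡ 1). Theorem 2 then gives d_m = 0 and c_m = (1/\bar{F})^{\leftarrow}(m) = m^{1/\alpha}, and the exact cdf of the normalised maximum, F(c_m x)^m, should approach Φ_α(x):

```python
import math

ALPHA = 2.0  # Pareto tail index; F(x) = 1 - x**-ALPHA for x >= 1

def cdf_max(m, x):
    # exact cdf of M_m / c_m for m iid Pareto samples,
    # with the Theorem 2 constants d_m = 0 and c_m = m**(1/alpha)
    c_m = m ** (1.0 / ALPHA)
    return (1.0 - (c_m * x) ** -ALPHA) ** m

def frechet(x):
    # standard Frechet limit Phi_alpha(x) = exp(-x**-alpha) for x > 0
    return math.exp(-x ** -ALPHA)

for x in (0.5, 1.0, 3.0):  # chosen so that c_m * x stays >= 1
    assert abs(cdf_max(10**6, x) - frechet(x)) < 1e-4
print("Pareto maxima converge to the Frechet law")
```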
1.2.3 Maximum domain of attraction of the Weibull distribution
Theorem 3. (Maximum domain of attraction of Ψα - Theorem 3.3.12 in [1], p.135)
The distribution function F belongs to the maximum domain of attraction of Ψ_α, α > 0, if and only if x_F < ∞ and there exists some slowly varying function L such that \bar{F}(x_F - x^{-1}) = x^{-\alpha} L(x).

If F ∈ MDA(Ψ_α), then
\[
\frac{1}{c_m}(M_m - d_m) \xrightarrow{d} \Psi_\alpha, \tag{1.5}
\]
where the norming constants can be chosen as d_m = x_F and c_m = x_F - F^{\leftarrow}(1 - m^{-1}).
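A minimal check of Theorem 3 (again our own example): for the uniform distribution on [0, 1] we have x_F = 1 and \bar{F}(1 - x^{-1}) = x^{-1}, i.e. α = 1 and L ≡ 1. The theorem then prescribes d_m = 1 and c_m = 1 - F^{\leftarrow}(1 - 1/m) = 1/m, and the exact cdf of the normalised maximum, (1 + x/m)^m for x ≤ 0, should approach Ψ_1(x) = e^x:

```python
import math

def cdf_max_uniform(m, x):
    # exact cdf of (M_m - d_m)/c_m for m iid Uniform(0,1) maxima,
    # with the Theorem 3 constants d_m = x_F = 1 and c_m = 1/m
    return (1.0 + x / m) ** m if x <= 0 else 1.0

def weibull(x):
    # standard Weibull limit Psi_1(x) = exp(x) for x <= 0, 1 otherwise
    return math.exp(x) if x <= 0 else 1.0

for x in (-3.0, -1.0, -0.1, 0.5):
    assert abs(cdf_max_uniform(10**6, x) - weibull(x)) < 1e-5
print("uniform maxima converge to the Weibull law")
```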
1.2.4 Maximum domain of attraction of the Gumbel distribution
Theorem 4. (Maximum domain of attraction of Λ - Theorem 3.3.26 in [1], p.142)
The distribution function F with right endpoint x_F ≤ ∞ belongs to the maximum domain of attraction of Λ if and only if there exists some z < x_F such that \bar{F} has the representation
\[
\bar{F}(x) = c(x) \exp\left\{ -\int_z^x \frac{g(t)}{a(t)} \, dt \right\}, \quad z < x < x_F, \tag{1.6}
\]
where c and g are measurable functions satisfying c(x) → c > 0, g(x) → 1 as x ↑ x_F, and a(x) is a positive, absolutely continuous function (with respect to Lebesgue measure) with density a'(x) having \lim_{x \uparrow x_F} a'(x) = 0. For F with representation (1.6), we can choose the norming constants as
\[
d_m = F^{\leftarrow}(1 - m^{-1}) \quad \text{and} \quad c_m = a(d_m).
\]
A possible choice for the function a is
\[
a(x) = \int_x^{x_F} \frac{\bar{F}(t)}{\bar{F}(x)} \, dt, \quad x < x_F. \tag{1.7}
\]
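To illustrate the choice (1.7) (our own numerical sketch, not from [1]), consider again the standard exponential distribution: \bar{F}(t) = e^{-t}, so the integral in (1.7) equals e^{-x} and a(x) = 1 for every x. Together with d_m = \ln m, this recovers the familiar norming constants c_m = 1, d_m = \ln m for exponential maxima. The quadrature below confirms this numerically; the truncation point `upper` and the step count are implementation choices:

```python
import math

def aux(x, upper=60.0, n=100000):
    # numerically evaluate (1.7) for the standard exponential distribution:
    # a(x) = int_x^{x_F} F_bar(t) dt / F_bar(x), with F_bar(t) = exp(-t)
    # and x_F = +infinity (the integral is truncated at `upper`)
    h = (upper - x) / n
    integral = sum(math.exp(-(x + (i + 0.5) * h)) for i in range(n)) * h
    return integral / math.exp(-x)

# the integral equals exp(-x) analytically, so a(x) = 1 for every x,
# giving c_m = a(d_m) = 1 alongside d_m = ln(m)
for x in (0.0, 1.0, 5.0):
    assert abs(aux(x) - 1.0) < 1e-6
print("auxiliary function of the standard exponential is constant: a(x) = 1")
```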
Proposition 2. (Closure property of MDA (Λ) - Proposition 3.3.28 in [1], p.142)
Let F and G be distribution functions with the same right endpoint x_F = x_G and assume that F ∈ MDA(Λ) with norming constants c_m > 0 and d_m ∈ R; i.e.
\[
\lim_{m \to \infty} F^m(c_m x + d_m) = \Lambda(x), \quad x \in \mathbb{R}. \tag{1.8}
\]
Then
\[
\lim_{m \to \infty} G^m(c_m x + d_m) = \Lambda(x + b), \quad x \in \mathbb{R}, \tag{1.9}
\]
if and only if F and G are tail-equivalent with
\[
\lim_{x \uparrow x_F} \frac{\bar{F}(x)}{\bar{G}(x)} = e^b. \tag{1.10}
\]
Notes
• ‘The maximum domain of attraction of the Gumbel distribution consists of distribution functions whose right tails decrease to zero faster than any power function’ ([1], p.139).
• Every maximum domain of attraction is closed with respect to tail-equivalence. Moreover, for any two tail-equivalent distributions, one can take the same norming constants ([1], p.139).
• An F ∈ MDA (Λ) can have either a finite or infinite endpoint: xF ≤ ∞.
• Every F ∈ MDA(Φ_α) has an infinite right endpoint: x_F = ∞.
• Every F ∈ MDA(Ψ_α) has a finite right endpoint: x_F < ∞.
• Proposition 2 is useful when searching for the parameters of a Gumbel distribution. If we can show that the distribution of interest is tail-equivalent to a distribution of reference, it becomes possible to deduce its parameters (see section 2 for an example).
2 Univariate Gaussian distribution
In this section, our aim is to find the parameters of the Gumbel distribution of maxima of a univariate Gaussian distribution with arbitrary mean and variance. We do this as an exercise, to show how one goes about identifying such parameters. We first show that the Gaussian distribution is in the Maximum Domain of Attraction of the Gumbel distribution, then use the closure property of the MDA together with a tail-equivalent distribution to find a formula for the location parameter. The scale parameter is easily deduced from the previous steps using theorem 4.
Probability density function
The probability density function of a Gaussian distribution with mean µ ∈ R and standard deviation σ > 0 is
\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \tag{2.1}
\]
Cumulative distribution function
\[
F(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right), \tag{2.2}
\]
where
\[
\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt \tag{2.3}
\]
is the so-called error function.
Mills’ ratio
\[
\frac{1 - F(x)}{f(x)} = \frac{\sqrt{2\pi\sigma^2}}{2}\left(1 - \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)\exp\left(\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{\sqrt{2\pi\sigma^2}}{2}\operatorname{erfc}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\exp\left(\frac{(x-\mu)^2}{2\sigma^2}\right), \tag{2.4}
\]
where
\[
\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2} \, dt \tag{2.5}
\]
is the complementary error function. An asymptotic expansion of erfc as x → +∞ yields
\[
\frac{1 - F(x)}{f(x)} = \frac{\sqrt{2\pi\sigma^2}}{2} \cdot \frac{\sigma\sqrt{2}}{(x-\mu)\sqrt{\pi}} \exp\left(\frac{(x-\mu)^2}{2\sigma^2}\right)\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)(1 + o(1)) = \frac{\sigma^2}{x-\mu} + o\left(\frac{1}{x}\right). \tag{2.6}
\]
We can therefore write:
\[
1 - F(x) \sim \frac{\sigma^2 f(x)}{x-\mu} = \frac{\sigma}{\sqrt{2\pi}\,(x-\mu)} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \tag{2.7}
\]
Furthermore, f'(x) = -\frac{x-\mu}{\sigma^2} f(x) < 0 for x > µ, and
\[
\lim_{x \to \infty} \frac{(1 - F(x))\, f'(x)}{f^2(x)} = -1. \tag{2.8}
\]
Thus F is a Von Mises function (with auxiliary function a) and, as such, is in the Maximum Domain of Attraction of the Gumbel distribution (Example 3.3.23 and proposition 3.3.25 in [1]).
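The limit (2.8) can be verified numerically. The sketch below (our own check, using Python's `math.erfc` for the Gaussian survival function) shows the ratio approaching -1 in the tail; in fact it behaves like -(1 - 1/x²) for the standard Gaussian, so moderate x already gets close:

```python
import math

def pdf(x, mu=0.0, sigma=1.0):
    # Gaussian probability density function
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def sf(x, mu=0.0, sigma=1.0):
    # Gaussian survival function 1 - F(x), via the complementary error function
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

def von_mises_ratio(x, mu=0.0, sigma=1.0):
    # (1 - F(x)) f'(x) / f(x)^2, which should tend to -1 as x -> infinity
    fprime = -(x - mu) / sigma ** 2 * pdf(x, mu, sigma)
    return sf(x, mu, sigma) * fprime / pdf(x, mu, sigma) ** 2

assert abs(von_mises_ratio(8.0) + 1.0) < 0.02
assert abs(von_mises_ratio(12.0) + 1.0) < 0.01
print("von Mises ratio approaches -1 in the Gaussian tail")
```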
To calculate the norming constants, we use Mills’ ratio again:
\[
\bar{F}(x) \sim \frac{\sigma^2 f(x)}{x-\mu} = \frac{\sigma}{\sqrt{2\pi}\,(x-\mu)} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad x \to \infty, \tag{2.9}
\]
and interpret the right-hand side as the tail of some distribution function G. Then by proposition 2, F and G have the same norming constants. According to theorem 4, d_m = G^{\leftarrow}(1 - m^{-1}). We therefore look for a solution of -\ln \bar{G}(d_m) = \ln m, i.e. a solution of
\[
\frac{(d_m - \mu)^2}{2\sigma^2} + \ln(d_m - \mu) + \frac{1}{2}\ln 2\pi - \ln\sigma = \ln m. \tag{2.10}
\]
Hence
\[
(d_m - \mu)^2 = 2\sigma^2\left(\ln m - \ln(d_m - \mu) - \tfrac{1}{2}\ln 2\pi + \ln\sigma\right) = 2\sigma^2 \ln m \left(1 + \frac{-\ln(d_m - \mu) - \frac{1}{2}\ln 2\pi + \ln\sigma}{\ln m}\right). \tag{2.11}
\]
Taking the positive square root, we can write the following Taylor expansion:
\[
\begin{aligned}
d_m - \mu &= \sigma\sqrt{2\ln m}\left(1 + \frac{-\ln(d_m - \mu) - \frac{1}{2}\ln 2\pi + \ln\sigma}{\ln m}\right)^{1/2} \\
&= \sigma\sqrt{2\ln m}\left(1 + \frac{-\ln(d_m - \mu) - \frac{1}{2}\ln 2\pi + \ln\sigma}{2\ln m} + o\left(\frac{1}{\ln m}\right)\right) \\
&= \sigma\sqrt{2\ln m} + \sigma\,\frac{-\ln(d_m - \mu) - \frac{1}{2}\ln 2\pi + \ln\sigma}{\sqrt{2\ln m}} + o\left(\frac{1}{\sqrt{\ln m}}\right) \\
&= \sigma\sqrt{2\ln m} + \sigma\,\frac{-\ln\left(\sigma\sqrt{2\ln m}\right) - \frac{1}{2}\ln 2\pi + \ln\sigma}{\sqrt{2\ln m}} + o\left(\frac{1}{\sqrt{\ln m}}\right) \\
&= \sigma\sqrt{2\ln m} - \sigma\,\frac{\ln(\ln m) + \ln 4\pi}{2\sqrt{2\ln m}} + o\left(\frac{1}{\sqrt{\ln m}}\right),
\end{aligned} \tag{2.12}
\]
from which we deduce:
\[
d_m = \sigma(2\ln m)^{1/2} + \mu - \sigma\,\frac{\ln\ln m + \ln 4\pi}{2(2\ln m)^{1/2}} + o\left((\ln m)^{-1/2}\right). \tag{2.13}
\]
Since we can take a(x) = \frac{1 - F(x)}{f(x)}, we have a(x) ∼ \frac{\sigma^2}{x - \mu} and therefore
\[
c_m = a(d_m) \sim \frac{\sigma}{\sqrt{2\ln m}}. \tag{2.14}
\]
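Formulae 2.13 and 2.14 can be checked against the exact distribution of the maximum without any simulation, since the cdf of the maximum of m iid samples is simply F(x)^m. The Python sketch below (our own check; the function names are ours) compares this exact cdf with the Gumbel approximation Λ((x - d_m)/c_m) for the standard Gaussian:

```python
import math

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # Gaussian cdf via the error function, as in (2.2)
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def norming_constants(m, mu=0.0, sigma=1.0):
    # formulae (2.13) and (2.14), dropping the o(.) remainder
    s = math.sqrt(2 * math.log(m))
    d_m = mu + sigma * s - sigma * (math.log(math.log(m)) + math.log(4 * math.pi)) / (2 * s)
    c_m = sigma / s
    return c_m, d_m

def gumbel_cdf(x, c_m, d_m):
    return math.exp(-math.exp(-(x - d_m) / c_m))

m = 10**6
c_m, d_m = norming_constants(m)
for x in (d_m - c_m, d_m, d_m + 2 * c_m):
    exact = gauss_cdf(x) ** m          # exact cdf of the maximum of m samples
    approx = gumbel_cdf(x, c_m, d_m)   # Gumbel approximation
    assert abs(exact - approx) < 0.05
print("Gumbel approximation tracks the exact cdf of Gaussian maxima")
```

The residual gap of a few percent even at m = 10^6 reflects the notoriously slow (logarithmic) convergence of Gaussian maxima to the Gumbel limit.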
Figure 2.1: Pdfs of maxima for the univariate standard Gaussian for m = 10, 20, 50, 100, 500, 1000 (from left to right, top to bottom). For each value of m, we show the histogram of simulated maxima (N = 10^5 in all cases, blue), the pdf obtained using formulae 2.13 and 2.14 (red), the pdf obtained using the formulae in [2] (cyan) and the pdf obtained by maximum likelihood estimation (green).
Figure 2.2: Cdfs of maxima for the univariate standard Gaussian for m = 10, 20, 50, 100, 200, 1000 (from left to right, top to bottom). For each case, we show the cdf obtained using formulae 2.13 and 2.14 (red), and the cdf obtained by maximum likelihood estimation (from N = 10^5 maxima, green).
Figure 2.3: Semi-logarithmic plot of the values (top row), absolute (middle row) and relative (bottom row) differences with the corresponding maximum likelihood estimates of dm (left column) and cm (right) values as m increases. The red crosses are obtained using formulae 2.13 and 2.14.
3 Univariate one-sided Gaussian distribution
In this section, we proceed as in section 2 to find the parameters of the Gumbel distribution of maxima for the univariate one-sided Gaussian distribution.
One-sided Gaussian distribution
Probability density function
The probability density function of a one-sided Gaussian distribution with location parameter µ ∈ R and scale parameter σ > 0 is
\[
f(x) = \frac{2}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad \forall x > 0. \tag{3.1}
\]
Cumulative distribution function
\[
F(x) = \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right). \tag{3.2}
\]
Mills’ ratio
\[
\frac{1 - F(x)}{f(x)} = \frac{\sqrt{2\pi\sigma^2}}{2}\left(1 - \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)\exp\left(\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{\sqrt{2\pi\sigma^2}}{2}\operatorname{erfc}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\exp\left(\frac{(x-\mu)^2}{2\sigma^2}\right). \tag{3.3}
\]
An asymptotic expansion of erfc as x → +∞ yields
\[
\frac{1 - F(x)}{f(x)} = \frac{\sqrt{2\pi\sigma^2}}{2} \cdot \frac{\sigma\sqrt{2}}{(x-\mu)\sqrt{\pi}} \exp\left(\frac{(x-\mu)^2}{2\sigma^2}\right)\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)(1 + o(1)) = \frac{\sigma^2}{x-\mu} + o\left(\frac{1}{x}\right). \tag{3.4}
\]
We can therefore write:
\[
1 - F(x) \sim \frac{\sigma^2 f(x)}{x-\mu} = \frac{2\sigma}{\sqrt{2\pi}\,(x-\mu)} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \tag{3.5}
\]
Furthermore, f'(x) = -\frac{x-\mu}{\sigma^2} f(x) < 0 for x > µ, and
\[
\lim_{x \to \infty} \frac{(1 - F(x))\, f'(x)}{f^2(x)} = -1. \tag{3.6}
\]
As in the previous section, F is therefore in the Maximum Domain of Attraction of the Gumbel distribution. We use Mills’ ratio again:
\[
\bar{F}(x) \sim \frac{\sigma^2 f(x)}{x-\mu} = \frac{2\sigma}{\sqrt{2\pi}\,(x-\mu)} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad x \to \infty, \tag{3.7}
\]
and interpret the right-hand side as the tail of some distribution function G. Then by proposition 2, F and G have the same norming constants. According to theorem 4, d_m = G^{\leftarrow}(1 - m^{-1}). We therefore look for a solution of -\ln \bar{G}(d_m) = \ln m, i.e. a solution of
\[
\frac{(d_m - \mu)^2}{2\sigma^2} + \ln(d_m - \mu) + \frac{1}{2}\ln\frac{\pi}{2} - \ln\sigma = \ln m. \tag{3.8}
\]
Hence
\[
(d_m - \mu)^2 = 2\sigma^2\left(\ln m - \ln(d_m - \mu) - \tfrac{1}{2}\ln\tfrac{\pi}{2} + \ln\sigma\right) = 2\sigma^2 \ln m \left(1 + \frac{-\ln(d_m - \mu) - \frac{1}{2}\ln\frac{\pi}{2} + \ln\sigma}{\ln m}\right). \tag{3.9}
\]
Taking the positive square root, we can write the following Taylor expansion:
\[
\begin{aligned}
d_m - \mu &= \sigma\sqrt{2\ln m}\left(1 + \frac{-\ln(d_m - \mu) - \frac{1}{2}\ln\frac{\pi}{2} + \ln\sigma}{\ln m}\right)^{1/2} \\
&= \sigma\sqrt{2\ln m}\left(1 + \frac{-\ln(d_m - \mu) - \frac{1}{2}\ln\frac{\pi}{2} + \ln\sigma}{2\ln m} + o\left(\frac{1}{\ln m}\right)\right) \\
&= \sigma\sqrt{2\ln m} + \sigma\,\frac{-\ln(d_m - \mu) - \frac{1}{2}\ln\frac{\pi}{2} + \ln\sigma}{\sqrt{2\ln m}} + o\left(\frac{1}{\sqrt{\ln m}}\right) \\
&= \sigma\sqrt{2\ln m} + \sigma\,\frac{-\ln\left(\sigma\sqrt{2\ln m}\right) - \frac{1}{2}\ln\frac{\pi}{2} + \ln\sigma}{\sqrt{2\ln m}} + o\left(\frac{1}{\sqrt{\ln m}}\right) \\
&= \sigma\sqrt{2\ln m} - \sigma\,\frac{\ln(\ln m) + \ln\pi}{2\sqrt{2\ln m}} + o\left(\frac{1}{\sqrt{\ln m}}\right),
\end{aligned} \tag{3.10}
\]
from which we deduce:
\[
d_m = \sigma(2\ln m)^{1/2} + \mu - \sigma\,\frac{\ln\ln m + \ln\pi}{2(2\ln m)^{1/2}} + o\left((\ln m)^{-1/2}\right). \tag{3.11}
\]
Since we can take a(x) = \frac{1 - F(x)}{f(x)}, we have a(x) ∼ \frac{\sigma^2}{x - \mu} and therefore
\[
c_m = a(d_m) \sim \frac{\sigma}{\sqrt{2\ln m}}. \tag{3.12}
\]
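As in section 2, formulae 3.11 and 3.12 can be validated against the exact cdf of the maximum, F(x)^m, with no simulation. The sketch below (our own check, for the standard one-sided Gaussian with µ = 0, σ = 1) verifies the Gumbel prediction Λ(0) = 1/e at the location parameter d_m:

```python
import math

def onesided_cdf(x):
    # cdf of the standard one-sided Gaussian: F(x) = erf(x / sqrt(2)), x > 0
    return math.erf(x / math.sqrt(2))

def norming_constants(m):
    # formulae (3.11) and (3.12) with mu = 0 and sigma = 1
    s = math.sqrt(2 * math.log(m))
    d_m = s - (math.log(math.log(m)) + math.log(math.pi)) / (2 * s)
    c_m = 1.0 / s
    return c_m, d_m

m = 10**6
c_m, d_m = norming_constants(m)
exact = onesided_cdf(d_m) ** m            # exact cdf of the maximum at x = d_m
assert abs(exact - math.exp(-1)) < 0.05   # Gumbel limit predicts Lambda(0) = 1/e
print("one-sided Gaussian maxima match the Gumbel prediction at d_m")
```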
Figure 3.1: Pdfs of maxima for the standard one-sided Gaussian for m = 10, 20, 50, 100, 500, 1000 (from left to right, top to bottom). For each value of m, we show the histogram of simulated maxima (N = 10^5 in all cases, blue), the pdf obtained using formulae 3.11 and 3.12 (red), the pdf obtained using the formulae in [2] (cyan) and the pdf obtained by maximum likelihood estimation (green).
Figure 3.2: Cdfs of maxima for the standard one-sided Gaussian for m = 10, 20, 50, 100, 500, 1000 (from left to right, top to bottom). For each case, we show the cdf obtained using formulae 3.11 and 3.12 (red), the cdf obtained using the formulae in [2] (cyan) and the cdf obtained by maximum likelihood estimation (from N = 10^5 maxima, green).
Figure 3.3: Semi-logarithmic plot of the values (top row), absolute (middle row) and relative (bottom row) differences with the corresponding maximum likelihood estimates of dm (left column) and cm (right) values as m increases. The red crosses are obtained using formulae 3.11 and 3.12, the cyan stars using the formulae in [2].
4 Probability of probabilities
4.1 Sampling in the data space is equivalent to sampling in the image probability space
Let x1, x2, ..., xk be samples drawn from a distribution D (univariate or multivariate) for k ∈ N, and f(x1), f(x2), ..., f(xk) the probabilities of these samples with respect to D. The xi are vectors of the sample space S. The f(xi) are real numbers which take values in f(S), the image of S under f. Since f is a probability distribution function, we are sure that f(S) ⊆ [0, +∞[, i.e. the f(xi) are non-negative numbers.
The probability of obtaining y ∈ f(S) by drawing samples from the sample space is strongly related to the form of f. Assuming that X is a random variable distributed according to D, our aim in this section is to determine the form of the probability distribution function g on f(S) according to which f(X) is distributed, for some simple cases of D.
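The equivalence between sampling in the data space and sampling in the image probability space can be made concrete: drawing x from D and evaluating f(x) is, by definition, one draw of the random variable f(X). The Python sketch below (our own illustration, for the standard Gaussian case treated next) checks that the resulting values indeed live in f(S) = ]0, 1/√(2π)]:

```python
import math
import random

random.seed(0)

def f(x):
    # standard Gaussian density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# draw samples in the data space and map them into the image probability space
ys = [f(random.gauss(0.0, 1.0)) for _ in range(100000)]

peak = 1.0 / math.sqrt(2 * math.pi)
assert all(0.0 < y <= peak for y in ys)   # f(S) = ]0, 1/sqrt(2*pi)]
# density values pile up near the peak: the median of |X| is about 0.674,
# and f(0.674) ~ 0.318 > peak/2, so the median of f(X) sits in the upper half
assert sorted(ys)[len(ys) // 2] > peak / 2
print("sampling X then evaluating f gives samples of f(X) in f(S)")
```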
4.2 Univariate standard Gaussian distribution
In the case of the univariate standard Gaussian distribution, S = R and f is defined as
\[
f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right), \tag{4.1}
\]
and f(S) is the interval \left]0, \frac{1}{\sqrt{2\pi}}\right]. Let [y_1, y_2] \subset \left]0, \frac{1}{\sqrt{2\pi}}\right]; the probability of f(X) being in [y_1, y_2] is