Towards a multivariate Extreme Value Theory

Samuel Hugueny

Institute of Biomedical Engineering

Life Sciences Interface

Doctoral Training Centre

University of Oxford

March 20, 2009

Contents

1 Classical EVT results
  1.1 Fisher-Tippett Theorem
  1.2 Maximum Domains of Attraction
    1.2.1 Tail-equivalence
    1.2.2 Maximum domain of attraction of the Fréchet distribution
    1.2.3 Maximum domain of attraction of the Weibull distribution
    1.2.4 Maximum domain of attraction of the Gumbel distribution

2 Univariate Gaussian distribution

3 Univariate one-sided Gaussian distribution

4 Probability of probabilities
  4.1 Sampling in the data space is equivalent to sampling in the image probability space
  4.2 Univariate standard Gaussian distribution
  4.3 Multivariate standard Gaussian distribution
  4.4 Gaussian Distributions with Generic Mean and Covariance Matrix

5 Extreme Value Distribution for the standard bivariate Gaussian distribution
  5.1 EVD of minima for G
  5.2 EVD for minima of F

6 Extreme Value Distribution for a generic bivariate Gaussian distribution

7 Extreme Value Distribution for the standard n-dimensional Gaussian distribution
  7.1 Cumulative distribution function for the standard n-dimensional Gaussian distribution

8 Notations

1 Classical EVT results

This section collects useful classical EVT results, taken from [1] and adapted so that notation is consistent throughout this document. In particular, the number of samples from which an extremum is drawn is called n throughout [1]. Here, n will denote the dimension of the data space, and the number of samples from which extrema are drawn will be m.

Furthermore, in [1], Embrechts denotes by c_n and d_n the scale and location parameters of extreme value distributions, whereas Roberts denotes them σ_m and µ_m, respectively, in [2] and [3]. We choose to denote them c_m and d_m, respectively.

1.1 Fisher-Tippett Theorem

With our notations:

Theorem 1. (Fisher-Tippett theorem - Theorem 3.2.3 in [1], p.121)

Let (X_m) be a sequence of iid rvs. If there exist norming constants d_m ∈ R, c_m > 0 and some non-degenerate distribution function H such that

    c_m^{-1}(M_m - d_m) \xrightarrow{d} H,    (1.1)

then H belongs to the type of one of the following three distribution functions:

Type I (Gumbel):    Λ(x) = \exp\{-e^{-x}\},   x ∈ R.

Type II (Fréchet):  Φ_α(x) = \begin{cases} 0, & x ≤ 0 \\ \exp\{-x^{-α}\}, & x > 0 \end{cases}   α > 0

Type III (Weibull): Ψ_α(x) = \begin{cases} \exp\{-(-x)^α\}, & x ≤ 0 \\ 1, & x > 0 \end{cases}   α > 0

Definition 1. (Extreme value distribution and extremal random variables - Definition 3.2.6 in [1], p.124)

The distribution functions Λ, Φ_α, Ψ_α as presented in Theorem 1 are called standard extreme value distributions; the corresponding random variables, standard extremal random variables. Distribution functions of the types of Λ, Φ_α, Ψ_α are extreme value distributions; the corresponding random variables, extremal random variables.

For the standard extremal random variable X of each type, the maxima satisfy:

Gumbel:  M_m \stackrel{d}{=} X + \log m

Fréchet: M_m \stackrel{d}{=} m^{1/α} X

Weibull: M_m \stackrel{d}{=} m^{-1/α} X

Notes

• An extreme value distribution depends only on m, the number of samples from which the extremum is taken, and on the parameters of the generative distribution.

• Embrechts refers to theorem 1 as being ‘the basis of classical extreme value theory’.

• The Weibull distribution in theorem 1 is sometimes referred to as the 'inverse Weibull distribution'. It is obtained from the more common Weibull distribution by reversing the direction of the x-axis of the probability density function.
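As a quick numerical illustration of theorem 1 (our own sketch, not from [1]; Python with NumPy assumed), maxima of m standard exponential variables can be centred with d_m = ln m, c_m = 1, and their empirical distribution compared with the Gumbel limit Λ:

```python
import numpy as np

m, N = 1000, 100_000                 # block size; number of simulated maxima
rng = np.random.default_rng(0)

# Maxima of m iid standard exponential samples, centred with d_m = ln(m), c_m = 1.
maxima = rng.exponential(size=(N, m)).max(axis=1) - np.log(m)

# Empirical CDF at a few points against the Gumbel limit Lambda(x) = exp(-exp(-x)).
for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x = {x:+.1f}   empirical = {(maxima <= x).mean():.4f}   "
          f"Lambda(x) = {np.exp(-np.exp(-x)):.4f}")
```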

1.2 Maximum Domains of Attraction

Definition 2. (Maximum domain of attraction - Definition 3.3.1 in [1], p. 128)

We say that the random variable X (the distribution function F of X, the distribution of X) belongs to the maximum domain of attraction of the extreme value distribution H if there exist constants d_m ∈ R and c_m > 0 such that

    c_m^{-1}(M_m - d_m) \xrightarrow{d} H.    (1.2)

We write X ∈ MDA(H) (F ∈ MDA(H)).

Proposition 1. (Characterisation of MDA(H) - Proposition 3.3.2 in [1], p. 129)

The distribution function F belongs to the maximum domain of attraction of the extreme value distribution H with norming constants d_m ∈ R, c_m > 0, if and only if

    \lim_{m→∞} m \bar{F}(c_m x + d_m) = -\ln H(x),   x ∈ R.    (1.3)

When H(x) = 0, the limit is interpreted as ∞.
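As a one-line illustration of proposition 1 (our example, not taken from [1]), take the standard exponential distribution with c_m = 1 and d_m = ln m:

    \bar{F}(x) = e^{-x}  ⟹  m \bar{F}(c_m x + d_m) = m e^{-x - \ln m} = e^{-x} = -\ln Λ(x),   x ∈ R,

so the exponential distribution belongs to MDA(Λ).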

1.2.1 Tail-equivalence

Definition 3. (Tail-equivalence - Definition 3.3.3 in [1], p.129)

Two distribution functions F and G are called tail-equivalent if they have the same right endpoint, i.e. if x_F = x_G, and if there exists some constant 0 < c < ∞ such that

    \lim_{x↑x_F} \bar{F}(x)/\bar{G}(x) = c.

We write F ∼_t G.

1.2.2 Maximum domain of attraction of the Fréchet distribution

Theorem 2. (Maximum domain of attraction of Φα - Theorem 3.3.7 in [1], p.131)

The distribution function F belongs to the maximum domain of attraction of Φ_α, α > 0, if and only if there exists some slowly varying function L such that \bar{F}(x) = x^{-α} L(x).

If F ∈ MDA(Φ_α), then

    c_m^{-1}(M_m - d_m) \xrightarrow{d} Φ_α,    (1.4)

where the norming constants can be chosen as d_m = 0 and c_m = (1/\bar{F})^{←}(m).

1.2.3 Maximum domain of attraction of the Weibull distribution

Theorem 3. (Maximum domain of attraction of Ψα - Theorem 3.3.12 in [1], p.135)

The distribution function F belongs to the maximum domain of attraction of Ψ_α, α > 0, if and only if x_F < ∞ and there exists some slowly varying function L such that \bar{F}(x_F - x^{-1}) = x^{-α} L(x).

If F ∈ MDA(Ψ_α), then

    c_m^{-1}(M_m - d_m) \xrightarrow{d} Ψ_α,    (1.5)

where the norming constants can be chosen as d_m = x_F and c_m = x_F - F^{←}(1 - m^{-1}).
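As a worked example (ours, anticipating section 5.1), consider the uniform distribution U(0, θ), for which F(x) = x/θ on [0, θ] and x_F = θ:

    \bar{F}(x_F - x^{-1}) = 1 - \frac{θ - x^{-1}}{θ} = \frac{1}{θ} x^{-1},

so F ∈ MDA(Ψ_1) with L(x) = 1/θ, and theorem 3 gives d_m = θ and c_m = θ - F^{←}(1 - m^{-1}) = θ/m.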

1.2.4 Maximum domain of attraction of the Gumbel distribution

Theorem 4. (Maximum domain of attraction of Λ - Theorem 3.3.26 in [1], p.142)

The distribution function F with right endpoint x_F ≤ ∞ belongs to the maximum domain of attraction of Λ if and only if there exists some z < x_F such that F has representation

    \bar{F}(x) = c(x) \exp\left( -\int_z^x \frac{g(t)}{a(t)}\, dt \right),   z < x < x_F,    (1.6)

where c and g are measurable functions satisfying c(x) → c > 0, g(x) → 1 as x ↑ x_F, and a(x) is a positive, absolutely continuous function (with respect to Lebesgue measure) with density a'(x) having \lim_{x↑x_F} a'(x) = 0. For F with representation 1.6, we can choose the norming constants as

    d_m = F^{←}(1 - m^{-1})   and   c_m = a(d_m).

A possible choice for the function a is

    a(x) = \int_x^{x_F} \frac{\bar{F}(t)}{\bar{F}(x)}\, dt,   x < x_F.    (1.7)

Proposition 2. (Closure property of MDA (Λ) - Proposition 3.3.28 in [1], p.142)

Let F and G be distribution functions with the same right endpoint x_F = x_G and assume that F ∈ MDA(Λ) with norming constants c_m > 0 and d_m ∈ R, i.e.

    \lim_{m→∞} F^m(c_m x + d_m) = Λ(x),   x ∈ R.    (1.8)

Then

    \lim_{m→∞} G^m(c_m x + d_m) = Λ(x + b),   x ∈ R,    (1.9)

if and only if F and G are tail-equivalent with

    \lim_{x↑x_F} \bar{F}(x)/\bar{G}(x) = e^b.    (1.10)

Notes

• ‘The maximum domain of attraction of the Gumbel distribution consists of distribution functions whose right tails decrease to zero faster than any power function’ ([1], p.139).

• Every maximum domain of attraction is closed with respect to tail-equivalence. Moreover, for any two tail-equivalent distributions, one can take the same norming constants ([1], p.139).

• An F ∈ MDA(Λ) can have either a finite or an infinite right endpoint: x_F ≤ ∞.

• Every F ∈ MDA(Φ_α) has an infinite right endpoint: x_F = ∞.

• Every F ∈ MDA(Ψ_α) has a finite right endpoint: x_F < ∞.

• Proposition 2 is useful when searching for the parameters of a Gumbel distribution. If we can show that the distribution of interest is tail-equivalent to a distribution of reference, it becomes possible to deduce its parameters (see section 2 for an example).

2 Univariate Gaussian distribution

In this section, our aim is to find the parameters of the Gumbel distribution of maxima of a univariate Gaussian distribution with arbitrary mean and standard deviation. We do this as an exercise, whose aim is to show how one goes about identifying such parameters. We first show that the Gaussian distribution is in the Maximum Domain of Attraction of the Gumbel distribution, then use the closure property of the MDA together with a tail-equivalent distribution to find a formula for the location parameter d_m. The scale parameter c_m is easily deduced from the previous steps using theorem 4.

Probability density function

The probability density function of a Gaussian distribution with mean µ ∈ R and standard deviation σ > 0 is

    f(x) = \frac{1}{\sqrt{2πσ^2}} \exp\left( -\frac{(x-µ)^2}{2σ^2} \right).    (2.1)

Cumulative distribution function

    F(x) = \frac{1}{2}\left( 1 + \operatorname{erf}\left( \frac{x-µ}{σ\sqrt{2}} \right) \right),    (2.2)

where

    \operatorname{erf}(x) = \frac{2}{\sqrt{π}} \int_0^x e^{-t^2}\, dt    (2.3)

is the so-called error function.

Mills' ratio

    \frac{1-F(x)}{f(x)} = \frac{\sqrt{2πσ^2}}{2}\left( 1 - \operatorname{erf}\left( \frac{x-µ}{σ\sqrt{2}} \right) \right) \exp\left( \frac{(x-µ)^2}{2σ^2} \right)
                        = \frac{\sqrt{2πσ^2}}{2} \operatorname{erfc}\left( \frac{x-µ}{σ\sqrt{2}} \right) \exp\left( \frac{(x-µ)^2}{2σ^2} \right),    (2.4)

where

    \operatorname{erfc}(x) = \frac{2}{\sqrt{π}} \int_x^∞ e^{-t^2}\, dt    (2.5)

is the complementary error function. A Taylor expansion of erfc as x → +∞ yields:

    \frac{1-F(x)}{f(x)} = \frac{\sqrt{2πσ^2}}{2} \exp\left( \frac{(x-µ)^2}{2σ^2} \right) \frac{σ\sqrt{2}}{(x-µ)\sqrt{π}} \exp\left( -\frac{(x-µ)^2}{2σ^2} \right)(1 + o(1))
                        = \frac{σ^2}{x-µ} + o\left( \frac{1}{x} \right).    (2.6)

We can therefore write:

    1 - F(x) ∼ \frac{σ^2 f(x)}{x-µ} = \frac{σ}{\sqrt{2π}(x-µ)} \exp\left( -\frac{(x-µ)^2}{2σ^2} \right).    (2.7)

Furthermore, f'(x) = -\frac{x-µ}{σ^2} f(x) < 0 and

    \lim_{x→∞} \frac{(1-F(x))\, f'(x)}{f(x)^2} = -1.    (2.8)

Thus F is a von Mises function (with auxiliary function a) and, as such, is in the Maximum Domain of Attraction of the Gumbel distribution (Example 3.3.23 and Proposition 3.3.25 in [1]).

To calculate the norming constants, we use Mills' ratio again:

    \bar{F}(x) ∼ \frac{σ^2 f(x)}{x-µ} = \frac{σ}{\sqrt{2π}(x-µ)} \exp\left( -\frac{(x-µ)^2}{2σ^2} \right),   x → ∞,    (2.9)

and interpret the right-hand side as the tail of some distribution function G. Then by proposition 2, F and G have the same norming constants. According to theorem 4, d_m = G^{←}(1 - m^{-1}). We therefore look for a solution of -\ln \bar{G}(d_m) = \ln m, i.e. a solution of

    \frac{(d_m-µ)^2}{2σ^2} + \ln(d_m-µ) + \frac{1}{2}\ln 2π - \ln σ = \ln m.    (2.10)

    (d_m-µ)^2 = 2σ^2\left( \ln m - \ln(d_m-µ) - \frac{1}{2}\ln 2π + \ln σ \right)
              = 2σ^2 \ln m \left( 1 + \frac{-\ln(d_m-µ) - \frac{1}{2}\ln 2π + \ln σ}{\ln m} \right).    (2.11)

Taking the positive root, we can write the following Taylor expansion:

    d_m - µ = σ\sqrt{2\ln m}\left( 1 + \frac{-\ln(d_m-µ) - \frac{1}{2}\ln 2π + \ln σ}{\ln m} \right)^{1/2}
            = σ\sqrt{2\ln m}\left( 1 + \frac{-\ln(d_m-µ) - \frac{1}{2}\ln 2π + \ln σ}{2\ln m} + o\left( \frac{1}{\ln m} \right) \right)
            = σ\sqrt{2\ln m} + σ\, \frac{-\ln(d_m-µ) - \frac{1}{2}\ln 2π + \ln σ}{\sqrt{2\ln m}} + o\left( \frac{1}{\sqrt{\ln m}} \right)
            = σ\sqrt{2\ln m} + σ\, \frac{-\ln(σ\sqrt{2\ln m}) - \frac{1}{2}\ln 2π + \ln σ}{\sqrt{2\ln m}} + o\left( \frac{1}{\sqrt{\ln m}} \right)
            = σ\sqrt{2\ln m} - σ\, \frac{\ln(\ln m) + \ln 4π}{2\sqrt{2\ln m}} + o\left( \frac{1}{\sqrt{\ln m}} \right),    (2.12)

from which we deduce:

    d_m = σ(2\ln m)^{1/2} + µ - σ\, \frac{\ln\ln m + \ln 4π}{2(2\ln m)^{1/2}} + o\left( (\ln m)^{-1/2} \right).    (2.13)

Since we can take a(x) = \frac{1-F(x)}{f(x)}, we have a(x) ∼ \frac{σ^2}{x-µ} and therefore

    c_m = a(d_m) ∼ \frac{σ}{\sqrt{2\ln m}}.    (2.14)
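In the spirit of figures 2.1-2.3, the constants 2.13 and 2.14 are easy to check against maximum likelihood estimates from simulated maxima; a minimal sketch (ours, assuming NumPy and SciPy):

```python
import numpy as np
from scipy.stats import gumbel_r

mu, sigma = 0.0, 1.0
m, N = 100, 100_000                  # maxima of m samples, N repetitions
rng = np.random.default_rng(1)

# Asymptotic norming constants from eq. 2.13 and 2.14.
root = np.sqrt(2 * np.log(m))
d_m = sigma * root + mu - sigma * (np.log(np.log(m)) + np.log(4 * np.pi)) / (2 * root)
c_m = sigma / root

# Maximum likelihood fit of a Gumbel distribution to simulated maxima.
maxima = rng.normal(mu, sigma, size=(N, m)).max(axis=1)
d_mle, c_mle = gumbel_r.fit(maxima)  # returns (location, scale)

print(f"d_m: formula {d_m:.4f}   MLE {d_mle:.4f}")
print(f"c_m: formula {c_m:.4f}   MLE {c_mle:.4f}")
```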

Figure 2.1: Pdfs of maxima for the univariate standard Gaussian for m = 10, 20, 50, 100, 500, 1000 (from left to right, top to bottom). For each value of m, we show the histogram of simulated maxima (N = 10^5 in all cases, blue), the pdf obtained using formulae 2.13 and 2.14 (red), the pdf obtained using the formulae in [2] (cyan) and the pdf obtained by maximum likelihood estimation (green).

Figure 2.2: Cdfs of maxima for the univariate standard Gaussian for m = 10, 20, 50, 100, 200, 1000 (from left to right, top to bottom). For each case, we show the cdf obtained using formulae 2.13 and 2.14 (red), and the cdf obtained by maximum likelihood estimation (from N = 10^5 maxima, green).

Figure 2.3: Semi-logarithmic plot of the values (top row), and of the absolute (middle row) and relative (bottom row) differences with the corresponding maximum likelihood estimates, for d_m (left column) and c_m (right column) as m increases. The red crosses are obtained using formulae 2.13 and 2.14.

3 Univariate one-sided Gaussian distribution

In this section, we proceed as in section 2 to find the parameters of the Gumbel distribution of maxima for the univariate one-sided Gaussian distribution.


Probability density function

The probability density function of a one-sided Gaussian distribution with mean µ ∈ R and standard deviation σ > 0 is

    f(x) = \frac{2}{\sqrt{2πσ^2}} \exp\left( -\frac{(x-µ)^2}{2σ^2} \right),   ∀x > 0.    (3.1)

Cumulative distribution function

    F(x) = \operatorname{erf}\left( \frac{x-µ}{σ\sqrt{2}} \right).    (3.2)

Mills' ratio

    \frac{1-F(x)}{f(x)} = \frac{\sqrt{2πσ^2}}{2}\left( 1 - \operatorname{erf}\left( \frac{x-µ}{σ\sqrt{2}} \right) \right) \exp\left( \frac{(x-µ)^2}{2σ^2} \right)
                        = \frac{\sqrt{2πσ^2}}{2} \operatorname{erfc}\left( \frac{x-µ}{σ\sqrt{2}} \right) \exp\left( \frac{(x-µ)^2}{2σ^2} \right).    (3.3)

A Taylor expansion of erfc as x → +∞ yields:

    \frac{1-F(x)}{f(x)} = \frac{\sqrt{2πσ^2}}{2} \exp\left( \frac{(x-µ)^2}{2σ^2} \right) \frac{σ\sqrt{2}}{(x-µ)\sqrt{π}} \exp\left( -\frac{(x-µ)^2}{2σ^2} \right)(1 + o(1))
                        = \frac{σ^2}{x-µ} + o\left( \frac{1}{x} \right).    (3.4)

We can therefore write:

    1 - F(x) ∼ \frac{σ^2 f(x)}{x-µ} = \frac{2σ}{\sqrt{2π}(x-µ)} \exp\left( -\frac{(x-µ)^2}{2σ^2} \right).    (3.5)

Furthermore, f'(x) = -\frac{x-µ}{σ^2} f(x) < 0 and

    \lim_{x→∞} \frac{(1-F(x))\, f'(x)}{f(x)^2} = -1.    (3.6)

As in the previous section, F is therefore in the Maximum Domain of Attraction of the Gumbel distribution. To calculate the norming constants, we again write

    \bar{F}(x) ∼ \frac{σ^2 f(x)}{x-µ} = \frac{2σ}{\sqrt{2π}(x-µ)} \exp\left( -\frac{(x-µ)^2}{2σ^2} \right),   x → ∞,    (3.7)

and interpret the right-hand side as the tail of some distribution function G. Then by proposition 2, F and G have the same norming constants. According to theorem 4, d_m = G^{←}(1 - m^{-1}). We therefore look for a solution of -\ln \bar{G}(d_m) = \ln m, i.e. a solution of

    \frac{(d_m-µ)^2}{2σ^2} + \ln(d_m-µ) + \frac{1}{2}\ln\frac{π}{2} - \ln σ = \ln m.    (3.8)

    (d_m-µ)^2 = 2σ^2\left( \ln m - \ln(d_m-µ) - \frac{1}{2}\ln\frac{π}{2} + \ln σ \right)
              = 2σ^2 \ln m \left( 1 + \frac{-\ln(d_m-µ) - \frac{1}{2}\ln\frac{π}{2} + \ln σ}{\ln m} \right).    (3.9)

Taking the positive root, we can write the following Taylor expansion:

    d_m - µ = σ\sqrt{2\ln m}\left( 1 + \frac{-\ln(d_m-µ) - \frac{1}{2}\ln\frac{π}{2} + \ln σ}{\ln m} \right)^{1/2}
            = σ\sqrt{2\ln m}\left( 1 + \frac{-\ln(d_m-µ) - \frac{1}{2}\ln\frac{π}{2} + \ln σ}{2\ln m} + o\left( \frac{1}{\ln m} \right) \right)
            = σ\sqrt{2\ln m} + σ\, \frac{-\ln(d_m-µ) - \frac{1}{2}\ln\frac{π}{2} + \ln σ}{\sqrt{2\ln m}} + o\left( \frac{1}{\sqrt{\ln m}} \right)
            = σ\sqrt{2\ln m} + σ\, \frac{-\ln(σ\sqrt{2\ln m}) - \frac{1}{2}\ln\frac{π}{2} + \ln σ}{\sqrt{2\ln m}} + o\left( \frac{1}{\sqrt{\ln m}} \right)
            = σ\sqrt{2\ln m} - σ\, \frac{\ln(\ln m) + \ln π}{2\sqrt{2\ln m}} + o\left( \frac{1}{\sqrt{\ln m}} \right),    (3.10)

from which we deduce:

    d_m = σ(2\ln m)^{1/2} + µ - σ\, \frac{\ln\ln m + \ln π}{2(2\ln m)^{1/2}} + o\left( (\ln m)^{-1/2} \right).    (3.11)

Since we can take a(x) = \frac{1-F(x)}{f(x)}, we have a(x) ∼ \frac{σ^2}{x-µ} and therefore

    c_m = a(d_m) ∼ \frac{σ}{\sqrt{2\ln m}}.    (3.12)

Figure 3.1: Pdfs of maxima for the standard one-sided Gaussian for m = 10, 20, 50, 100, 500, 1000 (from left to right, top to bottom). For each value of m, we show the histogram of simulated maxima (N = 10^5 in all cases, blue), the pdf obtained using formulae 3.11 and 3.12 (red), the pdf obtained using the formulae in [2] (cyan) and the pdf obtained by maximum likelihood estimation (green).

Figure 3.2: Cdfs of maxima for the standard one-sided Gaussian for m = 10, 20, 50, 100, 500, 1000 (from left to right, top to bottom). For each case, we show the cdf obtained using formulae 3.11 and 3.12 (red), the cdf obtained using the formulae in [2] (cyan) and the cdf obtained by maximum likelihood estimation (from N = 10^5 maxima, green).

Figure 3.3: Semi-logarithmic plot of the values (top row), and of the absolute (middle row) and relative (bottom row) differences with the corresponding maximum likelihood estimates, for d_m (left column) and c_m (right column) as m increases. The red crosses are obtained using formulae 3.11 and 3.12, the cyan stars using the formulae in [2].

4 Probability of probabilities

4.1 Sampling in the data space is equivalent to sampling in the image probability space

Let x_1, x_2, ..., x_k be samples drawn from a distribution D (univariate or multivariate), for k ∈ N, and f(x_1), f(x_2), ..., f(x_k) the probabilities of these samples with respect to D. The x_i are vectors of the sample space S. The f(x_i) are real numbers which take values in f(S), the image of S under f. Since f is a probability density function, we are sure that f(S) ⊆ [0, +∞[, i.e. the f(x_i) are positive numbers.

The probability of obtaining y ∈ f(S) by drawing samples from the sample space is strongly related to the form of f. Assuming that X is a random variable distributed according to D, our aim in this section is to determine the form of the probability distribution function g on f(S) according to which f(X) is distributed, for some simple cases of D.

4.2 Univariate standard Gaussian distribution

In the case of the univariate standard Gaussian distribution, S = R and f is defined as

    f(x) = \frac{1}{\sqrt{2π}} \exp\left( -\frac{x^2}{2} \right),    (4.1)

and f(S) is the interval ]0, 1/\sqrt{2π}]. Let [y_1, y_2] ⊂ ]0, 1/\sqrt{2π}]; the probability of f(X) being in [y_1, y_2] is

    \Pr(y_1 ≤ f(X) ≤ y_2) = \Pr\left( X ∈ f^{-1}([y_1, y_2]) \right),    (4.2)

where f^{-1}([y_1, y_2]) is the preimage of [y_1, y_2] under f. It is straightforward to see that

    f^{-1}([y_1, y_2]) = [-x_1, -x_2] ∪ [x_2, x_1],    (4.3)

where x_1 is the unique positive solution of f(x) = y_1 and x_2 is the unique positive solution of f(x) = y_2.

By first replacing in 4.2 and then using the symmetry of the distribution, we have:

    \Pr(y_1 ≤ f(X) ≤ y_2) = \int_{-x_1}^{-x_2} f(t)\,dt + \int_{x_2}^{x_1} f(t)\,dt
                          = 2\int_{x_2}^{x_1} f(t)\,dt
                          = 2\int_{f(x_2)}^{f(x_1)} y \cdot \frac{-1}{y\sqrt{-2\ln(\sqrt{2π}\,y)}}\,dy
                          = \int_{y_1}^{y_2} \left( \frac{2}{-\ln(\sqrt{2π}\,y)} \right)^{1/2} dy.    (4.4)

The third line is obtained by the substitution y = f(t), noticing that \frac{dt}{dy} = -\frac{1}{y\sqrt{-2\ln(\sqrt{2π}\,y)}}. From 4.4 we deduce that:

    g(y) = \left( \frac{2}{-\ln(\sqrt{2π}\,y)} \right)^{1/2}.    (4.5)

4.3 Multivariate standard Gaussian distribution

In the case of the n-dimensional standard Gaussian distribution, where n is an integer greater than or equal to 2, S = R^n and f is defined as

    f(x) = \frac{1}{(2π)^{n/2}} \exp\left( -\frac{||x||^2}{2} \right),    (4.6)

and f(S) is the interval ]0, 1/(2π)^{n/2}]. To take advantage of the spherical symmetry of the problem, we rewrite the pdf in n-dimensional spherical polar coordinates:

    f(x) = \frac{1}{(2π)^{n/2}} \exp\left( -\frac{r^2}{2} \right),    (4.7)

where x = (r, θ_1, θ_2, ..., θ_{n-1}) is defined such that r ∈ R^+, θ_1, θ_2, ..., θ_{n-2} range over [-π/2, π/2] and the base angle θ_{n-1} ranges over [0, 2π[. Let [y_1, y_2] ⊂ ]0, 1/(2π)^{n/2}]; the probability of f(X) being in [y_1, y_2] is

    \Pr(y_1 ≤ f(X) ≤ y_2) = \Pr\left( X ∈ f^{-1}([y_1, y_2]) \right).    (4.8)

Due to the spherical symmetry of the distribution, f has the same value on all hyper-spheres centred at the origin, and f^{-1}([y_1, y_2]) is the hyper-volume contained between two of these hyper-spheres. We call r_1 the radius of the 'outer' hyper-sphere and r_2 the radius of the 'inner' hyper-sphere. They are such that f((r_1, θ_1, θ_2, ..., θ_{n-1})) = y_1 and f((r_2, θ_1, θ_2, ..., θ_{n-1})) = y_2 for all values of the angles, i.e.:

    f^{-1}([y_1, y_2]) = [r_2, r_1] × \left[ -\frac{π}{2}, \frac{π}{2} \right]^{n-2} × [0, 2π[.    (4.9)

Starting with 4.8, we have:

    \Pr(y_1 ≤ f(X) ≤ y_2) = \int_{f^{-1}([y_1,y_2])} \frac{1}{(2π)^{n/2}} \exp\left( -\frac{r^2}{2} \right) |J_n|\, dr\, dθ_1\, dθ_2 \cdots dθ_{n-1},    (4.10)

where |J_n| is the Jacobian of the transformation to spherical polar coordinates:

    J_n = J_n(r, θ_1, θ_2, ..., θ_{n-1}) = r^{n-1} \cos^{n-2}(θ_1) \cos^{n-3}(θ_2) \cdots \cos(θ_{n-2}).    (4.11)

Since f is independent of the angles, we have:

    \Pr(y_1 ≤ f(X) ≤ y_2) = \frac{Ω_n}{(2π)^{n/2}} \int_{r_2}^{r_1} \exp\left( -\frac{r^2}{2} \right) r^{n-1}\, dr,    (4.12)

where

    Ω_n = \frac{2π^{n/2}}{Γ(n/2)},

Γ being the Euler Gamma function. Ω_n results from the integration over the angles and can be interpreted as the total solid angle subtended by the surface of the unit n-sphere.

We then make the substitution y = \frac{1}{(2π)^{n/2}} \exp\left( -\frac{r^2}{2} \right), using:

    r = \left[ -2\ln\left( (2π)^{n/2} y \right) \right]^{1/2},    (4.13)

    \frac{dr}{dy} = -\frac{1}{y} \left[ -2\ln\left( (2π)^{n/2} y \right) \right]^{-1/2}.    (4.14)

4.12 becomes:

    \Pr(y_1 ≤ f(X) ≤ y_2) = -Ω_n \int_{y_2}^{y_1} \left[ -2\ln\left( (2π)^{n/2} y \right) \right]^{(n-2)/2} dy
                          = Ω_n \int_{y_1}^{y_2} \left[ -2\ln\left( (2π)^{n/2} y \right) \right]^{(n-2)/2} dy.    (4.15)

From 4.15, we conclude that:

    g(y) = Ω_n \left[ -2\ln\left( (2π)^{n/2} y \right) \right]^{(n-2)/2}.    (4.16)

Figure 4.1 shows comparisons between simulated normalised histograms of probabilities and the function defined in equation 4.16. To obtain the histograms, 10^6 samples were drawn from an n-dimensional standard Gaussian distribution. Their probabilities were then computed and plotted as a normalised histogram. There is no noteworthy difference between the shape of these histograms and the curves of our analytical form.
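The comparison in figure 4.1 is straightforward to reproduce; a sketch (ours, NumPy assumed) of the check of eq. 4.16 for the standard n-dimensional Gaussian:

```python
import numpy as np
from math import gamma, pi

def g(y, n):
    """Density of probabilities from eq. 4.16, standard n-dimensional Gaussian."""
    omega_n = 2 * pi ** (n / 2) / gamma(n / 2)  # total solid angle of the unit n-sphere
    return omega_n * (-2 * np.log((2 * pi) ** (n / 2) * y)) ** ((n - 2) / 2)

n, N = 3, 1_000_000
rng = np.random.default_rng(2)
x = rng.standard_normal((N, n))
y = (2 * pi) ** (-n / 2) * np.exp(-0.5 * (x ** 2).sum(axis=1))  # probabilities f(x)

# Normalised histogram of the sampled probabilities against the analytical density.
hist, edges = np.histogram(y, bins=50, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.abs(hist / g(centres, n) - 1).max())  # relative discrepancy, up to histogram noise
```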

Notes

• Univariate case: substituting n = 1 in equation 4.16 yields equation 4.5. The formula therefore holds for all n ∈ N*.

• Bivariate case: in this case g(y) = Ω_2 = 2π and g is the pdf of the uniform distribution between 0 and 1/(2π).

• On the way towards a multivariate Extreme Value Theory, being able to obtain such forms for the probabilities of probabilities is important, because the image of a multivariate pdf is one-dimensional and therefore allows us to link back to univariate extreme value theory. In particular, we are going to consider the distribution of minima of such distributions (having defined 'extreme' in terms of 'lowest probability'), which is going to be a univariate extreme value distribution for distributions that always have a left endpoint (0 if they take non-zero values on an infinite domain, min f if they are defined on a compact subset of R^n).

Figure 4.1: Comparison between simulation results and analytical form for the multivariate standard Gaussian. The dimension varies from 1 to 6. The normalised histograms are obtained by computing the probabilities of 10^6 random samples drawn from the n-dimensional standard Gaussian distribution. The red curves are obtained using equation 4.16. The vertical green lines indicate the greatest value of the multivariate pdf.

4.4 Gaussian Distributions with Generic Mean and Covariance Matrix

Let F_n be the Gaussian distribution with mean µ and covariance matrix Σ, with n ∈ N*, µ ∈ R^n, and Σ an n × n symmetric positive-definite matrix. The probability density function is

    f_n(x; µ, Σ) = \frac{1}{(2π)^{n/2}|Σ|^{1/2}} \exp\left( -\frac{1}{2}(x-µ)^\top Σ^{-1}(x-µ) \right),    (4.17)

and the corresponding probability density function of probabilities is

    g_n(y; µ, Σ) = Ω_n|Σ|^{1/2} \left[ -2\ln\left( (2π)^{n/2}|Σ|^{1/2} y \right) \right]^{(n-2)/2}.    (4.18)

Figure 4.2 shows a comparison between simulation results and results obtained using eq. 4.18. The covariance matrix used for dimension n is defined as:

    Σ_{ij} = \begin{cases} i & \text{if } i = j \\ \frac{1}{2} & \text{otherwise.} \end{cases}    (4.19)

Figure 4.2: Comparison between simulation results and analytical forms for Gaussian distributions with full covariance matrices. Covariance matrices are defined as in eq. 4.19. The dimension varies from 1 to 6. The normalised histograms are obtained by computing the probabilities of 10^6 random samples drawn from the n-dimensional Gaussian distribution. The red curves are obtained using equation 4.18. The vertical green lines indicate the greatest value of the multivariate pdf.

5 Extreme Value Distribution for the standard bivariate Gaussian distribution

F is the generative distribution. G is the associated probability distribution of probabilities.

Probability density function of F:

    f(x) = \frac{1}{2π} \exp\left( -\frac{||x||^2}{2} \right)    (5.1)

for all x ∈ R^2, or in polar form:

    f(x) = \frac{1}{2π} \exp\left( -\frac{r^2}{2} \right),    (5.2)

where x = (r, θ), r ∈ R^+ and θ ∈ [0, 2π[.

Probability density function of G:

    g(y) = 2π    (5.3)

for all y ∈ ]0, 1/(2π)[.

5.1 EVD of minima for G

Univariate Extreme Value Theory gives us the form of the distribution of minima for the general uniform distribution. The EVD for the maxima or the minima of the uniform distribution is a Weibull distribution:

    g_e(y) = \begin{cases} \frac{α}{d_m}(-z_m)^{α-1} \exp(-(-z_m)^α) & \text{if } z_m < 0 \\ 0 & \text{if } z_m ≥ 0 \end{cases}    (5.4)

where z_m = (c_m - y)/d_m, α > 0; c_m and d_m are the location and scale parameters of the Weibull distribution, respectively, and depend on m.

In the case of the U(0, 1/(2π)) distribution, we can take:

    α = 1,    (5.5)
    c_m = 0,    (5.6)
    d_m = \frac{1}{2πm},    (5.7)

26 which allows us to rewrite equation 5.4 as follows:

    g_e(y) = \begin{cases} 2πm \exp(-2πmy) & \text{if } y > 0 \\ 0 & \text{if } y ≤ 0 \end{cases}    (5.8)

We recognise the exponential distribution with parameter 2πm, for which the distribution function is:

    G_e(y) = \begin{cases} 1 - \exp(-2πmy) & \text{if } y > 0 \\ 0 & \text{if } y ≤ 0 \end{cases}    (5.9)
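Eq. 5.9 is easy to verify by simulation; a sketch (ours, NumPy assumed) comparing the mean of simulated minima with the exponential mean 1/(2πm):

```python
import numpy as np

m, N = 30, 100_000
rng = np.random.default_rng(3)

x = rng.standard_normal((N, m, 2))                      # N draws of m bivariate samples
p = np.exp(-0.5 * (x ** 2).sum(axis=2)) / (2 * np.pi)   # probabilities f(x)
p_min = p.min(axis=1)                                   # probability of the most extreme sample

# Under eq. 5.9 the minima are approximately Exp(2*pi*m), whose mean is 1/(2*pi*m).
print(p_min.mean(), 1 / (2 * np.pi * m))
```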

5.2 EVD for minima of F

Now, the probability that the probability of the most extreme sample exceeds y_0 is

    \Pr(f(X_m) ≥ y_0) = 1 - G_e(y_0) = \exp(-2πm y_0).    (5.10)

Therefore the probability that the radius of the most extreme sample is less than r_0 is:

    \Pr(||X_m|| ≤ r_0) = \exp\left( -2πm \cdot \frac{1}{2π} e^{-r_0^2/2} \right) = \exp\left( -m e^{-r_0^2/2} \right).    (5.11)

Equation 5.11 defines a univariate cumulative distribution function with respect to r_0, which we call F_{e,r}. The corresponding probability density function is:

    f_{e,r}(r) = mr \exp\left( -m e^{-r^2/2} - \frac{r^2}{2} \right).    (5.12)

This distribution can be interpreted as the distribution of the norm r of the most extreme sample. Since f is isotropic and the level sets of its pdf are concentric circles, we expect the extreme value distribution to have the same value at every point of a given level set¹, i.e. the target pdf is independent of θ.

¹ TODO: maybe I should formally prove that.

We note that the mean of f_{e,r} is \left( \ln\frac{m}{\ln 2} \right)^{1/2} and use the formula derived elsewhere for the integration of multivariate radial functions to write:

    \int_{t=0}^{+∞} \int_{θ=0}^{2π} f_{e,r}(t)\, dt\, dθ = 2π \left( \ln\frac{m}{\ln 2} \right)^{1/2}.    (5.13)

We can now write the pdf of the extreme value distribution of F:

    f_e(x) = \frac{m||x||}{2π} \left( \ln\frac{m}{\ln 2} \right)^{-1/2} \exp\left( -m \exp\left( -\frac{||x||^2}{2} \right) - \frac{||x||^2}{2} \right).    (5.14)

Notes

• Novelty Scores - Using \Pr(||X_m|| ≤ r_0) from eq. 5.11, we can define:

    F_e(x) = \exp\left( -m e^{-||x||^2/2} \right),    (5.15)

and interpret F_e(x) as being the probability that taking m samples from F has generated an extremum closer to the centre than x. Note that F_e, being a cumulative distribution function, ranges from 0 to 1 and depends only on m and the norm of x.

• Link with the GEV - The univariate density f_{e,r} : r ↦ mr \exp(-m e^{-r^2/2} - r^2/2) is very well approximated by a GEV distribution as m → ∞. It would make sense that they in fact converge to the same limit distribution.

Figure 5.1: (a) simulated and (b) analytical standard bivariate Gaussian distribution; (c) simulated and (d) analytical (equation 5.14) bivariate distribution of improbable extrema. m = 30; the number of samples drawn to obtain each distribution is 10^6.

Figure 5.2: Top: probability density function of probabilities for the standard bivariate Gaussian distribution (histogram) and distribution of maxima and minima (in magnitude) for this uniform distribution (see eq. 5.14 with n = 2). The value of m is 30. Note the good fit between simulated and analytical curves for the EVDs. Bottom: results of the marginalisation with respect to θ of fig. 5.1.c and 5.1.d. The prediction made by Roberts in [2, 3] is also represented.

6 Extreme Value Distribution for a generic bivariate Gaussian distribution

F is the generative distribution, G the associated probability distribution of probabilities. The probability density function of F is now:

    f(x) = \frac{1}{2π|Σ|^{1/2}} \exp\left( -\frac{(x-µ)^\top Σ^{-1}(x-µ)}{2} \right)    (6.1)
         = \frac{1}{2π|Σ|^{1/2}} \exp\left( -\frac{||x||_M^2}{2} \right),    (6.2)

where µ and Σ are the mean and the covariance matrix of the distribution, and ||·||_M is the Mahalanobis distance associated with F. We can rewrite eq. 6.1 in Mahalanobis-polar coordinates:

    f(x) = \frac{1}{2π|Σ|^{1/2}} \exp\left( -\frac{r^2}{2} \right),    (6.4)

where x = (r, θ), r = ||x||_M ∈ R^+ and θ ∈ [0, 2π[. The probability density function of G is:

    g(y) = 2π|Σ|^{1/2}    (6.5)

for all y ∈ ]0, 1/(2π|Σ|^{1/2})[. With all the modifications above in mind, eq. 5.12 becomes:

    f_{e,r}(r) = mr \exp\left( -m \exp\left( -\frac{r^2}{2} \right) - \frac{r^2}{2} \right),    (6.6)

where r is now understood to be the Mahalanobis distance.

Again, we interpret this distribution as the distribution of the Mahalanobis distance to the mean of the most extreme sample. Since F is isotropic with respect to the Mahalanobis distance, the target pdf is independent of θ. Equation 5.14 therefore becomes:

    f_e(x) = \frac{m||x||_M}{2π|Σ|^{1/2}} \left( \ln\frac{m}{\ln 2} \right)^{-1/2} \exp\left( -m \exp\left( -\frac{||x||_M^2}{2} \right) - \frac{||x||_M^2}{2} \right).    (6.7)

Normalisation by |Σ|^{1/2} is made necessary by the fact that f_{e,r} is defined with respect to the Mahalanobis distance.

We can now define:

    F_e(x) = \exp\left( -m e^{-||x||_M^2/2} \right),    (6.8)

which we interpret as the probability that sampling m times from F has generated an extremum closer to the centre than x (in the Mahalanobis sense). F_e can therefore be used as a novelty score: the closer F_e(x) is to 1, the less likely x is to be the extremum of m samples generated from F.
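Eq. 6.8 translates directly into a practical novelty score; a minimal sketch (ours, NumPy assumed; the function name is illustrative):

```python
import numpy as np

def novelty_score(x, mu, cov, m):
    """F_e(x) = exp(-m * exp(-||x||_M^2 / 2)), eq. 6.8; values close to 1 flag novelty."""
    diff = np.atleast_2d(x) - mu
    maha_sq = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return np.exp(-m * np.exp(-0.5 * maha_sq))

mu = np.zeros(2)
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
# A typical point scores near 0; a far-out point scores near 1.
print(novelty_score([[0.1, 0.0], [4.0, 4.0]], mu, cov, m=10))
```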

Figure 6.1: (a) analytical standard bivariate Gaussian distribution; (b) simulated and (c) analytical bivariate distribution of improbable extrema. m = 10; the number of samples drawn to obtain each distribution is 10^6.

Figure 6.2: Top: probability density function of probabilities for a bivariate Gaussian distribution with full covariance matrix (histogram) and distribution of maxima and minima (in magnitude) for this uniform distribution. The value of m is 10. Bottom: results of the marginalisation with respect to θ of fig. 6.1.b and 6.1.c, where r is now the Mahalanobis distance. The prediction made by Roberts in [2] is also represented.

Figure 6.3: Same figure as 6.1, with m = 100.

Figure 6.4: Same figure as 6.2, with m = 100.

7 Extreme Value Distribution for the standard n-dimensional Gaussian distribution

Using eq. 4.16 and theorem 3, it is easy to show that for all n > 1 the EVD of minima of the distribution of probabilities is in the MDA of the Weibull distribution. However, the choice of norming constants is not straightforward, as the quantiles of G are not known. These can of course be estimated numerically in an efficient way, but closed-form solutions are not to be expected.

The dimension is n, the mean is µ and the covariance matrix is Σ. The probability density function is:

    f_n(x) = \frac{1}{(2π)^{n/2}|Σ|^{1/2}} \exp\left( -\frac{1}{2}(x-µ)^\top Σ^{-1}(x-µ) \right)    (7.1)
           = \frac{1}{(2π)^{n/2}|Σ|^{1/2}} \exp\left( -\frac{M(x)^2}{2} \right),    (7.2)

where M(x) is the Mahalanobis distance:

    M(x) = \sqrt{(x-µ)^\top Σ^{-1}(x-µ)}.    (7.4)

7.1 Cumulative distribution function for the standard n-dimensional Gaussian distribution

Cumulative distribution function:

    G_n(y) = \int_0^y g_n(t; µ, Σ)\, dt = Ω_n|Σ|^{1/2} \int_0^y \left[ -2\ln\left( (2π)^{n/2}|Σ|^{1/2} t \right) \right]^{(n-2)/2} dt.    (7.5)

Cumulative distribution function with respect to the Mahalanobis distance:

    F_n(R) = \frac{Ω_n}{(2π)^{n/2}} \int_0^R \exp\left( -\frac{r^2}{2} \right) r^{n-1}\, dr,    (7.6)

where R ∈ R^+. F_n(R) is the probability mass contained in the ellipsoid defined by M(x) ≤ R.

Now let u_n(R) be:

    u_n(R) = \int_0^R \exp\left( -\frac{r^2}{2} \right) r^{n-1}\, dr.    (7.8)

We want to evaluate this integral as a function of n and R. First we note that:

    u_1(R) = \int_0^R \exp\left( -\frac{r^2}{2} \right) dr = \sqrt{\frac{π}{2}}\, \operatorname{erf}\left( \frac{R}{\sqrt{2}} \right)    (7.9)

and

    u_2(R) = \int_0^R \exp\left( -\frac{r^2}{2} \right) r\, dr = \left[ -\exp\left( -\frac{r^2}{2} \right) \right]_0^R = 1 - \exp\left( -\frac{R^2}{2} \right).    (7.11)

Moreover, if n > 2, an integration by parts yields:

    u_n(R) = \int_0^R \exp\left( -\frac{r^2}{2} \right) r^{n-1}\, dr
           = \left[ -\exp\left( -\frac{r^2}{2} \right) r^{n-2} \right]_0^R + (n-2)\int_0^R \exp\left( -\frac{r^2}{2} \right) r^{n-3}\, dr
           = -\exp\left( -\frac{R^2}{2} \right) R^{n-2} + (n-2)\, u_{n-2}(R).    (7.12)

Eq. 7.12 is a recursive relation for the sequence (u_n)_{n∈N}. Since u_n is a function of u_{n-2} only, we obtain two closed-form formulae, one for odd values of n and one for even values of n. The first terms are given by eq. 7.9 and eq. 7.11.

If n = 2p, p ≥ 1:

    u_{2p}(R) = 2^{p-1}(p-1)!\, u_2(R) - \sum_{k=0}^{p-2} \frac{2^k (p-1)!}{(p-1-k)!} R^{n-2(k+1)} \exp\left( -\frac{R^2}{2} \right)    (7.14)
              = 2^{p-1}(p-1)! - \sum_{k=0}^{p-1} \frac{2^k (p-1)!}{(p-1-k)!} R^{n-2(k+1)} \exp\left( -\frac{R^2}{2} \right).    (7.15)

If n = 2p + 1, p ≥ 1:

    u_{2p+1}(R) = \frac{(n-2)!}{2^{p-1}(p-1)!}\, u_1(R) - \sum_{k=0}^{p-1} \frac{(n-2)!\,(p-k)!}{2^{k-1}(p-1)!\,(n-2k-1)!} R^{n-2(k+1)} \exp\left( -\frac{R^2}{2} \right).    (7.16)

It follows from eq. 7.6, 7.15 and 7.16 that:

    F_n(R) = \frac{Ω_n}{(2π)^{n/2}}\, u_n(R).    (7.17)

    F_{2p}(R) = 1 - \frac{Ω_{2p}}{(2π)^p} \exp\left( -\frac{R^2}{2} \right) \sum_{k=0}^{p-1} \frac{2^k (p-1)!}{(p-1-k)!} R^{2(p-k-1)}    (7.18)

    F_{2p+1}(R) = \operatorname{erf}\left( \frac{R}{\sqrt{2}} \right) - \frac{Ω_{2p+1}}{(2π)^{p+1/2}} \exp\left( -\frac{R^2}{2} \right) \sum_{k=0}^{p-1} \frac{(2p-1)!\,(p-k)!}{2^{k-1}(p-1)!\,(2p-2k)!} R^{2(p-k)-1}    (7.19)

Consequently, F_n(R) can easily be calculated with Matlab, without having to sample from the distribution. Figure 7.1 shows a comparison between simulated cumulative histograms and results obtained using eq. 7.15 and 7.16 directly.
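The recursion 7.12 is equally easy to implement in other environments; a sketch in Python (ours, SciPy assumed), cross-checked against the χ²_n distribution, since F_n(R) coincides with Pr(χ²_n ≤ R²):

```python
from math import erf, exp, gamma, pi, sqrt
from scipy.stats import chi2

def u(n, R):
    """u_n(R) of eq. 7.8 via the recursion 7.12, seeded with eq. 7.9 and 7.11."""
    if n == 1:
        return sqrt(pi / 2) * erf(R / sqrt(2))
    if n == 2:
        return 1 - exp(-R ** 2 / 2)
    return -exp(-R ** 2 / 2) * R ** (n - 2) + (n - 2) * u(n - 2, R)

def F(n, R):
    """Probability mass inside Mahalanobis radius R, eq. 7.17."""
    omega_n = 2 * pi ** (n / 2) / gamma(n / 2)
    return omega_n / (2 * pi) ** (n / 2) * u(n, R)

for n in range(1, 7):
    print(n, F(n, 2.0), chi2.cdf(4.0, df=n))  # the two columns should agree
```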

Since f_n is monotonically decreasing as the Mahalanobis distance increases, the cumulative distribution function (in the classic univariate sense of the term) is:

    G_n(y) = 1 - F_n\left( \sqrt{ -2\ln\left( (2π)^{n/2}|Σ|^{1/2} y \right) } \right).    (7.20)

Figure 7.1: 'Cumulative distribution function' as a function of the Mahalanobis distance for multivariate Gaussian distributions (n = 1 to 6). The continuous lines are obtained using either eq. 7.15 or eq. 7.16. For each value of n, the crosses represent the cumulative histogram obtained from a set of 10^6 samples drawn from the corresponding distribution.

    G_{2p}(y) = Ω_{2p}|Σ|^{1/2} y \sum_{k=0}^{p-1} \frac{2^k (p-1)!}{(p-1-k)!} \left[ -2\ln\left( (2π)^{n/2}|Σ|^{1/2} y \right) \right]^{p-k-1}    (7.21)

    G_{2p+1}(y) = \operatorname{erfc}\left( \sqrt{ -\ln\left( (2π)^{p+1/2}|Σ|^{1/2} y \right) } \right) + Ω_{2p+1}|Σ|^{1/2} y \sum_{k=0}^{p-1} \frac{(2p-1)!\,(p-k)!}{2^{k-1}(p-1)!\,(2p-2k)!} \left[ -2\ln\left( (2π)^{p+1/2}|Σ|^{1/2} y \right) \right]^{(p-k)-1/2}    (7.22)

Using theorem 3, it can be shown that the distribution of minima of G_n is in the maximum domain of attraction of the Weibull distribution², with parameters:

    d_m = 0,    (7.23)
    c_m = (1 - G_n)^{←}\left( 1 - \frac{1}{m} \right),    (7.24)
    α = 1,    (7.25)

where m is the sampling parameter. The value for α is taken directly from theorem 3.
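Since closed-form quantiles of G_n are not available, the scale parameter 7.24 can be obtained by numerical inversion; a sketch (ours, SciPy assumed), written for the standard case Σ = I using the χ² identity mentioned above:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def G(y, n):
    """G_n(y) from eq. 7.20 with Sigma = I, using F_n(R) = Pr(chi2_n <= R^2)."""
    R_sq = -2 * np.log((2 * np.pi) ** (n / 2) * y)
    return 1 - chi2.cdf(R_sq, df=n)

def c_m(n, m):
    """Weibull scale parameter of eq. 7.24: solve G_n(y) = 1/m by bisection."""
    y_max = (2 * np.pi) ** (-n / 2)          # right endpoint of the support of G_n
    return brentq(lambda y: G(y, n) - 1 / m, 1e-300, y_max * (1 - 1e-12))

print(c_m(n=3, m=100))
```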

Therefore, as in eq. 5.9, the EVD of minima for G_n is:

    G_e(y) = 1 - \exp\left( -\frac{y}{c_m} \right)  for y > 0, and G_e(y) = 0 otherwise.

² G_n is a regularly varying function with exponent 1.

8 Notations

||·||_M   Mahalanobis distance. If Σ is an n × n symmetric positive-definite (covariance) matrix, µ a mean vector and x an n-vector, then ||x||_M = \sqrt{(x-µ)^\top Σ^{-1}(x-µ)}.

EVD   Extreme Value Distribution.

d_m   location parameter for the extreme value distributions.

c_m   scale parameter for the extreme value distributions.

F   distribution function and distribution of a random variable.

F^{←}(p)   p-quantile of F.

\bar{F}   tail of the distribution function F: \bar{F} = 1 - F.

MDA   Maximum Domain of Attraction.

m   number of samples from which an extremum is drawn; main parameter of Extreme Value Theory.

M_m   maximum of X_1, ..., X_m.

n   dimension of the data space.

A_m →^d A   convergence in distribution.

∼_t   tail-equivalence: F ∼_t G means that the distributions F and G are tail-equivalent.

Λ   Gumbel distribution.

Φ_α   Fréchet distribution with shape parameter α. See theorem 1.

Ψ_α   Weibull distribution with shape parameter α. See theorem 1.

References

[1] P. Embrechts, C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. Springer, 1997.

[2] S.J. Roberts. Novelty detection using extreme value statistics. IEE Proc.-Vis. Image Signal Process., 146(3), June 1999.

[3] S.J. Roberts. Extreme value statistics for novelty detection in biomedical data processing. IEE Proc.-Sci. Meas. Technol., 147(6):363–367, Nov. 2000.

[4] D.W. Scott. Multivariate density estimation: theory, practice and visualization. Wiley-Interscience, 1992.

[5] K. Worden and G. Manson. Experimental validation of a structural health monitoring methodology: Part I. Novelty detection on a laboratory structure. Journal of Sound and Vibration, 252(2):323-343, 2002.
