
Generalized Central Limit Theorem and Extreme Value Statistics

Ariel Amir, 9/2017

Let us consider some distribution P(x), with mean µ, from which we draw x1, x2, ..., xN. If the distribution has a finite standard deviation σ, then we can define:

$$\xi \equiv \frac{\sum_{i=1}^{N}(x_i - \mu)}{\sigma\sqrt{N}}. \qquad (1)$$

By the Central Limit Theorem, P(ξ) approaches a Gaussian with vanishing mean and standard deviation of 1 as N → ∞. What happens when P(x) does not have a finite variance? Or a finite mean? It turns out that there is a generalization of the CLT, which will lead to a different form of scaling, depending on the power-law tail of the distribution.

Our analysis will have some rather surprising consequences in the case of heavy-tailed distributions where the mean diverges. In that case the sum of N variables will be dominated by rare events, and we will quantify this with the aid of extreme value distributions (characterizing for example the maximum of N i.i.d variables).

Fig. 1 shows one such example: a running sum of variables drawn from a distribution whose tail falls off as p(x) ∼ 1/x^(1+µ).


Figure 1: Running sum for heavy-tailed distributions (blue: µ = 0.5; red: µ = 1.5; green: µ = 2.5).
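The qualitative behavior of Fig. 1 is easy to reproduce. A minimal sketch follows (the figure's exact distribution is not specified, so a Pareto form p(x) = µx^(−1−µ) for x ≥ 1, sampled by inverse transform, is assumed here):

% Running sums of heavy-tailed variables: for mu < 1 the sum proceeds in
% jumps dominated by rare large events; for larger mu it grows more smoothly.
n = 1000;
hold on;
for mu = [0.5 1.5 2.5]
    x = rand(1, n).^(-1/mu);    % Pareto(mu) samples: P(x > t) = t^(-mu), t >= 1
    plot(1:n, cumsum(x));       % the running sum
end
hold off;
xlabel('N'); ylabel('running sum');
legend('\mu = 0.5', '\mu = 1.5', '\mu = 2.5');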

1 Distributions of sums

An important quantity to define is the characteristic function of the distribution:

$$\hat f(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt. \qquad (2)$$

Given two variables x,y with distributions f(x) and g(y), consider their sum z = x + y. The distribution of z is given by:

$$P(z) = \int_{-\infty}^{\infty} f(x)\, g(z - x)\, dx, \qquad (3)$$

i.e., it is the convolution of the two distributions. Consider now the Fourier transform of the distribution (i.e., its characteristic function):

$$\int_{-\infty}^{\infty} e^{i\omega z} P(z)\, dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x)\, g(z - x)\, e^{i\omega z}\, dx\, dz. \qquad (4)$$

Changing variables to x and z − x ≡ y, the Jacobian of the transformation is 1, and we find:

$$\int_{-\infty}^{\infty} e^{i\omega z} P(z)\, dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x)\, g(y)\, e^{i\omega(x+y)}\, dx\, dy. \qquad (5)$$

This is the product of the characteristic functions of f and g – which will shortly prove very useful.

This can be generalized to the case of the sum of n variables. Defining:

$$X = x_1 + x_2 + \ldots + x_n, \qquad (6)$$

we can write:

$$\hat p(\omega) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} p(x_1)\, p(x_2)\cdots p(x_n)\,\delta(x_1 + x_2 + \ldots + x_n - X)\, e^{i\omega X}\, dx_1\, dx_2 \ldots dx_n\, dX, \qquad (7)$$

which upon doing the integration over X leads to the product of the characteristic functions.

Finally, the same property also holds for the Laplace transform of positive variables, with essentially the same proof.
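As a quick numerical illustration (a minimal sketch with two arbitrarily chosen distributions, a Gaussian and a unit exponential, not taken from the text), one can check the product property using empirical characteristic functions:

% Check that the characteristic function of a sum is the product of the
% individual characteristic functions, using Monte Carlo estimates.
N = 1e6;
x = randn(1, N);                 % standard Gaussian samples
y = -log(rand(1, N));            % unit-exponential samples
w = 1.3;                         % an arbitrary test frequency
cf = @(v) mean(exp(1i*w*v));     % empirical characteristic function at w
disp(cf(x + y));                 % these two values should agree up to
disp(cf(x)*cf(y));               % Monte Carlo error of order 1/sqrt(N)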

2 Tauberian theorems

Consider n i.i.d variables drawn from a distribution p(x). The characteristic function of the sum is given by:

$$g_n(\omega) = (f(\omega))^{n}. \qquad (8)$$

It is clear that the characteristic function is bounded by 1 in magnitude, and that as ω → 0 it tends to 1. The behavior of the characteristic function near the origin is determined by the tail of p(x) for large x, as we shall quantify later, in what’s known as a Tauberian theorem (relating large x and small ω behavior).

Consider first a variable whose distribution has a finite variance σ² (and hence finite mean). Without loss of generality let us assume that the mean is 0 – otherwise we can always consider a new variable with the mean subtracted. We can Taylor expand $e^{i\omega x} \approx 1 + i\omega x - \frac{\omega^2 x^2}{2}$ near the origin, and find:

$$f(\omega) \approx 1 + a\omega + b\omega^2 + \ldots, \qquad (9)$$

with:

$$a = i\int_{-\infty}^{\infty} p(x)\, x\, dx = 0; \qquad b = -\frac{1}{2}\int_{-\infty}^{\infty} x^2 p(x)\, dx = -\frac{\sigma^2}{2}. \qquad (10)$$

Therefore:

$$g_n(\omega) \approx \left(1 - \frac{\sigma^2\omega^2}{2}\right)^{n}. \qquad (11)$$

Thus:

$$g_n(\omega) \approx e^{\,n\log\left(1 - \frac{\sigma^2\omega^2}{2}\right)} \approx e^{-n\frac{\sigma^2\omega^2}{2}}. \qquad (12)$$

In the next section we shall see the significance of the correction to the last expansion. Taking the inverse Fourier transform leads to:

$$P_n(x) \approx e^{-\frac{x^2}{2\sigma^2 n}}, \qquad (13)$$

i.e., the sum is approximated by a Gaussian with standard deviation $\sigma\sqrt{n}$, as we know from the CLT.
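For completeness, restoring the normalization in this inverse transform is a standard Gaussian integral (a short worked step):

$$P_n(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-n\sigma^2\omega^2/2}\, e^{-i\omega x}\, d\omega = \frac{1}{\sqrt{2\pi n\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2 n}},$$

using $\int_{-\infty}^{\infty} e^{-a\omega^2 - i\omega x}\, d\omega = \sqrt{\pi/a}\; e^{-x^2/(4a)}$ with $a = n\sigma^2/2$.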

Note that the finiteness of the variance – associated with the properties of the tail of P(x) – was essential to establishing the behavior of f(ω) for small frequencies. This is an example of a Tauberian theorem, where large x behavior determines the small ω behavior, and we shall see that this kind of relation holds also in cases where the variance diverges.

Before going to these interesting scenarios, let us consider a particular example with infinite variance, where the central limit theorem fails.

2.1 Example: the Cauchy distribution

Consider the following distribution, known as the Cauchy distribution:

1 f(x) = x 2 , (14) γπ(1 + ( γ ) )

(Note that this has the same Lorentzian expression as telegraph noise). Its Fourier transform is:

$$\varphi(\omega) = e^{-\gamma|\omega|}. \qquad (15)$$

Thus the characteristic function of a sum of N such variables is:

$$\varphi(\omega) = e^{-N\gamma|\omega|}, \qquad (16)$$

and taking the inverse Fourier transform we find that the sum also follows a Cauchy distribution:

$$f(x) = \frac{1}{N\gamma\pi\left(1 + \left(\frac{x}{N\gamma}\right)^2\right)}. \qquad (17)$$

Thus, the sum does not converge to a Gaussian, but rather retains its Lorentzian form. Moreover, it is interesting to note that the width of the Lorentzian evolves with N in a different way than in the Gaussian scenario: while in the latter the variance increases linearly with N, and hence the width increases as $\sqrt{N}$, here the scaling factor is linear in N. This remarkable property is in fact useful for certain computer science algorithms [REF].
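This stability is easy to see numerically. A minimal sketch (using the fact that for a centered Cauchy distribution of scale c, the median of the absolute value equals c, since $\frac{2}{\pi}\arctan(1) = \frac{1}{2}$):

% Sum N Cauchy variables of scale gam; the result should again be Cauchy,
% with scale N*gam -- a width linear in N, unlike the sqrt(N) of the CLT.
gam = 1; N = 100; M = 1e5;
x = gam*tan(pi*(rand(N, M) - 0.5));   % Cauchy(gam) samples via inverse CDF
s = sum(x, 1);                        % M independent sums of N variables
disp(median(abs(s)));                 % empirical scale of the sum...
disp(N*gam);                          % ...versus the predicted scale N*gam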

3 Generalized CLT

We shall now use a different method to generalize this result further, to distributions with arbitrary power-law tails. We will look for distributions which are stable: this means that if we add two (or more) variables drawn from this distribution, the distribution of the sum will retain the same shape – i.e., it will be identical up to a potential shift and scaling, by some yet undetermined factors. Clearly, if the sum of a large number of variables drawn from any distribution converges to a distribution with a well defined shape, it must be such a stable distribution. The family of such distributions is known as the Levy stable distributions.

Clearly, a Gaussian distribution obeys this property. We also saw that the inverse Laplace transform of $e^{-x^{\mu}}$ with 0 < µ < 1 obeys it. We shall now use an RG (renormalization group) approach to find the general form of such distributions, which will turn out to have a simple representation in Fourier rather than real space – because the characteristic function is the natural object to deal with here.

Defining the partial sums by $s_n$, the general scaling one may consider is:

$$\xi_n = \frac{s_n - b_n}{a_n}. \qquad (18)$$

Here, $a_n$ determines the width of the distribution, and $b_n$ is a "shift". If the distribution has a finite mean it seems plausible that we should center it by choosing $b_n = \langle x\rangle\, n$. We will show that this is indeed the case, and that if its mean is infinite we can set $b_n = 0$. Note that we have seen this previously for the case of positive variables.

The scaling we are seeking is of the form of Eq. (18), and our hope is that:

$$\lim_{n\to\infty} P(\xi_n = \xi) \qquad (19)$$

exists, i.e., the scaled sum converges to some distribution P(ξ), which is not necessarily Gaussian (or symmetric).

Let us denote the characteristic function of the scaled variable by ϕ(ω) (assumed to be independent of n as n → ∞). Consider the variable yn = ξnan. We have:

$$\tilde P(y_n) = \frac{1}{a_n}\, P(y_n/a_n), \qquad (20)$$

with P the limiting distribution of Eq. (19). Therefore the characteristic function of the variable $y_n$ is:

$$\varphi_y(\omega) = \varphi(a_n\omega). \qquad (21)$$

Note that the $\frac{1}{a_n}$ prefactor canceled, as it should, since the characteristic function must be 1 at ω = 0.

Consider next the distribution of the sum sn. We have sn = yn + bn, and:

$$\hat P(s_n) = \tilde P(s_n - b_n). \qquad (22)$$

It is straightforward to see that shifting a distribution by $b_n$ implies multiplying the characteristic function by $e^{i\omega b_n}$. Therefore the characteristic function of the sum is:

$$\Phi_N(\omega) = e^{ib_N\omega}\,\varphi_y(\omega) = e^{ib_N\omega}\,\varphi(a_N\omega). \qquad (23)$$

This form will be the basis for the rest of the derivation, where we emphasize that the characteristic function ϕ is N-independent.

Consider N = n · m, where n, m are two large numbers. The important insight is to realize that one may compute $s_N$ in two ways: as the sum of N of the original variables, or as the sum of m variables, each one being the sum of n of the original variables. The sum of n variables drawn from the original distribution

is given by:

$$\Phi_n(\omega) = e^{ib_n\omega}\,\varphi(a_n\omega). \qquad (24)$$

If we take a sum of m variables drawn from that distribution (i.e., the one corresponding to the sums of n’s), then its characteristic function will be on one hand:

$$f(\omega) = e^{imb_n\omega}\,(\varphi(a_n\omega))^{m}, \qquad (25)$$

and on the other hand it is the distribution of n · m = N variables drawn from the original distribution, and hence does not depend on n or m separately but only on their product N. Therefore:

$$\frac{\partial}{\partial n}\, e^{\,i\frac{N}{n}b_n\omega + \frac{N}{n}\log[\varphi(a_n\omega)]} = 0. \qquad (26)$$

Defining $\frac{b_n}{n} \equiv c_n$, we find:

$$\Rightarrow\; iN\omega\,\frac{\partial c_n}{\partial n} - \frac{N}{n^2}\log(\varphi) + \frac{N}{n}\,\frac{\varphi'}{\varphi}\,\frac{\partial a_n}{\partial n}\,\omega = 0. \qquad (27)$$

$$\Rightarrow\; \frac{\varphi'(a_n\omega)\,\omega}{\varphi(a_n\omega)} = \frac{\log(\varphi(a_n\omega))}{n\,\frac{\partial a_n}{\partial n}} - \frac{i\omega\, n\,\frac{\partial c_n}{\partial n}}{\frac{\partial a_n}{\partial n}}. \qquad (28)$$

Multiplying both sides by an and defining z ≡ anω, we find that:

$$\frac{\varphi'(z)\,z}{\varphi(z)} - \log(\varphi(z))\,\frac{a_n}{n\,\frac{\partial a_n}{\partial n}} + iz\,\frac{n\,\frac{\partial c_n}{\partial n}}{\frac{\partial a_n}{\partial n}} = 0. \qquad (29)$$

The equation for ϕ(z) has the mathematical structure:

$$\frac{\varphi'}{\varphi} - C_1\,\frac{\log(\varphi(z))}{z} = iC_2, \qquad (30)$$

with $C_1, C_2$ constants.

If $C_2 = 0$, we can write the equation in the form:

$$\frac{\varphi'}{\varphi\,\log(\varphi)} = \frac{(\log\varphi)'}{\log(\varphi)} = [\log(\log(\varphi))]' = \frac{C_1}{z}. \qquad (31)$$

$$\Rightarrow\; \log(\log(\varphi)) = C_1\log|z| + D \;\Rightarrow\; \varphi = e^{A|z|^{C_1}}. \qquad (32)$$

This is the general solution to the homogeneous equation. Guessing a particular solution to the inhomogeneous equation of the form log(ϕ) = Dz leads to:

$$D - C_1 D = iC_2 \;\rightarrow\; D = \frac{iC_2}{1 - C_1}. \qquad (33)$$

As long as $C_1 \neq 1$, we have found a solution!

In the case C1 = 1, we can guess a solution of the form log(ϕ) = Dz log(z), and find:

$$D\log(z) + D - D\log(z) = iC_2, \qquad (34)$$

hence we have a solution when $D = iC_2$.

Going back to Eq. (29), we can also get the scaling for the coefficients:

$$\frac{a_n}{n\,\frac{\partial a_n}{\partial n}} = C_1 \;\rightarrow\; \frac{\partial \log(a_n)}{\partial n} = \frac{1}{C_1 n}. \qquad (35)$$

This implies that:

$$\log(a_n) = \frac{1}{C_1}\log(n) + \mathrm{const} \;\rightarrow\; a_n \propto n^{1/C_1}. \qquad (36)$$
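As a consistency check, this reproduces the two scalings we have already encountered:

$$\text{Gaussian: } \varphi = e^{-\sigma^2 z^2/2},\; C_1 = 2 \;\Rightarrow\; a_n \propto \sqrt{n}; \qquad \text{Cauchy (Eq. (15)): } \varphi = e^{-\gamma|z|},\; C_1 = 1 \;\Rightarrow\; a_n \propto n,$$

in agreement with the standard CLT and with Eq. (17), respectively.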

Similarly:

$$\frac{n\,\frac{\partial c_n}{\partial n}}{\frac{\partial a_n}{\partial n}} = C_2. \qquad (37)$$

Hence:

$$\frac{\partial c_n}{\partial n} \propto n^{1/C_1 - 2}. \qquad (38)$$

Therefore:

$$c_n = C_3\, n^{1/C_1 - 1} + C_4 \;\rightarrow\; b_n = C_3\, n^{1/C_1} + C_4\, n. \qquad (39)$$

The first term will become a constant when we divide by the factor $n^{1/C_1}$ of Eq. (18), leading to a simple shift of the resulting distribution. The second term will vanish for $C_1 < 1$. We shall soon see that the case $C_1 > 1$ corresponds to a variable with finite mean, in which case the $C_4 n$ term will be associated with centering the scaled variable by subtracting the mean, as in the standard CLT.

3.1 General formula for the characteristic function

According to Eqs. (32) and (33), the general formula for the characteristic function for C1 6= 1 is:

$$\varphi(z) = e^{A|z|^{C_1} + Dz}. \qquad (40)$$

The D term is associated with a trivial shift of the distribution (related to the linear scaling of $b_n$) and can be eliminated; we will therefore not consider it in the following. The case $C_1 = 1$ will be considered in the next section.

The requirement that the inverse Fourier transform of ϕ is a probability distribution imposes $\varphi(-z) = \varphi^{*}(z)$. Therefore the characteristic function takes the form:

 C eAω 1 ω > 0 ϕ = (41) ∗ C eA |ω| 1 ω < 0.

This may be rewritten as:

$$\varphi = e^{-a|\omega|^{\mu}\left[1 - i\beta\,\mathrm{sign}(\omega)\tan\left(\frac{\pi\mu}{2}\right)\right]}. \qquad (42)$$

The asymmetry of the distribution is determined by β. For this representation of ϕ, we will show that

−1 ≤ β ≤ 1, and that β = 1 corresponds to a positive distribution, β = −1 to a strictly negative distribution, and β = 0 to a symmetric one. This formula will not be valid for µ = 1, a case that we will discuss in the next section.

Consider f(x) which decays, for x > x∗, as

$$f(x) = \frac{A_+}{x^{1+\mu}}, \qquad (43)$$

with 0 < µ < 1. We shall now derive yet another Tauberian relation, finding the form of the Fourier transform of f(x) near the origin in terms of the tail of the distribution. For small, positive ω we find:

$$\Phi(\omega) \equiv \int_{x^*}^{\infty}\frac{A_+}{x^{1+\mu}}\, e^{i\omega x}\, dx = A_+\,\omega^{\mu}\int_{x^*\omega}^{\infty}\frac{e^{im}}{m^{1+\mu}}\, dm, \qquad (44)$$

where we substituted m = ωx. Evaluating the integral on the RHS by parts we obtain:

$$I = \left[-\omega^{\mu}\,\frac{m^{-\mu}}{\mu}\, e^{im}\right]_{x^*\omega}^{\infty} + \frac{\omega^{\mu}}{\mu}\int_{x^*\omega}^{\infty} i\, e^{im}\, m^{-\mu}\, dm, \qquad (45)$$

where $I \equiv \Phi(\omega)/A_+$.

For µ < 1, we may approximate the integral by replacing the lower limit of integration by 0, to find:

$$I \approx \frac{x^{*\,-\mu}}{\mu}\, e^{ix^*\omega} + i\,\frac{\omega^{\mu}}{\mu}\int_{0}^{\infty}\frac{e^{im}}{m^{\mu}}\, dm, \qquad (46)$$

and $\int_0^{\infty}\frac{e^{im}}{m^{\mu}}\, dm = i\,\Gamma(1-\mu)\, e^{-i\frac{\pi}{2}\mu}$ (this can easily be evaluated using contour integration). Thus, for small ω we have:

$$\Phi(\omega) \approx -C\omega^{\mu}, \qquad (47)$$

with $C \propto e^{-i\frac{\pi}{2}\mu}$. It is possible to bound the rest of the integral to be at most linear in ω for small ω (up to a constant). Therefore the characteristic function near the origin is approximated by:

$$\varphi(\omega) \approx 1 - C\omega^{\mu}, \qquad (48)$$

with:

$$\frac{\mathrm{Im}(C)}{\mathrm{Re}(C)} = -\tan\left(\frac{\pi}{2}\mu\right) \;\Rightarrow\; C \propto 1 - i\tan\left(\frac{\pi}{2}\mu\right). \qquad (49)$$

This corresponds to β = 1 in our previous representation. If we similarly look at a distribution with a left tail, a similar analysis leads to the same form of Eq. (48), albeit with $C \propto 1 + i\tan\left(\frac{\pi}{2}\mu\right)$, corresponding to β = −1 in Eq. (42). In the general case, where both $A_+$ and $A_-$ exist, we obtain the expression

$$\varphi(\omega) \approx 1 - |\omega|^{\mu}\left(A_+\, e^{-i\mu\frac{\pi}{2}} + A_-\, e^{i\mu\frac{\pi}{2}}\right). \qquad (50)$$

$$\Rightarrow\; \frac{\mathrm{Im}(C)}{\mathrm{Re}(C)} = \frac{-\sin\left(\frac{\pi}{2}\mu\right)}{\cos\left(\frac{\pi}{2}\mu\right)}\left(\frac{A_+ - A_-}{A_+ + A_-}\right) = -\tan\left(\frac{\pi}{2}\mu\right)\beta, \qquad (51)$$

with β defined as:

$$\beta = \frac{A_+ - A_-}{A_+ + A_-}. \qquad (52)$$
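This Tauberian relation can be checked by direct simulation. A sketch (with a Pareto-type density p(x) = µx^(−1−µ) on [1, ∞), i.e. β = 1, chosen here for concreteness; the agreement is only approximate at any finite ω):

% For a one-sided power-law tail (beta = 1), 1 - phi(omega) ~ C*omega^mu
% with Im(C)/Re(C) = -tan(pi*mu/2); test this at a small frequency.
mu = 0.5; w = 1e-2; N = 5e6;
x = rand(1, N).^(-1/mu);           % Pareto(mu) samples, p(x) = mu*x^(-1-mu)
phi = mean(exp(1i*w*x));           % Monte Carlo characteristic function
C = (1 - phi)/w^mu;
disp(imag(C)/real(C));             % should be roughly...
disp(-tan(pi*mu/2));               % ...-tan(pi*mu/2) = -1 for mu = 1/2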

What about the case where 1 < µ < 2?

Note that in this case we can write p(x) = g′(x), where g decays as a power law with exponent $0 < \tilde\mu \equiv \mu - 1 < 1$ – but note that g will decay to a finite constant rather than to zero. Its Fourier transform will thus, by the same logic, be approximated by the following form for small ω:

$$\hat g(\omega) \approx a_1/\omega + a_2 - C\omega^{\tilde\mu}, \qquad (53)$$

where the 1/ω divergence stems from the constant value of g at infinity. The Fourier transforms of the two functions are related by $\hat p = -i\omega\,\hat g$. Therefore for small ω we have:

$$\hat p(\omega) = 1 - (-i\omega)\, C\omega^{\tilde\mu} + iB\omega, \qquad (54)$$

where the constant B must be equal to the expectation value of x, and Eq. (49) gives the form of C. We

can simplify this using the trigonometric relation $\tan\left(\frac{\pi}{2}\mu\right) = \tan\left(\frac{\pi}{2}\tilde\mu + \frac{\pi}{2}\right) = -\cot\left(\frac{\pi}{2}\tilde\mu\right)$, to find

$$-iC \propto -i\left[1 - i\tan\left(\frac{\pi}{2}\tilde\mu\right)\right] = -\tan\left(\frac{\pi}{2}\tilde\mu\right) - i. \qquad (55)$$

Up to a real constant, this is proportional to:

$$1 + i\cot\left(\frac{\pi}{2}\tilde\mu\right) = 1 - i\tan\left(\frac{\pi}{2}\mu\right). \qquad (56)$$

Thus the form of Eq. (42) (with |β| ≤ 1) remains intact also for 1 < µ < 2. Note that the linear term will drop out due to the shift of Eq. (18).

Also, it is worth mentioning that the asymmetry term vanishes as µ → 2: in the case of finite variance we always obtain a symmetric Gaussian, i.e., we become insensitive to the asymmetry of the original distribution.

3.2 Special cases

µ = 1/2, β = 1: Levy distribution

Consider the Levy distribution:

$$p(x) = \sqrt{\frac{C}{2\pi}}\;\frac{e^{-\frac{C}{2x}}}{x^{3/2}} \quad (x \ge 0). \qquad (57)$$

The Fourier transform of p(x) for ω > 0 is

$$\varphi(\omega) = e^{-\sqrt{-2iC\omega}}, \qquad (58)$$

which indeed corresponds to $\tan\left(\frac{\pi}{2}\cdot\frac{1}{2}\right) = 1 \rightarrow \beta = 1$.

µ = 1

The case µ = 1 and β = 0 corresponds to the previously discussed Cauchy distribution. In the general case µ = 1 and β 6= 0, we have seen that the general form of the characteristic function is, according to Eqs. (32) and (34):

$$\varphi(z) = e^{A|z|^{C_1} + Dz\log(z)}. \qquad (59)$$

Repeating the logic we used before to establish the coefficients D for z > 0 and z < 0, based on the power-law tails of the distribution (which in this case fall off like 1/x²), leads to:

$$\varphi(\omega) = e^{-|C\omega|\left[1 - i\beta\,\mathrm{sign}(\omega)\phi\right]}; \qquad \phi = -\frac{2}{\pi}\log|\omega|. \qquad (60)$$

This is the only exception to the form of Eq. (42). It remains to be shown why β = 1 corresponds to the case of a strictly positive distribution (which would thus justify the 2/π factor in the definition of φ). To see this, note that the logic leading up to Eq. (45) is still intact for the case µ = 1. However, we can no longer replace the lower limit of integration by 0. Instead, note that the real part of the integral can be evaluated by parts and diverges as − log(ω), while the imaginary part does not suffer from such a divergence, and we can well approximate it by replacing the lower limit of integration with 0. Therefore we find:

$$I/\omega \approx -\log(\omega) + i\int_0^{\infty}\frac{\sin(x)}{x}\, dx = -\log(\omega) + i\pi/2. \qquad (61)$$

This leads to the form of Eq. (60).

4 Exploring the stable distributions numerically

Since we defined the stable distributions in terms of their Fourier transforms, it is very easy to find them numerically by calculating the inverse Fourier transform. Here is a short code that finds the stable distribution for µ = 0.5 and β = 0.5 (i.e., corresponding to a distribution with right and left power-law tails of asymmetric magnitudes). It is easy to check that if β exceeds one, the inverse Fourier transform leads to a function with negative values – and hence does not correspond to a probability distribution – as we anticipated.

% Evaluate the characteristic function of Eq. (42) (with a = 1/2) on a
% frequency grid.
mu = 0.5; B = 0.5;
dt = 0.001;
w_vec = -10000:dt:10000;
for indx = 1:length(w_vec)
    w = w_vec(indx);
    f(indx) = exp(-abs(w)^mu/2*(1 - 1i*B*tan(pi*mu/2)*sign(w)));
end;
% Invert numerically: P(x) = (1/(2*pi)) * integral of f(w)*exp(-i*w*x) dw,
% approximated by a Riemann sum on the grid.
x_vec = -5:0.1:5;
for indx = 1:length(x_vec)
    x = x_vec(indx);
    y(indx) = 0;
    for tmp = 1:length(f)
        y(indx) = y(indx) + exp(-1i*w_vec(tmp)*x)*f(tmp)*dt;
    end;
end;
y = y/(2*pi);
plot(x_vec, real(y));   % the imaginary part vanishes up to numerical error

Fig. 2 shows the result.

There are much better numerical methods to evaluate the Levy stable distributions [1]. Fig. 3 shows examples of Levy distributions obtained using such a method. Note that for µ < 1 and β = 1 the support of the stable distribution is strictly positive, while this is not the case for µ > 1. This is expected, since for µ ≤ 1 we have seen that we can set the shift $b_n$ to zero and still obtain convergence to the Levy stable distribution. Since the initial distribution is strictly positive, the scaled sum will also be strictly positive, and hence the result for the limiting distribution follows. Note that this argument fails for µ > 1.

5 RG Approach for Extreme Value Distributions

Consider the maximum of n variables drawn from some distribution p(x), characterized by a cumulative distribution C(x). We will now find the behavior of the maximum for large n, which will turn out to also follow universal statistics – much like in the case of the generalized CLT – that depend on the tail of p(x). This was discovered by Fisher and Tippett, motivated by an attempt to characterize the distribution of strengths of cotton fibers [2].

If we define:

$$X_n \equiv \max(x_1, x_2, \ldots, x_n), \qquad (62)$$

Figure 2: Levy stable distribution for µ = 0.5 and asymmetry β = 1, obtained by numerically inverting Eq. (42).

then we have:

$$\mathrm{Prob}(X_n < x) = \mathrm{Prob}(x_1 < x)\,\mathrm{Prob}(x_2 < x)\cdots\mathrm{Prob}(x_n < x) = C^{n}(x). \qquad (63)$$

For this reason it is natural to work with the cumulative distribution when dealing with extreme value statistics, akin to the role which the characteristic function played in the previous chapter. Clearly, it is easy to convert the question of the minimum of n variables to one related to the maximum, if we define p˜(x) = p(−x).

Figure 3: Levy stable distributions for µ ranging from 0.5 to 1.5 and asymmetry β ranging from 0 to 1 (curves: µ = 0.5, 1, 1.5 for each of β = 0 and β = 1).

5.1 Example I: the Gumbel distribution

Consider the distribution:

$$p(x) = e^{-x} \quad (x \ge 0). \qquad (64)$$

The cumulative is therefore:

$$C(x) = 1 - e^{-x}. \qquad (65)$$

The cumulative distribution for the maximum of n variables is therefore:

$$G(x) = (1 - e^{-x})^{n} \approx e^{-ne^{-x}} = e^{-e^{-(x-\mu)}}, \qquad (66)$$

with µ ≡ log(n).

This is an example of the Gumbel distribution.

Taking the derivative to find the probability distribution for the maximum, we find:

$$p(x) = e^{-e^{-(x-\mu)}}\, e^{-(x-\mu)}. \qquad (67)$$

Denoting $l \equiv e^{-(x-\mu)}$, we have:

$$p(x) = e^{-l}\, l, \qquad (68)$$

and we can find the maximum by taking the derivative with respect to l, leading to:

$$e^{-l}\, l = e^{-l} \;\rightarrow\; l = 1. \qquad (69)$$

Hence the distribution is peaked at x = log(n). It is easy to see that its width is of order unity. We can now revisit the approximation we made in Eq. (66), and check its validity.

We have seen before that making the approximation:

$$(1 - x/n)^{n} \approx e^{-x}, \qquad (70)$$

is valid under the condition $x \ll \sqrt{n}$.

In our case, this implies:

$$e^{-x}\, n \ll \sqrt{n}. \qquad (71)$$

At the peak of the distribution, we have $e^{-x}n = 1$, and the approximation is valid for n ≫ 1. From

Eq. (71) we see that this would hold in a region of size $O(\log(\sqrt{n}))$ around it – since the width of the distribution is of order unity, this implies that for large n the Gumbel distribution approximates the exact solution well, until we are sufficiently far in the tail that the probability distribution is vanishingly small. However, a note of caution is in order: the logarithmic dependence we found signals a very slow convergence to the limiting form. This is also true in the case where the distribution p(x) is Gaussian, as was already noted in Fisher and Tippett's original work [2].

We will soon show that for the maximum of a large number of variables drawn from any distribution with a sufficiently fast-decaying tail, the Gumbel distribution arises.
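As a numerical sanity check of this example (a sketch; the comparison values used below – mean log(n) + 0.5772... and standard deviation π/√6 for the Gumbel form – are standard facts not derived in the text):

% Maxima of n unit exponentials should follow the Gumbel form of Eq. (66),
% peaked near log(n) with width of order unity.
n = 1000; M = 1e4;
X = max(-log(rand(n, M)), [], 1);   % M realizations of the maximum of n variables
disp([mean(X), log(n) + 0.5772]);   % empirical mean vs log(n) + Euler's constant
disp([std(X), pi/sqrt(6)]);         % empirical width vs the Gumbel value pi/sqrt(6)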

5.2 Example II: the Weibull distribution

Consider the minimum of the same distribution we had in the previous example. The same logic would give us that:

$$\mathrm{Prob}[\min(x_1, \ldots, x_n) > \xi] = \mathrm{Prob}[x_1 > \xi]\,\mathrm{Prob}[x_2 > \xi]\cdots\mathrm{Prob}[x_n > \xi] = e^{-\xi n}. \qquad (72)$$

This is an example of the Weibull distribution, which occurs when the variable is bounded (e.g., in this case the variable is never negative).

The general case, for the maximum of n variables drawn from a distribution bounded from above by C, would be:

$$G(x) = \begin{cases} e^{-\left(\frac{C-x}{b}\right)^{\alpha}}, & x \le C\\ 1, & x > C. \end{cases} \qquad (73)$$
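A one-line numerical check of Eq. (72) (a sketch): the minimum of n unit exponentials should itself be exponentially distributed with rate n.

% Prob(min > xi) = exp(-n*xi), so the minimum has mean 1/n.
n = 100; M = 1e5;
m = min(-log(rand(n, M)), [], 1);   % minima of n unit exponentials
disp([mean(m), 1/n]);               % empirical mean vs the prediction 1/n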

5.3 Example III: the Frechet distribution

The final example belongs to the third possible universality class, corresponding to variables with a power-law tail.

If at large x we have:

$$p(x) = \frac{a}{(x - b)^{1+\mu}}, \qquad (74)$$

then the cumulative distribution is:

$$C(x) = 1 - \frac{a}{\mu(x - b)^{\mu}}. \qquad (75)$$

Therefore taking it to a large power n we find:

$$C^{n}(x) \approx e^{-\frac{an}{\mu(x - b)^{\mu}}}. \qquad (76)$$

Upon appropriately scaling the variable, we find that:

$$G(x) = e^{-A\left(\frac{x - B}{n^{1/\mu}}\right)^{-\mu}}, \qquad (77)$$

(where A and B do not depend on n). We shall now show that these three cases can be derived in a general framework, using a similar approach to the one we used earlier.

5.4 General form for extreme value distributions

We will follow similar logic to the "RG" approach used to find the form of the characteristic functions in the generalized CLT. Let us assume that there exist scaling coefficients $a_n$, $b_n$ such that when we define:

$$\xi_n \equiv \frac{X - b_n}{a_n}, \qquad (78)$$

the following limit exists:

$$\lim_{n\to\infty} \mathrm{Prob}(\xi_n = \xi) = g(\xi). \qquad (79)$$

(Note that this limit is not unique: we can always shift and rescale by a constant.)

This would imply that $p(X) \approx a_n^{-1}\, g\!\left(\frac{X - b_n}{a_n}\right)$ and the cumulative is given by $G\!\left(\frac{X - b_n}{a_n}\right)$. By the same logic we used before, we know that:

$$G^{m}\!\left(\frac{X - b_n}{a_n}\right) \qquad (80)$$

depends only on the quantity N = n × m. Therefore we have:

$$\frac{\partial}{\partial n}\, G^{N/n}\!\left(\frac{X - b_n}{a_n}\right) = 0. \qquad (81)$$

This implies that:

$$\frac{\partial}{\partial n}\left[\frac{N}{n}\log G\!\left(\frac{X - b_n}{a_n}\right)\right] = 0. \qquad (82)$$

From this we can deduce that:

$$-\frac{N}{n^2}\log G + \frac{N}{n}\,\frac{G'}{G}\left(-\frac{\partial}{\partial n}\left(\frac{b_n}{a_n}\right) - \frac{x}{a_n^2}\,\frac{\partial a_n}{\partial n}\right) = 0. \qquad (83)$$

Since this must hold for all n, the combinations of n-dependent coefficients must reduce to constants, which we denote by a, b and c; we then have:

$$\frac{G'(z)}{G(z)\log G(z)}\,(a - zb) = c. \qquad (84)$$

From this we can see that:

$$\log(\log(G)) = \frac{1}{a_3}\log(a_1 z + a_2). \qquad (85)$$

Now we simply solve for G to recover that G takes the form:

$$G = e^{(a_1 z + a_2)^{1/a_3}}. \qquad (86)$$

How can the three forms of extreme value distribution be recovered from this form?

Frechet Distribution:

One choice of coefficients yields the Frechet distribution:

$$e^{-a(x - b)^{-\mu}} \quad \text{Frechet}, \qquad (87)$$

where we have µ > 0.

Weibull Distribution:

A different choice of the signs of the coefficients of Eq. (86) yields the Weibull distribution:

$$e^{-a(c - x)^{\mu}} \quad \text{Weibull}, \qquad (88)$$

with µ > 0.

Gumbel Distribution:

Finally, if we take the limit $a_3 = 1/\mu \to 0$, one can choose the coefficients such that:

$$\left(1 - \frac{x - d}{b\mu}\right)^{\mu} \rightarrow \exp\left(-\frac{x - d}{b}\right). \qquad (89)$$

This gives us the Gumbel distribution:

$$\exp\left(-\exp\left(-\frac{x - d}{b}\right)\right) \quad \text{Gumbel}. \qquad (90)$$

Let us now consider the scaling coefficients an and bn. For the Frechet distribution, we have:

$$G^{N/n}(X) = e^{-\frac{N}{n}\,a\left(\frac{X - b_n}{a_n} - b\right)^{-\mu}}. \qquad (91)$$

Since this has to be n-independent, it is easy to see that we must have bn = const + ban, and:

$$a_n^{-\mu} \propto 1/n \;\rightarrow\; a_n \propto n^{1/\mu}. \qquad (92)$$

Note that for µ < 1 this implies that the maximum scales in the same way as the sum! This elucidates the behavior of the Levy flights which we have seen earlier.
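This can be seen directly in simulation (a sketch, again assuming a Pareto form for concreteness):

% For mu < 1 the largest term carries a finite fraction of the sum,
% so the maximum and the sum scale identically (both like n^(1/mu)).
mu = 0.5; n = 1000; M = 1e4;
x = rand(n, M).^(-1/mu);            % Pareto(mu) samples
r = max(x, [], 1)./sum(x, 1);       % fraction of each sum carried by its maximum
disp(mean(r));                      % an O(1) number, even as n grows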

By the same reasoning we find that for the Weibull distribution of Eq. (88) we have:

$$G(X) = e^{-\left(\frac{b_n - X}{a_n}\right)^{\mu}}, \qquad (93)$$

with $b_n = \mathrm{const}$ and $a_n \propto n^{-1/\mu}$.

Note that in this case, because the original distribution is bounded, the distribution of the maximum becomes narrower with larger n.

In a similar fashion the scaling for the Gumbel distribution can be determined. By the same logic, the following expression should be n-independent:

$$\exp\left(-\frac{N}{n}\exp\left(-\frac{\frac{X - b_n}{a_n} - d}{b}\right)\right). \qquad (94)$$

Equivalently, we can demand that the following expression be n-independent:

$$-\frac{\frac{X - b_n}{a_n} - d}{b} + \log(N/n). \qquad (95)$$

This implies, remarkably, that bn ∝ log(n) while an = constant: the distribution retains the same shape and width, but (slowly) shifts towards larger values.
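A short numerical illustration of this scaling (a sketch, using the exponential example of Section 5.1):

% The location of the maximum of n exponentials grows like log(n),
% while its width stays essentially constant.
M = 1000;
for n = [100 1000 10000]
    X = max(-log(rand(n, M)), [], 1);
    fprintf('n = %5d: median = %.2f (log n = %.2f), std = %.2f\n', ...
            n, median(X), log(n), std(X));
end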

Acknowledgments I thank Ori Hirschberg, Felix Wong and Po-Yi Ho for useful comments and the students of AM 203 at Harvard where this material was first tested.

References

[1] http://math.bu.edu/people/mveillet/html/alphastablepub.html

[2] Fisher, R.A. and Tippett, L.H.C., 1928, April. Limiting forms of the frequency distribution of the largest or smallest member of a sample. In Mathematical Proceedings of the Cambridge Philosophical Society (Vol. 24, No. 2, pp. 180-190). Cambridge University Press.
