Generalized Central Limit Theorem and Extreme Value Statistics
Ariel Amir, 9/2017
Let us consider some distribution P(x) from which we draw x_1, x_2, ..., x_N. If the distribution has a finite standard deviation σ, then we can define:

\[
\xi \equiv \frac{\sum_{i=1}^{N}(x_i-\mu)}{\sigma\sqrt{N}}, \tag{1}
\]

where, by the Central Limit Theorem, P(ξ) approaches a Gaussian with vanishing mean and standard deviation of 1 as N → ∞. What happens when P(x) does not have a finite variance? Or a finite mean? It turns out that there is a generalization of the CLT, which will lead to a different form of scaling, one that depends on the power-law tail of the distribution. Our analysis will have some rather surprising consequences in the case of heavy-tailed distributions where the mean diverges: in that case the sum of N variables will be dominated by rare events, and we will quantify this with the aid of extreme value distributions (characterizing, for example, the maximum of N i.i.d. variables). Fig. 1 shows one such example: the running sum of variables drawn from a distribution whose tail falls off as p(x) ∼ 1/x^{1+µ}, for three values of µ.

Figure 1: Running sum vs. N (up to N = 1000) for heavy-tailed distributions (blue: µ = 0.5, red: µ = 1.5, green: µ = 2.5).
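The qualitative behavior in Fig. 1 is easy to reproduce numerically. Below is a minimal sketch (an illustration, not part of the original notes): it draws heavy-tailed variables via inverse-CDF sampling of a Pareto-type law with tail p(x) ∼ µ/x^{1+µ} (an illustrative choice of distribution; any law with this tail behaves similarly) and accumulates the running sum for the three values of µ used in the figure.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

def heavy_tailed_sample(mu, size):
    # Inverse-CDF sampling: if u ~ Uniform(0,1), then u**(-1/mu) satisfies
    # P(X > x) = x**(-mu) for x >= 1, i.e. a tail p(x) ~ mu / x**(1+mu).
    u = rng.uniform(size=size)
    return u ** (-1.0 / mu)

N = 1000
for mu in (0.5, 1.5, 2.5):
    running_sum = np.cumsum(heavy_tailed_sample(mu, N))
    # For mu = 0.5 the mean diverges and the sum is dominated by a few
    # huge jumps; for mu = 2.5 the variance is finite and growth is smooth.
    print(f"mu = {mu}: total after {N} draws = {running_sum[-1]:.4g}")
```

Plotting running_sum against N reproduces the staircase-like trajectory of Fig. 1 for µ = 0.5, versus the nearly linear growth for µ = 2.5.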
1 Probability distribution of sums

An important quantity to define is the characteristic function of the distribution:

\[
\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt. \tag{2}
\]

Given two variables x, y with distributions f(x) and g(y), consider their sum z = x + y. The distribution of z is given by:

\[
P(z) = \int_{-\infty}^{\infty} f(x)\, g(z-x)\, dx, \tag{3}
\]

i.e., it is the convolution of the two distributions. Consider now the Fourier transform of the distribution (i.e., its characteristic function):

\[
\int_{-\infty}^{\infty} e^{i\omega z} P(z)\, dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x)\, g(z-x)\, e^{i\omega z}\, dx\, dz. \tag{4}
\]

Changing variables to x and z − x ≡ y, the Jacobian of the transformation is 1, and we find:

\[
\int_{-\infty}^{\infty} e^{i\omega z} P(z)\, dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x)\, g(y)\, e^{i\omega(x+y)}\, dx\, dy. \tag{5}
\]

This is the product of the characteristic functions of f and g, which will shortly prove very useful. This can be generalized to the case of the sum of n variables. Defining:

\[
X = x_1 + x_2 + \dots + x_n, \tag{6}
\]

we can write:

\[
\hat{p}(\omega) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} p(x_1)\, p(x_2) \dots p(x_n)\, dx_1\, dx_2 \dots dx_n\, \delta(x_1 + x_2 + \dots + x_n - X)\, e^{i\omega X}\, dX, \tag{7}
\]

which upon doing the integration over X leads to the product of the characteristic functions. Finally, the same property also holds for the Laplace transform of positive variables, with essentially the same proof.

2 Tauberian theorems

Consider n i.i.d. variables drawn from a distribution p(x). The characteristic function of the sum is given by:

\[
g_n(\omega) = \left(f(\omega)\right)^n. \tag{8}
\]

It is clear that the characteristic function is bounded by 1 in magnitude, and that as ω → 0 it tends to 1. The behavior of the characteristic function near the origin is determined by the tail of p(x) for large x, as we shall quantify later, in what is known as a Tauberian theorem (relating large-x and small-ω behavior). Consider first a variable whose distribution has a finite variance σ² (and hence a finite mean). Without loss of generality let us assume that the mean is 0; otherwise we can always consider a new variable with the mean subtracted. We can Taylor expand e^{iωx} ≈ 1 + iωx − ω²x²/2 near the origin, and find:

\[
f(\omega) \approx 1 + a\omega + b\omega^2 + \dots, \tag{9}
\]

with:

\[
a = i\int_{-\infty}^{\infty} p(x)\, x\, dx = 0, \qquad b = -\frac{1}{2}\int_{-\infty}^{\infty} x^2 p(x)\, dx = -\frac{\sigma^2}{2}. \tag{10}
\]

Therefore:

\[
g_n(\omega) \approx \left(1 - \frac{\sigma^2\omega^2}{2}\right)^{n}. \tag{11}
\]

Thus:

\[
g_n(\omega) \approx e^{\, n\log\left(1-\frac{\sigma^2\omega^2}{2}\right)} \approx e^{-n\frac{\sigma^2\omega^2}{2}}. \tag{12}
\]

In the next section we shall see the significance of the correction to the last expansion. Taking the inverse Fourier transform leads to:

\[
P_n(x) \approx e^{-\frac{x^2}{2\sigma^2 n}}, \tag{13}
\]

i.e., the sum is approximated by a Gaussian with standard deviation σ√n, as we know from the CLT. Note that the finiteness of the variance, associated with the properties of the tail of P(x), was essential to establishing the behavior of f(ω) at small frequencies. This is an example of a Tauberian theorem, where large-x behavior determines the small-ω behavior, and we shall see that this kind of relation also holds in cases where the variance diverges. Before going to these interesting scenarios, let us consider a particular example.

2.1 Example: Cauchy distribution

Consider the following distribution, known as the Cauchy distribution:

\[
f(x) = \frac{1}{\gamma\pi\left(1+\left(\frac{x}{\gamma}\right)^2\right)} \tag{14}
\]

(note that this has the same Lorentzian expression as telegraph noise). Its Fourier transform is:

\[
\varphi(\omega) = e^{-\gamma|\omega|}. \tag{15}
\]

Thus the characteristic function of a sum of N such variables is:

\[
\varphi_N(\omega) = e^{-N\gamma|\omega|}, \tag{16}
\]

and taking the inverse Fourier transform we find that the sum also follows a Cauchy distribution:

\[
P_N(x) = \frac{1}{N\gamma\pi\left(1+\left(\frac{x}{N\gamma}\right)^2\right)}. \tag{17}
\]

Thus, the sum does not converge to a Gaussian, but rather retains its Lorentzian form. Moreover, it is interesting to note that the scaling form governing the width of the Lorentzian evolves with N differently than in the Gaussian scenario: while in the latter the variance increases linearly with N, hence the width increases as √N, here the scaling factor is linear in N. This remarkable property is in fact useful for certain computer science algorithms [REF].
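Both Eq. (8) and the Gaussian approximation of Eq. (12) are easy to verify numerically. The sketch below (an illustration, not part of the original notes) uses uniform variables on (−1, 1), an arbitrary finite-variance choice with σ² = 1/3, and compares the empirical characteristic function of the sum with f(ω)^n and with the Gaussian form.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n, trials = 20, 100_000
omega = np.linspace(-1.0, 1.0, 5)

x = rng.uniform(-1.0, 1.0, size=(trials, n))
sums = x.sum(axis=1)

# Empirical characteristic functions: f(omega) from single draws,
# g_n(omega) from the sums. Both are real here by symmetry.
f_single = np.exp(1j * np.outer(omega, x[:, 0])).mean(axis=1)
g_sum = np.exp(1j * np.outer(omega, sums)).mean(axis=1)

sigma2 = 1.0 / 3.0  # variance of Uniform(-1, 1)
for w, g, f in zip(omega, g_sum, f_single):
    print(f"omega={w:+.2f}: g_n={g.real:+.4f}, f^n={(f ** n).real:+.4f}, "
          f"Gaussian={np.exp(-n * sigma2 * w ** 2 / 2):+.4f}")
```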
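Similarly, the stability of the Cauchy distribution under summation, Eq. (17), can be checked directly. Another short illustrative sketch: rescale the sum of N standard Cauchy variables by 1/N and compare its quantiles with the exact Cauchy quantile function.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
N, trials = 100, 100_000

# Sum N standard Cauchy variables (gamma = 1) and rescale by 1/N, not
# 1/sqrt(N): by Eq. (17) the result should again be standard Cauchy.
rescaled = rng.standard_cauchy(size=(trials, N)).sum(axis=1) / N

# The Cauchy quantile function is tan(pi * (q - 1/2)).
for q in (0.25, 0.5, 0.75, 0.9):
    print(f"q={q}: empirical {np.quantile(rescaled, q):+.3f}, "
          f"exact {np.tan(np.pi * (q - 0.5)):+.3f}")
```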
3 Generalized CLT

We shall now use a different method to generalize this result further, to distributions with arbitrary support. We will look for distributions which are stable: this means that if we add two (or more) variables drawn from this distribution, the distribution of the sum will retain the same shape, i.e., it will be identical up to a possible shift and rescaling by some yet undetermined factors. Clearly, if the sum of a large number of variables drawn from any distribution converges to a distribution with a well-defined shape, it must be such a stable distribution. The family of such distributions is known as the Lévy stable distributions. Clearly, a Gaussian distribution obeys this property. We also saw that the inverse Laplace transform of e^{−x^µ} with 0 < µ < 1 obeys it.

We shall now use an RG (renormalization group) approach to find the general form of such distributions, which will turn out to have a simple representation in Fourier rather than real space, because the characteristic function is the natural object to deal with here. Denoting the partial sums by s_n, the general scaling one may consider is:

\[
\xi_n = \frac{s_n - b_n}{a_n}. \tag{18}
\]

Here, a_n determines the width of the distribution, and b_n is a "shift". If the distribution has a finite mean it seems plausible that we should center it by choosing b_n = ⟨x⟩n. We will show that this is indeed the case, and that if its mean is infinite we can set b_n = 0. Note that we have seen this previously for the case of positive variables. The scaling we are seeking is of the form of Eq. (18), and our hope is that:

\[
\lim_{n\to\infty} P(\xi_n = \xi) \tag{19}
\]

exists, i.e., the scaled sum converges to some distribution P(x), which is not necessarily Gaussian (or symmetric). Let us denote the characteristic function of the scaled variable by φ(ω) (assumed to be independent of n as n → ∞). Consider the variable y_n = ξ_n a_n. We have:

\[
\tilde{P}(y_n) = \frac{1}{a_n} P(y_n/a_n), \tag{20}
\]

with P the limiting distribution of Eq. (19). Therefore the characteristic function of the variable y_n is:

\[
\varphi_y(\omega) = \varphi(a_n\omega). \tag{21}
\]

Note that the 1/a_n prefactor canceled, as it should, since the characteristic function must be 1 at ω = 0. Consider next the distribution of the sum s_n. We have s_n = y_n + b_n, and:

\[
\hat{P}(s_n) = \tilde{P}(s_n - b_n). \tag{22}
\]

It is straightforward to see that shifting a distribution by b_n implies multiplying the characteristic function by e^{iωb_n}. Therefore the characteristic function of the sum is:

\[
\Phi_N(\omega) = e^{i b_N \omega}\, \varphi_y(\omega) = e^{i b_N \omega}\, \varphi(a_N\omega). \tag{23}
\]

This form will be the basis for the rest of the derivation, where we emphasize that the characteristic function φ is N-independent. Consider N = n · m, where n, m are two large numbers. The important insight is to realize that one may compute s_N in two ways: as the sum of N of the original variables, or as the sum of m variables, each one being the sum of n of the original variables. The sum of n variables drawn from the original distribution is given by:

\[
\Phi_n(\omega) = e^{i b_n \omega}\, \varphi(a_n\omega). \tag{24}
\]

If we take a sum of m variables drawn from that distribution (i.e., the one corresponding to the sums of n's), then its characteristic function will be, on one hand:

\[
f(\omega) = e^{i m b_n \omega}\left(\varphi(a_n\omega)\right)^m, \tag{25}
\]

and on the other hand it is the distribution of n · m = N variables drawn from the original distribution, and hence does not depend on n or m separately but only on their product N. Therefore:

\[
\frac{\partial}{\partial n}\, e^{\, i\frac{N}{n} b_n \omega + \frac{N}{n}\log[\varphi(a_n\omega)]} = 0. \tag{26}
\]

Defining c_n ≡ b_n/n, we find:

\[
iN\omega\, \frac{\partial c_n}{\partial n} - \frac{N}{n^2}\log(\varphi) + \frac{N}{n}\, \frac{\varphi'}{\varphi}\, \frac{\partial a_n}{\partial n}\, \omega = 0 \tag{27}
\]

\[
\Rightarrow\quad \frac{\varphi'(a_n\omega)\, \omega}{\varphi(a_n\omega)} = \frac{\log(\varphi(a_n\omega))}{n\, \frac{\partial a_n}{\partial n}} - i\omega\, n\, \frac{\partial c_n}{\partial n}\Big/\frac{\partial a_n}{\partial n}. \tag{28}
\]

Multiplying both sides by a_n and defining z ≡ a_nω, we find that:

\[
\frac{\varphi'(z)\, z}{\varphi(z)} - \log(\varphi(z))\, \frac{a_n}{n\, \frac{\partial a_n}{\partial n}} + i z\, n\, \frac{\partial c_n}{\partial n}\Big/\frac{\partial a_n}{\partial n} = 0. \tag{29}
\]

The equation for φ(z) has the mathematical structure:

\[
\frac{\varphi'}{\varphi} - C_1\, \frac{\log(\varphi(z))}{z} = iC_2, \tag{30}
\]

with C_1, C_2 constants.
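The meaning of the scaling ansatz of Eq. (18) can be illustrated numerically. The sketch below is illustrative and not from the original notes; it anticipates the scaling a_n = n^{1/µ} (for 1 < µ < 2) that the derivation leads to, together with the centering b_n = ⟨x⟩n discussed above, and reuses the Pareto-type sampler from the first example. If the ansatz is right, the quantiles of ξ_n should become n-independent for large n.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
mu = 1.5                    # tail exponent: finite mean, infinite variance
mean = mu / (mu - 1.0)      # mean of the Pareto law p(x) = mu * x**(-1-mu), x >= 1

def pareto_draws(size):
    # Same inverse-CDF sampler as above: tail P(X > x) = x**(-mu).
    return rng.uniform(size=size) ** (-1.0 / mu)

trials = 10_000
for n in (100, 1000):
    s_n = pareto_draws((trials, n)).sum(axis=1)
    xi = (s_n - mean * n) / n ** (1.0 / mu)   # a_n = n^(1/mu), b_n = <x> n
    qs = np.quantile(xi, [0.1, 0.5, 0.9])
    print(f"n={n:5d}: quantiles of xi_n ~ {np.round(qs, 2)}")
```

The approximate agreement of the quantiles between n = 100 and n = 1000 is the numerical signature of a stable fixed point of the RG transformation.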
