
The Central Limit Theorem

The Central Limit Theorem (CLT) says that if ξ is a random variable with mean ⟨ξ⟩ and variance σ², then for large N the sample mean

    m = (ξ₁ + ξ₂ + ... + ξ_N)/N                                   (1)

is approximately Gaussian with mean ⟨ξ⟩ and variance σ²/N. Let us take an example: let ξ be a Poisson random variable with mean 10. Then its variance is also 10 (a peculiar property of the Poisson distribution). If we let N be 20 in the sample mean above, then we expect the sample mean to be approximately Gaussian with mean 10 and variance σ_m² = 10/20 = 0.5, so that 2σ_m² = 1; thus the distribution of the sample mean should be approximately the Gaussian distribution
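As a quick numerical check of this example, here is a minimal sketch using NumPy (the number of trials and the random seed are my choices, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 20        # draws averaged in each sample mean, as in the text
lam = 10      # Poisson mean; its variance is also 10

# Build many sample means m = (ξ₁ + ... + ξ_N)/N, eq. (1).
Nm = 100_000
means = rng.poisson(lam, size=(Nm, N)).mean(axis=1)

# The CLT predicts mean ⟨ξ⟩ = 10 and variance σ²/N = 10/20 = 0.5.
print(means.mean())   # should come out close to 10
print(means.var())    # should come out close to 0.5
```

The printed sample statistics should sit close to the predicted 10 and 0.5, with fluctuations that shrink as Nm grows.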

    P(x) = (1/√π) e^(−(x−10)²)                                    (2)

In fact, though, m is a discrete random variable. What do we mean when we say that its distribution is described by the distribution function (or probability density) P(x)? This can only mean that when we look in some interval of values, the two agree pretty well. Now if we compute sample means many times and make a histogram of the results, we fill up “bins” that correspond to certain ranges of values. Since ξ takes integer values, and in finding the mean we divide by N = 20, the actual values that can occur for m are things like ..., 9.35, 9.40, 9.45, .... These could all go into a bin for values between 9 and 9.5. Suppose we sample m many times, say N_m times. Then what does the Gaussian distribution predict for the number of values in that bin? We would have to integrate over the interval, to get the probability of landing in the interval, and then multiply by N_m, to find the actual number that landed there. That is, we predict

    N_m ∫₉^9.5 P(x) dx ≈ N_m P(9.25) · (0.5)                      (3)

values in the interval. Notice that the bin size, 0.5, comes in here. Clearly if we chose the bins larger than 0.5, there would be more values in each bin (and fewer bins). This number should agree with the histogram. You should see the agreement if you sample m many times (N_m times), make a histogram, and plot on top of it N_m P(x)Δx, where Δx is the bin size in the histogram. We’ll see an example.

Here is a possibly confusing point. The Poisson distribution itself looks Gaussian for large mean λ. Notice that this is a completely different phenomenon! It means that if λ is large and you sample the distribution many times (N_p times), and make a histogram, it will look like the Gaussian

    N_p (1/√(2πλ)) e^(−(x−λ)²/(2λ))                               (4)

(I’m assuming here that you make the bin size 1, since the Poisson distribution takes values 0, 1, 2, ..., i.e., each integer value gets its own bin.) Think about why this is NOT the Central Limit Theorem.
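Both comparisons can be sketched numerically. Below is a minimal version using NumPy, printing bin counts rather than drawing an actual plot; the values of N_m, N_p, λ, and the bin edges are my choices for illustration. The first part compares a histogram of sample means with N_m P(x)Δx from eqs. (2)–(3); the second compares a histogram of raw Poisson draws, bin size 1, with eq. (4).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- 1) Histogram of sample means vs. the CLT prediction, eq. (3) ---
N, Nm = 20, 100_000
means = rng.poisson(10, size=(Nm, N)).mean(axis=1)

dx = 0.5                                  # bin size used in the text
edges = np.arange(7.0, 13.0 + dx, dx)
counts, _ = np.histogram(means, bins=edges)
centers = edges[:-1] + dx / 2

# N_m * P(x) * Δx with P(x) = (1/√π) exp(−(x−10)²), eq. (2)
pred_clt = Nm * np.exp(-(centers - 10) ** 2) / np.sqrt(np.pi) * dx

# --- 2) Histogram of raw Poisson draws vs. eq. (4), bin size 1 ---
lam, Np = 100, 100_000                    # large λ, so the pmf itself looks Gaussian
draws = rng.poisson(lam, size=Np)
ks = np.arange(70, 131)
counts_p = np.array([(draws == k).sum() for k in ks])
pred_p = Np * np.exp(-(ks - lam) ** 2 / (2 * lam)) / np.sqrt(2 * np.pi * lam)

print("sample-mean bins :", counts[5:8], pred_clt[5:8].round())
print("raw Poisson bins :", counts_p[28:33], pred_p[28:33].round())
```

In the first comparison the Gaussian shape comes from averaging 20 draws of a distribution whose own mean is only 10; in the second, no averaging happens at all, and it is the large mean λ alone that makes the histogram Gaussian. That is exactly the distinction the final question asks about.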
