Central Limit Theorem

The Central Limit Theorem is used to approximate probabilities when dealing with the sum or average of a large number of random variables. We use our properties of expectation and variance to determine those parameters for our sum/average. Let S/A be the random variable representing our Sum or Average. Once we have an expected value, $\mu_{S/A}$, and standard deviation, $\sigma_{S/A}$, then $F_{S/A}(x)$ may be approximated by $\Phi\left(\frac{x - \mu_{S/A}}{\sigma_{S/A}}\right)$, where $\Phi$ represents the cumulative distribution function of a standard normal random variable (use the Z-table).
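For readers who prefer a computer to a Z-table, the approximation above can be sketched in a few lines of Python. This is only an illustration: the function names `phi` and `clt_cdf` are made up for this sketch, and the standard normal CDF is computed from the error function in Python's standard library.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF, Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clt_cdf(x: float, mean: float, sd: float) -> float:
    """Approximate F_{S/A}(x) by Phi((x - mean) / sd).

    `mean` and `sd` are the expected value and standard deviation
    of the sum or average (hypothetical parameter names).
    """
    return phi((x - mean) / sd)


# Sanity check against a familiar Z-table entry: Phi(1.96) is about 0.975.
print(round(phi(1.96), 4))
```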

Suppose for a random variable $X_i$, the mean is $\mu$ and the variance is $\sigma^2$.

• Sums: There is no shorthand notation for the sum of n random variables, $\sum_{i=1}^{n} X_i$.
• Mean of Sum: $E\left[\sum_{i=1}^{n} X_i\right] = E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n] = \mu + \cdots + \mu = n\mu$
• Variance of Sum (Note: All covariances are zero, since we are talking about independent RVs): $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n) = \sigma^2 + \cdots + \sigma^2 = n\sigma^2$
• Standard Deviation of Sum: $\sqrt{n} \cdot \sigma$

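• Averages - Notation: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
• Mean of Average (using Mean of Sum result): $E[\bar{X}] = E\left[\frac{\sum_{i=1}^{n} X_i}{n}\right] = \frac{1}{n} \cdot E\left[\sum_{i=1}^{n} X_i\right] = \frac{1}{n} \cdot n\mu = \mu$
• Variance of Average (using Variance of Sum result): $\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{1}{n^2} \cdot \mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}$
• Standard Deviation of Average: $\frac{\sigma}{\sqrt{n}}$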
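The sum and average results above can be collected into two small helpers. This is a minimal sketch that assumes the n random variables are independent with a common mean and standard deviation; the names `sum_params` and `average_params` are hypothetical and only serve the illustration.

```python
import math


def sum_params(mu: float, sigma: float, n: int) -> tuple[float, float]:
    """Mean and standard deviation of X_1 + ... + X_n (independent RVs)."""
    return n * mu, math.sqrt(n) * sigma


def average_params(mu: float, sigma: float, n: int) -> tuple[float, float]:
    """Mean and standard deviation of the sample average of n independent RVs."""
    return mu, sigma / math.sqrt(n)


# Illustrative values only: mu = 10, sigma = 2, n = 25.
print(sum_params(10, 2, 25))
print(average_params(10, 2, 25))
```

Either pair of parameters is what then goes into the standard normal conversion.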

Most places you look will discuss and give formulas for the Central Limit Theorem using sample averages. However, many of our questions ask us to calculate probabilities for sums of random variables, so I have included both sum and average formulas above. I do not have these parameters memorized. I use the properties of expected value and variance to calculate them each time. You are welcome to memorize them if you so choose. We are most concerned with the mean and standard deviation, since that is what goes into our normal distribution calculation. Once we have these parameters, we can convert to a standard normal distribution to solve the problem.

Ex. Salaries in a large insurance company are distributed with mean 43,500 and standard deviation 10,000. Calculate the probability that the mean salary of a selected sample of 120 workers from this company is greater than 45,000.
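One way this example might be worked, assuming the 120 sampled salaries are independent so that the sample mean is approximately normal (the rounding of z to two decimals mimics a Z-table lookup):

```latex
\mu_{\bar{X}} = 43{,}500, \qquad
\sigma_{\bar{X}} = \frac{10{,}000}{\sqrt{120}} \approx 912.87

P(\bar{X} > 45{,}000)
  \approx 1 - \Phi\left(\frac{45{,}000 - 43{,}500}{912.87}\right)
  \approx 1 - \Phi(1.64)
  \approx 0.05
```

No continuity correction is applied here, since salaries are treated as continuous.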

The Continuity Correction: Using Central Limit Theorem on Discrete RVs

Suppose we are given the following problem:

Ex. The probability that an inquiry leads to a sale is 0.7, independently for each inquiry. Over a period of time 200 such inquiries are received. By making a suitable approximation, estimate the probability that the number of sales from inquiries is more than 145.

The number of sales is a Binomial random variable, and we could calculate this probability directly, but then we wouldn’t have time for any other questions on the exam. We will use the Central Limit Theorem to approximate the number of sales, which we will call X. Since X is a discrete random variable, we run into a curious situation. We should be able to calculate this probability as P(X > 145) or P(X ≥ 146). However, if we let Y be our normal approximation to this binomial random variable, then when we calculate P(Y > 145) and P(Y ≥ 146), we arrive at two different answers! This is the result of using a continuous random variable to approximate a discrete random variable. What value should we use? P(Y > 145)? P(Y ≥ 146)? The answer is that we choose the midpoint of these two cutoffs and calculate P(Y > 145.5). This concept is referred to as “the Continuity Correction”.

Comments: For the continuous distribution probabilities, it does not matter if we use P(Y > 145.5) or P(Y ≥ 145.5). We can also discuss this concept using Riemann sums, which I will likely do in the completed version/video of these notes. If we do want to include 145 in our calculations, such as if the question asked us to find P(X ≥ 145), we would use P(Y > 144.5).

Ex. Estimate the probability that the number of sales from inquiries is more than 145.
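A minimal Python sketch of the continuity correction for this example, assuming the number of sales is Binomial(200, 0.7) as the problem describes. The helper name `phi` and the variable names are illustrative only. The exact binomial tail is summed alongside the approximation so the two can be compared.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p = 200, 0.7
mean = n * p                      # binomial mean: np
sd = math.sqrt(n * p * (1 - p))   # binomial standard deviation: sqrt(np(1-p))

# Continuity correction: P(X > 145) = P(X >= 146) is approximated by P(Y > 145.5).
approx = 1.0 - phi((145.5 - mean) / sd)

# Exact binomial tail P(X >= 146), summed term by term for comparison.
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(146, n + 1))

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

On an exam only the first calculation is practical; the exact sum is included here just to show how close the corrected approximation tends to be.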

Ex. The number of automobile accidents each year in a particular suburb of Chicago is modeled using a Poisson distribution with mean 225. Using the Central Limit Theorem, calculate the approximate probability that the number of motor accidents in a given year is more than 230 but at most 247.
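A possible worked sketch, assuming the accident count X is Poisson so that its variance equals its mean of 225, and letting Y denote the normal approximation:

```latex
\mu = 225, \qquad \sigma = \sqrt{225} = 15

P(230 < X \le 247) = P(231 \le X \le 247)
  \approx P(230.5 < Y < 247.5)
  = \Phi\left(\frac{247.5 - 225}{15}\right) - \Phi\left(\frac{230.5 - 225}{15}\right)
  = \Phi(1.50) - \Phi(0.367)
  \approx 0.29
```

Both endpoints are shifted by 0.5: 230 is excluded, so the lower cutoff moves up to 230.5, while 247 is included, so the upper cutoff moves up to 247.5.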

Ex. An insurance company has 180 policy holders. An actuary has determined that the probability that a policy holder makes a claim in a given year is 1/6. Find the approximate probability that the number of claims is no more than 40 in a year.
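One way to set this up, assuming each policy holder makes at most one claim, independently of the others, so that the number of claims X is Binomial(180, 1/6), with Y the normal approximation:

```latex
\mu = 180 \cdot \tfrac{1}{6} = 30, \qquad
\sigma^2 = 180 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6} = 25, \qquad
\sigma = 5

P(X \le 40) \approx P(Y < 40.5)
  = \Phi\left(\frac{40.5 - 30}{5}\right)
  = \Phi(2.10)
  \approx 0.9821
```

Since 40 is included in the event, the cutoff is moved up to 40.5.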

Ex. The number of claims arriving at an insurance office per day can be modelled by a Poisson distribution with mean 10. In a working week of 5 days, calculate the approximate probability that the number of claims received lies strictly between 46 and 52.
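A sketch of one approach, assuming the five daily counts are independent, so the weekly total W has mean and variance both equal to 50 (Y is the normal approximation to W):

```latex
\mu_W = 5 \cdot 10 = 50, \qquad
\sigma_W = \sqrt{5 \cdot 10} = \sqrt{50} \approx 7.071

P(46 < W < 52) = P(47 \le W \le 51)
  \approx P(46.5 < Y < 51.5)
  = \Phi\left(\frac{51.5 - 50}{7.071}\right) - \Phi\left(\frac{46.5 - 50}{7.071}\right)
  \approx \Phi(0.21) - \Phi(-0.49)
  \approx 0.27
```

"Strictly between" excludes both 46 and 52, which is why the cutoffs become 46.5 and 51.5.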
