
Variance and Standard Deviation
Math 217 Probability and Statistics


Prof. D. Joyce, Fall 2014

Variance for discrete random variables. The variance of a random variable X is intended to give a measure of the spread of the random variable. If X takes values near its mean µ = E(X), then the variance should be small, but if it takes values far from µ, then the variance should be large.

The measure we'll use for distance from the mean will be the square of the distance from the mean, (x − µ)², rather than the distance from the mean, |x − µ|. There are three reasons for using the square of the distance rather than the absolute value of the difference. First, the square is easier to work with mathematically. For instance, x² has a derivative, but |x| doesn't. Second, large distances from the mean become more significant, and it has been argued that this is a desirable property. Most important, though, is that the square is the right measure to use in order to derive the important theorems in the theory of probability, in particular, the Central Limit Theorem.

Anyway, we define the variance Var(X) of a random variable X as the expectation of the square of the distance from the mean, that is,

    Var(X) = E((X − µ)²).

As the square is used in the definition of variance, we'll use the square root of the variance to normalize this measure of the spread of the random variable. The square root of the variance is called the standard deviation, denoted in our text as SD(X):

    SD(X) = √Var(X) = √E((X − µ)²).

Just as we have a symbol µ for the mean, or expectation, of X, we denote the standard deviation of X as σ, and so the variance is σ².

One fair die. Let's take as an example the roll X of one fair die. We know µ = E(X) = 3.5. What's the variance and standard deviation of X?

    Var(X) = E((X − µ)²)
           = ∑_{x=1}^{6} (x − 3.5)² P(X = x)
           = (1/6) ∑_{x=1}^{6} (x − 3.5)²
           = (1/6) ((−5/2)² + (−3/2)² + (−1/2)² + (1/2)² + (3/2)² + (5/2)²)
           = 35/12

Since the variance is σ² = 35/12, the standard deviation is σ = √(35/12) ≈ 1.707.

Properties of variance. Although the definition works okay for computing variance, there is an alternative way to compute it that usually works better, namely,

    σ² = Var(X) = E(X²) − µ².

Here's why that works.

    σ² = Var(X)
       = E((X − µ)²)
       = E(X² − 2µX + µ²)
       = E(X²) − 2µE(X) + µ²
       = E(X²) − 2µ·µ + µ²
       = E(X²) − µ²

Here are a couple more properties of variance. First, if you multiply a random variable X by a constant c to get cX, the variance changes by a factor of the square of c, that is,

    Var(cX) = c² Var(X).

That's the main reason why we take the square root of variance to normalize it: the standard deviation of cX is |c| times the standard deviation of X:

    SD(cX) = |c| SD(X).
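As a quick numerical sanity check (not part of the original notes), a few lines of Python confirm the die computation both by the definition E((X − µ)²) and by the shortcut E(X²) − µ², and also check the scaling property Var(cX) = c² Var(X) for a sample value c = 3. The variable names here are our own:

```python
from fractions import Fraction

# Roll of one fair die: X takes each value 1..6 with probability 1/6.
p = Fraction(1, 6)
values = range(1, 7)

mu = sum(x * p for x in values)                     # E(X) = 7/2 = 3.5
var_def = sum((x - mu) ** 2 * p for x in values)    # E((X - mu)^2)
var_alt = sum(x * x * p for x in values) - mu ** 2  # E(X^2) - mu^2

print(var_def)  # 35/12
print(var_alt)  # 35/12

# Scaling property: Var(cX) = c^2 Var(X), checked for c = 3.
c = 3
mu_c = sum(c * x * p for x in values)
var_c = sum((c * x - mu_c) ** 2 * p for x in values)
print(var_c == c * c * var_def)  # True
```

Using exact rational arithmetic via `Fraction` avoids any floating-point rounding, so the two formulas agree exactly.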

(Absolute value is needed in case c is negative.) It's easy to show that Var(cX) = c² Var(X):

    Var(cX) = E((cX − cµ)²)
            = E(c²(X − µ)²)
            = c² E((X − µ)²)
            = c² Var(X)

The next important property of variance is that it's translation invariant, that is, if you add a constant to a random variable, the variance doesn't change:

    Var(X + c) = Var(X).

In general, the variance of the sum of two random variables is not the sum of the variances of the two random variables. But it is when the two random variables are independent.

Theorem. If X and Y are independent random variables, then Var(X + Y) = Var(X) + Var(Y).

Proof: This proof relies on the fact that E(XY) = E(X) E(Y) when X and Y are independent.

    Var(X + Y)
      = E((X + Y)²) − (µ_{X+Y})²
      = E(X² + 2XY + Y²) − (µ_X + µ_Y)²
      = E(X²) + 2E(X)E(Y) + E(Y²) − µ_X² − 2µ_X µ_Y − µ_Y²
      = E(X²) + 2µ_X µ_Y + E(Y²) − µ_X² − 2µ_X µ_Y − µ_Y²
      = E(X²) − µ_X² + E(Y²) − µ_Y²
      = Var(X) + Var(Y)

q.e.d.

Variance of the binomial distribution. Let S be the number of successes in n Bernoulli trials, where the probability of success is p. This random variable S has a binomial distribution, and we could use its probability mass function to compute the variance, but there's an easier way. The random variable S is actually a sum of n independent Bernoulli trials, S = X₁ + X₂ + ··· + Xₙ, where each Xᵢ equals 1 with probability p and 0 with probability q = 1 − p. By the preceding theorem,

    Var(S) = Var(X₁) + Var(X₂) + ··· + Var(Xₙ).

We can determine that if we can determine the variance of one Bernoulli trial X. Now, Var(X) = E(X²) − µ², and for a Bernoulli trial µ = p. Let's compute E(X²):

    E(X²) = P(X=0)·0² + P(X=1)·1² = p.

Therefore, the variance of one Bernoulli trial is Var(X) = p − p² = pq.

From that observation, we conclude that the variance of the binomial distribution is

    Var(S) = n Var(X) = npq.

Taking the square root, we see that the standard deviation of that binomial distribution is √(npq). That gives us the important observation that the spread of a binomial distribution is proportional to the square root of n, the number of trials.

The argument generalizes to other distributions: the standard deviation of a random sample is proportional to the square root of the number of trials in the sample.

Variance of a geometric distribution. Consider the time T to the first success in a Bernoulli process. Its probability mass function is f(t) = pq^{t−1}. We saw that its mean was µ = E(T) = 1/p.

We'll compute its variance using the formula Var(T) = E(T²) − µ².

    E(T²) = ∑_{t=1}^{∞} t² p q^{t−1}
          = 1²p + 2²pq + 3²pq² + ··· + n²pq^{n−1} + ···

The last power series we got when we evaluated E(T) was

    1/(1 − x)² = 1 + 2x + 3x² + ··· + n x^{n−1} + ···

Multiply it by x to get

    x/(1 − x)² = x + 2x² + 3x³ + ··· + n xⁿ + ···
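This power-series identity can be sanity-checked numerically by truncating the sum at a large index, where the tail is negligible for |x| < 1. This check is ours, not part of the notes:

```python
# Check x/(1-x)^2 = x + 2x^2 + 3x^3 + ... numerically for |x| < 1.
x = 0.4
N = 500  # truncation point; the tail beyond N is negligibly small

closed_form = x / (1 - x) ** 2
series = sum(n * x**n for n in range(1, N + 1))

print(abs(closed_form - series) < 1e-12)  # True
```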

Differentiate that to get

    (1 + x)/(1 − x)³ = 1 + 2²x + 3²x² + ··· + n²x^{n−1} + ···

Set x to q, and multiply the equation by p, and since 1 − q = p we get

    (1 + q)/p² = p + 2²pq + 3²pq² + ··· + n²pq^{n−1} + ···

Therefore E(T²) = (1 + q)/p². Finally,

    Var(T) = E(T²) − µ² = (1 + q)/p² − 1/p² = q/p².
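The geometric-distribution formulas E(T) = 1/p, E(T²) = (1 + q)/p², and Var(T) = q/p² can be checked numerically by truncating the defining sums; this is our own sanity check, not part of the notes:

```python
# Geometric distribution: P(T = t) = p * q^(t-1) for t = 1, 2, 3, ...
# Check E(T) = 1/p, E(T^2) = (1+q)/p^2, Var(T) = q/p^2 by truncation.
p = 0.25
q = 1 - p
N = 2000  # truncation point; q^N is negligibly small

e_t  = sum(t * p * q ** (t - 1) for t in range(1, N + 1))
e_t2 = sum(t * t * p * q ** (t - 1) for t in range(1, N + 1))
var  = e_t2 - e_t ** 2

print(round(e_t, 6))   # 4.0   = 1/p
print(round(e_t2, 6))  # 28.0  = (1 + q)/p^2
print(round(var, 6))   # 12.0  = q/p^2
```

With p = 1/4, the spread q/p² = 12 is large compared to the mean 1/p = 4, reflecting the long right tail of the geometric distribution.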

Math 217 Home Page at http://math.clarku.edu/~djoyce/ma217/
