Variance and Standard Deviation
Math 217 Probability and Statistics
Prof. D. Joyce, Fall 2014
Variance for discrete random variables. The variance of a random variable $X$ is intended to give a measure of the spread of the random variable. If $X$ takes values near its mean $\mu = E(X)$, then the variance should be small, but if it takes values far from the mean, then the variance should be large.

The measure we'll use for distance from the mean will be the square of the distance from the mean, $(x - \mu)^2$, rather than the distance from the mean, $|x - \mu|$. There are three reasons for using the square of the distance rather than the absolute value of the difference. First, the square is easier to work with mathematically. For instance, $x^2$ has a derivative, but $|x|$ doesn't. Second, large distances from the mean become more significant, and it has been argued that this is a desirable property. Most important, though, is that the square is the right measure to use in order to derive the important theorems in the theory of probability, in particular, the Central Limit Theorem.

Anyway, we define the variance $\operatorname{Var}(X)$ of a random variable $X$ as the expectation of the square of the distance from the mean, that is,

$$\operatorname{Var}(X) = E\big((X - \mu)^2\big).$$

As the square is used in the definition of variance, we'll use the square root of the variance to normalize this measure of the spread of the random variable. The square root of the variance is called the standard deviation, denoted in our text as $\operatorname{SD}(X)$:

$$\operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)} = \sqrt{E\big((X - \mu)^2\big)}.$$

Just as we have a symbol $\mu$ for the mean, or expectation, of $X$, we denote the standard deviation of $X$ as $\sigma$, and so the variance is $\sigma^2$.

Roll of a die. Let's take as an example the roll $X$ of one fair die. We know $\mu = E(X) = 3.5$. What's the variance and standard deviation of $X$?

$$\operatorname{Var}(X) = E\big((X - \mu)^2\big) = \sum_{x=1}^{6} (x - 3.5)^2\, P(X = x) = \frac{1}{6} \sum_{x=1}^{6} (x - 3.5)^2$$

$$= \frac{1}{6}\left( \big(-\tfrac{5}{2}\big)^2 + \big(-\tfrac{3}{2}\big)^2 + \big(-\tfrac{1}{2}\big)^2 + \big(\tfrac{1}{2}\big)^2 + \big(\tfrac{3}{2}\big)^2 + \big(\tfrac{5}{2}\big)^2 \right) = \frac{35}{12}$$

Since the variance is $\sigma^2 = \frac{35}{12}$, the standard deviation is $\sigma = \sqrt{35/12} \approx 1.707$.

Properties of variance. Although the definition works okay for computing variance, there is an alternative way to compute it that usually works better, namely,

$$\sigma^2 = \operatorname{Var}(X) = E(X^2) - \mu^2.$$

Here's why that works:

$$\begin{aligned}
\sigma^2 = \operatorname{Var}(X) &= E\big((X - \mu)^2\big) \\
&= E(X^2 - 2\mu X + \mu^2) \\
&= E(X^2) - 2\mu E(X) + \mu^2 \\
&= E(X^2) - 2\mu \cdot \mu + \mu^2 \\
&= E(X^2) - \mu^2
\end{aligned}$$

Here are a couple more properties of variance. First, if you multiply a random variable $X$ by a constant $c$ to get $cX$, the variance changes by a factor of the square of $c$, that is,

$$\operatorname{Var}(cX) = c^2 \operatorname{Var}(X).$$

That's the main reason why we take the square root of variance to normalize it: the standard deviation of $cX$ is $|c|$ times the standard deviation of $X$,

$$\operatorname{SD}(cX) = |c| \operatorname{SD}(X).$$

(The absolute value is needed in case $c$ is negative.) It's easy to show that $\operatorname{Var}(cX) = c^2 \operatorname{Var}(X)$. Since the mean of $cX$ is $c\mu$,

$$\operatorname{Var}(cX) = E\big((cX - c\mu)^2\big) = E\big(c^2 (X - \mu)^2\big) = c^2 E\big((X - \mu)^2\big) = c^2 \operatorname{Var}(X).$$

The next important property of variance is that it's translation invariant, that is, if you add a constant to a random variable, the variance doesn't change:

$$\operatorname{Var}(X + c) = \operatorname{Var}(X).$$
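To make these identities concrete, here is a minimal Python sketch (an added illustration, not part of the original notes) that computes the die's variance both from the definition $E((X - \mu)^2)$ and from the shortcut $E(X^2) - \mu^2$, then checks the scaling and translation properties; using `fractions.Fraction` for exact arithmetic is just a convenience choice.

```python
from fractions import Fraction

# A fair die: outcomes 1..6, each with probability 1/6.
OUTCOMES = range(1, 7)
P = Fraction(1, 6)

def E(g):
    """Expectation of g(X) for the fair die."""
    return sum(P * g(x) for x in OUTCOMES)

mu = E(lambda x: x)                      # 7/2
var_def = E(lambda x: (x - mu) ** 2)     # definition: E((X - mu)^2) = 35/12
var_alt = E(lambda x: x * x) - mu ** 2   # shortcut:   E(X^2) - mu^2  = 35/12
assert var_def == var_alt == Fraction(35, 12)
print(float(var_def) ** 0.5)             # 1.7078..., i.e. sqrt(35/12)

# Var(cX) = c^2 Var(X): scaling by c scales the variance by c^2.
c = Fraction(-3)
mu_c = E(lambda x: c * x)
assert E(lambda x: (c * x - mu_c) ** 2) == c ** 2 * var_def

# Var(X + c) = Var(X): translation leaves the variance unchanged.
mu_t = E(lambda x: x + c)
assert E(lambda x: (x + c - mu_t) ** 2) == var_def
```

Each `assert` passes, and the printed value agrees with $\sigma \approx 1.707$ computed above.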
In general, the variance of the sum of two random variables is not the sum of the variances of the two random variables. But it is when the two random variables are independent.

Theorem. If $X$ and $Y$ are independent random variables, then $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.

Proof: This proof relies on the fact that $E(XY) = E(X)\,E(Y)$ when $X$ and $Y$ are independent.

$$\begin{aligned}
\operatorname{Var}(X + Y) &= E\big((X + Y)^2\big) - \mu_{X+Y}^2 \\
&= E(X^2 + 2XY + Y^2) - (\mu_X + \mu_Y)^2 \\
&= E(X^2) + 2E(X)E(Y) + E(Y^2) - \mu_X^2 - 2\mu_X \mu_Y - \mu_Y^2 \\
&= E(X^2) + 2\mu_X \mu_Y + E(Y^2) - \mu_X^2 - 2\mu_X \mu_Y - \mu_Y^2 \\
&= E(X^2) - \mu_X^2 + E(Y^2) - \mu_Y^2 \\
&= \operatorname{Var}(X) + \operatorname{Var}(Y)
\end{aligned}$$

q.e.d.

Variance of the binomial distribution. Let $S$ be the number of successes in $n$ Bernoulli trials, where the probability of success is $p$. This random variable $S$ has a binomial distribution, and we could use its probability mass function to compute its variance, but there's an easier way. The random variable $S$ is actually a sum of $n$ independent Bernoulli trials: $S = X_1 + X_2 + \cdots + X_n$, where each $X_i$ equals 1 with probability $p$ and 0 with probability $q = 1 - p$. By the preceding theorem,

$$\operatorname{Var}(S) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \cdots + \operatorname{Var}(X_n).$$

We can determine that if we can determine the variance of one Bernoulli trial $X$. Now, $\operatorname{Var}(X) = E(X^2) - \mu^2$, and for a Bernoulli trial $\mu = p$. Let's compute $E(X^2)$:

$$E(X^2) = P(X{=}0) \cdot 0^2 + P(X{=}1) \cdot 1^2 = p.$$

Therefore, the variance of one Bernoulli trial is $\operatorname{Var}(X) = p - p^2 = pq$.

From that observation, we conclude that the variance of the binomial distribution is

$$\operatorname{Var}(S) = n \operatorname{Var}(X) = npq.$$

Taking the square root, we see that the standard deviation of that binomial distribution is $\sqrt{npq}$. That gives us the important observation that the spread of a binomial distribution is proportional to the square root of $n$, the number of trials. The argument generalizes to other distributions: the standard deviation of a random sample is proportional to the square root of the number of trials in the sample.

Variance of a geometric distribution. Consider the time $T$ to the first success in a Bernoulli process. Its probability mass function is $f(t) = pq^{t-1}$. We saw that its mean was $\mu = E(T) = \frac{1}{p}$. We'll compute its variance using the formula $\operatorname{Var}(T) = E(T^2) - \mu^2$.

$$E(T^2) = \sum_{t=1}^{\infty} t^2 p q^{t-1} = 1^2 p + 2^2 pq + 3^2 pq^2 + \cdots + n^2 pq^{n-1} + \cdots$$

The last power series we got when we evaluated $E(T)$ was

$$\frac{1}{(1-x)^2} = 1 + 2x + 3x^2 + \cdots + nx^{n-1} + \cdots$$

Multiply it by $x$ to get

$$\frac{x}{(1-x)^2} = x + 2x^2 + 3x^3 + \cdots + nx^n + \cdots$$

Differentiate that to get

$$\frac{1+x}{(1-x)^3} = 1 + 2^2 x + 3^2 x^2 + \cdots + n^2 x^{n-1} + \cdots$$

Set $x$ to $q$ and multiply the equation by $p$; since $1 - q = p$, the left side becomes $p \cdot \frac{1+q}{p^3} = \frac{1+q}{p^2}$, and we get

$$\frac{1+q}{p^2} = p + 2^2 pq + 3^2 pq^2 + \cdots + n^2 pq^{n-1} + \cdots$$

The right side is exactly $E(T^2)$, so $E(T^2) = \dfrac{1+q}{p^2}$. Finally,

$$\operatorname{Var}(T) = E(T^2) - \mu^2 = \frac{1+q}{p^2} - \frac{1}{p^2} = \frac{q}{p^2}.$$
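Both formulas can be sanity-checked by simulation. The sketch below (an added illustration; the parameters $n = 20$, $p = 0.3$, the seed, and the helper names are arbitrary choices, not from the notes) draws many samples of a binomial and a geometric random variable and compares the empirical variances, computed with the same shortcut $E(X^2) - \mu^2$, against $npq$ and $q/p^2$.

```python
import random

random.seed(217)  # any fixed seed, for reproducibility

def empirical_variance(xs):
    """Variance estimate via the shortcut: mean of squares minus squared mean."""
    m = len(xs)
    mean = sum(xs) / m
    return sum(x * x for x in xs) / m - mean * mean

n, p = 20, 0.3
q = 1 - p
trials = 100_000

# Binomial: S = X1 + ... + Xn, each Xi = 1 with probability p, else 0.
binom = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
print(empirical_variance(binom), n * p * q)  # both near 4.2

# Geometric: T = number of trials up to and including the first success.
def first_success_time(p):
    t = 1
    while random.random() >= p:
        t += 1
    return t

geom = [first_success_time(p) for _ in range(trials)]
print(empirical_variance(geom), q / p ** 2)  # both near 7.78
```

With these parameters the estimates typically land within a percent or two of $npq = 4.2$ and $q/p^2 \approx 7.78$.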