The Mean, the Variance, and Simple Statistics
©2009-2018 Michael J. Rosenfeld, draft version 1.7 (under construction). Draft of October 3, 2018.

Notes on the Mean, the Standard Deviation, and the Standard Error. Practical Applied Statistics for Sociologists.

An introductory word on the philosophy of the class: My goal in this class is to give you an intuitive understanding of some basic statistical ideas, and to give you practical experience with using basic statistics. Toward these pedagogical ends, I dispense with as much statistical formality as I possibly can. We will talk a little bit about linear algebra and about calculus (two bedrocks of statistical theory) in this class, but only as asides.

Consider a variable X. E(X) is the expected value of X, or the mean of X. The formal definition of E(X), if X is discrete, is

E(X) = \sum_i x_i \, p(x_i)

meaning you sum up the different outcomes weighted by how likely each different outcome is. If X is continuous, the expectation is defined this way:

E(X) = \int x \, f(x) \, dx

where f(x) is the probability density function. Expectation is an important idea, but it is somewhat abstract. If X is Normally distributed (an assumption that is actually quite reasonable in the kinds of data and questions we will be looking at), and we have a bunch of x's to observe, the sample mean of our x's is a natural estimate of the expectation of X. Since the sample mean is very concrete and tangible and already familiar to you, I am going to talk a lot about the sample mean and not so much about E(X).

These are notes on the sample mean, the variance, the standard deviation, and so on. In this discussion you will have to know a few basic things about summation notation:

\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n

\sum_{i=1}^{n} a X_i = a \sum_{i=1}^{n} X_i

\sum_{i=1}^{n} (a X_i + b Y_i) = a \sum_{i=1}^{n} X_i + b \sum_{i=1}^{n} Y_i

In words, summation notation is just a sum of things. No big deal. When you multiply each value by a constant, it is the same as multiplying the sum by that constant. If the sight of summation notation scares you, don't worry. Summation notation is just shorthand for a simple idea.

1) The sample mean, or the average. If we have n observations, X_1, X_2, ..., X_n, the average of these is simply

Avg(X_i) = \frac{1}{n} \sum_{i=1}^{n} X_i

In other words, you take the sum of your observations and divide by the number of observations. We will generally write this more simply as

Avg(X_i) = \frac{1}{n} \sum_i X_i

This is a formula you are all familiar with. The simple formula has some interesting implications.

2) How the average changes when we add or multiply the X_i's by constant values. In all the equations below, a and b are constants.

Avg(a X_i) = \frac{1}{n} \sum_i a X_i = \frac{a}{n} \sum_i X_i = a \, Avg(X_i)

When we take a variable and double it, the average also doubles. That should be no surprise. To be slightly more general:

Avg(a + b X_i) = a + b \, Avg(X_i)

(this is easy enough to show; see Homework 2).

3) Also, the average of a sum of two variables is the sum of the averages. More formally:

Avg(X_i + Y_i) = \frac{1}{n} \sum_i (X_i + Y_i) = \frac{1}{n} \sum_i X_i + \frac{1}{n} \sum_i Y_i = Avg(X_i) + Avg(Y_i)

If X is January income, and Y is February income, then the average of January-plus-February income is the same as the average for January plus the average for February. No surprise there.

4) The sample variance, defined:

Var(X_i) = \frac{1}{n} \sum_i (X_i - Avg(X_i))^2

The variance is basically the average squared distance between X_i and Avg(X_i). Variance can't be negative, because every term in the sum is squared and therefore positive or zero. If all of the observations X_i are the same, then each X_i = Avg(X_i) and the variance is 0. Variance has some downsides. For one thing, the units of variance are squared units. If X is measured in dollars, then Var(X) is measured in dollars squared. That can be awkward, and it is one reason we usually use the standard deviation rather than the variance: the standard deviation (just the square root of the variance) puts the units back in the units of X. Sometimes the sample variance is calculated with 1/(n-1) rather than 1/n. With large enough samples, the difference is small. For simplicity's sake, we will stick with 1/n.
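To make points 1, 2, and 4 concrete, here is a minimal Python sketch (my own illustration, not part of the original notes) that computes the sample mean and the 1/n sample variance by hand, and checks the rule Avg(a + bX) = a + b·Avg(X). The income values are made up for the example.

    # Sample mean and 1/n sample variance, computed by hand.
    # The data values below are hypothetical, for illustration only.
    xs = [32000, 45000, 51000, 38000, 62000]  # made-up incomes in dollars
    n = len(xs)

    avg_x = sum(xs) / n                            # Avg(X) = (1/n) * sum of X_i
    var_x = sum((x - avg_x) ** 2 for x in xs) / n  # Var(X) = (1/n) * sum of (X_i - Avg(X))^2

    # Point 2: Avg(a + b*X) = a + b*Avg(X), where a and b are constants
    a, b = 100.0, 2.0
    avg_transformed = sum(a + b * x for x in xs) / n
    assert abs(avg_transformed - (a + b * avg_x)) < 1e-6

    print(f"Avg(X) = {avg_x:.2f} dollars, Var(X) = {var_x:.2f} dollars squared")

Note that Var(X) prints in dollars squared, which is exactly the awkwardness about units discussed above.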
5) How does variance respond to changes in scale?

Var(a + b X_i) = b^2 \, Var(X_i)

If you move the bell curve over (displacement by a), the variance does not change. If you scale the X_i by a factor of b, the variance increases by a factor of b^2.

6) How about the variance of the combination of two variables?

Var(X_i + Y_i) = Var(X_i) + Var(Y_i) + 2 \, Cov(X_i, Y_i)

Var(X_i - Y_i) = Var(X_i) + Var(Y_i) - 2 \, Cov(X_i, Y_i)

If X and Y are independent, then Cov(X, Y) = 0, and life becomes simple and sweet: the variance of (X + Y) is simply Var(X) + Var(Y). Also note that under independence Var(X - Y) = Var(X) + Var(Y), because you can think of -Y as (-1)Y, and Var((-1)Y) = (-1)^2 Var(Y) = Var(Y). If you take the distribution and move it over to the negative numbers, the variance is still the same. Of course we could just calculate the covariance (it's not that hard), but most of the time it is simpler and cleaner to make the assumption of independence (and sometimes, it is even true!).

7) Standard deviation.

StDev(X_i) = \sqrt{Var(X_i)}

The standard deviation is simply the square root of the variance. The standard deviation of X has the same units as X, whereas the variance has squared units. When you want to know the standard deviation of a combination of two variables, the easiest thing to do is to calculate the variances first, and then take the square root last.

8) Standard error of the mean. Usually in social statistics we are interested not only in the distribution of a population (let's say, the income of nurses), but also in the mean and in the comparison of means (do nurses earn more than sociologists? How sure are we?). So let's look at the variance and standard error of the mean. How sure are we about the mean earnings of nurses?

Var(Avg(X_i)) = Var\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \frac{1}{n^2} Var\left( \sum_i X_i \right)

because Var(b X_i) = b^2 Var(X_i). Now we take advantage of the fact that the X's are independent and identically distributed, so that the covariance between them is zero:

\frac{1}{n^2} Var\left( \sum_{i=1}^{n} X_i \right) = \frac{1}{n^2} Var(X_1 + X_2 + \dots + X_n) = \frac{1}{n^2} \, n \, Var(X_i) = \frac{1}{n} Var(X_i)

On the importance of sample size. The standard deviation of the mean is usually called the standard error:

Standard Error = StDev(Avg(X_i)) = \sqrt{ \frac{Var(X_i)}{n} } = \frac{StDev(X_i)}{\sqrt{n}}

What is new here is the factor of the square root of n in the denominator. What this means is that the larger the sample size, the smaller the standard error of the mean. A high standard error means we are less sure of what we are trying to measure (in this case the average of X); a small standard error implies that we are more sure. Sample size is crucial in social statistics, but if you want a standard error half as large, you need a sample size 4 times as big (because of the square root). If you increase the sample size, the population variance of nurses' income stays the same, but the standard error of the mean of nurses' income decreases. It is important to keep in mind the difference between the standard deviation of the population and the standard error of the mean.
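Points 5 and 6 can also be checked numerically. Here is a small Python sketch (my own illustration, with made-up parameters): the scale rule Var(a + bX) = b^2 Var(X) holds exactly as an algebraic identity, and Var(X + Y) decomposes exactly into Var(X) + Var(Y) + 2 Cov(X, Y), with the covariance term close to zero when X and Y are generated independently.

    import random
    import statistics

    random.seed(7)
    n = 10000
    xs = [random.gauss(0, 3) for _ in range(n)]
    ys = [random.gauss(0, 4) for _ in range(n)]  # generated independently of X

    def var(v):
        """Sample variance with 1/n, as in these notes."""
        m = statistics.fmean(v)
        return sum((x - m) ** 2 for x in v) / len(v)

    def cov(u, v):
        """Sample covariance with 1/n."""
        mu, mv = statistics.fmean(u), statistics.fmean(v)
        return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)

    # Point 5: Var(a + b*X) = b^2 * Var(X), exactly (up to floating-point error)
    a, b = 5.0, 2.0
    print(var([a + b * x for x in xs]), b ** 2 * var(xs))

    # Point 6: Var(X+Y) = Var(X) + Var(Y) + 2*Cov(X,Y) exactly; because the draws
    # are independent, Cov(X,Y) is near zero and Var(X) + Var(Y) comes close too
    zs = [x + y for x, y in zip(xs, ys)]
    print(var(zs), var(xs) + var(ys) + 2 * cov(xs, ys), var(xs) + var(ys))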
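The square-root-of-n behavior in point 8 is easy to see by simulation. The following Python sketch (again my own illustration, assuming Normal incomes with a made-up mean and standard deviation) draws many samples of each size and compares the empirical spread of the sample means to the theoretical StDev(X)/sqrt(n); quadrupling n should roughly halve the standard error.

    import random
    import statistics

    random.seed(1)
    mu, sigma = 50000, 10000  # hypothetical population mean and SD of incomes
    reps = 2000               # number of repeated samples per sample size

    def sd_of_sample_means(n):
        """Empirical standard deviation of the sample mean across many samples."""
        means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
                 for _ in range(reps)]
        return statistics.stdev(means)

    for n in (25, 100):
        print(f"n={n:4d}  empirical SE={sd_of_sample_means(n):8.1f}  "
              f"theory sigma/sqrt(n)={sigma / n ** 0.5:8.1f}")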
9a) Now let's say we want to compare two means, Avg(X_i) and Avg(Y_j); say they are the incomes of lawyers and of nurses. The difference between the means is easy to calculate: it's just the average income of the lawyers minus the average income of the nurses. But what about the standard error of the difference? You take the standard errors of the individual means of X and Y, and you square them, to get the variances of the means. Then you add them together to get the variance of the difference (because Var(X - Y) = Var(X) + Var(Y) under independence), and then you take the square root to get the standard error of the difference:

StdError(Avg(X_i) - Avg(Y_j)) = \sqrt{ \left( StdError(Avg(X_i)) \right)^2 + \left( StdError(Avg(Y_j)) \right)^2 }

T statistic = \frac{Avg(X_i) - Avg(Y_j)}{StdError(Avg(X_i) - Avg(Y_j))} = \frac{\text{the difference}}{\text{the standard error of that difference}}

which we will compare to the Normal distribution, or sometimes to the T distribution (a close and slightly fatter relative of the Normal distribution). The T statistic is unit free, because it has units of X in the numerator and in the denominator, which cancel. The T statistic is also immune to changes in scale. I know all the algebra looks a little daunting, but the idea is simple, and it is the basis for a lot of hypothesis testing. See my Excel sheet for an example.

Why does the average divided by its standard error take a Normal (or close to Normal) distribution? There is a famous theorem in statistics called the Central Limit Theorem which explains why. The theorem requires a lot of advanced mathematics to prove, but the basic point is this: no matter what shape the distribution of the X's takes (it could be flat, it could have three modes, et cetera), the distribution of the mean of the X's approaches a Normal distribution as n grows large.
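As a final sketch (my own example with made-up summary numbers, not the Excel sheet mentioned above), here is the two-sample T statistic of point 9a computed in Python: the difference in means divided by the standard error of the difference, assuming the two samples are independent.

    import math

    # Hypothetical summary statistics for lawyers (X) and nurses (Y)
    mean_x, sd_x, n_x = 95000, 30000, 400
    mean_y, sd_y, n_y = 72000, 18000, 250

    se_x = sd_x / math.sqrt(n_x)                # standard error of Avg(X)
    se_y = sd_y / math.sqrt(n_y)                # standard error of Avg(Y)
    se_diff = math.sqrt(se_x ** 2 + se_y ** 2)  # SE of the difference, under independence

    t = (mean_x - mean_y) / se_diff
    print(f"difference = {mean_x - mean_y}, SE = {se_diff:.1f}, T = {t:.2f}")
    # A |T| far above 2 would lead us to reject "no difference" at the usual level.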