COMP6053 lecture: Sampling and the central limit theorem
Markus Brede, [email protected]


Populations: long-run distributions

● Two kinds of distributions: populations and samples.
● A population is the set of all relevant measurements. Think of it as the big picture.

Populations: finite or infinite?

A population can have a finite number of outcomes but an infinite extent.
● Consider the set of all possible two-dice totals: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
● We can ask what the distribution across totals would be if you threw a theoretical pair of dice an infinite number of times.

Alternatively, a population can have an infinite number of outcomes and an infinite extent.
● Consider a simulation that produces a predicted global average temperature for 2050.
● The simulation won't give the same result every time it's run: 15.17, 14.81, 15.02, 14.46, ...
● We can ask how the prediction values would be distributed across an infinite number of runs of the simulation, each linked to a different sequence of pseudo-random numbers.

A population can also be finite but large.
● The set of all fish in the Pacific Ocean.
● The set of all people currently living in the UK.

Or finite and small.
● The set of Nobel prize winners born in Hungary (9).
● The set of distinct lineages of living things (only 1, that we know of).

Known population distributions

● Sometimes our knowledge of probability allows us to specify exactly what the infinite long-run distribution of some process looks like.
● We can illustrate this with a probability density function: in other words, a histogram that describes the probability of an outcome rather than counting occurrences of that outcome.
● Take the two-dice case, enumerated exactly in the sketch below.
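For the two dice we can write this distribution down by brute force. The following is a minimal sketch (in Python; the choice of language and the variable names are ours, not part of the lecture): it lists all 36 equally likely (die A, die B) outcomes and counts how often each total occurs.

```python
# Exact long-run distribution of two-dice totals, by enumeration:
# 36 equally likely outcomes, grouped by their total.
from collections import Counter
from itertools import product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
for total in range(2, 13):
    print(f"P(total = {total:2d}) = {counts[total]}/36 = {counts[total] / 36:.3f}")
```

The output has the familiar triangular shape: P(7) = 6/36 at the peak, falling away to P(2) = P(12) = 1/36 at the extremes.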
The need for sampling

● More commonly, we don't know the precise shape of the population's distribution on some variable. But we'd like to know.
● We have no alternative but to sample the population in some way.
● This might mean empirical sampling: we go out into the middle of the Pacific and catch 100 fish in order to learn something about the distribution of fish weights.
● It might mean sampling from many repeated runs of a simulation.

Samples

A sample is just a group of observations drawn in some way from a wider population. Statistics has its roots in the effort to figure out just what you can reasonably infer about the wider population from the sample you've got. The size of your sample turns out to be an important limiting factor.

Sampling from a known distribution

● How can we learn about the effects of sampling?
● Let's take a very simple distribution that we understand well: the results from throwing a single die (i.e., the uniform distribution across the integers from 1 to 6 inclusive).
● We know that the mean of this distribution is 3.500, the variance is 2.917, and the standard deviation is 1.708:
● Mean = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.
● Variance = ((1 - 3.5)^2 + (2 - 3.5)^2 + ... + (6 - 3.5)^2) / 6 = 2.917.
● Standard deviation = sqrt(variance) = 1.708.
● We can simulate drawing some samples from this distribution to see how the size of our sample affects our attempts to draw conclusions about the population.
● What would samples of size one look like? That would just mean drawing a single variate from the population, i.e., throwing a single die, once.

Some samples

● A small sample of 3 observations gives a mean of 2.667.
● A larger sample of 25 observations gives a mean of 3.240.

Samples give us varying results

● In both cases we didn't reproduce the shape of the true distribution, nor did we get exactly 3.5 as the mean, of course.
● The bigger sample gave us a more accurate estimate of the population mean, which is hopefully not too surprising.
● But how much variation from the true mean should we expect if we kept drawing samples of a given size?
● This leads us to a "meta-property": the sampling distribution of the mean. Let's simulate drawing a sample of size 3, calculate its mean, repeat 10,000 times, and see what the distribution of those means looks like.

Sampling distribution of the mean

● For the sample-size-3 case, the mean of the sample means centres in on the true mean of 3.5.
● But there's a lot of variation. With such a small sample size, we can get extreme results such as a sample mean of 1 or 6 reasonably often.
● Do things improve if we look at the distribution of the sample means of samples of size 25, for example?

So there are a few things going on here:
● The distribution of the sample means looks like a bell curve, despite the fact that we've been sampling from a flat (uniform) distribution.
● The width of the bell curve gets gradually smaller as the size of our samples goes up. So bigger samples seem to give tighter, more accurate estimates.
● Even for really small sample sizes, like 3, the sample mean distribution looks like it is centred on the true mean, but for a particular sample we could be way off.

● Given our usual tools of means, variances, standard deviations, etc., how might we characterize these sampling distributions?
● It looks like the mean of the sample means will be the true mean, but what will happen to the variance and standard deviation of the sample means?
● Can we predict, for example, what the variance of the sample mean distribution would be if we took an infinite number of samples of a given size N? The simulation sketched below checks this empirically.
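The following sketch (Python again; the function name sample_mean_distribution is our own) runs exactly this experiment: it draws 10,000 samples of a given size from the single-die distribution and reports the mean and standard deviation of the resulting sample means, alongside the 1.708 / sqrt(N) prediction derived in the next section.

```python
# Simulate the sampling distribution of the mean for single-die throws.
import random
import statistics

def sample_mean_distribution(sample_size, num_samples=10_000):
    """Mean and SD of num_samples sample means at the given sample size."""
    means = [statistics.mean(random.randint(1, 6) for _ in range(sample_size))
             for _ in range(num_samples)]
    return statistics.mean(means), statistics.stdev(means)

for n in (3, 25):
    centre, spread = sample_mean_distribution(n)
    print(f"N = {n:2d}: mean of sample means = {centre:.3f}, "
          f"SD of sample means = {spread:.3f}, "
          f"predicted 1.708/sqrt(N) = {1.708 / n ** 0.5:.3f}")
```

Both sample sizes give a mean of sample means close to 3.5, and the spread shrinks as N grows, in line with the analysis that follows.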
Distribution arithmetic revisited

We talked last week about taking the distribution of die-A throws and adding it to the distribution of die-B throws to find out something about two-dice throws. When two distributions are "added together", we know some things about the resulting distribution:
● The means are additive.
● The variances are additive.
● The standard deviations are not additive.

A question: what about dividing and multiplying distributions by constants? How does that work?

Scaling a distribution (multiplying or dividing by some constant c) can be thought of as just changing the labels on the axes of the histogram.
● The mean scales directly: E[cX] = c E[X].
● This time it's the variance that does not scale directly: Var[cX] = E[(cX)^2] - (E[cX])^2 = c^2 Var[X].
● The standard deviation (in the same units as the mean) scales directly: SD[cX] = sqrt(Var[cX]) = c SD[X] (for positive c).

● When we calculate the mean of a sample, what are we really doing?
● For each observation in the sample, we're drawing a score from the true distribution.
● Then we add those scores together, so the means and variances will be additive.
● Then we divide by the size of the sample, so the mean and standard deviation will scale by 1/N.

Some results

For the 1-die case:
● The mean of the sample total will be 3.5 N.
● The variance of the sample total will be 2.917 N.
● The standard deviation of the total will be sqrt(2.917 N).

Then we divide through by N:
● The mean of the sample means will be 3.5 (easy).
● The standard deviation of the sample means will be sqrt(2.917 N) / N (easy), which comes out as 1.708 / sqrt(N).
● The variance of the sample means will be 2.917 / N (tricky: we have to calculate the standard deviation first, then square it).

What do we have now?

We know that if we repeatedly sample from a population, taking samples of a given size N:
● The mean of our sample means will converge on the true mean: great news!
● The standard deviation of our distribution of sample means will tighten up in proportion to 1 / sqrt(N). In other words, accuracy improves with bigger sample sizes, but with diminishing returns.
● Remember this 1 / sqrt(N) ratio; it's related to something called the standard error, which we'll come back to.
● We also have a strong hint that the distribution of our sample means will itself take on a normal, bell-curve shape, especially as we increase the sample size.
● This is interesting because the population distribution in this case was uniform: the results from throwing a single die many times look nothing like a bell curve.

An unusual distribution

● How strong is this tendency for the sample means to be themselves normally distributed?
● Let's take a deliberately weird distribution that is as far from normal as possible and simulate sampling from it.

Central limit theorem

● The central limit theorem states that the mean of a sufficiently large number of independent random variables will itself be approximately normally distributed.
● Let's look at the distribution of the sample means for our strange distribution, given increasing sample sizes (a simulation sketch follows below).
● At first glance, given its tri-modal nature, it's not obvious how we're going to get a normal (bell-shaped) distribution out of this.
● Yet we do reliably get a normal distribution when we look at the distribution of sample means, no matter how strange the original distribution we were sampling from.
● This surprising result turns out to be very useful in allowing us to make inferences about populations from samples.
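As a final sketch, we can watch the theorem at work on a deliberately strange population. The tri-modal distribution below (equal mass piled on 0, 5, and 10; our own stand-in for the lecture's "unusual distribution") is nothing like a bell curve, yet the distribution of its sample means rapidly becomes one.

```python
# Central limit theorem demo: sample means from a tri-modal population.
import random
import statistics

def weird_variate():
    # Equal mass at three well-separated values; mean 5, nothing bell-shaped.
    return random.choice([0.0, 5.0, 10.0])

def sample_means(sample_size, num_samples=10_000):
    return [statistics.mean(weird_variate() for _ in range(sample_size))
            for _ in range(num_samples)]

for n in (1, 3, 30):
    means = sample_means(n)
    print(f"N = {n:2d}: mean of sample means = {statistics.mean(means):.3f}, "
          f"SD = {statistics.stdev(means):.3f}")
# Histogram the N = 30 means and a bell shape appears, even though the
# population itself is three isolated spikes.
```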