Standard Deviation, Standard Error. Which 'Standard' Should We Use?

George W. Brown, MD

Standard deviation (SD) and standard error (SE) are quietly but extensively used in biomedical publications. These terms and notations are used as descriptive statistics (summarizing numerical data), and they are used as inferential statistics (estimating population parameters from samples). I review the use and misuse of SD and SE in several authoritative medical journals and make suggestions to help clarify the usage and meaning of SD and SE in biomedical reports.
(Am J Dis Child 1982;136:937-941)

From the Los Lunas Hospital and Training School, New Mexico, and the Department of Pediatrics, University of New Mexico School of Medicine, Albuquerque. Reprint requests to Los Lunas Hospital and Training School, Box 1269, Los Lunas, NM 87031 (Dr Brown).

Standard deviation (SD) and standard error (SE) have surface similarities; yet, they are conceptually so different that we must wonder why they are used almost interchangeably in the medical literature. Both are usually preceded by a plus-minus symbol (±), suggesting that they define a symmetric interval or range of some sort. They both appear almost always with a mean (average) of a set of measurements or counts of something. The medical literature is replete with statements like, "The serum cholesterol measurements were distributed with a mean of 180±30 mg/dL (SD)."

In the same journal, perhaps in the same article, a different statement may appear: "The weight gains of the subjects averaged 720 (mean) ±32 g/mo (SE)." Sometimes, as discussed further, the summary data are presented as the "mean of 120 mg/dL ±12" without the "12" being defined as SD or SE, or as some other index of dispersion. Eisenhart1 warned against this "peril of shorthand expression" in 1968; Feinstein2 later again warned about the fatuity and confusion contained in any a ± b statements where b is not defined. Warnings notwithstanding, a glance through almost any medical journal will show examples of this usage.

Medical journals seldom state why SD or SE is selected to summarize data in a given report. A search of the three major pediatric journals for 1981 (American Journal of Diseases of Children, Journal of Pediatrics, and Pediatrics) failed to turn up a single article in which the selection of SD or SE was explained. There seems to be no uniformity in the use of SD or SE in these journals or in The Journal of the American Medical Association (JAMA), the New England Journal of Medicine, or Science. The use of SD and SE in the journals will be discussed further.

If these respected, well-edited journals do not demand consistent use of either SD or SE, are there really any important differences between them? Yes, they are remarkably different, despite their superficial similarities. They are so different in fact that some authorities have recommended that SE should rarely or never be used to summarize medical research data. Feinstein2 noted the following:

A standard error has nothing to do with standards, with errors, or with the communication of scientific data. The concept is an abstract idea, spawned by the imaginary world of statistical inference and pertinent only when certain operations of that imaginary world are met in scientific reality.2(p336)

Glantz3 also has made the following recommendation:

Most medical investigators summarize their data with the standard error because it is always smaller than the standard deviation. It makes their data look better ... data should never be summarized with the standard error of the mean.3(p25)

A closer look at the source and meaning of SD and SE may clarify why medical investigators, journal reviewers, and editors should scrutinize their usage with considerable care.

DISPERSION

An essential function of "descriptive statistics" is the presentation of condensed, shorthand symbols that epitomize the important features of a collection of data. The idea of a central value is intuitively satisfactory to anyone who needs to summarize a group of measurements, or counts. The traditional indicators of a central tendency are the mode (the most frequent value), the median (the value midway between the lowest and the highest value), and the mean (the average). Each has its special uses, but the mean has great convenience and flexibility for many purposes.

The dispersion of a collection of values can be shown in several ways; some are simple and concise, and others are complex and esoteric. The range is a simple, direct way to indicate the spread of a collection of values, but it does not tell how the values are distributed. Knowledge of the mean adds considerably to the information carried by the range.

Another index of dispersion is provided by the differences (deviations) of each value from the mean of the values. The trouble with this approach is that some deviations will be positive, and some will be negative, and their sum will be zero. We could ignore the sign of each deviation, ie, use the "absolute mean deviation," but mathematicians tell us that working with absolute numbers is extremely difficult and fraught with technical disadvantages.

A neglected method for summarizing the dispersion of data is the calculation of percentiles (or deciles, or quartiles). Percentiles are used more frequently in pediatrics than in other branches of medicine, usually in growth charts or in other data arrays that are clearly not symmetric or bell shaped. In the general medical literature, percentiles are sparsely used, apparently because of a common, but erroneous, assumption that the mean ± SD or SE is satisfactory for summarizing central tendency and dispersion of all sorts of data.

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \qquad\qquad \mathrm{SD} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Left: SD of Population. Right: Estimate of Population SD From Sample.
µ = Mean of Population; X̄ = Mean of Sample; N = Number in Population; n = Number in Sample.

Fig 1. Standard deviation (SD) of population is shown at left. Estimate of population SD derived from sample is shown at right.

$$\mathrm{SEM} = \frac{\mathrm{SD}}{\sqrt{n}} \qquad\qquad \mathrm{SE} = \sqrt{\frac{pq}{n}}$$

Left: SEM. Right: SE of Proportion.
SD = Estimate of Population SD; n = Sample Size; p = Proportion Estimated From Sample; q = (1 - p).

Fig 2. Standard error of mean (SEM) is shown at left. Note that SD is estimate of population SD (not σ, actual SD of population). Sample size used to calculate SEM is n. Standard error of proportion is shown at right.

STANDARD DEVIATION

The generally accepted answer to the need for a concise expression for the dispersion of data is to square the difference of each value from the group mean, giving all positive values. When these squared deviations are added up and then divided by the number of values in the group, the result is the variance. The variance is always a positive number, but it is in different units than the mean. The way around this inconvenience is to use the square root of the variance, which is the population standard deviation (σ), which for convenience will be called SD. Thus, the SD is the square root of the averaged squared deviations from the mean. The SD is sometimes called by the shorthand term, "root-mean-square." The SD, calculated in this way, is in the same units as the original values and the mean. The SD has additional properties that make it attractive for summarizing dispersion, especially if the data are distributed symmetrically in the revered bell-shaped, gaussian curve. Although there are an infinite number of gaussian curves, the one for the data at hand is described completely by the mean and SD.
For example, the mean ± 1.96 SD will enclose 95% of the values; the mean ± 2.58 SD will enclose 99% of the values. It is this symmetry and elegance that contribute to our admiration of the gaussian curve.

The bad news, especially for biologic data, is that many collections of measurements or counts are not symmetric or bell shaped. Biologic data tend to be skewed or double humped, J shaped, U shaped, or flat on top. Regardless of the shape of the distribution, it is still possible by rote arithmetic to calculate an SD although it may be inappropriate and misleading.

For example, one can imagine throwing a six-sided die several hundred times and recording the score at [...] that no matter how many times the die is thrown, it will never show its average score of 3.5.)

The SD wears two hats. So far, we have looked at its role as a descriptive statistic for measurements or counts that are representative only of themselves, ie, the data being summarized are not a sample representing a larger (and itself unmeasurable) universe or population.

The second hat involves the use of SD from a random sample as an estimate of the population standard deviation (σ). The formal statistical language says that the sample statistic, SD, is an unbiased estimate of a population parameter [...]

[...] determine the deviations is conceptualized as an estimate of the mean, x̄, rather than as a true and exact population mean (µ). Both means are calculated in the same way, but a population mean, µ, stands for itself and is a parameter; a sample mean, x̄, is an estimate of the mean of a larger population and is a statistic.

The second change in calculation is in the arithmetic: the sum of the squared deviations from the (estimated) mean is divided by n - 1, rather than by N. (This makes sense intuitively when we recall that a sample would not show as great a spread of values as the source population.
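The formulas in Fig 1 and Fig 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical (they do not come from the article), and the six faces of a die serve as toy data, treated first as a complete population and then, purely for comparison, as if they were a sample.

```python
import math
from statistics import NormalDist


def population_sd(values):
    """Fig 1, left: root of the mean squared deviation from mu (divide by N)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_sd(values):
    """Fig 1, right: estimate of the population SD (divide by n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


def sem(values):
    """Fig 2, left: standard error of the mean, SD / sqrt(n)."""
    return sample_sd(values) / math.sqrt(len(values))


def se_proportion(p, n):
    """Fig 2, right: standard error of a proportion, sqrt(p * q / n)."""
    return math.sqrt(p * (1 - p) / n)


# Toy data (an assumption for illustration): the six faces of a die.
faces = [1, 2, 3, 4, 5, 6]
mean_score = sum(faces) / len(faces)  # 3.5, a score the die never shows

print("mean:", mean_score)
print("SD dividing by N:", population_sd(faces))
print("SD dividing by n - 1:", sample_sd(faces))
print("SEM:", sem(faces))

# Fraction of a gaussian curve enclosed by the mean +/- 1.96 SD.
z = NormalDist()  # standard normal: mean 0, SD 1
coverage_196 = z.cdf(1.96) - z.cdf(-1.96)
print("coverage of mean +/- 1.96 SD:", coverage_196)
```

Dividing by n - 1 always gives a slightly larger value than dividing by N, and the SEM is smaller than either because it shrinks with the square root of the sample size. In everyday use, the standard library functions `statistics.pstdev` and `statistics.stdev` compute the same two SDs.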