U3 Introduction to Summary Statistics

Project Lead The Way, Inc. Copyright 2010

Statistics
• The collection, evaluation, and interpretation of data

Introduction to Summary Statistics
• Statistical analysis of measurements can help verify the quality of a design or process

Summary Statistics
• Central Tendency: the “center” of a distribution of data
  – Mean, median, mode
• Variation: the spread of values around the center
  – Range, standard deviation, interquartile range
• Distribution: a summary of the frequency of values
  – Frequency tables, histograms, normal distribution

Mean (Central Tendency)
• The mean is the sum of the values of a data set divided by the number of values in that data set.

$\mu = \frac{\sum x_i}{N}$

  – $\mu$ = mean value
  – $x_i$ = individual data value
  – $\sum x_i$ = summation of all data values
  – $N$ = number of data values in the data set

Mean (Central Tendency): Example
• Data Set: 3 7 12 17 21 21 23 27 32 36 44
• Sum of the values = 243
• Number of values = 11

$\text{Mean} = \mu = \frac{\sum x_i}{N} = \frac{243}{11} = 22.09$

A Note about Rounding in Statistics
• General Rule: Don’t round until the final answer
  – If you are writing intermediate results you may round values, but keep the unrounded number in memory
• Mean: Round to one more decimal place than the original data
• Standard Deviation: Round to one more decimal place than the original data

Mean – Rounding
• Data Set: 3 7 12 17 21 21 23 27 32 36 44
• Sum of the values = 243
• Number of values = 11

$\text{Mean} = \mu = \frac{\sum x_i}{N} = \frac{243}{11} = 22.09$

• Reported: Mean = 22.1
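The mean calculation and the rounding rule can be sketched in a few lines of Python. This is only an illustration of the slide arithmetic; the names `mean`, `mu`, and `reported_mean` are chosen here and do not come from the presentation.

```python
# Data set from the Mean slides.
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]


def mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


# Keep the unrounded mean for any further calculation.
mu = mean(data)

# The original data are whole numbers, so the reported mean
# is rounded to one decimal place (one more than the data).
reported_mean = round(mu, 1)

print(f"Sum = {sum(data)}, N = {len(data)}")
print(f"Mean (unrounded) = {mu}")
print(f"Mean (reported)  = {reported_mean}")
```

Rounding happens only in the last step, which matches the general rule of not rounding until the final answer.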
Mode (Central Tendency)
• Measure of central tendency
• The most frequently occurring value in a set of data is the mode
• Symbol is M
• Data Set (as collected): 27 17 12 7 21 44 23 3 36 32 21
• Data Set (ordered): 3 7 12 17 21 21 23 27 32 36 44
• Mode = M = 21

Mode (Central Tendency): Bimodal and Multimodal
• Bimodal Data Set: Two numbers of equal frequency stand out
• Multimodal Data Set: More than two numbers of equal frequency stand out
• Determine the mode of 48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55
  – Mode = 63
• Determine the mode of 48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55
  – Mode = 63 & 59 (Bimodal)
• Determine the mode of 48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55
  – Mode = 63, 59, & 48 (Multimodal)

Median (Central Tendency)
• Measure of central tendency
• The median is the value that occurs in the middle of a set of data that has been arranged in numerical order
• Symbol is $\tilde{x}$, pronounced “x-tilde”
• Data Set (as collected): 27 17 12 7 21 44 23 3 36 32 21
• Data Set (ordered): 3 7 12 17 21 21 23 27 32 36 44

Median (Central Tendency): Odd and Even Data Sets
• A data set that contains an odd number of values always has a single middle value, which is the median
  – Data Set: 3 7 12 17 21 21 23 27 32 36 44
  – Middle of data set: the sixth of the 11 values, so the median is 21
• For a data set that contains an even number of values, the two middle values are averaged, with the result being the median
  – Data Set: 3 7 12 17 21 21 23 27 31 32 36 44
  – The two middle values are 21 and 23, so the median is (21 + 23) / 2 = 22
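The mode and median rules can be checked with a short Python sketch. The helper names `modes` and `median` are assumptions made for this illustration, not part of the presentation.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def median(values):
    """Return the middle value of the ordered data, averaging the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


unimodal = [48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55]
bimodal = [48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55]
multimodal = [48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55]

odd_set = [27, 17, 12, 7, 21, 44, 23, 3, 36, 32, 21]
even_set = [3, 7, 12, 17, 21, 21, 23, 27, 31, 32, 36, 44]

print("Modes:", modes(unimodal), modes(bimodal), modes(multimodal))
print("Median (odd N): ", median(odd_set))
print("Median (even N):", median(even_set))
```

One caveat of this sketch: if every value occurs exactly once, `modes` returns all of them, whereas such a data set is usually described as having no mode.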
Range (Variation)
• Measure of data variation
• The range is the difference between the largest and smallest values that occur in a set of data
• Symbol is R
• Data Set: 3 7 12 17 21 21 23 27 32 36 44
• Range = R = maximum value – minimum value
• R = 44 – 3 = 41

Standard Deviation (Variation)
• Measure of data variation
• The standard deviation is a measure of the spread of data values
  – A larger standard deviation indicates a wider spread in data values

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

  – $\sigma$ = standard deviation
  – $x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
  – $\mu$ = mean
  – $N$ = size of population

Standard Deviation (Variation): Procedure
1. Calculate the mean, $\mu$
2. Subtract the mean from each value and then square each difference
3. Sum all squared differences
4. Divide the summation by the size of the population (number of data values), $N$
5. Calculate the square root of the result

A Note about Rounding in Statistics, Again
• General Rule: Don’t round until the final answer
  – If you are writing intermediate results you may round values, but keep the unrounded number in memory
• Standard Deviation: Round to one more decimal place than the original data

Standard Deviation: Example
Calculate the standard deviation for the data array 2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63

1. Calculate the mean

$\mu = \frac{\sum x_i}{N} = \frac{524}{11} \approx 47.64$

The squared differences below use the unrounded mean, 47.6363...

2. Subtract the mean from each data value and square each difference, $(x_i - \mu)^2$

  (2 − μ)² = 2082.6777      (59 − μ)² = 129.1322
  (5 − μ)² = 1817.8595      (60 − μ)² = 152.8595
  (48 − μ)² = 0.1322        (62 − μ)² = 206.3140
  (49 − μ)² = 1.8595        (63 − μ)² = 236.0413
  (55 − μ)² = 54.2231       (63 − μ)² = 236.0413
  (58 − μ)² = 107.4050

3. Sum all squared differences

$\sum (x_i - \mu)^2$ = 2082.6777 + 1817.8595 + 0.1322 + 1.8595 + 54.2231 + 107.4050 + 129.1322 + 152.8595 + 206.3140 + 236.0413 + 236.0413 = 5,024.5455

Note that this is the sum of the unrounded squared differences.

4. Divide the summation by the number of data values

$\frac{\sum (x_i - \mu)^2}{N} = \frac{5024.5455}{11} = 456.7769$

5. Calculate the square root of the result

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{456.7769} = 21.4$

Histogram (Distribution)
• A histogram is a common data distribution chart that is used to show the frequency with which specific values, or values within ranges, occur in a set of data.
• An engineer might use a histogram to show the variation of a dimension that exists among a group of parts that are intended to be identical.
• [Figure: histogram of Frequency (0 to 5) versus Length (in.), from 0.745 to 0.760]

Histogram (Distribution): Class Intervals
• Large sets of data are often divided into a limited number of groups. These groups are called class intervals.
• The number of data elements in each class interval is shown by the frequency, which is indicated along the Y-axis of the graph.
• The height of each bar in the chart indicates the number of data elements, or frequency of occurrence, within each range.
• [Figure: histogram with class intervals -16 to -6, -5 to 5, and 6 to 16 on the X-axis and Frequency on the Y-axis]

Histogram (Distribution): Example
• Data Set (as collected): 1, 7, 15, 4, 8, 8, 5, 12, 10
• Data Set (ordered): 1, 4, 5, 7, 8, 8, 10, 12, 15

  Class interval   Boundaries          Frequency
  1 to 5           0.5 < x ≤ 5.5       3
  6 to 10          5.5 < x ≤ 10.5      4
  11 to 15         10.5 < x ≤ 15.5     2

• [Figure: histogram of the example data with bars for 1 to 5, 6 to 10, and 11 to 15, drawn between the boundaries 0.5, 5.5, 10.5, and 15.5]

Histogram (Distribution): Length Example
• Example class interval on the length histogram: 0.7495 < x ≤ 0.7505
• [Figure: histogram of Frequency versus Length (in.)]
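Returning to the range and the five-step standard deviation procedure, the arithmetic can be reproduced with the minimal Python sketch below. It assumes the population form of the formula (dividing by N), as the slides do; the function names are chosen for this illustration.

```python
import math


def value_range(values):
    """Range R = maximum value - minimum value."""
    return max(values) - min(values)


def population_std_dev(values):
    """Population standard deviation, following the five-step procedure."""
    n = len(values)
    mu = sum(values) / n                                  # 1. mean (kept unrounded)
    squared_diffs = [(x - mu) ** 2 for x in values]       # 2. squared differences
    total = sum(squared_diffs)                            # 3. sum of squared differences
    variance = total / n                                  # 4. divide by N
    return math.sqrt(variance)                            # 5. square root


range_data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]
std_data = [2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63]

r = value_range(range_data)
sigma = population_std_dev(std_data)

# Whole-number data, so the reported value keeps one decimal place.
reported_sigma = round(sigma, 1)

print(f"R = {r}")
print(f"sigma (unrounded) = {sigma}")
print(f"sigma (reported)  = {reported_sigma}")
```

Python's standard library offers `statistics.pstdev` for the same population calculation; the explicit version is shown only because it mirrors the numbered steps.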
• Length histogram annotations: minimum = 0.745 in., maximum = 0.760 in.

Dot Plot (Distribution)
• Data Set: 0 3 -1 -3 3 2 1 0 -1 -1 2 1 0 1 -1 -2 1 2 1 0 -2 -4 0 0
• [Figure: dot plot of the data set on a number line from -6 to 6]

Normal Distribution (Distribution)
• Bell shaped curve
• [Figure: the dot plot data on a number line from -6 to 6 with a bell-shaped curve; Y-axis: Frequency, X-axis: Data Elements]
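The frequency counts behind a histogram and a dot plot can be tallied with a short Python sketch. The helper name `class_interval_counts` is an assumption for this illustration, and the dot plot values are transcribed from the slide as best they can be read, so treat that list as an assumption too.

```python
from collections import Counter


def class_interval_counts(values, boundaries):
    """Count the values that fall in each interval lower < x <= upper."""
    counts = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        counts.append(sum(1 for x in values if lower < x <= upper))
    return counts


# Histogram example: class intervals 1 to 5, 6 to 10, 11 to 15.
example = [1, 7, 15, 4, 8, 8, 5, 12, 10]
boundaries = [0.5, 5.5, 10.5, 15.5]
histogram_counts = class_interval_counts(example, boundaries)
print("Histogram frequencies:", histogram_counts)

# Dot plot data (assumed transcription of the slide values).
dot_plot_data = [0, 3, -1, -3, 3, 2, 1, 0, -1, -1, 2, 1,
                 0, 1, -1, -2, 1, 2, 1, 0, -2, -4, 0, 0]
dot_counts = Counter(dot_plot_data)

# Text dot plot: one '*' per occurrence along the axis from -6 to 6.
for value in range(-6, 7):
    print(f"{value:>3} | " + "*" * dot_counts[value])
```

Using half-unit boundaries such as 0.5 and 5.5 means no whole-number data value can land exactly on a boundary, so every value belongs to exactly one class interval.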
