Numerically Summarizing Data

3 Numerically Summarizing Data

Outline
3.1 Measures of Central Tendency
3.2 Measures of Dispersion
3.3 Measures of Central Tendency and Dispersion from Grouped Data
3.4 Measures of Position and Outliers
3.5 The Five-Number Summary and Boxplots

Making an Informed Decision
Suppose that you are in the market for a used car. To make an informed decision regarding your purchase, you decide to collect as much information as possible. What information is important in helping you make this decision? See the Decisions project on page 178.

PUTTING IT TOGETHER
When we look at a distribution of data, we should consider three characteristics of the distribution: shape, center, and spread. In the last chapter, we discussed methods for organizing raw data into tables and graphs. These graphs (such as the histogram) allow us to identify the shape of the distribution: symmetric (in particular, bell shaped or uniform), skewed right, or skewed left. The center and spread are numerical summaries of the data. The center of a data set is commonly called the average. There are many ways to describe the average value of a distribution. In addition, there are many ways to measure the spread of a distribution. The most appropriate measure of center and spread depends on the distribution's shape. Once these three characteristics of the distribution are known, we can analyze the data for interesting features, including unusual data values, called outliers.

3.1 Measures of Central Tendency

Preparing for This Section — Before getting started, review the following:
• Population versus sample (Section 1.1, p. 5)
• Parameter versus statistic (Section 1.1, p. 5)
• Qualitative data (Section 1.1, p. 8)
• Quantitative data (Section 1.1, p. 8)
• Simple random sampling (Section 1.3, pp. 21–25)

Objectives
➊ Determine the arithmetic mean of a variable from raw data
➋ Determine the median of a variable from raw data
➌ Explain what it means for a statistic to be resistant
➍ Determine the mode of a variable from raw data

A measure of central tendency numerically describes the average or typical data value. We hear the word average in the news all the time:
• The average miles per gallon of gasoline of the 2015 Chevrolet Corvette Z06 in highway driving is 29.
• According to the U.S. Census Bureau, the national average commute time to work in 2013 was 25.4 minutes.
• According to the U.S. Census Bureau, the average household income in 2013 was $51,939.
• The average American woman is 5′4″ tall and weighs 142 pounds.

In this chapter, we discuss the three most widely used measures of central tendency: the mean, the median, and the mode. In the media (newspapers, blogs, and so on), average usually refers to the mean. But beware: some reporters use average to refer to the median or mode. As we shall see, these three measures of central tendency can give very different results!

CAUTION! Whenever you hear the word average, be aware that the word may not always be referring to the mean. One average could be used to support one position, while another average could be used to support a different position.

➊ Determine the Arithmetic Mean of a Variable from Raw Data

In everyday language, the word average often represents the arithmetic mean. To compute the arithmetic mean of a set of data, the data must be quantitative.
Definitions
The arithmetic mean of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations. The population arithmetic mean, μ (pronounced "mew"), is computed using all the individuals in a population. The population mean is a parameter. The sample arithmetic mean, x̄ (pronounced "x-bar"), is computed using sample data. The sample mean is a statistic.

While other types of means exist (see Problems 39 and 40), the arithmetic mean is generally referred to as the mean. We will follow this practice for the remainder of the text. We usually use Greek letters to represent parameters and Roman letters (such as x̄ or s) to represent statistics. The formulas for computing population and sample means follow:

If x_1, x_2, …, x_N are the N observations of a variable from a population, then the population mean, μ, is

    \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N}    (1)

In Other Words: To find the mean of a set of data, add up all the observations and divide by the number of observations.

If x_1, x_2, …, x_n are n observations of a variable from a sample, then the sample mean, x̄, is

    \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}    (2)

Note that N represents the size of the population, and n represents the size of the sample. The symbol Σ (the Greek letter capital sigma) tells us to add the terms. The subscript i shows that the various values are distinct; it does not serve as a mathematical operation. For example, x_1 is the first data value, x_2 is the second, and so on.

EXAMPLE 1 Computing a Population Mean and a Sample Mean

Problem: The data in Table 1 represent the first exam score of 10 students enrolled in Introductory Statistics. Treat the 10 students as a population.
(a) Compute the population mean.
(b) Find a simple random sample of size n = 4 students.
(c) Compute the sample mean of the sample found in part (b).

Table 1
Student        Score
1. Michelle    82
2. Ryanne      77
3. Bilal       90
4. Pam         71
5. Jennifer    62
6. Dave        68
7. Joel        74
8. Sam         84
9. Justine     94
10. Juan       88

Approach
(a) To compute the population mean, add all the data values (test scores) and divide by the number of individuals in the population.
(b) Recall from Section 1.3 that we can use Table I in Appendix A, a calculator with a random-number generator, or computer software to obtain simple random samples. We will use a TI-84 Plus C graphing calculator.
(c) Find the sample mean by adding the data values corresponding to the individuals in the sample and then dividing by n = 4, the sample size.

Solution
(a) Compute the population mean by adding the scores of all 10 students:

    \sum x_i = x_1 + x_2 + x_3 + \cdots + x_{10} = 82 + 77 + 90 + 71 + 62 + 68 + 74 + 84 + 94 + 88 = 790

Divide this result by 10, the number of students in the class:

    \mu = \frac{\sum x_i}{N} = \frac{790}{10} = 79

Although it was not necessary in this problem, we will agree to round the mean to one more decimal place than that in the raw data.

(b) To find a simple random sample of size n = 4 from a population of size N = 10, we will use the TI-84 Plus C random-number generator with a seed of 14. (Recall that this gives the calculator its starting point to generate the list of random numbers.) Figure 1 shows the students in the sample: Jennifer (62), Juan (88), Ryanne (77), and Dave (68).

(c) Compute the sample mean by adding the scores of the four students:

    \sum x_i = x_1 + x_2 + x_3 + x_4 = 62 + 88 + 77 + 68 = 295

Divide this result by 4, the number of individuals in the sample:
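Formulas (1) and (2) translate directly into a few lines of code. Below is a minimal sketch of Example 1 in Python (the text itself uses a TI-84 Plus C). Note that Python's random-number generator is not the calculator's, so seeding with 14 here will generally not select the same four students shown in Figure 1.

```python
import random

# Table 1: first-exam scores for the population of N = 10 students.
scores = {"Michelle": 82, "Ryanne": 77, "Bilal": 90, "Pam": 71,
          "Jennifer": 62, "Dave": 68, "Joel": 74, "Sam": 84,
          "Justine": 94, "Juan": 88}

# (a) Population mean: add all N values and divide by N (formula (1)).
mu = sum(scores.values()) / len(scores)
print(f"mu = {mu}")  # 79.0

# (b) Simple random sample of size n = 4. Seeding makes the draw
# reproducible, but the students chosen will differ from Figure 1
# because Python's generator differs from the TI-84's.
random.seed(14)
sample = random.sample(list(scores), k=4)

# (c) Sample mean: add the n sampled values and divide by n (formula (2)).
x_bar = sum(scores[name] for name in sample) / len(sample)
print(f"sample = {sample}, x-bar = {round(x_bar, 1)}")
```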
    \bar{x} = \frac{\sum x_i}{n} = \frac{295}{4} = 73.8    (rounded to the nearest tenth)

• Now Work Problem 21

It helps to think of the mean of a data set as the center of gravity. In other words, the mean is the value such that a histogram of the data is perfectly balanced, with equal weight on each side of the mean. Figure 2 shows a histogram of the data in Table 1 with the mean labeled. The histogram balances at μ = 79.

[Figure 2: Scores on First Exam — histogram of frequency versus score, balancing at μ = 79]

➋ Determine the Median of a Variable from Raw Data

A second measure of central tendency is the median. To compute the median of a set of data, the data must be quantitative.

Definition
The median of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.

In Other Words: To help remember the idea behind the median, think of the median of a highway; it divides the highway in half. So the median divides the data in half, with at most half the data below the median and at most half above it.

Steps in Finding the Median of a Data Set
Step 1 Arrange the data in ascending order.
Step 2 Determine the number of observations, n.
Step 3 Determine the observation in the middle of the data set.
• If the number of observations is odd, then the median is the data value exactly in the middle of the data set. That is, the median is the observation that lies in the (n + 1)/2 position.
• If the number of observations is even, then the median is the mean of the two middle observations in the data set. That is, the median is the mean of the observations that lie in the n/2 position and the n/2 + 1 position.

EXAMPLE 2 Determining the Median of a Data Set with an Odd Number of Observations

Problem: The data in Table 2 represent the length (in seconds) of a random sample of songs released in the 1970s.
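This excerpt ends before Table 2's values appear, but the steps above translate directly into code. Below is a minimal Python sketch of the median rule; the song lengths in the first call are made-up stand-ins for Table 2, while the second call reuses the exam scores from Table 1 to show the even-n case.

```python
def median(data):
    """Median (M) of a list of quantitative observations."""
    # Step 1: arrange the data in ascending order.
    ordered = sorted(data)
    # Step 2: determine the number of observations, n.
    n = len(ordered)
    # Step 3: locate the middle of the data set.
    mid = n // 2
    if n % 2 == 1:
        # n odd: the median is the value in position (n + 1)/2.
        return ordered[mid]
    # n even: the median is the mean of the values in
    # positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Odd n (hypothetical song lengths in seconds): the middle value is 206.
print(median([179, 201, 206, 229, 218, 260, 182]))  # 206

# Even n (the Table 1 exam scores): the mean of 77 and 82 is 79.5.
print(median([82, 77, 90, 71, 62, 68, 74, 84, 94, 88]))  # 79.5
```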