Day 11.4 – Box and Whisker Plots and Percentiles

In a previous lesson, we learned that the median divides a set of data into 2 equal parts. Sometimes it is necessary to divide the data into smaller, more precise parts. We can divide data into 4 equal parts called quartiles. The following diagram illustrates how this works:

Minimum | 1st Quartile (median of lower half) | 2nd Quartile (median) | 3rd Quartile (median of upper half) | Maximum

Example: Find the median, first quartile and third quartile of the following data:
{43, 42, 73, 74, 78, 82, 86, 80, 80, 87, 97, 91, 91}

Step 1: Arrange in order, lowest to highest.
Step 2: Find and label the median.
Step 3: Find the median of the lower half. Label this the 1st Quartile.
Step 4: Find the median of the upper half. Label this the 3rd Quartile.

The five number summary of a set of data includes the minimum, the 1st quartile, the median, the 3rd quartile and the maximum. The minimum is the least value and the maximum is the greatest value.

Find the five number summary for the data below:
125, 140, 80, 135, 126, 140, 350

The five number summary is min _____, 1st Q ______, median ______, 3rd Q ______, max _______

Percentiles are another way to divide data into even more precise groups. You have all seen percentiles when your guidance counselor talked to you about your performance on standardized tests. Percentiles separate data sets into 100 equal parts. The percentile rank of a score tells us the percent of values in the set that are less than or equal to that score.

The median describes the ____th percentile.
The first quartile describes the ____th percentile.
The third quartile describes the ____th percentile.

Example: Of 25 test scores, eight are less than or equal to 75. What is the percentile rank of a test score of 75?
Write a ratio of the number of scores less than or equal to 75 compared to the total number of test scores.

Find the percentile rank of the following using the 20-item data set.
63, 64, 65, 70, 74, 75, 80, 80, 83, 84, 85, 87, 88, 89, 90, 91, 97, 98, 99, 100

Percentile rank of 70: How many data points are less than or equal to 70? Put this number over the total number of data points and change to a percent.
Percentile rank of 80: How many data points are less than or equal to 80? Put this number over the total number of data points and change to a percent.
Percentile rank of 90: How many data points are less than or equal to 90? Put this number over the total number of data points and change to a percent.

1. 16 students received the following scores on a math quiz.
60, 62, 62, 65, 70, 74, 74, 76, 78, 82, 85, 94, 96, 98, 98, 99
Find the median __________
Find the first quartile __________
Find the third quartile _____________
What is the percentile rank of 94? In fraction form ______ in decimal form ______ in percent form ____
What is the percentile rank of 98? In fraction form ______ in decimal form ______ in percent form ____
What is the percentile rank of 74? In fraction form ______ in decimal form ______ in percent form ____

A Box and Whisker Plot is a graph that describes data using the five number summary of a set. This plot is useful for comparing two or more data sets. The Box and Whisker Plot shows how the data for each set are distributed and what the extreme values are.

Terms:
median: the middle piece of data (50%)
first or lower quartile: middle of lower half of data (25%)
third or upper quartile: middle of upper half of data (75%)
interquartile range: difference between the third and first quartiles

DATA MUST BE PUT INTO NUMERICAL ORDER BEFORE A BOX AND WHISKER PLOT CAN BE CONSTRUCTED

Steps in constructing a Box and Whisker Plot
1. Arrange the values in numerical order.
2. Draw a number line to include the lowest and highest values in the data set.
3. Calculate the median, the lower quartile and the upper quartile.
4. Place dots above the number line to mark the 5 values: lowest value, lower quartile, median, upper quartile, and highest value.
5. Draw a box with the vertical ends passing through the lower and upper quartiles.
6. Draw a vertical line in the box through the median.
7. Connect the two extreme values to the box with lines called whiskers.

[Diagram: box and whisker plot labeled lowest value, lower quartile, median, upper quartile, highest value]

Example 1: Data: 2, 2, 5, 5, 5, 6, 8, 5, 8, 9, 10, 12, 7, 8, 11
1. Arrange the data in order:
2. Draw a number line to include 2 to 12.
3. Determine the median = ____, lower quartile = ____, upper quartile = ____
4. Mark the 5 points.
5. Draw the box and whiskers.
6. What is the interquartile range?

Example 2: Make a box and whisker plot from the following data.

number of shoes    frequency
4                  3
5                  4
6                  1
7                  1
8                  3
9                  1
10                 2
11                 1
15                 1

Low score ________
High score ________
Median ___________
Lower quartile __________
Upper quartile ___________

Example 3: The following box and whisker plots are student test scores from 2 different math sections.

[Box and whisker plots for Section B and Section A on a number line from 0 to 100, marked in tens]

1. In which section was the student who had the highest test score? ________
2. In which section was the student who had the lowest score? _________
3. Which section had the higher median score? _________
4. Which section do you think had the higher mean test score? Explain.
5. Which section did better on the test? Explain.

Example 4: A movie theater recorded the number of tickets sold daily for a popular movie during the month of June. The box-and-whisker plot shown below represents the data for the number of tickets sold, in hundreds. Which conclusion can be made using this plot?
(1) The second quartile is 600.
(2) The mean of the attendance is 400.
(3) The range of the attendance is 300 to 600.
(4) Twenty-five percent of the attendance is between 300 and 400.

Day 11.4 Homework

1. Make a box and whisker plot given the data on the number of points scored by the Jones High School football team.
12, 20, 19, 15, 24, 26, 27, 23, 21, 18, 14, 29
low score: ________
high score: ________
median: ________
lower quartile: ________
upper quartile: ________

2. Rent costs are compared in this box and whisker plot. What is the median cost? ________
[Box and whisker plot on a number line marked 300, 500, 700, 900, 1100]

3. Which value is at the lower quartile in this box and whisker plot? ______________
[Box and whisker plot on a number line from 0 to 70, marked in tens]

4. Graph a box and whisker plot using the following information:
smallest data = 12
largest data = 20
lower quartile = 15
upper quartile = 19
median = 18

5. The box and whisker plot shows the weights of 20 students. What percent of the students weighed between 110 and 140?
a) 25% b) 50% c) 75% d) 100%
[Box and whisker plot on a number line marked 110, 120, 130, 140, 150]
What does the heaviest person weigh? ______________

6. The box-and-whisker plots below show the normal precipitation (in inches) each month in Dallas and in Houston, Texas.
[Box and whisker plots for Dallas and Houston on a number line from 1 to 6]
For how many months is Houston’s precipitation less than 3.5 inches? _____________
Compare the precipitation of Dallas with that of Houston. ______________________
___________________________________________________________________

7. The accompanying box-and-whisker plots can be used to compare the annual incomes of three professions. Based on the box-and-whisker plots, which statement is true?
(1) The median income for nuclear engineers is greater than the income of all musicians.
(2) The median income for police officers and musicians is the same.
(3) All nuclear engineers earn more than all police officers.
(4) A musician will eventually earn more than a police officer.

8. The box-and-whisker plot below represents students' scores on a recent English test. What is the value of the upper quartile?
(1) 68 (2) 76 (3) 84 (4) 94
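
The quartile and percentile rank procedures from this lesson can also be checked with a short program. What follows is a minimal sketch in Python, not part of the original worksheet. It assumes the convention that the median itself is left out of both halves when the number of values is odd (the lesson does not say which convention it uses), and the function names are made up for illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, 1st quartile, median, 3rd quartile, maximum).

    Assumption: with an odd number of values, the middle value is
    excluded from both the lower half and the upper half.
    Needs at least two values.
    """
    values = sorted(data)                 # Step 1: arrange lowest to highest
    n = len(values)
    half = n // 2
    lower = values[:half]                 # lower half of the ordered data
    upper = values[half + 1:] if n % 2 else values[half:]  # upper half
    return (values[0], median(lower), median(values), median(upper), values[-1])


def interquartile_range(data):
    """Difference between the third and first quartiles."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1


def percentile_rank(data, score):
    """Percent of values in the set that are less than or equal to score."""
    at_or_below = sum(1 for value in data if value <= score)
    return 100 * at_or_below / len(data)


# First example data set from the lesson (13 values).
example = [43, 42, 73, 74, 78, 82, 86, 80, 80, 87, 97, 91, 91]
print(five_number_summary(example))   # (42, 73.5, 80, 89.0, 97)
print(interquartile_range(example))   # 15.5

# 20-item data set from the percentile rank section.
scores = [63, 64, 65, 70, 74, 75, 80, 80, 83, 84,
          85, 87, 88, 89, 90, 91, 97, 98, 99, 100]
print(percentile_rank(scores, 70))    # 20.0
```

Other textbooks and software may use a different quartile convention, so answers for data sets with an odd number of values can differ slightly from this sketch.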