STAT1010 – Picturing Data 1
STAT1010 – picturing data

3.2 Visualizing Distributions of Data
• A frequency table provides information on the distribution of data.
  ◦ When we discuss the distribution of a variable, we are referring to its possible values and to which of those values occur more (or less) frequently than the others.

Political affiliation   Frequency
Democrat                517         (occurred a lot)
Republican              371
Independent             112         (occurred less frequently)

The entries in the "Political affiliation" column are the possible values.

The distribution of the data
• The distribution of data is the way the data values are spread over all possible values.
  ◦ What values occur frequently?
  ◦ If the variable is numeric, what is the maximum value? What is the minimum value?
  ◦ What is the “shape” of the distribution?
[Figure: histogram "Weight of Contents of Cans of Cola"; horizontal axis: Weight (grams), 330 to 390; vertical axis: Frequency, 0 to 15.]

Graphical displays of distributions
• As the phrase goes, “a picture is worth 1000 words”, and distributions are often better conveyed using graphics rather than tables.

Political affiliation   Frequency
Democrat                517
Republican              371
Independent             112

[Figure: bar graph "Political affiliation in a 1000 person survey"; horizontal axis: political affiliation (Democrat, Republican, Independent); vertical axis: frequency of affiliation, 0 to 600.]

Pioneer in Statistical Graphics
• Florence Nightingale
  ◦ See video clip from “Joy of Statistics”

Bar graph
• Used to represent frequencies (or relative frequencies) for qualitative or categorical variables.
[Figure: the bar graph "Political affiliation in a 1000 person survey".]

Bar graph - labels
• Always provide useful labels.
[Figure: the same bar graph, annotated with its main title, tick marks, vertical axis label, horizontal axis label, and categories.]

Bar graph - formatting
Some things to remember:
• Space between bars (specifically when this is a categorical variable plot).
• Some white space at top.
• Uniform (arbitrary) bar widths.
[Figure: the bar graph "Political affiliation in a 1000 person survey", annotated with the points above.]

Bar graph – Pareto chart
• A bar graph in which the bars are arranged in frequency order is called a Pareto chart.
[Figure: bars in the order Democrat, Republican, Independent (descending order): a Pareto chart.]
[Figure: bars in the order Democrat, Independent, Republican: not a Pareto chart.]
[Figure: two further examples, one a Pareto chart (and also a bar chart), the other not a Pareto chart (but it is a bar chart).]

Example: Deflategate
• During the 2014 National Football League (NFL) season, there was a scandal called ‘Deflategate’.
• The Patriots were accused of underinflating their game footballs, which would allow for fewer fumbles (an unfair advantage).
• Did it look like the Patriots had fewer fumbles? If so, how many fewer? We will look at the data as the number of plays per fumble (a higher value means fewer fumbles).

Example: Deflategate (offensive plays)
[Figure: bar graph with one bar per team (the categories); bar height is the frequency of fumbles, presented as plays per fumble.]
Source: http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible

Dot plot – similar to a bar graph
• If there are only a small number of observations (or counts), a dot plot can be used.
• One dot per observation.
• Sometimes seen as a quick and easy plot in the engineering field.

Pie Charts
• Also used to plot qualitative variables.
A pie chart is a circle divided so that each wedge represents the relative frequency of a particular category.

Political affiliation   Frequency   Relative frequency
Democrat                517         0.517
Republican              371         0.371
Independent             112         0.112

[Figure: pie chart "Political affiliation in a 1000 person survey" with wedges Democrat 51.7%, Republican 37.1%, Independent 11.2%.]

Pie Charts
• As I may have mentioned earlier, research shows that our brains do not interpret pie charts very well.
• Consider other options first before presenting a pie chart.
[Figure: two charts compared, annotated "Our brains comprehend this one better than this one."]

Histograms
• A histogram is like a bar graph, but it shows a distribution for a quantitative variable.
• The bars have a natural order (thus, the classes must be quantitative in nature) and the bar widths have specific meaning.
• The bars in a histogram touch each other because there are no gaps between the categories.

Histogram
[Figure: generic histogram. Horizontal axis (Measurement): quantitative values grouped into bins. Vertical axis (Frequency): how ‘often’ a value falls into a given bin.]

Histogram Example
• 24 cola cans were sampled and weighed.
• A frequency table and histogram were created:

Class range of values   Frequency
[340,350)               1
[350,360)               11
[360,370)               8
[370,380)               4

[Figure: histogram "Weight of Contents of Cans of Cola"; horizontal axis: Weight (grams), 330 to 390; vertical axis: Frequency, 0 to 15.]

Histogram Example
• Axes and labels are still important.
• Some white space at top.
• No space between bars (specifically when this is a quantitative variable plot).
• Rearranging these bars (as we did in a Pareto chart for qualitative data) would not make sense here. The classes are in order from smallest to largest.
[Figure: the same histogram, annotated with the points above.]

Histogram Example
• Same data, more classes (narrower bins): the histogram looks a bit different.

Class range of values   Frequency
[345,350)               1
[350,355)               6
[355,360)               5
[360,365)               1
[365,370)               7
[370,375)               3
[375,380)               1

[Figure: histogram "Weight of Contents of Cans of Cola" with the narrower bins; horizontal axis: Weight (grams), 330 to 390; vertical axis: Frequency, 0 to 10.]

Example: Deflategate (all plays)
[Figure: chart of a numeric variable (plays per fumble) showing the number of teams falling into each bin; the Patriots and their 187 plays/fumble are marked.]
NOTE: This author should have the bars touching each other for a correct histogram presentation. Don’t put space between bars in a histogram.
Source: http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible

Displaying Quantitative Data
• Histogram
  ◦ Provides a picture or shape of the distribution of the data.
  ◦ Collects values into bins.
  ◦ Bins should be of equal width and they should touch each other.
  ◦ Different bin choices can yield different pictures.
  ◦ Can show frequencies or relative frequencies.

Stem-and-leaf plots
• We can’t see individual data points in a histogram due to the binning and the use of the bars for frequencies.
• A stem-and-leaf plot is similar to a histogram, but individual data points are identified.
• As with dot plots, this type of plot probably makes the most sense when the number of observations is relatively small.

Stem-and-leaf plots
• One leaf is associated with one data point.
• Example data: 5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6
Here, a ‘leaf’ is the value one place to the right of the decimal place.

Stem-and-leaf example
Recall the 80 observations on compressive strengths:

105  97 245 163 207 134 218 199 160 196
221 154 228 131 180 178 157 151 175 201
183 153 174 154 190  76 101 142 149 200
186 174 199 115 193 167 171 163  87 176
121 120 181 160 194 184 165 145 160 150
181 168 158 208 133 135 172 171 237 170
180 167 176 158 156 229 158 148 150 118
143 141 110 133 123 146 169 158 135 149

Stem-and-leaf example
• 80 observations
• Min: 76, Max: 245
• Here, a ‘leaf’ represents the “ones place”.
• Looks somewhat like a histogram turned on its side, but we can identify individual data points.
• Gives you a feel for the distribution of the data.

The decimal point is 1 digit(s) to the right of the |

 7 | 6
 8 | 7
 9 | 7
10 | 15
11 | 058
12 | 013
13 | 133455
14 | 12356899
15 | 001344678888
16 | 0003357789
17 | 0112445668
18 | 0011346
19 | 034699
20 | 0178
21 | 8
22 | 189
23 | 7
24 | 5

Line charts
• Also used to represent a quantitative variable.
• Created by connecting the ‘center dots’ at the top of the bars of a histogram.

Line chart example
[Figure: line chart; a histogram is also shown, but it is not part of the line chart.]

Time-Series Graph
• If a histogram or line chart has a horizontal axis of time, then it is a time-series graph.
• Time-series plots show how things change over time.
• Often used with financial market information or housing data.

Time-Series Graph – example
• A line chart with a horizontal axis of time (Year) is a time-series graph.

Time-Series Graph – example
[Figure: homes sold in Iowa City by zip code and month; horizontal axis: Year (data by the month).]
1) What is the general trend over the years 2006-2011?
2) What is the general trend within each year?
3) What is the width of the underlying bin?

Time-Series Graph – example
[Figure: number of Olympic medals; horizontal axis: Year.]
1) What is the width of the underlying bin?
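
Supplementary sketch (not part of the original slides): the stem-and-leaf display of the 80 compressive-strength observations can be rebuilt with a few lines of Python. The function name stem_and_leaf is a hypothetical helper chosen for illustration. It assumes whole-number data, with the stem taken as the value divided by 10 and the leaf as the ones digit, matching the “ones place” convention used in the example.

```python
from collections import defaultdict

# The 80 compressive-strength observations listed in the example.
strengths = [
    105, 97, 245, 163, 207, 134, 218, 199, 160, 196,
    221, 154, 228, 131, 180, 178, 157, 151, 175, 201,
    183, 153, 174, 154, 190, 76, 101, 142, 149, 200,
    186, 174, 199, 115, 193, 167, 171, 163, 87, 176,
    121, 120, 181, 160, 194, 184, 165, 145, 160, 150,
    181, 168, 158, 208, 133, 135, 172, 171, 237, 170,
    180, 167, 176, 158, 156, 229, 158, 148, 150, 118,
    143, 141, 110, 133, 123, 146, 169, 158, 135, 149,
]


def stem_and_leaf(values):
    """Map each stem (value // 10) to a string of its sorted leaves (value % 10).

    Hypothetical helper for illustration; assumes non-negative integers.
    Stems with no observations are omitted rather than shown as empty rows.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    return {stem: "".join(str(d) for d in digits)
            for stem, digits in sorted(leaves.items())}


plot = stem_and_leaf(strengths)
for stem, row in plot.items():
    print(f"{stem:>2} | {row}")
```

Each printed row holds one leaf per observation, so the row lengths play the role of the bar heights in a histogram with bins of width 10.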