
STAT1010 – picturing data

3.2 Visualizing Distributions of data

• A frequency table provides information on the distribution of data.
  ◦ When we discuss the distribution of a variable, we are referring to its possible values and to which of those values occur more (or less) frequently than the others.

Political affiliation   Frequency
Democrat                517         (occurred a lot)
Republican              371
Independent             112         (occurred less frequently)

(The affiliations in the first column are the possible values.)

The distribution of the data

• The distribution of data is the way the data values are spread over all possible values.
  ◦ What values occur frequently?
  ◦ If the variable is numeric, what is the maximum value? What is the minimum value?
  ◦ What is the “shape” of the distribution?

[Figure: histogram titled “Weight of Contents of Cans of Cola”. Horizontal axis: Weight (grams), 330 to 390. Vertical axis: Frequency, 0 to 15.]

Graphical displays of distributions

• As the phrase goes… “a picture is worth 1000 words”, and distributions are often better conveyed using graphics rather than tables.

Political affiliation   Frequency
Democrat                517
Republican              371
Independent             112

[Figure: bar graph titled “Political affiliation in a 1000 person survey”. Horizontal axis: political affiliation (Democrat, Republican, Independent). Vertical axis: frequency of affiliation, 0 to 600.]


Pioneer in statistical graphics

• Florence Nightingale
  ◦ See video clip from “Joy of Statistics”

Bar graph

• Used to represent frequencies (or relative frequencies) for qualitative or categorical variables.

[Figure: bar graph titled “Political affiliation in a 1000 person survey”. Horizontal axis: political affiliation (Democrat, Republican, Independent). Vertical axis: frequency of affiliation, 0 to 600.]
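As a rough illustration of how a bar graph encodes frequencies, the sketch below draws a text-only version of the survey bar graph. The function name `text_bar_graph` and the scale of one `#` per 25 respondents are assumptions chosen for this example and are not part of the slides.

```python
# Frequency table from the slide (1000-person survey).
frequencies = {"Democrat": 517, "Republican": 371, "Independent": 112}

def text_bar_graph(freqs, per_symbol=25):
    """Return one line per category; each '#' stands for `per_symbol` respondents."""
    width = max(len(name) for name in freqs)  # pad names so the bars line up
    lines = []
    for name, count in freqs.items():
        bar = "#" * round(count / per_symbol)  # bar length is proportional to frequency
        lines.append(f"{name:<{width}} | {bar} {count}")
    return lines

for line in text_bar_graph(frequencies):
    print(line)
```

Each category gets its own bar, and only the bar length carries information, which is why the bar widths in a bar graph can be uniform and arbitrary.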

Bar graph - labels

• Always provide useful labels.

[Figure: the bar graph “Political affiliation in a 1000 person survey”, with these parts annotated:]
  ◦ Main title
  ◦ Tick marks
  ◦ Vertical axis label (frequency of affiliation)
  ◦ Horizontal axis label (political affiliation)
  ◦ Categories (Democrat, Republican, Independent)


Bar graph - formatting

• Some things to remember…

[Figure: the bar graph “Political affiliation in a 1000 person survey”, with these features annotated:]
  ◦ Space between bars (specifically when this is a categorical variable plot)
  ◦ Some white space at top
  ◦ Uniform (arbitrary) bar widths

Bar graph – Pareto chart

• A bar graph in which the bars are arranged in frequency order is called a Pareto chart.

[Figure: the bar graph “Political affiliation in a 1000 person survey” with bars ordered Democrat, Republican, Independent. Annotation: a Pareto chart (descending order).]
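The ordering rule can be sketched in a few lines of Python. This is a minimal illustration that assumes "frequency order" means descending frequency, as in the figure; the helper names `pareto_order` and `is_pareto` are made up for this example.

```python
# Frequency table from the slide (1000-person survey).
frequencies = {"Democrat": 517, "Republican": 371, "Independent": 112}

def pareto_order(freqs):
    """Return (category, frequency) pairs sorted from most to least frequent."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)

def is_pareto(categories, freqs):
    """True if the categories, in the order given, have non-increasing frequencies."""
    counts = [freqs[c] for c in categories]
    return all(a >= b for a, b in zip(counts, counts[1:]))

print(pareto_order(frequencies))
print(is_pareto(["Democrat", "Republican", "Independent"], frequencies))  # frequency order
print(is_pareto(["Democrat", "Independent", "Republican"], frequencies))  # alphabetical order
```

Reordering the bars is only meaningful because the categories have no natural order of their own.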

Bar graph – Pareto chart

[Figure: the bar graph “Political affiliation in a 1000 person survey” with bars ordered Democrat, Independent, Republican. Annotation: not a Pareto chart, because the bars are not in frequency order.]


Bar graph – Pareto chart

A Pareto chart (and also a bar chart)

Not a Pareto chart (but it is a bar chart)


Example: Deflategate

• During the 2014 National Football League (NFL) season, there was a scandal called ‘Deflategate’.

• The Patriots were accused of underinflating their game footballs, which would allow for fewer fumbles (an unfair advantage).

• Did it look like the Patriots had fewer fumbles? If so, how many fewer? We will look at the data as the number of plays per fumble (a higher value means fewer fumbles).

Example: Deflategate (offensive plays)

[Figure: bar graph of offensive plays per fumble by team. Bar height: frequency of fumbles, presented as plays per fumble. Horizontal axis: categories (i.e. teams).]

Source: http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible


Dot plot – similar to a bar graph

• If there are only a small number of observations (or counts), a dot plot can be used.
• One dot per observation.
• Sometimes seen as a quick and easy plot in the engineering field.

Pie Charts

• Also used to plot qualitative variables.
• A pie chart is a circle divided so that each wedge represents the relative frequency of a particular category.

Political affiliation   Frequency   Relative frequency
Democrat                517         0.517
Republican              371         0.371
Independent             112         0.112

[Figure: pie chart titled “Political affiliation in a 1000 person survey”. Wedges: Democrat 51.7%, Republican 37.1%, Independent 11.2%.]
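The relative frequencies in the table, and the size of each wedge, can be checked with a short calculation. This sketch assumes each wedge angle is 360 degrees times the relative frequency, which is the usual construction but is not spelled out on the slide; the function name `pie_wedges` is hypothetical.

```python
# Frequency table from the slide (1000-person survey).
frequencies = {"Democrat": 517, "Republican": 371, "Independent": 112}

def pie_wedges(freqs):
    """Map each category to (relative frequency, wedge angle in degrees)."""
    total = sum(freqs.values())
    wedges = {}
    for name, count in freqs.items():
        rel = count / total              # relative frequency = count / total
        wedges[name] = (rel, 360 * rel)  # assumed: angle proportional to relative frequency
    return wedges

for name, (rel, angle) in pie_wedges(frequencies).items():
    print(f"{name:<12} {rel:.3f}  {rel:.1%}  {angle:.1f} degrees")
```

The relative frequencies sum to 1, so the wedge angles sum to the full 360 degrees of the circle.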

Pie Charts

• As I may have mentioned earlier, research has shown that our brains do not interpret pie charts very well.
• Consider other options first before presenting a pie chart.

[Figures: two displays shown side by side, with the annotation “Our brains comprehend this one better than this one.”]


Histograms

• A histogram is like a bar graph, but it shows a distribution for a quantitative variable.

• The bars have a natural order (thus, the classes must be quantitative in nature) and the bar widths have specific meaning.

• The bars in a histogram touch each other because there are no gaps between the categories.

Histogram

[Figure: generic histogram. Horizontal axis: Measurement (quantitative values grouped into bins). Vertical axis: Frequency (how ‘often’ a value falls into a given bin).]

Histogram Example

• 24 cola cans were sampled and weighed.
• A frequency table and histogram were created:

Class range of values   Frequency
[340,350)               1
[350,360)               11
[360,370)               8
[370,380)               4

[Figure: histogram titled “Weight of Contents of Cans of Cola”. Horizontal axis: Weight (grams), 330 to 390. Vertical axis: Frequency, 0 to 15.]


Histogram Example

[Figure: the histogram “Weight of Contents of Cans of Cola” (Weight in grams on the horizontal axis, Frequency on the vertical axis), with these features annotated:]
  ◦ Axes and labels are still important.
  ◦ Some white space at top.
  ◦ No space between bars (specifically when this is a quantitative variable plot).

Rearranging these bars (as we did in a Pareto chart for qualitative data) would not make sense here. The classes are in order from smallest to largest.

Histogram Example

• Same data, more classes (narrower bins)… histogram looks a bit different.

Class range of values   Frequency
[345,350)               1
[350,355)               6
[355,360)               5
[360,365)               1
[365,370)               7
[370,375)               3
[375,380)               1

[Figure: histogram titled “Weight of Contents of Cans of Cola” with 5-gram bins. Horizontal axis: Weight (grams), 330 to 390. Vertical axis: Frequency, 0 to 10.]
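To see that the two cola histograms really describe the same 24 cans, the narrow-bin table can be re-aggregated into the 10-gram classes used earlier. The sketch below is one way to do that; `merge_bins` is a hypothetical helper, and it assumes every class is a half-open interval [lower, upper), as in the tables.

```python
# Narrow-bin frequency table from the slide: (lower, upper) -> frequency,
# where each class is the half-open interval [lower, upper).
narrow = {(345, 350): 1, (350, 355): 6, (355, 360): 5, (360, 365): 1,
          (365, 370): 7, (370, 375): 3, (375, 380): 1}

def merge_bins(table, start, width):
    """Re-aggregate a frequency table into wider bins [start + k*width, start + (k+1)*width)."""
    merged = {}
    for (lower, upper), count in table.items():
        k = (lower - start) // width              # index of the wide bin containing `lower`
        new_lower = start + k * width
        new_bin = (new_lower, new_lower + width)
        if upper > new_bin[1]:
            raise ValueError(f"class [{lower},{upper}) straddles two wide bins")
        merged[new_bin] = merged.get(new_bin, 0) + count
    return merged

wide = merge_bins(narrow, start=340, width=10)
for (lower, upper), count in sorted(wide.items()):
    print(f"[{lower},{upper})  {count}")
```

The merged counts reproduce the earlier table (1, 11, 8, 4), so the difference between the two pictures comes only from the choice of bin width.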

Example: Deflategate (all plays)

[Figure: histogram of plays per fumble (all plays) across teams. Horizontal axis: plays per fumble (numeric variable). Vertical axis: number of teams falling into each bin. The Patriots and their 187 plays/fumble are marked.]

NOTE: The author of this graph should have drawn the bars touching each other for a correct histogram presentation. Don’t put space between bars in a histogram.

Source: http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible


Displaying Quantitative Data

• Histogram
  ◦ Provides a picture or shape of the distribution of the data.
  ◦ Collects values into bins.
  ◦ Bins should be of equal width and they should touch each other.
  ◦ Different bin choices can yield different pictures.
  ◦ Can show frequencies or relative frequencies.

Stem-and-leaf plots

• We can’t see individual data points in a histogram due to the binning and the use of the bars for frequencies.
• A stem-and-leaf plot is similar to a histogram, but individual data points are identified.
• As with dot plots, this type of plot probably makes the most sense when the number of observations is relatively small.

Stem-and-leaf plots

• One leaf is associated with one data point.

• Example data: 5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6

Here, a ‘leaf’ is the digit one place to the right of the decimal point.


Stem-and-leaf example

• Recall the 80 observations on compressive strengths:

105  97 245 163 207 134 218 199 160 196
221 154 228 131 180 178 157 151 175 201
183 153 174 154 190  76 101 142 149 200
186 174 199 115 193 167 171 163  87 176
121 120 181 160 194 184 165 145 160 150
181 168 158 208 133 135 172 171 237 170
180 167 176 158 156 229 158 148 150 118
143 141 110 133 123 146 169 158 135 149

Stem-and-leaf example

• 80 observations
• Min: 76, Max: 245
• Here, a ‘leaf’ represents the “ones place”.
• Looks somewhat like a histogram turned on its side, but we can identify individual data points.
• Gives you a feel for the distribution of the data.

   7 | 6
   8 | 7
   9 | 7
  10 | 15
  11 | 058
  12 | 013
  13 | 133455
  14 | 12356899
  15 | 001344678888
  16 | 0003357789
  17 | 0112445668
  18 | 0011346
  19 | 034699
  20 | 0178
  21 | 8
  22 | 189
  23 | 7
  24 | 5

The decimal point is 1 digit(s) to the right of the |
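The display for the compressive-strength data can be rebuilt from the 80 observations with a few lines of Python. This is a minimal sketch rather than the software that produced the slide; the function name `stem_and_leaf` is an assumption, and it takes the stem to be the value with its ones digit dropped and the leaf to be the ones digit.

```python
# The 80 compressive-strength observations from the slide.
strengths = [
    105, 97, 245, 163, 207, 134, 218, 199, 160, 196,
    221, 154, 228, 131, 180, 178, 157, 151, 175, 201,
    183, 153, 174, 154, 190, 76, 101, 142, 149, 200,
    186, 174, 199, 115, 193, 167, 171, 163, 87, 176,
    121, 120, 181, 160, 194, 184, 165, 145, 160, 150,
    181, 168, 158, 208, 133, 135, 172, 171, 237, 170,
    180, 167, 176, 158, 156, 229, 158, 148, 150, 118,
    143, 141, 110, 133, 123, 146, 169, 158, 135, 149,
]

def stem_and_leaf(values):
    """Return display lines with stem = value // 10 and leaf = ones digit."""
    stems = {}
    for v in sorted(values):                  # sorting puts the leaves in increasing order
        stem, leaf = divmod(v, 10)
        stems.setdefault(stem, []).append(str(leaf))
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # include stems that have no leaves
        lines.append(f"{stem:>2} | {''.join(stems.get(stem, []))}")
    return lines

for line in stem_and_leaf(strengths):
    print(line)
```

The number of leaves on each stem is exactly the frequency a histogram with bins of width 10 would show, which is why the display looks like a histogram turned on its side.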


Line charts

• Also used to represent a quantitative variable.

• Created by connecting the ‘center dots’, the points at the top center of each bar of a histogram.
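As a small worked example, the points that a line chart would connect can be computed from the 10-gram cola frequency table used in the histogram example. The sketch assumes the ‘center dot’ of a bar sits at the midpoint of its class, at a height equal to its frequency; `line_chart_points` is a hypothetical name.

```python
# 10-gram frequency table for the cola cans: (lower, upper) -> frequency.
cola_table = {(340, 350): 1, (350, 360): 11, (360, 370): 8, (370, 380): 4}

def line_chart_points(table):
    """Return (bin midpoint, frequency) pairs in order of increasing midpoint."""
    return [((lower + upper) / 2, count)
            for (lower, upper), count in sorted(table.items())]

points = line_chart_points(cola_table)
print(points)  # consecutive points would be joined by straight line segments
```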


Line chart example

[Figure: a line chart. A histogram is also shown here, but it is not part of the line chart.]

Time-Series Graph

• If a histogram or line chart has a horizontal axis of time, then it is a time-series graph.
• Time-series graphs show how things change over time.
• Often used with financial market information or housing data.


Time-Series Graph – example

• A line chart with a horizontal axis of time (Year) is a time-series graph.

Time-Series Graph – example

[Figure: time-series graph titled “Homes sold in Iowa City by zip code and month”. Horizontal axis: Year (data by the month).]

1) What is the general trend over the years 2006-2011?

2) What is the general trend within each year?

3) What is the width of the underlying bin?

Time-Series Graph – example

[Figure: time-series graph of the number of Olympic medals. Horizontal axis: Year.]

1) What is the width of the underlying bin?