Visualisation of Quantitative Information
James Neill, 2011

Overview
1. Visualisation
2. Approaching data
3. Levels of measurement
4. Principles of graphing
5. Univariate graphs
6. Graphical integrity

Is Pivot a turning point for web exploration? (Gary Flake, TED talk, 6 min.)

Approaching data
1. Entering & screening data
2. Exploring, describing, & graphing data
3. Hypothesis testing

Describing & graphing data
THE CHALLENGE: to find a meaningful, accurate way to depict the 'true story' of the data.
Clearly report the data's main features.

Levels of measurement
• Nominal / Categorical
• Ordinal
• Interval
• Ratio

Discrete vs. continuous
Discrete (separate values):   - - - - - - - - - -
Continuous (unbroken range):  ___________

Each level has the properties of the preceding levels, plus something more!

Categorical / nominal
• Conveys a category label
• (Arbitrary) assignment of numbers to categories, e.g. gender
• No useful information, except as labels

Ordinal / ranked scale
• Conveys order, but not distance, e.g. 1st, 2nd, 3rd, etc. in a race, or a ranking of favourites or preferences

Ordinal / ranked example: Ranked importance
Rank the following aspects of the university according to what is most important to you (1 = most important through to 5 = least important):
__ Quality of the teaching and education
__ Quality of the social life
__ Quality of the campus
__ Quality of the administration
__ Quality of the university's reputation

Interval scale
• Conveys order & distance
• 0 is arbitrary, e.g. temperature (degrees C)
• Usually treated as continuous when there are more than 5 intervals

Interval example: 8-point Likert scale

Ratio scale
• Conveys order & distance
• Continuous, with a meaningful 0 point, e.g. height, age, weight, time, number of times an event has occurred
• Ratio statements can be made, e.g. X is twice as old (or high or heavy) as Y

Ratio scale: Time

Why do levels of measurement matter?
Different analytical procedures are used for different levels of data.
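The claim that each level supports different procedures can be illustrated with a short Python sketch. It assumes the common textbook convention that the mode suits nominal data, the median suits ordinal data, and the mean needs at least interval data, and that "X is twice Y" statements need a ratio scale. The function names `describe` and `times_as_large` are hypothetical and not part of any package.

```python
from statistics import mean, median, median_low, mode

# Levels ordered from lowest to highest; each level keeps the properties of
# the levels before it, plus something more (assumed convention, see lead-in).
LEVELS = ("nominal", "ordinal", "interval", "ratio")


def describe(values, level):
    """Return the central-tendency statistics that are meaningful at `level`."""
    rank = LEVELS.index(level)  # raises ValueError for an unknown level
    summary = {"mode": mode(values)}  # nominal and above: most frequent value
    if rank == 1:
        # Ordinal: order but no distance, so report an observed middle value
        # (median_low never averages two ranks).
        summary["median"] = median_low(values)
    elif rank >= 2:
        # Interval and ratio: distances are meaningful, so averaging is allowed.
        summary["median"] = median(values)
        summary["mean"] = mean(values)
    return summary


def times_as_large(a, b, level):
    """Ratio statements ('X is twice as old as Y') need a meaningful zero."""
    if level != "ratio":
        raise ValueError(f"ratio statements are not meaningful at the {level} level")
    return a / b
```

For example, `describe([1, 2, 2, 3, 5], "ordinal")` reports a mode and a median but no mean, because the distances between ranks are unknown. Degrees Celsius are interval rather than ratio because 0 °C is arbitrary, so 20 °C is not "twice as hot" as 10 °C (on the kelvin scale the ratio is about 1.035), which is why `times_as_large` refuses non-ratio data.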
More powerful statistics can be applied to higher levels.

Principles of graphing (Edward Tufte)
• Visualise data
• Reveal data
  – Describe
  – Explore
  – Tabulate
  – Decorate
• Communicate complex ideas with clarity, precision, and efficiency

Tufte's graphing guidelines
• Show the data
• Avoid distortion
• Focus on substance rather than method
• Present many numbers in a small space
• Make large data sets coherent
• Maximise the information-to-ink ratio
• Encourage the eye to make comparisons
• Reveal data at several levels/layers
• Closely integrate with statistical and verbal descriptions

Software for data visualisation (graphing)
1. Statistical packages, e.g., SPSS
2. Spreadsheet packages, e.g., MS Excel
3. Word-processors, e.g., MS Word (Insert – Object – Microsoft Graph Chart)

Graphing steps
1. Identify the purpose of the graph
2. Select which type of graph to use
3. Draw a graph
4. Modify the graph to be clear, non-distorting, and well-labelled
5. Disseminate the graph (e.g., include it in a report)

Univariate graphs
• Bar graph
• Pie chart
• Data plot
• Error bar
• Stem & leaf plot
• Box plot (Box & whisker)
• Histogram

Bar chart (Bar graph)
• Examine comparative heights of bars
• X-axis: Collapse if too many categories
• Y-axis: Count or % or mean?
• Consider whether to use data labels

Pie chart
• Use a bar chart instead
• Hard to read
  – Does not show small differences
  – Rotation / position influences perception

[Figures: pie chart and two bar charts of Count by AREA (Psychology, Sociology, Information Technology, Biology, Anthropology), one bar chart with a y-axis starting at 0 and one starting at 9]

Data plot & error bar
[Figures: a data plot and an error bar chart]

Stem & leaf plot
• Alternative to histogram
• Use for ordinal, interval and ratio data
• May look confusing to unfamiliar reader
• Contains actual data
• Collapses tails

    Frequency    Stem &  Leaf

         7.00        1 .  &
       192.00        1 .  22223333333
       541.00        1 .  444444444444444455555555555555
       610.00        1 .  6666666666666677777777777777777777
       849.00        1 .  88888888888888888888888888899999999999999999999
       614.00        2 .  0000000000000000111111111111111111
       602.00        2 .  222222222222222233333333333333333
       447.00        2 .  4444444444444455555555555
       291.00        2 .  66666666677777777
       240.00        2 .  88888889999999
       167.00        3 .  000001111
       146.00        3 .  22223333
       153.00        3 .  44445555
       118.00        3 .  666777
        99.00        3 .  888999
       106.00        4 .  000111
        54.00        4 .  222
       339.00 Extremes    (>=43)

Box plot (Box & whisker)
• Useful for interval and ratio data
• Represents min., max., median, quartiles, & outliers
• Alternative to histogram
• Useful for screening
• Useful for comparing variables
• Can get messy (too much info)
• Confusing to unfamiliar reader

Histogram
• For continuous data
• X-axis: needs a happy medium for the number of categories (bins)
• Y-axis matters (can exaggerate)

[Figures: box plots of Participant Age (overall and by Participant Gender: Missing, Male, Female) and of Time Management-T1 and Self-Confidence-T1, with outlying cases labelled; histograms of Participant Age (Mean = 24.0, Std. Dev = 9.16, N = 5575) drawn with different bin widths and y-axis ranges]

Non-normal distributions
[Figures: histograms of male & female heights; weight (Mean = 69.6, Std. Dev = 17.10, N = 20); daily calorie intake; fertility]

Example 'normal' distribution
[Figures: histogram of Femininity-Masculinity scores (Mean = 81.21, Std. Dev. = 18.228, N = 188); bar chart of Count by Femininity-Masculinity category (Very feminine, Fairly feminine, Androgynous, Fairly masculine, Very masculine)]

Effects of skew on measures of central tendency
[Figures: bar charts of Count by Femininity-Masculinity category, overall and for Gender: male]

Line graph
• Alternative to histogram
• Implies continuity, e.g., time
• Can show multiple lines
[Figure: line graph of the Mean (5.0 to 8.0) of OVERALL SCALES at T0, T1, T2 and T3]

Summary: Graphs & levels of measurement (N = Nominal, O = Ordinal, I = Interval, R = Ratio)

    Graph                    Levels
    Bar chart & pie chart    N O I
    Histogram                I R
    Stem & leaf              I R
    Data plot & box plot     I R
    Error-bar                I R
    Line graph               I R

Graphical integrity (part of academic integrity)
"Graphs can be like a bikini. What they reveal is suggestive, but what they conceal is vital." (adapted from Aaron Levenstein)

"Like good writing, good graphical displays of data communicate ideas with clarity, precision, and efficiency. Like poor writing, bad graphical displays distort or obscure the data, make it harder to understand or compare, or otherwise thwart the communicative effect which the graph should convey." Michael Friendly – Gallery of Data Visualisation

Cleveland's hierarchy: Best to worst
1. Position along a common scale
2. Position along identical, non-aligned scales
3. Length
4. Angle – slope
5. Area
6. Volume
7. Colour hue – colour saturation – density

Tufte's graphical integrity
• Some lapses intentional, some not
• Lie Factor: $\text{Lie Factor} = \frac{\text{size of effect shown in graph}}{\text{size of effect in data}}$
• Misleading uses of area
• Misleading uses of perspective
• Leaving out important context
• Lack of taste and aesthetics

Review questions

Can you complete this table?
    Level                   Properties   Examples   Descriptive statistics   Graphs
    Nominal / Categorical
    Ordinal / Rank
    Interval
    Ratio

Answers: http://wilderdom.com/research/Summary_Levels_Measurement.html

1. If a survey question produces a 'floor effect', where will the mean, median and mode lie in relation to one another?
2. Over the last century, the performance of the best baseball hitters has declined. Does this imply that the overall performance of baseball batters has decreased?

Links
• Presenting Data – Statistics Glossary v1.1: http://www.cas.lancs.ac.uk/glossary_v1.1/presdata.html
• A Periodic Table of Visualisation Methods: http://www.visual-literacy.org/periodic_table/periodic_table.html
• Gallery of Data Visualization: http://www.math.yorku.ca/SCS/Gallery/
• Univariate Data Analysis – The Best & Worst of Statistical Graphs: http://www.csulb.edu/~msaintg/ppa696/696uni.htm
• Pitfalls of Data Analysis: http://www.vims.edu/~david/pitfalls/pitfalls.htm
• Statistics for the Life Sciences: http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/Handouts.html

References
1. Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth.
2. Jones, G. E. (2006). How to lie with charts. Santa Monica, CA: LaPuerta.
3. Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.