Displaying Your Data
Basics and Beyond: Displaying Your Data
Mario Davidson, PhD
Vanderbilt University School of Medicine
Department of Biostatistics
Instructor

Objectives
1. Understand the types of data and levels of measurement.
2. Understand how a Table 1 typically looks.
3. Be able to interpret all of the basic graphs.
4. Know the types of displays that may be used depending on the type of data and level of measurement.
5. Be introduced to less familiar displays of the data.

Types of Data (Obj1)
●Qualitative Data
  ● Consist of attributes, labels, or non-numerical entries.
  ● If you can't perform mathematical operations or order data, it's qualitative.
  ● Ex: Colors in a box of crayons; names; county
●Quantitative Data
  ● Consist of numerical measurements or counts.
  ● Ordering is a dead giveaway.
  ● Ex: BMI; age; numerical grade

Levels of Measurement (Obj1)
●Nominal
  ● Qualitative
  ● Categorized using names, qualities, or labels
  ● Ex: Top 5 movies, jersey numbers, type of drug
●Ordinal
  ● Quantitative or Qualitative
  ● Can order
  ● Differences between data are not meaningful.
  ● Ex: Letter grade, Likert scale such as very dissatisfied to very satisfied
●Interval Level of Measurement
  ● Quantitative
  ● Can order
  ● Can calculate meaningful differences
  ● No value that means "nothing/none." A zero entry merely represents a position on a scale (i.e. no inherent zero).
  ● Ex: Time of day, temperature
●Ratio Level of Measurement
  ● Quantitative
  ● Can order
  ● Can calculate meaningful differences
  ● There's a value that means "nothing/none."
  ● Ex: Age, weight, test score

Popular Displays

Description of Table 1 (Obj2)
● Typically summarizes baseline characteristics of the data.
● Compares statistics between groups.
● May provide means, medians, confidence intervals, percentiles, percentages, p-values, standard deviations, etc.
● Summaries of all types of data (e.g. continuous, categorical, nominal, ordinal, interval, ratio) may be used.

Likert scale: Scale indicating degree of agreement (e.g.
Rate the following statement: I have had a difficult time focusing on my studies this semester: SD, D, N, A, SA, from strongly disagree to strongly agree)

Example of a Table 1 (Obj2)

Test Your Knowledge
Interpret the following graphs.
● Cherry or Apple Pies sold the most in January. "Other" pies sold the least.
● Nearly 15 subjects chose Saturday as their favorite day. Sunday was the least chosen.

Pie Charts (Obj3)
Features (Obj4)
– Nominal or Ordinal
– Compares Levels of One Characteristic
Advantages
– Easily Interpreted
  • Larger Area; Greater Proportion
– Easy to Create
Disadvantages
– Difficult to Judge Areas
– Wastes Ink

Bar Plots (Obj3)
Features (Obj4)
– Nominal and Ordinal
– Compares
Advantages
– Same as Pie Chart
Disadvantages
– Similar to Pie Chart
– No such thing as an Analyte 2.5
– Ordering can Change Perception

Test Your Knowledge
Interpret the following graphs.
● The most frequent BMI seems to be approximately 24-26.
● There were 8 subjects weighing approximately 0 grams. There was only one weighing 10 grams.

Histograms (Obj3)
Features (Obj4)
– Shows Distribution
– Continuous
– One Characteristic
Advantages
– Easy to Interpret
– Easy to Produce
Disadvantages
– Size of Bins can Change Perception
– Cannot Read Exact Values

Dot Plot (Obj3)
Features (Obj4)
– One Characteristic
– Ordinal
Advantages
– Good for Small and Moderate Data
– Easily Interpreted
Disadvantages
– May not be Best Option with Large Data
– Not Produced in all Packages

Stem and Leaf Plot (Obj3)
Features (Obj4)
– One Characteristic
– Ordinal
Advantages
– Useful with Small Data and May be Used with Large Data
– Can be Produced by Hand
– Easily Interpreted
– Useful with Numeric Data
Disadvantages
– May be Difficult to Measure Center
– Not Appealing
The most frequent USMLE1 scores in our data were in the 220's, 230's, and 260's. The highest and lowest scores were 190 and 278 respectively.

Test Your Knowledge
● Why is this graph difficult to interpret?
● What is the trend?
● What is the trend?
● An outlier is data that is a numerical distance from the rest. Can you find one?
Answers
● There is no y-label. R is a statistical software package.
● Seems to be a slight positive trend: as age increases so does POMS.
● From Jan-Dec, there is an upward trend.
● The arrows suggest 2 possible outliers.

Line Graph (Obj3)
Features (Obj4)
– One Characteristic
– Used with Ordinal and Continuous
– Displays Associations, Trends, and Range
Advantages
– Produced in Most Packages

Line Graph with Rugplot

Scatterplot (Obj3)
Features (Obj4)
– Continuous and Ordinal
– Shows Associations
– Shows Trend
Advantages
– Shows all of Data
– Produced in Most Packages (not the Line)
– Exact Values Shown
– Easily Interpreted
Disadvantage
– May not be Best Way for Large Data

Less Familiar Graphs

Boxplot (Obj3 and Obj5)
Features (Obj4)
– Continuous by Nominal or Ordinal
– May Compare Groups
Advantages
– Good Summary: Min, 1Q, 2Q (median), 3Q, Max
Disadvantages
– Does not Display All the Data
– Not as Appealing
– Cannot be Created in All Packages
– May not be as Recognized by Some

Boxplot
● The median tooth length for orange juice at 1 dose of Vitamin C was roughly 25 units.
● The first quartile length for 1 dose of ascorbic acid was approx. 15.
● As Vitamin C doses increase, tooth length increases.
● Overall, it appears that those using orange juice had greater length given the same dose, possibly excluding a Vitamin C dose of two.
● There was an outlier for the ascorbic acid at dose 1.

Boxplot Overlaid with Stripchart (Obj5)
Features
– Same as Boxplot
Advantages
– Same as Boxplot
– Can See All of the Data
Disadvantage
– Many Programs Cannot Create

Dot Chart (Obj5)
Features (Obj4)
– Nominal, Ordinal Characteristics with a Continuous Outcome
– Can Compare Levels and Groups
Advantages
– Easily Interpreted
– Size of Data Irrelevant
Disadvantage
– Not as Recognized as Bar Graphs and Pie Charts

Kaplan Meier Curve (Obj5)
Demonstrates the probability of survival.
The plot suggests that males have a more favorable rate of survival over the years.
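To make "probability of survival" concrete, here is a minimal sketch of the product-limit (Kaplan-Meier) estimate that such a curve plots. It is written in Python with only the standard library. The function name `kaplan_meier` and the follow-up data are hypothetical and do not come from the slides; statistical packages provide tested implementations that should be preferred in practice.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    at_risk = len(times)          # everyone is at risk at the start
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        d = deaths.get(t, 0)
        if d:
            # multiply by the fraction of at-risk subjects who survive time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # subjects whose follow-up ends at t (event or censored) leave the risk set
        at_risk -= sum(1 for x in times if x == t)
    return curve

# Hypothetical follow-up times in years; an event value of 0 marks a censored subject.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"year {t}: S(t) = {s:.3f}")
```

The number of subjects still at risk at each time is the quantity reported as "Number at Risk" beneath many published curves. Censored subjects reduce it without forcing the curve to step down.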
Can be created in most programs.
Figure annotation: Number at Risk

Probably Even Less Familiar Graphs

Spaghetti Plot (Obj5)
●Alzheimer's Disease
●Verbal IQ
  – Words that could not be sounded out (e.g. Depot)

Spaghetti Plot
Features (Obj4)
– Continuous, Longitudinal
– Two Characteristics
– Shows Trend
Advantages
– Shows all of the Data
Disadvantages
– Not Available in All Packages
– May be Difficult to Interpret
The overall trend suggests that as age increases so do earnings.
Figure axes: Age (yrs) by Earnings (thsd of dollars)

Dendrogram: Cluster (Obj5)
● Useful for Determining Clustering
● May Help to Remove Variables (Data Reduction)
● PGY clustered Clinical Year

Scatter Plot with Marginal Histograms (Obj5)
● Continuous
● Visually appealing
● Shows trends, associations, and the distributions of the data
● Cannot be created in many programs
● Large Data Sets

Sunflower Plot (Obj5)
● Large data sets
● The more ink used, the denser the data
● Ordinal
More fresh embryos were transferred to the uterus on day 3.

Heat Map (Obj5)
●Encephalitis
●Red
  ● Proportion of Presence
●Green
  ● Proportion of Absence
●White
  ● Missing
●Light/Dark
  ● Intensity of Presence of Attribute

Heat Map
● Similar to the Hexagon Plot
● Lightness or Darkness Indicates Intensity
● May not be Created in Some Programs

Nomogram (Obj5)
● May Provide Risk, Probability, etc.
● Useful in Providing Predictive Scores
● Sum the "Points" for each category, find the "Total Points," then look at the corresponding "Risk of Death."
● A 40-year-old male with a cholesterol of 200 and a BP of 170 has approximately a 48% Risk of Death.

Multidimensional Plot (Obj5)
http://data.vanderbilt.edu/rapache/bbplot/

Conclusion
● Always try to think of the best way to display your story (data).
● Consider your target audience.
● When publishing, color may cost.

References
Hamid, et al. BMC Infectious Diseases 2010, 10:364. http://www.biomedcentral.com/1471-2334/10/364
Grober, E, Hall, CB, Lipton, RB, Zonderman, AB, Resnick, SM, and Kawas, C (2009).
Memory impairment, executive dysfunction, and intellectual decline in preclinical Alzheimer's disease. Journal of the International Neuropsychological Society, 14(2), 266-278.
http://data.vanderbilt.edu/rapache/bbplot/
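Supplementary sketch: the Stem and Leaf Plot slide notes that the display can be produced by hand. The procedure is short enough to write out in Python using only the standard library, as a rough illustration. The function name `stem_and_leaf` and the scores are hypothetical; they are not the USMLE1 data shown in the slides. The sketch assumes non-negative integers, with the tens as stems and the last digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display for non-negative integers.

    The stem is the value with its last digit removed (value // 10);
    the leaf is the last digit (value % 10).
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Walk every stem between the smallest and largest so that gaps stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical exam scores, for illustration only.
scores = [190, 204, 221, 225, 228, 233, 236, 247, 262, 265, 278]
print("\n".join(stem_and_leaf(scores)))
```

Rows with many leaves mark the most frequent score ranges, and the first and last rows give the lowest and highest values. This is how the USMLE1 display in the slides is read.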