Data Visualization Short Course

3 April 2017
Jim Wisnowski
[email protected]
(210) 218-1384

MCOTEA Example

Air Force Example
▪ Air Force Magazine, Feb 2017: trends for women as a percent of the force
http://www.airforcemag.com/MagazineArchive/Magazine%20Documents/2017/February%202017/0217infographic.pdf

Callan Chart of Sector Performance (Quilt Chart)
https://www.callan.com/wp-content/uploads/2017/01/Callan-PeriodicTbl_KeyInd_2017.pdf

One Last Warm-Up
▪ Stephen Few is a guru in the data visualization world
▪ Let’s take his quiz on best practices at www.perceptualedge.com
▪ Goal is to get every one wrong: 0/10 is success!

Objectives
▪ Appreciate the historical perspective of data visualization
▪ Know the value data visualization offers to analytics and Big Data
▪ Understand what makes a good graphical display and some of the common mistakes to avoid in graphical design
▪ Be familiar with some methodologies for the data visualization process
▪ Appreciate how to do data viz with a few common software packages

Data Visualization is Not New
▪ Scottish political economist William Playfair recognized in 1786 the superiority of graphs over tabular presentations; he published 43 time series plots and one bar chart
▪ Developed the first pie chart in 1801 to show the distribution of the Turkish Empire over Europe, Africa, and Asia
▪ Stephen Few states we really didn’t progress much from these original ideas until the late 1970s, with Princeton’s John Tukey and his Exploratory Data Analysis (EDA)
▪ He argues most are unaware of modern methods

Data Visualization is Not New
▪ Area chart using color was masterful
▪ Playfair credited with the introduction of bar charts

Exploratory Data Analysis
▪ John Tukey, Princeton, 1977
▪ Too much emphasis on hypothesis tests as confirmatory analysis; focus should also be on discovery
▪ Objectives
– Suggest hypotheses about observed data
– Assess statistical test assumptions
– Support selection of appropriate methods and tools
– Serve as basis for further data collections and experiments
▪ If we need a short suggestion of EDA, I would suggest that
– It is an attitude; a flexibility; and requires graph paper and transparencies
“The greatest value of a picture is when it forces us to notice what we never expected to see.” (John Tukey)

Data Visualization Fuel
▪ Most important aspect of data visualization is the data itself
▪ Value goes beyond the enterprise/transactional data itself
– Unstructured data, social networks, Internet of Things
▪ Data quality is key, and dataviz can help improve that!
▪ Phil Simon rates organizations on visualization framework
– Data (big or small)
– Visualization (static or interactive)
▪ Start small and scale
“If we have data, let’s look at the data. If all we have are opinions, let’s go with mine.” (Jim Barksdale, Netscape)

Data is Growing
▪ Big Data is an overused term, but we know there is GOLD in those data mountains
▪ 15 TB of Twitter data daily is a lot of data generated; how much gold do we have?
▪ We are exposed to more information in a day than someone from the 15th century was over a lifetime
▪ 90% of today’s data was created in the last 2 years (IBM); 2.5 quintillion bytes per day
▪ In 2015 the number of networked devices is double the entire global population
▪ Of interest: Tera, Peta, Exa, Zetta, Yotta, Brontobytes
Graphic from IBM Research India, presented at Text Mining Workshop, Jan 2014

Data Visualization Needs Credible Data!
“Do not trust any statistics you did not fake yourself.” (Churchill)
“Figures don't lie, but liars do figure.” (Twain)

Traits of Meaningful Data
▪ High Volume
▪ Historical
▪ Consistent
▪ Multivariate
▪ Atomic
▪ Clean
▪ Clear
▪ Dimensionally Structured
▪ Richly Segmented
▪ Of Known Pedigree
Data Map and Contour Plots are “best practices”
Reference: Now You See It by Stephen Few

Data Visualization Definition
▪ Data is the new business capital.
▪ Data visualization: discovery of solutions that offer highly interactive and graphical user interfaces, are built on in-memory architectures, and are geared toward addressing business users’ unmet ease-of-use and rapid deployment needs. These solutions typically enable users to explore data without much training, making them accessible by a wider range of employees than traditional business analysis tools. (SAS)
▪ Key to making “analytics” approachable is visualization
– Visual thinking is an essential skill for all
– Both an art and science => craft (Berinato, Harvard Business Review)
▪ Data is a great but messy story; visual analytics is the master filmmaker to bring the story to life (SAS)
▪ Not a great term…was Shakespeare a word sequencer?
▪ A picture is worth a thousand data points

Data Visualization
▪ Characteristics (Card et al., Information Visualization)
– Computer supported
– Interactive
– Visual representations of location, length, size, color, shape to allow us to see trends
– Abstract data with no physical form (unlike physical data, e.g. the human body)
▪ Amplify cognition by assisting memory, representing data in ways our brain can easily comprehend
▪ 3 facts: pervasiveness has raised quality expectations, Big Data is here, and the democratization of data
▪ 90% of data analyses required by most organizations are possible with simple data visualization methods
– Excel is getting better
– Boss wants to know why graphs in meetings are not nearly as pretty as the ones she sees on her fitness tracker (Berinato)
“Everyone in our business knows they need to visualize data, but it’s easy to do poorly. We invest in it. We want to use it right while they use it wrong.” (Daryl Morey)

Interactive Data Visualization with Excel
▪ Consider recent data on automobile fuel economy from the EPA for model year 2017 vehicles
▪ Attributes such as make, model, mpg, class, cylinders, transmission, valve timing, etc.
▪ Downloaded from http://www.fueleconomy.gov/feg/download.shtml
▪ Quick exploration with Excel Pivot Tables, Tableau, and JMP

Data Visualization
▪ Allows viewing of vast quantities of data quickly and efficiently
▪ Provides better insight into the business problem through discovery
▪ Generates a call to action
▪ Performs better if interactive and not static, for quick stratification, drill down, and filtering
▪ Relies less on the IT department and empowers workers once they have access to the data with intuitive tools
www.introtopolicyinformatics.wikispaces.asu.edu

Democratization of Data Viz
▪ Data visualization methods should give employees who are not data analysts or scientists the ability to quickly and easily explore data
▪ Domain and business expertise critical to data understanding
▪ More rapidly find trends, generate hypotheses, identify inconsistencies, and determine additional data support requirements
▪ Reduce IT and analyst staff burden; everyone should be numerate
▪ Tension growing in non-data-driven organizations
▪ Need to shorten the “kill chain” from the time data is collected until it is presented as an actionable solution to decision makers
– Find, Fix, Track, Target, Engage, Assess (F2T2EA)
▪ Goal: Self-Service Approachable Analytics

Interactive Data Visualization For All
▪ Flight misery map
Source: Sviokla, Harvard Business Review

Police Department: Interactive Criminal Activity
http://www.raidsonline.com/?address=San%20Antonio%20TX

San Francisco Police Department with JMP
▪ Data is a sample file in JMP
▪ Use Graph Builder to plot each crime by color
▪ Add street map
▪ Add filter on station
▪ Create HTML with data file

San Francisco PD with JMP
▪ A bit more interactive is the Distribution platform
▪ Where is there a disproportionate amount of drug activity?
▪ What days of the week correlate with runaways?
▪ What are some safe precincts?

Democratization of Data Analytics
▪ Data visualization is no longer just static charts created by IT professionals for meetings
▪ Even this graphic is outdated. Many are creating graphs continuously
Source: TDWI Research, 2013

The Human Side of Data Visualization
▪ Huge advances in the past 25 years in data collection, storage, and access; we have ignored the primary tool to make information meaningful: the human brain
▪ We acquire more information from vision than from all other senses combined
▪ 20 billion neurons in the brain are used to form patterns from visual information
▪ The eye and visual cortex of the brain form a massively parallel processor that provides the highest-bandwidth channel into human cognitive centers (Colin Ware, UNH)
▪ We seek patterns
▪ Strive for Interocular Traumatic Impact

The Human Side of Data Visualization
▪ We have selective visual attention; we are drawn to familiar patterns, and our working memory is limited
▪ Jacques Bertin’s Sémiologie Graphique (1967) describes the basic vocabulary of vision for abstract data
– Pre-attentive attributes form the core of good data visualization methods
– Pre-attentive means without prior conscious awareness: the things that “pop out” most
▪ We can only “remember” at most chunks of 3 visualizations, and even then for only a short period
– So don’t make comparisons difficult, such as placing them on the next chart or further down the page. Side-by-side is best.

Pre-Attentive Attributes
▪ Shape
▪ Length
▪ Hue/Contrast
▪ Size
▪ Position
▪ Color
▪ Enclosure
▪ Symmetry
▪ Grouping

Xan’s Pre-attentive Processing Quiz

Pre-attentive Processing

Graphic Attributes: Quantitative Scales
▪ Attributes arranged on a scale from Better to Worse
– Position, Length, Slope, Area, Color Hue
– Position (unaligned), Angle, Color Density
Based on “Graphical Perception: Theory, Experimentation, and Application …” by William Cleveland and Robert McGill, JASA, Sept. 1984

The Human Side of Data Visualization
▪ Color is a key pre-attentive attribute
▪ 5% Females and 9% Males are color blind
– Red-Green is most
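
The pivot-table exploration mentioned on the "Interactive Data Visualization with Excel" slide can be illustrated with a short sketch. The deck itself uses Excel, Tableau, and JMP; the Python below is only an assumption-laden stand-in that shows what a pivot table computes (group by two attributes, aggregate a third, optionally filter). The rows are made up for illustration and are not EPA figures, and the field names (`vehicle_class`, `cylinders`, `combined_mpg`) and the helper `pivot_mean` are hypothetical, not taken from the EPA download.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; these are NOT real EPA figures.
# Field names are hypothetical and may not match the EPA file's columns.
VEHICLES = [
    {"make": "Make A", "vehicle_class": "Compact", "cylinders": 4, "combined_mpg": 32},
    {"make": "Make A", "vehicle_class": "SUV", "cylinders": 6, "combined_mpg": 22},
    {"make": "Make B", "vehicle_class": "Compact", "cylinders": 4, "combined_mpg": 30},
    {"make": "Make B", "vehicle_class": "SUV", "cylinders": 8, "combined_mpg": 18},
    {"make": "Make C", "vehicle_class": "Compact", "cylinders": 4, "combined_mpg": 34},
    {"make": "Make C", "vehicle_class": "SUV", "cylinders": 6, "combined_mpg": 24},
]


def pivot_mean(rows, row_key, col_key, value_key, keep=lambda row: True):
    """Average value_key for each (row_key, col_key) pair, like a pivot table.

    The keep predicate plays the role of a pivot-table filter.
    """
    cells = defaultdict(list)
    for row in rows:
        if keep(row):
            cells[(row[row_key], row[col_key])].append(row[value_key])
    return {cell: mean(values) for cell, values in cells.items()}


# Stratify: average mpg by vehicle class and cylinder count.
table = pivot_mean(VEHICLES, "vehicle_class", "cylinders", "combined_mpg")
for (vehicle_class, cylinders), avg in sorted(table.items()):
    print(f"{vehicle_class:8s} {cylinders} cyl: {avg:.1f} mpg")

# Drill down: the same pivot restricted to a single make.
make_a = pivot_mean(
    VEHICLES,
    "vehicle_class",
    "cylinders",
    "combined_mpg",
    keep=lambda row: row["make"] == "Make A",
)
```

Interactive tools do the same stratify, drill-down, and filter steps with drag-and-drop instead of code, which is the point the deck makes about self-service exploration.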