
Data Visualisation Outline

• What is data visualisation • Different types of visualisation • Types of data and ways to encode it • Technologies for visualisation • An intro to D3 Learning Objectives

• By the end of this presentation, you will be able to – Describe • What data visualisations can be used for • The difference between exploratory and explanatory visualisations • How different types of data can be encoded visually • Understand the difference between a visualisation library such as D3 and visualisation tools available on the Web • Understand some important concepts about D3 WHAT IS A DATA VISUALISATION? A Basic Definition (1)

• “The Visual Display of Quantitative Information” • The title of the book offers a good definition • The book itself is a classic on the subject • http://www.edwardtufte.com/tufte/books_vdqi A Basic Definition (2)

• Or: – “to communicate information clearly and efficiently via statistical graphics, plots and information graphics” – https://en.wikipedia.org/wiki/Data_visualization Detroit - Data

• Consider the difference between the display of the same data in the following two slides • Which of these best conveys the story behind the data? • Why is this? Detroit - Data

Available from: http://portal.datadrivendetroit.org/datasets/5efe44008cad4ea98377e7129c98c69c_0 Detroit: Visualising Occupancy

https://www.motorcitymapping.org/#t=parcels&s=detroit&f=all&x=preset2 Data Visualisation in the 19th Century

• John Snow investigated cases of cholera in London in 1854 • He observed clusters of cases around Broad Street • Having investigated further, he ascertained that all the cholera cases were among people who drank water from the Broad Street pump • He had the pump handle removed, and the incidence of cholera subsequently fell • The visualisation of the cholera cases is on the next slide. Instances of Cholera in London, 1854 Types of Data Visualisation

• Generally split into two categories – Exploratory – Explanatory Exploratory Data Visualisation

• Main purpose is to ‘explore’ the data • Allow you to look at data from different angles • Result: – Get a sense of what the data is telling you – Find the interesting nuggets of the data Explanatory Data Visualisation

• Main purpose is to ‘explain’ your findings • Focus on communicating the key nuggets to others • A focus on – Context – Story – Elimination of clutter Explanatory Data Visualisation (1)

http://wheredoesmymoneygo.org/bubbletree-map.html#/~/total Exploratory Data Visualisation

– Visualisations such as these allow you to change the filters to ‘explore’ the data from an angle that is interesting to you

http://www.oecdbetterlifeindex.org/ Data for Visualisation

• The first step to making a good visualisation is understanding what type of data you have. • Context – what are you trying to display? • Depending on the data, different styles of visualisation may be suitable. Data Types

• Continuous – Data that can take any value within a range, so it is measured rather than counted and cannot be determined exactly – E.g., amount of rain in the UK this summer • Discrete – Data with a finite, identifiable number of possible values, typically integers – E.g., number of cars owned • Nominal – Categories of data with no order – E.g., favourite football team Data Encodings

• Encoding is mapping data to a visual object • Different data require different encodings • Think about these three types of data, before looking at the next slide.

• Discrete – Data with a finite, identifiable number – E.g., number of cars owned
• Continuous – Values which cannot be exactly determined – E.g., amount of rain in the UK this summer
• Nominal – Categories of data with no order – E.g., favourite football team

Retinal Variables (1)

SIZE ORIENTATION / ROTATION COLOUR

• These are useful for displaying ordered data – Both discrete and continuous data types would work here – E.g. larger objects to represent owning more cars – E.g. darker blue to identify areas on a map where there has been more rainfall Retinal Variables (2)
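The size and colour encodings for ordered data described above can be sketched in a few lines of plain JavaScript. This is only an illustration: the function names, the base radius, the 0 to 400 mm rainfall range and the particular shade of blue are all assumptions made for the example, not part of the original slides.

```javascript
// Linear scale: maps a value in [d0, d1] onto a number in [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Size: make the circle's AREA proportional to the count,
// so the radius grows with the square root of the count.
// radiusForOne (pixels for a count of 1) is an assumed constant.
function radiusForCount(count, radiusForOne = 10) {
  return radiusForOne * Math.sqrt(count);
}

// Colour: more rainfall gives a darker blue (lower HSL lightness).
// The 0-400 mm domain is an assumed range for illustration.
const lightness = linearScale([0, 400], [90, 30]);

function blueForRainfall(mm) {
  return `hsl(210, 80%, ${Math.round(lightness(mm))}%)`;
}

console.log(radiusForCount(1), radiusForCount(4));
console.log(blueForRainfall(0), blueForRainfall(400));
```

Scaling the radius by the square root keeps the visible area, rather than the diameter, in proportion to the data, which avoids exaggerating large values.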

COLOUR (Hue)

• These are useful for displaying nominal data – E.g. red to represent Manchester United supporters, blue to represent Chelsea Video

• Hans Rosling – 200 Countries, 200 Years, 4 Minutes - The Joy of Stats - BBC Four – https://youtu.be/jbkSRLYSojo • Note down any data encodings you see – For each variable • List what type of data it is • And what visual encoding is used (e.g. size, position, colour…) How Did You Do?

Variable             Type                        Encoding
Life Expectancy      Quantitative (discrete)     Position (y)
Income               Quantitative (continuous)   Position (x)
Total Population     Quantitative (discrete)     Size
Geographical Region  Nominal                     Colour
Time                 Ordinal                     Animation

Visual Encodings
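As a rough sketch of how the encodings listed above turn one record into the visual attributes of a bubble, consider the plain JavaScript below. Everything specific in it is an assumption for illustration: the record is made up (not real data), the axis ranges and colour palette are hypothetical, and the use of a logarithmic scale for income is a design choice made here. Time (animation) is left out.

```javascript
// One made-up record, for illustration only (not real data).
const country = {
  name: "Exampleland",
  income: 4000,
  lifeExpectancy: 65,
  population: 9e6,
  region: "Asia",
};

const width = 800;
const height = 400;

// Hypothetical axis ranges and colour palette.
const incomeRange = [400, 40000]; // x axis, log scale
const lifeRange = [25, 85];       // y axis, linear scale
const regionColours = { Africa: "blue", Americas: "yellow", Asia: "red", Europe: "orange" };

function encode(d) {
  const [i0, i1] = incomeRange;
  const [l0, l1] = lifeRange;
  return {
    // Position (x): income, on a logarithmic scale.
    cx: (Math.log10(d.income / i0) / Math.log10(i1 / i0)) * width,
    // Position (y): life expectancy; SVG y grows downwards, so flip it.
    cy: height - ((d.lifeExpectancy - l0) / (l1 - l0)) * height,
    // Size: bubble area proportional to population.
    r: Math.sqrt(d.population) / 100,
    // Colour hue: nominal region.
    fill: regionColours[d.region],
  };
}

console.log(encode(country));
```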

• In 1985, Cleveland and McGill published ‘Graphical Perception and Graphical Methods for Analyzing Scientific Data’ in the journal Science • They proposed basic guidelines for choosing an appropriate graphic form • The paper lists and ranks 10 ‘elementary perceptual tasks’ Cleveland and McGill

• A ranking of graphical properties by how accurately people can decode the information they carry • These are particularly relevant for detecting differences and making comparisons – Position along a common scale – Position on identical but nonaligned scales – Length – Angle, slope – Area – Volume, density, colour saturation – Colour hue • For accurate comparisons, graphical forms from the top of this list should be used. Cleveland and McGill - Example

It is easier for humans to perceive the difference between the values for X=0.2 and X=0.4 using position along a common scale (on the scatter plot) than by using area (in the bubble chart).

[Figure: the same values plotted twice, as a scatter plot and as a bubble chart; x-axis from 0 to 0.8, y-axis from 0 to 3.5]

Chartjunk

• If something can be removed from a chart without changing its meaning, it’s ‘chartjunk’. • Why would one want to combat chartjunk? – Obscures true meaning and story. – Imagery is not information. – Imagery draws attention away from the data • Short-term memory resources are used to identify the images rather than understand the chart.

• Term coined by Edward Tufte in the 1980s Chartjunk Example

[Figure: chart of number of customers (axis from 0 to 6000) for Year 1 to Year 5, with three chart elements each annotated ‘Necessary?’]

The Beauty Paradox

• How complex should a graphic be? – How much should you show? • Tufte emphasises minimalism – Communicate as much information from as few pixels as possible • Depending on the audience, this may or may not be the best approach • When creating a visualisation, you must balance the need to focus on the story of the data, with art and design decisions to help your audience engage with and remember the visual. TECHNOLOGIES BEHIND VISUALISATIONS ON THE WEB Web Technologies

JavaScript: Behaviour
CSS: Presentation
HTML: Structure

Hypertext

• Text that contains links to other texts. • E.g. HTML – Hypertext Markup Language – Defines basic structure and content of a Web page using markup ‘tags’. – Each tag describes a type of content • <body> for the body of an HTML page • <p> for a paragraph – Read by a Web browser, which decides how to display the content. CSS

• Cascading Style Sheets define the design and presentation of an HTML document. • When the Web browser is deciding how to display the content, CSS provides information about the look and feel of each element

p {
  font-family: "Times New Roman";
  font-size: 12px;
  color: blue;
}

“Make all paragraphs use the Times New Roman font at a size of 12 pixels, and colour the text blue.” JavaScript

• The third essential technology behind the World Wide Web – The ‘programming language of the Web’ – Programs the behaviour of a Web page • Runs on the user’s machine, rather than the Web server (client-side language) • Defines how the page should respond (behave) when there is a user interaction or event – E.g. what should happen when the user clicks on something? Technologies for Visualisation Low Level High Level

Listed from low level to high level:

• WebGL, Canvas, SVG
• D3.js
• NVD3 (reusable D3 charts), Dimple.js (D3 for business analytics), Rickshaw (D3 for time series)
• RAW (Web app, paste data), Chartio (paid), Plotly, CartoDB, Tableau

• Low level: more complex; programmable, so more customisable; more powerful.
• High level: less complex; mostly automated, so more confined to templates; less powerful.

D3.js (1)

• Data-Driven Documents • JavaScript Library • Uses HTML, SVG and CSS to create visualisations from data – Drives the connection between data (provided by user) and documents (rendered by the Web browser) • Aimed at creating explanatory visualisations, rather than exploratory • https://d3js.org/ D3.js (2)

• Four general steps: – Loading the data into the browser’s memory – Binding the data to elements within the document – Transforming elements by setting visual properties based on the bound data – Transitioning elements between states in response to user input • Read more in Interactive Data Visualization for the Web by Scott Murray (O’Reilly, 2013) D3.js (3)
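The first three of the four steps listed above can be sketched without D3 at all. The plain JavaScript below is a stand-in for the idea, not D3's API: the data values are made up, a literal array takes the place of a loaded file, and each "element" is simply a string of SVG markup. The fourth step, transitions, is omitted.

```javascript
// Step 1: load the data (a literal array stands in for a loaded file;
// the values are made up for illustration).
const data = [
  { year: "Year 1", customers: 1000 },
  { year: "Year 2", customers: 2500 },
  { year: "Year 3", customers: 5000 },
];

const chartHeight = 100;
const barWidth = 40;
const maxValue = Math.max(...data.map((d) => d.customers));

// Step 2: bind each datum to one element
// (here, a plain object that will become a <rect>).
const bars = data.map((d, i) => ({ datum: d, index: i }));

// Step 3: set each element's visual properties from its bound datum.
const rects = bars.map(({ datum, index }) => {
  const h = (datum.customers * chartHeight) / maxValue;
  return (
    `<rect x="${index * barWidth}" y="${chartHeight - h}" ` +
    `width="${barWidth - 4}" height="${h}"><title>${datum.year}</title></rect>`
  );
});

const svg =
  `<svg xmlns="http://www.w3.org/2000/svg" ` +
  `width="${data.length * barWidth}" height="${chartHeight}">` +
  rects.join("") +
  `</svg>`;

console.log(svg);
```

Saving the printed string to a file with an .svg extension and opening it in a browser should show three bars whose heights follow the data.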

• Dynamically access the DOM behind a web page to apply styles – DOM: • Document Object Model • Hierarchical structure of HTML elements • Allows things like – d3.select("body").append("p").text("New Paragraph"); D3: Chain Syntax (1)

• Methods are chained together with full stops (“.”) • Several actions can be performed in a single statement • Fast and easy, but long chains can be harder to debug later • JavaScript ignores whitespace and line breaks between chained calls, so a chain can be split across lines:

d3.select("body")
  .append("p")
  .text("New Paragraph");

D3: Chain Syntax (2)
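Chaining works because each method returns an object on which the next method can be called. A minimal sketch in plain JavaScript follows; the MiniSelection class and the plain-object "nodes" are hypothetical stand-ins invented for this example, not D3's real implementation.

```javascript
// A toy stand-in for a D3 selection; NOT D3's real implementation.
class MiniSelection {
  constructor(node) {
    this.node = node;
  }

  // Create a child node, attach it to the current node,
  // and return a NEW selection that wraps the child.
  append(tag) {
    const child = { tag, text: "", children: [] };
    this.node.children.push(child);
    return new MiniSelection(child);
  }

  // Set the text of the current node and return the SAME selection,
  // so that further calls can be chained.
  text(value) {
    this.node.text = value;
    return this;
  }
}

// A plain object standing in for the document's body element.
const body = { tag: "body", text: "", children: [] };

// Chained form: one statement.
new MiniSelection(body).append("p").text("New Paragraph");

// Equivalent unchained form, which is easier to step through in a debugger.
const selection = new MiniSelection(body);
const paragraph = selection.append("p");
paragraph.text("Another Paragraph");

console.log(body.children.map((c) => `${c.tag}: ${c.text}`));
```

The unchained form names each intermediate result, which is one way to work around the debugging difficulty that long chains can cause.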

• d3.select("body").append("p").text("New Paragraph"); – First, select() is passed a CSS selector, "body", and returns the first element in the DOM that matches. – Then append() creates a new DOM element, "p", and appends it to the end of the previously selected element. • So, in this case, a paragraph is appended to the body element of the document. The new element is passed on in the chain… – … to the text() function, which inserts the given text into the currently selected element (the "p" element) Alternatives to D3 – Sigma.js

• Sigma.js is a library for visualising network graphs • Allows interactive exploration of networks – Biological – Social – Transport – Infrastructure • Can code visuals from scratch, or export from Gephi – a desktop GUI-based tool for network analysis Alternatives to D3 – IPython/Jupyter

• IPython notebooks consist of a series of cells, which allow Python code to be written and the visualisation output to be displayed • This allows the results to be replicated by following the steps exactly • It is run as a Web server and displays the output in HTML • Jupyter is the continuation of the project more generally, to support other languages, such as R and Julia as well as Python • This allows the use of many popular libraries within the same display, depending on their strengths and weaknesses – ggplot – Matplotlib – Bokeh Alternatives to D3 - Tableau

• Tableau is proprietary software which provides a simple interface to visualise data • Offers an Excel-like view of the data, and allows a drag and drop interface to add variables to the visualisation • Different versions include – A desktop version to be run on a local machine – A version which is run in the cloud, and a separate service to store the data from the visualisations – Tableau Public to display the output of these visualisations Tableau Public

• Having created a visualisation with Tableau Desktop or Online, Tableau Public offers the opportunity to showcase the work • Each user is given free space to store their visuals and share them with the world – where they stay live and interactive – Tableau Public publishes a fully interactive dashboard without any coding at all • Includes a “Viz of the day” feature to discover visualisations • Gets 25M+ views every month Storytelling and Data-Driven Journalism REPORTING USING VISUALISATIONS Data-driven Journalism (DDJ)

• An interdisciplinary area synthesising: – Journalism – Design – Computer science – Statistics – And more • Journalism based on data where the story must first be extracted through data processing, and then presented using visuals to communicate the narrative. DDJ: Storytelling

“If data journalism is about anything, it's the flexibility to search for new ways of storytelling. And more and more reporters are realising that. Suddenly, we have company - and competition. So being a data journalist is no longer unusual. It's just journalism.”

Simon Rogers – The Guardian. http://www.theguardian.com/news/datablog/2011/jul/28/data-journalism Storytelling (1)

• Conveying events or information using various media such as words, sounds or images • Data visualisation aims to effectively convey meaning from data • Storytelling is therefore about communicating the results of some analysis, in a manner that the audience can understand Storytelling (2)

• Every audience is different. There may be limitations within the audience as to their inclination to follow the story – Time – Interest – Knowledge • What is the data showing? What are the results – what is your message? • The story will often require multiple graphics – Each graphic should add something new to the story Storytelling Considerations: Audience

• What prior knowledge do they possess? – Are they experts in the field? Or general public? • How much time do they have? – Will they be able to explore the visualisation in depth, or just glance at it? • In what format are they comfortable digesting information? – Charts? ? Interaction? Storytelling Considerations: The Story (1)

• What is it that the data shows? • What is your key result, the message? – Not the data itself. – Not the method – The result • The story is vital – this is what you will be ‘telling’ in your visualisation. • Why do you want people to look at your visualisation? Storytelling Considerations: The Story (2)

• You really have to know the story yourself in order to ‘sell it’. • “Simplify, then exaggerate”. – Geoffrey Crowther, The Economist – Referring to journalism techniques that have been followed at The Economist since the 1950s • Important: this does not mean lying! Storytelling Considerations: The Action

• What do you want people to do afterwards? – This depends on the audience, and context. • Change behaviour? – E.g., – Implement a new policy – Allocate resources – Campaign for change – Change eating habits Structure of a Story

• Possible to structure a story to be author-driven, reader-driven or a combination of the two • Author-Driven stories have – Linear ordering of scenes – Heavy messaging – No interactivity • Reader-Driven stories have – No prescribed ordering – No messaging – Free interactivity

Segel, E., and Heer, J., 2010. Narrative Visualization: Telling Stories with Data. IEEE Transactions on Visualization and Computer Graphics, 16(6), pp. 1139-1148. COMMUNICATING RESULTS Highlight

• Reduce the non-data ink – Removal of chartjunk – Tufte’s theories on minimalism • Enhance the data ink – Removal of chartjunk (again!) – Designing for pre-attentive processing • Remember Gestalt Theory and the work of Cleveland and McGill Organise

• Grouping – What belongs together? – What are the sub-topics of your message? • Prioritising – What are the important numbers/findings? – Placing something at the top left of a page/screen makes it more prominent • Sequencing – Typically use left-to-right and top-to-bottom as this is the direction of reading (for Western languages – know your audience!) – Enhance the sequence using numbers or letters to signpost the route SUMMARY AND FURTHER READING Summary

• A data visualisation can be thought of as the visual display of quantitative data • It should present a clear story, whether to explain results or to explore possible results • Different types of data should be displayed in different ways • There are a variety of tools and technologies available for creating a visualisation to display on the Web • A data visualisation needs to be adapted depending on the story, audience and desired action Further Reading

• Tufte, Edward. The Visual Display of Quantitative Information. 2nd Edition.
• Cleveland, William S., and Robert McGill. "Graphical perception and graphical methods for analyzing scientific data." Science 229.4716 (1985): 828-833.
• Murray, Scott. Interactive Data Visualization for the Web. O'Reilly Media, Inc., 2013.
• D3 tutorials such as https://www.dashingd3js.com/table-of-contents
• The D3 gallery for adaptable examples: – https://github.com/mbostock/d3/wiki/Gallery