TOOLS FOR MAKING GOOD DATA VISUALIZATIONS: THE ART OF CHARTING Tools formakinggood data visualizations: the art of charting the art
Tools for making good data visualizations: the art of charting Abstract Data visualization is a collection of methods that use visual representations to explore, make sense of and communicate quantitative data. It allows trends and patterns in quantitative data to be seen more easily. The ultimate purpose of data visualization is to facilitate better decisions and actions. With the rise in the amount of data available, it is important to be able to interpret increasingly large batches of information, and data visualization is a strong tool for the job. Use of good data visualizations is essential to make impactful health reports. This tool provides practical guidance on how to make good data visualizations to support convincing policy messages. This guidance document is part of the WHO Regional Office for Europe’s work to support Member States in strengthening their health information systems. Helping countries to produce solid health intelligence and institutionalized mechanisms for evidence-informed policy-making has traditionally been an important focus of WHO’s work and continues to be so under the European Programme of Work 2020–2025.
Keywords: DATA VISUALIZATION, DATA DISPLAY, COMMUNICATION, DECISION MAKING
Document number: WHO/EURO:2021-1998-41753-57181 © World Health Organization 2021 Some rights reserved. This work is available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 IGO licence (CC BY-NC-SA 3.0 IGO; https://creativecommons.org/licenses/by-nc-sa/3.0/igo). Under the terms of this licence, you may copy, redistribute and adapt the work for non-commercial purposes, provided the work is appropriately cited, as indicated below. In any use of this work, there should be no suggestion that WHO endorses any specific organization, products or services. The use of the WHO logo is not permitted. If you adapt the work, then you must license your work under the same or equivalent Creative Commons licence. If you create a translation of this work, you should add the following disclaimer along with the suggested citation: “This translation was not created by the World Health Organization (WHO). WHO is not responsible for the content or accuracy of this translation. The original English edition shall be the binding and authentic edition: Tools for making good data visualizations: the art of charting. Copenhagen: WHO Regional Office for Europe; 2021”. Any mediation relating to disputes arising under the licence shall be conducted in accordance with the mediation rules of the World Intellectual Property Organization (http://www.wipo.int/amc/en/mediation/rules/). Suggested citation. Tools for making good data visualizations: the art of charting. Copenhagen: WHO Regional Office for Europe; 2021. Licence: CC BY-NC-SA 3.0 IGO. Cataloguing-in-Publication (CIP) data. CIP data are available at http://apps.who.int/iris. Sales, rights and licensing. To purchase WHO publications, see http://apps.who.int/bookorders. To submit requests for commercial use and queries on rights and licensing, see http://www.who.int/about/licensing. Third-party materials. If you wish to reuse material from this work that is attributed to a third party, such as tables, figures or images, it is your responsibility to determine whether permission is needed for that reuse and to obtain permission from the copyright holder. The risk of claims resulting from infringement of any third-party-owned component in the work rests solely with the user. General disclaimers. The designations employed and the presentation of the material in this publication do not imply the expression of any opinion whatsoever on the part of WHO concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. Dotted and dashed lines on maps represent approximate border lines for which there may not yet be full agreement. The mention of specific companies or of certain manufacturers’ products does not imply that they are endorsed or recommended by WHO in preference to others of a similar nature that are not mentioned. Errors and omissions excepted, the names of proprietary products are distinguished by initial capital letters. All reasonable precautions have been taken by WHO to verify the information contained in this publication. However, the published material is being distributed without warranty of any kind, either expressed or implied. The responsibility for the interpretation and use of the material lies with the reader. In no event shall WHO be liable for damages arising from its use. Contents
ACKNOWLEDGEMENTS...... V
AIM OF THIS GUIDANCE...... VI
NOTE ON THE FIGURES...... VII
INTRODUCTION: DATA VISUALIZATION AND WHY IT IS IMPORTANT...... 1 Reader’s guide...... 2
1. WHY DATA VISUALIZATION IS NEEDED...... 3 1.1 The power of visualization: an example...... 3 1.2 Uses of data visualization...... 5
2. CHARACTERISTICS OF THE DATA...... 7 2.1 Individual versus grouped data...... 7 2.2 Level of measurement...... 7
3. TYPES OF CHART...... 10 3.1 Line chart...... 10 3.2 Area chart...... 11 3.3 Area range chart...... 11 3.4 Column chart...... 12 3.5 Bar chart...... 13 3.6 Bullet chart...... 14 3.7 Histogram...... 14 3.8 Scatter plot...... 16 3.9 Bubble chart...... 17 3.10 Pie chart...... 18 3.11 Slope chart...... 19 3.12 Heatmap...... 20 3.13 Map...... 21
4. THE IMPORTANCE OF SIDE INFORMATION...... 24 4.1 Title and subtitle...... 24 4.2 Title of the X axis...... 25 4.3 Title of the Y axis...... 25 4.4 Legend items...... 25 4.5 Source...... 26 4.6 Creator...... 26 4.7 Annotations...... 26 iv TOOLS FOR MAKING GOOD DATA VISUALIZATIONS: THE ART OF CHARTING
5. BEST PRACTICE FOR MAKING CHARTS...... 27 5.1 Supporting the message directly...... 27 5.2 Avoiding chartjunk...... 28 5.3 Considering the Y axis range and the chart aspect ratio...... 29 5.4 Avoiding use of two Y axes in one chart...... 31 5.5 Avoiding 3D charts...... 31 5.6 Avoiding pie charts...... 32
6. CREATING UNITY IN CHART LAYOUTS...... 33
7. FURTHER READING...... 34 Books...... 34 Blogs...... 34
REFERENCES...... 35 Acknowledgements v
Acknowledgements
This document was developed by the Data, Metrics and Analytics Unit in the Division of Country Health Policies and Systems of the WHO Regional Office for Europe. The main author is Laurens Zwakhals. Marieke Verschuuren and David Novillo Ortiz provided direction during the production of the report and technical advice during concept drafting, writing and review. Special thanks to Natasha Azzopardi-Muscat for her strategic guidance.
For further information please contact the Data, Metrics and Analytics Unit ([email protected]). vi TOOLS FOR MAKING GOOD DATA VISUALIZATIONS: THE ART OF CHARTING
Aim of this guidance
This guidance document is part of the WHO Regional Office for Europe’s work on supporting Member States in strengthening their health information systems. Helping countries to produce solid health intelligence and institutionalized mechanisms for evidence-informed policy-making has traditionally been an important focus of WHO’s work, and continues to be so under the European Programme of Work 2020–2025.1 Use of good data visualizations is essential to make impactful health reports. This tool provides practical guidance on how to make good data visualizations to support convincing policy messages.
1 European Programme of Work. In: WHO Regional Office for Europe [website]. Copenhagen: WHO Regional Office for Europe; 2020 (https://www.euro.who.int/en/health-topics/health-policy/european-programme-of-work/european- programme-of-work, accessed 19 November 2020). Note on the figures vii
Note on the figures
All figures in this guidance document were created by the author except Fig. 15, which has a Creative Commons (CC-BY 4.0) licence and is free to share, copy and redistribute in any medium or format. Data sources are cited below each figure. 1 TOOLS FOR MAKING GOOD DATA VISUALIZATIONS: THE ART OF CHARTING
Introduction: data visualization and why it is important
Data visualization is a collection of methods that use visual representations to explore, make sense of and communicate quantitative data (1). It allows trends and patterns in quantitative data to be seen more easily. The ultimate purpose of data visualization is to facilitate better decisions and actions.
With the rise in the amount of data available, it is important to be able to interpret increasingly large batches of information, and data visualization is a strong tool for the job. This is not only important for data scientists and data analysts: it is necessary to understand data visualization in any career. Anyone who works in finance, marketing, design, health monitoring or any other sector needs to visualize data. This showcases the importance of data visualization (2).
Data visualization can be used in various phases of data analysis and communication. First, it is very helpful in the phase of data exploration. With the help of statistical analysis tools and spreadsheet programs, data can easily be analysed through visualization. Relationships, distributions and comparisons can be discovered. Business intelligence tools are also extremely helpful for gaining insight into the data. In the phase of data exploration, the choice of chart type, side information and layout are not the most important issues: as long as the visuals give the analyst more insight, that is fine.
Once a good understanding of the trends and patterns in the data has been developed, the work moves on to think of the story to be told. In this phase, data presentation comes into play. Data presentation (sometimes called data explanation) aims to report the end results. This can be in a book, a report, a presentation or on the web. When transmitting insights to a broad audience, clear and understandable communication becomes important. In this phase the chart type, side information and layout of the visuals become important. This guidance document is about the use of data visualizations in this phase of the process: moving from data via information to insight for a broader audience. The visuals need to speak for themselves. Introduction: data visualization and why it is important 2
Reader’s guide y This guidance document starts with the basics. Section 1 discusses why data visualizations are needed. An example shows that data visualization adds insight to non- visual analyses. The section further explains how visuals add insight about relationships, distributions and comparisons.
y Section 2 discusses some characteristics of datasets. Depending on these, different charts may be appropriate.
y Section 3 gives an overview of the most used charts. The options are endless, but the 13 types discussed in this section cover those used in 90% of cases.
y Section 4 examines the side information. A chart consists of both graphics and textual elements: the textual elements give the graphics context. This side information is important for understanding what the chart is about.
y Section 5 investigates obvious and less obvious pitfalls in charting data.
y Finally, section 6 discusses some layout principles. Especially when more than one chart is used in a document, a unifying layout is important for the audience. Once the reader is used to a specific layout, they will find interpretation of the next chart easier. 3 TOOLS FOR MAKING GOOD DATA VISUALIZATIONS: THE ART OF CHARTING
1. Why data visualization is needed
Data visualization is needed because a visual summary of information makes it easier to identify patterns and trends than looking through thousands of rows on a spreadsheet. It is the way the human brain works. Since the purpose of data analysis is to provide insight, data are much more valuable when visualized. Even if a data analyst can identify the patterns and pull insight from data without visualization, it is more difficult to communicate the data findings and their meaning without charts.
1.1 The power of visualization: an example
The power of visualization can be illustrated by a simple example known as Anscombe’s quartet (3). It comprises four datasets that have nearly identical simple descriptive statistics, yet have different distributions and appear vastly different when charted. Each dataset consists of 11 (x, y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate the importance of charting data before analysing it, to counter the impression among statisticians that “numerical calculations are exact, but charts are rough”.
Table 1 shows the datasets Anscombe constructed. The x values are the same for the first three datasets. At the end of the table several descriptive statistics are shown. Essential statistics are the same, so it is possible to conclude that the four datasets are similar, but when they are charted (Fig. 1), the differences become clear at a glance. Visualizing data can provide insight that tabular data and descriptive statistics do not always give. 1. Why data visualization is needed 4
Table 1. Anscombe’s quartet
Dataset I Dataset II Dataset III Dataset IV
Observation x1 y1 x2 y2 x3 y3 x4 y4
1 10 8.04 10 9.14 10 7.46 8 6.58
2 8 6.95 8 8.14 8 6.77 8 5.76
3 13 7.58 13 8.74 13 12.74 8 7.71
4 9 8.81 9 8.77 9 7.11 8 8.84
5 11 8.33 11 9.26 11 7.81 8 8.47
6 14 9.96 14 8.1 14 8.84 8 7.04
7 6 7.24 6 6.13 6 6.08 8 5.25
8 4 4.26 4 3.1 4 5.39 19 12.5
9 12 10.84 12 9.13 12 8.15 8 5.56
10 7 4.82 7 7.26 7 6.42 8 7.91
11 5 5.68 5 4.74 5 5.73 8 6.89
Summary statistics
Number 11 11 11 11 11 11 11 11
Mean 9.00 7.50 9.00 7.50 9.00 7.50 9.00 7.50
Standard 3.16 1.94 3.16 1.94 3.16 1.94 3.16 1.94 deviation
Correlation 0.82 0.82 0.82 0.82 coefficient
Fig. 1. Anscombe’s quartet visualized