Learn to Create a Heat Map in Python with Data from NCHS (2018)
© 2021 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

SAGE Research Methods: Data, Visualization

How-to Guide for Python

Introduction

In this guide, you will learn how to create a heat map using the Python programming language. Readers are provided with links to the example dataset and encouraged to replicate this example. An additional practice example is suggested at the end of this guide. The example assumes that you have downloaded the relevant data files to a folder on your computer and that you are using the JupyterLab environment. The code should, however, work in other environments too.

Contents
1. Heat Map
2. An Example in Python: Monthly Deaths by Influenza and Pneumonia in California, 2014–2018
2.1. The Python Procedure
2.1.1. JupyterLab Notebooks
2.1.2. Testing Out the Programming Environment
2.1.3. Creating Our Notebook, Importing Necessary Modules
2.1.4. Reading In and Formatting Our Data
2.1.5. Plotting the Chart
2.1.6. Saving the Plot
2.1.7. Alternative Version With Classified Color Scale
2.2. Exploring the Output
3. Your Turn

1. Heat Map

Heat maps are visualizations that use a two-dimensional arrangement of colored rectangles, where each row corresponds to one data dimension and each column to another. Each rectangular block represents the intersection of the two plotted series and is colored according to the value of the observations at that intersection. Heat maps thus use position and color to encode values.
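To make the color encoding concrete, here is a minimal plain-Python sketch (no plotting library) of how a heat map can turn a grid of numbers into colors: each value is normalized linearly to the range 0 to 1 and then interpolated between three anchor colors, yellow, orange, and red. The anchor RGB values and the helper names value_to_rgb and grid_to_colors are illustrative assumptions, not part of this guide's method; libraries such as Matplotlib ship their own ready-made colormaps.

```python
# Illustrative sketch: mapping a 2D grid of values onto a yellow-orange-red scale.
# The anchor colors are assumed approximations of yellow, orange, and red.
ANCHORS = [(255, 255, 178), (253, 141, 60), (189, 0, 38)]


def value_to_rgb(value, vmin, vmax):
    """Map one value linearly onto the color scale (hypothetical helper)."""
    # Normalize to [0, 1]; clamp so out-of-range values get the end colors.
    t = (value - vmin) / (vmax - vmin) if vmax > vmin else 0.0
    t = min(max(t, 0.0), 1.0)
    # Find which segment of the scale t falls into (yellow->orange or orange->red).
    segments = len(ANCHORS) - 1
    position = t * segments
    index = min(int(position), segments - 1)
    frac = position - index
    start, end = ANCHORS[index], ANCHORS[index + 1]
    # Linearly interpolate each RGB channel within that segment.
    return tuple(round(s + (e - s) * frac) for s, e in zip(start, end))


def grid_to_colors(grid):
    """Color every (row, column) cell of a 2D grid using one shared scale."""
    values = [v for row in grid for v in row]
    vmin, vmax = min(values), max(values)
    return [[value_to_rgb(v, vmin, vmax) for v in row] for row in grid]


if __name__ == "__main__":
    # Rows = years (2014, 2018), columns = January to March, from the Figure 1 data.
    grid = [[986, 693, 521], [1674, 817, 771]]
    for row in grid_to_colors(grid):
        print(row)
```

Because all cells share one minimum and maximum, the smallest count in the grid is drawn yellow and the largest dark red, which is what lets the eye compare cells across both rows and columns.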
One or both axes of a heat map can be quantitative, in which case the blocks represent values falling into different bins or value ranges. The axes can also be qualitative, representing different categories. Heat maps are also often used to show time series. A similar tabular visualization that shows the intersections of data series with qualitative values (e.g., true/false; low, medium, high) is usually called a matrix.

2. An Example in Python: Monthly Deaths by Influenza and Pneumonia in California, 2014–2018

Figure 1 shows a heat map of monthly influenza and pneumonia deaths in California. The heat map clearly conveys the seasonal variation in deaths attributed to these causes and highlights the months with especially high death tolls. A continuous, linear yellow–orange–red color scheme is used, with darker shades corresponding to larger death counts, and each cell displays its data value.

The horizontal axis is labeled Month and lists the months from January to December. The vertical axis lists the years 2014 to 2018. The data (number of deaths per month) are tabulated below.

Year    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2014    986    693    521    475    446    382    392    364    361    373    447    503
2015    918    730    577    521    452    420    379    358    385    417    455    573
2016    679    680    749    521    420    360    421    330    376    403    417    635
2017    985    722    617    511    466    403    360    372    368    399    396    762
2018  1,674    817    771    554    427    389    359    324    319    358    441    485

As the count increases, the color becomes more intense. Text under the map reads, "Source: NCHS, 2018."

Figure 1.
Heat Map of Monthly Influenza and Pneumonia Deaths in California, 2014–2018

2.1 The Python Procedure

Python is a general-purpose programming language that supports several programming paradigms and has a very clear syntax. It is a versatile tool, particularly for data manipulation and visualization. Because Python was designed with readability in mind, it is also reasonably easy for beginners to read. For more information, visit https://www.python.org/.

You can write Python code in any plain text editor, such as Sublime Text or Visual Studio Code. For the purposes of this tutorial, you do not need to install anything, as we will be using a web-based programming environment.

Note: This tutorial uses Python 3. Many online articles and other sources discuss Python 2, which differs from Python 3 in small but important ways. Although code written in Python 2 often works in Python 3 and vice versa, this is not always the case, and mixing the two versions can lead to errors or unexpected results.

2.1.1 JupyterLab Notebooks

The traditional way of programming is to write code in a text file and then build and run it to generate an output. In a notebook, on the other hand, the code is broken down into cells, which can be run one at a time, with the results displayed directly in the editor. This makes working with code and experimenting with parameters much more flexible, and it is particularly suitable for interactive data exploration, where Python shines. Sharing small code projects (such as visualizations!) also becomes much simpler with the notebook approach, since you can save the entire notebook as a single file and send it to others.
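One reason notebooks are easy to share is that a notebook is stored as a single .ipynb file, which is a JSON document listing its cells. The sketch below builds a minimal two-cell notebook with only the standard library, assuming the nbformat version 4 layout with nbformat_minor set to 4 (a minor version that does not require cell IDs). The helper name make_notebook is hypothetical; in practice, JupyterLab writes these files for you.

```python
import json


def make_notebook(cells):
    """Build a minimal notebook dict from (cell_type, source) pairs (hypothetical helper)."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also record their outputs and execution count.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    # Top-level keys of the assumed nbformat 4 layout.
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    notebook = make_notebook([
        ("markdown", "# Heat map of monthly deaths"),
        ("code", 'print("Hello, world!")'),
    ])
    # This JSON text is what would be saved in the .ipynb file:
    # text and code cells together in one document you can send to others.
    print(json.dumps(notebook, indent=1))
```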
We will be using JupyterLab, a modern web-based notebook interface for Python that requires no installation on your part. To try a notebook online, open https://jupyter.org/try and click Try JupyterLab. A cloud-hosted, ready-to-use JupyterLab environment will be activated after a short wait. Try refreshing the window if loading stalls.

Note that this JupyterLab session is hosted on https://mybinder.org/ and will time out after about 15 minutes of inactivity. Make sure to download and save your notebooks locally before leaving the computer. If your session has expired, start a new one from https://jupyter.org/try and use the interface to upload your saved notebook to continue where you left off.

Note: The online trial of JupyterLab is a good place to start experimenting with Python programming, but for continued use, it is recommended to install the Conda package and environment management system and JupyterLab locally on your system. The easiest way to obtain Conda is to install one of two distributions: Anaconda, a powerful Python and R distribution that includes over 250 packages for various uses, or Miniconda, a minimal version of Anaconda that includes only conda, Python, the packages they depend on, and a few other useful packages (JupyterLab not included). For more information on obtaining Conda, visit https://docs.conda.io/projects/conda/en/latest/user-guide/install/.

If you choose the Miniconda distribution, you will need to install JupyterLab locally from your terminal (macOS) or Command Prompt/PowerShell (Windows) with conda install -c conda-forge jupyterlab. For more information on installing JupyterLab, visit https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html.
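Whether you use the online trial or a local Conda installation, a quick way to confirm that you are running Python 3 and that the packages you need can be found is to run a short check in a cell. This is a sketch: the helper name check_environment is hypothetical, and the default module list (jupyterlab and matplotlib, both named in this guide) is an assumption you should adjust to your project. importlib.util.find_spec only reports whether a top-level module can be located; it does not import it.

```python
import importlib.util
import sys


def check_environment(modules=("jupyterlab", "matplotlib")):
    """Return (is_python3, {module: found}) for the given top-level module names.

    The default module list is an assumption; pass your own names as needed.
    """
    # This tutorial assumes Python 3.
    is_python3 = sys.version_info.major == 3
    # find_spec returns None when a top-level module cannot be located.
    available = {name: importlib.util.find_spec(name) is not None for name in modules}
    return is_python3, available


if __name__ == "__main__":
    ok, found = check_environment()
    print("Python " + sys.version.split()[0] + (" (Python 3)" if ok else " (not Python 3)"))
    for name, present in found.items():
        print(name + ": " + ("found" if present else "missing"))
```

If a package is reported as missing in a local Miniconda setup, installing it with conda (as shown above for JupyterLab) and restarting the kernel should resolve it.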
If you already have an installation of Anaconda or similar and a preferred plain text editor at your disposal, the code covered in this tutorial should work in other environments as well.

2.1.2 Testing Out the Programming Environment

If you open https://jupyter.org/try and click Try JupyterLab, you will be welcomed by a rather complex demo example. We will ignore this for now and create our own notebook instead. Under File, choose New > New Notebook. In the Select Kernel prompt, choose Python 3.

You now see an empty notebook called Untitled.ipynb with one empty cell: a text box where you type code for execution. The cells of a notebook are convenient for structuring code in small chunks that can be run one at a time, as opposed to the more common approach of building and running a whole script at once. A single cell can contain as many or as few lines of code as you want. You can also change a cell to hold Markdown-formatted text instead of code, to write longer comments or add illustrations.

You can test what JupyterLab does by writing some code in the empty cell. Click inside the cell and type the following:

print("Hello, world!")

Press Shift + Enter or click the small play arrow ▸ in the toolbar above to run the cell.

2.1.3 Creating Our Notebook, Importing Necessary Modules

Create a new notebook and save it with a name, for example, heatmap.ipynb. You will refer back to this should your JupyterLab session time out.

If we were running this project locally, we would first need to install all the modules necessary for generating the visualization. However, the trial environment launched from https://jupyter.org/try conveniently comes equipped with everything we need.

• Matplotlib (MPL for short) is one of the most popular visualization libraries for Python.