Learn to Create a Heat Map in Python With Data From NCHS (2018)

© 2021 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

How-to Guide for Python

Introduction

In this guide, you will learn how to create a heat map using the Python programming language. Readers are provided links to the example dataset and encouraged to replicate this example. An additional practice example is suggested at the end of this guide. The example assumes you have downloaded the relevant data files to a folder on your computer and that you are using the JupyterLab environment. The relevant code should, however, work in other environments too.

Contents

1. Heat Map

2. An Example in Python: Monthly Deaths by Influenza and Pneumonia in California, 2014–2018

2.1. The Python Procedure

2.1.1. JupyterLab Notebooks

2.1.2. Testing Out the Programming Environment

2.1.3. Creating Our Notebook, Importing Necessary Modules

2.1.4. Reading In and Formatting Our Data

2.1.5. Plotting the Chart

2.1.6. Saving the Plot

2.1.7. Alternative Version With Classified Color Scale

2.2. Exploring the Output

3. Your Turn

1. Heat Map

Heat maps are visualizations that use a two-dimensional arrangement of colored rectangles, where the row corresponds to one data dimension and the column to another. Each rectangular block represents an intersection between the two plotted series and is colored by the values for observations in that particular intersection. Heat maps use position and color to encode values.

One or both axes of the heat map can be quantitative, in which case the blocks represent values falling into different bins or value ranges. The axes can also be qualitative, thus representing different categories. Heat maps are also often used to show time series. A similar tabular visualization that shows intersections of data series with qualitative values (e.g., true/false; low, medium, high) is usually called a matrix.

2. An Example in Python: Monthly Deaths by Influenza and Pneumonia in California, 2014–2018

Figure 1 shows a heat map of monthly influenza and pneumonia deaths in California. The heat map gives a good impression of the seasonal variation in deaths attributed to these causes and highlights the months in which particularly many lives were lost. A continuous linear yellow–orange–red color scale was used, with darker shades corresponding to larger death counts. Data values are displayed in each individual cell.

The horizontal axis is labeled month and lists months from January to December. The vertical axis ranges from 2014 to 2018, in increments of 1. The data are tabulated below.


Year January February March April May June July August September October November December

2014 986 693 521 475 446 382 392 364 361 373 447 503

2015 918 730 577 521 452 420 379 358 385 417 455 573

2016 679 680 749 521 420 360 421 330 376 403 417 635

2017 985 722 617 511 466 403 360 372 368 399 396 762

2018 1,674 817 771 554 427 389 359 324 319 358 441 485

As the count increases, the intensity of the color shade increases. Text under the map reads, “Source: NCHS, 2018.”

Figure 1. Heat Map of Monthly Influenza and Pneumonia Deaths in California 2014–2018

2.1 The Python Procedure

Python is a general-purpose programming language that supports several programming paradigms and has a very clear syntax. It is a versatile tool, particularly for data manipulation and visualization. Because Python's design emphasizes readability, it is also reasonably easy for beginners to read. For more information, visit https://www.python.org/.

You can write Python code with any plain text editor, such as Sublime Text or Visual Studio Code. For the purposes of this tutorial, you do not need to install anything additional, as we will be using a web-based programming environment.

Note: This tutorial uses Python 3. Many online articles about Python programming and other sources discuss Python 2, which differs from Python 3 slightly but in important ways. Although code written in Python 2 often works in Python 3 and vice versa, this is not always the case, and mixing the two Python versions can lead to errors or unexpected results.

2.1.1 JupyterLab Notebooks

The traditional way of programming would be to write some code in a text file, then build and run it to generate an output. In a notebook, on the other hand, the code is broken down into cells, which can be run one at a time, displaying results right in the editor. This makes working with code and experimenting with changing parameters much more flexible and is particularly suitable for interactive data exploration, where the Python programming language shines. Sharing small code projects (such as visualizations!) generally becomes much simpler with the notebook approach, since you can save the entire notebook and send it to others.

We will be using JupyterLab, a modern web-based notebook interface for Python, that requires no installation on the user’s part. To try a notebook online, just open https://jupyter.org/try and click Try JupyterLab. A cloud-hosted ready-to-use online JupyterLab environment will be activated after a short wait. Try refreshing the window if loading stalls.

Take note that this JupyterLab session is hosted on https://mybinder.org/, and it will time out after ~15 minutes if inactive. Make sure to download and save your notebooks locally before leaving the computer. If your session has expired, start a new one from https://jupyter.org/try and use the interface to upload your saved notebook to continue where you left off.

Note: The online trial of JupyterLab is a good place to start if you want to experiment with programming in Python, but for continued use in the future, it is recommended to install the Conda package and environment management system and JupyterLab locally on your system.

To obtain Conda, it is easiest to install one of two distributions: Anaconda, a powerful Python and R distribution that includes over 250 packages for various uses, or Miniconda, a minimal version of Anaconda that includes only conda, Python, package dependencies, and a few other useful packages (JupyterLab not included). For more information on obtaining Conda, you can visit https://docs.conda.io/projects/conda/en/latest/user-guide/install/. If you choose the Miniconda distribution, you will need to install JupyterLab locally from your terminal (Mac) or Command Prompt/PowerShell (Win) with conda install -c conda-forge jupyterlab. For more information on installing JupyterLab, visit https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html.

If you already have an installation of Anaconda or similar and a preferred plain text editor at your disposal, the relevant code covered in this tutorial should work in other environments as well.

2.1.2 Testing Out the Programming Environment

If you open https://jupyter.org/try and click Try JupyterLab, you will be welcomed by a rather complex demo example. We will ignore this for now and create our own new notebook instead. Under File choose New > New Notebook. In the Select Kernel prompt, choose Python 3.

You now see an empty notebook called Untitled.ipynb with one empty cell: a text box where you type code for execution. The cells of a notebook are convenient for structuring code in small chunks that can be run one at a time, as opposed to the more common way of building and running a whole script at once. A single cell can contain as many or few lines of code as you want. You could also change the cell to hold markdown-formatted text instead of code to write longer comments or add illustrations.

You can test what JupyterLab does by writing some code in the empty cell. Click inside the cell and type in the following:

print("Hello, world!")

Hit shift + enter or press the small play arrow ▸ above in the toolbar to run the cell.

2.1.3 Creating Our Notebook, Importing Necessary Modules

Create a new notebook and save it with a name, for example, heatmap.ipynb. You will refer back to this should your JupyterLab session time out.

If we were running this project locally, we would first need to install all the modules necessary for generating the visualization. However, the trial environment launched from https://jupyter.org/try conveniently comes equipped with everything for our purposes.

• Matplotlib (MPL for short) is one of the most popular visualization libraries for Python. It has a huge feature set, and there are dozens of often very different (and sometimes odd) ways of achieving the same thing. Do not be alarmed if you search the internet for how to do something in Matplotlib and cannot comprehend some particular instructions. It is probably the case that the article you have found is based on a different approach to MPL than what we are working with in this tutorial. Pyplot (PLT) is a Matplotlib module which provides features similar to MATLAB, https://matplotlib.org/

• Pandas (PD) is a powerful data analysis and manipulation toolset, https://pandas.pydata.org/

• Seaborn (SNS) brings added functionality to Matplotlib, for example, new chart types. All of MPL’s features are still accessible, but Seaborn offers an easier to use interface for some of them. This guide is written based on Seaborn 0.10.1; other versions can differ somewhat.

You can run the following code in the first cell of your notebook to import the necessary items:

import matplotlib.pyplot as plt

import pandas as pd

import seaborn as sns

One initially confusing thing about working with Matplotlib is the proliferation of different submodules you need to invoke at different times when writing your code. At times we are giving commands to Matplotlib, other times to Pyplot, but also conceivably to Seaborn, and so on. You will start getting used to it after a while, but it is sure to cause some confusion more than once.

It is a common practice to use the shortcut plt for Pyplot and pd for Pandas. There are other such established shortcuts in the world of Python, as well, for example, sns for Seaborn and np for Numpy. There is technically no need to use such shortcuts, but as most online articles will follow this convention, we will as well.

2.1.4 Reading In and Formatting Our Data

Save the tutorial csv data file Weekly_Counts_of_Deaths_by_State_and_Select_Causes__2014-2018.csv to a folder on your computer. The example uses the same folder where the JupyterLab notebook is saved, so no other path than the file name is necessary to refer to the csv file. If you choose to save your files elsewhere, just update the path accordingly.

We will begin by importing our data. Enter this in the next cell:

data = pd.read_csv('Weekly_Counts_of_Deaths_by_State_and_Select_Causes__2014-2018.csv', sep=';', thousands=',')

data

Instead of using Python’s own csv module, we will use the read_csv() function from the Pandas library. This creates a “pandas dataframe” of our csv, essentially a spreadsheet table with rows and columns, which lets the user specify decimal separators and other important parameters during import and is a required data format for some plotting modules. This file also uses thousands separators. The argument thousands=',' allows the function to read the numerical data correctly.

Should you have a different decimal separator or delimiter in your dataset, you could specify it during import like this: importedfile = pd.read_csv("file.csv", delimiter="\t", decimal=",").
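As a minimal sketch of how these import arguments behave, the following reads a small in-memory sample instead of a file. The sample text and its column names are made up for illustration; only pd.read_csv and its sep and thousands arguments come from this guide.

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-delimited, commas as thousands separators
sample = io.StringIO(
    "Jurisdiction;Week;All Cause\n"
    "California;1;5,412\n"
    "California;2;5,630\n"
)

df = pd.read_csv(sample, sep=";", thousands=",")

# With thousands="," the 'All Cause' column is parsed as numbers, not text
print(df["All Cause"].tolist())
```

If the thousands argument were left out, the same column would likely be read as text, which is a common reason for plots failing later on.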

Depending on your datasets in the future, you may also need to experiment with different encoding types—if wrong characters appear in names, you might need to change the encoding. Here it is by default encoding = "utf-8". See the full list: https://docs.python.org/3/library/codecs.html#standard-encodings.

You can check and see that the import worked by adding data to the end of the cell (Figure 2). The traditional long-form version of this command is print(data), but one of the convenient things about working with a notebook is that you also can see what a variable contains simply by typing the name and running the cell; this will show the table with nicer formatting. Note that if you place two variable names in the same cell like this, only the last one will be shown in the preview. To inspect many different variables in the same cell or inspect any variables in environments other than JupyterLab, you will still need to use the print() command.

[Figure 2 shows the cell above and a preview of its output: the first five rows (Alabama, weeks ending 01/04/2014 to 02/01/2014) and the last five rows (Wyoming, weeks ending 12/01/2018 to 12/29/2018) of the table. Visible columns include Jurisdiction of Occurrence, MMWR Year, MMWR Week, Week Ending Date, All Cause, Natural Cause, Septicemia (A40-A41), Malignant neoplasm (C00-C97), Diabetes mellitus (E10-E14), Alzheimer disease (G30), and flag columns such as Flag_neopl, Flag_diab, and Flag_alz. Some flag cells read “Suppressed (counts 1-9)”. Text under the table reads, “13833 rows × 30 columns.”]

Figure 2. The Imported csv

This is a rather large table with 30 columns. The preview does not show them all; hidden columns are indicated with the ellipsis symbol (…). If you wish to change the number of columns or rows displayed, set pd.options.display.max_columns = 30 or pd.options.display.max_rows = 100.

Next, we want to select the data for plotting. You can enter all the following code in one cell if you like, but we will break it up for ease of explaining the various subparts.

A simple way of selecting a subset is to pass the loc indexer a condition. The conditional expression is actually a pandas Series of boolean (True or False) values, one for each row in the original DataFrame. In this case, it is True only for rows where the field 'Jurisdiction of Occurrence' is 'California'. Passing this Series to loc selects the rows with True values.

Using boolean operators like & for “and” or | for “or” one can add multiple conditions for selection. For more details, see the Pandas introduction to subsetting.
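The selection mechanism described above can be sketched on a tiny table. The rows and values below are invented for illustration; the column names mirror those of the dataset.

```python
import pandas as pd

# Hypothetical miniature of the dataset, for illustration only
df = pd.DataFrame({
    "Jurisdiction of Occurrence": ["Alabama", "California", "California", "Wyoming"],
    "MMWR Year": [2014, 2014, 2018, 2018],
    "All Cause": [355, 5412, 5630, 92],
})

# Each comparison yields a boolean Series with one value per row
is_california = df["Jurisdiction of Occurrence"] == "California"
is_2018 = df["MMWR Year"] == 2018

# Combine conditions with & (and) or | (or)
california_2018 = df.loc[is_california & is_2018]
california_or_2018 = df.loc[is_california | is_2018]

print(is_california.tolist())
print(california_2018)
```

When the comparisons are written inline rather than stored in variables, each one needs its own parentheses, for example df.loc[(df["MMWR Year"] == 2018) & (df["All Cause"] > 100)].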

To select just the data for California, input the following.

data_select = data.loc[data['Jurisdiction of Occurrence']=='California']

data_select

Then, as we are not going to use the flag columns, we should filter them out. The 17th column, index 16, is the last data column. Because the end of a slice is exclusive, the slice 0:17 selects indices 0 through 16:

data_select = data_select.loc[:, data.columns[0:17]]

data_select

If we have many conditions, selecting a subset of the data can become a bit difficult to read as the name of the original dataframe is repeated for each condition. An alternative way of selecting by conditions is using the query() method of the dataframe. See the User Guide on indexing for further details on selection and indexing methods: https://pandas.pydata.org/docs/user_guide/indexing.html

To plot a heat map, we need a simple table with rows and columns, where every cell contains a numerical value—in this case, death counts.

Now, to have a look at influenza and pneumonia deaths by week, we can use Pandas’ pivot functionality:

data_select_pivot = data_select.pivot(index='MMWR Year', columns='MMWR Week', values='Influenza and pneumonia (J10-J18)')

data_select_pivot

The function pivot reshapes a DataFrame based on column values and returns a new DataFrame. The column passed to index becomes the index, the column or columns passed to columns become the new columns, and values takes one or several data columns. Since we pass values='Influenza and pneumonia (J10-J18)', all other causes of death are omitted.
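To make the reshaping concrete, here is a small sketch of pivot on a made-up long table. The numbers are invented, and the Deaths column name is a stand-in for the much longer column name used in the dataset.

```python
import pandas as pd

# Hypothetical long-format table: one row per year and week
long_table = pd.DataFrame({
    "MMWR Year": [2014, 2014, 2015, 2015],
    "MMWR Week": [1, 2, 1, 2],
    "Deaths": [120, 135, 110, 128],
})

# One row per year, one column per week, cells filled from 'Deaths'
wide_table = long_table.pivot(index="MMWR Year", columns="MMWR Week", values="Deaths")

print(wide_table)
```

The result has exactly the shape a heat map needs: one cell per combination of row label and column label.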

2.1.5 Plotting the Chart

We can now plot this table of weekly deaths as a heat map with the simple command ax = sns.heatmap(data_select_pivot) (Figure 3).


Figure 3. Pivoted Table and Heat Map With Default Settings


This chart gives some impression of the data but can easily be improved by setting size, color scale, and tick marks.

fig, ax = plt.subplots(figsize=(12, 4))

sns.heatmap(data_select_pivot, linewidths=.5, ax=ax, cmap='YlOrRd', linewidth=0.25, robust=True)

plt.tick_params( axis='y', labelsize=10, length=0, labelrotation=0)

plt.tick_params( axis='x', labelsize=10, labeltop=True, length=0)

plt.ylim(5,0)

To set a different color map, we tell the chart to use one of the built-in named color maps with cmap='YlOrRd'. This is a sensible choice, as it gives a clear impression of increasing values. The default is a bit ambiguous in this regard, with very light and saturated colors corresponding to high values.

See the full Matplotlib documentation for available color maps: https://matplotlib.org/tutorials/colors/colormaps.html and Seaborn’s documentation: https://seaborn.pydata.org/archive/0.10/tutorial/color_palettes.html.

The argument robust=True tells Seaborn to set the color map based on quantiles instead of calculated extreme values, giving a more balanced appearance if there are outliers. You could also set the value range manually using the arguments vmax and vmin. If you have diverging data, you can provide the center value with the argument center which takes a float (decimal number).
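The effect of a quantile-based range can be sketched with NumPy on the 2018 row of the monthly table shown in this guide. The 2nd and 98th percentiles used below are an assumption about which quantiles Seaborn uses internally; check the documentation of your Seaborn version before relying on the exact numbers.

```python
import numpy as np

# Monthly counts for 2018 from the table in this guide; January is an outlier
values = np.array([1674, 817, 771, 554, 427, 389, 359, 324, 319, 358, 441, 485])

# Full range: the color scale has to stretch to cover the January outlier
full_range = (values.min(), values.max())

# Quantile-based range (assumed here: 2nd and 98th percentiles)
robust_range = (np.percentile(values, 2), np.percentile(values, 98))

print(full_range)
print(robust_range)
```

The quantile-based upper limit is lower than the maximum, so the remaining months use more of the color scale. Values computed this way could also be passed manually through vmin and vmax.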

The command plt.ylim() sets the y-axis range to 0, 5, so as to not crop out half of the top and bottom rows (Figure 4).

The output heat map shows the number of deaths by MMWR week. The horizontal axis is labeled MMWR week and ranges from 1 to 53, in increments of 2. The vertical axis is labeled MMWR year and ranges from 2014 to 2018, in increments of 1. The number of deaths is high in the first seven weeks and in the last two weeks of every year. As the count increases, the intensity of the color shade increases.

Figure 4. Improved Heat Map of Weekly Data

To show this data as a heat map of monthly values, we need to do some calculations on the data frame. Each row in the original data has a 'Week Ending Date' value which is a date. Using Pandas functionality, we can use this to create a DateTime index for the table which makes time-based aggregation easy:

data_select.set_index(pd.to_datetime(data_select['Week Ending Date']), inplace=True)
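As a small illustration of what a DateTime index provides, the sketch below parses a few date strings in the month/day/year style seen in the table preview. The rows are hypothetical, and the explicit format argument is an addition that is not used in the line above.

```python
import pandas as pd

# Hypothetical rows using the month/day/year date style seen in the preview
df = pd.DataFrame({
    "Week Ending Date": ["01/04/2014", "01/11/2014", "02/01/2014"],
    "Deaths": [120, 135, 110],
})

# An explicit format avoids ambiguity between day-first and month-first dates
dates = pd.to_datetime(df["Week Ending Date"], format="%m/%d/%Y")
df = df.set_index(dates)

# A DatetimeIndex exposes calendar attributes such as month and day
print(df.index.month.tolist())
print(df.index.day.tolist())
```

The day attribute is what the adjustment function later in this section reads from each row's index label.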

If the original data provided daily instead of weekly counts, we could now directly use the Pandas resample function to sum them by month.

Since some weeks span two months, such a “naive” resampling would create artifacts in the monthly counts (with some months getting too low and some too high values).

What needs to be done instead is to identify all weeks that span two months. Since we do not know precisely how many deaths occurred per day, we divide the week’s total deaths by seven and multiply the result by the number of days the week extends into the new month. Subtracting this value from the original weekly count gives the number of deaths that belong to the previous month, which we will assign to the previous week.
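A worked example of this arithmetic, using an invented weekly total chosen so that the numbers come out round:

```python
# Hypothetical week: 700 deaths, with the week ending on the 3rd of the month
week_total = 700
day_of_month = 3  # the week extends three days into the new month

# Share of the week that falls in the new month
share_new_month = week_total / 7 * day_of_month

# The rest of the week belongs to the previous month
share_previous_month = week_total - share_new_month

print(share_new_month, share_previous_month)
```

Here 300 deaths are counted in the new month and 400 in the previous one. This assumes deaths are spread evenly over the seven days, which is only an approximation.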

To this end, we create the following function.

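def adjust_week_on_month_boundary(row, column):
    # row.name is the row's index label, here the week ending date
    day_of_month = row.name.day
    week_value = row[column]
    # share of the week's count that falls in the month of the ending date
    adjusted_week_value = week_value / 7 * day_of_month
    if day_of_month > 6:
        # the whole week lies within one month
        return week_value
    else:
        # the week started in the previous month
        return adjusted_week_value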

This function is designed to be used with a column name and apply on the rows of a DataFrame (as defined by the argument axis=1). For each row in the data, the function is run, and the resulting values are assigned to a new column.

col = 'Influenza and pneumonia (J10-J18)'

# use apply to calculate the current week value
data_select['value_current_week'] = data_select.apply(adjust_week_on_month_boundary, axis=1, column=col)

Now we can subtract the calculated week value to get the number to assign to the previous week (you can do this type of arithmetic with columns in a DataFrame; here the values get subtracted from each other row by row):

data_select['value_previous_week'] = data_select[col] - data_select['value_current_week']

Then we use the shift function to shift the previous week column 1 row upward. This means we are losing the values remaining for the first row and introducing an empty value in the end (since 2013 is left out and we lack data for the first week of 2019). To get the final weekly counts, we add everything up in a new column value_adjusted_weeks and check that the result looks right:

data_select['value_previous_week'] = data_select['value_previous_week'].shift(-1, fill_value=0)

data_select['value_adjusted_weeks'] = data_select['value_current_week'] + data_select['value_previous_week']

data_select[[col,'value_current_week', 'value_previous_week', 'value_adjusted_weeks']]
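The steps above can be checked on a three-week toy example. The counts and dates are invented, and the sketch uses a vectorized where instead of the apply-based function, so it is an equivalent shortcut rather than the code of this guide.

```python
import pandas as pd

# Hypothetical weekly counts; the week ending 1 February spans two months
weeks = pd.DataFrame(
    {"deaths": [70.0, 70.0, 70.0]},
    index=pd.to_datetime(["2014-01-25", "2014-02-01", "2014-02-08"]),
)

# Part of each week that falls in the month of its ending date
day = weeks.index.day.to_numpy()
weeks["current"] = weeks["deaths"].where(day > 6, weeks["deaths"] / 7 * day)

# The remainder belongs to the previous month: move it one row up
weeks["previous"] = (weeks["deaths"] - weeks["current"]).shift(-1, fill_value=0)
weeks["adjusted"] = weeks["current"] + weeks["previous"]

print(weeks)
```

The middle week keeps only one seventh of its count (one day in February), while the other six sevenths are added to the last January week. The overall total stays the same.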

Now it is possible to resample the data frame by months. We give the value column a more descriptive name. To make the data easy to reshape, we add year, month number, and name columns from the index (Figure 5):

data_select_monthly = data_select['value_adjusted_weeks'].resample('M').sum().to_frame()

data_select_monthly.rename(columns={'value_adjusted_weeks':'monthly_influenza_deaths'}, inplace=True)

# add month and year columns from index for reshaping

data_select_monthly['Month'] = data_select_monthly.index.month

data_select_monthly['Month_name'] = data_select_monthly.index.month_name()

data_select_monthly['Year'] = data_select_monthly.index.year

data_select_monthly

The output table begins as follows.

Week Ending Date    monthly_influenza_deaths    Month    Month_name    Year
2014-01-31          986.285714                  1        January       2014
2014-02-28          692.857143                  2        February      2014
2014-03-31          520.571429                  3        March         2014
2014-04-30          474.857143                  4        April         2014
2014-05-31          445.714286                  5        May           2014
2014-06-30          382.428571                  6        June          2014
2014-07-31          392.428571                  7        July          2014
2014-08-31          364.285714                  8        August        2014
2014-09-30          361.428571                  9        September     2014
2014-10-31          373.428571                  10       October       2014
2014-11-30          446.857143                  11       November      2014
2014-12-31          502.857143                  12       December      2014
2015-01-31          918.285714                  1        January       2015
2015-02-28          730.000000                  2        February      2015

Figure 5. Weekly Data of Influenza and Pneumonia Deaths Reshaped Into Months

Note that a degree of inaccuracy is introduced here by the division of the weekly values.
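Monthly sums can also be computed by grouping on the calendar month of the index. This is shown as an alternative sketch, not as the method of this guide: the monthly frequency alias accepted by resample is spelled 'M' in older Pandas releases and 'ME' in newer ones, whereas the period alias used below should be accepted by both. The values are invented.

```python
import pandas as pd

# Hypothetical adjusted weekly values (weeks ending 25 Jan, 1 Feb, 8 Feb)
adjusted = pd.Series(
    [130.0, 10.0, 70.0],
    index=pd.to_datetime(["2014-01-25", "2014-02-01", "2014-02-08"]),
)

# Group the rows by the calendar month of their index and sum each group
monthly = adjusted.groupby(adjusted.index.to_period("M")).sum()

print(monthly)
```

Unlike resample, this labels each row with a month period rather than a month-end date, and it does not create rows for months without data.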

This data can now be pivoted into the correct format for plotting a heat map. We first use the month numbers for columns; otherwise, they would be sorted in alphabetical order. We then replace the column labels with abbreviated month names:

data_pivot = data_select_monthly.pivot(index='Year', columns='Month', values='monthly_influenza_deaths')

# first three letters of month names
data_pivot.columns = data_select_monthly.loc['2018', 'Month_name'].str[0:3]

data_pivot

To plot the final version of the heat map, input the following.

fig, ax = plt.subplots(figsize=(14, 6))

sns.heatmap(data_pivot, linewidths=.5, ax=ax, annot=True,
            fmt=",.0f", cmap='YlOrRd', linewidth=0.25,
            cbar_kws={'spacing':'proportional'})

plt.ylim(5,0)

The fmt argument controls how annotation values are displayed with a formatting string without brackets. Here it is set to show float (decimal values) with thousand separator and no decimals. See: https://pyformat.info/#number.
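The format specification can be tried on its own with plain Python, using a value of the same magnitude as the January 2018 count:

```python
value = 1674.285714

# ',' inserts thousands separators, '.0f' rounds to zero decimals
as_fmt = format(value, ",.0f")        # bare specification, as passed to fmt
as_field = "{:,.0f}".format(value)    # same specification inside braces

print(as_fmt)
print(as_field)
```

Both lines print 1,674. The bare form is what the fmt argument expects, while the form with braces is what str.format needs.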

To tweak the output further, we adjust the axis labels, add a title, a source, and add the thousand separators to the color bar (Figure 6). (Note how the formatting string is written slightly differently here.)

plt.tick_params(axis='y', labelsize=12, length=0, labelrotation=0)

plt.tick_params(axis='x', labelsize=10, labeltop=True, length=0, labelrotation=0)

plt.xlabel('Month')

plt.ylabel(None)

fig.suptitle('Monthly deaths from influenza and pneumonia in California 2014–2018',
             x=0.09, horizontalalignment='left', fontweight='bold')

fig.text(s='Source: NCHS, 2018', x=0.7, y=0.05)

colorbar = ax.collections[0].colorbar

# set thousands separator on color bar
colorbar.ax.set_yticklabels(["{:,.0f}".format(i) for i in colorbar.get_ticks()])


The output heat map is titled “Monthly deaths from influenza and pneumonia in California 2014–2018.” The horizontal axis is labeled month and lists months from January to December. The vertical axis ranges from 2014 to 2018, in increments of 1. The data are tabulated below.

Year January February March April May June July August September October November December

2014 986 693 521 475 446 382 392 364 361 373 447 503

2015 918 730 577 521 452 420 379 358 385 417 455 573

2016 679 680 749 521 420 360 421 330 376 403 417 635

2017 985 722 617 511 466 403 360 372 368 399 396 762

2018 1,674 817 771 554 427 389 359 324 319 358 441 485

As the count increases, the intensity of the color shade increases. Text under the map reads, “Source: NCHS, 2018.”

Figure 6. The Finalized Heat Map


2.1.6 Saving the Plot

At this point, you can save your figure with the following code. You can try different file types such as pdf, png, svg, or jpg. The file is written to the directory in which the notebook is saved.

fig.savefig('heat-map.pdf', format='pdf')

Note that if you call plt.show() before exporting the plot, you can end up exporting a blank image. If you use the show function in the same cell, make sure to either save the plot first or comment out the line showing your plot before running the export.

You can also quickly export to a png by right-clicking on the plot in the Notebook and choosing Create new view for outputs. In the Output view window that opens, you can again right-click and save the image.

If you need to adjust the resolution or size of your plot before output, you can experiment with:

fig.set_dpi(150)

fig.set_size_inches(6,6)

Centimeters are sadly somewhat more complicated, because Matplotlib expects sizes in inches. Define a conversion factor such as cm = 1/2.54, followed by fig.set_size_inches(6*cm, 10*cm) for a figure 6 cm wide and 10 cm tall.
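Because one inch equals 2.54 cm, a length in centimeters has to be divided by 2.54 before it is passed to Matplotlib. The following is a minimal sketch of that conversion. The placeholder line plot, the 15 cm by 10 cm size, and the in-memory buffer used in place of a file name are assumptions for illustration only.

```python
import io

import matplotlib.pyplot as plt

cm = 1 / 2.54  # one centimeter expressed in inches

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])  # placeholder content, not the heat map

fig.set_size_inches(15 * cm, 10 * cm)  # 15 cm wide, 10 cm tall
fig.set_dpi(150)

# An in-memory buffer stands in for a file name such as 'heat-map.png'
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)

width_in, height_in = fig.get_size_inches()
print(round(width_in * 2.54, 1), round(height_in * 2.54, 1))
```

Replacing the buffer with a file name writes the image to disk, as in the savefig call shown earlier.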

Note: If you would like to export a vector image with editable text, you will need to include matplotlib.rcParams['pdf.fonttype'] = 42 at the beginning of your notebook. rcParams is a dictionary-like object that holds the default settings for all of Matplotlib. If you work more with Matplotlib, you might want to consider adding some preferred defaults to it. See the Matplotlib documentation on rcParams for more information.
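As a short sketch of how this setting fits into an export, the snippet below sets the font type before any figure is created and then saves a PDF. The figure contents and the in-memory buffer are placeholders assumed for illustration; in the notebook you would save the heat map figure to a file name instead.

```python
import io

import matplotlib
import matplotlib.pyplot as plt

# Embed TrueType fonts so that text remains editable in the exported PDF
matplotlib.rcParams["pdf.fonttype"] = 42

fig, ax = plt.subplots()
ax.set_title("Editable title")  # placeholder content

buffer = io.BytesIO()  # stands in for a file name such as 'heat-map.pdf'
fig.savefig(buffer, format="pdf")
```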

2.1.7 Alternative Version With Classified Color Scale

While heat maps conventionally use unclassified, or continuous, color scales, a classified color scale makes it easier to visually separate value ranges.

The simplest way to achieve this is to create a new palette with the desired number of colors and pass it to cmap:

cmap = sns.color_palette("YlOrRd", 5)

fig, ax = plt.subplots(figsize=(14, 6))

sns.heatmap(data_pivot, linewidths=.5, ax=ax, annot=True, fmt=",.0f",

cbar_kws={'spacing':'proportional'},

cmap=cmap,

linewidth=0.25)

To manually define the locations of the color bounds, we can use some additional Matplotlib functionality. BoundaryNorm creates a colormap index based on discrete intervals. The number of color classes is always one less than the number of bounds provided. The argument cbar_kws={'spacing':'proportional'} sizes the segments of the color bar proportionally to the data values (Figure 7). See the documentation and Matplotlib tutorials for more details: https://matplotlib.org/3.3.1/tutorials/colors/colorbar_only.html#sphx-glr-tutorials-colors-colorbar-only-py.

from matplotlib.colors import BoundaryNorm

from matplotlib import cm # matplotlib color map object collection

#cmap = sns.cubehelix_palette(light=0.9, as_cmap=True) # optionally use a seaborn cubehelix color map

cmap = cm.OrRd # orange-red color map

bounds = [300, 500, 700, 900, 1200, 1700]

norm = BoundaryNorm(bounds, cmap.N)

fig, ax = plt.subplots(figsize=(14, 6))

sns.heatmap(data_pivot, linewidths=.5, ax=ax, annot=True, fmt=",.0f",

cbar_kws={'spacing':'proportional'},

cmap=cmap,

norm=norm,

linewidth=0.25)

plt.ylim(5,0)

colorbar = ax.collections[0].colorbar

# Specify color bar labeling manually at bounds

colorbar.set_ticks(bounds)


The output heat map shows the same monthly counts as the table given for Figure 6.

As the count increases, the intensity of the color shade increases.

Figure 7. The Finalized Heat Map
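To see what the classified scale does to individual values, the following minimal sketch applies the same bounds outside of a plot. It uses plt.get_cmap("OrRd") to obtain the orange-red color map and np.digitize to show the class index. Both are choices made here for illustration and are not part of the code above, and the example values are picked from the data table.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm

cmap = plt.get_cmap("OrRd")  # the orange-red color map
bounds = [300, 500, 700, 900, 1200, 1700]
norm = BoundaryNorm(bounds, cmap.N)

# Example counts from the data table, one per class
values = np.array([324, 521, 762, 986, 1674])

# Class index of each value: 0 for [300, 500), 1 for [500, 700), and so on
class_index = np.digitize(values, bounds) - 1

# BoundaryNorm maps every value within a class to the same color
colors = cmap(norm(values))
same_class = cmap(norm(np.array([330, 490])))

for value, index, color in zip(values, class_index, colors):
    print(value, index, np.round(color, 2))
```

Six bounds define five classes, so the five example values receive five different colors, while 330 and 490 share one.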


2.2 Exploring the Output

The heat map created in this demonstration shows at a glance how recorded deaths from influenza and pneumonia in California follow a seasonal pattern over the period in question (Figure 8). The large number of deaths in January 2018 stands out very clearly. Another readily apparent feature is the somewhat divergent pattern for 2016, when the peak of influenza and pneumonia deaths occurred in March. That year also has the second-lowest overall total (5,991 deaths), just above that of 2014 (5,943).

The horizontal axis is labeled month and lists months from January to December. The vertical axis ranges from 2014 to 2018, in increments of 1. The data are the same as those tabulated for Figure 6.

As the count increases, the intensity of the color shade increases. Text under the map reads, “Source: NCHS, 2018.”

Figure 8. Heat Map of Monthly Influenza and Pneumonia Deaths in California 2014–2018

From the visualization, it would further appear that deaths from influenza and pneumonia often start increasing rather quickly in December and then taper off more slowly over the first three months of the following year. August stands out as the month with the fewest deaths.
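Observations like these can be checked numerically. The sketch below rebuilds the pivot table from the tabulated values and computes yearly totals, monthly totals, and the peak month of each year. It assumes that data_pivot is a pandas DataFrame with years as rows and months as columns, as used in this guide. The names counts, yearly_totals, monthly_totals, and peak_month_by_year are introduced here for illustration.

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Monthly counts copied from the data table (rows: years, columns: months)
counts = {
    2014: [986, 693, 521, 475, 446, 382, 392, 364, 361, 373, 447, 503],
    2015: [918, 730, 577, 521, 452, 420, 379, 358, 385, 417, 455, 573],
    2016: [679, 680, 749, 521, 420, 360, 421, 330, 376, 403, 417, 635],
    2017: [985, 722, 617, 511, 466, 403, 360, 372, 368, 399, 396, 762],
    2018: [1674, 817, 771, 554, 427, 389, 359, 324, 319, 358, 441, 485],
}
data_pivot = pd.DataFrame.from_dict(counts, orient="index", columns=months)

yearly_totals = data_pivot.sum(axis=1)        # total deaths per year
monthly_totals = data_pivot.sum(axis=0)       # total deaths per calendar month
peak_month_by_year = data_pivot.idxmax(axis=1)  # month with most deaths, per year

print(yearly_totals.sort_values())
print("Month with the fewest deaths overall:", monthly_totals.idxmin())
print(peak_month_by_year)
```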

The heat map is a useful choice for data like this and can be a good tool for finding new insights. Since it can be determined that influenza and pneumonia deaths increase during the winter months, one might consider a reworking of the visualization where each row is centered on the new year. The accompanying notebook file includes an example on how one might do this.
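One possible way to center each row on the new year is sketched below. This is not necessarily how the accompanying notebook does it. The sketch assumes a season that runs from July to the following June, and the names long_format, season_start, and season_pivot are hypothetical. The pivot table is rebuilt from the tabulated values so that the snippet runs on its own.

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
counts = {
    2014: [986, 693, 521, 475, 446, 382, 392, 364, 361, 373, 447, 503],
    2015: [918, 730, 577, 521, 452, 420, 379, 358, 385, 417, 455, 573],
    2016: [679, 680, 749, 521, 420, 360, 421, 330, 376, 403, 417, 635],
    2017: [985, 722, 617, 511, 466, 403, 360, 372, 368, 399, 396, 762],
    2018: [1674, 817, 771, 554, 427, 389, 359, 324, 319, 358, 441, 485],
}
data_pivot = pd.DataFrame.from_dict(counts, orient="index", columns=months)
data_pivot.index.name = "year"

# Long format: one row per (year, month) combination
long_format = data_pivot.reset_index().melt(
    id_vars="year", var_name="month", value_name="deaths")
month_number = long_format["month"].map(
    {name: number for number, name in enumerate(months, start=1)})

# July to December belong to the season starting that year,
# January to June to the season that started the year before
season_start = long_format["year"].where(month_number >= 7,
                                         long_format["year"] - 1)
long_format["season"] = (season_start.astype(str) + "/"
                         + ((season_start + 1) % 100).astype(str).str.zfill(2))

# Columns ordered July ... June so that the new year sits in the middle
season_months = months[6:] + months[:6]
season_pivot = long_format.pivot(
    index="season", columns="month", values="deaths")[season_months]
print(season_pivot)
```

season_pivot can then be passed to sns.heatmap in place of data_pivot. The first and last rows are incomplete seasons, so they contain missing values, which seaborn leaves blank.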

The time series here is too short to observe long-term trends. For a longer series, a cycle plot, which shows the year-to-year trend separately for each month, might highlight such changes more clearly.

3. Your Turn

Now that you have been introduced to some of the basic operations necessary to complete this type of visualization, you may experiment with variations based on this same dataset. You can try plotting different causes of death or a different state. How would you accomplish these tasks?
