Learn to Create a Choropleth Map in R With Data From Eurostat (2017)

© 2021 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets. SAGE Research Methods: Data Visualization.

Student Guide

Introduction

This tutorial explores creating a choropleth map by visualizing a Eurostat dataset about employment in different sectors across Europe. Choropleth maps, a subtype of the areal unit map, are used to visualize predefined contiguous geographic areas in map form and enable the reader to easily scrutinize the geographic distribution of a value over a given region. The choropleth map uses color or pattern to encode values across geographic regions.

The visualization in this tutorial uses Eurostat data about employment in industry, trade, and service sectors across Europe. Each country is colored based on its proportional number of employees in a given sector (Figure 1).

A choropleth map showing European countries color-coded by data value. Colors range from yellow to red in a multi-hue scale. Text at the bottom of the map reads “Source: Natural Earth, Eurostat 2017.” The map also shows Montenegro, Albania, North Macedonia, and Turkey grayed out.

Figure 1. Different Areal Units: Countries and States

What Is a Choropleth Map?

Choropleth maps are perhaps the best known and most widely used type of areal unit map. Areal unit maps, in general, are composed of a particular geographic region divided into smaller subregions, each of which is usually assigned a color or pattern based on the value of a specific variable in that region. Choropleth maps are a subtype of areal unit maps, where the method of differentiating regions is based largely on predefined geographic regions, such as postal code areas, counties, states, or countries. Other types of areal unit maps are discussed in the Variations and Alternatives section below.

The quantitative data used in these maps are most often classified into brackets (also known as bins) for visualizing on an ordinal scale, especially when color density will be the only means of depicting value. Classifying, or binning, the individual values depicted in the map can be done in a variety of ways: for instance, by classifying all data points into arbitrary round-number categories or into categories delineated by breaks at equal intervals, quantiles, natural breaks (Jenks) in the data, or standard deviation breaks around the mean or median value. Categorizing into quantiles, that is, dividing the dataset into classes with an equal number of items, is a good option for most datasets, though this will always depend on the particular topic and data at hand. Figure 2 below shows an example of how different binning can affect the final result.

Both choropleth maps show European countries but the one on the left uses a fixed interval method and the one on the right uses a quantile binning method. Data shown by the maps are tabulated as follows:

Fixed interval percentages    Countries
10 to 15                      Norway, United Kingdom, Netherlands, Greece
15 to 20                      Iceland, Ireland, Sweden, Latvia, Denmark, Belgium, France, Spain
20 to 25                      Finland, Lithuania, Germany, Switzerland, Austria, Portugal
25 to 30                      Estonia, Poland, Hungary, Croatia, Italy, Bulgaria
30 to 35                      Slovakia, Romania, Serbia, Bosnia and Herzegovina, Slovenia
35 to 40                      Czech Republic
Missing                       Montenegro, Albania, North Macedonia, Turkey

Quantile percentages          Countries
12.09 to 15                   Norway, United Kingdom, Netherlands, Greece
15.12 to 18                   Iceland, Ireland, Sweden, Denmark, Belgium, Spain
18.69 to 23                   Finland, Latvia, Lithuania, France, Portugal, Switzerland, Austria
23.65 to 28                   Estonia, Germany, Italy, Croatia, Bulgaria
28.03 to 35                   Poland, Czech Republic, Slovakia, Hungary, Romania, Serbia, Bosnia and Herzegovina, Slovenia
Missing                       Montenegro, Albania, North Macedonia, Turkey

Text at the bottom of the map reads “Source: Natural Earth, Eurostat 2017.”

Figure 2. Two Choropleth Maps, Using Fixed Interval and Quantile Binning Methods
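To make the difference between the two binning methods concrete, the following minimal R sketch classifies a small set of percentages with fixed 5-point intervals and with quantile breaks. The values and variable names are hypothetical, made up for illustration; they are not taken from the Eurostat data or from the walkthrough guide. Only base R functions (cut() and quantile()) are used.

```r
# Hypothetical manufacturing shares (percent); NOT actual Eurostat values.
share <- c(12.1, 14.8, 15.5, 17.9, 19.2, 22.4, 24.0, 27.5, 29.1, 35.6)

# Fixed-interval breaks every 5 percentage points, from 10 to 40.
fixed_breaks <- seq(10, 40, by = 5)
fixed_class <- cut(share, breaks = fixed_breaks, include.lowest = TRUE)

# Quantile breaks: five classes holding (roughly) equal numbers of values.
quantile_breaks <- quantile(share, probs = seq(0, 1, by = 0.2))
quantile_class <- cut(share, breaks = quantile_breaks, include.lowest = TRUE)

# Compare how many values fall into each class under the two methods.
print(table(fixed_class))
print(table(quantile_class))
```

With these made-up values, the fixed intervals leave one class empty and fill the others unevenly, while the quantile breaks place two values in each class. The same contrast is what makes the two maps in Figure 2 look different.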

Why Use a Choropleth Map?

Choropleth maps are well suited for comparing differences between clearly defined geographical areas. They are often a good map choice when the dataset itself contains some clear geographic area designations. These maps are especially suited for encoding numerical data in relative form, which enables more accurate comparison across disparate regions (see Considerations and Cautions below). Choropleth maps are not just for visualizing numerical values, however; they can also encode nominal data such as political party preferences by state, country membership in the EU, adherence to international treaties, and much more in the same vein. For example, the very simple figure below (Figure 3) shows the particular countries which are included in the Eurostat Persons employed by NACE Rev. 2 dataset.

The choropleth map shows the countries addressed by the dataset at hand, with included countries colored in yellow. Areas not included in the dataset but pictured on the map include Russia, Ukraine, Belarus, Moldova, and Kosovo, which are grayed out. Text at the bottom of the map reads “Source: Natural Earth, Eurostat 2017.”

Figure 3. Simple Choropleth Map, Showing Which Countries are Included in the Dataset

Considerations and Cautions

Areal unit choices: Choropleth maps are entirely dependent on areal unit boundaries, and therefore, careful consideration is needed when choosing the appropriate areal units for visualizing any particular topic. Previously defined administrative areas, for instance, can differ greatly in land area and population density, leaving room for visual misinterpretation where larger unpopulated areas have many times the visual weight of densely populated urban areas. There is also the larger issue in statistical analysis known as the modifiable areal unit problem (MAUP), in which bias can be unwittingly introduced by the choice of areal unit. Fortunately, it is not always necessary to individually determine a suitable areal unit for visualization, as the data itself will often have some predefined area designation. In these cases, the data’s own designation should be used almost without exception. For example, if polling districts were used in an election, any visualizations based on the polling data should use the same areal units to avoid misrepresenting the data. Figure 4 below shows two examples of different areal unit choices.

Two maps are shown side by side. The one on the left is titled “Areal unit: countries” and is split into similarly colored areas by country borders. The one on the right is titled “Areal unit: states and provinces” and shows the corresponding areal units of the same region. Text at the bottom of the map reads “Source: Natural Earth.”

Figure 4. Different Areal Units: Countries and States

Color choices: As in other areal unit maps, color can be used in choropleth maps to encode both qualitative and quantitative data, and the same principles of color apply.

On a qualitative color scale, colors usually represent objects that belong in different groups or categories. The goal with a qualitative color scale is usually to create a color palette in which different colors are easily distinguishable, that is to say, relying heavily on major differences in hue. Some care should especially be used in choosing colors for charts that will feature neighboring areas of color, as small areas can easily become lost between other shapes without a significantly differentiating color, and even larger areas can blend into each other visually if their colors are too similar.

Quantitative color scales, on the other hand, which can be either classified or unclassified, often make use of variation in the lightness of color to show variation in value. Generally, as the value of a variable increases, so does the contrast between the color and its background. Classifying the values depicted in the map into a predefined set of colors, where classes are encoded as ordinal values, is usually easier for the reader to differentiate and is therefore generally preferable to an unclassified continuous color scale. As a general rule of thumb, the human eye can reliably distinguish only 6–7 degrees of lightness in any given hue due to a phenomenon called simultaneous contrast. Consequently, the number of classes visualized using a choropleth map should also ideally be limited to 7 or fewer, depending on your choice of color palette.

A sequential single-hue color scale, where value differences are marked only by differences in hue lightness (e.g., white to red), can especially hinder the differentiation of adjoining areas in choropleth maps. A better choice in these cases is a sequential multi-hue scale, where changes on the value scale are accompanied by changes in hue (e.g., yellow to red). Another option is the diverging color scale, often used to depict value scales containing both negative and positive values, where the color scale ends in disparate hues adjoined by a neutral third hue in the middle (e.g., blue, white, red). These latter two color palettes also enable the use of a few more classes than the recommended maximum, if necessary, as the effects of simultaneous contrast are diminished with increased variation in hue (Figure 5).

The color scales shown in the image are listed as follows:

• Qualitative scale: This scale has boxes of markedly different colors, which are used to depict different groups or categories.
• Quantitative scales, in which colors represent variations in value:
  • Single-hue scale: This scale has boxes from white to red, with value changes marked by changes in hue lightness.
  • Multi-hue scale: This scale has boxes from yellow to red, with value changes marked by changing hue.
  • Diverging scale: This scale has boxes of blue and red at each end, with a sliding scale through a neutral color between them.

Figure 5. Color Palette Examples: Single-hue, Multi-hue, and Diverging
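As a rough illustration of the three quantitative scales described above, the sketch below builds five-class palettes with base R's colorRampPalette(). This is only one simple way to generate such colors, by straight interpolation between the named end colors; the variable names are assumptions of this sketch, and purpose-built palettes may give perceptually better results.

```r
# Number of classes; kept at 5, within the "7 or fewer" guideline above.
n_classes <- 5

# Sequential single-hue scale: white to red (lightness changes only).
single_hue <- colorRampPalette(c("white", "red"))(n_classes)

# Sequential multi-hue scale: yellow to red (hue changes along with value).
multi_hue <- colorRampPalette(c("yellow", "red"))(n_classes)

# Diverging scale: two contrasting hues joined by a neutral midpoint.
diverging <- colorRampPalette(c("blue", "white", "red"))(n_classes)

# Each palette is a character vector of hex color codes, lowest class first.
print(single_hue)
print(multi_hue)
print(diverging)
```

Each resulting vector can be passed to any plotting function that accepts a vector of colors, with the first element used for the lowest class.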

Establishing good color contrast is overall a good practice, keeping in mind readers with differences in color vision. Whenever possible, it is recommended to check chosen color palettes through some form of simulated preview to see what the result looks like for readers with deuteranopia, protanopia, or other differences in color vision (e.g., within Adobe image and vector editing software with different Proof Setups, or with online resources such as the Coblis simulator).

Normalizing values: It should also be noted that choropleth maps are not well suited for use with absolute numbers. Ideally, absolute values such as the number of employed people should be normalized to make them comparable across diverse geographic regions, for example, by expressing the number of employed people in relation to the total population. Figure 6 below shows an example of sector employment using absolute numbers, and another with the same numbers in relation to the total number of employed people in that country. The version with absolute numbers, in essence, only shows where most employees are situated on the map.

The left map is titled “absolute numbers” and data shown by it are tabulated as follows:

Number of people    Countries
22,221 -            Iceland, Estonia, Latvia, North Macedonia
154,290 -           Norway, Denmark, Ireland, Belgium, Lithuania, Slovenia, Croatia, Bosnia and Herzegovina
318,494 -           Sweden, Finland, Slovakia, Serbia, Bulgaria, Greece
615,878 -           Netherlands, Portugal, Switzerland, Austria, Hungary, Romania
1,227,974 -         United Kingdom, France, Spain, Germany, Poland, Czech Republic, Italy
Missing             Montenegro, Albania, Turkey

The right map is titled “relative numbers” and data shown by it are tabulated as follows:

Percentage          Countries
12 to 15            Norway, United Kingdom, Netherlands, Greece
15 to 18            Iceland, Sweden, Ireland, Spain, Belgium
18 to 23            Finland, Latvia, Lithuania, France, Portugal, Switzerland, Austria
23 to 28            Estonia, Germany, Italy, Croatia, Bulgaria
28 to 35            Poland, Czech Republic, Slovakia, Hungary, Romania, Serbia, Slovenia, Bosnia and Herzegovina
No data             Montenegro, Albania, North Macedonia, Turkey

Text at the bottom of the map reads “Source: Natural Earth, Eurostat 2017.”

Figure 6. Two Maps Using Absolute and Relative Numbers
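The normalization step itself is a simple division. The sketch below, using hypothetical counts for three unnamed areas (not Eurostat figures) and column names chosen for this example, shows how the area with the largest absolute number is not necessarily the one with the largest share.

```r
# Hypothetical counts, for illustration only (NOT Eurostat figures).
employment <- data.frame(
  area          = c("A", "B", "C"),
  manufacturing = c(1200000, 300000, 150000),   # persons employed in the sector
  total         = c(8000000, 1000000, 1200000)  # persons employed in all covered sectors
)

# Relative value: sector employment as a percentage of total employment.
employment$share_pct <- 100 * employment$manufacturing / employment$total

# Sort by share to see that the ranking differs from the absolute counts.
print(employment[order(-employment$share_pct), ])
```

Here area A has by far the most manufacturing employees, but area B has the highest share (30 percent against 15 percent), which is the kind of difference the two maps in Figure 6 illustrate.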

Variations and Alternatives

There are a number of other areal unit map subtypes besides the choropleth: namely, the grid map, dasymetric map, isarithmic or contour map, and area-class map (Figure 7). All these other subtypes function on the same principle that areas of a map encode a certain value but differ in the means of delineating areal units. Grid maps divide the map into equal-sized units such as 1 × 1 km, while dasymetric maps subdivide predefined borders using information about land use, usually to exclude areas not relevant to the dataset. Contour maps create new organic area shapes by connecting points of similar value and are most often used for displaying continuous scientific quantities such as atmospheric pressure. An area-class map is a type of contour map in which the areas between lines are colored in, much like a choropleth map, but the size of the area itself is based on the nature of the data, and when representing quantifiable data, neighboring areas always represent successive classes. Area-class maps are often also used for nominal data, such as species habitat distributions.

All maps follow a yellow to red multi-hue scale and are described as follows:

• Choropleth map: The map shown has distinct colored boundaries marked on it, based on some pre-defined areal divisions.
• Grid map: The map shown is divided into a colored grid of equal-sized squares.
• Dasymetric map: The map shown is divided up into areas by land use. Areas without data as well as roads between sections are shown in gray.
• Area-class map: The map shown has areas colored by successive value classes.

Figure 7. Grid Map, Dasymetric Map, and Area-Class Map Examples

If a particular dataset does not lend itself to delineation along areal divisions, but still includes a spatial element that needs visualization, it may be advisable to use an alternate type of map instead. For example, dot distribution maps or proportional symbol maps are well suited for visualizing individual point occurrences, such as earthquakes, car accidents, or the number and location of buildings, without using specific borders to encode significant data. It should be noted that from any map with areal divisions, a proportional symbol map can be created simply by locating the symbols over the centroids of the regions.

There is also the Dorling cartogram option, which closely resembles a proportional symbol map, with the difference that the latter uses fixed geographical locations for the data symbols (Figure 8). In a Dorling cartogram, the symbols are positioned next to each other in relative locations, which allows closely situated regions with large values to be seen clearly. In this respect, the proportional symbol map offers little or no benefit of accuracy over the Dorling cartogram, as its circles are simply located at the calculated centroids of the countries, which tells nothing about the actual distribution of the data. Dorling cartograms, much like areal unit maps, are not well suited for comparing small differences between data points.

The Dorling cartogram does not show the map of Europe but only shows relative locations of different countries using circles. Data shown by the cartogram are tabulated as follows:

Number of persons employed    Countries
0 to 1 million                Sweden, Poland, Belgium, Switzerland, Austria, Czech Republic, Hungary, Romania, Portugal
1 million to 2 million        Spain, Netherlands, Italy
2 million to 3 million        United Kingdom, France
3 million to 4 million        Germany

The proportional symbol map shows a map of Europe. A circle is placed on each country, with circle size varying by inherent data value.

Texts above the cartogram read:

Western European countries have more people employed in administration and support roles

Administrative and support service activities, 2017

Eurostat: Persons employed by NACE Rev. 2 [TIN00151]

Texts above the proportional symbol map read:

Administrative and support service activities, 2017

Eurostat: Persons employed by NACE Rev. 2 [TIN00151] proportional symbol map

Texts on the bottom left of the image read:

Source: Eurostat 2017

Figure 8. The Dorling Cartogram (Left) Compared With a Proportional Symbol Map (Right)

Illustrative Example: Employment in the Manufacturing Sector by Country 2017

Figure 9 shows a map of the percentage of people working in each country in the manufacturing sector, in relation to the surveyed industry, trade, and service sectors.

A choropleth map, titled “Manufacturing sector employees as percentage of all employees in industry, trade and service sectors” showing the following data:

Percentage        Countries
12.09 to 15.12    Norway, United Kingdom, Netherlands, Greece
15.12 to 18.69    Iceland, Sweden, Ireland, Denmark, Belgium, Spain
18.69 to 23.65    Finland, Latvia, Lithuania, France, Switzerland, Austria, Portugal
23.65 to 28.03    Estonia, Germany, Italy, Croatia, Bulgaria
28.03 to 35.57    Poland, Czech Republic, Slovakia, Hungary, Romania, Serbia, Bosnia and Herzegovina, Slovenia
Missing           Montenegro, Albania, North Macedonia, Turkey

Text at the bottom of the map reads “Source: Natural Earth, Eurostat 2017.”

Figure 9. A Choropleth Map, a Type of Areal Unit Map

Areal unit maps are particularly suited for comparing areas such as those set out in this dataset, and since the dataset uses clearly defined country borders as an areal unit, and employment numbers aggregated into a set of single statistics per area, a choropleth map, in particular, is a suitable choice. Furthermore, since the dataset does not include geographic data in smaller areas such as postal codes or counties, it is reasonably justified (and indeed the only viable option) to choose country borders as the basis of the choropleth visualization.

Ideally, the dataset would have included relative numbers instead of absolutes. Because the dataset scope does not include population or total employment data for the given countries, a relative number has been produced by comparing the absolute number of persons employed in a sector to the total number of persons employed across the sectors the dataset covers.

It is important to note that the dataset scope of course limits the conclusions we are able to draw from this visualization. Some particular sectors are intentionally excluded from the survey scope, such as education, healthcare, agriculture, and so on. Consequently, this visualization cannot give a definite account of the population working in manufacturing across Europe; it can only show relative values within the dataset’s own scope constraints. This type of limitation should always be clearly denoted in the visualization title, if possible. Alternatively, supplemental data about population or employment numbers could be included to enable further conclusions from this initial dataset.

The colors chosen are on a multi-hue scale from yellow to red, to minimize the detrimental effects of simultaneous contrast in cell-to-cell visual comparison. The values were binned into quantiles, a good option for most datasets, which, in this case, further highlighted the relative weight of manufacturing industry work in the East.
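The design choices described in this example (quantile classes, a yellow-to-red scale, and gray for countries without data) can be sketched in base R as a lookup from value to fill color. The shares and area labels below are hypothetical, and the gray shade and variable names are assumptions of this sketch rather than settings taken from the walkthrough guide.

```r
# Hypothetical manufacturing shares (percent); NA marks an area without data.
share <- c(a = 12.1, b = 17.9, c = 19.2, d = 24.0, e = 35.6, f = NA)

# Quantile breaks for five classes, ignoring the missing value.
breaks <- quantile(share, probs = seq(0, 1, by = 0.2), na.rm = TRUE)

# Integer class codes 1..5; areas without data stay NA.
class_id <- cut(share, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Yellow-to-red multi-hue palette, one color per class.
pal <- colorRampPalette(c("yellow", "red"))(5)

# Look up each area's fill color; areas without data are drawn in gray.
fill <- pal[class_id]
fill[is.na(class_id)] <- "grey80"

print(data.frame(area = names(share), share = share, class = class_id, fill = fill))
```

The resulting fill vector has one color per area and could be joined to map polygons by area code before plotting.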

The Data

The data used in this tutorial is a 2017 subset of Eurostat’s Persons employed by NACE Rev. 2 dataset, which contains the number of persons employed in the industry, trade, and service sectors, defined as the total number of persons working in each sector by country.

The background map data is from the Natural Earth dataset, modified to include the country codes used by Eurostat and leaving out the French overseas territories.

Interpreting the Chart

The data showed that while Western and Central Europe have the largest absolute number of manufacturing employees, Eastern European countries have a relatively larger portion of their workforce dedicated to these tasks. The choropleth map reveals this to the reader very clearly and in a visually effective way, using administrative boundaries familiar to the reader. The color coding based on data values helps the reader identify how the different countries fall into percentage groups.

The choropleth map is generally a good choice for showing data that includes predefined regional boundaries, such as countries or states, though it needs relative data to be truly effective.

Review

This Student Guide has introduced you to the concept of areal unit maps, which are composed of a particular geographic region divided into smaller subregions, usually with a color or pattern assigned to each subpart based on a particular value. We have particularly explored choropleth maps, which are a subtype of areal unit maps using predefined regional boundaries as their means of areal delineation. Choropleth maps are useful for visualizing datasets with clear regional designations and can encode both qualitative and quantitative (although ideally relative) data.

You should know:

• What are an areal unit map and a choropleth map?
• What kind of data can a choropleth map encode?
• When is a choropleth map an appropriate visualization choice?
• What are the best practices for using color and categorization?
• What are the main weaknesses and limitations of this visualization method?

Your Turn

You may now proceed to download the sample dataset and walkthrough guide on how to carry out the visualization in the R statistical software. The sample dataset includes many more employment sectors than the ones pictured above. You may, for example, experiment with visualizing different sector employment rates, or how different classification breakpoints affect the final visualization result.
