Learn to Create a Choropleth Map in R with Data from Eurostat (2017)
© 2021 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.
SAGE Research Methods: Data Visualization

Student Guide

Introduction

This tutorial explores creating a choropleth map by visualizing a Eurostat dataset about employment in different sectors across Europe. Choropleth maps, a subtype of the areal unit map, are used to visualize predefined contiguous geographic areas in map form and enable the reader to easily scrutinize the geographic distribution of a value over a given region. The choropleth map uses color or pattern to encode values across geographic regions.

The visualization in this tutorial uses Eurostat data about employment in industry, trade, and service sectors across Europe. Each country is colored based on its proportional number of employees in a given sector (Figure 1).

A choropleth map showing European countries color-coded by data value. Colors range from yellow to red in a multi-hue scale. Text at the bottom of the map reads “Source: Natural Earth, Eurostat 2017.” The map also shows Montenegro, Albania, North Macedonia, and Turkey grayed out.

Figure 1. Different Areal Units: Countries and States

What Is a Choropleth Map?

Choropleth maps are perhaps the best known and most widely used type of areal unit map. Areal unit maps, in general, are composed of a particular geographic region divided into smaller subregions, each of which is usually assigned a color or pattern based on the value of a specific variable in that region.
Choropleth maps are a subtype of areal unit maps in which regions are differentiated largely along predefined geographic boundaries, such as postal code areas, counties, states, or countries. Other types of areal unit maps are discussed in the Variations and Alternatives section below.

The quantitative data used in these maps are most often classified into brackets (also known as bins) for visualizing on an ordinal scale, especially when color density will be the only means of depicting value. Classifying, or binning, the individual values depicted in the map can be done in a variety of ways: for instance, by classifying all data points into arbitrary round-number categories, or into categories delineated by breaks at equal intervals, quantiles, natural breaks (Jenks) in the data, or standard deviation breaks around the mean or median value. Categorizing into quantiles, that is, dividing the dataset into classes with an equal number of items, is a good option for most datasets, though this will always depend on the particular topic and data at hand.

Figure 2 below shows how different binning methods can affect the final result. Both choropleth maps show European countries, but the one on the left uses a fixed interval method and the one on the right uses a quantile binning method.
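The difference between fixed-interval and quantile binning described above can be sketched in a few lines of base R. This is only a minimal illustration, not the tutorial's own code: the `share` vector holds made-up percentages (not Eurostat figures), and the variable names are assumptions chosen for this example.

```r
# Illustrative values only: made-up percentages, not Eurostat figures.
share <- c(12.1, 14.8, 16.3, 17.9, 19.5, 21.2,
           23.7, 26.4, 28.0, 31.5, 33.2, 36.9)

# Fixed (equal) intervals: a break every 5 percentage points from 10 to 40.
fixed_breaks <- seq(10, 40, by = 5)
fixed_class <- cut(share, breaks = fixed_breaks, include.lowest = TRUE)

# Quantiles: five classes holding roughly the same number of items.
quantile_breaks <- quantile(share, probs = seq(0, 1, length.out = 6))
quantile_class <- cut(share, breaks = quantile_breaks, include.lowest = TRUE)

# Compare how many items fall into each class under the two methods.
print(table(fixed_class))
print(table(quantile_class))
```

With fixed intervals the class boundaries are easy to read but the classes can be unevenly filled; with quantiles the classes are more evenly filled but the boundaries fall at less memorable values.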
Data shown by the maps are tabulated as follows:

Fixed interval percentages    Countries
10 to 15                      Norway, U.K., Netherlands, Greece
15 to 20                      Iceland, Ireland, Sweden, Latvia, Denmark, Belgium, France, Spain
20 to 25                      Finland, Lithuania, Germany, Switzerland, Austria, Portugal
25 to 30                      Estonia, Poland, Hungary, Croatia, Italy, Bulgaria
30 to 35                      Slovakia, Romania, Serbia, Bosnia and Herzegovina, Slovenia
35 to 40                      Czech Republic
Missing                       Montenegro, Albania, North Macedonia, Turkey

Quantile percentages          Countries
12.09 to 15                   Norway, United Kingdom, Netherlands, Greece
15.12 to 18                   Iceland, Ireland, Sweden, Denmark, Belgium, Spain
18.69 to 23                   Finland, Latvia, Lithuania, France, Portugal, Switzerland, Austria
23.65 to 28                   Estonia, Germany, Italy, Croatia, Bulgaria
28.03 to 35                   Poland, Czech Republic, Slovakia, Hungary, Romania, Serbia, Bosnia and Herzegovina, Slovenia
Missing                       Montenegro, Albania, North Macedonia, Turkey

Text at the bottom of the map reads “Source: Natural Earth, Eurostat 2017.”

Figure 2. Two Choropleth Maps, Using Fixed Interval and Quantile Binning Methods

Why Use a Choropleth Map?

Choropleth maps are well suited for comparing differences between clearly defined geographical areas. They are often a good map choice when the dataset itself contains some clear geographic area designations. These maps are especially suited for encoding numerical data in relative form, which enables more accurate comparison across disparate regions (see Considerations and Cautions below).
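As a small illustration of what "relative form" means here, the sketch below converts absolute counts into percentages before they would be mapped. The data frame, its column names, and all numbers are hypothetical and chosen only for this example; they are not taken from the Eurostat dataset.

```r
# Hypothetical counts for three made-up regions (not Eurostat data).
regions <- data.frame(
  region          = c("A", "B", "C"),
  employed_sector = c(50000, 120000, 9000),
  employed_total  = c(400000, 500000, 30000)
)

# Relative form: sector employment as a percentage of total employment.
regions$sector_share <- 100 * regions$employed_sector / regions$employed_total

# Sort from highest to lowest share to compare regions on equal footing.
print(regions[order(-regions$sector_share), ])
```

In this made-up case, region B has the largest absolute count while region C has the highest share, which is the kind of difference that mapping raw counts would hide.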
Choropleth maps are not just for visualizing numerical values, however; they can also encode nominal data such as political party preferences by state, country membership in the EU, adherence to international treaties, and much more in the same vein. For example, the very simple figure below (Figure 3) shows which countries are included in the Eurostat Persons employed by NACE Rev. 2 dataset.

A choropleth map showing the countries covered by the dataset at hand, with included countries colored in yellow. Areas not included in the dataset but pictured on the map include Russia, Ukraine, Belarus, Moldova, and Kosovo, which are grayed out. Text at the bottom of the map reads “Source: Natural Earth, Eurostat 2017.”

Figure 3. Simple Choropleth Map, Showing Which Countries are Included in the Dataset

Considerations and Cautions

Areal unit choices: Choropleth maps are entirely dependent on areal unit boundaries, and therefore, careful consideration is needed when choosing the appropriate areal units for visualizing any particular topic. Previously defined administrative areas, for instance, can differ greatly in how their land area relates to their population density, leaving room for visual misinterpretation where larger unpopulated areas have many times the visual weight of densely populated urban areas.
There is also the larger issue in statistical analysis known as the modifiable areal unit problem (MAUP), in which bias can be unwittingly introduced by the choice of areal unit. Fortunately, it is not always necessary to individually determine a suitable areal unit for visualization, as the data itself will often have some predefined area designation. In these cases, the data’s own designation should be used almost without exception. For example, visualizations based on election polling data should use the same polling districts as their areal units to avoid misrepresenting the data. Figure 4 below shows two examples of different areal unit choices.

Two maps side by side. The one on the left is titled “Areal unit: countries” and is split into similarly colored areas by country borders. The one on the right is titled “Areal unit: states and provinces” and shows the corresponding areal units of the same region. Text at the bottom of the map reads “Source: Natural Earth.”

Figure 4. Different Areal Units: Countries and States

Color choices: As in other areal unit maps, color can be used in choropleth maps to encode both qualitative and quantitative data, and the same principles of color apply. On a qualitative color scale, colors usually represent objects that belong in different groups or categories. The goal with a qualitative color scale is usually to create a color palette in which different colors are easily distinguishable, that is to say, one relying heavily on major differences in hue. Particular care should be taken in choosing colors for charts that will feature neighboring areas of color, as small areas can easily become lost between other shapes without a significantly differentiating color, and even larger areas can blend into each other visually if their colors are too similar.
Quantitative color scales, on the other hand, which can be either classified or unclassified, often make use of variation in the lightness of color to show variation in value. Generally, as the value of a variable increases, so does the contrast between the color and its background. Classifying the values depicted in the map into a predefined set of colors, so that classes are encoded as ordinal values, usually makes them easier for the reader to differentiate and is therefore generally preferable to an unclassified continuous color scale. As a general rule of thumb, the human eye can reliably distinguish only 6–7 degrees of lightness in any given hue due to a phenomenon called simultaneous contrast. Consequently, the number of classes visualized using a choropleth map should also ideally be limited to 7 or fewer, depending on your choice of color palette. A sequential single-hue color scale, where value differences are marked only by differences in hue lightness (e.g., white to red), can especially hinder the differentiation of adjoining areas in choropleth maps.
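The idea of a classified color scale with a limited number of classes can be sketched in base R as follows. This is a hedged illustration rather than the tutorial's own method: the values are made up (not Eurostat figures), the yellow-to-red ramp is an assumption modeled loosely on the description of Figure 1, and names such as `class_colors` and `fill` are hypothetical.

```r
# Illustrative values only (made up, not Eurostat figures); NA marks missing data.
share <- c(12.1, 16.3, 21.2, 26.4, 31.5, 36.9, NA)

# Six classes, within the limit of about seven suggested above.
breaks <- seq(10, 40, by = 5)
n_classes <- length(breaks) - 1

# Multi-hue sequential ramp from yellow to red: one color per class.
class_colors <- colorRampPalette(c("yellow", "orange", "red"))(n_classes)

# Assign each value to a class, then look up its color; missing values get gray.
class_index <- cut(share, breaks = breaks, include.lowest = TRUE, labels = FALSE)
fill <- ifelse(is.na(class_index), "gray80", class_colors[class_index])

print(data.frame(share, class_index, fill))
```

The resulting `fill` vector could then be passed to whatever plotting function draws the regions; a ramp that changes hue as well as lightness is used here because neighboring classes tend to be easier to tell apart than on a single-hue scale.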