Learn to Create an Area Cartogram in R With Data From Eurostat (2017)

© 2021 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets. SAGE Research Methods: Data Visualization.

Student Guide

Introduction

This guide introduces the area cartogram, a type of map that belongs to the class of cartogram visualizations. It reshapes an existing map layout to bring out underlying data values that are usually hidden in other types of maps. These map-like visualizations are most useful for highlighting the difference between how the reader perceives a given geographic area and how a particular variable of interest is distributed over the same region. The guide describes what the area cartogram is, the design and data reasons for using it, and its weaknesses and variations.

The visualization uses Eurostat data from 2017 about persons employed in manufacturing work by country in the European Union (EU). For each country, the land area is transformed in scale, proportional to the number of persons employed in the chosen sector in that country. The countries remain located in the same positions relative to their neighbors, and the relative shape of the continent is still distinguishable, but the individual countries change their usual appearance and move from their original location as needed to maintain the original borders.

What Is an Area Cartogram?

Cartograms, also known as anamorphic maps or value-by-area maps, are diagrams that in some part visually resemble the areas they depict and include some amount of geographic information, but are not geographically accurate because they intentionally reshape the land areas to encode non-spatial information. There are two subtypes of cartogram: contiguous and noncontiguous. Noncontiguous cartograms use geometric shapes or geographical regions scaled proportionally so that their surface areas represent some data value, such as population, while their approximate locations are retained. Contiguous cartograms reshape and distort traditional maps by preserving the relative positions and common borders of geographical subdivisions while growing or shrinking their areas based on a chosen data variable (Figure 1).
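The proportional scaling behind a noncontiguous cartogram can be sketched in a few lines of base R. This is a minimal illustration of the general idea, not the method of any particular package: the region names, areas, and values are made up, and keeping the densest region at its original size is an assumed convention.

```r
# Hypothetical regions: names, land areas, and data values are made up for illustration.
regions <- data.frame(
  name  = c("A", "B", "C"),
  area  = c(100, 400, 100),
  value = c(50, 50, 200)
)

# Density of the variable per unit of land area.
density <- regions$value / regions$area

# Linear scale factor per region. The densest region keeps its size (factor 1)
# and all others shrink. The square root converts an area ratio into a length ratio.
regions$scale <- sqrt(density / max(density))

# After scaling, the drawn area of each region is proportional to its data value.
regions$drawn_area <- regions$area * regions$scale^2

print(regions)
```

In this sketch, regions A and B end up with the same drawn area because they share the same value, even though B started with four times the land area.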

In the map, both cartograms show the same data. The contiguous cartogram shades the entire area that lies within a country’s border. The noncontiguous cartogram shades a small area in the shape of the country within a country’s border. Data shown by both cartograms are tabulated as follows:

Number of employees | Countries
0 to 356,817 | Iceland, Norway, Finland, Estonia, Latvia, Lithuania, Denmark, Ireland, Slovenia, Croatia, Bosnia and Herzegovina, Montenegro, Albania, Greece, North Macedonia, Serbia
356,817 to 757,819 | Sweden, Netherlands, Belgium, Portugal, Switzerland, Austria, Slovakia, Hungary, Bulgaria
757,819 to 1,917,714 | Spain, Czech Republic, Romania
1,917,714 to 3,744,271 | United Kingdom, France, Italy, Poland
3,744,271 to 7,409,552 | Germany

Text at the bottom of the image reads:

“Source: Eurostat 2017”


Figure 1. Two Different Types of Area Cartogram of Employment in Manufacturing Sector Work

The most common type of cartogram is the area cartogram. Similar to a choropleth map, an existing geographic area is used as a starting point, though in the area cartogram the surface areas do not represent the true shape of these geographic entities, but the values of other variables inherent in the data. Area cartograms are usually of the contiguous type: different portions of the map may change shape and size, even drastically, but the shared borders remain attached, so the map as a whole stays contiguous. The resulting shapes are often distorted beyond recognition. Area cartograms can also be rendered as noncontiguous cartograms, in which the areas retain their original shapes and approximate locations on the map but are no longer attached at the border.

Why Use an Area Cartogram

Area cartograms are not in themselves necessarily very informative about the finer points of the underlying data, but they can be useful in underscoring the dissonance between how people usually perceive a certain geographic region and the actual distribution of a chosen data variable. Comparing a traditional map of any given country with an area cartogram of its population distribution, for instance, often shows the states or municipalities of that country in a completely different light. Ideally, the reader will more easily understand the relative significance of a given region with respect to a certain variable, free of some of the visual bias introduced by the actual size and surface area of that region. Area cartograms are most often used in this way to map population statistics, but any quantitative data could technically be used. It is best to use area cartograms to represent areas that the reader is familiar with, because a reader who does not know the original shapes has no basis for judging how they have been distorted.

Considerations and Cautions

An area cartogram uses the actual areas of the mapped regions and changes their sizes relative to data variables, and the original geographical regions are often highly irregular in shape. As a result, the resulting areas can sometimes be completely unrecognizable, and differences between the resulting shapes tend to be more difficult to compare than, for example, in Dorling cartograms (see Variations and Alternatives). Additionally, because the existing shapes and areas are used as a baseline for the visualization, the result cannot accurately present the relative differences in the data. The area cartogram is best used only to give a rough glimpse of how data variables differ across predetermined geographic regions.

Color

Color can be used in cartograms to encode both qualitative and quantitative data, and the same principles of color apply as with areal unit maps. On a qualitative color scale, colors usually represent objects that belong in different groups or categories. The goal with a qualitative color scale is usually to create a color palette in which different colors are easily distinguishable, that is to say, relying heavily on major differences in hue.

Quantitative color scales, on the other hand, which can be either classified or unclassified, often make use of variation in the lightness of color to show variation in value. Generally, as the value of a variable increases, so does the contrast between the color and its background. When using color to encode quantitative data, classifying the values depicted in the map into a predefined set of colors, where classes are encoded as ordinal values, is usually easier for the reader to differentiate and therefore generally preferable to an unclassified continuous color scale. As a general rule of thumb, the human eye can reliably distinguish only 6–7 degrees of lightness in any given hue due to a phenomenon called simultaneous contrast. Consequently, the number of classes visualized in a map should ideally be limited to seven or fewer, depending on your choice of color palette.

A sequential single-hue color scale, where value differences are marked only by differences in hue lightness (e.g., white to red), can especially hinder the differentiation of adjoining areas. A better choice in these cases is a sequential multi-hue scale, where changes on the value scale are accompanied by changes in hue (e.g., yellow to red). Another option is the diverging color scale, often used to depict value scales containing both negative and positive values, where the color scale ends in disparate hues adjoined by a neutral third hue in the middle (e.g., blue, white, red). These latter two color palettes also enable the use of a few more classes than the recommended maximum, if necessary, as the effects of simultaneous contrast are diminished with increased variation in hue (Figure 2).
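The three quantitative palette types described above can be generated in base R for a quick comparison. This is a minimal sketch using `colorRampPalette()` from the built-in grDevices package with the example anchor colors named in the text; the choice of five classes is an assumption for illustration.

```r
# colorRampPalette() is part of base R (grDevices). It returns a function
# that interpolates n colors between the given anchor colors.
n_classes <- 5  # assumed number of classes for illustration

# Sequential single-hue: only lightness changes (white to red).
single_hue <- colorRampPalette(c("white", "red"))(n_classes)

# Sequential multi-hue: hue changes along with value (yellow to red).
multi_hue <- colorRampPalette(c("yellow", "red"))(n_classes)

# Diverging: two disparate hues joined by a neutral midpoint (blue, white, red).
diverging <- colorRampPalette(c("blue", "white", "red"))(n_classes)

print(single_hue)
print(multi_hue)
print(diverging)
```

Each call returns a character vector of hex color codes that can be passed to whatever plotting function draws the map fills.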

The color scales shown in the image are listed as follows:

• Qualitative scale: This scale has boxes of markedly different color.
• Quantitative scale
  • Single-hue scale: This scale has boxes of same color but of increasing brightness.
  • Multi-hue scale: This scale has boxes of varying hue.
  • Diverging scale: This scale has boxes of different colors at each end, with a series of neutral-colored boxes between them.

Figure 2. Color Palette Examples: Single-hue, Multi-hue, and Diverging

Establishing good color contrast is good practice overall, keeping in mind readers with differences in color vision. Whenever possible, check chosen color palettes through some form of simulated preview to see what the result looks like for readers with deuteranopia, protanopia, or other differences in color vision (e.g., within Adobe image and vector editing software with different Proof Setups, or with online resources such as the Coblis simulator).

Variations and Alternatives

Choropleth maps are in a sense the basis that an area cartogram is built on: a choropleth map is an area cartogram without its characteristic distortion. Choropleth maps are composed of a particular predefined geographic region divided into smaller subregions, each of which is assigned a color or pattern based on the value of a specific variable. The quantitative data used in these maps are most often classified into brackets (also known as bins) for visualizing on an ordinal scale, especially when color density will be the only means of depicting value. These maps are especially suited for encoding numerical data in relative form, which enables more accurate comparison across disparate regions. However, the land area of these predefined administrative areas often bears little relation to the variable being mapped, so there is ample room for visual misinterpretation: a large area can carry many times the visual weight of a smaller region with a more significant variable value. Area cartograms, in some part, sidestep this issue of visual misrepresentation by changing area shapes to match a particular value (Figure 3).

The Dorling cartogram does not show the map of Europe but only shows geographic location of different countries, represented by circles of various sizes. Data shown by the cartogram are tabulated as follows:

Number of persons employed | Countries
0 to 1 million | Sweden, Poland, Belgium, Switzerland, Austria, Czech Republic, Hungary, Romania, Portugal
1 million to 2 million | Spain, Netherlands, Italy
2 million to 3 million | United Kingdom, France
3 million to 4 million | Germany

The contiguous cartogram shows a map of Europe. Country sizes and shapes are distorted based on data value. Text above the cartogram reads:

“Western European countries have more people employed in administration and support roles

Administrative and support service activities, 2017

Eurostat: Persons employed by NACE Rev. 2 [TIN00151]”

Text in the image also reads:

“Administrative and support service activities, 2017

Eurostat: Persons employed by NACE Rev. 2 [TIN00151]”

Text on the bottom left of the image reads:

“Source: Eurostat 2017”

Figure 3. The Dorling Cartogram (Left) Compared With the Contiguous Cartogram Using the Same Data (Right)

Dorling cartograms are a variation of the noncontiguous cartogram type, in which numerical data using either a ratio or interval scale is displayed in the form of a proportionately sized circle, overlaid on the approximate location of the original geographical feature. Variations with different geometric shapes exist, such as the Demers cartogram, which uses squares instead of circles. In a Dorling cartogram, the chosen shapes should never overlap; instead, close-lying symbols push others to the side. Dorling cartograms are generally among the most useful types of cartograms. They can be a good alternative to an area cartogram or choropleth map because they are not affected by the geographical extents of the mapped regions and instead allow numbers such as population counts to be visualized directly using absolute values. Dorling cartograms also make a more compact visualization possible, allowing closely situated regions with large values to be clearly distinguished. Unlike a choropleth map, the Dorling cartogram is good at clearly showing small regions with large data values, such as tiny but densely populated areas. Especially if the reader is sufficiently familiar with the region depicted, or if the exact geographical locations are not relevant, the Dorling cartogram can give a good overview of the distribution of a variable. Like all visualizations that use area as a visual variable, the Dorling cartogram is not well suited for comparing small differences between data points. Even so, it is often much more readable than other cartogram types.
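The "proportionately sized circle" can be illustrated with a short base R sketch. Because the circle's area, not its radius, should be proportional to the data value, the radius grows with the square root of the value. The counts and the maximum radius below are hypothetical, and the step that pushes overlapping circles apart is not shown.

```r
# Hypothetical counts of persons employed, made up for illustration.
persons <- c(small = 1e6, medium = 4e6, large = 9e6)

# Hypothetical radius (in plotting units) assigned to the largest value.
max_radius <- 30

# Circle area is proportional to the value, so the radius is proportional
# to the square root of the value.
radius <- max_radius * sqrt(persons / max(persons))

# Resulting circle areas, for checking proportionality against the values.
circle_area <- pi * radius^2

print(radius)
print(circle_area / persons)
```

The last line prints the same ratio for every region, which confirms that the drawn areas track the data values.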

Another alternative visualization type is the grid cartogram, which is a type of contiguous cartogram formed from grid cells of equal size. The grid cartogram consists of geometric shapes (usually hexagons, though triangles and squares are sometimes also used) which vary in number according to data variables. This can be a useful cartogram type for visualizing polling data from various districts, for example, though it works best in cases where the ratio of data points to grid cells is 1:1 (e.g., one elected official per polling district).

Illustrative Example: Area Cartograms of Persons Employed in Manufacturing Work in Europe in 2017

In Figure 4, the map on the left is a contiguous area cartogram showing the number of persons employed in manufacturing work in Europe, with country areas scaled corresponding to employment numbers. The same dataset is presented on the right in a noncontiguous area cartogram format. In both cases, countries for which data are missing are binned with the zero-value areas, due to the functionality of the R cartogram package.

In the map, both cartograms show the same data. The contiguous cartogram shades the entire area that lies within a country’s border. The noncontiguous cartogram shades a small area in the shape of the country within a country’s border. Data shown by both cartograms are tabulated as follows:

Number of employees | Countries
0 to 356,817 | Iceland, Norway, Finland, Estonia, Latvia, Lithuania, Denmark, Ireland, Slovenia, Croatia, Bosnia and Herzegovina, Montenegro, Albania, Greece, North Macedonia, Serbia
356,817 to 757,819 | Sweden, Netherlands, Belgium, Portugal, Switzerland, Austria, Slovakia, Hungary, Bulgaria
757,819 to 1,917,714 | Spain, Czech Republic, Romania
1,917,714 to 3,744,271 | United Kingdom, France, Italy, Poland
3,744,271 to 7,409,552 | Germany

Text at the bottom of the image reads:

“Source: Eurostat 2017”

Figure 4. Two Different Types of Area Cartogram of Employment in Manufacturing Sector Work


We set out to show the number of manufacturing employees in European countries. Because these figures are unequally distributed and the countries vary significantly in size, an area cartogram is suitable. The aim of the map is to provide a quick visual overview of the situation rather than precise values. The EU geographical area is arguably familiar and recognizable enough to most readers that its distortion will not cause undue confusion. However, as many of the individual country shapes may be unfamiliar to the reader when removed from their continental context, it is advisable to use the contiguous version of the area cartogram, or a noncontiguous version still showing original country borders. Color in this case encodes the same values as area, and a blue single-hue scale with five bins was chosen to highlight the countries with the highest values while minimizing the detrimental effects of simultaneous contrast in visual comparison. The values were binned by natural breaks (Jenks) before assigning color.
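The binning and color assignment described here can be sketched in base R. The class boundaries are those listed in the table for Figure 4; the country labels, employment counts, and hex colors are hypothetical. The sketch takes the breaks as given rather than computing them; in practice they could be computed with a natural-breaks routine such as `classInt::classIntervals(x, n = 5, style = "jenks")`, which is mentioned only as one possible approach and is not run here.

```r
# Class boundaries as reported in the Figure 4 table.
breaks <- c(0, 356817, 757819, 1917714, 3744271, 7409552)

# Hypothetical employment counts, made up for illustration. NA marks missing data.
employed <- c(X = 120000, Y = 900000, Z = 7409552, W = NA)

# Treat missing values as zero so they fall into the lowest bin,
# mirroring the behavior the guide describes for countries without data.
employed_filled <- employed
employed_filled[is.na(employed_filled)] <- 0

# Assign each value to one of the five classes (1 = lowest, 5 = highest).
bin <- cut(employed_filled, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# A five-step single-hue blue palette (hex codes are an assumed choice).
blues <- colorRampPalette(c("#DEEBF7", "#08519C"))(5)

result <- data.frame(
  region   = names(employed),
  employed = unname(employed_filled),
  bin      = bin,
  fill     = blues[bin]
)
print(result)
```

Using `include.lowest = TRUE` keeps a value of exactly zero inside the first class instead of dropping it.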

The Data

The statistical data are from Eurostat’s Persons employed by NACE Rev. 2 dataset, which contains the number of persons employed, defined as the total number of persons working in various industry, trade, and service sectors, by EU country.

The background map data is from the Natural Earth dataset, modified to include the country codes used by Eurostat and leaving out the French overseas territories.

Interpreting the Chart

The data show that the greatest number of manufacturing employees are located centrally within Europe, and the area cartogram reveals this to the reader very clearly by enlarging the areas of the countries in question. The color coding is based on the same data values as the area and helps the reader further identify how the different countries fall into groups.

The area cartogram is suitable mostly for contrasting actual data values with the reader’s perception of a given geographical area, that is, for showing data (such as population data) that differ significantly from the surface area that would usually be used for visualization in a choropleth map. The ability to distinguish exact data values from the map is not usually paramount in these cases; the map is best used only for challenging the reader’s pre-existing concepts of geographical areas.

Review

This dataset example has demonstrated a type of cartogram, the area cartogram, how it can be used, and how it compares to other visualization types for similar data. One data series from Eurostat’s Persons employed by NACE Rev. 2 dataset was visualized.

You should know:

• What are a cartogram and an area cartogram in particular?
• What kind of data can an area cartogram encode?
• When is an area cartogram an appropriate visualization choice?
• What are the best practices for composing an area cartogram?
• What are the main weaknesses and limitations of this visualization method?

Your Turn

You may now proceed to download the sample dataset and walkthrough guide on how to carry out the visualization in the R statistical software. The sample dataset includes many more employment sectors than those pictured above. You may, for example, experiment with visualizing different sector employment numbers, or with how different cartogram parameters affect the final visualization result.
