<<

Rectangular Statistical in R: The recmap Package

Christian Panse Swiss Federal Institute of Technology Zurich

Abstract drawing is a technique for showing -related statistical informa- tion, such as demographic and epidemiological data. The idea is to distort a by resizing its regions according to a statistical parameter by keeping the map recognizable. This article describes an R package implementing an algorithm called RecMap which approximates every map region by a rectangle where the area corresponds to the given statistical value (maintain zero cartographic error). The package implements the compu- tationally intensive tasks in C++. This paper’s contribution is that it demonstrates on real and synthetic how recmap can be used, how it is implemented and used with other statistical packages.

Keywords: cartogram, spatial , , demographics, R.

1. Introduction

The idea to generate a cartogram is to distort a map by resizing its regions according to a given statistical parameter, but in a way that keeps the map recognizable. These so-called cartograms or value-by-area maps may be used to visualize any geo-spatial related data, e.g., political, economical, or data. There exists several algorithms to compute so- called contiguous cartograms. An overview on historic, hand-drawn, and computer generated cartograms can be found in Tobler(2004); Nusrat and Kobourov(2016). For using contiguous cartograms, the diffusion-based method of Gastner and Newman(2004) is available through the R packages Rcartogram and getcartr Lang(2016); Brunsdon and Charlton(2014). An alternative approach to contiguous cartograms is to entirely relax the map topology by approximating each map region by basic geometric objects like rectangles or circles (Dorling arXiv:1606.00464v2 [stat.CO] 24 Mar 2017 1996). Such rectangular cartograms can be generated from geolocation and statistical data. Hence, they provide a useful alternative, even if there are no boundaries available or some statistical values are missing. First rectangular cartograms were drawn by hand following a system of (Raisz 1934). Recent research publication on rectangular cartogram drawing include van Kreveld and Speckmann(2004, 2007); Buchin, Speckmann, and Verdon- schot(2012); Buchin, Eppstein, L ¨offler, N¨ollenburg, and Silveira(2016). However, according to a recent publication, both variants of recmap are the only rectangular cartogram algorithms that “maintain zero cartographic error” (Nusrat and Kobourov 2016, section 5.4). The recmap package discussed in this article contains an implementation of the RecMap 2 recmap: Compute the Rectangular Statistical Cartogram

(Map Partition variant 2) algorithm Heilmann, Keim, Panse, and Sips(2004) to draw maps according to given statistical parameter. A typical usage of a cartogram based visualization is demonstrated in Figure1.

Maine

New Hampshire Vermont North Dakota Michigan Washington Minnesota Wisconsin New New Massachusetts Montana South Michigan Dakota Jersey Idaho Oregon Wyoming Nebraska Iowa York Rhode Connecticut Indiana Island Nevada Utah Colorado Kansas Illinois Pennsylvania Ohio Delaware Oklahoma Arkansas Kentucky California West Arizona Virginia Maryland D.C. New Mexico Louisiana Missouri Texas Virginia Tennessee North Carolina

Mississippi Georgia South Alabama Carolina

Florida

Figure 1: A rectangular statistical cartogram of the U.S. election in 2004 is drawn. The area corresponds to the number of electors. Color is indicating the outcome of the vote. Democrats are represented by the color blue and republicans are represented through red coloring. Regions with low saturation, e.g., Ohio, Pennsylvania, and Florida, mark states with a tight outcome of the vote (also known as swing states). The election cartogram was computed by using the original implementation of the construction heuristic RecMap MP2 introduced by Heilmann et al. (2004). map source: U.S. Census Bureau; election data source: http://www.electoral-vote.com/, November 2004.

The article is organized as follows: The next section defines the input and output of the recmap algorithm and the objective functions. In Section3, the usage of the recmap package using the R shell is demonstrated. Section4 discusses major implementation details and provides some benchmark and performance studies. Section5 describes how two metaheuristics can be used to find an optimized cartogram drawing. In Section6, a number of applications are presented. Section7 summarizes and discuss the approach. Christian Panse 3

2. Problem definition and objective functions

The input consists of a map represented by overlapping rectangles R = (r1, . . . , rn). Each map region rj contains:

• a tuple of (x, y) values corresponding to the (longitude, latitude) position,

• a tuple of (dx, dy) of expansion along (longitude, latitude),

• and a statistical value z.

The (x, y) coordinates represent the center of the minimal bounding boxes (MBB). The co- ordinates of the MBB are derived by adding or subtracting the (dx, dy) tuple accordingly. A tuple (dx, dy) defines also the ratio of the corresponding map region. The statistical values (z1, . . . , zn) define the desired area of each map region. The ordering Π is an index vector taken from the permutation set P erm(n).

The output is a rectangular cartogram R where the map regions are:

• non overlapping,

• connected,

• rectangles are placed parallel to the axes.

Furthermore, for each map region the following criteria have to be satisfied:

• the area is equal to the desired area derived from the as input given statistical value z,

• the ratio, dy/dx, is preserved.

The recmap construction heuristic is a function

n×2 n×3 n×2 n×2 f : R × R>0 × P erm(n) → R × R>0 (1) which transforms the set of input rectangles R and a permutation Π into a rectangular cartogram

R = f(R, Π) (2) so that important spatial constraints, in particular

• the topology of the dual graph G(R,E), defined by the overlapping input rectangles,

• the relative position of map region centers, 4 recmap: Compute the Rectangular Statistical Cartogram are tried to be preserved. If the output satifies these criteria, the rectangular cartogram is denoted as feasible solution. The following equation were introduced by Keim, North, and Panse(2004, Definition 1). The desired area A˜j of a map region rj is defined as

Pn ˜ i=1 A(ri) Aj = zj · Pn (3) i=1 zi where the area of the rectangle r is defined by A(r) = 4 · dx · dy (4)

The objective functions for area dA, shape dS (ratios of the MBBs), relative positon dR, and map topology dT , are as defined and described by Heilmann et al. (2004, equations 2–4):

dA = dA(R, R) (5) n X = |Aj − A˜j| (6) j=1

dS = dS(R, R) (7) n X = |(dyj/dxj) − (dyj/dxj)| (8) j=1

dT = dT (R, R) (9) |E \E | + |E \E | = a a a a , (10) |Ea ∪ Ea|

dR = dR(R, R) (11) n−1 n 2 X X = · | (r , r ) − (r ˜ , r˜ )| (12) n · (n − 1) ] i j ] i j i=1 j=i+1 |{z} |{z} ∈R ∈R

Note, if ∀j ∈ {1, . . . , n} : Aj = A˜j ⇒ dA = 0 and if ∀j ∈ {1, . . . , n} : dyj/dxj = dyj/dxj ⇒ dS = 0. n = |R| and E denotes the edges of the dual graph. dT determines the differences in the pseudo dual graphs. dR measures the angle differences between all pairs of rectangle 1 centers of the input map and output cartogram by using ](x, y) defined as follows :

](x, y) = arctan2(x, y) (13) To find an optimal rectangular statistical cartogram out of all feasible solutions, as it will be intended in this manuscript using recmap, the optimization problem can be expressed as follows:

minimize a · dR + b · dT with a, b ∈ R≥0 (14) Π∈P erm(n)

subject to dA = 0, dS = 0. (15) 1Implemented using the C++ method std::atan2 – “Computes the arc tangent of y/x using the signs of arguments to determine the correct quadrant.” http://en.cppreference.com/w/cpp/numeric/math/atan2, access: 2017-01-01. Christian Panse 5

Term Description Equation R = (r1, . . . , rn) overlapping input rectangles (x, y) coordinates represent the center of a rectangle (dx, dy) expansion along x and y axes z statistical value of a rectangle G(R,E) dual graph of R Π permutation A(r) area of a rectangle A˜j desired area of a map region j 3 dA area error5 dS shape error8 dT topology error 10 dR relative position error 12 2 ](x, y) angle between two points x and y in R 13 f construction heuristic1 R output / cartogram2

Table 1: The table provides a glossary of the used algebraic terms.

To find a sufficiently good solution for the optimization problem a metaheuristic, as described in section5, will be applied.

3. The package usage

3.1. Input The U.S. map on state level is often used to compare cartogram algorithms. To generate a useful dual graph, which is a requirement of the algorithm, the input map regions have to overlap. For the state.x77 data used in this section this can be done by correcting lines of longitude derived by the square roots of the area values (see Figure2 left).

R> US.map <- data.frame(x=state.center$x, + y = state.center$y, + dx = sqrt(state.area) / 2 / (0.7 * 60 * cos(state.center$y * pi / 180)), + dy = sqrt(state.area) / 2 / (0.7 * 60) , + z = sqrt(state.area), + name = state.name) R> head(US.map)

x y dx dy z name 1 -86.7509 32.5901 3.209890 2.704478 227.1761 Alabama 2 -127.2500 49.2500 14.005670 9.142338 767.9564 Alaska 3 -111.6250 34.2192 4.859044 4.017906 337.5041 Arizona 4 -92.2992 34.7336 3.338204 2.743370 230.4431 Arkansas 5 -119.7730 36.5341 5.902177 4.742415 398.3629 California 6 -105.5130 38.6777 4.923602 3.843727 322.8730 Colorado 6 recmap: Compute the Rectangular Statistical Cartogram

Please note, in general recmap is not transforming the geodetic datum, e.g., WGS84 or Swiss- grid. Furthermore, geospatial positions have to be mapped from the surface to the plane. This has to be done prior the the cartogram generation. An overview of can be found in Snyder(1997). Map projections aim to optimize towards different objectives, e.g., conformal mapping (preservation of local angles) or area mapping (preservation of area). In cartogram publications the map projection aspect is often neglected and conformal errors are accepted. However, studying U.S. cartograms, a cylindrical projection seems to be the pro- jection of choice. The R user can find support for a wide variety of adequate map projections due to using the mapproj package by McIlroy, Brownrigg, Minka, and Bivand(2015).

3.2. Run The algorithm takes a data.frame object having the column names c(’x’, ’y’, ’dx’, ’dy’, ’z’, ’name’), here the US.map object, as input. Additional control is given by the index order and the (dx, dy) values. The index order Π defines in which order the dual graph is explored and the (dx, dy) values define which map region are adjacent.

R> library("recmap") R> # Generate the rectangular statistical cartogram. R> US.cartogram <- recmap(US.map)

3.3. The output The recmap method returns an S3 class object c(’recmap’, ’data.frame’) of the trans- formed map. Additional columns in the result contain for topology and relative position error defined in Equations 10 and 12. The column dfn.num indicates in which order the dual graph has been explored.

R> head(US.cartogram)

x y dx dy z name dfs.num 1 -79.27226 10.40598 3.916894 3.300161 227.1761 Alabama 23 2 -129.22283 63.71115 8.181797 5.340748 767.9564 Alaska 49 3 -126.86923 30.51915 4.819169 3.984933 337.5041 Arizona 34 4 -87.19357 10.42302 3.994415 3.282650 230.4431 Arkansas 39 5 -137.00972 33.40013 5.311325 4.267664 398.3629 California 35 6 -115.35010 62.23315 4.851078 3.787109 322.8730 Colorado 48 topology.error relpos.error relposnh.error 1 6 0.3917837 0.7848231 2 6 0.5901459 1.9059785 3 3 0.2447048 0.3498989 4 8 0.5624989 1.0728192 5 4 0.1861821 0.2596710 6 10 0.7576109 1.4491548

Input and output of the recmap function can be visualized using the S3 method plot.recmap. The output of the R code snippet can be seen in Figure2. As default the plot.recmap Christian Panse 7 function places the name attribute in the centre of each rectangle. To avoid overplotting of the labels, the text is scaled by using cex = dx / strwidth(name) as argument for the text function. However small statistical values result in small label areas. This problem can be circumvented by using an interactive visualization, e.g., using shiny the function hoverOpts enables the “mouseover” feature. Please note: while the input US.map is not classified as a recmap class, the plot.recmap function can not be dispatched by the S3 class system and has to be explicitly called.

R> op <- par(mfrow = c(1, 2), mar = c(0, 0, 0, 0)) R> plot.recmap(US.map, col.text = 'darkred') R> plot(US.cartogram, col.text = 'darkred')

A summary method implements the calculation of some meta data including the objective functions.

R> summary(US.cartogram)

values number of map regions 50.000000 area error 0.000000 topology error 266.000000 relative position error 0.420000 screen filling [in %] 36.893585 xmin -146.967409 xmax -34.722706 ymin 7.105818 ymax 73.190904

North Dakota

South Dakota Alaska Colorado Iowa

Nebraska Nevada Missouri Illinois Indiana Ohio

AlaskaWashington North Dakota Washington Montana Minnesota Montana Minnesota Kentucky Maine Pennsylvania

South Dakota Wisconsin Vermont North Carolina Oregon New Hampshire Idaho Michigan New York Maine Wyoming Vermont Massachusetts Oregon Wisconsin Iowa Connecticut Nebraska Pennsylvania Wyoming Michigan New York Idaho New Hampshire Maryland IllinoisIndiana Ohio New Jersey Utah Rhode Island Maryland

Delaware Nevada Utah Colorado West Virginia Kansas Kansas Missouri Massachusetts Kentucky Virginia New Jersey California West Virginia California Oklahoma Tennessee Virginia Oklahoma North Carolina Arizona New Mexico Delaware Connecticut Arkansas New Mexico Hawaii Tennessee Arizona South Carolina Mississippi Alabama Georgia Hawaii Texas South Carolina Texas Louisiana Louisiana Georgia

Florida

Mississippi Florida

Arkansas Alabama

Figure 2: The “usage” example, generated from the “US State Facts and Figures” datasets package, was used for drawing a rectangular map approximation. The input set of overlapping rectangles is shown left. A feasible solution thereof generated with recmap is on the right side. The state area, original size of the map region, is used as statistical value. 8 recmap: Compute the Rectangular Statistical Cartogram

4. Implementation recmap is implemented in C++ using features provided by the C++-11 standard. The input and output data transfer between R and the C++ recmap class is handled by using the Rcpp (Eddelbuettel 2013) mechanism. The C++ recmap class itself consists of a std::vector of map_region.A map_region contains all the (x, y, dx, dy, z) values, a std::vector of type int to it neighbor map regions, and some additional help variables to ease the error computation. In general the construction algorithm follows the map partition 2 (MP2) procedure described in Heilmann et al. (2004). The local placement function can place any rectangle next to another rectangle as it is demonstrated in Figure3. The current implementation starts with the original bearing α of the two map region centers (see Figure3 where β = 0). If the placement does not lead to a feasible solution, the angle β is added to α. β is iterating π between [0, π] with step size of 180 and a changing sign until a placement without overlap has been found. If no placement can be found, the algorithm considers all adjacent placed map regions. If also in a later step during the DFS a map region can not be placed, a non feasible solution is accepted. This situation is often caused because the construction algorithm is hampered by the input configuration of the map regions. Solving this can be very compute expensive and often the procedure leads to a solution which will be rejected by the metaheuristic due to the bad fitness value.

6 2 β=0 β=−0.14 β=0.27 fpl : R × [− π, π] → R

B B

A A A B

β=−0.41 β=0.55 β=−0.69

A B B

A A A

y return value B

α argument −3.14 −2.36 β=0.82 β=−0.96 β=1.1 −1.57 −0.79 0 0.79 1.57 2.36 B 3.14 A A A

B B

x return value

Figure 3: The rainbow colored rectangles graph the feasable positions of a local placement function fpl to place a rectangle B around a given rectangle A of any angle α between [−π, π] (left figure). There are special cases for quadrant I, II, III, and IV indicated by different colors π in the graphic. On the right figure, B should be placed around A starting with an angle of 4 (this is the initial relative position in the input map). Since the rectangle can not be placed without overlap (indicated by red) an angle β is added.

Furthermore, no genetic algorithm (metaheuristic) has been implemented. Here recmap will use the GA package available on CRAN as demonstrated in section5. The most computa- tionally expensive part is the computation of MBB intersections which has to be performed Christian Panse 9 to achieve feasible solutions, multiple times, for each placement step. In the package version 0.2.1 these tests were performed by iterating over each map region. All later versions use a std::multiset data structure and a std::lower_bound algorithm of the C++ Standard Template Library (STL) to reduce the search space. The time complexity for one recmap run is O(n2), where n is the number of regions. A depth first search (DFS) run is visiting each map region only once and therefore it has time complex- ity O(n). For each map_region placement a constant number of MBB intersection are called (max 360). MBB check is implemented using std::multiset container, and the functions std::insert, std::upper_bound, and std::upper_bound. The time complexity for all of these functions are reported on http://www.cplusplus.com/reference/stl as O(log(n)). However, the worst case scenario for a range query is O(n), iff dx or dy cover the whole x or y range. The boxplots in Figure4 (left plot) show that the number of MBB intersection test calls could be reduced by using a std::multiset data structure. For this benchmark, synthetic checkerboards with a number of map regions in the interval of 22,..., 202 were generated using the R function checkerboard. For each checkerboard size recmap was called 100 times using different index orderings of the checkerboard input. The index order of the input records have a direct impact how the DFS is traversing the map. This characteristic will later be used for the metaheuristic. For a performance study the checkerboard bench described above was repeated using the rbenchmark package by Kusnierczyk(2012) to measure the proc.time. The study was per- formed on two systems (using one core only). An Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz running a Debian 8 using a 3.16.0-4-amd64 Kernel and a 3 GHz Intel Core i7 (Apple MacBook Pro) running OS X 10.11.4 (15E65) using a Darwin 15.4.0 Kernel. The middle graphic in Figure4 displays the resulting mean aggregated measured data of the two (hardware / OS) systems using the two different implementations of the MBB intersection test. Beside the fact that the Linux system can not benefit from the more efficient implementation of the MBB intersection test, the graphic (middle) shows even for an input size of 202 = 400 map regions, a rectangular cartogram can be computed in less than a second.

0.7 L

A recmap 0.3.0 MacOSX i7 A 500 recmap 0.3.0 MacOSX i7 1e+07

● ● L recmap 0.3.0 Debian i5 ● ● L ● 0.6 A recmap 0.2.1 MacOSX i7 ● ● A ● 200 ● ● ● L recmap 0.2.1 Debian i5 ● ● ● ● ● L ● ● ● ● ● ● ● ● A ● 0.5 ● ● ● ● ● ● ● ● L ● 100 ● ● ● L 1e+05 ● ●

0.4 L

L A 50

● A ● L

0.3 L A A

L 20 A L mean $user.self [in seconds] mean $user.self

1e+03 A 0.2 L A

number of MBB intersection calls number ● ● L AL A 10 AL ● ● 0.1 L A ● L

A A 5 recmap > 0.2.1 L A L A A of rectangular cartogramsnumber per second generated ● L AL A A recmap <= 0.2.1 AL AL AL AL AL AL A A A 0.0 1e+01 5x5 10x10 15x15 20x20 5x5 10x10 15x15 20x20 5x5 US 10x10 15x15 20x20

number of map regions number of map regions number of map regions

Figure 4: The boxplot (left) graphs the number of MBB intersection test calls for a given checkerboard size. The graph in the middle displays the mean aggregated measured com- putation time. The scatterplot (right) displays how many rectangular cartograms can be generated within one second depending on the board size.

The right plot in Figure4 was derived from the performance study (plot in the middle). 10 recmap: Compute the Rectangular Statistical Cartogram

It shows the number of rectangular cartograms which can be generated within one second depending on the number of map regions. The grey vertical line indicates the number of the U.S. map regions. The ability to generate a high number of cartogram candidates in a short time period is an important requirement for any metaheuristic as it is demonstrated in the next section.

5. Choose a metaheuristic

The design of the recmap algorithm is as follows: First, compute a set of feasible solutions. In a second evaluation step choose the best solution. In the current implementation, variations can be introduced by changing the index order Π of the input data. This has a direct impact to the DFS traversal and leads to different cartogram layouts. The objective functions, equations 5 to 12, can be used to evaluate the result and to define a fitness function which has to be maximized. The following R code defines the fitness function which will be used as default.

R> recmap:::.recmap.fitness

function (idxOrder, Map, ...) { Cartogram <- recmap(Map[idxOrder, ]) if (sum(Cartogram$topology.error == 100) > 0) { return(0) } 1/sum(Cartogram$relpos.error) }

Other variants of fitness functions will lead to different results as it is shown in (Heilmann et al. 2004, Figure 4). Since it is not possible to compute and evaluate all permutations, which is n!, random ex- periments are conducted. In the following section it is demonstrated how two metaheuristic, GRASP and GA, can be used to find a optimal layout for the rectangular statistical cartogram. For a visual evaluation, of the in this section proposed metaheuristics, an 8x8 checkerboard will be used as input map. The map is generated using the checkerboard method

R> Checkerboard <- checkerboard(8) R> summary(Checkerboard)

values number of map regions 64.00 area error 0.27 topology error NA relative position error NA screen filling [in %] 100.00 xmin 0.50 xmax 8.50 Christian Panse 11 ymin 0.50 ymax 8.50 and can be seen in Figure5 (left). If the assumption is made, that for each vertex, the cyclic order of edges in the contiguous cartogram remains the same as as in the input map, checkerboards provide examples of sets of map regions which do not have ideal cartogram solutions (Keim et al. 2004, Definition 2, Lemma 1, Fig. 3.).

5.1. Greedy Randomized Adaptive Search Procedures One group of optimizer is “trivial to efficiently implement” (Feo and Resende 1995) and can directly benefit from a parallel environment is called greedy randomized adaptive search procedures (GRASP). The R method recmapGRASP defines a generic GRASP implementation as described in (Feo and Resende 1995, Fig. 1). The recmapGRASP function generates a set of rectangular cartograms which have different layouts caused by the random sampling. Each cartogram is evaluated. The best candidate is saved. The following command will generate cartogram solution based on a GRASP metaheuristic.

R> set.seed(1) R> res.GRASP <- recmapGRASP(Checkerboard) R> plot(res.GRASP$Cartogram, + col = c('white', 'white', 'white', 'black')[res.GRASP$Cartogram$z])

A drawing of the cartogram can be found in Figure5 (middle). For some types of input maps, GRASP can generate amazing results in a short time. As it can be seen in Figure5, for the checkerboard map, GRASP is outperformed by the genetic algorithm, introduced in the next section. The plot in Figure6 (right) shows that the solution process runs too fast into a saturation.

5.2. Lessons learnt from biological evolution In this paragraph a constraint-based genetic algorithm (GA) as discussed in Heilmann et al. (2004) is used as metaheuristic. Here the construction heuristic benefits from the existence of the GA package by Scrucca(2013). The GA configuration used for recmap was directly derived from the traveling salesperson problem (TSP) example (Scrucca 2013, section 4.8) using the permutation type of the ga method. As genotype the index order Π of the input map is used. The following command generates an almost perfect rectangular cartogram for the checkerboard map having 64 map regions on the author’s laptop2 within 60 seconds.

R> recmap.GA <- ga(type = "permutation", + fitness = recmap:::.recmap.fitness, + Map = Checkerboard, + min = 1, max = nrow(Checkerboard), + popSize = 64, + maxiter = 1000,

2Mac Book Pro from 2015, please find the hardware specification in Table2. 12 recmap: Compute the Rectangular Statistical Cartogram

+ maxFitness = 1.7, + parallel = TRUE, + pmutation = 0.25)

The metaheuristic stops when a maximum number of iteration has been performed or the fitness value is higher than 1.7. Having reached a fitness value of 1.7 using the recmap.fitness function the result in Figure5 (right) looks like an almost “optimal” solution. The recmapGA function is a higher level wrapper function to glue the recmap construction heuristic with the metaheuristic ga.

R> res.GA <- recmapGA(Checkerboard) R> summary(res.GA) R> plot(res.GA$Cartogram, + col=c('white', 'white', 'white', 'black')[res.GA$Cartogram$z])

The recmapGA function returns a list of the input Map, The solution of the GA, and a recmap object containing the cartogram. The resulting cartograms using a GA can be seen in Figure5 (right). The red line in Figure6 (left) indicates in which order the rectangles were placed using the DFS numbering. The red • symbol marks the first placed rectangle and the  the last one.

a8 b8 c8 d8 e8 g8 h8 b8 c8 d8 e8 g8 h8 f8 a8 f8 b8 c8 h8 a8 e8 f8 g8 d8 a7 b7 c7 d7 e7 g7 h7 a7 b7 c7 e7 f7 g7 h7 f7 d7 a7 b7 c7 e7 f7 g7 h7 d7 g6 a6 b6 c6 d6 e6 f6 g6 h6 a6 b6 d6 h6 a6 g6 c6 b6 c6 d6 e6 f6 h6 e6 f6 a5 g5 a5 b5 c5 d5 e5 g5 h5 a5 b5 c5 d5 b5 c5 d5 f5 h5 f5 e5 e5 f5 g5 c4 h5 b4 c4 d4 a4 b4 c4 d4 e4 g4 h4 a4 b4 d4 a4 e4 f4 g4 f4 g4 h4 e4 h4 d3 f4 a3 c3 e3 f3 g3 a3 b3 c3 d3 e3 g3 h3 b3 c3 d3 g3 b3 h3 f3 f3 h3 a3 e3 b2 d2 g2 h2 a2 b2 d2 e2 g2 h2 b2 c2 d2 g2 h2 a2 c2 e2 f2 c2 f2 a2 e2 f2 h1 d1 f1 h1 a1 b1 c1 d1 e1 f1 g1 a1 b1 c1 d1 e1 f1 g1 h1 a1 b1 c1 e1 g1

Figure 5: A comparison of the input map, GRASP, and GA is displayed. As input map a 8x8 checkerboard (left) has been generated. The area of a black box needs to be four times as large as the area of a white box. The cartogram in the middle proposes a solutions generated by a GRASP metaheuristic. The right cartogram graphs a solution computed by a genetic algorithm within one minute.

Figure7 illustrates the variability in solutions, dependent on the initial seed value. The experiment was repeated twice to demonstrate the effect that the same seed values lead to the same permutation order Π and finally to the same cartogram construction.

R> fitness.weighted <- function (idxOrder, Map, ...) + { + Cartogram <- recmap(Map[idxOrder, ]) + if (sum(Cartogram$topology.error == 100) > 0) { Christian Panse 13

c8 b8 h8 1.6 a8 e8 f8 g8 d8 b7 h7 a7 e7 f7 g7 1.4 c7 d7

a6 g6 b6 c6 d6 e6 f6 h6 1.2 a5 g5 b5 c5 d5 f5 h5 e5 1.0 c4 ● b4 d4 recmap.fitness a4 e4 f4 g4 h4 0.8 d3 a3 c3 e3 f3 g3 b3 h3 GA 0.6 GRASP b2 d2 g2 h2 a2 c2 e2 f2 0.4

h1 a1 b1 c1 d1 e1 f1 g1 0 10 20 30 40 50 60

elapsed time [in seconds]

Figure 6: The left map graphs the order of the placement during the cartogram generation. The right plot displays the recmap.fitness values versus elapsed time of the two metaheuris- tics (right).

a4 b4 c4 d4 b4 d4 b4 c4 d4 b4 d4 a4 c4 a4 a4 c4 d3 a3 b3 c3 b3 d3 d3 a3 b3 c3 a3 c3 a3 c3 d2 d3 b3 a2 c2 a2 c2 b2 d2 b2 d2 a2 b2 d2 a1 b1 c1 d1 a2 c2 c2 a1 b1 d1 b1 d1 b1 d1 b2 a1 c1 c1 a1 c1

Πforward = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) Π1 = (1, 10, 9, 12, 15, 14, 13, 7, 3, 2, 5, 4, 8, 6, 11, 16) Π2 = (12, 8, 4, 10, 15, 5, 16, 11, 2, 6, 14, 1, 7, 13, 9, 3) Π3 = (10, 9, 2, 6, 7, 15, 16, 8, 4, 14, 13, 3, 11, 5, 12, 1)

c3 b4 d4 b4 c4 d4 b4 d4 a4 c4 a4 a4 c4

a4 b4 c4 d4 b3 d3 d3 a3 b3 c3 a3 c3 a3 c3 d3 b3 b3 d3 a3 a2 c2 b2 d2 b2 d2 a2 b2 d2 b2 c2 d2 a2 c2 c2 a2 a1 b1 d1 b1 d1 b1 d1 a1 b1 c1 d1 a1 c1 c1 a1 c1

Πreverse = (16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1) Π1 = (1, 10, 9, 12, 15, 14, 13, 7, 3, 2, 5, 4, 8, 6, 11, 16) Π2 = (12, 8, 4, 10, 15, 5, 16, 11, 2, 6, 14, 1, 7, 13, 9, 3) Π3 = (10, 9, 2, 6, 7, 15, 16, 8, 4, 14, 13, 3, 11, 5, 12, 1)

Figure 7: This graphical example illustrates the variability in solutions, dependent on the initial seed value in Πseed. The left column displays forward and reverse index orders. All other columns show the results from duplicate seeds {1, 2, 3} to demonstrate that the same seed will lead to the same index order and to the identical layout of the cartogram. 14 recmap: Compute the Rectangular Statistical Cartogram

+ return(0) + } + + S <- summary(Cartogram) + dT <- max(Cartogram$topology.error) + dR <- S[4,] + dE <- (100 - S[5,]) / 100 + + # weighting the objectives + 1 / (c(0.2, 0.6, 0.2) %*% c(dT, dR, dE)) + }

R> set.seed(2) R> US.map.best <- recmapGA(Map = US.map, + fitness = fitness.weighted, + maxiter = 100, + maxFitness = 100, + popSize = 50, + keepBest = TRUE, + pmutation = 0.35, + parallel = TRUE)

The regtangular map approximations in Figure8 demonstrate the continuous improvement of the feasible solutions with an increasing number of generations using the GA as metaheuristic for the data used in Section3. Note that the metaheuristic of the space filling Quad Tree (Finkel and Bentley 1974) based RecMap MP1 variant could also be realized by using the GA package. Here, instead of a permutation, a binary representation of decision variables has to be chosen. This can be done by setting the type attribute of the ga function to ’binary’. The genotyp, given as a binary vector, is defining the split type of the Quad Tree data structure. A 1 is applying a vertical split while a 0 triggers a horizontal split.

6. Application

This section applies recmap to some real world maps having numbers of map regions of different magnitudes and different kind of topology. Beside Figures1 and9 the examples focus more on the demonstration of the drawing characteristics of the recmap method itself and less on the information visualization scopes. Figure 13 demonstrates how recmap objects can be transformed to SpatialPolygonsDataFrame objects.

US State Facts and Figures based cartograms are displayed in Figure9. The data are available from the data frame state.x77. On the cartograms two statistical data are displayed using the area and the color of a map region. The colormap was generated by using the heat_hcl function of the colorspace package by Zeileis, Hornik, and Murrell(2009) (red is low; white is high). The code below is a wrapper function for the recmap and the ga functions. A tuple of state.x77 column names is given as input. Christian Panse 15

●●●●●●●●●●●●●●●●●●●●●●● 50

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●● 40 ●●●●●●● ●●●●

0.46 ●●●●●●●●●●●●●●●●●●●● 30

●● ● 20

● ● order Index 0.42 Fitness value Best ● ● ● ● ● ● ● ● ● ●

Mean● ● 10 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●Median ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● 0.38 ● ●● ● ● ● ● ● ● ● ● ● ● ● ● 0● ● 20 ● 40 60 80 100 20 40 60 80 100 ● ● Generation● Generation

Nebraska Illinois

North Dakota North Dakota Michigan Washington Minnesota Montana

Alaska Wyoming Wisconsin Michigan South Dakota Illinois Alaska New York Indiana New Hampshire Nebraska Idaho Washington Iowa Indiana Maine Oregon Montana Vermont Maine Minnesota Maryland Vermont Idaho Wisconsin New York South Dakota Utah Kansas Pennsylvania Oregon Massachusetts Nevada Massachusetts New Hampshire Pennsylvania

Ohio West Virginia Iowa Connecticut Rhode Island Utah Connecticut Rhode Island Wyoming Ohio Virginia Missouri Oklahoma New Jersey Nevada New Jersey West Virginia Maryland California Tennessee Arkansas Missouri Oklahoma Kentucky Delaware Delaware Arizona North Carolina Arkansas Hawaii North Carolina California New Mexico Kentucky Colorado Arizona Tennessee Alabama Colorado Kansas South Carolina Hawaii Texas South Carolina Virginia New Mexico Mississippi Mississippi Georgia

Georgia Alabama Texas Louisiana Louisiana Generation 1 Florida Generation 3 Florida dT = 10 dT = 10 dR = 0.34 dR = 0.34 dE = 45 dE = 51

South Dakota

Iowa Nebraska

Michigan Wisconsin Utah Colorado Montana North Dakota Missouri Oregon Indiana Pennsylvania Kansas Tennessee Illinois Indiana Idaho South Dakota West Virginia Utah Texas Kentucky New Hampshire Alaska Maine Alaska Maine Washington North Dakota Vermont South Carolina Montana Colorado Kansas Minnesota Washington Vermont

Georgia New Hampshire Minnesota New York Massachusetts New York Wisconsin Michigan Connecticut Oregon Nevada Wyoming Iowa Massachusetts Idaho Wyoming Connecticut Rhode Island Virginia Maryland Missouri Rhode Island Nebraska New Jersey New Jersey Illinois Ohio Maryland Tennessee Oklahoma Delaware Nevada California New Mexico North Carolina Pennsylvania California Delaware Arizona Arkansas Ohio Oklahoma Arizona West Virginia New Mexico North Carolina Arkansas Kentucky Hawaii Georgia Hawaii Virginia

South Carolina

Mississippi Louisiana Alabama Texas Alabama Florida Generation 23 Generation 27 Florida dT = 9 Mississippi dT = 9

dR = 0.51 Louisiana dR = 0.43 dE = 44 dE = 37

Idaho South Dakota

Wyoming Michigan Michigan North Dakota Maine Montana Vermont Maine Montana Idaho Minnesota Vermont North Dakota Indiana New York Ohio Pennsylvania New York Massachusetts New Hampshire Wisconsin Minnesota Ohio Massachusetts New Hampshire Alaska Wyoming Maryland Delaware Pennsylvania Connecticut Rhode Island Washington Wisconsin Nebraska Iowa Maryland Connecticut Rhode Island Alaska South Dakota Iowa West Virginia Washington New Jersey Illinois New Jersey

West Virginia Virginia Missouri Kentucky Virginia Illinois Oregon Delaware Nebraska Nevada Utah Colorado Indiana Oregon Nevada Utah Colorado Missouri Kentucky Oklahoma North Carolina Tennessee North Carolina Mississippi California Arkansas

Tennessee Arkansas Arizona California South Carolina Kansas Kansas South Carolina Hawaii Arizona Georgia Mississippi New Mexico Georgia Hawaii New Mexico Louisiana Texas Texas Florida Alabama Florida Oklahoma Alabama Generation 77 Louisiana Generation 100 dT = 9 dT = 9 dR = 0.37 dR = 0.23 dE = 48 dE = 48

Figure 8: The plot on the top left displays the fitness value during the increasing genera- tion of the genetic algorithm. The image plot on the top right visualizes the genotype (Π) change from one generation to the other using a grey colormap for encoding the index order. The six phenotypes visualize the improvement of the solution with an increasing number of generations. 16 recmap: Compute the Rectangular Statistical Cartogram

R> library("colorspace") R> recmap_state_x77 <- + function(input, Map = US.map, DF = state.x77, cm = heat_hcl(10)){ + # join + Map <- cbind(Map, DF, match(Map$name, row.names(DF))) + + # filter + Map <- Map[!Map$name %in% c("Hawaii", "Alaska"), ] + + # set attribute for desired area + Map$z <- Map[, input$area] + + ptm <- proc.time() + res <- recmapGA(Map = Map, + popSize = 300, maxiter = 30, run = 10, parallel = TRUE) + + # set attribute for the coloring + S <- Map[res$GA@solution[1, ], input$color] + col.idx <- round((length(cm) -1) * (S - min(S)) / (max(S) - min(S))) + 1 + + # have fun + plot(res$Cartogram, col = cm[col.idx], col.text='black') + legend("bottomleft", c(paste("area:", input$area), + paste("color:", input$color)), cex=1.5) + + res$time.elapsed = (proc.time()-ptm)[3] + res + }

As input map the US.map defined in Section3 is used. An interactive shiny (Chang, Cheng, Allaire, Xie, and McPherson 2016) web application, using this method, provides more combi- nation of attributes and is available through https://recmap.shinyapps.io/state_x77/.

R> op <- par(mfrow = c(4, 1), mar = rep(0.25, 4), bg = "white") R> set.seed(1) R> cartogram.x77 <- lapply(list(list(color = "Area", area = "Population"), + list(color = "HS Grad", area = "Murder"), list(color = "HS Grad", + area = "Income"), list(color = "Life Exp", area = "Illiteracy")), + recmap_state_x77) R> par(op)

R> op <- par(mar = c(5, 5, 3, 3), mfrow = c(4, 1)) R> res <- lapply(cartogram.x77, function(x) { + plot(x$GA) + }) R> par(op) Christian Panse 17

● ● ● ● ● ● ● ● ● ● Wisconsin Vermont Maine

New Hampshire ● ●

Michigan 0.09 New York

Massachusetts Washington ● ● ● ●

North Dakota Oregon Montana Minnesota Indiana 0.08 ● ● ● Nevada South Dakota Rhode Island ● ● Connecticut

Idaho Wyoming Nebraska Ohio Kansas Iowa ● ● ● Pennsylvania Utah 0.07 New Jersey California Colorado Illinois Kentucky Arizona West Virginia Missouri Delaware 0.06 Fitness value

Maryland Oklahoma Tennessee North Carolina ● Best

0.05 ● Arkansas Georgia ● New Mexico ● ● ●Mean● Mississippi ● ● Alabama ● ● ● ● ● ● ● ● ● South Carolina ● ● ● ● Virginia ● ● ● Median

Texas Louisiana

0.04 ● Florida 5 10 15 20 area: Population color: Area Generation

Vermont

● ● ● ● ● ● ● ● ● ●

Maine New Hampshire

Washington New York

North Dakota Ohio Massachusetts Montana

Minnesota 0.07 South Dakota Michigan Oregon Rhode Island Idaho Connecticut

Wyoming Illinois Indiana Pennsylvania Maryland Nebraska

Wisconsin New Jersey

West Virginia Nevada Utah Iowa Kentucky Delaware

Virginia 0.06 Kansas

California Arizona New Mexico Tennessee Fitness value Oklahoma Mississippi North Carolina South Carolina 0.05 Colorado ● Best ● Louisiana ● ●● Mean Georgia ● ● ● ● Arkansas ● Median Texas ● ● 0.04 Missouri Florida Alabama 2 4 6 8 10 area: Murder color: HS Grad Generation

Maine ● ● ● ● ● ● ● ● ● ●

Washington

New Hampshire 0.09

Minnesota ● ● ● ● ● ● Oregon ● ● ● Idaho Montana North Dakota ● ● ● ● ● Indiana Vermont Massachusetts 0.08 Pennsylvania Illinois New York South Dakota Nevada Wyoming Wisconsin Michigan West Virginia Connecticut Rhode Island 0.07

Nebraska Iowa Maryland California Utah

Tennessee Ohio New Jersey Fitness value 0.06 New Mexico Virginia Arizona Kansas Missouri Kentucky ● Delaware Best

0.05 ● ● ● ● ● South Carolina ● ● ● ● ● ● Mean ● ● ● ● ● ● ● ● ● Colorado Oklahoma Arkansas ● ● Mississippi Georgia ● ● Median North Carolina ● 0.04 Louisiana Texas Alabama Florida 5 10 15 20 area: Income color: HS Grad Generation

Vermont Maine

New Hampshire ● ● ● ● ● ● ● ● ● ● 0.08 Pennsylvania New York Connecticut Ohio Rhode Island

Montana Minnesota ● ●

Washington North Dakota Wisconsin Michigan Maryland New Jersey

South Dakota Idaho Wyoming Massachusetts 0.07 Nevada Oregon Iowa Nebraska Illinois Indiana West Virginia Delaware

California Kansas Virginia

New Mexico Arizona Tennessee 0.06 Arkansas Oklahoma

North Carolina Fitness value South Carolina Utah Colorado Kentucky 0.05 ● Best Missouri ● ● ● ● ● ● Mean● ● ● ● ● ● Alabama Georgia Median

Texas 0.04 Louisiana ●

Florida 2 4 6 8 10 12 area: Illiteracy Mississippi color: Life Exp Generation

Figure 9: Rectangular Statistical Cartograms using the “US State Facts and Figures” dataset are drawn (see also https://recmap.shinyapps.io/state_x77/). The plots on the right column display the fitness values versus the generation of the genetic algorithm during the optimization process. 18 recmap: Compute the Rectangular Statistical Cartogram

Del Norte County Modoc County Siskiyou County

Butte County

Shasta County Tehama County Placer County ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Humboldt County Shasta County Nevada County Trinity County ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● El Dorado County

Napa County Lassen County Sacramento County Sonoma County 0.035 Yolo County Tehama County Solano County Plumas County Marin County ● ● ● ● ● ● ● ● ● ● ● Butte County Contra Costa County San Joaquin County Glenn County Sierra County San Francisco County Mendocino County Yuba County Nevada County Madera County Colusa County Merced County Lake County Sutter County Placer County Alameda County ● San Mateo County ● ● El Dorado County 0.030

Yolo County Alpine County Stanislaus County Sonoma County Napa County Amador County Sacramento County Santa Cruz County

Solano County Santa Clara County

Calaveras County Fresno County Marin County Tuolumne County Mono County

Contra Costa County San Joaquin County

Alameda County Tulare County Stanislaus County Monterey County Mariposa County Kings County

San Mateo County

Madera County Santa Clara County Kern County Merced County 0.025 Santa Cruz County Santa Barbara County

Fresno County San Luis Obispo County Fitness value San Benito County Inyo County Ventura County Monterey County San Bernardino County Tulare County Kings County Los Angeles County ● Riverside County Best● 0.020 ● ● ● ● ● ● ● ● ● ● ● San Luis Obispo County ● ● ● ● ● Kern County ● ● ● ● ● ● ● Mean● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● San Bernardino County ● ● Median Santa Barbara County ●

Ventura County Orange County Los Angeles County

0.015 ●

Orange County Riverside County San Diego County Imperial County 0 10 20 30 40 50

San Diego County Imperial County Generation

Larimer County ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Sedgwick County Weld County ● ● ● Logan County Jackson County Larimer County Boulder County ● ● ● ● ● ● ● ● ● ● ● Moffat County Phillips County

0.035 ● ● ● ● ● Routt County Weld County ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Broomfield County Morgan County

Grand County Boulder County Washington County Yuma County Rio Blanco County Adams County Gilpin County

Denver County 0.030 Denver County ● Garfield County Clear Creek County Eagle County Summit County Arapahoe County

Jefferson County Logan County

Morgan County Douglas County Kit Carson County

Lake County Elbert County Pitkin County Adams County Park County Otero County Lincoln County Prowers County 0.025 Delta County Mesa County Teller County

El Paso County Cheyenne County Summit County Jefferson County

Chaffee County Moffat County Gunnison County Routt County Grand County Rio Blanco County Fitness value

Fremont County Garfield County Montrose County Kiowa County Eagle County Crowley County Lake County Arapahoe County Park County

Mesa County Delta County 0.020 ● Ouray County Pueblo County Best Saguache County Custer County Douglas County Elbert County

San Miguel County Otero County Bent County Prowers County Montrose County Pitkin County Chaffee County Fremont County ●

San Miguel County Gunnison County Mean San Juan County Dolores County Hinsdale County ● ● ● ● Alamosa County ● ● Teller County ● ● Mineral County Las Animas County ● ● Montezuma County La Plata County ● ● ● ● Archuleta County ● ● ● ● ● ● Rio Grande County Huerfano County ● ● ● ● ● ● Conejos County ● ● ● ● ● ● ● ● Median Alamosa County ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Las Animas County El Paso County 0.015 ● ● ● ● Montezuma County La Plata County Costilla County Baca County ● Archuleta County Conejos County

0● 10 20 30 40 50 Pueblo County

Generation

Nassau County

Okaloosa County Holmes County Santa Rosa County Escambia County Jackson County Jackson County Okaloosa County Santa Rosa County Escambia County Walton County Washington County Okaloosa County Duval County Gadsden County Gadsden County Nassau County Bay County Leon County Madison County Hamilton County Leon County Calhoun County Jefferson County Baker County Duval County ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● Bay County Liberty County Wakulla County Alachua County ●●●●●●●●●●●●●●●●●●●●●● Columbia County ●●●●●●●●●● Clay County St. Johns County Wakulla County Suwannee County Levy County

Lafayette County Union County Taylor County Gulf County Bradford County Clay County St. Johns County Sumter County ●●●●●●●●●●●●●●●●● Citrus County Putnam County ●●●●● Franklin County Flagler County ●●●●●●●●●●●●●●●●●●●●●●●●●● Gilchrist County ●●●●●●● Alachua County 0.07 Marion County Dixie County Putnam County Volusia County

Flagler County

Hernando County ● Levy County Lake County ●●●●●●●● Marion County ●●●●● Seminole County ●●●●●●●●●●●●●●●●● Volusia County Pasco County 0.06 Citrus County ●● Lake County Brevard County Seminole County

Sumter County Orange County Hernando County Orange County Hillsborough County

Pinellas County Pasco County Brevard County Indian River County 0.05

Polk County Osceola County

St. Lucie County Polk CountyOsceola County Manatee County Pinellas County Hillsborough County

DeSoto County Martin County Highlands County Indian River County Fitness value

Sarasota County 0.04 Hardee County

Manatee County Okeechobee County St. Lucie County Highlands County ● ● Charlotte County ●● ●●● Palm Beach County ● ● ●● DeSoto County ● ●● ● ● Sarasota County ● ● ● ● ●●●● ●● ● ● ● Martin County ●● ● ● ● ● Hendry County ●● ● ● ● ● Best● ●● ● ●●●●● ● ● ●●●●●● ● ●● ●● ● ● ●●● Glades County ● ●● ● ● ● ● ●●● ●● ● ● Lee County ●● ● ● ● ●● ● ●● Charlotte County ● ● ● ●● ●● ● ● ●● ● ● ● ●● ● ● ● ●●●●● ● ●● ● ● ●● ● 0.03 ● ● ●● ● ●● ● ●● ● Mean Palm Beach County ● ● ● ●● ● ● ●●●● ● Lee County Collier County ●●● ● Hendry County Broward County ●●● ●● ● Median ●●● ● Collier County Broward County 0.02 ● 0 50 100 150

Miami−Dade County Miami−Dade County

Monroe County Monroe County Generation

Sussex County Sussex County Bergen County Passaic County ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Passaic County

Bergen County Morris County

Warren County Morris County

Essex County 0.14 Hudson County Essex County Hudson County ● ● ● Union County ● ● ● ● ● ● ● ● ● ● ● ● ● ● Somerset County

Hunterdon County Somerset County ● ● ● ● ● ● ● ● ● ● ●

Middlesex County

Union County 0.12

Warren County

Mercer County Monmouth County

Hunterdon County ●

Middlesex County 0.10 Fitness value ● ● Burlington County ● ● Ocean County ● ● ● ● ● ● ● Camden County ● ● ●● ● ● ● Best● Mercer County Monmouth County ● ● ● ● ● Gloucester County ● ● ● ● 0.08 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Mean● ● ● ● ● ● Salem County ● ● ● Atlantic County ● Median ● Burlington County

Ocean County Cumberland County Camden County 0.06 Gloucester County 0 10 20 30 40 50 Atlantic County

Cape May County Salem County

Cumberland County Cape May County Generation

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Clinton County St. Lawrence County

Clinton County Jefferson County ● ● Oswego County

Warren County Franklin County St. Lawrence County Oneida County

Orleans County Niagara County Onondaga County Monroe County Wayne County Saratoga County Albany County Fulton County ● ●

Schenectady County

0.04 ● Ontario County Essex County Genesee County Rensselaer County Tompkins County Madison County Jefferson County Erie County Steuben County Chemung County Otsego County Greene County Lewis County Ulster County Chautauqua County Tioga County Broome County Sullivan County Hamilton County Dutchess County

Warren County Putnam County Oswego County Herkimer County Orange County Washington County

Orleans County Oneida County Niagara County Wayne County Rockland County Monroe County Fulton County Westchester County

Saratoga County 0.03

Cayuga County Onondaga County Genesee County Madison County Montgomery County Ontario County Schenectady County Erie County Seneca County Livingston County Rensselaer County Wyoming County

Yates County Cortland County Otsego County Schoharie County Albany County

Chenango County Bronx County Tompkins County Schuyler County Fitness value Chautauqua County ● Cattaraugus County Steuben County Greene County Allegany County Columbia County ● Tioga County Broome County Delaware County ● Chemung County ●

0.02 ● ● ● ● ● ● ● ● ● ● ● ● New York County ● ● ● ● ● ● ● ● Ulster County ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Dutchess County ● ● Sullivan County ● ● Best ● ● Suffolk County ● Nassau County Putnam County ● Mean Orange County Queens County

Rockland County Westchester County Median

Kings County 0.01 ● Suffolk County Richmond County

Nassau County

Queens County 0 10 20 30 40

Generation

Figure 10: U.S. input maps (left) of California, Colorado, Florida, New Jersey, and New York on county level were used as input to compute 2010 census population cartograms (middle; top to bottom). On the right column the fitness values versus the generations are displayed. Christian Panse 19

US population cartograms on county level showing cartograms of California, Colorado, Florida, New Jersey, and New York are displayed in Figure 10. The map material was extracted from the maps package by Becker, Wilks, Brownrigg, Minka, and Deckmyn(2016) and the population data were retrieved from the noncensus package by Ramey(2014). The map regions were joined over the fips (Federal Information Processing Standard) county codes using the counties data frame. The cartograms were generated by using the genetic algorithm as metaheuristic.

A population cartogram of Switzerland on community (Gemeinde) level is drawn in Figure 11. The rectangles of the original map were extracted from an ESRI shape file of the map data Landschaftsmodelle: GG25 from the Federal Office of (swisstopo) using shapefiles by Stabler(2013). The following attributes were extracted for each map region: box, Gemeindecode, and Gemeindename. There are 2300 rectangles to place. The statistical values (population 2013, published in 2015) were downloaded from Swiss Statis- tics (Regionalportr¨ats: Kennzahlen aller Gemeinden (je-d-21.03.01) Bundesamt fur¨ Statis- tik BFS http://www.bfs.admin.ch/bfs/portal/de/index/regionen/02/daten.html) and joined by the Gemeindecode attribute with the swisstopo map. On the web application, avail- able through https://recmap.shinyapps.io/CH-Gemeinden/, the reader can play with the rectangular population cartogram of Switzerland and different statisical values provided in the previous mentioned data for coloring the cartogram regions.

A Swiss railway passenger frequency cartogram is graphed on the bottom of Fig- ure 12. The upper visualization shows the overlapping rectanglues of all 400 geo-locations which define the pseudo dual of the map. The data were retrieved from https://data.sbb. ch/explore/ and contain already the longitude and latitude coordinates of of the railway main station and stops.

The UK Brexit EU-referendum is shown as final example in Figure 13. The UK bound- ary file were downloaded from https://census.edina.ac.uk and joined by the column name geo_code and Area_Code with the the outcome of the referendum downloaded through http://www.electoralcommission.org.uk/ on July 3th. This example also demonstrate the usage of the sp package by Bivand, Pebesma, and Gomez-Rubio(2013). Through using the function recmap2sp the recmap class can be transformed into a SpatialPolygonsDataFrame object. The following code snippets show how the summary and spplot methods can be used.

R> DF <- rbind(data.frame(Pct_Leave = UK$Map$Pct_Leave, + Pct_Turnout = UK$Map$Pct_Turnout, + Pct_Rejected = UK$Map$Pct_Rejected, + row.names = UK$Map$name), + data.frame(Pct_Leave =44.22, + Pct_Turnout = 62.69, + Pct_Rejected = 0.05, + row.names = 'Northern\nIreland')) R> UK.sp <- recmap2sp(add_NI(UK.recmap) , DF) R> summary(UK.sp) 20 recmap: Compute the Rectangular Statistical Cartogram

Baden

Aarau

Frauenfeld

Dietikon Kloten

Kreuzlingen Adliswil

Basel Schaffhausen

Bülach Allschwil

Wettingen Baar Riehen

Köniz Zürich Winterthur

Reinach (BL)

Wetzikon (ZH)

La Chaux−de−Fonds

Dübendorf Biel/Bienne

Emmen Rapperswil−Jona

Uster Luzern Bulle

Horgen Montreux Wädenswil St. Gallen

Yverdon−les−Bains Nyon Renens (VD) Zug Kriens

Chur

Neuchâtel

Bern Vevey Lugano Thun

Lausanne Fribourg

Meyrin

Sion

Vernier Genève

Carouge (GE) Lancy

Figure 11: A rectangular population cartogram of Switzerland is shown. map data source: Swiss Federal Office of Topography using Models / Boundaries GG25, downloaded 2016-05-01; statistical data: Bundesamt fur¨ Statistik (BFS), Website Statistik Schweiz, down- loaded file je-d-21.03.01.xls on 2016-05-26. An interactive shiny app is available through https://recmap.shinyapps.io/CH-Gemeinden/. Christian Panse 21

Basel SBB Brugg AGBaden Winterthur Zürich Flughafen Wil Zürich Oerlikon St. Gallen AarauLenzburg ZürichZürichZürich AltstettenHardbrückeStettbach HB Olten Zürich StadelhofenUster Wetzikon

Rapperswil Zug Biel/Bienne Luzern Neuchâtel Bern

Chur Fribourg Thun

Lausanne Vevey

Genève

Zürich Flughafen

Schaffhausen

Zürich Oerlikon Bülach

Zürich Altstetten Zürich Hardbrücke Effretikon Wil

Baden Stettbach Aarau Zürich HB Lenzburg Olten Winterthur Basel SBB Dietikon

Liestal

Brugg AG St. Gallen

Thalwil

Zürich Stadelhofen

Neuchâtel Biel/Bienne Uster

Wetzikon Zug Rapperswil Vevey Luzern Fribourg Chur Bern Wädenswil Nyon

Visp Thun Renens VD Genève

Lausanne

Figure 12: A Swiss railway passenger frequency cartogram is shown on the lower map. The upper graphic displays the overlapping rectangles of the input map. source: sbb.ch, 2016- 05-12. 22 recmap: Compute the Rectangular Statistical Cartogram

Object of class SpatialPolygonsDataFrame Coordinates: min max x -121259.4 1389447 y -340948.2 1243144 Is projected: NA proj4string : [NA] Data attributes: Pct_Leave Pct_Turnout Pct_Rejected Min. :21.38 Min. :56.25 Min. :0.03000 1st Qu.:47.38 1st Qu.:70.19 1st Qu.:0.06000 Median :54.34 Median :74.30 Median :0.07000 Mean :53.29 Mean :73.69 Mean :0.07283 3rd Qu.:60.49 3rd Qu.:77.89 3rd Qu.:0.08000 Max. :75.56 Max. :83.57 Max. :0.24000

R> spplot(UK.sp, col.regions=diverge_hcl(19)[1:16], layout=c(3,1))

7. Summary

This article introduces the CRAN recmap package which implements the RecMap MP2 algo- rithm. This method generates rectangular statistical cartograms. Two outstanding features of the implemented algorithm are: the areas of the map regions represent the exact statis- tical value without any area error and the ratios of the map regions are preserved. These constraints are important for the correct interpretation of the geography-related statistical data. It is evident that using these restrictions the map topology can not be preserved. The implementation allows the generation of rectangular statistical cartograms having less than one hundred map regions within a few seconds with support of a metaheuristic. The car- tograms generated enable an interactive explorative data analysis. All necessary steps can be done on the R command line or by using web applications on a modern computer. It has been demonstrated, how the drawing of the cartogram can be optimized according to a fitness function by using a metaheuristic and benefiting from today’s multi core hardware and R’s parallel environment. Most promising is using a fitness function which is derived from the relative position error objective function. It should also be highlighted that the method can read a spreadsheet containing the geographic location. It does not require any complex polygon mesh as input. The potential of the method is shown by using real world maps cov- ering a maps size of three orders magnitude and synthetical data (8x8 checkerboard). Table2 provides an overview of the rectangular cartogram specification. The recmap package is a powerful tool in the hand of data analysts, cartographers, or statisticians using R who want to draw their own statistical rectangular cartograms.

References

Becker RA, Wilks AR, Brownrigg R, Minka TP, Deckmyn A (2016). maps: Draw Geograph- ical Maps. R package version 3.1.0, URL https://CRAN.R-project.org/package=maps. Christian Panse 23

Input England/Wales/Scottland

Pct_Leave Pct_Turnout Pct_Rejected 80 70

70

60 60

50

50 40

30 40

20

30 10

0

20

Orkney Islands Aberdeen City

Eilean Siar Moray Highland

Aberdeenshire Angus Dundee City

Perth & Kinross Clackmannanshire Argyll & Bute North Lanarkshire Fife Glasgow City West Dunbartonshire Inverclyde Falkirk East Dunbartonshire

North Ayrshire East Lothian

Renfrewshire East Renfrewshire

Stirling

Edinburgh, City of South Lanarkshire South Ayrshire West Lothian East Ayrshire

Midlothian North Tyneside Dumfries & Galloway

Newcastle upon Tyne Eden South Lakeland Oldham Scottish Borders

Copeland Calderdale South Tyneside

Pendle Sunderland Lancaster Craven Carlisle Barrow−in−Furness Rochdale Allerdale Burnley South Ribble County Durham Fylde Ribble Valley

Redcar and Cleveland Hartlepool

Chorley Bolton Blackpool Wyre Preston Bradford Ryedale Darlington Stockton−on−Tees

Blackburn with Darwen

Rossendale Richmondshire Scarborough West Lancashire Sheffield Sefton York Bassetlaw Bury

Middlesbrough

Hyndburn Harrogate Wigan Kirklees Knowsley Warrington Chesterfield

Hambleton Rotherham Wirral

Halton Barnsley Doncaster Cheshire West and Chester Cheshire East Salford West Lindsey

East Riding of Yorkshire Leeds North East Lincolnshire St. Helens Selby

Tameside Bolsover Liverpool Stockport Newark and Sherwood

Stoke−on−Trent

Manchester Kingston upon Hull, City of Flintshire Gedling Mansfield High Peak Nottingham Wakefield

Trafford North East Derbyshire Broxtowe Northern Newcastle−under−Lyme Wrexham North Lincolnshire Ireland Ashfield Lincoln East Staffordshire

Stafford Derbyshire Dales Cannock Chase Amber Valley East Lindsey Telford and Wrekin Derby North Kesteven

Lichfield

Leicester South Cambridgeshire Isle of Anglesey North Norfolk

South Staffordshire Erewash South Holland Boston Rushcliffe South Kesteven Conwy South Derbyshire Denbighshire Charnwood Breckland King's Lynn and West Norfolk Staffordshire Moorlands Norwich

Melton Birmingham Huntingdonshire Dudley North West Leicestershire Peterborough Rutland Fenland East Cambridgeshire Gwynedd Hinckley and Bosworth Blaby Great Yarmouth Harborough Broadland Oadby and Wigston Corby Shropshire Tamworth East Northamptonshire Cambridge Wyre Forest Bedford Forest Heath Daventry South Norfolk Waveney Rugby Kettering Enfield Braintree

North Warwickshire Wellingborough

Neath Port Talbot Powys Warwick St Edmundsbury Worcester Northampton Ceredigion Walsall Mid Suffolk Malvern Hills Central Bedfordshire Wolverhampton Nuneaton and Bedworth Suffolk Coastal Uttlesford Harlow Luton North Hertfordshire Chelmsford

Waltham Forest Cotswold Pembrokeshire Carmarthenshire Ipswich Torfaen South Northamptonshire Babergh Milton Keynes Solihull Hertsmere Epping Forest Three Rivers Coventry Dacorum Watford Brentwood Sandwell Colchester Blaenau Gwent Gloucester Merthyr Tydfil South Gloucestershire Tendring Cherwell Haringey Redditch Wycombe Swansea Wychavon Barking and Dagenham Thurrock Maldon Bridgend Bromsgrove Islington Camden Bath and North East Somerset Rochford

Stratford−on−Avon Dartford Castle Point

Sedgemoor Herefordshire, County of South Bucks Greenwich Slough Southend−on− Tewkesbury Rhondda Cynon Taf Ealing Hounslow West Dorset East Dorset Medway Oxford Gravesham Forest of Dean Thanet Cheltenham West Oxfordshire Monmouthshire Bromley Swale Kensington and Chelsea Richmond upon Thames Runnymede Canterbury Stroud Vale of White Horse Hammersmith and Fulham Tonbridge and Malling Caerphilly Poole South Oxfordshire Maidstone Dover Bristol, City of Swindon Spelthorne Mole Valley Shepway

Kingston upon Thames Newport Wokingham Bracknell Forest Surrey Heath The Vale of Glamorgan West Berkshire Mid Sussex Rother Reading Wealden Wiltshire Wandsworth Eastbourne Hastings Woking Lewisham

Epsom and Ewell Rushmoor Basingstoke and Deane Hart Cardiff Merton Tandridge

North Somerset Winchester Guildford Eastleigh Sutton North Dorset Southampton Test Valley Waverley Elmbridge Croydon

West Somerset East Hampshire Sevenoaks Mendip Fareham Mid Devon Reigate and Banstead Adur Worthing Bexley

Taunton Deane Bournemouth New Forest Arun South Somerset Gosport Portsmouth Exeter Redbridge Ashford Havering Torridge Tunbridge Wells Christchurch Crawley North Devon East Devon Horsham Purbeck Isle of Wight Newham

Teignbridge Chichester Havant Lewes Broxbourne West Devon Hackney Basildon Brighton and Hove

Torbay Tower Hamlets Plymouth Aylesbury Vale South Hams Barnet Harrow Lambeth

Chiltern

Southwark Brent

Hillingdon

Windsor and Maidenhead

Figure 13: The outcome of UK Brexit EU-referendum is displayed. Northern Ireland was manually added. The overlapping MBBs of the input map are displayed in the top left. On all other plots, the region areas represent the number of the electorates. The colors are indicating the outcome of the referendum (blue: remain / red: leave; as lower the color intensity as closer to 50% : 50% outcome ). Other attributes represented as percentages (Pct) are displayed using the spplot of the sp package (top right). Copyright: Contains National Statistics data © Crown copyright and database right 2016 Contains NRS data © Crown copyright and database right 2016 Source: NISRA : Website: www.nisra.gov.uk Contains OS data © Crown copyright [and database right] (2016) 24 recmap: Compute the Rectangular Statistical Cartogram nae ro fteipt h etnua atgashvn oeta 0 a ein eurdmr eeaino h genetic the of generation more required demands. regions computing map higher 300 the than to more due hardware having server used cartograms compute the by a rectangular on on ordered The time computed manuscript were process input. this and the as algorithm in well of displayed as error cartograms area regions, an rectangular map the of statistical using number all al. computed e.g., was et of error properties, overview area input The an includes platforms. provides spreadsheet spreadsheet The The appearance. 2: Table Kelectorates 2010 population frequency passenger 2010 population UK railway SBB 2010 Swiss 2010 population population Switzerland York 2010 New population US Jersey New 2010 US population 1977 Florida illiteracy US 1977 Colorado income US 1977 California murder US level 1977 state population US level state US level state value US statistical level 1:4 state area US 1:4 checkerborad #electors 8x8 checkerborad 8x8 level state US level state US map .SneteSisriwypsegrfeunycrormi eie rma2 on e,i sntmaigu ocompute to meaningful not is it set, point 2D a from derived is cartogram frequency passenger railway Swiss the Since ( 2004 ). rgosae err area #regions 300.59 2300 7 0.57 NA 370 724 20.62 0.38 0.44 62 0.68 21 0.62 68 0.40 64 0.29 48 0.37 48 0.41 48 0.27 48 0.27 48 0.17 64 0.36 64 50 49 A10 .54000 1000 0.25 300 0.25 148 30 0.35 1200 0.25 49 1000 0.25 102 1280 0.25 27 310 0.25 30 105 0.25 GA 30 240 0.25 GA 20 - 320 0.25 GA 11 240 0.25 GA 300 - 0.25 GA 533 300 NA GA 300 0.25 GA NA 300 GA - GA 256 GA NA 10 GA GA #gen unknown GRASP mutrate unknown GA size pop - GA meta summary.recmap Keim by introduced function error area the implementing method H ne oei 4 4 4 4 64 4 64 4 1 2.30GHz 64 4 @ v3 E5-2698 2.30GHz CPU 4 @ Xeon v3 Intel E5-2698 2.30GHz CPU 4 @ Xeon v3 Intel E5-2698 CPU 4 Xeon i7 Intel Core Intel 4 GHz i7 3 Core Intel GHz i7 3 Core Intel GHz i7 3 Core Intel GHz i7 3 Core Intel GHz i7 3 1 Core cores Intel GHz i7 3 Core Intel GHz i7 3 Core Intel GHz i7 3 Core Intel GHz i7 3 Core Intel GHz i7 3 Core Intel GHz 3 GHz 1.5 - @ CPU Xeon Intel hardware compute rctm unit time proc 0 sec 106  ≈ ≈ ≈ 9sec 39 sec 74 sec 10 sec 16 sec 16 sec 10 days 3 days 2 days 3 sec 4 sec 5 sec 1 min 1 min 1 min 1 Fig 13 12 11 10 10 10 10 10 9 9 9 9 5 5 2 1 Christian Panse 25

Bivand RS, Pebesma E, Gomez-Rubio V (2013). Applied Spatial Data Analysis with R. 2nd edition. Springer-Verlag, NY. URL http://www.asdar-book.org/. Brunsdon C, Charlton M (2014). getcartr: Front end for Rcartogram package. R package version 1.01, URL https://github.com/chrisbrunsdon/getcartr.

Buchin K, Eppstein D, L¨offler M, N¨ollenburg M, Silveira R (2016). “Adjacency-preserving spatial treemaps.” Computational Geometry, 7(1), 100–122. doi:10.20382/jocg.v7i1a6. Buchin K, Speckmann B, Verdonschot S (2012). “Evolution Strategies for Optimizing Rect- angular Cartograms.” In N Xiao, M Kwan, MF Goodchild, S Shekhar (eds.), Geographic Information Science - 7th International Conference, GIScience 2012, Columbus, OH, USA, September 18-21, 2012. Proceedings, volume 7478 of Lecture Notes in Computer Science, pp. 29–42. Springer-Verlag. ISBN 978-3-642-33023-0. doi:10.1007/978-3-642-33024-7_3. URL http://dx.doi.org/10.1007/978-3-642-33024-7_3. Chang W, Cheng J, Allaire J, Xie Y, McPherson J (2016). shiny: Web Application Framework for R. R package version 0.13.2, URL https://CRAN.R-project.org/package=shiny. Dorling D (1996). Area Cartograms: Their Use and Creation. 1st edition. Department of Geography, University of Bristol, England.

Eddelbuettel D (2013). Seamless R and C++ Integration with Rcpp. Springer-Verlag, New York. ISBN 978-1-4614-6867-7.

Feo TA, Resende MGC (1995). “Greedy Randomized Adaptive Search Procedures.” Journal of Global Optimization, 6(2), 109–133. ISSN 1573-2916. doi:10.1007/BF01096763. URL http://dx.doi.org/10.1007/BF01096763. Finkel RA, Bentley JL (1974). “Quad trees a data structure for retrieval on composite keys.” Acta Informatica, 4(1), 1–9. ISSN 1432-0525. doi:10.1007/BF00288933. URL http: //dx.doi.org/10.1007/BF00288933. Gastner MT, Newman ME (2004). “From The Cover: Diffusion-based method for producing density-equalizing maps.” Proc. Natl. Acad. Sci. U.S.A., 101(20), 7499–7504. doi:10. 1073/pnas.0400280101. Heilmann R, Keim DA, Panse C, Sips M (2004). “RecMap: Rectangular Map Approxi- mations.” In MO Ward, T Munzner (eds.), 10th IEEE Symposium on Information Vi- sualization (InfoVis 2004), 10-12 October 2004, Austin, TX, USA, pp. 33–40. IEEE Computer Society. ISBN 0-7803-8779-1. doi:10.1109/INFVIS.2004.57. URL http: //dx.doi.org/10.1109/INFVIS.2004.57. Keim DA, North SC, Panse C (2004). “CartoDraw: A Fast Algorithm for Generating Contiguous Cartograms.” IEEE Trans. Vis. Comput. Graph., 10(1), 95–110. doi: 10.1109/TVCG.2004.1260761. URL http://dx.doi.org/10.1109/TVCG.2004.1260761. Kusnierczyk W (2012). rbenchmark: Benchmarking routine for R. R package version 1.0.0, URL https://CRAN.R-project.org/package=rbenchmark. Lang DT (2016). Rcartogram: Interface to Mark Newman’s cartogram software. R package version 0.2-2, URL https://github.com/omegahat/Rcartogram. 26 recmap: Compute the Rectangular Statistical Cartogram

McIlroy D, Brownrigg R, Minka TP, Bivand R (2015). mapproj: Map Projections. R package version 1.2-4, URL https://CRAN.R-project.org/package=mapproj. Nusrat S, Kobourov S (2016). “The State of the Art in Cartograms.” In EuroVis 2016, 18th EG/VGTC Conference on Visualization, 6-10 June 2016, Groningen, the Netherlands. URL http://www.cs.arizona.edu/~kobourov/star.pdf. Raisz E (1934). “The Rectangular Statistical Cartogram.” Geographical Review., 24(2), 292– 296. doi:10.2307/208794. URL http://www.jstor.org/stable/208794. Ramey JA (2014). noncensus: U.S. Census Regional and Demographic Data. R package version 0.1, URL https://CRAN.R-project.org/package=noncensus. Scrucca L (2013). “GA: A Package for Genetic Algorithms in R.” Journal of Statistical Software, 53(4), 1–37. doi:10.18637/jss.v053.i04. URL http://www.jstatsoft.org/ v53/i04/. Snyder J (1997). Flattening the Earth: Two Thousand Years of Map Projections. Univer- sity of Chicago Press. ISBN 9780226767475. URL https://books.google.ch/books?id= 0UzjTJ4w9yEC. Stabler B (2013). shapefiles: Read and Write ESRI Shapefiles. R package version 0.7, URL https://CRAN.R-project.org/package=shapefiles. Tobler W (2004). “Thirty Five Years of Computer Cartograms.” Annals of the Association of American Geographers, 94(1), 58–73. ISSN 1467-8306. doi:10.1111/j.1467-8306.2004. 09401004.x. URL http://dx.doi.org/10.1111/j.1467-8306.2004.09401004.x. van Kreveld MJ, Speckmann B (2004). “On Rectangular Cartograms.” In S Albers, T Radzik (eds.), Algorithms - ESA 2004, 12th Annual European Symposium, Bergen, , September 14-17, 2004, Proceedings, volume 3221 of Lecture Notes in Computer Science, pp. 724–735. Springer-Verlag. ISBN 3-540-23025-4. doi:10.1007/978-3-540-30140-0_64. URL http://dx.doi.org/10.1007/978-3-540-30140-0_64. van Kreveld MJ, Speckmann B (2007). “On Rectangular Cartograms.” Comput. Geom., 37(3), 175–187. doi:10.1016/j.comgeo.2006.06.002. URL http://dx.doi.org/10. 1016/j.comgeo.2006.06.002. Zeileis A, Hornik K, Murrell P (2009). “Escaping RGBland: Selecting Colors for .” Computational Statistics & Data Analysis, 53, 3259–3270. doi:10.1016/j. csda.2008.11.033.

Affiliation: Christian Panse Functional Genomics Center Zurich UZH|ETHZ Winterthurerstr. 190 CH-8057, Zurich,¨ Switzerland Telephone: +41/44/63-53912 E-mail: [email protected] URL: http://www.fgcz.ch/the-center/people/panse.html