Rworldmap: a New R Package for Mapping Global Data
Total Page:16
File Type:pdf, Size:1020Kb
CONTRIBUTED RESEARCH ARTICLES 35 rworldmap: A New R package for Mapping Global Data by Andy South There appears to be a gap in the market for free software tools that can be used across disci- Abstract rworldmap is a relatively new pack- plinary boundaries to produce innovative, publica- age available on CRAN for the mapping and vi- tion quality global visualisations. Within R there are sualisation of global data. The vision is to make great building blocks (particularly sp, maptools and the display of global data easier, to facilitate un- fields) for spatial data but users previously had to go derstanding and communication. The initial fo- through a number of steps if they wanted to produce cus is on data referenced by country or grid due world maps of their own data. Experience has shown to the frequency of use of such data in global as- that difficulties with linking data and creating classi- sessments. Tools to link data referenced by coun- fications, colour schemes and legends, currently con- try (either name or code) to a map, and then to strains researchers’ ability to view and display global display the map are provided as are functions to data. We aim to reduce that constraint to allow re- map global gridded data. Country and gridded searchers to spend more time on the more important functions accept the same arguments to specify issue of what they want to display. The vision for the nature of categories and colour and how leg- rworldmap is to produce a package to facilitate the ends are formatted. This package builds on the visualisation and mapping of global data. Because functionality of existing packages, particularly the focus is on global data, the package can be more sp, maptools and fields. Example code is pro- specialised than existing packages, making world vided to produce maps, to link with the pack- mapping easier, partly because it doesn’t have to deal ages classInt, RColorBrewer and ncdf, and to with detailed local maps. Through rworldmap we plot examples of publicly available country and aim to make it easy for R users to explore their global gridded data. data and also to produce publication quality figures from their outputs. Introduction Global datasets are becoming increasingly common rworldmap was partly inspired and largely and are frequently seen on the web, in journal pa- funded by the UK Natural Environment Research pers and in our newspapers (for example a ’carbon Council (NERC) program Quantifying Uncertainty atlas’ of global emissions available at http://image. in Earth System Science (QUEST). This program guardian.co.uk/sys-files/Guardian/documents/ brings together scientists from a wide range of dis- 2007/12/17/CARBON_ATLAS.pdf). At the same time ciplines including climate modellers, hydrologists there is a greater interest in global issues such as cli- and social scientists. It was apparent that while mate change, the global economy, and poverty (as researchers have common tools for visualisation for example outlined in the Millenium Development within disciplines, they tend to use different ones Goals, http://www.un.org/millenniumgoals/bkgd. across disciplines and that this limits the sharing shtml). Thirdly, there is an increasing availability of of data and methods necessary for truly interdis- software tools for visualising data in new and inter- ciplinary research. Within the project, climate and esting ways. Gapminder (http://www.gapminder. earth system modellers tended to use IDL, ecologists org) has pioneered making UN statistics more avail- ArcGIS, hydrologists and social scientists Matlab and able and intelligible using innovative visualisation fisheries scientists R. With the exception of R, these tools and Many Eyes (http://www-958.ibm.com/) software products cost thousands of pounds which provides a very impressive interface for sharing data acts as a considerable constraint on users being able and creating visualisations. to try out techniques used by collaborators. This World maps have become so common that they high cost and learning curve of adopting new soft- have even attracted satire. The Onion’s Atlas of ware tools hinders the sharing of data and methods the Planet Earth (The Onion, 2007), contains a ’Bono between disciplines. To address this, part of the vi- Awareness’ world map representing ’the intensity sion for rworldmap was to develop a tool that can be with which artist Bono is aware of the plight and freely used and modified across a multi-disciplinary daily struggles of region’, with a categorisation rang- project, to facilitate the sharing of scripts, data and ing from ’has heard of nation once’ through ’moved outputs. Such freely available software offers greater enough by nations crisis to momentarily remove sun- opportunity for collaboration with research institutes glasses’ to ’cares about welfare of nation nearly as in developing countries that may not be able to af- much as his own’. ford expensive licenses. The R Journal Vol. 3/1, June 2011 ISSN 2073-4859 36 CONTRIBUTED RESEARCH ARTICLES rworldmap data inputs NetCDF is a data storage file format com- monly used by climate scientists and oceanogra- rworldmap consists of tools to visualise global data phers. NetCDF files can be multi-dimensional, e.g. and focuses on two types of data. Firstly, data that holding (x,y) data for multiple attributes over mul- are referenced by country codes or names and sec- tiple months, years, days etc. The package ncdf is ondly, data that are referenced on a grid. good for reading data from netCDF files. Country data rworldmap functionality There is a wealth of global country level data avail- rworldmap has three core functions outlined below able on the internet including UN population data, and others that are described later. and many global indices for, among others: Envi- 1. joinCountryData2Map() joins user country ronmental Performance, Global Hunger and Multi- data referenced by country names or codes to dimensional Poverty. a map to enable plotting Data are commonly referenced by country names mapCountryData() as these are the most easily recognised by users, 2. plots a map of country data but country names have the problem that vocabu- 3. mapGriddedData() plots a map of gridded data laries are not well conserved and many countries have a number of subtly different alternate names Joining country data to a map (e.g. Ivory Coast and Cote d’Ivoire, Laos and Peo- ple’s Democratic Republic of Lao). To address this To join the data to a map use joinCountryData2Map. problem there are ISO standard country codes of ei- You will need to specify the name of column contain- ther 2 letters, 3 letters or numeric, and also 2 letter ing your country identifiers (nameJoinColumn) and FIPS country codes, so there is still not one univer- the type of code used (joinCode) e.g. "ISO3" for ISO sally adopted standard. rworldmap supports all of 3 letter codes or "UN" for numeric country codes. these country codes and offers tools to help when data(countryExData) data are referenced by names or other identifiers. sPDF <- joinCountryData2Map( countryExData ,joinCode = "ISO3" ,nameJoinColumn = "ISO3V10") Gridded data This code outputs, to the R console, a summary of Global datasets are frequently spatially referenced how many countries are successfully joined. You can on a grid, because such gridded or raster data for- specify verbose=TRUE to get a full list of countries. mats offer advantages in efficiency of data storage The object returned (named sPDF in this case) is of and processing. Remotely sensed data and other val- type "SpatialPolygonsDataFrame" from the package ues calculated from it are most frequently available sp. This object is required for the next step, display- in gridded formats. These can include terrestrial or ing the map. marine data or both. If you only have country names rather than codes There are many gridded data formats, here I will in your data, use joinCode="NAME"; you can expect concentrate on two: ESRI GridAscii and netCDF. more mismatches due to the greater variation within ESRI GridAscii files are an efficient way of storing country names mentioned previously. To address and transferring gridded data. They are straightfor- this you can use the identifyCountries() function ward text files so can be opened by any text editor. described below, and change any country names in They have a short header defining the structure of your data that do not exactly match those in the in- the file (e.g. number, size and position of rows and ternal map. columns), followed by a single row specifying the value at each grid cell. Thus they use much less space Mapping country data than if the coordinates for each cell had to be speci- fied. To plot anything other than the default map, mapCountryData Example start of gridAscii file for a half degree requires an object of class "SpatialPolygonsDataFrame" global grid: and a specification of the name of the column containing the data to plot: ncols 720 data(countryExData) nrows 360 sPDF <- joinCountryData2Map( countryExData xllcorner -180 ,joinCode = "ISO3" yllcorner -90 ,nameJoinColumn = "ISO3V10") cellsize 0.5 mapDevice() #create world map shaped window NODATA_value -999 mapCountryData(sPDF -999 1 0 1 1 ... [all 259200 cell values] ,nameColumnToPlot='BIODIVERSITY') The R Journal Vol. 3/1, June 2011 ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLES 37 BIODIVERSITY "quantiles", "categorical", or a numeric vec- tor defining the breaks between categories. Works with the next argument (numCats) , al- though numCats is not used for "categorical", where the data are not modified, or if the user specifies a numeric vector, where the number of categories will be a result. • numCats specifies the favoured number of cat- 0.2 100 egories for the data. The number of categories used may be different if the number requested catMethod Figure 1: Output of mapCountryData() is not possible for the chosen (e.g.