Usage and demo of R package nCov2019

Contents

    Installation
    Query the latest data
    Access detailed historical data
    Geographic map visualization
    Shiny Dashboard

Installation

To start, install the package directly from GitHub with the 'remotes' package by running the following in R:

    require('remotes')
    remotes::install_github("GuangchuangYu/nCov2019", dependencies = TRUE)

Query the latest data

To query the latest data, load it with get_nCov2019(). By default, the language is set automatically to Chinese or English based on the user's system environment. Users can also set it explicitly with the parameter lang = 'zh' or lang = 'en'.

Since most confirmed cases are concentrated in China, researchers may be most interested in the details for China. Printing the object x shows the total number of confirmed cases in China.

    library('nCov2019')
    x <- get_nCov2019(lang = 'en')
    x

    ## China (total confirmed cases): 79968
    ## last update: 2020-03-01 12:40:24

You can then use summary(x) to get the recent data for China.

    head(summary(x))

    ##   confirm suspect dead heal nowConfirm nowSevere deadRate healRate  date
    ## 1      41       0    1    0          0         0      2.4      0.0 01.13
    ## 2      41       0    1    0          0         0      2.4      0.0 01.14
    ## 3      41       0    2    5          0         0      4.9     12.2 01.15
    ## 4      45       0    2    8          0         0      4.4     17.8 01.16
    ## 5      62       0    2   12          0         0      3.2     19.4 01.17
    ## 6     198       0    3   17          0         0      1.5      8.6 01.18
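The deadRate and healRate columns printed above appear to be percentages of confirm, rounded to one decimal place. That relationship is inferred from the printed numbers, not taken from the package documentation, so the base R sketch below is only a way to check the assumption. The data frame s is typed in by hand from the six rows shown above, and the names s and rate are illustrative.

```r
# First six rows of head(summary(x)) as printed above, typed in by hand.
s <- data.frame(
  confirm = c(41, 41, 41, 45, 62, 198),
  dead    = c(1, 1, 2, 2, 2, 3),
  heal    = c(0, 0, 5, 8, 12, 17),
  date    = c("01.13", "01.14", "01.15", "01.16", "01.17", "01.18"),
  stringsAsFactors = FALSE
)

# Assumption: a rate is 100 * count / confirm, rounded to one decimal place.
rate <- function(count, confirm) round(100 * count / confirm, 1)

s$deadRate <- rate(s$dead, s$confirm)
s$healRate <- rate(s$heal, s$confirm)
print(s)
```

If the recomputed columns match the printed ones, the same function could be reused on other subsets, for example after filtering by date.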

When no region is specified, x[] returns the province-level outbreak statistics for China.

    head(x[])

    ##        name confirm suspect dead deadRate showRate  heal healRate showHeal
    ## 1     Hubei   66907       0 2761     4.13    FALSE 31187    46.61     TRUE
    ## 2 Guangdong    1349       0    7     0.52    FALSE  1009    74.80     TRUE
    ## 3              1272       0   22     1.73    FALSE  1185    93.16     TRUE
    ## 4  Zhejiang    1205       0    1     0.08    FALSE  1027    85.23     TRUE
    ## 5     Hunan    1018       0    4     0.39    FALSE   853    83.79     TRUE
    ## 6               990       0    6     0.61    FALSE   888    89.70     TRUE

To obtain more granular data, specify the province name. For example, to obtain city-level data for Hubei Province:

    head(x['Hubei'])

    ##        name confirm suspect dead deadRate showRate  heal healRate showHeal
    ## 1     Wuhan   49122       0 2195     4.47    FALSE 19227    39.14     TRUE
    ## 2   Xiaogan    3518       0  118     3.35    FALSE  2215    62.96     TRUE
    ## 3 Huanggang    2905       0  115     3.96    FALSE  2171    74.73     TRUE
    ## 4  Jingzhou    1579       0   46     2.91    FALSE  1034    65.48     TRUE
    ## 5     Ezhou    1391       0   47     3.38    FALSE   784    56.36     TRUE
    ## 6   Suizhou    1307       0   40     3.06    FALSE   835    63.89     TRUE

In addition, with the argument by = 'today', the number of newly added cases is returned.

    head(x['Hubei', by = 'today'])

    ##        name confirm confirmCuts isUpdated
    ## 1     Wuhan     565           0      TRUE
    ## 2   Xiaogan       0           0      TRUE
    ## 3 Huanggang       1           0      TRUE
    ## 4  Jingzhou       0           0      TRUE
    ## 5     Ezhou       1           0      TRUE
    ## 6   Suizhou       0           0      TRUE

Getting global data is also easy: x['global'] returns a data frame with one row per country, giving a global overview.

    head(x['global'])

    ##                name confirm suspect dead deadRate showRate  heal healRate
    ## 1             China   79968     851 2873     3.59    FALSE 41675    52.11
    ## 2 Republic of Korea    3150       0   13     0.41    FALSE    24     0.76
    ## 3  Diamond Princess     705       0    6     0.85    FALSE     0     0.00
    ## 4             Italy     653       0   17     2.60    FALSE    45     6.89
    ## 5              Iran     270       0   26     9.63    FALSE    49    18.15
    ## 6             Japan     240       0    5     2.08    FALSE     1     0.42
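Because the global result is an ordinary data frame, the usual base R tools apply to it. As a small sketch, the six rows printed above are typed in by hand below (the names g and outside are illustrative), China is dropped, and the remaining rows are ordered and summed. The total covers only these printed rows, not every country in the full data.

```r
# The six rows of head(x['global']) as printed above, typed in by hand.
g <- data.frame(
  name    = c("China", "Republic of Korea", "Diamond Princess",
              "Italy", "Iran", "Japan"),
  confirm = c(79968, 3150, 705, 653, 270, 240),
  dead    = c(2873, 13, 6, 17, 26, 5),
  stringsAsFactors = FALSE
)

# Keep everything except China and sort by confirmed cases, largest first.
outside <- g[g$name != "China", ]
outside <- outside[order(outside$confirm, decreasing = TRUE), ]

# Confirmed cases outside China among the rows shown.
total_outside <- sum(outside$confirm)
print(outside)
print(total_outside)
```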

To visualize the cumulative summary data, an example plot could be the following:

    require(ggplot2)
    ggplot(summary(x), aes(as.Date(date, "%m.%d"), as.numeric(confirm))) +
      geom_col(fill = 'firebrick') +
      theme_minimal(base_size = 14) +
      xlab(NULL) + ylab(NULL) +
      scale_x_date(date_labels = "%Y/%m/%d") +
      labs(caption = paste("accessed date:", time(x)))
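One detail of the plot above: the format "%m.%d" carries no year, and base R's as.Date() fills in the current year when the year is unspecified. Run in a later year, the axis labels would therefore show that year, and a value such as "02.29" would likely become NA outside a leap year. A minimal sketch of one way to pin the year explicitly follows; the variable names and the choice of 2020 (taken from the "last update" line earlier) are assumptions for illustration, not part of the package.

```r
# Month.day strings in the same form as the date column of summary(x).
dates_md <- c("01.13", "01.14", "02.29")

# Assumption: all of these dates belong to 2020.
data_year <- 2020

# Prepend the year so the parsed result does not depend on the current year.
dates_full <- as.Date(paste(data_year, dates_md, sep = "."),
                      format = "%Y.%m.%d")
print(dates_full)
```

In the ggplot call, the first aesthetic could then be built the same way, with paste(2020, date, sep = ".") parsed using "%Y.%m.%d".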

A bar plot of the latest confirmed cases in Anhui province could be drawn as follows:

    library(ggplot2)
    d = x['Anhui', ]  # you can replace Anhui with any province
    d = d[order(d$confirm), ]
    ggplot(d, aes(name, as.numeric(confirm))) +
      geom_col(fill = 'firebrick') +
      theme_minimal(base_size = 14) +
      xlab(NULL) + ylab(NULL) +
      labs(caption = paste("accessed date:", time(x))) +
      scale_x_discrete(limits = d$name) +
      coord_flip()

Access detailed historical data

The method for accessing historical data is basically the same as for the latest data, but the entry function is load_nCov2019().

    library('nCov2019')
    y <- load_nCov2019(lang = 'en')
    y  # this will return the update time of the historical data

    ## nCov2019 historical data
    ## last update: 2020-02-29

We currently maintain three historical datasets. One is collected and organized from a GitHub repo; users get it by default, or explicitly with load_nCov2019(source = 'github').

The second one is obtained from the Chinese website Dingxiangyuan (DXY), and users can access it with load_nCov2019(source = 'dxy'). The last one is obtained from the National Health Commission of China, and users can get it with the argument source = 'cnnhc'. The three datasets have basically the same form, but the default source has more comprehensive global historical information and also contains older historical data. Users can compare and switch between data from different sources.

    # compare the total confirmed cases in China between data sources
    library(nCov2019)
    library(ggplot2)
    y = load_nCov2019(lang = 'en', source = 'github')
    dxy = load_nCov2019(lang = 'en', source = 'dxy')
    nhc = load_nCov2019(lang = 'en', source = 'cnnhc')
    dxy_china <- aggregate(cum_confirm ~ time, summary(dxy), sum)
    y_china <- aggregate(cum_confirm ~ time, summary(y), sum)
    nhc_china <- aggregate(cum_confirm ~ time, summary(nhc), sum)
    dxy_china$source = 'DXY data'
    y_china$source = 'GitHub data'
    nhc_china$source = 'NHC data'
    df = rbind(dxy_china, y_china, nhc_china)
    ggplot(subset(df, time >= '2020-01-11'), aes(time, cum_confirm, color = source)) +
      geom_line() +
      scale_x_date(date_labels = "%Y-%m-%d") +
      ylab('Confirmed Cases in China') + xlab('Time') +
      theme_bw() +
      theme(axis.text.x = element_text(hjust = 1)) +
      theme(legend.position = 'bottom')

Then you can use summary(y) to get historical data at the provincial level in China:

    head(summary(y))

    ##         time province cum_confirm cum_heal cum_dead suspected
    ## 1 2019-12-01    Hubei           1        0        0         0
    ## 2 2019-12-02    Hubei           1        0        0         0
    ## 3 2019-12-03    Hubei           1        0        0         0
    ## 4 2019-12-04    Hubei           1        0        0         0
    ## 5 2019-12-05    Hubei           1        0        0         0
    ## 6 2019-12-06    Hubei           1        0        0         0

To get historical data for all cities in China, you can use y[] as follows:

    head(y[])

    ##         time province  city cum_confirm cum_heal cum_dead suspected
    ## 1 2019-12-01    Hubei Wuhan           1        0        0         0
    ## 2 2019-12-02    Hubei Wuhan           1        0        0         0
    ## 3 2019-12-03    Hubei Wuhan           1        0        0         0
    ## 4 2019-12-04    Hubei Wuhan           1        0        0         0
    ## 5 2019-12-05    Hubei Wuhan           1        0        0         0
    ## 6 2019-12-06    Hubei Wuhan           1        0        0         0

You can also specify a province name to get the corresponding historical data, for example, extracting historical data for Anhui Province:

    head(y['Anhui'])

    ##           time province  city cum_confirm cum_heal cum_dead suspected
    ##  71 2020-01-21    Anhui                 0        0        0         1
    ## 108 2020-01-22    Anhui Hefei           1        0        0         3
    ## 109 2020-01-22    Anhui Lu'an           0        0        0         1
    ## 197 2020-01-23    Anhui Hefei           6        0        0         0
    ## 198 2020-01-23    Anhui                 1        0        0         0
    ## 199 2020-01-23    Anhui                 1        0        0         0
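City-level rows like these can be rolled up to one total per day with base R's aggregate() and a formula. The sketch below types in the six Anhui rows printed above by hand; city names that are blank in the printed output are left as NA rather than guessed, and the names a and province_total are illustrative. The totals cover only these six rows, not the full Anhui history.

```r
# Rows of head(y['Anhui']) as printed above, typed in by hand.
# City names that are blank in the printed output are kept as NA.
a <- data.frame(
  time = as.Date(c("2020-01-21", "2020-01-22", "2020-01-22",
                   "2020-01-23", "2020-01-23", "2020-01-23")),
  province = "Anhui",
  city = c(NA, "Hefei", "Lu'an", "Hefei", NA, NA),
  cum_confirm = c(0, 1, 0, 6, 1, 1),
  stringsAsFactors = FALSE
)

# Sum the city-level cumulative counts within each day.
# Only cum_confirm and time enter the formula, so the NA city names
# do not cause rows to be dropped.
province_total <- aggregate(cum_confirm ~ time, data = a, FUN = sum)
print(province_total)
```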

Similarly, you can get global historical data by specifying 'global'.

    y <- load_nCov2019(lang = 'en', source = 'github')
    head(y['global'])

    ##         time country cum_confirm cum_heal cum_dead
    ## 1 2019-12-01   China           1        0        0
    ## 2 2019-12-02   China           1        0        0
    ## 3 2019-12-03   China           1        0        0
    ## 4 2019-12-04   China           1        0        0
    ## 5 2019-12-05   China           1        0        0
    ## 6 2019-12-06   China           1        0        0

NOTE: The global historical data is not available from source 'dxy'.

Here are some visualization examples with the historical data.

1. Draw a curve reflecting the number of deaths, confirmed cases, and cures in China.

    require('tidyr')
    require('ggrepel')
    require('ggplot2')
    y <- load_nCov2019(lang = 'en')
    d <- subset(y['global'], country == 'China')
    d <- gather(d, curve, count, -time, -country)
    ggplot(d, aes(time, count, color = curve)) +
      geom_point() + geom_line() +
      xlab(NULL) + ylab(NULL) +
      theme_bw() + theme(legend.position = "none") +
      geom_text_repel(aes(label = curve), data = d[d$time == time(y), ], hjust = 1) +
      theme(axis.text.x = element_text(angle = 15, hjust = 1)) +
      scale_x_date(date_labels = "%Y-%m-%d",
                   limits = c(as.Date("2020-01-15"), as.Date("2020-03-01"))) +
      labs(title = "Number of deaths, confirms, and cures in China")

2. Outbreak Trend Curves of Countries Around the World (except China).

    require('ggrepel')
    require('ggplot2')
    y <- load_nCov2019(lang = 'en')
    d <- subset(y['global'], country != "China")
    ggplot(d, aes(time, as.numeric(cum_confirm), group = country, color = country)) +
      geom_point() + geom_line() +
      geom_label_repel(aes(label = country), data = d[d$time == time(y), ], hjust = 1) +
      theme_bw() + theme(legend.position = 'none') +
      xlab(NULL) + ylab(NULL) +
      scale_x_date(date_labels = "%Y-%m-%d",
                   limits = c(as.Date("2020-02-01"), as.Date("2020-03-01"))) +
      theme(axis.text.x = element_text(angle = 15, hjust = 1)) +
      labs(title = "Outbreak Trend Curves of Countries Around the World \n (except China)")

[Figure: Outbreak Trend Curves of Countries Around the World (except China)]

3. Growth curve of confirmed cases in Anhui Province, China.

    y <- load_nCov2019(lang = 'en')
    d <- y['Anhui']
    ggplot(d, aes(time, as.numeric(cum_confirm), group = city, color = city)) +
      geom_point() + geom_line() +
      geom_label_repel(aes(label = city), data = d[d$time == time(y), ], hjust = 1) +
      theme_minimal(base_size = 14) + theme(legend.position = 'none') +
      scale_x_date(date_labels = "%Y-%m-%d") +
      xlab(NULL) + ylab(NULL) +
      theme(axis.text.x = element_text(hjust = 1)) +
      labs(title = "Growth curve of confirms in Anhui Province, China")

[Figure: Growth curve of confirms in Anhui Province, China]

4. A heatmap of the epidemic situation around the world in the last 7 days.

    y <- load_nCov2019(lang = 'en')
    d <- y['global']
    max_time <- max(d$time)
    min_time <- max_time - 7
    d <- na.omit(d[d$time >= min_time & d$time <= max_time, ])
    dd <- d[d$time == max(d$time, na.rm = TRUE), ]
    d$country <- factor(d$country, levels = dd$country[order(dd$cum_confirm)])
    breaks = c(10, 100, 1000, 10000)
    ggplot(d, aes(time, country)) +
      geom_tile(aes(fill = cum_confirm), color = 'black') +
      scale_fill_viridis_c(trans = 'log', breaks = breaks, labels = breaks) +
      xlab(NULL) + ylab(NULL) +
      scale_x_date(date_labels = "%Y-%m-%d") +
      theme_minimal()
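A small note on the date window used for the heatmap: with min_time set to max_time - 7 and both ends included, the filter keeps eight distinct dates rather than seven. The sketch below illustrates this on a made-up date column (one row per day, purely hypothetical, with no case counts) and shows a strict lower bound as one possible alternative if exactly seven days are wanted.

```r
# Hypothetical date column: one row per day from 2020-02-20 to 2020-03-01.
d <- data.frame(
  time = seq(as.Date("2020-02-20"), as.Date("2020-03-01"), by = "day")
)

max_time <- max(d$time)
min_time <- max_time - 7

# Same filter as in the heatmap example: both ends inclusive.
kept <- d[d$time >= min_time & d$time <= max_time, , drop = FALSE]
n_inclusive <- length(unique(kept$time))

# Alternative: strict lower bound, which keeps exactly seven dates.
strict <- d[d$time > min_time & d$time <= max_time, , drop = FALSE]
n_strict <- length(unique(strict$time))

print(c(inclusive = n_inclusive, strict = n_strict))
```

Either choice is reasonable for a heatmap; the difference only matters if the title or caption promises an exact number of days.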

Geographic map visualization

The nCov2019 package provides a convenient built-in function for geographic map visualization. Plotting a world map takes only a few lines:

    require('maps')
    x = get_nCov2019(lang = 'en')
    plot(x)

Combined with the chinamap R package, you can draw more detailed maps of China. For example, we can slightly modify the code above to better display the China region.

    remotes::install_github("GuangchuangYu/chinamap")
    library(chinamap)
    x = get_nCov2019(lang = 'en')
    cn = get_map_china()
    cn$province <- trans_province(cn$province)
    plot(x, chinamap = cn, palette = "Reds")

Note: In an English-language environment, cn$province should be translated with trans_province().

We provide several parameters to adjust the final appearance of the map, including font.size to adjust the label size, continuous_scale = FALSE to set a discrete color scale (a continuous color scale in log space is used by default), and palette to adjust the color palette (e.g. palette = 'blue' to set colors from dark blue to light blue).

With the argument region, users can plot a map focused on a specific country. For example, plot(x, region = 'South Korea') plots the map with the number of confirmed cases in South Korea, and plot(x, region = 'Japan') plots the map of Japan (excluding cases on the Diamond Princess).

To get a closer look at the situation in China, add the arguments region = 'china' and chinamap as follows:

    x <- get_nCov2019(lang = 'en')
    cn = get_map_china()
    cn$province <- trans_province(cn$province)
    plot(x, region = 'china', chinamap = cn,
         continuous_scale = FALSE,
         palette = 'Blues', font.size = 2)

Plotting data for a selected geographical region is also supported if a GIS file is available. For example, you can get a map of Hubei province in China by preparing a GIS file:

    require('sf')
    x = get_nCov2019(lang = 'en')
    m = sf::st_read("PATH/TO_GIS_file.shp")
    m$NAME <- trans_city(m$NAME)
    plot(x, region = 'Hubei', chinamap = m)

[Figure: map of Hubei province by city. Title: 2019nCov confirmed cases: 66907. Legend: confirm (100, 1000, 10000). Caption: accessed date: 2020-03-01 12:40:24]

You can also plot a map for a specific date:

    y <- load_nCov2019(lang = 'en')
    cn = get_map_china()
    cn$province <- trans_province(cn$province)
    plot(y, region = 'china',
         continuous_scale = FALSE,
         chinamap = cn,
         date = '2020-02-01', font.size = 2)

[Figure: map of China by province. Title: 2019nCov confirmed cases: 14408. Legend: confirm (<10, 10-100, 100-500, 500-1000, 1000-10000). Caption: accessed date: 2020-02-01]

A more informative application is to draw dynamic geographic maps over multiple time points and save them as a GIF file. To do that, specify the date range with the arguments from and to. Other useful parameters include width and height to specify the animation figure size, and filename to specify the name of the file to save.
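To get a feel for what a given from/to range covers, the dates can be enumerated in base R before any map is drawn. The sketch below assumes one animation frame per calendar day with both ends included; that is an assumption for illustration, since the text does not state how the package steps through the range. The name frames is illustrative.

```r
# Date range in the same "YYYY-MM-DD" string form as the from and to arguments.
from <- "2020-02-18"
to   <- "2020-03-02"

# Assumption: one frame per day, both ends included.
frames <- seq(as.Date(from), as.Date(to), by = "day")

print(length(frames))
print(range(frames))
```

A quick check like this also catches a reversed range early, because seq() raises an error when from is later than to with a positive step.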

The complete code for plotting historical maps of the world, China, and provinces in China is as follows:

    require(nCov2019)
    from = "2020-02-18"
    to = "2020-03-02"
    y <- load_nCov2019(lang = 'en')

    # To generate a historical world map
    # with the default figure size and the default filename;
    # the gif file will be saved in the current working directory.
    plot(y, from = from, to = to)

    # To generate a historical map of China
    # and save it as "china.gif":
    require(chinamap)
    cn = get_map_china()
    cn$province <- trans_province(cn$province)
    plot(y, region = "china", chinamap = cn, from = from, to = to, filename = 'china.gif')

    # Figure width and height can also be specified.
    # The file "cn_city_map.rds" contains map data of cities in China;
    # it can be found in the current working directory
    # after running dashboard(remote = FALSE).
    shijie = readRDS("path/to/cn_city_map.rds")
    shijie$NAME = trans_city(shijie$NAME)
    plot(y, region = "Hubei", chinamap = shijie, width = 600, height = 600, from = from, to = to)

Shiny Dashboard

Sometimes users want to see the situation directly without spending time on code. We provide a Shiny dashboard in both an online and a local version, and users can choose between them as needed. Calling dashboard(lang = 'en', remote = TRUE) opens the English online dashboard, while remote = FALSE runs it on the local machine. The online version is usually more convenient, because the first run of the local version needs to download a map information file, which may take some time depending on your Internet speed.