Supplementary document #1: Use the nCov2019 R package to obtain latest and historical data on the coronavirus outbreak

Tianzhi Wu1, Xijin Ge2,*, Guangchuang Yu1,* 1Department of Bioinformatics, School of Basic Medical Science, Southern Medical University, , 2Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57007, USA. Corresponding: Guangchuang Yu ([email protected]); Xijin Ge ([email protected])

Part I — Quickstart

You can easily install the package by running the following in R:

install.packages("remotes")
remotes::install_github("GuangchuangYu/nCov2019")

Geographical names are in Chinese by default; pass the argument lang = 'en', as in this example, to get English names. To get the latest data, load it with get_nCov2019(). Printing the object x shows the total number of confirmed cases in China:

library('nCov2019')
x <- get_nCov2019(lang='en')
x

## China (total confirmed cases): 77269
## last update: 2020-02-24 23:58:45

Note: if you would like to use the Chinese version, you can remove the lang='en' argument.

To look at the global data, you can run the code x['global', ]:

head(x['global', ])

##                name confirm suspect dead deadRate showRate  heal healRate
## 1             China   77269    3434 2596     3.36    FALSE 24854    32.17
## 2 Republic of Korea     833       0    8     0.96    FALSE    12     1.44
## 3  Diamond Princess     691       0    3     0.43    FALSE     0     0.00
## 4             Italy     230       0    6     2.61    FALSE     2     0.87
## 5            Janpan     159       0    1     0.63    FALSE     1     0.63
## 6         Singapore      90       0    0     0.00    FALSE    53    58.89
##   showHeal
## 1     TRUE
## 2     TRUE
## 3     TRUE
## 4     TRUE
## 5     TRUE
## 6     TRUE
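The deadRate and healRate columns in the output above look like percentages of confirmed cases. As a quick sanity check, and assuming (the package does not state this here) that they are computed as 100 * dead / confirm and 100 * heal / confirm, rounded to two decimals, the sketch below recomputes them in base R from a few rows copied by hand from the printed output. The data frame d and the *_check columns are illustrative and are not produced by get_nCov2019().

```r
# Rows copied by hand from the head(x['global', ]) output above (illustrative only).
d <- data.frame(
  name     = c("China", "Republic of Korea", "Italy", "Singapore"),
  confirm  = c(77269, 833, 230, 90),
  dead     = c(2596, 8, 6, 0),
  heal     = c(24854, 12, 2, 53),
  deadRate = c(3.36, 0.96, 2.61, 0.00),
  healRate = c(32.17, 1.44, 0.87, 58.89)
)

# Assumption: rates are percentages of confirmed cases, rounded to 2 decimals.
d$deadRate_check <- round(100 * d$dead / d$confirm, 2)
d$healRate_check <- round(100 * d$heal / d$confirm, 2)

# TRUE if the recomputed rates match the printed ones.
rates_match <- isTRUE(all.equal(d$deadRate_check, d$deadRate)) &&
  isTRUE(all.equal(d$healRate_check, d$healRate))
print(rates_match)
```

If the check holds for the rows you care about, the rate columns can be re-derived after any custom aggregation instead of being averaged directly.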

You can also view provincial data within China with x[]:

head(x[])

##        name confirm suspect dead deadRate showRate  heal healRate showHeal
## 1             64287       0 2495     3.88    FALSE 16738    26.04     TRUE
## 2 Guangdong    1345       0    6     0.45    FALSE   772    57.40     TRUE
## 3     Henan    1271       0   19     1.49    FALSE   930    73.17     TRUE
## 4  Zhejiang    1205       0    1     0.08    FALSE   765    63.49     TRUE
## 5              1016       0    4     0.39    FALSE   721    70.96     TRUE
## 6               989       0    6     0.61    FALSE   692    69.97     TRUE

Or get the data for a specific province:

head(x['Hubei', ]) # you can replace Hubei with any province

##   name confirm suspect dead deadRate showRate heal healRate showHeal
## 1        46607       0 1987     4.26    FALSE 8946    19.19     TRUE
## 2         3465       0  108     3.12    FALSE 1177    33.97     TRUE
## 3         2904       0  103     3.55    FALSE 1659    57.13     TRUE
## 4         1574       0   42     2.67    FALSE  734    46.63     TRUE
## 5         1383       0   40     2.89    FALSE  465    33.62     TRUE
## 6         1303       0   33     2.53    FALSE  551    42.29     TRUE

Use the argument by='today' to view the number of newly added cases:

head(x['Hubei', by='today'][,c(1,2,4)])

##        name confirm isUpdated
## 1     Wuhan       0     FALSE
## 2   Xiaogan       0     FALSE
## 3 Huanggang       0     FALSE
## 4  Jingzhou       0     FALSE
## 5     Ezhou       0     FALSE
## 6   Suizhou       0     FALSE

# The command below gives the same result
head(x['Hubei', by='today'][,c(1,2,4)])

##        name confirm isUpdated
## 1     Wuhan       0     FALSE
## 2   Xiaogan       0     FALSE
## 3 Huanggang       0     FALSE
## 4  Jingzhou       0     FALSE
## 5     Ezhou       0     FALSE
## 6   Suizhou       0     FALSE

Another way to extract information is the summary function. It outputs daily statistics of cumulative (default) or newly added (using the parameter by = "today") cases in China. Such information is useful for drawing a daily growth curve. See supplementary document 1 for a more detailed tutorial. The daily data output by the summary function is only available for China as a whole. Getting daily data for each province and city is described in the following section.

To get a cumulative summary of the daily data, you could use the summary function on x:

head(summary(x))

##   confirm suspect dead heal nowConfirm nowSevere deadRate healRate  date
## 1      41       0    1    0          0         0      2.4      0.0 01.13
## 2      41       0    1    0          0         0      2.4      0.0 01.14
## 3      41       0    2    5          0         0      4.9     12.2 01.15
## 4      45       0    2    8          0         0      4.4     17.8 01.16
## 5      62       0    2   12          0         0      3.2     19.4 01.17
## 6     198       0    3   17          0         0      1.5      8.6 01.18

Similarly, if you want to view the new daily cases, you can use the by="today" argument:

head(summary(x, by="today"))

##   confirm suspect dead heal deadRate healRate  date
## 1      77      27    0    0      0.0      0.0 01.20
## 2     149      53    3    0      2.0      0.0 01.21
## 3     131     257    8    0      6.1      0.0 01.22
## 4     259     680    8    6      3.1      2.3 01.23
## 5     444    1118   16    3      3.6      0.7 01.24
## 6     688    1309   15   11      2.2      1.6 01.25
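The cumulative and daily views are related by simple differencing. As a minimal sketch in base R, using the cumulative confirm values copied from the head(summary(x)) output above (the vector names are illustrative), day-over-day increases can be derived with diff(). These derived numbers are not expected to reproduce the by="today" output shown above, since the two printed excerpts cover different dates.

```r
# Cumulative confirmed cases for 01.13 to 01.18, copied from the output above.
dates       <- c("01.13", "01.14", "01.15", "01.16", "01.17", "01.18")
cum_confirm <- c(41, 41, 41, 45, 62, 198)

# Day-over-day increase; the first day has no previous value, so use NA.
new_confirm <- c(NA, diff(cum_confirm))

daily <- data.frame(date = dates, cum_confirm, new_confirm)
print(daily)
```

Keeping the first entry as NA, rather than 0, makes it explicit that no increase can be computed for the first reported day.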

Here is a quick visualization example for Anhui province in China.

library(forcats)
library(ggplot2)
d = x['Anhui',] # you can replace Anhui with any province
d$confirm = as.numeric(d$confirm)
d$name = fct_reorder(d$name, d$confirm)
ggplot(d, aes(name, confirm)) +
  geom_col(fill='steelblue') + coord_flip() +
  geom_text(aes(y = confirm+2, label=confirm), hjust=0) +
  theme_minimal(base_size=14) +
  scale_y_continuous(expand=c(0,10)) +
  xlab(NULL) + ylab(NULL)

If you want to visualize the cumulative summary data, an example plot could be the following:

ggplot(summary(x), aes(as.Date(date, "%m.%d"), as.numeric(confirm))) +
  geom_col(fill='firebrick') +
  theme_minimal(base_size = 14) +
  xlab(NULL) + ylab(NULL) +
  labs(caption = paste("accessed date:", time(x)))
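One detail worth knowing about the plot above: the date column holds strings such as "01.13" with no year, and as.Date() with the format "%m.%d" fills in the current year at the time the code runs. A possible workaround, sketched below under the assumption that all rows belong to 2020 (suggested by the "last update" timestamp, but not guaranteed by the data), is to prepend the year before parsing. The variable names are illustrative.

```r
# Date strings in the month.day form shown in the summary(x) output.
date_chr <- c("01.13", "01.14", "01.15")

# Assumption: every row belongs to 2020.
assumed_year <- "2020"
date_parsed <- as.Date(paste(assumed_year, date_chr, sep = "."),
                       format = "%Y.%m.%d")
print(date_parsed)
```

The resulting Date vector could be used in place of as.Date(date, "%m.%d") inside aes() so that the x axis stays fixed regardless of when the script is rerun.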

Part II — Historical Data

You can use load_nCov2019() to access detailed historical data.

x <- load_nCov2019(lang='en')
head(x[])

##         time  province     city cum_confirm cum_heal cum_dead
## 1 2020-01-25   Guangxi   Wuzhou           1        0        0
## 2 2020-01-25     Hubei    Enshi          11        0        0
## 3 2020-01-25 Guangdong Shaoguan           3        0        0
## 4 2020-01-25            Hechuan           2        0        0
## 5 2020-01-25 Chongqing                    2        0        0
## 6 2020-01-25 Chongqing                    1        0        0
##   suspected
## 1         0
## 2         0
## 3         0
## 4         0
## 5         0
## 6         0

Similar to get_nCov2019(), you are also able to call the summary() function on the data.

head(summary(x))

##          time  province cum_confirm
## 1  2020-01-25   Guangxi          23
## 2  2020-01-25     Hubei         761
## 3  2020-01-25 Guangdong          78
## 4  2020-01-25 Chongqing          57
## 12 2020-01-25     Hunan          43
## 33 2020-01-25  Zhejiang          62

You can also get historical province- and city-level details in China:

head(x['Hubei'])

##           time province      city cum_confirm cum_heal cum_dead suspected
## 2   2020-01-25    Hubei     Enshi          11        0        0         0
## 107 2020-01-25    Hubei  Jingzhou          10        0        0         0
## 109 2020-01-25    Hubei   Suizhou           5        0        0         0
## 167 2020-01-25    Hubei     Wuhan         572       32       38         0
## 168 2020-01-25    Hubei Huanggang          64        0        0         0
## 169 2020-01-25    Hubei                    31        0        1         0

You can then use this data for visualizations:

library(ggplot2)
ggplot(summary(x, 'Hubei'), aes(time, as.numeric(cum_confirm))) +
  geom_col()

library(ggrepel)
d <- x['Hubei']
ggplot(d, aes(time, as.numeric(cum_confirm), group=city, color=city)) +
  geom_point() + geom_line() +
  geom_text_repel(aes(label=city), data=d[d$time == time(x), ], hjust=1) +
  theme_minimal(base_size = 14) + theme(legend.position='none') +
  xlab(NULL) + ylab(NULL)

## Warning: Removed 1 rows containing missing values (geom_text_repel).

# Filter one city with a logical comparison (==), then plot its cumulative cases
ggplot(subset(x['Hubei'], city == 'Huanggang'), aes(time, as.numeric(cum_confirm))) +
  geom_col()
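When filtering one city with subset(), the condition has to be a logical comparison written with ==. A single = is treated by R as a named argument, which subset() silently ignores, so every row is returned. The base R sketch below shows the difference on a small illustrative data frame whose values are copied from the head(x['Hubei']) output above; it does not call the package.

```r
# Three rows copied from the head(x['Hubei']) output above (illustrative only).
d <- data.frame(
  city        = c("Enshi", "Wuhan", "Huanggang"),
  cum_confirm = c(11, 572, 64)
)

# '==' builds a logical condition, so only the matching row is kept.
one_city <- subset(d, city == "Huanggang")

# '=' passes an unused named argument; no filtering happens.
all_rows <- subset(d, city = "Huanggang")

print(nrow(one_city))
print(nrow(all_rows))
```

Checking nrow() or unique(d$city) after filtering is a cheap way to confirm that the plot really shows a single city.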

Part III — Map Plotting

We provide several parameters to adjust the final appearance of the map, including font.size to adjust the label size, continuous_scale = FALSE to use a discrete color scale (a continuous color scale in log space is used by default), and palette to adjust the color palette (e.g., palette='blue' for setting the color from dark blue to light blue).

Getting a plot of the world map is really simple. Only three lines are needed:

require(nCov2019)
x = get_nCov2019(lang='en')
plot(x)

To show more detail for the China region:

remotes::install_github("GuangchuangYu/chinamap")
require(chinamap)
cn = get_map_china()
cn$province <- trans_province(cn$province)
plot(x, chinamap=cn, palette="Purples")

Note: Province names must be translated with trans_province() when using the English version.

To get a closer look at the situation in China, add a region = 'china' argument to the plot:

x <- get_nCov2019(lang='en')
cn = get_map_china()
cn$province <- trans_province(cn$province)
plot(x, region='china', chinamap=cn, continuous_scale=FALSE, palette='Blues', font.size = 2)

To get a map for a specific province in China:

x = get_nCov2019(lang='en')
m = sf::st_read("PATH/TO/GIS file.shp")
plot(x, region='Hubei', chinamap=m)

In addition, you can draw historical maps for a specific date:

y <- load_nCov2019(lang = 'en')
cn = get_map_china()
cn$province <- trans_province(cn$province)
plot(y, region='china', chinamap=cn, date='2020-02-01', font.size=2)
