
2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud , Sustainable Computing & Communications, & Networking (ISPA/BDCloud/SocialCom/SustainCom)

Temporal Data Analytics on COVID-19 Data with Ubiquitous Computing

Yubo Chen, Carson K. Leung( ), Siyuan Shang, Qi Wen Department of University of Manitoba Winnipeg, MB, Canada Email: [email protected]

Abstract: With technological advancements in computing and communications, huge amounts of big data are generated and collected at a very rapid rate from a wide variety of rich data sources. Embedded in these big data are useful information and valuable knowledge. An example is healthcare and epidemiological data, such as data related to patients who suffered from viral diseases like the coronavirus disease 2019 (COVID-19). Knowledge discovered from these epidemiological data via data science helps researchers, epidemiologists and policy makers to get a better understanding of the disease, which may inspire them to come up with ways to detect, control and combat the disease. In this paper, we present a temporal data science algorithm for analyzing big COVID-19 epidemiological data, with a focus on temporal data analytics with ubiquitous computing. The algorithm helps users to get a better understanding of information about the confirmed cases of COVID-19. Evaluation results show the benefits of our system in temporal data analytics of big COVID-19 data with ubiquitous computing. Although the algorithm is designed for temporal data analytics of big epidemiological data, it would be applicable to other temporal data analytics of big data in many real-life applications and services.

Keywords: data science, coronavirus disease, COVID-19, big data, temporal data, data mining, ubiquitous computing

I. INTRODUCTION

Over the past decades, technologies of computing and communications have evolved and advanced. Ubiquitous computing and communications [1, 2] provide pervasive and reliable computing solutions and communication services [3, 4] anytime and anywhere. For instance, citizens of many countries have been using the Internet of things (IoT), such as mobile phones, wearable devices (e.g., smartwatches) and/or other devices, for contact tracing, which helps to identify persons who may have come into contact with an infected person. Examples include contact tracing apps [5-7] for monitoring the spread of, as well as notifying users of exposure to, the coronavirus disease 2019 (COVID-19), such as (a) the COVID Alert app¹ in Canada and (b) the National Health Service (NHS) COVID-19 app² in the UK.

Consequently, big data are everywhere. Huge amounts of data are easily generated and collected from a wide variety of rich data sources at a rapid rate. These big data can be of different levels of veracity (e.g., precise data, imprecise and uncertain data [8-10]). Examples of big data include:
• communication network data [11-13],
• financial time series [14-16],
• omic data (e.g., genomic data) [17, 18],
• social network data [19-24],
• transportation data [25-28],
• disease reports [29-32], as well as
• epidemiological data and […].

Useful information and valuable knowledge are usually embedded in these big data. This calls for data science [33, 34], which aims to discover knowledge from these big data via data […] [35-37], […] tools [38-41], mathematical and statistical models [42], informatics [43], data analytics, and visual analytics. The discovered knowledge is useful as it can significantly improve the quality of human life. For instance, knowledge discovered from epidemiological data helps researchers, epidemiologists and policy makers to get a better understanding of a disease, which may inspire them to come up with ways to detect, prevent, and/or control diseases such as viral diseases. Examples of viral diseases include:
• severe acute respiratory syndrome (SARS), with an outbreak in 2002–2004;
• Middle East respiratory syndrome (MERS), with an outbreak in 2012–2015; and
• coronavirus disease 2019 (COVID-19), with an outbreak that started in 2019 and became a pandemic in 2020.

Due to the COVID-19 pandemic, many researchers have focused on different aspects of the COVID-19 disease. These include clinical and treatment information [44, 45], as well as drug discovery [17, 46], related to research in medical and health sciences. In contrast, as computer scientists, we focus on other aspects of COVID-19 data, namely epidemiological data.

1 https://www.canada.ca/en/public-health/services/diseases/coronavirus-disease-covid-19/covid-alert.html
2 https://www.nhs.uk/apps-library/nhs-covid-19/

978-0-7381-3199-3/20/$31.00 ©2020 IEEE
DOI 10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00146

Many existing works on COVID-19 epidemiological data focused on showing the numbers of confirmed cases and mortality temporally. In other words, they show temporal differences among weeks or days along the timeline, e.g., to show the effects of public health strategies and mitigation techniques (such as social/physical distancing, stay-at-home orders, and lockdowns) in “flattening the (epidemic) curve”. Based on the temporal numbers of confirmed cases and mortality, related parameters can be derived, such as cumulative total numbers of cases/deaths, cumulative numbers of cases/deaths per thousand (or million) inhabitants, and 7-day average numbers of cases/deaths.

While the numbers of confirmed cases and mortality are important in showing the severity of the disease at a specific time or time interval, there is other important knowledge that can be discovered from the epidemiological data to reveal additional information associated with the disease. For instance, knowing that more confirmed cases and deaths were reported today than yesterday indicates the severity of the COVID-19 situation in Canada. However, these numbers do not reveal information such as:
• How does the common transmission method change over time (e.g., from international travel exposures to domestic acquisitions)?
• Are there any changes in the set of symptoms shown by patients over time?
• How do characteristics of the patients change over time?

In this paper, we present a temporal data science algorithm for analyzing big COVID-19 epidemiological data. We focus on temporal data analytics with ubiquitous computing. The algorithm aims to discover additional information associated with the disease from the epidemiological data. The algorithm collects a wide variety of data, such as (a) administrative information, (b) case details, (c) symptoms, (d) clinical course and outcomes, and (e) exposures, from different data sources. With the increasing number of cases in Canada (and around the world), these data are big and updated frequently. Due to the nature of the data, it is not unusual to have different levels of veracity, i.e., known values for some of the attributes (e.g., known hospitalization status like “hospitalized and admitted to the intensive care unit (ICU)”) and unknown/NULL values for some others (e.g., unstated transmission methods of the disease). Moreover, some data are quite detailed (e.g., “on January 23, a 56-year old male presented to Sunnybrook Health Sciences Centre in Toronto with a new onset of fever and non-productive cough following return from Wuhan, China, the day prior” [47]). Some other data are more abstract and general (e.g., “in Week 3 (i.e., the third full week) of 2020, a male in his 50s in the province of Ontario, who was infected through international travel, showed symptoms of fever and cough”), for preserving the privacy [48-51] of the individuals. Knowledge discovered from these epidemiological data via data science helps researchers, epidemiologists and policy makers to get a better understanding of the disease, which may inspire them to come up with ways to detect, control and combat the disease.

Our key contributions of this paper include our design and development of a data science algorithm for temporal data analytics of COVID-19 epidemiological data. Our algorithm:
• discovers frequently co-occurring characteristics (e.g., common sets of symptoms) of COVID-19 cases;
• compares and contrasts them among different time intervals; and
• reveals temporal trends on COVID-19 cases.

Our algorithm helps users (e.g., researchers, epidemiologists and policy makers) to get a better understanding of information about the confirmed cases of COVID-19. This, in turn, may inspire them to come up with ways to detect, control and combat the disease. Moreover, although this algorithm is designed for temporal analytics of big epidemiological data, it is applicable to temporal analytics of other big data in many real-life applications and services.

The remainder of this paper is organized as follows. The next section discusses some background and related work. Section III presents our data science algorithm. Section IV shows evaluation results, and Section V draws the conclusions.

II. BACKGROUND AND RELATED WORKS

A. COVID-19 Research

Because of the COVID-19 pandemic, many researchers have explored different aspects of the COVID-19 disease. These led to numerous works on COVID-19. Examples include:
• systematic reviews on literature about medical and health science research on COVID-19 [52, 53];
• clinical and treatment information [44, 45], as well as drug discovery and vaccine development [17, 46], which focus more on the medical and health science aspects;
• crisis management for the COVID-19 outbreak [54], which focuses more on the social science aspects;
• artificial intelligence (AI)-driven informatics, sensing, and imaging for tracking, testing, diagnosis, treatment and prognosis [55], such as imaging-based diagnosis of COVID-19 using chest computed tomography (CT) images [56, 57]; and
• mathematical modelling of the spread of COVID-19 [58].

In contrast, the current paper focuses more on natural sciences and engineering aspects and, in particular, takes on a more computational flavor. Moreover, our designed and developed data science algorithm examines textual-based COVID-19 epidemiological data (rather than images). Instead of projecting the spread of the disease, our algorithm discovers common characteristics among COVID-19 cases in a certain time interval, and compares them with those in another time interval. The discovered knowledge helps users to get a better understanding of information about the confirmed cases of COVID-19. Although this algorithm is designed for temporal analytics of big epidemiological data, it would be applicable to temporal analytics of other big data in many real-life applications and services.

B. Confirmed Cases and Mortality

Many existing works on COVID-19 epidemiological data focused on reporting simply the numbers of confirmed cases and mortality temporally. They highlight the temporal trends and/or the effectiveness of different public health strategies and mitigation techniques (such as social/physical distancing, stay-at-home orders, and/or lockdowns) that help in “flattening the (epidemic) curve”. Examples of these works include data and dashboards reported by organizations like:
• World Health Organization (WHO) [59];
• Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU)³;
• European Center for Disease Prevention and Control (ECDC)⁴;
• governments (e.g., Government of Canada⁵); as well as
• major news channels/media/networks (e.g., newspaper, TV⁶) and Wikipedia⁷.

[Figure: “New daily cases & deaths around the world”; y-axis in thousands; series: new cases, new deaths]
Fig. 1. A line graph showing the daily new confirmed (a) cases and (b) deaths worldwide from December 31, 2019 (for the first report of COVID-19) to November 25, 2020.

[Figure: “Cumulative cases (including cumulative deaths) around the world”; y-axis in millions; series: cumulative cases, cumulative deaths]
Fig. 2. A stacked area under curve showing the cumulative cases worldwide, which include cumulative deaths, from December 31, 2019 to November 25, 2020.

Temporal information (e.g., daily or cumulative numbers of new cases, confirmed cases and deaths) is usually represented by line graphs, column charts (or stacked column charts), and areas under curve (or stacked areas under curve). See Figs. 1 & 2, which capture the daily & cumulative numbers of cases and deaths worldwide based on the ECDC data. It is sad to observe from these figures that, as of November 25, 2020, there have been close to 60 million COVID-19 cases, out of which 1.4 million have lost their lives, worldwide. Throughout the 11-month period, the highest numbers occurred on November 21 (with 679,671 new daily cases) and November 25 (with 12,158 new daily deaths).

Similarly, Figs. 3 & 4 respectively show the daily and the cumulative numbers of active cases, recoveries and deaths in Canada based on the Government of Canada data captured by Wikipedia. It is also sad to observe from these figures that, as of November 25, 2020, there have been close to 350 thousand COVID-19 cases in Canada. Out of these cases, more than 11 thousand have lost their lives but 277 thousand have recovered, leading to less than 59 thousand active cases. Throughout the 11-month period, the highest numbers occurred on November 23 (with 5,886 new daily cases), May 01 (with 207 actual new daily deaths), and May 31 (with 222 newly reported deaths, out of which 165 were the result of catching up on those who passed away before May 23).

[Figure: “New daily cases & deaths in Canada”; y-axis in thousands; series: new cases, new deaths]
Fig. 3. A line graph showing the daily new confirmed (a) cases and (b) deaths in Canada from January 25 (for the first confirmed case in Canada) to November 25, 2020.

[Figure: “Cumulative cases in Canada (with their breakdown)”; y-axis in thousands; series: active cases, cumulative recoveries, cumulative deaths]
Fig. 4. A stacked area under curve showing the cumulative cases in Canada and their breakdown, i.e., (a) cumulative active cases, (b) cumulative recoveries and (c) cumulative deaths, from January 25 to November 25, 2020.

These numbers of confirmed cases and mortality are important in showing the severity of the disease at a specific time or time interval. However, it is equally important to explore and discover other useful knowledge from the epidemiological data because the discovered knowledge can reveal useful information (e.g., some characteristics of COVID-19 cases) associated with the disease. This, in turn, helps users to get a

3 https://coronavirus.jhu.edu/map.html
4 https://qap.ecdc.europa.eu/public/extensions/COVID-19/COVID-19.html, https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide
5 https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection.html
6 https://newsinteractives.cbc.ca/coronavirustracker/
7 https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Canada, https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data/Canada_medical_cases

better understanding of characteristics of the confirmed cases of COVID-19 (rather than just the numbers of cases).

III. OUR DATA SCIENCE ALGORITHM FOR TEMPORAL DATA ANALYTICS

In this section, we describe our data science algorithm for temporal data analytics of COVID-19 epidemiological data.

A. Collection and Integration of Data

Big COVID-19 epidemiological data can be of a wide variety (e.g., different types of data). They are usually generated and collected from various data sources.

As a concrete example, in Canada, health care is a responsibility of provincial governments. So, Canadian COVID-19 epidemiological data are gathered from each province (or territory), and provincial data are obtained from health regions (which are also known as health authorities) within the province. For instance, in the province of Manitoba, COVID-19 data can be gathered from the Winnipeg Regional Health Authority (WRHA) and four other health authorities⁸. Similarly, data for the province of British Columbia (BC) can be gathered from five health authorities such as Vancouver Coastal Health (VCH), which obtains data from 14 local health areas (LHA) within the three health service delivery areas (HSDA) in the VCH. In BC, there are 88 LHA within the 16 HSDA among the five health authorities⁹. As a third example, data from the province of Ontario can be gathered from public health units within the 14 provincial local health integration networks (LHIN)¹⁰.

In terms of data types, COVID-19 epidemiological data usually include:
• administrative information, which includes:
  o a unique privacy-preserving identifier for each case,
  o its location, and
  o episode day (i.e., symptom onset day or its closest day).
• case details, which include:
  o gender,
  o age, and
  o specific occupation of the cases.
• symptom-related data, which include additional information for a case who is not asymptomatic (i.e., a symptomatic case) such as:
  o onset day of symptoms, and
  o a collection of symptoms (e.g., cough, fever, chills, sore throat, runny nose, shortness of breath, nausea, headache, weakness, pain, irritability, diarrhea, and other symptoms).
• clinical course and outcomes, which include:
  o hospital status (e.g., hospitalized in the intensive care unit (ICU), non-ICU hospitalized, not hospitalized).
  o For a recovered case, it also includes additional information such as the recovery day.
  o For a case who has not recovered, it indicates that the case died while infected by COVID-19.
• exposures, which include transmission methods.

B. Preprocessing of Data

After collecting and integrating data from heterogeneous sources, we observe that there is some missing, unstated or unknown information (i.e., NULL values). Given the nature of these COVID-19 cases (e.g., timely reporting of cases, privacy-preservation of the identity of cases), it is not unusual to have NULL values because values may not be available or recorded. For some other attributes related to case details (e.g., personal information like gender, age), patients may prefer not to report them due to privacy concerns. As there are many cases with NULL values for some attributes, ignoring them may lead to inaccurate or incomplete analysis of the data. Instead, our algorithm keeps all these cases for analysis.

Some attributes (e.g., date) would be too specific for the analysis. Moreover, delays in testing or reporting (especially due to weekends) are not uncommon. Hence, it would also be logical to group days into a 7-day interval (i.e., a week). For example, all days within the week of January 19-25 inclusive are considered as Week 3. Side-benefits of such grouping include:
• Summing the frequency of cases over a week (cf. a single day) increases the chance of having sufficient frequency for being discovered as a frequent pattern and getting statistically significant mining results.
• Generalizing the cases helps preserve the privacy of the individuals while maintaining the utility for knowledge discovery.

Similarly, for some attributes (e.g., age, occupation), it would be logical to group similar values into a mega-value (say, ages can be binned into age groups). For example:
• grouping ages into age groups (e.g., ≤ 19 years old, 20-29 years old, ..., 70-79 years old, ≥ 80 years old);
• generalizing the specific occupation of the cases to some generalized key occupation groups, say, (a) health care workers, (b) school or daycare workers, (c) long-term care residents, and (d) others;
• generalizing specific transmission methods to some generalized key transmission methods, say, (a) community exposures, (b) travel exposures, and (c) others.
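The generalization steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the anchor date `WEEK1_START`, the record field names, and the raw occupation labels are assumptions introduced for illustration. The anchor is chosen so that January 19-25, 2020 falls in Week 3, as stated in the text.

```python
from datetime import date

# Assumption: Week 1 is the first full Sunday-to-Saturday week of 2020
# (January 5-11), which makes January 19-25 fall in Week 3.
WEEK1_START = date(2020, 1, 5)

# Hypothetical raw occupation labels mapped to the generalized groups.
OCCUPATION_GROUPS = {
    "nurse": "health care worker",
    "physician": "health care worker",
    "teacher": "school or daycare worker",
    "long-term care resident": "long-term care resident",
}


def episode_week(day):
    """Group a calendar day into its 7-day interval (week number)."""
    return (day - WEEK1_START).days // 7 + 1


def age_group(age):
    """Bin an exact age into an age group; unstated ages stay NULL (None)."""
    if age is None:
        return None
    if age <= 19:
        return "<=19"
    if age >= 80:
        return ">=80"
    decade = age // 10 * 10
    return f"{decade}-{decade + 9}"


def generalize(case):
    """Return a generalized copy of one case record (a dict).

    Cases with NULL attributes are kept rather than dropped.
    """
    day = case.get("episode_day")
    occupation = case.get("occupation")
    return {
        "id": case["id"],
        "episode_week": None if day is None else episode_week(day),
        "age_group": age_group(case.get("age")),
        "occupation_group": (
            None if occupation is None
            else OCCUPATION_GROUPS.get(occupation, "other")
        ),
    }


# Illustrative record with an unstated occupation.
case = {"id": "c1", "episode_day": date(2020, 1, 23), "age": 56, "occupation": None}
print(generalize(case))
```

Keeping `None` for unstated values, instead of filtering such records out, mirrors the choice to retain cases with NULL attributes for later analysis.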

8 https://www.gov.mb.ca/health/rha/
9 https://www2.gov.bc.ca/gov/content/data/geographic-data-services/land-use/administrative-boundaries/health-boundaries
10 http://www.lhins.on.ca/

C. Mining of Frequent Patterns and Contrast Patterns

To discover frequently co-occurring characteristics of COVID-19 cases, we apply frequent pattern mining to COVID-19 epidemiological data for each week. As the data for each week are disjoint, our algorithm makes good use of ubiquitous computing to mine each of these disjoint data sets independently in parallel. It then returns frequent patterns discovered from these parallel units to the users.

Possibly due to the timely reporting of cases, symptoms were unstated for many cases (i.e., many NULL values for symptoms). As such, the frequency of the symptoms may be lower than that of values for some other attributes (e.g., domestic acquisition as a transmission method). However, it is scientifically important to know which symptoms, among more than 12 different symptoms, co-occurred more frequently than others. As such, our algorithm provides users with flexibility to express their preferences or interests. For example, the users can express their interest in finding frequent patterns containing at least one symptom. As another example, the users can also express their interest in finding frequent patterns consisting of only symptoms.

In addition to finding frequent patterns from each week, our algorithm also compares frequent patterns among weeks. If patterns discovered from two consecutive weeks are similar, the two consecutive weeks can be merged when reporting the results (namely, frequent patterns) to avoid repetition. However, if the discovered patterns are different, our algorithm compares the ranking of the discovered patterns. To elaborate, if a pattern P is frequent in both weeks, the algorithm reports the uptrend or downtrend of P. Otherwise, P is frequent in one week (say, Week W1) but not in another (say, Week W2). The algorithm then considers P as a candidate and computes the frequency of P (which is infrequent) in Week W2 by scanning the weekly data for Week W2.

Along this direction, our algorithm can easily look up the frequency of any pattern P that is frequent in at least one of the weeks. For a week in which P is infrequent, it can consider P as a candidate and compute its frequency by scanning that weekly data. By doing so, our algorithm sums the weekly frequency of P over all weeks to obtain the frequency of P over the entire COVID-19 period. When applying this procedure to all patterns that are frequent in at least one of the weeks in a single run, our algorithm can then rank and return a collection of all frequent patterns.

IV. EVALUATION

A. A Case Study on Real-Life COVID-19 Data

1) Collection, Integration and Preprocessing of Data

To evaluate and demonstrate the usefulness of our data science algorithm, we tested it with different COVID-19 epidemiological data, including the Canadian cases from Statistics Canada [60]. For this dataset, data have been collected and integrated from provincial and territorial public health authorities by the Public Health Agency of Canada (PHAC). We preprocess the data and generalize some attributes to obtain a dataset with the following attributes:
1. A unique privacy-preserving identifier for each case
2. A generalized region/location
3. Episode week (or onset week of symptoms): From Week 3 (i.e., week of January 19-25, 2020) to now
4. Gender
5. Age group: ≤ 19, 20s, 30s, 40s, 50s, 60s, 70s, and ≥ 80.
6. Occupation group, including:
   a) health care worker,
   b) school or daycare worker (or attendee),
   c) long-term care resident, and
   d) other occupation.
7. Asymptomatic: Yes and No
8. Set of 13 symptoms, including cough, fever, chills, sore throat, runny nose, shortness of breath, nausea, headache, weakness, pain, irritability, diarrhea, and other symptoms.
9. Hospital status, including:
   a) hospitalized in the ICU,
   b) hospitalized but not in the ICU, and
   c) not hospitalized.
10. Transmission method, including:
   a) community exposures, and
   b) travel exposures.
11. Clinical outcome: Recovered and death
12. Recovery week

As of November 12, 2020, the dataset has captured 209,811 COVID-19 cases in Canada. Among them, 190,108 cases have a stated episode week. Moreover, although the first Canadian case occurred in Week 3, there were not more than two new daily cases for the following few weeks. To preserve the privacy of these early cases and to accumulate a statistically significant mass for analysis, cases from Weeks 3-8 were grouped into (Episode) Week 8 (February 23-29) with 107 cases. From Week 9 onward, the data reflect their reported episode weeks.

2) Mining of Frequent Patterns and Contrast Patterns

Once the data are preprocessed, our algorithm analyzes and mines data from each week. For instance, we observe the following from Week 10:
• Frequent singleton pattern {recovered}:2002 reveals that 2,002 Canadian COVID-19 patients have recovered. This accounts for 96.8% of the 2,068 cases that occurred in Week 10, which is encouraging.
• {other occupation}:1588 reveals that 1,588 cases (i.e., 76.8% of all the cases that occurred in Week 10) are not health care workers, school/daycare workers, or long-term care residents.

• {not hospitalized}:1251 reveals that 1,251 cases (i.e., 60.5%) did not need to be hospitalized.
• {domestic acquisition}:1206 reveals that 1,206 cases (i.e., 58.3%) were transmitted via domestic acquisition, i.e., community exposure.
• Frequent non-singleton pattern {not hospitalized, recovered}:1232 reveals that, among 1,251 cases not requiring hospitalization, 1,232 of them recovered.

As users have flexibility to express their interest or preference (say, finding frequent patterns consisting of only symptoms), our algorithm then incorporates the user preference into mining frequent patterns satisfying that preference. For instance, it finds the following patterns from Week 10:
• Frequent pattern {cough}:695 reveals that 695 cases show cough as a symptom.
• Frequent patterns {headache}:486 and {pain}:485 reveal that 486 and 485 cases show headache and pain, respectively, as a symptom.
• Frequent pattern {fever}:441 reveals that 441 cases show fever as a symptom. When compared with other symptoms (e.g., cough, headache, pain), fever is less common, even though many places are checking individuals’ body temperatures to detect potential COVID-19 cases.
• Frequent non-singleton pattern {cough, pain}:409 reveals that 409 cases show both cough and pain. Similarly, {cough, headache}:407 reveals that 407 cases show both cough and headache.
• However, {cough, headache, pain}:291 reveals that, while cough commonly occurred with headache or pain, it less frequently occurred with both headache and pain.

Our data science algorithm applies a similar procedure to other weeks for (a) discovery of frequent patterns from each individual week and (b) comparison among patterns discovered from different weeks. We observe that most of the frequent patterns are consistent among consecutive weeks, and can be merged into a mega-interval. For example:
• {recovered}:149944 reveals that recovery is the most frequent clinical outcome across 36 consecutive weeks in our evaluation data (from Week 8 to Week 43), with a frequency of 149,944 cases.
• {cough}:17712 reveals that cough is the most frequent symptom across 36 consecutive weeks in our evaluation data, with a frequency of 17,712 cases.

There are a few variations in frequency in some patterns. Here, we highlight two instances. One instance is hospitalization. Fig. 5 shows the absolute number of cases in different hospitalization categories. In Week 8, no cases needed to be admitted to the ICU. Starting from Week 9, some cases needed to be in the ICU, with the peak absolute number of 271 cases in Week 13. The numbers of ICU cases stayed at 3 digits (i.e., 100 or more cases) until Week 21, when they dropped to 55 ICU cases. Afterwards, the numbers remained at 2 digits, with occasional drops to 1 digit in some weeks (e.g., Weeks 24, 25, 28, 31, 32 and 34). A similar trend is observed for non-ICU cases. To elaborate, starting in Week 8, some cases needed to be hospitalized. Then, the number was at its peak around Weeks 15-19, and then decreased and stabilized at 3-digit figures in recent weeks.

Fig. 6 shows the relative percentage of cases in different hospitalization categories. The demand for hospitalization went up in early weeks, with a peak for the ICU service in Week 10 and a peak for non-ICU service in Week 19. In general, the demand for hospitalization (admitted to the ICU or not) was around 20% in early weeks (i.e., Weeks 10-18). The demand went beyond 20% in Week 19. Fortunately, it dropped to around 10% in Week 20 and dropped further in succeeding weeks. For instance, hospitalized cases accounted for 4.9% of all cases in Week 43.

[Figure: “Hospitalization”; x-axis: Week (8 to 43); y-axis: number of cases; series: not hospitalized, non-ICU, ICU]
Fig. 5. A stacked column chart showing hospital status of Canadian COVID-19 cases from Week 8 to Week 43 in terms of the absolute numbers of (a) ICU, (b) non-ICU but hospitalized, and (c) non-hospitalized cases.

[Figure: “Hospitalization”; x-axis: Week (8 to 43); y-axis: percentage of cases; series: not hospitalized, non-ICU, ICU]
Fig. 6. A 100% stacked column chart showing hospital status of Canadian COVID-19 cases from Week 8 to Week 43 in terms of the relative percentages of (a) ICU, (b) non-ICU but hospitalized, and (c) non-hospitalized cases among all cases with stated hospital status.

Another instance with variations is transmission methods. Fig. 7 shows the absolute number of cases in different transmission categories. In early weeks (e.g., Weeks 8-13), many cases were transmitted via international travel. For example, more than 1,600 cases were transmitted via travel exposure in Week 11. This number dropped to zero in Weeks 15-24, when non-essential international travel was strongly discouraged and many countries closed their borders. Then, it stabilized and remained below 50 cases.

Fig. 8 shows the relative percentage of cases in different transmission categories. Similar observations can be made. For example, close to 50% of cases in Week 9 were travel exposed. From Week 15 onwards, international travel accounts for a minimal percentage of cases. The majority of cases were transmitted via domestic acquisition (i.e., through community exposure).
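The per-week mining with user preferences, as described in Section III-C and used in this case study, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' algorithm: a brute-force enumeration of small itemsets stands in for the unspecified frequent pattern miner, a thread pool stands in for the parallel computing units, and the names `mine_week`, `mine_all_weeks`, `minsup`, `max_len`, `must_include` and `only`, as well as the toy data, are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

SYMPTOMS = {"cough", "fever", "headache", "pain"}  # illustrative subset


def mine_week(cases, minsup, max_len=3, must_include=None, only=None):
    """Return {pattern: frequency} for one week's cases.

    cases: list of sets of attribute values; NULL values are simply absent.
    must_include: keep only patterns sharing at least one item with this set.
    only: keep only patterns made up entirely of items from this set.
    """
    counts = Counter()
    for items in cases:
        ordered = sorted(items)
        # Brute force: count every sub-pattern of up to max_len items.
        for k in range(1, max_len + 1):
            for combo in combinations(ordered, k):
                counts[frozenset(combo)] += 1
    result = {}
    for pattern, freq in counts.items():
        if freq < minsup:
            continue  # infrequent
        if must_include is not None and not (pattern & must_include):
            continue  # violates "at least one symptom"-style preference
        if only is not None and not pattern <= only:
            continue  # violates "only symptoms"-style preference
        result[pattern] = freq
    return result


def mine_all_weeks(weekly_cases, minsup, **prefs):
    """Mine each disjoint weekly partition independently, in parallel."""
    weeks = sorted(weekly_cases)
    with ThreadPoolExecutor() as pool:
        mined = pool.map(
            lambda w: mine_week(weekly_cases[w], minsup, **prefs), weeks
        )
        return dict(zip(weeks, mined))


# Toy data (not from the evaluation dataset).
weekly_cases = {
    10: [
        {"cough", "pain", "recovered"},
        {"cough", "headache", "recovered"},
        {"recovered"},
    ],
}
print(mine_all_weeks(weekly_cases, minsup=2, only=SYMPTOMS))
```

Because the weekly partitions are disjoint, each call to `mine_week` needs no shared state, which is what makes the per-week work easy to distribute.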

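The week-to-week comparison described in Section III-C (rescanning a week in which a pattern is infrequent, reporting its trend, and summing weekly frequencies) can be sketched as below. This is a hedged illustration only: the function names and toy data are hypothetical, and the trend is judged by raw frequency as a simplifying assumption, whereas the text speaks of comparing rankings.

```python
def frequency(cases, pattern):
    """Count the cases in one week that contain every item of the pattern."""
    return sum(1 for items in cases if pattern <= items)


def lookup(weekly_cases, weekly_frequent, week, pattern):
    """Use the mined frequency if available; otherwise rescan that week."""
    freq = weekly_frequent[week].get(pattern)
    if freq is None:  # pattern is only a candidate (infrequent) in this week
        freq = frequency(weekly_cases[week], pattern)
    return freq


def compare_weeks(weekly_cases, weekly_frequent, w1, w2):
    """Report (freq in w1, freq in w2, trend) for patterns frequent in w1 or w2."""
    report = {}
    for pattern in set(weekly_frequent[w1]) | set(weekly_frequent[w2]):
        f1 = lookup(weekly_cases, weekly_frequent, w1, pattern)
        f2 = lookup(weekly_cases, weekly_frequent, w2, pattern)
        trend = "up" if f2 > f1 else "down" if f2 < f1 else "flat"
        report[pattern] = (f1, f2, trend)
    return report


def total_frequency(weekly_cases, weekly_frequent):
    """Sum weekly frequencies of every pattern frequent in at least one week."""
    candidates = set().union(*(set(p) for p in weekly_frequent.values()))
    return {
        pattern: sum(
            lookup(weekly_cases, weekly_frequent, week, pattern)
            for week in weekly_cases
        )
        for pattern in candidates
    }


# Toy data (not from the evaluation dataset); frequent patterns use minsup = 2.
weekly_cases = {
    10: [{"cough", "fever"}, {"cough"}, {"fever"}],
    11: [{"cough"}, {"cough"}, {"cough", "fever"}],
}
weekly_frequent = {
    10: {frozenset({"cough"}): 2, frozenset({"fever"}): 2},
    11: {frozenset({"cough"}): 3},
}
print(compare_weeks(weekly_cases, weekly_frequent, 10, 11))
print(total_frequency(weekly_cases, weekly_frequent))
```

Ranking the totals returned by `total_frequency` would give the overall list of frequent patterns across the whole period.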
[Figure: “Transmission methods”; x-axis: Week (8 to 43); y-axis: number of cases; series: domestic acquisition, international travel]
Fig. 7. A stacked column chart showing transmission methods of Canadian COVID-19 cases from Week 8 to Week 43 in terms of the absolute numbers of (a) domestic acquisition (i.e., community exposures) and (b) international travel exposures.

[Figure: “100% transmission methods”; x-axis: Week (8 to 43); y-axis: percentage of cases; series: domestic acquisition, international travel]
Fig. 8. A 100% stacked column chart showing transmission methods of Canadian COVID-19 cases from Week 8 to Week 43 in terms of the relative percentages of (a) domestic acquisition (i.e., community exposures) and (b) international travel exposures among all cases with stated transmission methods.

B. Functionality Check with Related Works

After demonstrating the features and usefulness of our data science algorithm in analyzing real-life COVID-19 data, let us evaluate its functionality when compared with related works. First, most of the related works are observed to report mostly the numbers of cases and deaths. They do not provide privacy-preserving details and epidemiological characteristics of those COVID-19 cases, which are provided by our algorithm. Second, those related works that provide the overall data distribution of cases are mostly confined to single dimensions/attributes. In contrast, our algorithm provides multi-dimensional information such as relationships among attributes in the form of frequent patterns.

V. CONCLUSIONS

In this paper, we presented a data science algorithm for temporal analytics on big COVID-19 epidemiological data. The algorithm generalizes some attributes for effective analysis. Instead of ignoring unstated/NULL values of some attributes, the algorithm provides users with flexibility of including or excluding these values. It also provides users with flexibility to express their preference (e.g., “must include symptoms”) in mining of frequent patterns. It makes good use of ubiquitous computing to discover frequent patterns from each week's data in parallel. Moreover, it compares and contrasts the discovered frequent patterns among consecutive weeks to observe any uptrends or downtrends in any characteristics of COVID-19 cases. Evaluation results show the practicality of our algorithm in providing rich knowledge about characteristics of COVID-19 cases. This helps researchers, epidemiologists and policy makers to get a better understanding of the disease, which may inspire them to come up with ways to detect, control and combat the disease.

As ongoing and future work, we transfer knowledge learned from the current work to temporal analytics of other big data in many real-life applications and services. We also explore the incorporation of visual analytics [61] with our data science algorithm to conduct visual analytics of temporal big data.

ACKNOWLEDGMENT

This work is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), as well as the University of Manitoba.

REFERENCES

[1] K. Huang, et al., “EBD-MLE: enabling block dynamics under BL-MLE for ubiquitous data,” in IEEE ISPA-IUCC 2017, pp. 1281-1288.
[2] E. Serrano, et al., “A cloud environment for ubiquitous medical image reconstruction,” in IEEE BDCloud (ISPA-IUCC-BDCloud-SocialCom-SustainCom) 2018, pp. 1048-1055.
[3] M. Badawy, et al., “Verification in mobile communication during the change of IP address,” in IEEE IUCC-DSCI-SmartCNS 2019, pp. 101-105.
[4] X. Cheng, et al., “Post-evaluation model of telecommunication network construction based on AHP,” in IEEE IUCC-DSCI-SmartCNS 2019, pp. 616-620.
[5] N. Ahmed, et al., “Survey of COVID-19 contact tracing apps,” IEEE Access 8, 2020, pp. 134577-134601.
[6] R. Raskar, et al., “Contact tracing: holistic solution beyond bluetooth,” IEEE Data Eng. Bull. 43(2), 2020, pp. 67-70.
[7] N. Trieu, et al., “Epione: lightweight contact tracing with strong privacy,” IEEE Data Eng. Bull. 43(2), 2020, pp. 95-107.
[8] F. Jiang, C.K. Leung, “A data analytic algorithm for managing, querying, and processing uncertain big data in cloud environments,” Algorithms 8(4), 2015, pp. 1175-1194.
[9] C.K. Leung, et al., “Fast algorithms for frequent itemset mining from uncertain data,” in IEEE ICDM 2014, pp. 893-898.
[10] W. Zheng, et al., “An adaptive priority-based heuristic approach for scheduling DAG applications with uncertainties,” in IEEE ISPA-IUCC 2017, pp. 72-79.
[11] L. Kang, et al., “A percolation based approach for critical density in non-orientation directional sensor network,” in IEEE IUCC-DSCI-SmartCNS 2019, pp. 89-94.
[12] H. Qi, et al., “An improved sierpinski fractal based […] for datacenters,” in IEEE IUCC-DSCI-SmartCNS 2019, pp. 27-34.
[13] Y. Zhang, et al., “Connection degree cost and reward based algorithm in cognitive radio networks,” in IEEE IUCC-DSCI-SmartCNS 2019, pp. 76-80.
[14] A.K. Chanda, et al., “A new framework for mining weighted periodic patterns in time series ,” ESWA 79, 2017, pp. 207-224.
[15] Y. Hou, et al., “HDSVM: a high efficiency distributed svm framework over data stream,” in IEEE ISPA-IUCC 2017, pp. 352-359.
[16] C.K. Leung, et al., “A machine learning approach for stock price prediction,” in IDEAS 2014, pp. 274-277.
[17] D. Barh, et al., “Multi-omics-based identification of SARS-CoV-2 infection biology and candidate drugs against COVID-19,” Comput. Biol. Medicine 126, 2020, pp. 104051:1-104051:13.
[18] O.A. Sarumi, C.K. Leung, “Exploiting anti-monotonic constraints for mining palindromic motifs from big genomic data,” in IEEE BigData 2019, pp. 4864-4873.
[19] F. Jiang, et al., “Discovery of really popular friends from social networks,” in IEEE BDCloud 2014, pp. 342-349.
[20] F. Jiang, et al., “Finding popular friends in social networks,” in CGC 2012, pp. 501-508.
[21] C.K. Leung, C.L. Carmichael, “Exploring social networks: a frequent pattern approach,” in IEEE SocialCom 2010, pp. 419-424.

[22] C.K. Leung, et al., “Personalized DeepInf: enhanced social influence prediction with deep learning and transfer learning,” in IEEE BigData 2019, pp. 2871-2880.
[23] C.K. Leung, F. Jiang, “Big data analytics of social networks for the discovery of "following" patterns,” in DaWaK 2015, pp. 123-135.
[24] X. Xia, “Personalized privacy protection with spatio-temporal features in social networks,” in IEEE IUCC-DSCI-SmartCNS 2019, pp. 146-152.
[25] P.P.F. Balbin, et al., “Predictive analytics on open big data for supporting smart transportation services,” Procedia Computer Science 176, 2020, pp. 3009-3018.
[26] C.K. Leung, et al., “An innovative fuzzy logic-based machine learning algorithm for supporting predictive analytics on big transportation data,” in FUZZ-IEEE 2020. doi:10.1109/FUZZ48607.2020.9177823
[27] C.K. Leung, et al., “Data mining on open public transit data for transportation analytics during pre-COVID-19 era and COVID-19 era,” in INCoS 2020, pp. 133-144.
[28] C.K. Leung, et al., “Urban analytics of big transportation data for supporting smart cities,” in DaWaK 2019, pp. 24-33.
[29] P. Gupta, et al., “Vertical data mining from relational data and its application to COVID-19 data,” Big Data Analyses, Services, and Smart Data, 2021, pp. 106-116.
[30] L. Lazli, M. Boukadoum, “Quantification of Alzheimer's disease brain tissue volume by an enhanced possibilistic clustering technique based on bias-corrected fuzzy initialization,” in IEEE ISPA-IUCC 2017, pp. 1434-1438.
[31] J. Souza, et al., “An innovative big data predictive analytics framework over hybrid big data sources with an application for disease analytics,” in AINA 2020, pp. 669-680.
[32] Z. Zheng, et al., “Fruit tree disease recognition based on convolutional neural networks,” in IEEE IUCC-DSCI-SmartCNS 2019, pp. 118-122.
[33] C.K. Leung, “Data science for big data applications and services: data lake management, data analytics and visualization,” in Big Data Analyses, Services, and Smart Data, 2021, pp. 28-44.
[34] C.K. Leung, F. Jiang, “A data science solution for mining interesting patterns from uncertain big data,” in IEEE BDCloud 2014, pp. 235-242.
[35] A. Fariha, et al., “Mining frequent patterns from human interactions in meetings using directed acyclic graphs,” in PAKDD 2013 (I), pp. 38-49.
[36] C.K. Leung, “Uncertain frequent pattern mining,” Frequent Pattern Mining, 2014, pp. 417-453.
[37] E. Marin, et al., “Predicting hacker adoption on darkweb forums using sequential rule mining,” in IEEE BDCloud 2018, pp. 1183-1190.
[38] S. Ahn, et al., “A fuzzy logic based machine learning tool for supporting big data business analytics in complex artificial intelligence environments,” in FUZZ-IEEE 2019, pp. 1259-1264.
[39] J.A. Brown, et al., “A machine learning system for supporting advanced knowledge discovery from chess game data,” in IEEE ICMLA 2017, pp. 649-654.
[40] K.J. Morris, et al., “Token-based adaptive time-series prediction by ensembling linear and non-linear estimators: a machine learning approach for predictive analytics on big stock data,” in IEEE ICMLA 2018, pp. 1486-1491.
[41] Y. Zhang, et al., “Dynamic beam hopping for DVB-S2X satellite: a multi-objective deep approach,” in IEEE IUCC-DSCI-SmartCNS 2019, pp. 164-169.
[42] C.K. Leung, “Mathematical model for propagation of influence in a social network,” Encyclopedia of Social Network Analysis and Mining, 2e, 2018, pp. 1261-1269.
[43] W. Lee, et al., “Reducing noises for recall-oriented patent retrieval,” in IEEE BDCloud 2014, pp. 579-586.
[44] A.A. Ardakani, et al., “Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: results of 10 convolutional neural networks,” Comp. Bio. Med. 121, 2020, pp. 103795:1-103795:9.
[45] M.B. Jamshidi, et al., “Artificial intelligence and COVID-19: deep learning approaches for diagnosis and treatment,” IEEE Access 8, 2020, pp. 109581-109595.
[46] B. Robson, “COVID-19 coronavirus spike protein analysis for synthetic vaccines, a peptidomimetic antagonist, and therapeutic drugs, and analysis of a proposed achilles' heel conserved region to minimize of escape mutations and drug resistance,” Comp. Bio. Med. 121, 2020, pp. 103749:1-103749:28.
[47] X. Marchand-Senécal, et al., “Diagnosis and management of first case of COVID-19 in Canada: lessons applied from SARS-CoV-1,” Clinical Infectious Diseases, 2020. doi:10.1093/cid/ciaa227
[48] C.S. Eom, et al., “Effective privacy preserving data publishing by vectorization,” Information Sciences 527, 2020, pp. 311-328.
[49] C.K. Leung, et al., “Privacy-preserving frequent pattern mining from big uncertain data,” in IEEE BigData 2018, pp. 5101-5110.
[50] A.M. Olawoyin, et al., “Privacy-preserving spatio-temporal patient data publishing,” in DEXA 2020 (II), pp. 407-416.
[51] B.H. Wodi, et al., “Fast privacy-preserving keyword search on encrypted outsourced data,” in IEEE BigData 2019, pp. 6266-6275. doi:10.1109/BigData47090.2019.9046058
[52] W.T. Li, et al., “Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis,” BMC Medical Informatics Decis. Mak. 20(1), 2020, pp. 247:1-247:13.
[53] A.S. Albahri, et al., “Role of biological data mining and machine learning techniques in detecting and diagnosing the novel coronavirus (COVID-19): a systematic review,” J. Medical Syst. 44(7), 2020, pp. 122:1-122:11.
[54] W. Kuo, J. He, “Guest editorial: crisis management - from nuclear accidents to outbreaks of COVID-19 and infectious diseases,” IEEE Trans. Reliab. 69(3), 2020, pp. 846-850.
[55] A.A. Amini, et al., “Editorial special issue on "AI-driven informatics, sensing, imaging and big data analytics for fighting the COVID-19 pandemic",” IEEE JBHI 24(10), 2020, pp. 2731-2732.
[56] D. Shen, et al., “Guest editorial: special issue on imaging-based diagnosis of COVID-19,” IEEE TMI 39(8), 2020, pp. 2569-2571.
[57] Y. Zhang, et al., “A five-layer deep convolutional neural network with stochastic pooling for chest CT-based COVID-19 diagnosis,” Mach. Vis. Appl. 32(1), 2021, pp. 14:1-14:13.
[58] A. Viguerie, et al., “Simulating the spread of COVID-19 via a spatially-resolved susceptible-exposed-infected-recovered-deceased (SEIRD) model with heterogeneous diffusion,” Appl. Math. Lett. 111, 2021, pp. 106617:1-106617:9.
[59] World Health Organization, WHO coronavirus disease (COVID-19) dashboard. https://covid19.who.int/
[60] Public Health Agency of Canada, “Detailed preliminary information on confirmed cases of COVID-19 (revised),” Statistics Canada Table 13-10-0781-01. doi:10.25318/1310078101-eng
[61] C.K. Leung, et al., “Big data visualization and visual analytics of COVID-19 data,” in IV 2020, pp. 387-392. doi:10.1109/IV51561.2020.00073
