sustainability

Article An Infodemiology and Infoveillance Study on COVID-19: Analysis of Twitter and Google Trends

Reem Alshahrani 1 and Amal Babour 2,*

1 Department of Computer Science, Taif University, Taif 26571, ; [email protected] 2 Department of Information Systems, King Abdulaziz University, 21589, Saudi Arabia * Correspondence: [email protected]

Abstract: Infodemiology uses web-based data to inform public health policymakers. This study aimed to examine the diffusion of Arabic language discussions and analyze the nature of Internet search behaviors related to the global COVID-19 pandemic through two platforms (Twitter and Google Trends) in Saudi Arabia. A set of Twitter Arabic data related to COVID-19 was collected and analyzed. Using Google Trends, internet search behaviors related to the pandemic were explored. Health and risk perceptions and information related to the adoption of COVID-19 infodemic markers were investigated. Moreover, Google mobility data was used to assess the relationship between different community activities and the pandemic transmission rate. The same data was used to investigate how changes in mobility could predict new COVID-19 cases. The results show that the top COVID-19–related terms for misinformation on Twitter were folk remedies from low quality sources. The number of COVID-19 cases in different Saudi provinces has a strong negative correlation with COVID-19 search queries on Google Trends (Pearson r = −0.63) and a statistical significance (p < 0.05). The reduction of mobility is highly correlated with a decreased number of total cases in   Saudi Arabia. Finally, the total cases are the most significant predictor of the new COVID-19 cases.

Citation: Alshahrani, R.; Babour, A. Keywords: coronavirus; COVID-19; Google Trends; infodemiology; infoveillance; social media; Twitter An Infodemiology and Infoveillance Study on COVID-19: Analysis of Twitter and Google Trends. Sustainability 2021, 13, 8528. https:// 1. Introduction doi.org/10.3390/su13158528 In late December 2019, news of a novel coronavirus began to emerge from Wuhan, Academic Editor: Andrei P. Kirilenko China. The new virus, named by World Health Organization (WHO) as Coronavirus Disease 2019 (COVID-19) causes severe acute respiratory infection leading to death in Received: 10 June 2021 many cases [1]. This virus has continued to spread aggressively around the world, causing Accepted: 27 July 2021 a global panic and posing serious challenges to public health and policy makers. Many Published: 30 July 2021 countries have enforced rigorous protective measures to combat the pandemic, such as mandatory quarantine and massive closures. However, efforts to slow down the transmis- Publisher’s Note: MDPI stays neutral sion rate have been undermined by the COVID-19 ‘infodemic’ [2–5]. with regard to jurisdictional claims in An important public health response to the COVID-19 infodemic is proactive com- published maps and institutional affil- munication, with the objectives of alleviating confusion, avoiding misunderstandings, iations. minimizing adverse consequences, and saving lives [6]. In March 2020, WHO published a recommendation to provide guidance for countries on how to implement effective Risk Communication and Community Engagement (RCCE) strategies to improve public aware- ness and perception [7]. RCCE can help prevent infodemics by building trust in public Copyright: © 2021 by the authors. health institutions and spokespersons, thereby increasing the chance that public health Licensee MDPI, Basel, Switzerland. guidelines will be followed. Determining what people know, how they feel, and what they This article is an open access article do to keep the outbreak under control is essential for public health workers to address the distributed under the terms and public’s perception of risk and uncertainties. Thus, designing effective RCCE strategies conditions of the Creative Commons to stop further spreads of misinformation and rumors must be based on public health Attribution (CC BY) license (https:// surveillance. Social media and web searches have become essential sources of information creativecommons.org/licenses/by/ about the public and are widely utilized for health-related information. The COVID-19 4.0/).

Sustainability 2021, 13, 8528. https://doi.org/10.3390/su13158528 https://www.mdpi.com/journal/sustainability Sustainability 2021, 13, 8528 2 of 16

pandemic provides a starting point to discuss how social media and web searches can be leveraged to design beneficial RCCE to address this unprecedented crisis. In the past decade, the internet has become an integral part of people’s lives. Online sources that provide real time or near-real time data are becoming increasingly avail- able, consequently changing the pattern of spread of health-related information [8]. This paradigm shift can be useful for understanding population health concerns and needs, the diffusion of health-related information/misinformation, and the public’s reaction to health events [9]. There is a large body of studies utilizing social media and internet traffic to un- derstand the prevalence and diffusion of information and misinformation about COVID-19, providing insights for public health surveillance [6] and health policy makers [10,11]. This use of the internet has formed two new concepts: infodemiology, defined as the science of distribution and determinants of information on the Internet, and infoveillance, defined as the longitudinal tracking of infodemiology metrics for surveillance and trend analysis [9]. Infodemiology and infoveillance have contributed to public health knowledge by analyzing a range of topics such as chronic diseases and influenza. More specifically, infodemiology provides the necessary tools for understanding health-related infodemics, which can make it hard for people to find reliable sources and trustworthy guidance during public health crises [12,13]. Twitter Analytics and Google Trends are two of the most effective infoveillance tools for assessing the dissemination of information on health issues and topics [14,15]. Twitter functions as a convenient source of information about COVID-19 [11]. Google Trends is a tool that provides both real-time and archived information on Google queries worldwide. Thus, information from internet searches that are performed anonymously, enabling the analyzing and forecasting of health-related topics can be obtained. The social media search index has been identified as a promising predictor of COVID-19 transmission rates [16,17]. Although Twitter has some limitations as a tool for disease prediction and containment, its potential for communicating peoples’ stories and news sharing may profoundly impact public health outcomes. Both Google Trends and Twitter can serve as viable resources to understand people’s perception and to monitor their reaction to the pandemic over time [14,18]. The use and roles of social media and web searches amid the COVID-19 pandemic have been widely studied [14,16,17]. However, to the best of our knowledge, no systematic research has yet been conducted on Arabic infodemiology and infoveilliance. Given the fact that 41% of the online population in Saudi Arabia uses social media, a higher percentage than any other country in the world, it is vital to conduct a study concerning the Arabic language [19]. Considering the impact that the spread of information and misinformation of COVID-19 may have upon the transmission rate, determining the public reactions to tweets and investigating the nature and diffusion of COVID-19-related information on the internet can provide important insights into the beliefs and concerns of Arabic users. Furthermore, analyzing the spread of information found in tweets may help decision makers to intervene in limiting the wide spread of misinformation or fake news, such as setting laws to prevent the spread of unreliable information. In this paper, the infodemiology of COVID-19 in terms of socially disseminated information are addressed. First, the magnitude of misinformation and the quality of information sources regarding the COVID-19 epidemic that is being spread on Twitter in the Arabic language was analyzed. Second, information prevalence indicators (search volume) about COVID-19 was analyzed by using data from Google Trends to examine the correlation between information prevalence and daily new cases in different provinces in Saudi Arabia. Third, the relationship between mobility activity in Saudi Arabia and COVID-19 prevalence was examined [20]. Fourth, an infodemiology study with a multiple regression analysis was conducted to examine the relationship between new COVID-19 cases and various potential predictors, namely overall mobility, total confirmed cases, and information prevalence. In this paper, Saudi Arabia was chosen as a case study since it is one of the top Arabic countries that is highly affected by the pandemic [21]. Moreover, Sustainability 2021, 13, x FOR PEER REVIEW 3 of 17

Sustainability 2021, 13, 8528 cases and various potential predictors, namely overall mobility, total confirmed cases,3 and of 16 information prevalence. In this paper, Saudi Arabia was chosen as a case study since it is one of the top Arabic countries that is highly affected by the pandemic [21] . Moreover, to the best of our knowledge, no study has examined infodemiology and infovalliance in to the best of our knowledge, no study has examined infodemiology and infovalliance in Saudi Arabia. Saudi Arabia. Figure 1 shows the status of the pandemic in Saudi Arabia according to the regions. Figure1 shows the status of the pandemic in Saudi Arabia according to the regions. The cumulative number of cases in Saudi Arabia reached 275,000 as of 3 June 2020, with The cumulative number of cases in Saudi Arabia reached 275,000 as of 3 June 2020, with confirmedconfirmed cases cases in in each each Saudi Saudi province province [22]. [22 ].This This study study serves serves as a as starting a starting point point for de- for signingdesigning strategic strategic messages messages for health for health campaigns campaigns and establishing and establishing an effective an effective risk com- risk municationcommunication channel. channel.

FigureFigure 1. 1. TheThe status status of of COVID-19 COVID-19 cases cases in in Saudi Saudi Arabia’s Arabia’s cities cities as as of of 3 3 June June 2020. 2020.

2.2. Materials Materials and and Methods Methods ToTo answer the the research research questions questions of of this this study, study, data from Twitter [23], [23], and Google TrendsTrends [24] [24] and Google COVID-19 community mo mobilitybility data [20] [20] were collected. In this section,section, the the data data collection collection process process and and analysis analysis are explained.

2.1.2.1. Twitter Twitter SearchesSearches about about the the novel novel coronavirus coronavirus relate relatedd tweets written in the Arabic language andand geolocated geolocated in in Saudi Saudi Arabia Arabia were were performed performed between between 13 13 June June 2020 2020 and and 12 12 July 2020 usingusing Python Python software software 3.7.0, 3.7.0, Twitter Twitter standard search Application Programming Interface (API), and Tweepy Python libraries. In the search, a set consisting of predefined search (API), and Tweepy Python libraries. In the search, a set consisting of predefined search terms that were most widely used as media terms for the novel coronavirus remedy from terms that were most widely used as media terms for the novel coronavirus remedy from Twitter trends were used. Table1 shows the English name for each of the terms used in the Twitter trends were used. Table 1 shows the English name for each of the terms used in search with a brief description. the search with a brief description.

Table 1. List of terms used in collecting tweets. Table 1. List of terms used in collecting tweets.

TermTerm English English Name Name Description Description CoronavirusCoronavirus The most The most common common term term for for the the novel novel coronavirus coronavirus AKðPñ»ﻛﻮﺭﻭﻧﺎ X DexamethasoneDexamethasone Corticosteroid Corticosteroid used used as asanti-inflammatory anti-inflammatory ÓA‚ºK  JJ PA àðﺩﻳﻜﺴﺎﻣﻴﺜﺎﺯﻭﻥ SaussureaSaussurea costus costus Common Common folk folkremedy remedy for for respiratory respiratory complaints complaints  ﺍﻟﻘﺴﻂ ﺍﻟﻬﻨﺪﻱ ø YJêË@ ¡‚®Ë@ SumacSumac Common Common folk folkremedy remedy for for a variety a variety of of complaints complaints AÖ ކﺳﻤﺎﻕ Remdesivir An antiviral medication ﺭﻣﺪﻳﺴﻴﻔﻴﺮ Q ®J ‚ YÓP Remdesivir An antiviral medication

A set of 6541 Arabic tweets was collected. The data contains text and metadata of the tweets, including tweet id, username, hashtags, and number of retweets. The data preprocessing involves three phases which are the data cleaning phase, normalization phase, and lemmatization phase. In the data cleaning phase, all non-Arabic terms, stop Sustainability 2021, 13, x FOR PEER REVIEW 4 of 17

Sustainability 2021, 13, 8528 A set of 6541 Arabic tweets was collected. The data contains text and metadata of4 of the 16 tweets, including tweet id, username, hashtags, and number of retweets. The data prepro- cessing involves three phases which are the data cleaning phase, normalization phase, and lemmatization phase. In the data cleaning phase, all non-Arabic terms, stop words, num- words,bers, punctuations, numbers, punctuations, emojis, hashtags, emojis, and hashtags, URLs were and URLsautomatically were automatically removed by removed coding. byIn coding.the normalization In the normalization phase, multiple phase, forms multiple of a forms letter ofwere a letter converted were converted into one intouniform one uniformletter, numbers, letter, numbers, spaces, repeated spaces, repeated letters, an letters,d elongation and elongation were removed were removed using the using Tasha- the Tashaphynephyne Python Python library library [25]. In [25 the]. In lemmatization the lemmatization phase, phase, words words were wereconverted converted to their to theirroots rootsusing using the Farasa the Farasa toolkit toolkit [26]. [ 26An]. exampl An examplee of the of thepreprocessing preprocessing phases phases is shown is shown in inFigure Figure 2.2 .

FigureFigure 2.2. Example of preprocessing phases.phases.

AA set of metrics were selected to analyze thethe collected tweets.tweets. Information prevalence -coronavirus’)’ (‘corona‘) ﻛﻮﺭﻭﻧ’ﺎ‘ «waswas computedcomputed by by counting counting the the number number of conversations of conversations mentioning mentioning ‘AKðPñ invirus’) combination in combination with one with or moreone or of more the other of the terms other listed terms in Tablelisted1 inwithin Table the 1 within collected the tweets.collected The tweets. information The information occurrence occurrence ratio was ratio determined was determined by calculating by calculating the information the in- prevalenceformation prevalence for each of for the each terms of divided the terms by thedivided total by number the total of the number collected of the tweets. collected The informationtweets. The information prevalent quality preval wasent foundquality by was examining found by the examining source of the each source tweet of in each the collectedtweet in the tweets collected to determine tweets to its determine reliability. its The reliability. accounts The of Saudiaccounts Arabia’s of Saudi Ministry Arabia’s of Health,Ministry official of Health, TV channels, official TV TV channels, news channels, TV news TV channels, programs, TV officialprograms, newspapers, official news- and onlinepapers, newspapers and online were newspapers considered aswere high-quality considered sources as high-quality (HQS); usernames sources of personal (HQS); accounts,usernames personal of personal groups, accounts, and unofficial personal accounts groups, were and coded unofficial as low-quality accounts were sources coded (LQS). as Then,low-quality the information sources (LQS). prevalent Then, quality the information was calculated prevalent as the quality number was of calculated the tweets as that the werenumber retweeted of the tweets from LQSs that were divided retweeted by the totalfrom number LQSs divided of the tweets by the that total were number retweeted of the fromtweets both that HQSs were retweeted and LQSs. from Information both HQSs incidence and LQSs. was Information calculated incidence by determining was calcu- the numberlated by ofdetermining conversations the number about each of conversa of the definedtions about four each terms of combinedthe defined with four ‘A KðPñ» terms’ the collected’ (‘Coronavirus’) tweets by in units the collected of time, where tweets the byunit units of of time time, in thiswhere study the ﻛﻮﺭﻭﻧﺎ‘Coronavirus’)combined with in‘) isunit four of weeks.time in To this visualize study is the four most weeks. frequent To visualize words in the the most conversations, frequent words word clouds in the wascon- usedversations, via RStudio word Versionclouds was 1.3.1056 used [via9,15 RStudio]. Version 1.3.1056 [9,15].

2.2. Google Trends Data Google Trends is an open online tracking tool that provides real-time and archived information for internet hit search volumes [19]. It normalizes and scales data in the form Sustainability 2021, 13, 8528 5 of 16

of search volume numbers to reflect search popularity on a scale relative to the total number of queries carried out on Google overtime. The scale ranges from 0 (low) to 100 (highly popular) for a specific search term. The information prevalence indicator [19] in this study is represented by the popularity scale provided by Google Trends. In order to find the information prevalence in different provinces, the framework in [14] was followed. The framework provides a systematic way to extract information from Google Trends. In this study, the country was set to “Saudi Arabia”, and a default of “All categories” and “Web search” were selected. Since “Coronavirus or AKðPñ» ” is the official name used by the Saudi Ministry of Health, the topic “AKðPñ» ” (‘Coronavirus’) was selected as the most widely used term for the novel coronavirus. Besides this, other popular search terms/phrases were added. The terms/phrases and the English name for each of them are provided in Table2. Note that only Arabic words were examined; no English words were included in this study.

Table 2. English translation for the most Arabic terms used to describe COVID-19.

Arabic Term/Phrase English Name AKðPñ» Coronavirus

YJ ¯ñ»19 COVID-19

YJ ¯ñ» COVID AKðQ» Corona AKðPñ» HBAg XY« Coronavirus new cases  Coronavirus in Saudi Arabia éK Xñª‚Ë@ ú ¯ AKðPñ»

Google Trends data were collected as time series queries during the period from February to July 2020 and were retrieved in the csv format. Since there is a lag between the action of Google search and new cases confirmation, a delayed effect of 14 days was considered. The collected data were aligned with official data on daily COVID-19 new cases and deaths per one million people for each province in Saudi Arabia during the period from May to July with a lagged difference of 14 days, all of which was retrieved from the Saudi Ministry of Health [27]. The data was then arranged as (date, province, demand prevalence indicator, number of new cases). The aim is to examine the correlation between Google search queries for different Arabic cities and the daily increase in COVID- 19 cases. Pearson correlation coefficients were used to examine the association between daily new COVID-19 cases and the hit search volume in each province using the same data. Autocorrelation function (ACF) (the correlation of a parameter with itself over the time) was also performed using Google Trends data to test whether the number of cases of a specific day would impact the search volume the next day [28]. Both crude and partial autocorrelations were computed. All of the calculations were performed using a Python 3.7.0 environment. Note that for all analyses, an alpha level of 0.05 was used to determine statistical significance.

2.3. Google Mobility Data Daily community mobility data provided by Google covers 130 countries starting from 15 February 2020 [20]. Saudi Arabia was picked as a case study to examine whether or not less mobility is associated with fewer COVID-19 total cases per one million. The data are grouped by Google into five categories based on the most popular activities among people. these categories are work, grocery and pharmacy, parks, residential, and retail. To describe mobility, Google uses a percentage change in relation to previous values or baseline [20], as seen in Figure3. The baseline is calculated by Google as the average value, for the same day of the week, during the five-week period between 3 January and 6 February 2020. For example, on 21 February 2020 in Saudi Arabia compared to the baseline, grocery mobility decreased by 43%, park mobility decreased by 60%, retail and transit mobilities decreased Sustainability 2021, 13, x FOR PEER REVIEW 6 of 17

Sustainability 2021, 13, 8528 6 of 16

6 February 2020. For example, on 21 February 2020 in Saudi Arabia compared to the base- line, grocery mobility decreased by 43%, park mobility decreased by 60%, retail and transit bymobilities 100%, and decreased residential by mobility100%, and decreased residential by 100%.mobility The decreased government by 100%. imposed The a curfewgovern- toment control imposed the spread a curfew of the to pandemic.control the Duringspread theof the curfew, pandemic. most retailDuring stores the curfew, were closed most andretail residential stores were activities closedwere and residential prohibited. acti However,vities were there prohibited. were specific However, hours during there were the dayspecific when hours people during were the allowed day when to go people to grocery were and allowed pharmacy to go orto grocery parks. Noteworthy, and pharmacy a positiveor parks. score Noteworthy, in mobility a positive indicates score increased in mobility mobility indicates and increased a negative mobility score indicates and a neg- a reductionative score in indicates mobility. a Googlereduction collects in mobility. only the Google data collects from the only devices the data whose from users the devices allow theirwhose location users toallow be usedtheir anonymously.location to be used anonymously.

FigureFigure 3. 3.Mobility Mobility of of Google Google users users in in Saudi Saudi Arabia. Arabia.

ToTo capture capture daily daily trends trends in in movement movement patterns, patterns, the the data data used used in in this this analysis analysis covers covers thethe period period from from 16 16 February February to to 25 25 June June 2020. 2020. Since Since the the Saudi Saudi Arabia Arabia government government ordered ordered allall workersworkers toto work from home home throughout throughout th thisis period period [16], [16 ],the the “work” “work” category category was was ex- excludedcluded and and mobility mobility was was assessed assessed based based on on the the four four remaining remaining categories categories to tofind find any any as- associationsociation between between the the new new COVID-19 cases andand people’speople’s mobility.mobility. TheseThese associations associations werewere investigated investigated using using Pearson Pearson correlations correlations (if (if both both variables variables entered entered into into the the correlation correlation werewere normally normally distributed)distributed) oror non-parametricnon-parametric Spearman rank correlations correlations (if (if one/both one/both of ofthe the variables variables entered entered into into the the correlation correlation we werere notnot normallynormally distributed).distributed). TheThe Pearson Pearson CorrelationCorrelation Coefficient Coefficient was was applied applied to to specify specify how how two two variables variables vary vary together together (the (the new new COVID-19COVID-19 cases cases and and people’s people’s mobility). mobility). OverallOverall mobility mobility (i.e., (i.e., the the average average mobility mobility across across all all categories) categories) from from the the same same data data setset werewere used used to to conduct conduct a a multiple multiple regression regression analysis analysis to to investigate investigate how how mobility mobility in in generalgeneral could could predict predict new new coronavirus coronavirus cases. cases. This This analysis analysis also also included included total total cases cases and and informationinformation prevalenceprevalence asas predictors of of new new cases cases as as well. well. The The data data was was examined examined for for the theentire entire country country and andwas wasnot disaggregated not disaggregated by province. by province. To conduct To conduct this analysis, this analysis, Rstudio RstudioVersion Version1.3.1056 1.3.1056was used. was Using used. the Using same thedata, same a multiple data, a regression multiple regressionanalysis was analysis carried wasout carriedto investigate out to investigatewhether patterns whether of patternstravel to ofgrocery travel and to grocery pharmacy, and pharmacy,parks, residential, parks, residential,and retail areas and retail could areas significantly could significantly predict new predict coronavirus new coronavirus cases (i.e., cases the (i.e.,four themobility four mobilitycategories categories were used were as predictors used as predictors in this regression, in this regression, as well as total as well cases as and total information cases and informationprevalence). prevalence). 3. Results 3. Results 3.1. Twitter 3.1. Twitter During the study period, the total number of tweets about the most popular media terms for the novel coronavirus remedy was 6541. Figure4 shows the differences among the information prevalence, the information occurrence ratio, and the quality of the information prevalence for each of the identified terms. As seen, the majority of tweets with an Sustainability 2021, 13, x FOR PEER REVIEW 7 of 17

During the study period, the total number of tweets about the most popular media terms for the novel coronavirus remedy was 6541. Figure 4 shows the differences among the information prevalence, the information occurrence ratio, and the quality of the infor- mation prevalence for each of the identified terms. As seen, the majority of tweets with an information prevalence of 3807 and an information occurrence ratio of 58.2% was about Dexamethasone’), 89% of which were retweeted. This was followed by 1694‘) ’ﺩﻳﻜﺴﺎﻣﻴﺜﺎﺯﻭﻥ‘ Saussurea costus’), 64% of which were‘) ’ﺍﻟﻘﺴﻂ ﺍﻟﻬﻨﺪﻱ‘ tweets about the folk remedy (26%) Remdesivir’) comes third with 1021 (15.6%) tweets,7 of 90% 16‘) ’ﺭﻣﺪﻳﺴﻴﻔﻴﺮ‘ Sustainability 2021, 13, 8528 retweeted. The use of Sumac’) is considered the‘) ’ﺳﻤﺎﻕ‘ of which were retweeted. The use of the folk remedy least prevalent with 19 tweets (0.3%) and 100% retweets. ’ﺍﻟﻘﺴﻂ ﺍﻟﻬﻨﺪﻱ‘ Sumac’) and 98% of the‘) ’ﺳﻤﺎﻕ‘ Figure 4 also shows that 100% of the information prevalence of 3807 and an information occurrence ratio of 58.2% was about (‘Saussurea costus’) tweets were retweeted from LQS. These are considered high percent- ‘  ’ (‘Dexamethasone’), 89% of which were retweeted. This was followed by àðagesPA JJ andÓA‚ºK the X tweets may include added misinformation. This was followed by 39% of Saussurea costus’), 64% of which‘) ’ ﺭﻣﺪﻳﺴﻴﻔﻴﺮ ‘ tweets about the folk remedy (%26) ﺩﻳﻜﺴﺎﻣﻴﺜﺎﺯﻭﻥ1694 ‘ ’(‘Dexamethasone’), and 13%ø YofJêË@ ‘ ¡‚®Ë@ ’(‘Remdesivir’) tweets retweeted werefrom retweeted. LQS, respectively. The useof Spreading ‘Q®J ‚YÓP informatio’ (‘Remdesivir’)n from comes unofficial third withaccounts 1021 means (15.6%) that tweets, they are not posted from HQS (e.g., WHO, Ministry of Health). Noteworthy here is that infor- 90% of which were retweeted. The use of the folk remedy ‘  ’ (‘Sumac’) is considered mation prevalence from LQS is mostly for non-medical or folk†AÖÞ remedy. This is evident by thethe least high prevalent percentage with of retweets 19 tweets from (0.3%) LQ andS for 100% ‘sumac’ retweets. and ‘Saussurea costus’.

FigureFigure 4. 4.The The information information prevalence, prevalence, information information occurrence occurrence ratio, ratio, and theand quality the quality of the informationof the infor- mation prevalence of coronavirus remedy terms in tweets and retweets. prevalence of coronavirus remedy terms in tweets and retweets. FigureNext, 4a sensealso shows of the conversation that 100% of incidence the ‘ †AÖ Þ ’of (‘Sumac’) the defined and terms 98% over of the the ‘ø studyYJêË@ ¡‚period®Ë@ ’ (‘Saussureawas calculated. costus’) The tweets results, were as summarized retweeted from in Figure LQS. These 5, show are how considered conversation high percent- volume agesfor each and term the tweets changes may over include time. addedIt can be misinformation. seen that a high This conversation was followed incidence by 39% about of ﺍﻟﻘﺴﻂ ﺍﻟﻬﻨﺪﻱ ﺩﻳﻜﺴﺎﻣﻴﺜﺎﺯﻭﻥ ‘ ‘  ’’(‘Dexamethasone’), (‘Dexamethasone’) and and ‘ 13% of ‘Q’ (‘Saussurea’(‘Remdesivir’) costus’) during tweets the retweeted first week àðof PAtheJJ ÓA‚ºK study X period that then decreases throug h®J the‚ YÓP rest of the study period. In contrast, from LQS, respectively. Spreading information from unofficial accounts means that they are ,Remdesivir’) is low during the first week‘) ’ﺭﻣﺪﻳﺴﻴﻔﻴﺮ‘ the conversation incidence about not posted from HQS (e.g., WHO, Ministry of Health). Noteworthy here is that information followed by a high incidence during the second week, then followed by a low incidence prevalence from LQS is mostly for non-medical or folk remedy. This is evident by the high Sumac’), the incidence of its conversations is‘) ’ﺳﻤﺎﻕ‘ through the last two weeks. As for percentage of retweets from LQS for ‘sumac’ and ‘Saussurea costus’. the least from the beginning to the end of the study period when compared with the other Next, a sense of the conversation incidence of the defined terms over the study period three terms. was calculated. The results, as summarized in Figure5, show how conversation volume for each term changes over time. It can be seen that a high conversation incidence about  ‘àð PA JJ ÓA‚ºK X’ (‘Dexamethasone’) and ‘øYJêË@ ¡‚®Ë@’ (‘Saussurea costus’) during the first week of the study period that then decreases through the rest of the study period. In

contrast, the conversation incidence about ‘Q ®J ‚ YÓP’ (‘Remdesivir’) is low during the first week, followed by a high incidence during the second week, then followed by a low incidence through the last two weeks. As for ‘†AÖ Þ ’ (‘Sumac’), the incidence of its conversations is the least from the beginning to the end of the study period when compared with the other three terms. SustainabilitySustainability 20212021,, 1313,, x 8528 FOR PEER REVIEW 8 8of of 17 16 Sustainability 2021, 13, x FOR PEER REVIEW 8 of 17

FigureFigure 5. 5. InformationInformation incidence incidence over over time. time. Figure 5. Information incidence over time. Figure6 shows a word cloud providing an intuitive overview of the most preva- Figure 6 shows a word cloud providing an intuitive overview of the most prevalent lentFigure words 6 shows that appear a word at cloud least providing 150 times an in intuitive the tweets, overview where of words the most are sizedprevalent based words that appear at least 150 times in the tweets, where words are sized based on how wordson how that frequentlyappear at least they 150 appear times in in the the whole tweets, tweets. where As words seen, are the sized most based prevalent on how word ’ﻛﻮﺭﻭﻧﺎ‘ frequently they appear in the whole tweets. As seen, the most prevalent word was ’ﻛﻮﺭﻭﻧﺎ‘ they’ (‘Coronavirus’) appear in the whole numbering tweets. (4466)As seen, times. the most The prevalent other top word 10 most was prevalent frequentlywas ‘ (‘Coronavirus’)AKðPñ» numbering (4466) times. The other top 10 most prevalent words and their (‘Coronavirus’) numbering (4466) times. The other top 10 most prevalent words and their ’ ‘ ,(Treatment’) (3807‘) ’ﻋﻼﺝ‘ ,(  ’ (‘Dexamethasone’) (3807 as follows:’ (‘Dexamethasone’)‘ ﺩﻳﻜﺴﺎﻣﻴﺜﺎﺯﻭﻥ‘ frequencywords and were their frequencyas follows: were »Treatment’)h. C‘) ’ﻋﻼﺝ‘ ,(X (3807 ÓA‚ºK Dexamethasone’)àðPAJJ‘) ’ﺩﻳﻜﺴﺎﻣﻴﺜﺎﺯﻭﻥ‘ :frequency were as follows -Pa‘) ,(1398) ’ﻣﺮﺿﻰ‘ (’The (1694), health’)‘  ’ (1398),(‘The health‘) ’ﺍﻟﺼﺤﺔ‘(’ costus’) ’ (‘Saussurea (1694), costus Saussurea‘‘) ’,(3203) ﺍﻟﻘﺴﻂ ﺍﻟﻬﻨﺪﻱ‘ (’Treatment‘),(3203) -Pa‘) ’ﻣﺮﺿﻰ‘ ,(The health’)éj’Ë@ (1398‘) ’ﺍﻟﺼﺤﺔ‘ ,(SaussureaøYJêË@ ¡‚ costus’)®Ë@ (1694‘) ’ﺍﻟﻘﺴﻂ ﺍﻟﻬﻨﺪﻱ‘ ,(3203) (’Virus‘) ’ﻓﻴﺮﻭﺱ‘ ,(Britain’) (1011‘)’ﺑﺮﻳﻄﺎﻧﻴﺎ‘ ,(Remdesivir’) (1021‘) ’ ﺭﻣﺪﻳﺴﻴﻔﻴﺮ‘ ,(tients’) (1338 ’ ‘(’Virus‘) ’,(1011) ﻓﻴﺮﻭﺱ‘ (’ (1011),’(‘Britain Britain’)‘‘)’,(1021) ﺑﺮﻳﻄﺎﻧﻴﺎ‘ ,( ’ (‘Remdesivir’) (1021 Remdesivir’)‘‘) ’,(1338) ﺭﻣﺪﻳﺴﻴﻔﻴﺮ‘ (’ (1338),’ (‘Patients tients’)‘ ¯ QK. €ðQ KA¢ YÓP’ (‘Medicine’) (790). AJ ﺩﻭﺍء‘ ‚ ®J Drug’) (912), andQ‘) ’ﻋﻘﺎﺭ‘ úæ•QÓ,(931) .(Medicine’) and ‘Z@ðX’ (‘Medicine’) (790). (790‘) ’,(912) ﺩﻭﺍء‘ Drug’)‘PA®« (912),’ (‘Drug’) and‘),(931) ’ﻋﻘﺎﺭ‘ (’Virus‘),(931)

(a) (b) (a) (b) Figure 6. Word Cloud of frequently mentioned words in novel coronavirus in Arabic tweets in (a) Figure 6. Word Cloud of frequently mentioned words in novel coronavirus in Arabic tweets in (a) ArabicFigure and 6. Word(b) English Cloud language. of frequently mentioned words in novel coronavirus in Arabic tweets in Arabic(a) Arabic and (b and) English (b) English language. language. 3.2. Google Trends 3.2.3.2. Google Google Trends Trends Since words alone give us limited insight into people’s perception and topics of con- SinceSince words words alone alone give give us us limited limited insight insight in intoto people’s people’s perception perception and and topics of conver-con- versation, the pattern of information prevalence about COVID-19 by using an autocorre- versation,sation, thethe pattern pattern of of information information prevalence prevalence about about COVID-19 COVID-19 by by using using an an autocorrelation autocorre- lation analysis was identified to show how the prevalence of these terms change over time. lationanalysis analysis was was identified identified to show to show how how the prevalencethe prevalence of these of these terms terms change change over over time. time. Search Search volume and time series data from Google Trends [19] and their relation to the num- Searchvolume volume and and time time series series data data from from Google Google Trends Trends [19 [19]] and and their their relation relation to to thethe numbernum- ber of total cases are observed. The pattern of internet search volume shown in Figure 7 berof of total total cases cases are are observed. observed. The The pattern pattern of of internet internet search search volume volume shown shown in in Figure Figure7 did7 did not reveal a cyclic trend, as can be seen from the autocorrelation diagram in Figure 8. didnot not reveal reveal a a cyclic cyclic trend, trend, as as can can be be seen seen from from the the autocorrelation autocorrelation diagram diagram in in Figure Figure8. 8. No No seasonal trends were found (ACF = −0.2). The research volume remained constant Noseasonal seasonal trends trends were were found found (ACF (ACF = −= 0.2).−0.2). The The research research volume volume remained remained constant constant from from March to June 2020, apart from a peak in late March and early April. fromMarch March to Juneto June 2020, 2020, apart apart from from a peak a peak in latein late March March and and early early April. April. Sustainability 2021, 13, x FOR PEER REVIEW 9 of 17

SustainabilitySustainability2021 2021,,13 13,, 8528 x FOR PEER REVIEW 99 of 1617

Figure 7. Google Trends-based COVID-19 hit search volume in Arabic from March to June 2020. FigureFigure 7.7. GoogleGoogle Trends-basedTrends-based COVID-19COVID-19 hithit searchsearch volumevolume inin Arabic Arabic from from March March to to June June 2020. 2020.

(a) (b) (a) (b) FigureFigure 8. 8. AutocorrelationAutocorrelation plot plot for for Coronavirus Coronavirus hit hit search search in in Arabic Arabic ( (aa)) and and partial partial autocorrelation autocorrelation plotplotFigure ( (bb),), showing8. showing Autocorrelation no no cyclical cyclical plot pattern pattern for Coronavirus or or regular regular trend trend hit search ( (λλ = 1, 1, ind d = =Arabic 0, 0, D D = = (0,a 0, )CI CIand = = 0.95, partial 0.95, CI CI typeautocorrelation type = = white). white). plot (b), showing no cyclical pattern or regular trend (λ = 1, d = 0, D = 0, CI = 0.95, CI type = white). COVID-19COVID-19 cases cases and and symptoms symptoms are are the the most most searched searched coronavirus coronavirus terms terms by the by Ar- the abicArabic usersCOVID-19 users in Saudi’s in Saudi’scases provinces. and provinces. symptoms Our analysis Our are analysisthe revealedmost revealedsearched that provinces thatcoronavirus provinces with terms a withhigher by a the highernum- Ar- bernumberabic of users COVID-19 of in COVID-19 Saudi’s cases provinces. casesper 1 per million 1Our million peopleanalysis people had revealed hadlower lower thatGoogle Googleprovinces search search withinterest interest a higher related related num- to COVID-19toberCOVID-19 of COVID-19 (e.g.,(e.g., Makkah, cases Makkah, per Riyadh, 1 Riyadh,million and peoplethe and East the hadern Eastern lower provinces). provinces).Google Figure search Figure 9 interestshows9 shows the related infor- the to mationinformationCOVID-19 prevalence (e.g., prevalence Makkah, indicator indicator Riyadh, represented represented and the by East bytheern the popularity provinces). popularity scale scaleFigure provided provided 9 shows by by the Google Google infor- Trends.Trends.mation The Theprevalence COVID-19 indicator relatedrelated searchrepresentedsearch queriesqueries by have hatheve a popularitya significant significant strongscale strong provided negative negative correlationby correla- Google tionwithTrends. with the The incidencethe incidenceCOVID-19 of the of related averagethe average search of new of queries ne COVID-19w COVID-19 have casesa significant cases per per 1 million 1strong million in negative these in these provinces correla- prov- inces(Pearsontion (Pearsonwithr the= − incidencer0.63, = −0.63,p < 0.05).pof < the 0.05). average of new COVID-19 cases per 1 million in these prov- inces (Pearson r = −0.63, p < 0.05). 3.3. Google Mobility Data Shapiro–Wilk tests found that all mobility data categories were normally distributed, with the exception of Parks. Since COVID-19 virus has an incubation period of 5 to 14 days, mobility and potential exposure will have a delayed effect on the new confirmed cases. Thus, mobility data with 14 days lag difference was correlated to current new cases. Moreover, the Saudi government imposed a curfew on 23 March 2020. Thus, two separate analysis were conducted. The first one is for mobility before curfew from 16 February to 23 March, and its impact on new cases from 1 March to 5 April. The second analysis is from 24 March to 14 June, which represents the time period after curfew, and its impact on new cases from 6 April to 25 June. Sustainability 2021, 13, 8528 10 of 16 Sustainability 2021, 13, x FOR PEER REVIEW 10 of 17

FigureFigure 9. 9.Disease Disease incidence incidence scatterplot scatterplot ( a(a)) versus versus Information Information prevalence prevalence ( b(b).). The analysis of Google mobility data before imposing curfew shows that the increased reduction3.3. Google of Mobility mobility Data in all categories (e.g., grocery & pharmacy, parks, retail and recreation, and residential)Shapiro–Wilk is highly tests andfound negatively that all mobility correlated data with categories decreased were number normally of total distributed, cases in Saudiwith the Arabia exception (see Table of Parks.3). In otherSince word,COVID-19 the less virus the has movement, an incubation the lower period the of number 5 to 14 ofdays, new mobility cases. However, and potential after theexposure curfew, will the analysishave a delayed shows aeffect positive on the weak new relationship confirmed betweencases. Thus, new mobility cases and data mobility with 14 in days grocery lag difference & pharmacy was and correlated retail and to recreation,current new which cases. indicatesMoreover, that the reductionSaudi government in mobility imposed has a a weak curfew relationship on 23 March with 2020. new Thus, cases. two separate On the otheranalysis hand, were there conducted. is a negligible The first correlation one is for between mobility parks before and curfew residential from 16 mobility February and to new23 March, COVID-19 and casesits impact (Pearson on newr = 0.2).cases Noteworthy, from 1 March all theseto 5 April. correlations The second are statistically analysis is significantfrom 24 March with top <14 0.05. June, It which is noteworthy represents that the these time correlationsperiod after arecurfew, all very and large its impact [29]. Figureon new 10 cases shows from the 6 scatterplots April to 25 ofJune. these four correlations. The analysis of Google mobility data before imposing curfew shows that the in- creased reduction of mobility in all categories (e.g., grocery & pharmacy, parks, retail and recreation, and residential) is highly and negatively correlated with decreased number of total cases in Saudi Arabia (see Table 3). In other word, the less the movement, the lower Sustainability 2021, 13, x FOR PEER REVIEW 11 of 17

the number of new cases. However, after the curfew, the analysis shows a positive weak relationship between new cases and mobility in grocery & pharmacy and retail and recre- ation, which indicates that reduction in mobility has a weak relationship with new cases. On the other hand, there is a negligible correlation between parks and residential mobility and new COVID-19 cases (Pearson r = 0.2). Noteworthy, all these correlations are statisti- cally significant with p < 0.05. It is noteworthy that these correlations are all very large Sustainability 2021, 13, 8528 11 of 16 [29]. Figure 10 shows the scatterplots of these four correlations.

Table 3. Correlation coefficients for different community mobility data with new COVID-19 cases Tableper one 3. Correlationmillion before coefficients and after for imposing different curfew community by the mobility Saudi government, data with new and COVID-19 their statistical cases persignificance one million according before to and their after two imposing tailed p-value. curfew by the Saudi government, and their statistical significance according to their two tailed p-value.Mobility Categories Grocery & Phar- ParksMobility Retail Categories & Recreation Residential Grocerymacy & Pharmacy Parks Retail & Recreation Residential r rp-Value p-Value r rp-Value p-Value r r p-Value p-Value r rp-Value p-Value BeforeBefore curfew −0.8−0.8 0.007 0.007 * * −0.6− 0.60.04 0.04 * * −0.8− 0.80.003 0.003 * * −0.8−0.8 0.01 0.01 * AfterAfter curfew curfew 0.40.4 0.0001 0.0001 * * 0.2 0.2 0.006 0.006 * * 0.4 0.4 0.0002 0.0002 * * 0.2 0.2 0.003 0.003 * ** StatisticallyStatistically significant significant. Note:. Note: Parks Parks mobility mobility data was data not was normally not no distributed,rmally distributed, so a Spearman so a rankSpearman correlation wasrank conducted. correlation All was other conducted. correlations All were othe Pearsonr correlations correlations. were Pearson correlations.

Figure 10. Scatter plotsplots ofof totaltotal cases cases and and (A (A) grocer) grocer and and pharmacy, pharmacy, (B )(B parks,) parks, (C )( retailC) retail and and recreation, recrea- andtion, ( andD) residential (D) residential mobility mobility data indata Saudi in Saudi Arabia. Arabia. Blue Bl linesue lines represent represent the linear the linear line ofline best of fit,best and fit, theand shaded the shaded areas areas are the are 95% the 95% confidence confidence interval interval of the of line the ofline best of fit.best fit.

A multiple regression analysis was conductedconducted toto examineexamine thethe relationshiprelationship betweenbetween new casescases ofof COVID-19 COVID-19 and and various various potential potential predictors, predictors, namely namely overall overall mobility, mobility, total total con- firmedconfirmed cases, cases, and and information information prevalence. prevalence. The The results results of the of multiplethe multiple regression regression indicated indi- thatcated the that model the model explained explained 88.31% 88.31% of the of variance the variance and that and thethat model the model was awas significant a significant pre- dictorpredictor of the of numberthe number of new of new cases, cases, F (3, 105)F (3, = 105) 272.9, = 272.9,p < 0.001. p < While0.001. allWhile three all independent three inde- variablespendent variables were significantly were significantly predictive predic of newtive cases of new (total cases cases (total B = 0.02,casesp B< = 0.001; 0.02, pmobility < 0.001; Bmobility = −12.62, B = p−12.62,< 0.001; p < information0.001; information prevalence prevalence B = − B16.82, = −16.82,p < 0.001),p < 0.001), the the proportion proportion of variance in new cases is uniquely explained by each predictor varied substantially: total cases explained 44.32% of the variance, mobility explained 14.11% of the variance, and information prevalence explained 7.28% of the variance (22.60% of the explained variance was therefore not unique to one predictor, but shared between at least two). When this analysis was repeated with the four separate mobility dimensions as opposed to one aver- age mobility score, the results were similar. The model explained 89.16% of the variance, and it was a significant predictor of the number of new cases, F(6, 102) = 149.0, p < 0.001 (see Table4). Total cases, information prevalence, grocery and pharmacy mobility, and parks mobility were all statistically significant predictors of the number of new cases, but Sustainability 2021, 13, 8528 12 of 16

retail and recreation mobility and residential mobility were not. This is might be due to the impact of the curfew, since recreation and residential mobility have the highest percentage of reduction (80% and 100%, respectively) [20]. Therefore, they were not significant predic- tors. In contrast, the reduction in mobility in grocery and pharmacy and parks is low (25% and 40%, respectively), which explains why these two are significant predictors. It should be noted, however, that 58.22% of the variance in the number of new cases was not unique to one predictor, likely due to the four mobility variables all correlating very highly with each other (all rs ≥ 0.80). Therefore, the low percentages of unique variance explained are likely due to multicollinearity.

Table 4. Multiple regression results for total cases, information prevalence, and four mobility scores predicting number of new cases.

Predictor B p % Variance Explained Total cases 0.024 <0.001 * 20.48% Information prevalence −22.06 <0.001 * 9.69% Grocery and pharmacy mobility −46.57 0.009 * 0.11% Parks mobility 28.01 0.012 * 0.46% Retail and recreation mobility 11.54 0.335 0.05% Residential mobility 4.43 0.402 0.15% * Statistically significant.

4. Discussion Social media and online platforms have become key distribution channels for in- formation surrounding COVID-19. One of the main advantages of these data sources is that they can be obtained both anonymously and easily, and early in the epidemic at a low cost. Although governments and health organizations have used these platforms as communication channels to reach the public, populations can become overwhelmed with the propagation of misinformation and disinformation, as it is increasingly prevalent. Infoveillance can monitor how people react to the evolution of the pandemic over time, as well as identify common beliefs, concerns, or hopes regarding prevention, treatment, and vaccines. COVID-19 is the first global pandemic of the digital era. Our results provide a means for understanding people’s perceptions of risk and recognizing the prevalence of misinformation. It also provides insights into the public’s information seeking behaviors and the role of mobility in contributing to the number of cases. Designing an effective risk communication strategy based on digital health solutions can play a major part in the fight against COVID-19 [13]. Understanding the population’s perception and compliance with health guidelines can be utilized to develop more effective RCCE and digital health solutions and health campaigns to improve public awareness and perception. The retweet was the most common form of interaction. It is important to note that most of the popular folk remedies are shared from unreliable sources or low-quality sources. Nevertheless, it must be noted that most retweets are for information from reliable sources or high-quality sources. Twitter users are passionate in notifying their followers with any information about possible coronavirus treatments, whether they are mainstream medical treatments or folk remedies, and whether they are from reliable or unreliable sources. The conversation volume for each remedy changes over time based on the global news and trends. For example, retweets related to “Dexamethasone” spiked when it was announced as a cure to treat COVID-19 [30]. This is an important finding, suggesting that people are aware and up-to-date with any news related to COVID-19. Thus, the use of Twitter as a source of information can cause a wide spread of misinformation. These results build on existing evidence of the internet search behavior and the extent of the results in [31,32]. The results of this study suggest a mean through which public health officials can identify the most common forms of misinformation and indicate that the best remedy Sustainability 2021, 13, 8528 13 of 16

for the COVID-19 infodemic is to broadcast timely and correct information to Twitter audiences. On the other hand, people confronting the COVID-19 pandemic face unfamiliar circumstances and are anxious for any source of information. Therefore, it is important to deliver accurate and appropriate information through risk communication and digital health campaigns [33]. Identifying the top examples of misinformation in Arabic in Twitter helps government authorities and experts debunk comments on misinformation and rumors. Specifically, Twitter analytics can enable them to choose what keywords and hashtags are more prevalent among the audience. For example, governments can effectively communicate accurate information on how to deal with the symptoms of coronavirus. These implications can be utilized to enhance government fact-checking services, risk communication, and health campaigns. While this finding is encouraging, there are many more types of misinformation that need to be analyzed to understand the total impact of myths on the number of COVID- 19 cases. It is crucial to monitor infodemics and build a system to limit the spread of misleading information and potentially harmful Arabic content [34]. Twitter and Google have updated their approaches to misleading information about COVID-19 by classifying its related content for English-speaking audiences [35]. However, it is important to consider filtering the Arabic language content as well. Therefore, there is a need for a strong system to detect rumors or misinformation posted on social media, possibly under the supervision of the government and health monitors such as the work in [21]. Systems such as the one described in [21] could help governments to set laws to prevent the spread of misinformation such as imposing penalties, fines, blocking accounts, or omitting the blogs that have misinformation related to health or health treatment [36]. The study provides a new insight into the relationship between the increased per- ceptions and the number of new cases. The results demonstrate how the internet search volume can be used to measure peoples’ perception of the risk of the pandemic. In line with the hypothesis, the Saudi provinces where people do less research about COVID-19 tend to have more cases. These results build on an existing evidence in [37]. This result emphasizes the importance of encouraging people to stay up to date about the pandemic. Besides, future digital health initiatives should focus on reaching people using different communi- cation channels to make a greater impact on the epidemic, specifically by identifying digital communication channels and influencers with the potential to reach larger audiences. A further finding in our analysis of public mobility and the spread of COVID-19 reveals that increased mobility leads to an increased number of cases. A significant strong negative correlation between the reduction in mobility and the number of daily new cases was found, especially within the grocery and residential categories. Specifically, a low mobility score means more reduction in mobility. Therefore, as people move about more, more cases arise. The results demonstrated match the state-of-the-art methods used in [38] and [39]. This is an important finding in the understanding of the effectiveness of quarantining and social distancing during the pandemic. This result aligns with the findings in [40] that emphasizes a high correlation between social distancing and daily new cases. In general, when enforcing new health guidelines to combat the pandemic, it is crucial to convey the reasoning behind, and aims of, such guidelines. This will help manage people’s fear and increase the likelihood that they will adhere to the measures imposed on them during the crisis. Among all COVID-19 predictors (e.g., total cases, mobility, and information preva- lence), the total cases are the strongest predictor, while the information prevalence is the weakest. This result highlights the strongest predictor of the number of daily new cases. Our findings suggest that the transmission rate increases as the total number of cases increase, therefore emphasizing the importance of imposing more restrictive measures in places where the total number of cases is higher. Moreover, this result can be used to design customized risk mitigation measures for different provinces or cities based on mobility data and total cases [41]. This could contribute to the controlling of the pandemic and supporting economic activities without applying extreme restrictions in unnecessary Sustainability 2021, 13, 8528 14 of 16

areas. Applying unnecessary precautions can delay the economy re-opening, which result in a lower economic activity levels and a slower economic growth. Although this does not mean that information prevalence should be ignored since it is statistically significant, total cases and mobility increase the transmission rate of the virus, better explaining the results. Besides, among the four mobility categories, grocery and pharmacy and parks were significant predictors since the mobility reduction in these two categories was lower than the others. This result emphasizes the importance of curfews during the pandemic.

Study Limitations Some limitations can be attributed to this study in the spread of misinformation in tweets regarding COVID-19. Although multiple search terms can be used to collect COVID- 19 related tweets such as “COVID-19”, “COVID”, “SARS-CoV2”, “Wuhan virus”, and “Chinese virus”, the official Arabic name used by the Saudi Ministry of Health and the most widely used term for COVID-19 in Saudi Arabia is “AKðPñ» ” (‘coronavirus’). Therefore, the latter term with a set of predefined search terms for COVID-19 remedies from Twitter trends were used to collect the tweets. In addition, the study only analyzed tweets written in the Arabic language and geolocated in the country of Saudi Arabia. Moreover, the Twitter standard search API does not allow for collecting tweets older than one week. Thus, tweets posted before 13 June 2020, could not be collected. Furthermore, the study could not collect tweets from private accounts. Thus, the results only represent organizations and people who use Twitter for trading news and information. Therefore, the results may not reflect the real number of tweets discussed by all kinds of users which may include misinformation related to COVID-19. All these points may restrict the generalizability of the results of this study. Locally distributed information and socioeconomic status for different cities are im- portant factors to be considered in the analysis. However, the Saudi Ministry of Health is the main source of information nationwide. Thus, there is no information dissemination from local medical resources in different cities or hospitals other than what the Ministry of Health distributes. This is a protection measure from the Saudi government to guarantee the correctness of the disseminated information. Moreover, there is no local data available about socioeconomic status for different cities. Therefore, the data used for the regression analysis are at a national level and does not consider socioeconomic status as a predictor. It should also be noted that many of the results presented in this study are correlational, and as such does not imply any directional associations.

5. Conclusions and Future Work In this study, we used three different platforms to conduct an infodemiology and in- foveillance survey of COVID-19 in Saudi Arabia. Frist, using Twitter, the study showed the prevalence of misinformation and folk remedies that might hinder people from following medical information from trusted sources. Second, using Google Trends, we investigated the relationship between information prevalence in each province in Saudi Arabia and the number of daily new cases. The results showed that there is a strong negative relationship, which indicates that literacy among people is an important factor in controlling the pan- demic. Third, we used Google mobility data to investigate the impact of mobility on the number of daily new cases. The results showed that reduction of mobility can decrease the number of daily new cases. Finally, the study examined the relationship between new cases of COVID-19 and various potential predictors, namely overall mobility, total confirmed cases, and information prevalence. The analysis showed that the total confirmed cases is the most significant predictor. Governments, policymakers, and healthcare providers can use these results to design effective programs, awareness messages, and community campaigns to increase the per- ception and knowledge about COVID-19. Moreover, governments can apply customized restrictive measures for different places based on the total cases and mobility. Sustainability 2021, 13, 8528 15 of 16

It is important to emphasize that this study focuses on the spreading of misinformation regarding COVID-19. One direction of future work will focus on the opposite side of this study, which discusses the information prevalence on Twitter to analyze public awareness and perception towards COVID-19 in LQS versus HQS. Moreover, establishing directional associations links between COVID-19 information distribution, mobility, and deaths allows for future research to investigate causal associations.

Author Contributions: Conceptualization, R.A. and A.B.; methodology, R.A. and A.B.; software, R.A. and A.B.; validation, R.A. and A.B.; formal analysis, R.A. and A.B.; investigation, R.A. and A.B.; resources, R.A. and A.B.; data curation, R.A. and A.B.; writing—original draft preparation, R.A. and A.B.; writing—review and editing, R.A. and A.B.; visualization, R.A. and A.B. Both authors have read and agreed to the published version of the manuscript. Funding: This research received no external funding. Informed Consent Statement: Not applicable. Data Availability Statement: The data presented in this study are available on request from the corresponding author. Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations

ACF Autocorrelation Function COVID-19 Novel coronavirus disease HQS High-Quality Sources LQS Low-Quality Sources RCCE Risk Communication and Community Engagement WHO World Health Organization

References 1. Harapan, H.; Itoh, N.; Yufika, A.; Winardi, W.; Keam, S.; Te, H.; Megawati, D.; Hayati, Z.; Wagner, A.L.; Mudatsir, M. Coronavirus disease 2019 (COVID-19): A literature review. J. Infect. Public Health 2020, 13, 667–673. [CrossRef][PubMed] 2. World Health Organization. Managing the COVID-19 Infodemic: Promoting Healthy Behaviours and Mitigating the Harm from Misinformation and Disinformation. 2020. Available online: https://www.who.int/news/item/23-09-2020-managing-the-covid- 19-infodemic-promoting-healthy-behaviours-and-mitigating-the-harm-from-misinformation-and-disinformation (accessed on 22 November 2020). 3. Nsoesie, E.O.; Cesare, N.; Müller, M.; Ozonoff, A. COVID-19 Misinformation Spread in Eight Countries: Exponential Growth Modeling Study. J. Med. Internet Res. 2020, 22, e24425. [CrossRef][PubMed] 4. Cinelli, M.; Quattrociocchi, W.; Galeazzi, A.; Valensise, C.M.; Brugnoli, E.; Schmidt, A.L.; Zola, P.; Zollo, F.; Scala, A. The COVID-19 social media infodemic. Sci. Rep. 2020, 10, 1–10. [CrossRef][PubMed] 5. Tagliabue, F.; Galassi, L.; Mariani, P. The “Pandemic” of Disinformation in COVID-19. SN Compr. Clin. Med. 2020, 2, 1287–1289. [CrossRef] 6. Smith, G.D.; Ng, F.; Li, W.H.C. COVID-19: Emerging compassion, courage and resilience in the face of misinformation and adversity. J. Clin. Nurs. 2020, 29, 1425. [CrossRef] 7. World Health Organization. Risk Communication and Community Engagement Readiness and Response to Coronavirus Disease (COVID-19): Interim Guidance. 19 March 2020. Available online: https://apps.who.int/iris/bitstream/handle/10665/331513 /WHO-2019-nCoV-RCCE-2020.2-eng.pdf?sequence=1&isAllowed=y (accessed on 22 November 2020). 8. Rovetta, A.; Bhagavathula, A.S. COVID-19-Related Web Search Behaviors and Infodemic Attitudes in Italy: Infodemiological Study. JMIR Public Health Surveill. 2020, 6, e19374. [CrossRef] 9. Eysenbach, G. Infodemiology and Infoveillance: Framework for an Emerging Set of Public Health Informatics Methods to Analyze Search, Communication and Publication Behavior on the Internet. J. Med. Internet Res. 2009, 11, e11. [CrossRef] 10. Cuan-Baltazar, J.Y.; Muñoz-Perez, M.J.; Robledo-Vega, C.; Pérez-Zepeda, M.F.; Soto-Vega, E. Misinformation of COVID-19 on the Internet: Infodemiology Study. JMIR Public Health Surveill. 2020, 6, e18444. [CrossRef] 11. Kouzy, R.; Jaoude, J.A.; Kraitem, A.; El Alam, M.B.; Karam, B.; Adib, E.; Zarka, J.; Traboulsi, C.; Akl, E.W.; Baddour, K. Coronavirus Goes Viral: Quantifying the COVID-19 Misinformation Epidemic on Twitter. Cureus 2020, 12.[CrossRef] 12. Report of the WHO-China Joint Mission on Coronavirus Disease 2019 (COVID-19). Available online: https://www.who.int/ docs/default-source/coronaviruse/who-china-joint-mission-on-covid-19-final-report.pdf (accessed on 22 November 2020). Sustainability 2021, 13, 8528 16 of 16

13. Ho, Y.-H.; Tai, Y.-J.; Chen, L.-J. COVID-19 Pandemic Analysis for a Country’s Ability to Control. The Outbreak Using Little’s Law: Infodemiology Approach. Sustainability 2021, 13, 5628. [CrossRef] 14. Mavragani, A.; Ochoa, G. Google Trends in Infodemiology and Infoveillance: Methodology Framework. JMIR Public Health Surveill. 2019, 5, e13439. [CrossRef] 15. Singh, L.; Bansal, S.; Bode, L.; Budak, C.; Chi, G.; Kawintiranon, K.; Padden, C.; Vanarsdall, R.; Vraga, E.; Wang, Y. A First Look at COVID-19 Information and Misinformation Sharing on Twitter. ArXiv 2020, arXiv:2003.13907v1. 16. Sahni, H.; Sharma, H. Role of social media during the COVID-19 pandemic: Beneficial, destructive, or reconstructive? Int. J. Acad. Med. 2020, 6, 70. [CrossRef] 17. Bernardo, T.; Liang, C.; Sun, L.; Marlicz, W.; Mavragani, A. Infodemiology and Infoveillance: Scoping Review. J. Med Internet Res. 2020, 22, e16206. [CrossRef] 18. Chaves-Montero, A.; Relinque-Medina, F.; Fernández-Borrero, M.Á.; Vázquez-Aguado, O. Twitter, Social Services and Covid-19: Analysis of Interactions between Political Parties and Citizens. Sustainability 2021, 13, 2187. [CrossRef] 19. Saudi Arabia Social Media Statistics 2020. 8 September 2020. Available online: https://www.globalmediainsight.com/blog/ saudi-arabia-social-media-statistics/ (accessed on 22 November 2020). 20. Google. COVID-19 Community Mobility Reports. Available online: https://www.google.com/covid19/mobility/ (accessed on 22 November 2020). 21. “Covid19 Infodemic Observatory” from FBK-WHO Partnership. 2020. Available online: https://covid19obs.fbk.eu/#/ (accessed on 22 November 2020). 22. COVID-19 Dashboard. Available online: https://covid19.moh.gov.sa/ (accessed on 27 March 2021). 23. Twitter. Search Tweets. Available online: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/overview (accessed on 13 June 2020). 24. Google Trends. Available online: https://trends.google.com/trends/explore (accessed on 31 July 2020). 25. Zerrouki, T. Tashaphyne 0.3.4.1. Available online: https://pypi.org/project/Tashaphyne/ (accessed on 22 November 2020). 26. Abdelali, A.; Darwish, K.; Durrani, N.; Mubarak, H. Farasa: A fast and furious segmenter for arabic. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, San Diego, CA, USA, 12–17 June 2016. 27. Alahdal, H.; Basingab, F.; Alotaibi, R. An analytical study on the awareness, attitude and practice during the COVID-19 pandemic in Riyadh, Saudi Arabia. J. Infect. Public Health 2020, 13, 1446–1452. [CrossRef] 28. Bence, J.R. Analysis of Short Time Series: Correcting for Autocorrelation. Ecology 1995, 76, 628–639. [CrossRef] 29. Cohen, J. A power primer. Psychol. Bull. 1992, 112, 155–159. [CrossRef] 30. World Health Organization. Coronavirus Disease (COVID-19): Dexamethasone. 2020. Available online: https://www. who.int/news-room/q-a-detail/coronavirus-disease-covid-19-dexamethasone#:~{}:text=Dexamethasone%20is%20a%20 corticosteroid%20used,for%20critically%20ill%20patients (accessed on 22 November 2020). 31. Rovetta, A.; Bhagavathula, A.S. Global Infodemiology of COVID-19: Analysis of Google Web Searches and Instagram Hashtags. J. Med. Internet Res. 2020, 22, e20673. [CrossRef] 32. Jang, H.; Rempel, E.; Roth, D.; Carenini, G.; Janjua, N.Z. Tracking COVID-19 Discourse on Twitter in North America: Infodemiol- ogy Study Using Topic Modeling and Aspect-Based Sentiment Analysis. J. Med. Internet Res. 2021, 23, e25431. [CrossRef] 33. Mheidly, N.; Fares, J. Leveraging media and health communication strategies to overcome the COVID-19 infodemic. J. Public Health Policy 2020, 41, 1–11. [CrossRef][PubMed] 34. Centers for Disease Control and Prevention. Stop the Spread of Rumors. Available online: https://www.cdc.gov/coronavirus/ 2019-ncov/daily-life-coping/share-facts.html (accessed on 22 November 2020). 35. Van Der Linden, S.; Roozenbeek, J.; Compton, J. Inoculating Against Fake News About COVID-19. Front. Psychol. 2020, 11, 2928. [CrossRef][PubMed] 36. Baraybar-Fernández, A.; Arrufat-Martín, S.; Rubira-García, R. Public Information, Traditional Media and Social Networks during the COVID-19 Crisis in Spain. Sustainability 2021, 13, 6534. [CrossRef] 37. Commodari, E.; La Rosa, V.; Coniglio, M. Health risk perceptions in the era of the new coronavirus: Are the Italian people ready for a novel virus? A cross-sectional study on perceived personal and comparative susceptibility for infectious diseases. Public Health 2020, 187, 8–14. [CrossRef] 38. Fang, H.; Wang, L.; Yang, Y. Human mobility restrictions and the spread of the Novel Coronavirus (2019-nCoV) in China. J. Public Econ. 2020, 191, 104272. [CrossRef] 39. Engle, S.; Stromme, J.; Zhou, A. Staying at Home: Mobility Effects of COVID-19. SSRN Electron. J. 2020.[CrossRef] 40. Qureshi, A.I.; Suri, M.F.K.; Chu, H.; Suri, H.K.; Suri, A.K. Early mandated social distancing is a strong predictor of reduction in peak daily new COVID-19 cases. Public Health 2021, 190, 160–167. [CrossRef] 41. Ceccato, R.; Rossi, R.; Gastaldi, M. Travel Demand Prediction during COVID-19 Pandemic: Educational and Working Trips at the University of Padova. Sustainability 2021, 13, 6596. [CrossRef]