Insight from NLP Analysis: COVID-19 Vaccines Sentiments on Social Media
Tao Na∗, Wei Cheng∗, Dongming Li∗, Wanyu Lu∗, Hongjiang Li∗
∗Department of Computer Science, University of Manchester, Manchester, UK
Correspondence to: Tao Na <[email protected]>
All authors contributed equally.
arXiv:2106.04081v1 [cs.CL] 8 Jun 2021

Abstract—Social media is an appropriate source for analyzing public attitudes towards the COVID-19 vaccine and its various brands. Nevertheless, there are few relevant studies. In this research, we collected tweets posted by UK and US residents through the Twitter API during the pandemic and designed experiments to answer three main questions concerning vaccination. To obtain the dominant sentiment of the public, we performed sentiment analysis with VADER and proposed a new method that takes each individual's influence into account. This allows us to go a step further in sentiment analysis and to explain some of the fluctuations in the data. The results indicated that celebrities could lead opinion shifts on social media as vaccination progresses. Moreover, at the peak, nearly 40% of the population in both countries had a negative attitude towards COVID-19 vaccines. We also investigated people's opinions toward different vaccine brands and found that the Pfizer vaccine is the most popular. By applying the sentiment analysis tool, we discovered that most people hold positive views toward the COVID-19 vaccines manufactured by most brands. Finally, we carried out topic modelling using the LDA model. We found that residents of the two countries are willing to share their views and feelings concerning the vaccine. Several deaths have occurred after vaccination. Due to these negative events, US residents are more worried about the side effects and safety of the vaccine.

Index Terms—Natural Language Processing, COVID-19, Vaccine, UK, US, Sentiment analysis, Topic modelling, Social media, Text mining

I. INTRODUCTION

Since the first patient was identified in Wuhan, China, in December 2019 [1], COVID-19 has spread rapidly to Europe and eventually worldwide. COVID-19 can cause severe respiratory illness [2]. This complication has caused more than 2.17 million deaths. The virus has also been spreading in the UK and the US since March 2020. As of 22 March 2021, the number of confirmed cases in the US and the UK exceeded 29.8 million and 4.3 million, respectively. COVID-19 has killed more than 668,000 people in the two countries.

To prevent the further spread of COVID-19 and relieve the enormous medical pressure, the development and promotion of vaccines are crucial. Several pharmaceutical companies and universities have been working on COVID-19 vaccines at an unprecedented rate. More than 260 possible COVID-19 vaccines have been proposed, but only a few have been approved. Several others are in advanced stages of testing [3]. Pfizer is the first international pharmaceutical company to have its vaccine approved in multiple countries: in the UK on 2 December 2020 [4], in the US on 12 December 2020 [5] and in the EU on 21 December 2020 [6]. At the time of writing, the availability of the COVID-19 vaccine is still limited, and less than 6% of the global population has received the vaccine. As of 20 March 2021, vaccination coverage in the UK and US was 43.99% and 36.31%, respectively [7].

The development of a safe vaccine through animal models of RSV can take up to thirty years [8]. A vaccine needs to be evaluated repeatedly in animal models before it is put into clinical trials. Because of the severity of the epidemic, the pace of COVID-19 vaccine development is unprecedented. Countries and global organizations have lowered the relevant criteria for new COVID-19 vaccines and have also shortened the clinical trials of vaccines [9]. Although the vaccines now in use have undergone rigorous testing and review to ensure safety, many people remain skeptical about their safety. A small proportion of people have even refused to be vaccinated against COVID-19.

To promote the vaccine effectively, it is important to collect people's opinions on the vaccine and its side effects. However, measures such as social isolation, quarantine and travel restrictions imposed by governments have hindered the collection process. Social media, such as Twitter, therefore becomes the ideal source of data.

To investigate the perceptions and attitudes of UK and US citizens regarding the vaccine, we used the Twitter API to collect relevant tweets during the outbreak and conducted social media analysis on them. Starting from a public dataset¹ provided by Banda et al. [10], we kept only the tweet id column of the English-language tweets whose location is restricted to the US and the UK. We then downloaded all COVID-19 vaccine-related tweets from the Twitter stream via the Twitter API and the tweet ids. Public fields such as tweet id, date, time, lang and country place are included in our final dataset. Our use of the dataset is fully compliant with Twitter's terms of service.

¹ Public dataset: https://zenodo.org/record/4603998#.YGGK3GQzb0q

The three main questions about the COVID-19 vaccine that we analyze in this paper, and our contribution to each, are as follows:

• What is the dominant sentiment towards COVID-19 vaccines? For this question, we provide an analysis of the attitude of citizens located in the UK and US toward the COVID-19 vaccine. The analysis is mainly conducted using a sentiment analysis tool, VADER [11], which is used primarily to explore the main sentiment expressed in people's tweets. In this research, we also propose a new method that captures a user's public influence on the social network, which contributes to the analysis of the experimental results.

• Which COVID-19 vaccine brands/manufacturers have been most talked about recently? Do people prefer any brands? Regarding this question, we explore the sentiment of people located in the UK and US toward different COVID-19 vaccine brands. We manually determined the vaccine brands, and the corresponding keywords, that are currently most talked about on the Twitter platform. VADER is used to analyze people's preferences regarding the different brands.

• What concerns do people have about the COVID-19 vaccine? What are the popular topics regarding vaccines? With respect to this question, we identified the main concerns among citizens about the COVID-19 vaccine. Using the LDA model, we explored the popular topics from the perspectives of time series, country and sentiment, respectively.

II. RELATED WORK

Sentiment analysis, also called opinion mining, aims to evaluate embedded attitudes, opinions, sentiments and evaluations via the computational treatment of subjectivity in natural language text [11]. Through sentiment analysis, we can determine whether a text has a positive or negative subjective orientation.

Common sentiment analysis models rely heavily on sentiment dictionaries. A sentiment lexicon is a vocabulary in which each word is labelled according to the positivity or negativity of its subjective orientation.

Lexicons can be classified into two types: those with semantic orientation labelling (words divided into positive or negative) and those with more fine-grained quantitative scoring under pre-defined rules. LIWC [12], GI [13] and HU-LIU04 [14] are widely used polarity-based lexicons in which words are context-free. In contrast, ANEW [15], SentiWordNet [16] and SenticNet [17] are based on sentiment intensity and can therefore support quantitative scoring.

The VADER (Valence Aware Dictionary and sEntiment Reasoner) proposed by C.J.

questions: First, such an approach requires a considerable amount of tagged emotional vocabulary data, which is often challenging to obtain for a particular text domain. Second, the deep model is computationally expensive in training, validation and testing, and the model's performance in prediction tasks directly limits its ability to process streaming data. Third, such deep models are usually of the black-box type, with limited interpretability.

Because of its advantages, VADER has a wide range of applications. Toni Pano and Rasha Kashef [18] investigated whether outbreaks of COVID-19 can influence Bitcoin prices. They evaluated 13 different strategies for processing BTC tweets, and VADER scoring was regarded as the optimum processing approach. Mohapatra et al. [19] assigned each tweet a compound sentiment score based on the VADER sentiment analysis algorithm. The number of Twitter followers, the number of likes and the number of retweets associated with each tweet are used to compute the final sentiment score.

Latent Dirichlet Allocation (LDA), a three-level hierarchical Bayesian model, is a generative probabilistic model for finding patterns of words in a text corpus [20]. LDA has been shown to outperform batch variational Bayes (VB) while also requiring less running time [21]. David et al. [22] extended classical state space models to specify a statistical model of topic evolution. Based on probabilistic time series, this dynamic model can capture the evolution of topics in a corpus. One limitation of LDA is its inability to model correlation between topics. The correlated topic model (CTM) was presented in [23] to address this limitation. The CTM directly models correlation between topics by using a covariance structure among the components. This showed that correlation plays an important role in topic modelling. In [24], keywords are used to represent topics, and automatic coherence evaluation was proposed to rate their coherence or interpretability. Michael et al. [25] proposed a framework that combines existing word-based coherence measures and combinations of their basic components. Searching this configuration space identified the coherence definition with the best overall correlation with respect to all available human topic ranking data.

III. METHODOLOGY

A. Overview

Firstly, we collect the tweets data from a public Twitter