Topic, Sentiment and Impact Analysis: COVID19 Information Seeking on Social Media
Total Page:16
File Type:pdf, Size:1020Kb
Topic, Sentiment and Impact Analysis: COVID19 Information Seeking on Social Media Md Abul Bashar, Richi Nayak, Thirunavukarasu Balasubramaniam Queensland University of Technology Brisbane, Queensland, Australia {m1.bashar,r.nayak,t.balasubramaniam}@qut.edu.au ABSTRACT An insight into an outbreak is essential for controlling infectious When people notice something unusual, they discuss it on social diseases and identifying subsequent mental and social issues [2]. media. They leave traces of their emotions via text expressions. A This will help in reducing costs to the economy over the long term systematic collection, analysis, and interpretation of social media and bringing harmony to the society. Especially, early detection data across time and space can give insights on local outbreaks, helps in placing strategies and resources for an appropriate and mental health, and social issues. Such timely insights can help in de- efficient response. On social media, people discuss things thatthey veloping strategies and resources with an appropriate and efficient observe in community [2] They leave traces of their emotions via response. This study analysed a large Spatio-temporal tweet dataset text expressions [9]. A systematic collection, analysis, and interpre- of the Australian sphere related to COVID19. The methodology tation of social media data can give insight into an outbreak. Twitter included a volume analysis, dynamic topic modelling, sentiment is one of the most popular micro-blogging social media websites detection, and semantic brand score to obtain an insight on the where users express their thoughts and opinions on real-world COVID19 pandemic outbreak and public discussion in different events[7]. Social scientists have used tweet datasets for various states and cities of Australia over time. The obtained insights are purposes such as investigating public opinion of Hurricane Irene compared with independently observed phenomena such as gov- [15] and election result prediction [20]. ernment reported instances. Recently, the spatio-temporal texts collected from Sina-Weibo (Twitter alike microblogging system in China) was analysed to CCS CONCEPTS understand public opinions on COVID19 related topics [10]. The static topic modelling technique (LDA) and Random Forest classifier • Applied computing ! Sociology. were used to group tweets into topics for analysis. Studies have KEYWORDS also been published to analyse climate change-related tweets and understand what are the topics of discussion, how the tweet volume COVID19, Sentiment Analysis, Topic Analysis, Impact Analysis, and sentiment changed over time [1, 3, 7]. Authors in [13] applied CNN, ULMFit, Dynamic Topic Modelling, SBS topic modelling on a corpus of geotagged tweets collected from the ACM Reference Format: London sphere. Topic modelling has also been used to estimate the Md Abul Bashar, Richi Nayak, Thirunavukarasu Balasubramaniam. 2018. similarity between users in a location-based social network [14] Topic, Sentiment and Impact Analysis: COVID19 Information Seeking on and to estimate the relatedness of businesses based on business Social Media. In Woodstock ’18: ACM Symposium on Neural Gaze Detection, descriptions [19]. June 03–05, 2018, Woodstock, NY. ACM, New York, NY, USA, 8 pages. https: In this paper, we analyse a large Spatio-temporal tweet dataset //doi.org/10.1145/1122445.1122456 of the Australian sphere Twitter1 containing certain keywords re- 1 INTRODUCTION lating to COVID19. The methodology included a volume analysis, Dynamic Topic Modelling [5], Sentiment Detection [16], and Se- An outbreak of infectious diseases such as COVID19 has a devas- mantic Brand Score (SBS) [8] to obtain an insight into COVID19 tating impact on society with severe socio-economic consequences. outbreak in different states and cities of Australia over the time. The COVID19 pandemic has already caused the largest global reces- arXiv:2008.12435v1 [cs.SI] 28 Aug 2020 The obtained insights are compared with independently ob- sion in history; global stock markets have crashed, travel and trade served phenomena such as government reported instances and industries are losing billions, schools are closed, and health care news on newspapers. To our knowledge, ours is a first in-depth systems are overwhelmed. Mental health and social issues creep study of understanding Australian people’s perception of this on- up as people fear of catching the disease or losing loved ones, as going COVID19 pandemic using a large Twitter data collection. they lose jobs, or as they are required to stay in isolation. More specifically, this study makes the following main contribu- Permission to make digital or hard copies of all or part of this work for personal or tions. (a) Investigates how closely the insights into local outbreak classroom use is granted without fee provided that copies are not made or distributed match independently observed phenomena in space and time. (b) for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM Understands what topics related to COVID19 have been discussed must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, in communities and how they change over time. (c) Understands to post on servers or to redistribute to lists, requires prior specific permission and/or a the COVID19 related sentiments in communities over time. (d) In- fee. Request permissions from [email protected]. Woodstock ’18, June 03–05, 2018, Woodstock, NY vestigates the impact of COVID19 related concepts or words on © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-XXXX-X/18/06...$15.00 1Location of author or tweet or a location mentioned in the tweet is Australia or any https://doi.org/10.1145/1122445.1122456 of its cities Woodstock ’18, June 03–05, 2018, Woodstock, NY Bashar, Richi and Thiru, et al. The body of each tweet, i.e. tweet message, is used for analysing sentiment, topics and impact. Location and time information of each tweet gives it spatio-temporal dimensions. The location information for each tweet comes from either of three sources based on their availability: (a) tweet location, i.e. the user was in when the tweet was posted; (b) user location, i.e. residence of the user; or (c) a location mentioned in the tweet message. The locations are mapped to capital cities, states, or the country Australia depending on how granular level location information could be extracted. The time Figure 1: The Experimental Workflow information of each tweet comes from the time and day the tweet was posted. Table 1 shows a few examples. social media discussion. (e) Proposes a simple but effective CNN For preprocessing, we removed stopwords, punctuations, invalid architecture for sentiment analysis. characters. We dropped any non English tweets. We fixed repeating 2 EXPERIMENTAL METHODOLOGY characters, converted text to lowercase, replaced an occurrence of link or URL with a token namely xurl, and stemmed the text. We 2.1 Methodology used Named-entity Recognition (NER) technique from spacy3 to The aim of this study is to use social media analysis to uncover extract locations. 4 what is happening in communities and to give insight into (a) For sentiment analysis, we collected the sentiment140 dataset how the virus and lockdown is affecting community emotions, (2) from kaggle. This dataset contains 1.6 million annotated tweets. understand the main topics or themes emerging and evolving in the The tweets are annotated for classes of sentiments: positive and conversation and (3) impact of different COVID19 related concepts. negative. We train a classifier model using these tweets to detect In this study, we conduct spatio-temporal analysis of volume, sentiment in the collected dataset of 2.9 million tweets. We use the sentiment, topic, and impact to a large volume of COVID19 re- same preprocessing as above to prepare this dataset. lated tweets from the Australian sphere. First, we collect a dataset 2.3 Volume analysis of tweets from the Australian sphere containing geospatial and temporal values. The dataset is then preprocessed and prepared Analysing the volume of tweets posted from a particular area at a for volume analysis, sentiment analysis, dynamic topic modelling, particular time is an important step of data exploration that can and impact analysis. The volume analysis aims to identify basic provide interesting insights into observations [7]. We analysis the geospatial and temporal facts from the dataset which will facilitate number of tweets posted in each state and capital in Australia over subsequent analysis such as sentiment and topic into context. Dy- time. namic topic modelling extracts topics present in the dataset and 2.4 Sentiment analysis shows how those topics evolve over time. Sentiment analysis de- Sentiment analysis is used to identify the emotional state or opin- termines the sentiment of every tweet to show how community ion polarity in the samples. We identify the sentiment of each sentiments change over time. Impact analysis generates networks tweet using our proposed CNN based sentiment classifier. We then of concepts or words from the text collection and uses those net- aggregate tweets by location and time to obtain spatio-temporal works to measure how differently the concepts or words impact a distribution of sentiments. The following section gives a summary discussion. The analytical findings are then discussed