Analysis of Twitter and Youtube During Uselections 2020

Analysis of Twitter and YouTube during USelections 2020

Alexander Shevtsov Maria Oikonomidou Despoina Antonakaki [email protected] [email protected] [email protected] FORTH-ICS FORTH-ICS FORTH-ICS Computer Science Dept., U. of Crete Computer Science Dept., U. of Crete Polyvios Pratikakis Sotiris Ioannidis [email protected] [email protected] FORTH-ICS Technical University of Crete Computer Science Dept., U. of Crete FORTH-ICS ABSTRACT KEYWORDS The presidential elections in the United States on 3 November 2020 online social networks, Twitter, YouTube, sentiment analysis have caused extensive discussions on social media. A part of the ACM Reference Format: content on US elections is organic, coming from users discussing Alexander Shevtsov, Maria Oikonomidou, Despoina Antonakaki, Polyvios their opinions of the candidates, political positions, or relevant con- Pratikakis, and Sotiris Ioannidis. 2020. Analysis of Twitter and YouTube tent presented on television. Another significant part of the content during USelections 2020. In Proceedings of ACM Conference (Conference’17). generated originates from organized campaigns, both official and ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn by astroturfing. In this study, we obtain approximately 17.5M tweets containing 1 INTRODUCTION 3M users, based on prevalent hashtags related to US election 2020, as well as the related YouTube links, contained in the Twitter dataset, Today social networks have a very important role in online social likes, dislikes and comments of the videos and conduct volume, discourse and especially during pre-elections period. OSNs can sentiment and graph analysis on the communities formed. either be used productively to perform dissemination, communica- Particularly, we study the daily traffic per prevalent hashtags, tion or administration in elections, like in [? ]. The content posted plot the retweet graph from July to September 2020, show how its by the users can represent their political belief or is either used main connected component becomes denser in the period closer to comment, be sarcastic or express negative opinion towards a to the elections and highlight the two main entities (’Biden’ and political party or ideology. Twitter and YouTube while constitute ’Trump’). Additionally, we gather the related YouTube links con- two of the most popular online social networks attracting million tained in the previous dataset and perform sentiment analysis. The of users daily, capture a very important proportion of this online results on sentiment analysis on the Twitter corpus and the YouTube discourse. metadata gathered, show the positive and negative sentiment for the Through analysis of this online discourse we can discover the two entities throughout this period. The results of sentiment analy- main tendencies and preference of the electorate, generate patters sis indicate that 45.7% express positive sentiment towards Trump in that can distinguish users’ favouritism towards an ideology or Twitter and 33.8% positive sentiment towards Biden, while 14.55% a specific political party, study the sentiment prevailed towards of users express positive sentiment in YouTube metadata gathered the political parties or even predict the outcome of the elections. towards Trump and 8.7% positive sentiment towards Biden. Our Analyzing the sentiment is a necessary step towards any of these analysis fill the gap between the connection of offline events and directions. We are not focusing on the prediction of the electoral their consequences in social media by monitoring important events outcome, or sarcasm detection, but rather exploring the sentiment in real world and measuring public volume and sentiment before towards the political parties in Twitter and YouTube, as well. and after the event in social media. Sentiment analysis have been extensively studied in Twitter, arXiv:2010.08183v4 [cs.SI] 10 Nov 2020 usually studying a specific event. Through sentiment analysis we CCS CONCEPTS can visualize the variation of sentiment of the electorate during a political event [[18, 69]], a company event [[16]], product reviews • Networks → Online social networks; • Information systems [[40]] or model the public mood and emotion and connect tweets’ → Sentiment analysis. sentiment features with fluctuations with real events12 [[ ]]. The main task of these works is to predict elections, classify the elec- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed torate and distinguish posts towards one political party or ideology, for profit or commercial advantage and that copies bear this notice and the full citation although is has been addressed in many works that Twitter is not on the first page. Copyrights for components of this work owned by others than ACM suitable for elections prediction [[20, 21]]. must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a Sentiment in these analyses is represented by a variable with fee. Request permissions from [email protected]. values like ’Positive’, ’Negative’ and ’Neutral’, or even more specific Conference’17, July 2017, Washington, DC, USA (’Happy’,’Angry’, etc.). Each word in the corpus can be assigned © 2020 Association for Computing Machinery. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00 with more than one sentiment (’Positive’ and ’Negative’). Other https://doi.org/10.1145/nnnnnnn.nnnnnnn metrics that can be measured in this analysis, are ’subjectivity’ and Conference’17, July 2017, Washington, DC, USA Shevtsov, Oikonomidou, et al.

’polarity’, where the first one is defined as the ratio of ’positive’ and Twitter is used by party candidates, understand the content and ’negative’ tweets to ’neutral’ tweets, while the second is defined as the level of interaction by followers. More detailed background on the ratio of ’Positive’ to ’Negative’ tweets. There is a plethora of Twitter Sentiment Analysis methods can be found on surveys like works of sentiment analysis in Twitter [[23, 25, 37, 38, 48]]. [2, 7, 47, 50]. Sentiment analysis usually requires ’text normalization’, an ini- Background work on sentiment analysis on YouTube [[57, 71]], tial preprocessing of the corpus in order to extract the lexical fea- studies the sentiment on user comments [[58], [35]], identifies the tures that can significantly affect the performance [[29, 32, 44]]. trends and demonstrates the influence of real events of user senti- The steps of the preprocessing include tokenization, expansion of ments [[34]], implements model utilizing audio, visual and textual abbreviations and removal of stop words (URLs, mentions etc.). modalities as sources of information [[45]] and studies the popu- In this study we obtain the most popular hashtags around the US larity indicators and audience sentiments of videos [[1]]. YouTube elections and gather a dataset of 7.5M tweets, for a period of three analysis has been used to discover irrelevant and misleading meta- months. We extract 16.642 unique YouTube videos contained in this data [[6]], to identify spam campaigns [[42]], to discover extremists dataset as well as their metadata (likes, comments, authors etc.). videos and hidden communities [[56]], to propagate preference in- Initially, we perform a volume analysis and an association of the formation of personalised video [[8]], to estimate causality between diverse features of the YouTube videos. The next step is to identify user profiles [[28]] and to apply opinion mining [[54]]. Our analysis two main entities (‘Trump’ and ‘Biden’) in our corpus, in order on the YouTube corpus, is based on several metadata from YouTube to perform sentiment analysis. Next, we study the retweet graph videos, like user comments, likes and identify the sentiment towards in six different time points in our dataset, from July to September our two main entities: ’Biden’ and ’Trump’. 2020 and highlight the two main entities (’Biden’ and ’Trump’). The final section is the sentiment analysis by utilizing Vader sentiment 2 DATASET analysis model and we show the higher positive sentiment towards 2.1 Twitter Donald Trump in Twitter (45.7%) and in YouTube (14.55%) in comparison to the positive sentiment expressed towards Joe Biden, by In this study, we search for all the prevalent hashtags (#2020election, the users in Twitter(33.8%) and YouTube (8.7%). #Vote, #2020usaelection, #Biden, #BlueWave2020, #donaldtrump, #Election2020, #Elections_2020, #MyPresident, #November3rd etc.) regarding the US elections on 3 November 2020 and obtain the 1.1 Background Twitter corpus through Twitter API. The acquisition of the dataset Some methods incorporate the use of Twitter features, like emoti- started on 19 July 2020 and finished 22 September 2020. This resulted cons [[36, 70, 75, 79] ]. The main technique that is used in sentiment in a dataset of approximately 17.5M tweets, containing 3M users, analysis in Twitter is to incorporate a lexicon, specially made for the with the most prevalent hashtags being #Biden, #Trump2020 and domain of the dataset [[4, 22]]. In [33] they obtain three different #vote. In table 1 we can see the most popular hashtags, sorted by the corpora of tweets and explore the usage of linguistic features to- number of tweets within which they are contained and in Appendix wards sentiment analysis. In [77] they are adopting a lexicon-based A, on table 5 the whole list of hashtags used in our analysis. method on diverse Twitter datasets. In [55] they conduct a study of sentiment analysis on a dataset of 26,175 general Bulgarian tweets. Through feature selection and classification (binary SVM) they show that the negative sentiment predominated before and after the election period. We are not compiling a specially made lexicon, because we do not have a language barrier or analyzing a plethora of entities; we are just focusing on the two main candidates. Sentiment Analysis in Twitter does not have to be limited in one language; there are works studying multiple language datasets [[17, 26, 41, 46, 49, 73, 74, 78]]. Recent studies in sentiment analysis use Deep Convolutional Neural Networks [[19, 30, 31, 51–53, 73, 76? ]]. Our work focuses on a single language (English) which is the dataset based on. There is a plethora of studies that have used sentiment analysis in the political domain [[3, 43]], either for group polarization [[14, 15]], during the Arab spring [[72]] or for Hugo Chávez [[39]]. For example in [69] they study a dataset of 36 million tweets on 2012 U.S. presidential candidates and apply a realtime analysis of public sentiment. Using the Amazon Mechanical Turk, they label the dataset with tweets’ sentiment (positive, negative, neutral, or Figure 1: Number of tweets per day. unsure) in order to apply statistical classification (Naïve Bayes model on uni-gram features) on a training set consisting of nearly 17.000 tweets (16% positive,56% negative, 18% neutral, 10% unsure). Figure 1 shows the total number of tweets per day, for every Also, in [13] they attempt to understand the broader picture of how hashtag contained in our dataset. Analysis of Twitter and YouTube during USelections 2020 Conference’17, July 2017, Washington, DC,USA

Hashtag Tweets count From the 12.538 videos, we gathered all the comments and their #Trump2020 2.930.633 replies generated between 19/7/20 and 22/9/20. This resulted in a #VOTE 2.258.115 dataset of 3.091.176 unique commenters and 27.927.909 comments #Vote 1.785.378 and replies. Figure 3 shows the total number of comments and #vote 1.494.138 replies per day, related to the elections. We notice an increase in #Election2020 1.277.839 number of comments from July to September, with diurnal patters. #Biden 1.129.063 Also there is a peak in 21/8 potentially explained by a speech from #Debate2020 827.446 Joe Biden [61] as well as three specific tweets current President #VoteBlueToSaveAmerica 802.793 Donald Trump posted [11, 27, 64]. #BidenHarris2020 658.816 #Trump 597.172 Table 1: The 10 most popular hashtags in our dataset.

2.2 YouTube From the election tweets, we extracted all the YouTube video links, contained on those tweets. We followed these links and ended up with 16.642 unique videos. Through the YouTube Data API, we obtained all the publicly available data regarding these videos. The accessible data used in this study for each video are: (a) the number of views, likes, dislikes and comments, (b) the category where it belongs (e.g. News & Politics, Enter- tainment, Music, etc.), (c) the text and the author of each selected comment, and (d) the YouTube channel that posted the video. In figure 2 we see how many videos belong to each category. Figure 3: Number of comments in videos per day In this study we focus on the election’s topic, so we filtered out the videos that do not belong to the following categories: News & In figure 4 we see different join plots that illustrate the loga- Politics, People & Blogs, and Entertainment. The filtering led to a rithmic relationship between the accessible features of the 12.538 dataset of 12.538 videos. videos. For example, in 4 (a) we see that many videos have 0 comments and close to 0 likes. The main concentration is between 6 and 10. In 4 (e) we see the comparison of views with the duration of a video. The higher concentration of views is between 10 and 15 and for the video duration, between 5 and 8. 3 METHODOLOGY 3.1 Text Preprocessing Initially, for both Twitter and YouTube, we follow a prerequisite set of steps for preprocessing of the corpus (tweets - comments); by removing punctuation symbols, text emojis, URLs, by modifying mentions and hashtags and by removing starting characters of (’@’ and ’#’). This procedure removes the text noise and finally it allows the identification of the entity that was discussed by the users. The next step includes the transformation of the text to lower-case and the tokenization of the collected tweets. Then, we perform lemmati- zation of each token. This technique normalizes the inflected word forms. As a result of the previous steps, this sentiment analysis will be performed on lower cased and normalized sentences. 3.2 Sentiment Analysis In our implementation of sentiment analysis, we utilize Vader [24] sentiment analysis model from python NLTK library[10]. We choose this particular implementation because it is especially at- Figure 2: Number of videos per YouTube Category tuned to the sentiment expressed in social media. Vader uses a list Conference’17, July 2017, Washington, DC, USA Shevtsov, Oikonomidou, et al.

(a) (b)

(e) (f)

Figure 4: Join plots that show the relationship between the different features of the videos. Analysis of Twitter and YouTube during USelections 2020 Conference’17, July 2017, Washington, DC,USA of lexical features that are labeled according to their semantic orien- 4 RESULTS tation. Thanks to the implementation simplicity, sentiment analysis 4.1 Volume Analysis execution time remains low. Utilizing the already developed sentiment solution, we need to perform text filtering and parse our data In this section, we include several volume measures derived from to SentimentIntensityAnalyzer function that returns the scores of our dataset. Initially, we perform a volume analysis on the whole positive, negative, and neutral sentiment types. In our analysis, we corpus of the tweets. In figure 5 we plot the cumulative distribution compute such sentiment for each collected tweet and comment in of the daily tweets where we notice that 90% of the times there are our database and summarize those sentiment scores daily for each 150.000 tweets daily post. In figure 1 we plot the number of tweets particular user. We compute two values of daily sentiment scores per day where we notice the diurnal increases of the posts. In figure per user: summary, where the daily user sentiment is a sum product 6 we can see the cumulative distribution of the daily tweets per of scores of tweets that was created by the user at each day; and the hashtag, where we note the prevailing hashtags of #DonaldTrump, average score where the summary score is divided by the number #MAGA, #vote and #Trump. of tweets where a particular entity was mentioned by the user.

Entity Key words Trump trump, donald, donaldtrump, trump2020, votetrumpout, trumpislosing, dumptrump, nevertrump, republican Biden biden, joseph, bluewave2020, votebluetosaveamerica, voteblue, ridinwithbiden, neverbiden, democrat Table 2: The complete list of all entities with the corresponding keywords that were used for each one.

3.3 The Entities Figure 5: CDF of tweets per day The sentiment analysis of the corpus develops only the emotion vector of a particular sentence without presenting the entity within it was discussed. To identify the entity that was described at each tweet, we generate the set of keywords for the particular dataset entities (‘Trump’ and ‘Biden’). We are matching those keywords in each text to identify if the entity was used on the specific tweet- comment text. Each time the entity is used in a particular tweet- comment, we assign the sentiment values on those entities to the user who posts this particular tweet. Those entity sentiments are assigned to the user daily, since we are interested in the identification of the user/community dynamic and present how entity sentiment evolves day by day. The two entities along with the corresponding keywords that were used for each one, are on table 2.

3.4 Event consequences Since our work is based on the analysis of the 2020 US Presidential elections we monitor real world events that may trigger significant user interest in social media. Interesting examples of such events that can be used are the candidates’ debate on TV. The depict of the online conversations regarding these events are visible within Figure 6: CDF distribution of the daily tweets per hashtag. our analysis. By analyzing the user engagement of these specific periods, one day before and one day after, shows whether such real We plot the daily active users per entity for both Twitter and world events are connected in the virtual world of social networks. YouTube in figure 7, for each entity of ‘Trump’ and ‘Biden’ andwe For this reason we selected all TV debate dates and the dates after notice that the total number of users are increasing from July to President trump was diagnosed positive with COVID-19. September and that the users posting at Twitter and commenting Conference’17, July 2017, Washington, DC, USA Shevtsov, Oikonomidou, et al. on YouTube for Trump exceed the users for Biden, in Twitter and highest numbers of in degree and out-degree respectively, with YouTube respectively. The number of users posting tweets for the anonymized usernames. entity of Trump seems to exceed the corresponding number of users for the entity ‘Biden’. In both figures, we notice a peak on User Number of Followings Number of Followers day 20/8/2020, potentially because of three specific tweets current User 1 62.9K 68.5K President Donald Trump posted [11, 27, 64]. We also notice the User 2 18.5K 78.4K total traffic of Twitter to overcomes the total YouTube traffic. User 3 43.1K 61.7K Table 3: Top retweeted users(highest indegree).

User Number of Followings Number of Followers User 1 5,299 8,505 User 2 37.1K 34.7K User 3 72.3K 78.1K Table 4: Top retweeted users(highest out degree).

4.3 Sentiment Analysis In this section, we present the results from the sentiment analysis in our corpus, as described in 3.2. In figure 9 we present the daily average sentiment for the entity ‘Biden’. The solid line is the average sentiment in YouTube comments and the dotted line is the average sentiment in the corpus of the tweets. Below zero we have the Figure 7: In this figure, we represent the number of daily negative sentiment and above zero the positive sentiment for each users that post YouTube comments and tweets where partic- social media. On 23 August of 2020, we notice a peak on the positive ular entities are discussed. sentiment and could be potentially explained by [62, 63]. In figure 10, we plot the daily overall sentiment for entity Biden, where we notice the same peak of negative sentiment on 23 August, while 4.2 The retweet graph a second peak on 19/9 until 22/9 could be explained by his tweet In figure 8 we plot the retweet graph. This graph is developed bythe regarding the successor of Justice Ginsburg [59] or a follow tweet retweet relationships between all users in the collected dataset. We on the current president[60]. present the retweet relation as a directed edge between two users The corresponding daily plots for overall and average sentiment (nodes). Also, we assign edge weight with the number of retweets on the entity of Trump are shown in 12 and 11. The peak on overall that users are performed for a particular destination node. We also positive sentiment on 12-13 September could be explained by [65– perform a filtering metric where we remove the low weighted edges 68]. of weight 1 until weight 7. With the filtering procedure, we reduce Additionally, in figure 14 we plot the positive sentiment time the noise the volume of not significant relations and also reduce series for the two sets of hashtags for each state. We notice the daily the number of edges that are manageable for visualization. Before fluctuations for every states per entity (blue is the entity ’Biden’ filtering, our graph consisted of 1.142.376 nodes and 3.859.640 edges and red is for ’Trump’. The juxtaposition of the time series in the and after filtering non-significant edges we reduce the number of form resembling an EEG makes it easier to discern localized events nodes to 114.971 and the number of edges to 211.534. For the entity from nation-wide twitter traffic. The list of states abbreviations can visualization, we use 2 colors for our entities, in red color represent be found here: [80]. the entity of Trump, and in blue color the entity of Biden. s We apply sentiment analysis on each day and we measure the number of tweets that a user posts containing a particular entity. We 4.4 Event consequences use this counter to provide coloring of the user node by selecting We use sentiment analysis on specific time points of our dataset the most popular entity of the user at each particular date. We timeline ,with important events that allow us to identify how the also develop a bar plot with the volume of users that allows us social media users react to those events. Specifically, we select the to compare the daily volume per entity. Users that don’t use any dates of the TV debates (September 29 and October 22) and the date entities in their tweets, remain without particular coloring. when the President Trump was diagnosed positive to COVID-19 Graph plots 8 were generated with Gephi Furchterman Reingold (October 2). We selected those particular dates since they were layout [9] while we export Gephi generated positions and we use the most important events during the pre-elections period. We use them to generate a daily graph with networkX python library [5]. the sentiment analysis results to identify the fluctuations of the In tables 3 and 4 we show the highest retweeted used with the sentiment and we measure the volume around each entity topic. Analysis of Twitter and YouTube during USelections 2020 Conference’17, July 2017, Washington, DC,USA

(a) (b)

(e) (f)

Figure 8: Retweet graph relation between users in our dataset. Colours represent entity activity by user red colour represent entity Trump and blue colour entity Biden.

Our results are presented in figures: [16,15], where it is noticeable 5 LINKING TWITTER AND YOUTUBE DATA that the first debate and Trump COVID-19 announcement events In this section, we explore the differences in discussion and commu- generated high volume of user interest in social media. In the first nity between the social networks. We perform Louvain community case of the debates both entities are soared in comparison with pre- detection on both social graphs, we associate communities in the vious dates. At the second event, the volume of the entity ‘Trump’ YouTube comment graph with communities in the Twitter retweet is taking the first place during the user discussions on social media graph, and measure their similarity and differences. by increasing the volume of tweets for the entity ‘Trump’ and by Figure 17 shows the 4-core of the YouTube comment graph. presenting high dissonance on sentiment values. Conference’17, July 2017, Washington, DC, USA Shevtsov, Oikonomidou, et al.

Figure 9: Result of daily average sentiment for the entity for Figure 11: Result of daily average sentiment for the entity Twitter and YouTube collected dataset for entity Biden. for Twitter and YouTube collected dataset for entity Trump.

Figure 10: Result of overall sentiment analysis for Twitter and YouTube collected dataset for entity Biden. Particular Figure 12: Result of overall sentiment analysis for Twitter sentiment values present overall sentiment that was posted and YouTube collected dataset for entity Trump. Particular by users at given dates. sentiment values present overall sentiment that was posted by users at given dates. Figure 19 shows the interactions between the 3 largest YouTube communities (top half) in the YouTube-comment graph (YT) and the 6 largest Twitter communities in the Retweet graph (RT). Each community is named after its highest PageRank member (or second between the features of the YouTube videos. Next, it demonstrates highest, when more clear) in the corresponding graph. The size of sentiment analysis on the Twitter corpus and the YouTube metadata each relation depicts the number of users in the RT community and shows that the positive sentiment is higher for Donald Trump that tweeted video URLs from any channel in the YT community. in comparison with Joe Biden. We identify how real world events trigger user discussions in social media around that topic. Addition- 6 CONCLUSIONS ally, this study includes the evolution of the retweet graph along This work having obtained the tweets for the most popular hash- six different time points in the dataset, from July to September 2020 tags regarding the US elections 2020, as well as the the extracted and highlights the two main entities (’Biden’ and ’Trump’). unique YouTube videos, performs an analysis regarding the volume We are planning to perform sarcasm detection in this dataset, of tweets and users, the identification of entities and the correlation which is an important step in the analysis of political content. Analysis of Twitter and YouTube during USelections 2020 Conference’17, July 2017, Washington, DC,USA

Figure 15: Summary of user tweets sentiment values per Figure 13: Result of daily sentiment per entity. each entity followed by number of tweet. At presented plot we mention 3 important dates in dataset, where 29-09 is the date of first debate, 2-10 the date when Trump was tested positive to COVID-19 and 22-10 the date of last election debate.

Figure 16: Normalized sentiment values of user tweet per each entity followed by number of tweet. At presented plot we mention 3 important dates in dataset, where 29-09 is the date of first debate, 2-10 the date when Trump was tested positive to COVID-19 and 22-10 the date of last election debate.

ACKNOWLEDGMENTS This document is the results of the research project co-funded Figure 14: Positive sentiment time series for the two sets of by the European Commission, project CONCORDIA, with grant hashtags for each state number 830927 (EUROPEAN COMMISSION Directorate-General Communications Networks, Content and Technology) and by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under Conference’17, July 2017, Washington, DC, USA Shevtsov, Oikonomidou, et al.

Figure 19: Interactions between largest communities in YouTube and Twitter. Figure 17: The 4-core of the YouTube comment graph, color- coded by community. the call RESEARCH–CREATE–INNOVATE (project code:T1EDK- 02857, and T1EDK-01800).

Hashtag Tweets count #Trump2020 2.930.633 #Vote 1.785.378 #vote 1.494.138 #Election2020 1.277.839 #Biden 1.129.063 #VoteBlueToSaveAmerica 802.793 #trump2020 228.289 #VoteTrumpOut 147.065 #election2020 106.412 #trump 89.767 #biden 82.802 #2020election 41.794 #November3rd 34.299 #NovemberIsComing 32.680 #donaldtrump 28.265 #MyPresident 24.609 #Elections_2020 6.249 #2020elections 3.828 Figure 18: The 4-core of the YouTube comment graph, color- #USElections 3.580 coded by community. #bluewave2020 3.040 Total 10.252.523 Table 5: The list of all hashtags in our dataset. Analysis of Twitter and YouTube during USelections 2020 Conference’17, July 2017, Washington, DC,USA

A LIST OF ALL HASHTAGS Systems with Applications 106 (2018), 197–216. [23] Anastasia Giachanou and Fabio Crestani. 2016. Like it or not: A survey of twitter In this section, we list all the retrieved hashtags on which we based sentiment analysis methods. ACM Computing Surveys (CSUR) 49, 2 (2016), 1–41. our dataset along with the tweets count within which there were [24] CHE Gilbert and Erric Hutto. 2014. Vader: A parsimonious rule-based model contained, in table 5. for sentiment analysis of social media text. In Eighth International Conference on Weblogs and Social Media (ICWSM-14). Available at (20/04/16) http://comp. social. gatech. edu/papers/icwsm14. vader. hutto. pdf, Vol. 81. 82. [25] Alec Go, Lei Huang, and Richa Bhayani. 2009. Twitter sentiment analysis. Entropy REFERENCES 17 (2009), 252. [1] Inoka Amarasekara and Will J Grant. 2019. Exploring the YouTube science [26] B. Guthier, K. Ho, and A. E. Saddik. 2017. Language-independent data set anno- communication gender gap: A sentiment analysis. Public Understanding of Science tation for machine learning-based sentiment analysis. In 2017 IEEE International 28, 1 (2019), 68–84. Conference on Systems, Man, and Cybernetics (SMC). SMC, Banff, AB, Canada, [2] Despoina Antonakaki, Paraskevi Fragopoulou, and Sotiris Ioannidis. 2020. A 2105–2110. survey of Twitter research: Data model, graph structure, sentiment analysis and [27] Tucker Higgins. 2020 (accessed September 29, 2020). White House asks Supreme attacks. Expert Systems with Applications 164 (2020), 114006. Court to let Trump block critics on Twitter. https://www.cnbc.com/2020/08/20/ [3] Despoina Antonakaki, Dimitris Spiliotopoulos, Christos V Samaras, Sotiris Ioan- white-house-asks-supreme-court-to-let-trump-block-critics-on-twitter.html nidis, and Paraskevi Fragopoulou. 2016. Investigating the complete corpus of [28] Haeng-Jin Jang, Jaemoon Sim, Yonnim Lee, and Ohbyung Kwon. 2013. Deep referendum and elections tweets. In 2016 IEEE/ACM International Conference on sentiment analysis: Mining the causality between personality-value-attitude for Advances in Social Networks Analysis and Mining (ASONAM). IEEE, IEEE, San analyzing business ads in social media. Expert Systems with applications 40, 18 Francisco, CA, USA, 100–105. (2013), 7492–7503. [4] Despoina Antonakaki, Dimitris Spiliotopoulos, Christos V. Samaras, Polyvios [29] Zhao Jianqiang and Gui Xiaolin. 2017. Comparison research on text pre- Pratikakis, Sotiris Ioannidis, and Paraskevi Fragopoulou. 2017. Social media processing methods on twitter sentiment analysis. IEEE Access 5 (2017), 2870– analysis during political turbulence. PloS one 12, 10 (2017), e0186836. 2879. [5] Dan Schult Aric Hagberg, Pieter Swart. 2005. Python networkx library for graph [30] Z. Jianqiang, G. Xiaolin, and Z. Xuejun. 2018. Deep Convolution Neural Networks creation/visualization. https://networkx.github.io/ for Twitter Sentiment Analysis. IEEE Access 6 (2018), 23253–23260. [6] Payal Bajaj, Mridul Kavidayal, Priyanshu Srivastava, Md Nadeem Akhtar, and [31] Zhao Jianqiang, Gui Xiaolin, and Zhang Xuejun. 2018. Deep convolution neural Ponnurangam Kumaraguru. 2016. Disinformation in multimedia annotation: networks for twitter sentiment analysis. IEEE Access 6 (2018), 23253–23260. Misleading metadata detection on YouTube. In Proceedings of the 2016 ACM [32] Olga Kolchyna, Tharsis TP Souza, Philip Treleaven, and Tomaso Aste. 2015. workshop on Vision and Language Integration Meets Multimedia Fusion. 53–61. Twitter sentiment analysis: Lexicon method, machine learning method and their [7] Rushlene Kaur Bakshi, Navneet Kaur, Ravneet Kaur, and Gurpreet Kaur. 2016. combination. arXiv preprint arXiv:1507.00955 5656 (2015), 33–38. Opinion mining and sentiment analysis. In 2016 3rd International Conference on [33] Efthymios Kouloumpis, Theresa Wilson, and Johanna D Moore. 2011. Twitter Computing for Sustainable Global Development (INDIACom). IEEE, IEEE, New sentiment analysis: The good the bad and the omg! Icwsm 11, 538-541 (2011), Delhi, Andaman and Nicobar Islands, India, 452–455. 164. [8] Shumeet Baluja, Rohan Seth, Dharshi Sivakumar, Yushi Jing, Jay Yagnik, Shankar [34] Amar Krishna. 2014. Polarity trend analysis of public sentiment on YouTube. Kumar, Deepak Ravichandran, and Mohamed Aly. 2008. Video suggestion and dis- (2014). covery for youtube: taking random walks through the view graph. In Proceedings [35] Simon Lindgren. 2012. ‘It took me about half an hour, but I did it!’Media circuits of the 17th international conference on World Wide Web. 895–904. and affinity spaces around how-to videos on YouTube. European Journal of [9] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. 2009. Gephi: An Communication 27, 2 (2012), 152–170. Open Source Software for Exploring and Manipulating Networks. http://www. [36] Kun-Lin Liu, Wu-Jun Li, and Minyi Guo. 2012. Emoticon smoothed language aaai.org/ocs/index.php/ICWSM/09/paper/view/154 models for twitter sentiment analysis.. In Aaai, Vol. 12. Citeseer, Citeseer, Paris, [10] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing France, 22–26. with Python. O’Reilly Media. [37] Eugenio Martínez-Cámara, M Teresa Martín-Valdivia, L Alfonso Urena-López, [11] Justin Boggs. 2020 (accessed September 29, 2020). President Trump and A Rturo Montejo-Ráez. 2014. Sentiment analysis in Twitter. Natural Language goes on an all caps Twitter tirade during the Democratic Convention. Engineering 20, 1 (2014), 1–28. https://www.thedenverchannel.com/news/election-2020/president-trump- [38] Anshul Mittal and Arpit Goel. 2012. Stock prediction using twitter sen- goes-on-an-all-caps-twitter-tirade-during-the-democratic-convention timent analysis. Standford University, CS229 (2011 http://cs229. stanford. [12] Johan Bollen, Alberto Pepe, and Huina Mao. 2009. Modeling public mood and edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis. emotion: Twitter sentiment and socio-economic phenomena. arXiv preprint pdf) 15 (2012), 2352. arXiv:0911.1583 (2009). [39] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M Benito. [13] Christian Christensen. 2013. WAVE-RIDING AND HASHTAG-JUMPING. Infor- 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. mation, Communication & Society 16, 5 (2013), 646–666. https://doi.org/10.1080/ Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114. 1369118X.2013.783609 [40] Subhabrata Mukherjee and Pushpak Bhattacharyya. 2012. Feature specific senti- [14] Elanor Colleoni, Alessandro Rozza, and Adam Arvidsson. 2014. Echo chamber or ment analysis for product reviews. In International Conference on Intelligent Text public sphere? Predicting political orientation and measuring political homophily Processing and Computational Linguistics. Springer, 475–487. in Twitter using big data. Journal of communication 64, 2 (2014), 317–332. [41] Sascha Narr, Michael Hulfenhaus, and Sahin Albayrak. 2012. Language- [15] Michael D Conover, Jacob Ratkiewicz, Matthew R Francisco, Bruno Gonçalves, independent twitter sentiment analysis. Knowledge discovery and machine learn- Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. ing (KDML), LWA 89898 (2012), 12–14. Icwsm 133, 26 (2011), 89–96. [42] Derek O’Callaghan, Martin Harrigan, Joe Carthy, and Pádraig Cunningham. [16] Mariana Daniel, Rui Ferreira Neves, and Nuno Horta. 2017. Company event 2012. Network analysis of recurring youtube spam campaigns. arXiv preprint popularity for financial markets using Twitter and sentiment analysis. Expert arXiv:1201.3783 (2012). Systems with Applications 71 (2017), 111–124. [43] Brendan O’Connor, Ramnath Balasubramanyan, Bryan R Routledge, and Noah A [17] Alex Davies and Zoubin Ghahramani. 2011. Language-independent Bayesian Smith. 2010. From tweets to polls: Linking text sentiment to public opinion time sentiment mining of Twitter. In The 5th SNA-KDD Workshop’11 (SNA-KDD’11). series. Tepper School of Business 344 (2010), 559. SNA-KDD, University of California, 56–58. [44] Alexander Pak and Patrick Paroubek. 2010. Twitter as a corpus for sentiment [18] Nicholas A Diakopoulos and David A Shamma. 2010. Characterizing debate analysis and opinion mining.. In LREc, Vol. 10. LREC, Valletta, Malta, 1320–1326. performance via aggregated twitter sentiment. In Proceedings of the SIGCHI [45] Soujanya Poria, Erik Cambria, Newton Howard, Guang-Bin Huang, and Amir conference on human factors in computing systems. 1195–1198. Hussain. 2016. Fusing audio, visual and textual clues for sentiment analysis from [19] Cicero Dos Santos and Maira Gatti. 2014. Deep convolutional neural networks multimodal content. Neurocomputing 174 (2016), 50–59. for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th [46] Tomáš Ptáček, Ivan Habernal, and Jun Hong. 2014. Sarcasm detection on czech International Conference on Computational Linguistics: Technical Papers. COLING, and english twitter. In Proceedings of COLING 2014, the 25th International Con- Dublin, Ireland, 69–78. ference on Computational Linguistics: Technical Papers. ACL, Dublin, Ireland, [20] Daniel Gayo-Avello. 2013. A meta-analysis of state-of-the-art electoral prediction 213–223. from Twitter data. Social Science Computer Review 31, 6 (2013), 649–679. [47] Kumar Ravi and Vadlamani Ravi. 2015. A survey on opinion mining and sentiment [21] Daniel Gayo-Avello, Panagiotis Metaxas, and Eni Mustafaraj. 2011. Limits of analysis: tasks, approaches and applications. Knowledge-Based Systems 89 (2015), electoral predictions using social media data. In Proceedings of the International 14–46. AAAI Conference on Weblogs and Social Media, Barcelona, Spain. [48] Hassan Saif, Yulan He, and Harith Alani. 2012. Alleviating data sparsity for [22] Manoochehr Ghiassi and S Lee. 2018. A domain transferable lexicon set for twitter sentiment analysis. In CEUR Workshop proceedings. CEUR Workshop Twitter sentiment analysis using a supervised machine learning approach. Expert Proceedings (CEUR-WS. org), CEUR, Lyon, France., 297–312. Conference’17, July 2017, Washington, DC, USA Shevtsov, Oikonomidou, et al.

[49] Carl Saroufim, Akram Almatarky, and Mohammad Abdel Hady. 2018. Language [75] Yuki Yamamoto, Tadahiko Kumamoto, and Akiyo Nadamoto. 2014. Role of independent sentiment analysis with sentiment-specific word embeddings. In Emoticons for Multidimensional Sentiment Analysis of Twitter. In Proceedings Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, of the 16th International Conference on Information Integration and Web-Based Sentiment and Social Media Analysis. ACL, Brussels, Belgium, 14–23. Applications and Services (iiWAS ’14). Association for Computing Machinery, New [50] Jesus Serrano-Guerrero, Jose A Olivas, Francisco P Romero, and Enrique Herrera- York, NY, USA, 107–115. Viedma. 2015. Sentiment analysis: A review and comparative analysis of web [76] Quanzeng You, Jiebo Luo, Hailin Jin, and Jianchao Yang. 2015. Robust image services. Information Sciences 311 (2015), 18–38. sentiment analysis using progressively trained and domain transferred deep [51] Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter Sentiment Analysis networks. arXiv preprint arXiv:1509.06041 3 (2015), 270–279. with Deep Convolutional Neural Networks. In Proceedings of the 38th International [77] Lei Zhang, Riddhiman Ghosh, Mohamed Dekhil, Meichun Hsu, and Bing Liu. 2011. ACM SIGIR Conference on Research and Development in Information Retrieval Combining lexicon-based and learning-based methods for Twitter sentiment (SIGIR ’15). Association for Computing Machinery, New York, NY, USA, 959–962. analysis. HP Laboratories, Technical Report HPL-2011 89 (2011), 32. https://doi.org/10.1145/2766462.2767830 [78] Shiwei Zhang, Xiuzhen Zhang, and Jeffrey Chan. 2017. A Word-Character Con- [52] Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter sentiment analysis volutional Neural Network for Language-Agnostic Twitter Sentiment Analysis. with deep convolutional neural networks. In Proceedings of the 38th International In Proceedings of the 22nd Australasian Document Computing Symposium (ADCS ACM SIGIR Conference on Research and Development in Information Retrieval. 2017). Association for Computing Machinery, New York, NY, USA, Article 12, ACM, Chile, 959–962. 4 pages. https://doi.org/10.1145/3166072.3166082 [53] Aliaksei Severyn and Alessandro Moschitti. 2015. Unitn: Training deep convo- [79] Jichang Zhao, Li Dong, Junjie Wu, and Ke Xu. 2012. MoodLens: An Emoticon- lutional neural network for twitter sentiment classification. In Proceedings of Based Sentiment Analysis System for Chinese Tweets. In Proceedings of the the 9th international workshop on semantic evaluation (SemEval 2015). SIGLEX | 18th ACM SIGKDD International Conference on Knowledge Discovery and Data SIGSEM, Denver, Colorado, 464–469. Mining (KDD ’12). Association for Computing Machinery, New York, NY, USA, [54] Aliaksei Severyn, Olga Uryupina, Barbara Plank, Alessandro Moschitti, and Katja 1528–1531. Filippova. 2014. Opinion mining on YouTube. (2014). [80] Wikipedia. 2020 (accessed September 29, 2020). List of U.S. state and territory [55] J. Smailović, J. Kranjc, M. Grčar, M. Žnidaršič, and I. Mozetič. 2015. Monitoring abbreviations. https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_ the Twitter sentiment during the Bulgarian elections. In 2015 IEEE International abbreviations Conference on Data Science and Advanced Analytics (DSAA). IEEE, Paris, 1–10. [56] Ashish Sureka, Ponnurangam Kumaraguru, Atul Goyal, and Sidharth Chhabra. 2010. Mining youtube to discover extremist videos, users and hidden communities. In Asia Information Retrieval Symposium. Springer, 13–24. [57] Anjana Susarla, Jeong-Ha Oh, and Yong Tan. 2012. Social networks and the diffusion of user-generated content: Evidence from YouTube. Information Systems Research 23, 1 (2012), 23–41. [58] Mike Thelwall, Pardeep Sud, and Farida Vis. 2012. Commenting on YouTube videos: From Guatemalan rock to el big bang. Journal of the American Society for Information Science and Technology 63, 3 (2012), 616–629. [59] Twitter. 2020 (accessed September 29, 2020). Biden tweet on 20/9/2020. https: //twitter.com/JoeBiden/status/1307703815630540802 [60] Twitter. 2020 (accessed September 29, 2020). Biden tweet on 20/9/2020. https: //twitter.com/JoeBiden/status/1307491919384260609 [61] Twitter. 2020 (accessed September 29, 2020). Biden tweet on 21/8/2020. https: //twitter.com/JoeBiden/status/1296599144119914499 [62] Twitter. 2020 (accessed September 29, 2020). Biden tweet on 23/8/2020. https: //twitter.com/JoeBiden/status/1297595030002180099 [63] Twitter. 2020 (accessed September 29, 2020). Biden tweet on 23/8/2020. https: //twitter.com/JoeBiden/status/1297206657576046595 [64] Twitter. 2020 (accessed September 29, 2020). Donald Trump tweet on 20/8/2020. https://twitter.com/realDonaldTrump/status/1296274065762717696 [65] Twitter. 2020 (accessed September 29, 2020). Trump tweet on 10/9/2020. https: //twitter.com/realDonaldTrump/status/1304611528201703424 [66] Twitter. 2020 (accessed September 29, 2020). Trump tweet on 12/9/2020. https: //twitter.com/realDonaldTrump/status/1304920644165881856 [67] Twitter. 2020 (accessed September 29, 2020). Trump tweet on 13/9/2020. https: //twitter.com/realDonaldTrump/status/1304577591597301760 [68] Twitter. 2020 (accessed September 29, 2020). Trump tweet on 13/9/2020. https: //twitter.com/realDonaldTrump/status/1304613988337057794 [69] Hao Wang, Doğan Can, Abe Kazemzadeh, François Bar, and Shrikanth Narayanan. 2012. A system for real-time twitter sentiment analysis of 2012 US presidential election cycle. In Proceedings of the ACL 2012 system demonstrations. ACL, Jeju Island, Korea, 115–120. [70] Hao Wang and Jorge A Castanon. 2015. Sentiment expression via emoticons on social media. In 2015 ieee international conference on big data (big data). IEEE, IEEE, Santa Clara, CA, USA, 2404–2408. [71] Xiaolong Wang, Furu Wei, Xiaohua Liu, Ming Zhou, and Ming Zhang. 2011. Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM international conference on Information and knowledge management. 1031–1040. [72] Ingmar Weber, Venkata R Kiran Garimella, and Alaa Batayneh. 2013. Secular vs. islamist polarization in egypt on twitter. In Proceedings of the 2013 IEEE/ACM international conference on advances in social networks analysis and mining. IEEE, Niagara Ontario Canada, 290–297. [73] Joonatas Wehrmann, Willian Becker, Henry EL Cagnini, and Rodrigo C Barros. 2017. A character-based convolutional neural network for language-agnostic Twitter sentiment analysis. In 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, IEEE, Anchorage, Alaska, 2384–2391. [74] J. Wehrmann, W. Becker, H. E. L. Cagnini, and R. C. Barros. 2017. A character- based convolutional neural network for language-agnostic Twitter sentiment analysis. In 2017 International Joint Conference on Neural Networks (IJCNN). IJCNN, Anchorage, Alaska, 2384–2391.