

Article

Mining Textual and Imagery Instagram Data during the COVID-19 Pandemic

Dimitrios Amanatidis 1 , Ifigeneia Mylona 2,*, Irene (Eirini) Kamenidou 2 , Spyridon Mamalis 2 and Aikaterini Stavrianea 3

1 Faculty of Sciences and Technology, Hellenic Open University, 26335 Patras, Greece; [email protected]
2 Department of Management Science and Technology, International Hellenic University, 65404 Kavala, Greece; [email protected] (I.K.); [email protected] (S.M.)
3 Department of Communication and Media Studies, National and Kapodistrian University of Athens, 10559 Athens, Greece; [email protected]
* Correspondence: [email protected]

Abstract: Instagram is perhaps the most rapidly gaining in popularity of photo and video sharing social networking applications. It has been widely adopted by both end-users and organizations, posting their personal experiences or expressing their opinions during significant events and periods of crisis, such as the ongoing COVID-19 pandemic and the search for an effective vaccine treatment. We identify the three major companies involved in vaccine research and extract their Instagram posts, after vaccination has started, as well as users' reception using the respective hashtags, constructing the datasets. Statistical differences regarding the companies are initially presented, on textual as well as visual features, i.e., image classification by transfer learning. Appropriate preprocessing of English language posts and content analysis is subsequently performed, by automatically annotating the posts as one of four intent classes, thus facilitating the training of nine classifiers for a potential application capable of predicting a user's intent. By designing and carrying out a controlled experiment we validate that the resulting algorithms' accuracy ranking is significant, identifying the two best performing algorithms; this is further improved by ensemble techniques. Finally, polarity analysis on users' posts, leveraging a convolutional neural network, reveals a rather neutral to negative sentiment, with highly polarized user posts' distributions.

Keywords: Instagram; COVID-19; communication; machine learning; intent classification; sentiment analysis

Received: 5 April 2021; Accepted: 6 May 2021; Published: 9 May 2021

1. Introduction

Social media has radically changed the way that society consumes and produces information today [1]. Organizations are presented with significant opportunities, as social media contributes to decreasing costs, improving brand awareness, and increasing sales [2]. Social media has introduced platforms such as , Instagram, and [3], thus greatly enriching businesses' means of reaching out to their consumers [4]. The most popular social networking services in the USA are , Instagram, and Facebook, with a high degree of acceptance, particularly in the case of younger users [5]. Among the different features that social media offer, interactivity, connectivity, and sharing are identified as the most important ones in [6]. Currently, the number of social media users around the world exceeds two billion. Table 1, redrawn from [7], lists the most popular social media platforms and their associated registered users.


Table 1. Social media platforms and registered users in January 2021 (millions).

Facebook—2740        YouTube—2291       WhatsApp—2000      FB Messenger—1300
Instagram—1221       WeChat—1213        TikTok—689         QQ—617
Douyin—600           —511               Telegram—500       Snapchat—498
Kuaishou—481         —442               Reddit—430         Twitter—353

1.1. Instagram

Instagram is described in [8] as a highly accepted and world-wide image-based social media application and as a trendy tool enabling rapid image and comment sharing across a user's media channels; at the same time, it invites likes or dislikes by interested followers. It is rapidly gaining popularity [9] and has also been highlighted as a social photography 'app' designed to run on a smartphone [10]. Selfies are associated with Instagram, as people take selfie photos using the camera of their smartphone [11] and can then easily upload them. Instagram is a new form of social media that offers users the opportunity to communicate their experiences by sharing photos and videos [12–15], and it has been recognized as 'highly visual social media' [16]. Photos are a very powerful means of conveying emotions, sharing feelings and thoughts, or simply visualizing a random incident [17]. Photos are often uploaded with hashtags. Instagram functions very much like Twitter, as it uses the terms following and follower, and users can add comments related to the photos uploaded [18]. When adding a caption to an image, users very frequently make use of the '@' symbol, so that other users are mentioned [19]. Instagram users can also add filters to videos and pictures and distribute them to other media [9,19]. Brands have easy access to people and share their exclusive point of view, and companies can use Instagram to sell ideas [20]. It encourages e-WOM (online word of mouth), as it is solely based on the concept of sharing orally and allows users to interact [21]. Young people use Instagram to create visually sophisticated feeds by editing their photos [22]. Brands' relationships with consumers can be effectively enhanced with Instagram advertisements [23].

1.2. COVID-19 and Social Media

The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) pandemic has caused a great change in the world, due to its extensive and fast spread since November 2019 [24]. The SARS-CoV-2 virus represents a huge health crisis of global proportions [25,26]. The coronavirus disease 2019 (COVID-19) is a new disease that has affected the global community; it resulted in 1,037,430 deaths as of 5 October 2020 [27], while by 6 February 2021, this number reached 2,294,534. Various symptoms characterize the disease, with their appearance ranging from 2 to 14 days after virus exposure [28]. According to the World Health Organization (WHO), the most common COVID-19 symptoms are mild or higher fever, persisting dry cough, and a sense of tiredness. Other, not so common symptoms include aches and pains, nasal congestion, headache, conjunctivitis, sore throat, diarrhea, and loss of taste or smell. As stated in [29], 'the disease affects the respiratory tract and disease severity can range from very mild rhinorrhea, to severe acute respiratory distress syndrome'. Contagion of COVID-19 mainly occurs in places of human congregation, e.g., sports venues, bars, restaurants, beaches, and airports [25]. Masks, physical distancing, hand hygiene, and adequate ventilation are the main measures [30] that all governments have taken in order to prevent the spreading of the disease. Dealing with a situation of the extent of a pandemic like this has been extremely difficult for all countries. The COVID-19 pandemic is not only a health crisis; it has generated unprecedented and disturbing socio-economic and cultural repercussions and asymmetries. To deal with the coronavirus pandemic and its consequences, most countries have created special committees. The dire economic situation in Greece in recent years had already seriously affected and weakened the economic and social sectors in the country,

and consequently has had a serious, destructive impact on the public health system; the necessity to strengthen the health system was urgent. With the appearance of the first COVID-19 case, a Government Committee of Infectologists was appointed to deal with the oncoming health crisis in the country. It is a flexible and small-member reporting body whose task is to immediately evaluate any new data and act for the timely management of any emergency or regular need that arises, to ensure the health of the population while trying to minimize the consequences for the economy [31].

In the attempt to bring the current COVID-19 pandemic under control, measures must be taken both at the level of effective prevention and at the level of therapy [32]. Medicines and vaccines have been intensely tested, but emphasis has been placed on the mass production of an effective vaccine to control the current COVID-19 pandemic and stop coronavirus infections [33]. In some cases, collaboration among biotechnology and pharmaceutical companies has been essential to achieve a better and quicker result [34]. The vaccine is a challenge for all stakeholders, and this has been the first time that a vaccine has been researched and developed at such a rapid pace [29]. The main companies that undertook the obligation to produce the COVID-19 vaccine were Pfizer/BioNTech, Moderna, AstraZeneca, Novavax, Johnson & Johnson, Sinovac Biotech, CanSino Biologics, Sinopharm, and R-Pharm.

Most governments have adopted and implemented protective measures for public health safety, including physical distancing between persons and the use of masks. As there is a strong relationship between social media and the various government structures [35], very often those measures were communicated to the public through communication channels such as social media platforms, like Twitter and Facebook [36], and virtual communities that promote the discussion on the critical necessity of acting towards the prevention of COVID-19. During a pandemic, social media ideally facilitates the prompt spread of new and important information, sharing the experiences of diagnostic treatment and follow-up protocol [37], and provides patients with information about COVID-19 [38,39]. Social media can also be exploited by medical experts so that they can quickly contradict false or fake information with accurate advice [40].

1.3. Similar Studies

A study [41] similar to the one presented here was conducted by Kim and Kim in the USA, where the use of Instagram is explored. They case-studied the Centers for Disease Control in South Korea, analyzing the content of their Instagram photos using Microsoft Azure Cognitive Services. Results showed that the largest classes of photos were those of 'text' and 'people'. The authors also found that images having more human faces, or human faces taking up most of the image area, did not achieve a high user engagement level with respect to the other images. Additionally, there was a negative association between images with faces expressing positive or neutral emotions and user engagement. Corey and MacLean, in their study [42] about the disease in the United States, found that social media in general, and Instagram in particular, form a potential venue where the public can be educated on human papillomavirus (HPV) vaccines and the vaccination process. Analysis of the posts revealed an association of HPV with cancer (35%) and prevention of HPV (32%). Posts scoring more 'likes' were more likely to mention cancer (p = 0.016), as well as HPV screening (p = 0.041). Posts that did mention HPV and were anti-vaccine had a rather low probability of mentioning prevention or cancer (p < 0.001), in both cases. In a work focusing on the pandemic evolution in Italy, La Gatta et al. [43] employ a combination of Graph Convolutional Neural Networks (GCNs) and Long Short-Term Memory networks (LSTMs) to infer the parameters of the epidemiological models SIR and SIRD (Susceptible, Infected, Recovered, Deceased). A dynamic graph is exploited to model the coronavirus spread, with vertices representing places and edges corresponding to the respective spread due to movement of infected individuals. Their derived model correctly

predicts the contagion curve and is also capable of projecting the total number of infected individuals under different scenarios of varying contact rate.

Sentiment Analysis (SA), or Opinion Mining as it is occasionally called, is the process where text documents are analyzed to detect specific 'affects' or other emotion patterns towards a related product or service. It can be considered a sub-topic at the intersection of the natural language processing (NLP) and text/web mining disciplines [44]. Analyzing sentiment may find significant applications, as in [45], where the authors detect a person's mood and emotions, as well as their general personality traits, and propose a recommender system for music. Personality characteristics are measured using questionnaires and text mining algorithms on social media streams, in the context of the Big Five (OCEAN) model: openness, conscientiousness, extraversion, agreeableness, and neuroticism.

1.4. Aims of the Study

The main objectives of this study are:
• Identify the key stakeholders in COVID-19 vaccine research and investigate the content of their Instagram posting, as well as how this is perceived by users;
• Detect any similarities/differences between the respective companies' posting, on both textual and visual features;
• Detect any similarities/differences between the respective users' perception, by means of hashtags;
• Perform user posts' intent classification, to explore a potential predictive modelling application for detecting what users desire to post;
• Perform user posts' sentiment analysis, to quantify their feelings and opinions.

As shown later, the only companies offering a vaccine with an Instagram account are Pfizer, AstraZeneca, and Johnson & Johnson. Pfizer Inc. is an American multinational pharmaceutical corporation that was founded 172 years ago in New York City. Pfizer, one of the largest pharmaceutical companies in the world, in cooperation with BioNTech created a vaccine that is based on mRNA technology. This technology introduces part of the genetic material of the SARS-CoV-2 virus in the form of messenger RNA (mRNA) [46]. AstraZeneca is a science-led biopharmaceutical company producing innovative medicines that are used by millions of patients around the world [47]. The AZD1222 vaccine is 'a replication-deficient simian adenovirus vector, containing the full-length codon-optimized coding sequence of SARS-CoV-2 spike protein along with a tissue plasminogen activator (tPA) leader sequence' [48]. Johnson & Johnson is a multinational company that was founded in 1886 in the USA. Their range of products includes medical devices, pharmaceuticals, and consumer goods [49]. The Johnson & Johnson vaccine is 'based on the virus's genetic instructions for building the spike protein. Unlike the Pfizer-BioNTech and Moderna vaccines, which store the instructions in single-stranded RNA, the Johnson & Johnson vaccine uses double-stranded DNA' [50].

2. Materials and Methods

National health agencies, as of January 2021, have approved [51] six vaccines for public use, including:
• Tozinameran from the US–German cooperation Pfizer–BioNTech;
• BBIBP-CorV by Chinese Sinopharm;
• CoronaVac by Chinese Sinovac;
• Ad5-nCoV by Chinese CanSino Biologics;
• mRNA-1273 by US Moderna and its partner Johnson and Johnson;
• Gam-COVID-Vac by the Russian Gamaleya Research Institute.

The three Chinese vaccines have been approved for use solely within China. The Russian one has been approved in Russia, Belarus, and Argentina, and the Moderna–Johnson and Johnson vaccine for North America use (US and Canada). On the other hand,

the Pfizer vaccine has had a wider adoption, having secured the approval of EU countries, the UK, US, and Canada, and 14 other countries on different continents. Among others, there is also the Oxford vaccine from British–Swedish AstraZeneca. Authorization and planning strategies differ among individual countries.

Only three of these companies have an Instagram account: Pfizer (pfizerinc), AstraZeneca (astrazeneca) and Johnson and Johnson (jnj). The dates that these accounts were created can be retrieved via the 'About ... ' option, found on the Instagram mobile application only. Moreover, at the time of writing (27 December 2020), their posts and rate of posting, followers, and following are shown in Table 2:

Table 2. Companies' account details.

                 pfizerinc (21 April 2016)    astrazeneca (21 May 2013)    jnj (17 August 2017)
posts (rate)     390 (6.8 posts/month)        1309 (14.2 posts/month)      33 (0.8 posts/month)
followers        67,800                       23,600                       32,600
following        273                          551                          338

Using Instaloader [52], an open-source tool for downloading Instagram images and videos along with their captions and other metadata, we extracted all of the posts of Table 2 from the three official company accounts. Each post consists of a .txt file with the post text, one or more images and/or videos (.jpg/.mp4), and a .json file containing additional metadata about the post. We have developed a generic Python script to iteratively process all posts' files, as these reside in separate account folders, and extract various post statistics. Using regular expressions on the metadata files' text, we also find and report comments, likes, and other related information. Comments and likes can be seen as the two most important actions that a user may engage in regarding a post, and are the best representatives of active and passive interactions, respectively. Post data and metadata information are arranged in a Pandas dataframe with features as columns:
• DateTime, the date and time of the post in UTC standard;
• PostText, the text body of the post;
• PostChars, the number of post characters;
• PostWords, the number of post words;
• HashTags, contained in the post;
• Likes, the number of likes scored;
• Comments, the number of comments made;
• Images, the number of uploaded images;
• Videos, the number of uploaded videos, if any;
• VGG16, image classification output from the pretrained VGG16 model, as a list in case of more images;
• InceptionV3, image classification output from the pretrained InceptionV3 model, as a list in case of more images;
• ResNet50, image classification output from the pretrained ResNet50 model, as a list in case of more images.

For the last three columns, we employ a process known as Transfer Learning, where deep pretrained neural network models can be downloaded and used as a starting point to build models for classification or regression tasks different from the original one, based usually on image or text features. There are usually three different approaches on how to utilize the pretrained models:
1. Pretrained models used directly as classifiers in an application to classify new images;

2. Pretrained models used as feature extractors, with the features subsequently being used as input to another model;
3. Pretrained models used for better weight initialization of the new, integrated model.

The first approach is naturally the simplest and least time-consuming one. Approaches 2 and 3 need new models to be designed, and re-training is essential. In our case we opted for the first approach, but chose three different computer vision convolutional neural network models, perhaps the three most popular ones:
• VGG16 [53] from the Oxford Visual Geometry Group, where 16 refers to the number of layers, with VGG19 also available. Innovative for introducing consistent and repeating structural blocks;
• InceptionV3 [54], where inception modules, blocks of parallel convolutional layers with different sized filters, are introduced;
• ResNet50 [55], where residual modules are introduced. These employ unweighted, shortcut connections that memorize, e.g., input to later layers in the network architecture.

These models are available under Keras and can be downloaded pre-trained on ImageNet. ImageNet is a large visual database, popular as a benchmark for visual object recognition tasks; it comprises 14 million color images and more than 20,000 object classes. To be used as standalone classifiers in our case, some preprocessing has to be performed. The complete flow is as follows:
1. Load each image and resize it according to the model (224 × 224 for VGG16 and ResNet50, 299 × 299 for InceptionV3);
2. Convert the color image to a NumPy array;
3. Extract the three (or four, in the case of .png) color channels and reshape as a single one-dimensional array;
4. Depending on the model, scale pixel RGB intensities into either [0,1] (torch framework mode), [−1,+1] (tensorflow framework mode) or zero-center BGR intensities unscaled (caffe framework mode). These are internal details of the preprocess_input function, implemented differently for each model;
5. Use the model to make predictions (probabilities) for all classes;
6. Choose the highest probability as the most likely predicted result.

The dataframe is also saved in .csv format for further processing. We have run the script for the three official Instagram accounts and the results (top five rows only) are shown in Figures 1–3. The three classification models have not been optimized or tuned in any way. The models do not always agree in their predictions, and there are quite a few cases where results are far from realistic. However, they correctly recognize people, e.g., in lab coats, or scientific instruments.
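As a concrete illustration of the flow above, a minimal Keras sketch follows; the image file name is hypothetical, ResNet50 is shown, and the vgg16/inception_v3 application modules are used analogously (InceptionV3 expects 299 × 299 input).

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")            # pretrained ImageNet classifier, used as-is

# Load and resize a post image (224x224 for VGG16/ResNet50, 299x299 for InceptionV3)
img = image.load_img("post_image.jpg", target_size=(224, 224))  # hypothetical file name
x = image.img_to_array(img)                     # convert the colour image to a NumPy array
x = np.expand_dims(x, axis=0)                   # add a batch dimension
x = preprocess_input(x)                         # model-specific intensity scaling (caffe mode for ResNet50)

preds = model.predict(x)                        # probabilities over the 1000 ImageNet classes
print(decode_predictions(preds, top=1)[0])      # keep the most likely predicted class
```

Swapping the resnet50 imports for their vgg16 or inception_v3 counterparts reproduces the other two prediction columns of the dataframe.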

Figure 1. pfizer account posts metadata.

Figure 2. astrazeneca account posts metadata.

Figure 3. jnj account posts metadata.

A quick way to gain insight into the images used in the companies' posts is via word clouds, as seen in Figure 4, generated from all three model predictions for each company. It can be observed that companies post photos of their employees or other persons very frequently, thereby classifying the images as clothing items or another prop. They also upload graphic images, which are identified accordingly, e.g., 'web_site'.
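Word clouds such as those in Figure 4 can be produced directly from the predicted class names; a minimal sketch with the wordcloud package, where `predicted_labels` stands in for the class names collected for one account (the values shown are purely illustrative):

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# predicted_labels: ImageNet class names predicted for one account's images (illustrative values)
predicted_labels = ["lab_coat", "web_site", "web_site", "syringe", "lab_coat"]

wc = WordCloud(width=800, height=400, background_color="white")
wc.generate(" ".join(predicted_labels))       # frequency-weighted cloud of class names

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```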

Figure 4. Word clouds for the pfizerinc (a), astrazeneca (b), and jnj (c) profiles.

By manual inspection of the results, we can confirm that there are cases where all three models agree in their ImageNet class predictions, cases where two of them agree, and others where there is disagreement between them. There are also cases where all three models fail to correctly identify the image objects. As an example, Figure 5 depicts images in pfizerinc posts where: (a) there is correct classification by all three models; (b) there is disagreement by one model; and (c) all of the models disagree, while at the same time none of them correctly identifies the image class.

(a) broccoli (all 3 models) (b) sunglass/power_drill (2/3 agree) (c) candle/jellyfish/strawberry

Figure 5. Examples of the models' level of agreement in their predictions.

The models' predictions for the object classes were structured together in a new dataframe to facilitate the quantitative assessment of results. The models' responses were merged into a single prediction in cases of total agreement (3/3) and partial agreement (2/3) with a simple voting scheme. For the cases of complete disagreement, we have opted to keep the prediction made by the ResNet50 model, as it slightly outperforms InceptionV3 in accuracy for the ImageNet classification task (https://paperswithcode.com/sota/image-classification-on-imagenet (accessed on 29 January 2021)). By accounting for multiple-image posts, we end up with a dataframe having 536 predicted image classes for pfizerinc, 1365 for astrazeneca, and 64 for jnj. The top five rows of this new dataframe are shown in Figure 6:

Figure 6. Predicted image classes.

Creating dictionaries for the images' classes, we can plot the top-10 most commonly encountered classes for the three companies, as shown in Figure 7. We can infer that AstraZeneca mostly posts synthetic images with some text superimposed, thus classified as 'web_site', followed by clothing items ('lab_coat'), which denotes the presence of a human. The same applies to Pfizer, with results however following a flatter distribution. Johnson & Johnson, on the other hand, prefer posting images of humans, although their number of posts is significantly lower.
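The merging rule described above (accept the majority label on total or partial agreement, otherwise fall back to the ResNet50 prediction) can be sketched as a small helper; the function name is ours, not the paper's:

```python
from collections import Counter

def merge_predictions(vgg16_label, inceptionv3_label, resnet50_label):
    """Majority vote over the three model outputs (3/3 or 2/3 agreement);
    on complete disagreement keep the ResNet50 prediction."""
    label, votes = Counter([vgg16_label, inceptionv3_label, resnet50_label]).most_common(1)[0]
    return label if votes >= 2 else resnet50_label

print(merge_predictions("lab_coat", "lab_coat", "web_site"))   # -> lab_coat (2/3 agreement)
print(merge_predictions("candle", "jellyfish", "strawberry"))  # -> strawberry (ResNet50 fallback)
```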

(a) AstraZeneca top-10 classes (b) Pfizer top-10 classes (c) Johnson & Johnson top-10 classes

Figure 7. Top-10 classes for the three companies.

Rather than just classifying the images in posts (and in fact single objects), it would be interesting to employ automatic caption generation, a process to textually describe the whole image scene. As Instagram is to a large extent image-powered (vs., e.g., Twitter) and images are naturally information-richer, this approach would result in more information being extracted, and would perhaps facilitate sentiment analysis being performed on images rather than just text. Automatic caption generation is a particularly challenging, active research area that lies at the intersection of natural language processing and computer vision. This task is significantly harder than image classification, as it requires detecting all objects in a scene and how they relate to each other.
We have had some initial experimentation with different encoder-decoder architectures, where a Convolutional Neural Network usually encodes the images and a Recurrent Neural Network, e.g., an LSTM (Long Short-Term Memory) network, is employed to act as encoder/decoder between the text sequence and its vector representation. However, results were not that satisfactory, and as extensive tuning and re-training with a GPU is needed, we opted not to pursue this approach any further.

3. Results

Reverting to textual information, Table 3 displays the mean and standard deviation (in parentheses) values for the associated dataset features for the three companies. Although there is some difference in the number of posts for each company, it can be observed that while Johnson & Johnson is significantly less active in posting than the other two companies, they do post longer messages. Their messages also contain more hashtags, receive more likes and comments, and are richer (or at least as rich) in images and videos.

Appl. Sci. 2021, 11, 4281 9 of 18

Table 3. Mean and standard deviation (in parentheses) values for the three companies.

Characters Words Hashtags Likes Comments Images Videos Pfizer (390 posts—since 272.6 (152.7) 41.9 (24.0) 3.6 (3.9) 203.7 (364.9) 17.5 (57.7) 1.4 (1.3) 0.4 (0.5) 4/2016) Astrazeneca (1309 posts—since 177.2 (99.8) 26.8 (15.4) 2.4 (2.6) 43.2 (73.5) 1.8 (7.0) 1.0 (0.4) 0.1 (0.2) 5/2013) Johnson & Johnson 139.1 (33 posts—since 1168.8 (736.1) 198.0 (127.9) 3.9 (1.8) 301.0 (287.5) 1.9 (1.5) 0.3 (0.4) (586.7) 8/2017)

The official companies’ posts are a means of gaining insight into how their social media strategy expresses their policies, how it informs and promotes their products and services. Some degree of user perception can be measured with likes and comments, as these are the most user-familiar ways of passive and active interaction. The number of likes does carry an inherent positive sign; however, comments do not necessarily do so. Thus, in order to measure users’ perception we subsequently downloaded posts containing the three respective hashtags: #pfizer, #astrazeneca, and #jnj. We opted to download these posts for December 2020 and onwards, as it is the month just before and during the first vaccinations took place. We also opted to download posts only from public accounts. The total number of anytime -containing posts are (as of 27 December 2020); 99.519 for #pfizer, 26.222 for #astrazeneca, and 118.282 for #jnj. It is interesting that #jnj is significantly adopted by users, although the company Instagram profile is not as active as the other two. This may be justified by the company’s more recognizable brand and wider product range. For December 2020, we ended up with 646 public posts for #pfizer, 738 public posts for #astrazeneca, and 70 public posts for #jnj. As before, using a similar Python script we have constructed a single dataframe, consisting of all the three hashtag containing user posts text, i.e., 1454 rows. The reason for doing so is partially due to #jnj being underrepresented and to the fact that we would like to increase our sample size for training purposes, as discussed later. Having a first look at the user posts we observe that there are different post languages and that the use of emojis is quite frequent, as seen in Figure8, so although emojis could be utilized, some preprocessing is deemed essential: Google’s Python module CLD2 (Compact Language Detection) is based on a Naive Bayesian classifier and can identify up to three languages in a document and their associ- ated probabilities. Keeping the highest probability for each post we detected its language, except for two unidentified language posts (‘un’) and results are shown in Figure9: Detecting the English language for a post does not necessarily imply that it is com- pletely free of non-English words and characters. In our study, this was shown later when we performed text ‘cleaning’. Our first preprocessing step was to filter out non-English language posts, so the new dataframe consisted of 675 English language posts: 339 for #pfizer, 291 for #astrazeneca, and 45 for #jnj. Our aim here was to classify users’ communi- cation intents. Intent classification differs from sentiment and opinion mining, as it focuses on futuristic action rather than the current state of feelings [56]. Four intent classes were specified, motivated by the works of [57,58] and were partially adopted and modified to fit in our context: Appl. Sci. 2021, 11, x FOR PEER REVIEW 9 of 18

Table 3. Mean and standard deviation (in parentheses) values for the three companies.

Characters Words Hashtags Likes Comments Images Videos Pfizer (390 posts—since 4/2016) 272.6 (152.7) 41.9 (24.0) 3.6 (3.9) 203.7 (364.9) 17.5 (57.7) 1.4 (1.3) 0.4 (0.5) Astrazeneca (1309 posts—since 177.2 (99.8) 26.8 (15.4) 2.4 (2.6) 43.2 (73.5) 1.8 (7.0) 1.0 (0.4) 0.1 (0.2) 5/2013) Johnson & Johnson (33 posts— 1168.8 (736.1) 198.0 (127.9) 3.9 (1.8) 301.0 (287.5) 139.1 (586.7) 1.9 (1.5) 0.3 (0.4) since 8/2017)

The official companies’ posts are a means of gaining insight into how their social media strategy expresses their policies, how it informs and promotes their products and services. Some degree of user perception can be measured with likes and comments, as these are the most user‐familiar ways of passive and active interaction. The number of likes does carry an inherent positive sign; however, comments do not necessarily do so. Thus, in order to measure users’ perception we subsequently downloaded posts contain‐ ing the three respective hashtags: #pfizer, #astrazeneca, and #jnj. We opted to download these posts for December 2020 and onwards, as it is the month just before and during the first vaccinations took place. We also opted to download posts only from public accounts. The total number of anytime hashtag‐containing posts are (as of 27 December 2020); 99.519 for #pfizer, 26.222 for #astrazeneca, and 118.282 for #jnj. It is interesting that #jnj is signif‐ icantly adopted by users, although the company Instagram profile is not as active as the other two. This may be justified by the company’s more recognizable brand and wider product range. For December 2020, we ended up with 646 public posts for #pfizer, 738 public posts for #astrazeneca, and 70 public posts for #jnj. As before, using a similar Python script we have constructed a single dataframe, con‐ sisting of all the three hashtag containing user posts text, i.e., 1454 rows. The reason for doing so is partially due to #jnj being underrepresented and to the fact that we would like to increase our sample size for training purposes, as discussed later. Having a first look at Appl. Sci. 2021, 11, 4281 the user posts we observe that there are different post languages and that the use of 10emojis of 18 is quite frequent, as seen in Figure 8, so although emojis could be utilized, some prepro‐ cessing is deemed essential:

Appl. Sci. 2021, 11, x FOR PEER REVIEW 10 of 18

Google’s Python module CLD2 (Compact Language Detection) is based on a Naive Bayesian classifier and can identify up to three languages in a document and their associ‐ ated probabilities. Keeping the highest probability for each post we detected its language, exceptFigureFigure 8. for8. AnAn two excerptexcerpt unidentified fromfrom users’ users’ language posts. posts. posts (‘un’) and results are shown in Figure 9:

Figure 9. Posts’ language codes.

• Detecting‘Acknowledge’ the English (ACK), language for generic for statements, a post does reporting not necessarily facts and imply sharing that experience it is com‐ pletely• ‘Advise’ free of (ADV),non‐English for suggestions, words and recommendations, characters. In our study, giving this guidelines was shown or offering later when help we• performed‘Seek’ (SEK), text for ‘cleaning’. seeking help, Our advice,first preprocessing comments, orstep answers was to filter out non‐English language• ‘Express’ posts, (EXP), so the for new any dataframe kind of expression, consisted feeling, of 675 English or thought, language positive posts: or negative 339 for #pfizer,(hybrid 291 for intent-sentiment) #astrazeneca, and 45 for #jnj. Our aim here was to classify users’ commu‐ nicationThese intents. were Intent label-encoded, classification leading differs to afrom multi-label sentiment classification and opinion problem, mining, withas it fo all‐ cusesthe challenges on futuristic this action may entail rather (most than notablythe current imbalanced state of feelings classification). [56]. Four To constructintent classes the werenecessary specified, annotated motivated dataset, by the we works employed of [57] INCEpTION and [58] and [59 were], an open-sourcepartially adopted semantic and modifiedannotation to tool fit in which our context: enables automatic text labelling at different levels; word entities, sen- tences ‘Acknowledge’ or larger documents. (ACK), Afterfor generic manual statements, annotation reporting of 100 posts, facts theand tool’s sharing recommender experience subsystem, ‘Advise’ a multi-token(ADV), for sequencesuggestions, classifier recommendations, based on OpenNLP giving NER guidelines (named entityor offering recog- nition)help model, proposed the remaining post labels, which were human-inspected and either ‘Seek’ accepted (SEK), or for corrected. seeking help, Degree advice, of acceptance comments, was or quiteanswers high. Table4 displays the annotation ‘Express’ results (EXP), for for the any three kind hashtag of expression, containing feeling, posts, or with thought, ACK and positive SEK being or negative larger and smaller(hybrid respectivelyintent‐sentiment) for #pfizer and #astrazeneca. The case of #jnj is different as there These were label‐encoded, leading to a multi‐label classification problem, with all the challenges this may entail (most notably imbalanced classification). To construct the nec‐ essary annotated dataset, we employed INCEpTION [59], an open‐source semantic anno‐ tation tool which enables automatic text labelling at different levels; word entities, sen‐ tences or larger documents. After manual annotation of 100 posts, the tool’s recommender subsystem, a multi‐token sequence classifier based on OpenNLP NER (named entity recognition) model, proposed the remaining post labels, which were human‐inspected and either accepted or corrected. Degree of acceptance was quite high. Table 4 displays the annotation results for the three hashtag containing posts, with ACK and SEK being larger and smaller respectively for #pfizer and #astrazeneca. The case of #jnj is different as there were a lot of EXP posts and by manual inspection we did verify that there were not as many COVID‐19‐related posts, but other cosmetic product related ones. As vocabulary is also different, this may have had its impact on the annotation process:

Appl. Sci. 2021, 11, 4281 11 of 18

were a lot of EXP posts and by manual inspection we did verify that there were not as many COVID-19-related posts, but other cosmetic product related ones. As vocabulary is also different, this may have had its impact on the annotation process:

Table 4. Annotation results.

#pfizer #astrazeneca #jnj Totals ACK 155 120 7 282 ADV 81 67 9 157 SEK 6 4 1 11 EXP 97 100 28 225 Totals 339 291 45 675

The 675 annotated English posts were further preprocessed by the following text ‘cleaning’ pipeline steps, where string handling routines and NLTK (Natural Language Tool Kit) were mostly utilized: • Substitute any other, possibly remaining, words containing language characters, ac- cents, etc., with their closest ASCII equivalent, as user posts can be very noisy (e.g., changing the Greek word ‘ελληνικα´ ’ to ‘ellenika’); • Remove URLs in posts, as they are also frequently used, using regular expressions; • Tokenize text and remove punctuation, using NLTK regular expressions tokenizer; • Convert all tokens to lower case, using python’s string method; • Normalize text, using NLTK lemmatization for verbs, nouns and adjectives; • Remove tokens than contain non-alphabetic characters, e.g., numbers, using python’s string method; • Remove English ‘stop words’, words of less importance that appear quite frequently in natural speech, using NLTK; • Remove any remaining non-English, or English un-normalized words (e.g., ‘amigo’, ‘yeaah’, ‘lol’) that may have survived in post, using NTLK corpus. Stemming and lemmatization are the most popular normalization methods. Stemming refers to the process of transforming derivatives words to their root; however, with some undesired effects, e.g., trouble→troubl. Lemmatization refers to the process of grouping different word forms together, so that they can be analyzed under a single ‘lemma’, e.g., ‘better’ and ‘best’ could be lemmatized as ‘good’. The last is an example of adjective lemmatization. Lemmatization can also be applied to nouns (tables→table) and verbs (giving→give). Finally, the most important step of text preparation that was carried out was vectorization, a process that converts words (or tokens) to numerical feature representations. Popular vectorization approaches can utilize different models; Bag-of-Words (BoW) term frequency model, L1-normalized term frequency model, and L2-normalized TF-IDF (Term Frequency—Inverse Document Frequency) model. More recently, models that employ word embeddings are considered; Word2Vec/Doc2Vec (Google), GloVe (Stanford University) and fastText (Facebook). In our case, the corpus of 675 posts was TF-IDF vectorized using scikit-learn feature extractor. An excerpt from the vectorized, sparse dataset is shown in Figure 10. The vocabulary (attributes) consisted of 3024 words: Appl. Sci. 2021, 11, 4281 12 of 18 Appl. Sci. 2021, 11, x FOR PEER REVIEW 12 of 18

FigureFigure 10.10. AnAn excerptexcerpt fromfrom thethe vectorizedvectorized dataset.dataset.

TheThe finalfinal vectorizedvectorized andand annotatedannotated datasetdataset waswas split,split, usingusing 10-fold10‐fold crosscross validation,validation, intointo twotwo subsets subsets for for training training and and testing testing respectively. respectively. These These subsets subsets were were used used to train to ninetrain wekanine weka classifiers classifiers with their with default their default configuration: configuration: one linear one one, linear Logistic one, Logistic Regression Regression (LOG), seven(LOG), non-linear seven non ones,‐linear Naïve ones, Bayes Naïve (NB), Bayes C4.5 (NB), Decision C4.5 Tree Decision (DT), k-NearestTree (DT), Neighbour k‐Nearest (KNN),Neighbour Support (KNN), Vector Support Machine Vector (SVM), Machine Multi-Layer (SVM), Multi Perceptron‐Layer Perceptron (MLP), Bayes (MLP), Net Bayes (BN), LogisticNet (BN), Model Logistic Tree Model (LMT), Tree and one(LMT), ensemble and one classifier, ensemble Random classifier, Forest Random (RF). The Forest objective (RF). hereThe objective was to evaluate here was the to classifiers’ evaluate the performance classifiers’ and performance choose the and best choose suited the algorithm best suited for buildingalgorithm a for predictive building model. a predictive The baseline model. The performance baseline performance was set by using was set the by Zero-Rule using the (ZR)Zero‐ classifierRule (ZR) which classifier simply which predicts simply the predicts dataset the class dataset mode. class In our mode. case In this our was case ‘ACK’ this withwas ‘ACK’ 282 instances with 282 over instances a total over of 675, a total thus of an 675, accuracy thus an of accuracy 41.8%. Resultsof 41.8%. are Results reported are onreported Table5 on. Reducing Table 5. Reducing the vocabulary the vocabulary by keeping by only keeping the, e.g.,only 1000the, e.g., top words,1000 top did words, not improvedid not improve the accuracy. the accuracy. Similarly, Similarly, considering considering bigrams bigrams as well as well unigrams as unigrams led to worse led to results.worse results. However, However, it would it bewould expected be expected to improve to improve with more with data, more i.e., data, more i.e., posts. more Further posts. individualFurther individual parameter parameter tuning for tuning each for algorithm each algorithm might have might also have yielded also betteryielded results. better Differentresults. Different metrics metrics are also are reported also reported (weighted (weighted averages) averages) such as: such Mathews as: Mathews Correlation Corre‐ Coefficientlation Coefficient (MCC), (MCC), Receiver Receiver Operating Operating Characteristic Characteristic (ROC) (ROC) area, F-measure area, F‐measure (harmonic (har‐ precision-recallmonic precision mean),‐recall mean), and Cohen’s and Cohen’s Kappa Kappa metric, metric, with the with last the two last suggested two suggested as more as appropriatemore appropriate for imbalanced for imbalanced classification classification tasks. tasks. Confusion Confusion matrices matrices (Figure (Figure 11) are 11) also are usefulalso useful for showing for showing the numbers the numbers of true of andtrue false and predictionsfalse predictions for each for class.each class.

Table 5. Classifiers' evaluation metrics.

              LOG     NB      DT      KNN     SVM     MLP     RF      BN      LMT
Accuracy (%)  60.4    52.9    62.1    44.0    68.6    51.7    68.0    61.5    61.5
MCC           0.409   0.394   0.422   0.269   0.523   0.349   0.523   0.412   0.414
ROC area      0.777   0.763   0.704   0.618   0.790   0.702   0.854   0.766   0.772
F-measure     0.609   0.572   0.614   0.373   0.677   0.549   0.652   0.576   0.609
Kappa         0.407   0.344   0.419   0.163   0.516   0.310   0.498   0.394   0.412
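The evaluation itself can be scripted in a few lines. The experiments above were run in Weka with default configurations; the sketch below is only a rough scikit-learn analogue (BN and LMT have no direct scikit-learn counterparts, and X, y stand for the TF-IDF matrix and the intent labels from the previous step), so its numbers will not reproduce Table 5 exactly.

```python
from sklearn.dummy import DummyClassifier              # Zero-Rule style baseline
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# X: TF-IDF document-term matrix; y: the four intent labels ('ACK', 'EXP', ...)
classifiers = {
    "ZR (baseline)": DummyClassifier(strategy="most_frequent"),
    "LOG": LogisticRegression(max_iter=1000),
    "NB": MultinomialNB(),
    "DT": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(kernel="linear"),
    "MLP": MLPClassifier(max_iter=500),
    "RF": RandomForestClassifier(),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name:14s} accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```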


Figure 11. Confusion matrices for each classifier: (a) LOG, (b) NB, (c) DT, (d) KNN, (e) SVM, (f) MLP, (g) RF, (h) BN, (i) LMT.

From these initial results we can infer that the SVM and RF classifiers seem more promising and require further investigation. In fact, a simple voting (averaging probability rule) scheme derived from these two classifiers gave a joint accuracy of 69.3%, with improvements in all other metrics as well. Other popular ensemble techniques are bootstrap aggregation (bagging, as in RF), boosting, and stacked generalization. Boosting the SVM or the RF classifier alone did not improve the accuracy, with either 10 or 100 models. Stacking the two classifiers, with logistic regression as the meta-classifier rule, yielded an accuracy of 70.5%.
To validate our initial algorithm ranking, we designed a controlled experiment where the top 5 classifiers (SVM, RF, DT, BN and LMT) were further analyzed. With the experiment, a result dataset of 500 rows was created (5 algorithms × 10-fold cross validation × 10 runs) and statistical tests (corrected paired T-tests) could be carried out on different performance evaluation metrics. The base algorithm was SVM. Reasonable assumptions about Gaussian distributions were made and the significance level was set to 0.05. Figure 12 shows that all four algorithms have worse accuracy than SVM, with results being significant (at the 0.05 confidence level, as denoted by the '*' symbol next to the results) for DT, BN, and LMT. Thus, SVM and RF (with not significantly different results, as denoted by the absence of any symbol) are indeed the best choices, with accuracies as shown and standard deviations of 4.83 and 4.66 respectively. Similar controlled experiments can be set out for individual algorithm parameter tuning.

Figure 12. T-test results (1-SVM, 2-RF, 3-DT, 4-BN, 5-LMT).
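Weka's corrected paired T-test corresponds to the corrected resampled t-test of Nadeau and Bengio. A hand-rolled sketch is given below, assuming acc_svm and acc_dt hold the 100 per-run accuracies (10 repetitions of 10-fold cross validation) of two classifiers; the function name and the approximate fold sizes for the 675-post dataset are illustrative.

```python
import numpy as np
from scipy import stats


def corrected_paired_ttest(scores_a, scores_b, n_train, n_test):
    """Corrected resampled paired t-test (Nadeau & Bengio), as used in Weka's experimenter."""
    d = np.asarray(scores_a) - np.asarray(scores_b)   # per-run accuracy differences
    k = len(d)                                        # e.g., 10 runs x 10 folds = 100
    corrected_var = (1.0 / k + n_test / n_train) * d.var(ddof=1)
    t_stat = d.mean() / np.sqrt(corrected_var)
    p_value = 2 * stats.t.sf(np.abs(t_stat), df=k - 1)
    return t_stat, p_value


# acc_svm, acc_dt: arrays of 100 accuracies; with 10 folds of 675 posts,
# each run has roughly 608 training and 67 test instances
t, p = corrected_paired_ttest(acc_svm, acc_dt, n_train=608, n_test=67)
print(f"t = {t:.3f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```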

Therefore, in a potential, real-world application the finalized model (trained on the entire dataset) could be leveraged to automatically classify new posts as they are uploaded, e.g., in real time or daily, predicting the users' communication intents for a company's monitoring purposes and beneficiary goals.
With sentiment analysis, very frequently apart from opinion mining, the process also extracts the expression attributes, e.g., polarity, subject, and opinion holder. Subjectivity/objectivity classification can also be performed, as well as retrieval of direct/comparative and explicit/implicit opinions. There have also been research efforts to detect affects within text. A list of six basic emotions is given in [60]: happiness, sadness, surprise, anger, disgust, and fear. Ekman later expanded his list [61] to include pride, shame, embarrassment, and excitement, while other researchers have additionally considered trust and anticipation [62], or guilt and shyness [63]. More recently, 27 discrete emotions were identified by self-reporting, in a study by Keltner and Cowen [64].

Obviously, text must be vectorized if any sentiment analysis is to be performed. With Bag-of-Words models, the ordering of words is not considered, thereby producing sparse numerical arrays for word representations. With embedding models, however, the position of words is learned from text, based on surrounding words, and dense vector word projections are obtained. In the context of this work we trained a Convolutional Neural Network (CNN) on the IMDB dataset for 100 epochs, under Keras, which supports embedding layers. The CNN consisted of the following sequential layers and respective hyperparameter settings (a minimal Keras sketch is given below, after Figure 13):
1. Embedding layer where input_dim = 100,000, output_dim = 32, input_length = 1000;
2. Conv1D layer with filters = 32, kernel_size = 3, padding = 'same', activation = 'relu';
3. MaxPooling1D layer with pool_size = 2, strides = 2;
4. Flatten layer where the previous layer's 2d output is flattened to a 1d vector;
5. Dense layer with 500 fully connected units, activation = 'relu';
6. Dense layer with a single output neuron, activation = 'sigmoid'.
The CNN model was configured to utilize logarithmic loss (binary_crossentropy) and the ADAM optimization procedure. There was a total of ~11 M trainable parameters and, without any further hyperparameter tuning, it achieved an accuracy of 86.6%. Deploying the model on the 675 English language posts, we obtained a list of sentiment polarity scores, with values in the interval [0,1] (negative to positive respectively). The overall sentiment was 0.38, neutral to negative, with a rather large standard deviation, however, of 0.45. Results for posts corresponding to the three hashtags are shown in Table 6 and the distributions in Figure 13. Posts made for #pfizer seem to be more positive than those for the other two hashtags, and standard deviations are in all cases large. Distributions are also highly polarized towards the ends of the interval, with many positive and many negative scores. In any case, one should not neglect the fact that users' short posts are very frequently full of peculiarities and cannot be considered as 'proper text'.

Table 6. Mean and standard deviation polarity values.

                     Overall   #pfizer   #astrazeneca   #jnj
mean                 0.38      0.42      0.34           0.39
standard deviation   0.45      0.46      0.44           0.43

Figure 13. Polarity distributions.
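A minimal Keras sketch of the sentiment CNN described above is given below. The layer settings follow the reported hyperparameters, while the batch size, the use of the built-in IMDB loader with padded sequences, and the variable names are our assumptions.

```python
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Conv1D, Dense, Embedding, Flatten, MaxPooling1D
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences

top_words, max_len = 100_000, 1000

# IMDB movie-review polarity dataset, padded/truncated to fixed-length index sequences
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
X_train = pad_sequences(X_train, maxlen=max_len)
X_test = pad_sequences(X_test, maxlen=max_len)

model = Sequential([
    Embedding(input_dim=top_words, output_dim=32, input_length=max_len),
    Conv1D(filters=32, kernel_size=3, padding="same", activation="relu"),
    MaxPooling1D(pool_size=2, strides=2),
    Flatten(),
    Dense(500, activation="relu"),
    Dense(1, activation="sigmoid"),          # polarity score in [0, 1]
])

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()                              # roughly 11 M trainable parameters

model.fit(X_train, y_train, epochs=100, batch_size=128,
          validation_data=(X_test, y_test))

# The trained model can then score new, already cleaned and index-encoded posts
# (encoded_posts is hypothetical here):
# polarity = model.predict(encoded_posts)
```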


4. Discussion

In this paper, we performed an analysis of textual and visual features regarding the Instagram posts of the three vaccine-offering companies, during the onset period of the first vaccinations. Our results can be compared with other studies, e.g., [41], as we too found that the companies post images of people to a large extent and that users highly discuss the arrival of the vaccines and engage in opinion sharing with respect to their success.

Our research has shown that only three companies have an active account on Instagram, i.e., Pfizer, AstraZeneca, and Johnson & Johnson. AstraZeneca has had an Instagram account longer than the other two, and has had the most posts and the highest rate of posting, but the lowest number of followers. By descriptive statistics, we infer that Johnson & Johnson post less frequently than Pfizer and AstraZeneca, but when they do post, their posts are lengthier. Moreover, their posts contain more hashtags and receive more likes and comments, thus having a greater impact on users. This can be attributed to the fact that Johnson & Johnson is a larger company with a wider range of products, e.g., cosmetics. Their posts additionally include more images and videos.

Regarding the images posted, the study has shown that all three companies upload photos of their employees or other persons, with the images classified according to the clothing items or props they contain, e.g., a stethoscope or microscope. Moreover, the image classification outputs from the three models have been organized as a dataset enabling the quantitative and qualitative assessment of results, which is demonstrated and discussed. AstraZeneca mostly post synthetic images, followed by images of humans. The same applies to Pfizer, with results however following a flatter distribution. Johnson & Johnson, on the other hand, prefer posting images of humans, although their number of posts is significantly lower.

With respect to the user posts, these are to a large extent written in English, with Spanish, Portuguese, and Italian following. After filtering out non-English language posts and preprocessing, the automatic annotation process has shown that the 'acknowledge' class is the largest, with 'expression' and 'advice' following and a very small size for the 'seek' class. Thus, users' post intent was mainly devoted to making generic statements, reporting facts, and sharing their experiences, which in this context meant their experiences after vaccination. Users do not seem to be seeking help or advice about COVID-19 or the vaccination process.

For the predictive modelling application, results have shown that the best performing algorithms for intent classification were, equally, Support Vector Machines and Random Forest, significantly better than the rest of the suite of algorithms examined. Finally, polarity analysis on users' posts, leveraging a convolutional neural network, reveals a rather neutral to negative sentiment, with highly polarized user posts' distributions.
Possible future extensions to this work include: investigation of other social media platforms, for example Facebook and Twitter; investigation of other vaccine-producing companies, e.g., Moderna, and augmentation of the dataset with more recent posts; employment of automatic caption generation to retrieve a textual description of the image scene rather than mere object classification; performance of sentiment analysis on other affects as well, rather than just polarity; and parameter tuning to improve model performance, as for all of the algorithms used in this work we mostly opted to stick to the default configurations. Due to the tremendous increase in the data volumes being produced [65–68] and the big data explosion, an investigation on the use of big data from social media platforms could also be of great importance.

Author Contributions: Conceptualization, D.A., I.M. and I.K.; Methodology, D.A., I.M., I.K. and A.S.; Software, D.A.; Validation, D.A., S.M. and I.K.; Formal Analysis, D.A., I.M., I.K. and A.S.; Investigation, D.A. and I.M.; Resources, D.A. and I.M.; Data Curation, D.A.; Writing—Original Draft Preparation, D.A., S.M. and A.S.; Writing—Review & Editing, D.A., S.M. and A.S.; Visualization, D.A.; Supervision, I.M., I.K., S.M. and D.A.; Project Administration, I.M., I.K., S.M. and D.A. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Datasets (raw or processed) and results are available from the authors upon request.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Hays, S.; Page, S.J.; Buhalis, D. Social media as a destination marketing tool: Its use by national tourism organisations. Curr. Issues Tour. 2013, 16, 211–239. [CrossRef]
2. Dwivedi, Y.K.; Ismagilova, E.; Hughes, D.L.; Carlson, J.; Filieri, R.; Jacobson, J.; Jain, V.; Karjaluoto, H.; Kefi, H.; Krishen, A.S.; et al. Setting the future of digital and social media marketing research: Perspectives and research propositions. Int. J. Inf. Manag. 2020, 102168. [CrossRef]
3. Hanika, I.M.; Miranti, A. Social Media and Fake News in 2017 Jakarta Governor Election. In Proceedings of the 5th International Conference on Education & Social Sciences (ICESS), "The Asia Network: Bringing Time, Space and Social Life Together", Semarang, Indonesia, 26–27 July 2017.
4. Srivastava, R.; Chawla, S.; Popat, V. Social Media a New Platform for Mass Marketing. Adv. Innov. Res. 2020, 7, 238.
5. Saunders, J.F.; Eaton, A.A. Snaps, selfies, and shares: How three popular social media platforms contribute to the sociocultural model of disordered eating among young women. Cyberpsychol. Behav. Soc. Netw. 2018, 21, 343–354. [CrossRef]
6. Amanatidis, D.; Mylona, I.; Mamalis, S.; Kamenidou, I.E. Social media for cultural communication: A critical investigation of museums' Instagram practices. J. Tour. Herit. Serv. Mark. JTHSM 2020, 6, 38–44.
7. Most Popular Social Networks Worldwide as of October 2020, Ranked by Number of Active Users. Available online: https://www.statista.com/statistics/272014/ (accessed on 29 January 2021).
8. Hanan, H.; Putit, N. Express marketing of tourism destinations using Instagram in social media networking. Hosp. Tour. 2013, 471. [CrossRef]
9. Liebhart, K.; Bernhardt, P. Political storytelling on Instagram: Key aspects of Alexander Van der Bellen's successful 2016 presidential election campaign. Media Commun. 2017, 5, 15–25. [CrossRef]
10. Zappavigna, M. Social media photography: Construing subjectivity in Instagram images. Vis. Commun. 2016, 15, 271–292. [CrossRef]
11. Trulline, P.; El Karimah, K. Selfie Women's Photo on Instagram (Virtual Etnography Study Post Photos Selfie on Instagram). In Proceedings of the 5th International Conference on Education & Social Sciences (ICESS), "The Asia Network: Bringing Time, Space and Social Life Together", Semarang, Indonesia, 26–27 July 2017.
12. Weilenmann, A.; Hillman, T.; Jungselius, B. Instagram at the museum: Communicating the museum experience through social photo sharing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France, 27 April–2 May 2013; pp. 1843–1852.
13. Tyer, S. Instagram: What Makes You Post? Pepperdine J. Commun. Res. 2016, 4, 14.
14. Anagnostopoulos, C.; Parganas, P.; Chadwick, S.; Fenton, A. Branding in pictures: Using Instagram as a brand management tool in professional team sport organisations. Eur. Sport Manag. Q. 2018, 18, 413–438. [CrossRef]
15. Hunt, D.S.; Lin, C.A.; Atkin, D.J. Communicating social relationships via the use of photo-messaging. J. Broadcasting Electron. Media 2014, 58, 234–252. [CrossRef]
16. Marengo, D.; Longobardi, C.; Fabris, M.A.; Settanni, M. Highly-visual social media and internalizing symptoms in adolescence: The mediating role of body image concerns. Comput. Hum. Behav. 2018, 82, 63–69. [CrossRef]
17. Terttunen, A. The Influence of Instagram on Consumers' Travel Planning and Destination Choice. Bachelor's Thesis, Haaga-Helia University, Helsinki, Finland, 2017.
18. Kusuma, K. Activities of the Cyber Public Relations of O Chanel TV in Promoting their Company on the Instagram Social Media. Am. J. Humanit. Soc. Sci. Res. AJHSSR 2018, 2, 50–56.
19. Hu, Y.; Manikonda, L.; Kambhampati, S. What we instagram: A first analysis of instagram photo content and user types. In Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; Volume 8.
20. Singh, M. Instagram Marketing—The Ultimate Marketing Strategy. Adv. Innov. Res. 2020, 7, 379.
21. Latiff, Z.A.; Safiee, N.A.S. New business set up for branding strategies on social media–Instagram. Procedia Comput. Sci. 2015, 72, 13–23. [CrossRef]
22. Manovich, L. The aesthetic society: Instagram as a life form. In Data Publics: Public Plurality in an Era of Data Determinacy; Mörtenböck, P., Mooshammer, H., Eds.; Routledge: London, UK, 2020.
23. Gaber, H.R.; Wright, L.T.; Kooli, K. Consumer attitudes towards Instagram advertisements in Egypt: The role of the perceived advertising value and personalization. Cogent Bus. Manag. 2019, 6, 1618431. [CrossRef]
24. Fouda, A.; Mahmoudi, N.; Moy, N.; Paolucci, F. The COVID-19 pandemic in Greece, Iceland, New Zealand, and Singapore: Health policies and lessons learned. Health Policy Technol. 2020, 9, 510–524. [CrossRef] [PubMed]
25. Morens, D.M.; Fauci, A.S. Emerging pandemic diseases: How we got to COVID-19. Cell 2020, 182, 1077–1092. [CrossRef]
26. Matarese, A.; Gambardella, J.; Sardu, C.; Santulli, G. miR-98 regulates TMPRSS2 expression in human endothelial cells: Key implications for COVID-19. Biomedicines 2020, 8, 462. [CrossRef]

27. COVID-19 Map. Johns Hopkins Coronavirus Resource Center. Available online: https://coronavirus.jhu.edu/map.html (accessed on 29 January 2021).
28. Kamenidou, I.E.; Stavrianea, A.; Liava, C. Achieving a Covid-19 free country: Citizens preventive measures and communication pathways. Int. J. Environ. Res. Public Health 2020, 17, 4633. [CrossRef] [PubMed]
29. Koirala, A.; Joo, Y.J.; Khatami, A.; Chiu, C.; Britton, P.N. Vaccines for COVID-19: The current state of play. Paediatr. Respir. Rev. 2020, 35, 43–49. [CrossRef] [PubMed]
30. Lerner, A.M.; Folkers, G.K.; Fauci, A.S. Preventing the spread of SARS-CoV-2 with masks and other "low-tech" interventions. JAMA 2020, 324, 1935–1936. [CrossRef] [PubMed]
31. CoVid19.gov.gr. Τα μέτρα της Κυβέρνησης για την αντιμετώπιση του κορονοϊού [The Government's measures against the coronavirus]. Available online: https://covid19.gov.gr/ (accessed on 29 January 2021).
32. O'Callaghan, K.P.; Blatz, A.M.; Offit, P.A. Developing a SARS-CoV-2 vaccine at warp speed. JAMA 2020, 324, 437–438. [CrossRef]
33. Dai, L.; Zheng, T.; Xu, K.; Han, Y.; Xu, L.; Huang, E.; An, Y.; Cheng, Y.; Li, S.; Liu, M.; et al. A universal design of betacoronavirus vaccines against COVID-19, MERS, and SARS. Cell 2020, 182, 722–733. [CrossRef]
34. Corey, L.; Mascola, J.R.; Fauci, A.S.; Collins, F.S. A strategic approach to COVID-19 vaccine R&D. Science 2020, 368, 948–950.
35. Chen, Q.; Min, C.; Zhang, W.; Wang, G.; Ma, X.; Evans, R. Unpacking the black box: How to promote citizen engagement through government social media during the COVID-19 crisis. Comput. Hum. Behav. 2020, 110, 106380. [CrossRef]
36. Saire, J.C.; Panford-Quainoo, K. Twitter Interaction to Analyze Covid-19 Impact in Ghana, Africa from March to July. arXiv 2020, arXiv:2008.12277.
37. González-Padilla, D.A.; Tortolero-Blanco, L. Social media influence in the COVID-19 pandemic. Int. Braz. J. Urol. 2020, 46, 120–124. [CrossRef] [PubMed]
38. Adly, A.S.; Adly, A.S.; Adly, M.S. Approaches based on artificial intelligence and the internet of intelligent things to prevent the spread of COVID-19: Scoping review. J. Med. Internet Res. 2020, 22, e19104. [CrossRef]
39. Kushner, J. The Role of Social Media during a Pandemic. Available online: https://bit.ly/34mOPcK (accessed on 8 May 2021).
40. Malecki, K.; Keating, J.A.; Safdar, N. Crisis communication and public perception of COVID-19 risk in the era of social media. Clin. Infect. Dis. 2020, 72, 697–702. [CrossRef] [PubMed]
41. Kim, Y.; Kim, J.H. Using photos for public health communication: A computational analysis of the Centers for Disease Control and Prevention Instagram photos and public responses. Health Inform. J. 2020, 26, 2159–2180. [CrossRef] [PubMed]
42. Basch, C.H.; MacLean, S.A. A content analysis of HPV related posts on instagram. Hum. Vaccines Immunother. 2019, 15, 1476–1478. [CrossRef] [PubMed]
43. La Gatta, V.; Moscato, V.; Postiglione, M.; Sperli, G. An Epidemiological Neural Network Exploiting Dynamic Graph Structured Data Applied to the COVID-19 Outbreak. In IEEE Transactions on Big Data; IEEE: Piscataway, NJ, USA, 2020.
44. AbdelFattah, M.; Galal, D.; Hassan, N.; Elzanfaly, D.; Tallent, G. A Sentiment Analysis Tool for Determining the Promotional Success of Fashion Images on Instagram. Int. J. Interact. Mob. Technol. 2017, 11, 66–73. [CrossRef]
45. Moscato, V.; Picariello, A.; Sperli, G. An emotional recommender system for music. In IEEE Intelligent Systems; IEEE: Piscataway, NJ, USA, 2020.
46. Information about the Pfizer-BioNTech COVID-19 Vaccine. Available online: https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines/Pfizer-BioNTech.html (accessed on 29 January 2021).
47. AstraZeneca-Research-Based Bio-Pharmaceutical Company. Available online: https://www.astrazeneca.com/ (accessed on 29 January 2021).
48. Arashkia, A.; Jalilvand, S.; Mohajel, N.; Afchangi, A.; Azadmanesh, K.; Salehi-Vaziri, M.; Fazlalipour, M.; Pouriayevali, M.H.; Jalali, T.; Nasab, S.D.M.; et al. Severe acute respiratory syndrome-coronavirus-2 spike (S) protein based vaccine candidates: State of the art and future prospects. Rev. Med. Virol. 2020, e2183. [CrossRef] [PubMed]
49. About Johnson & Johnson. Available online: https://www.jnj.com/about-jnj (accessed on 29 January 2021).
50. How the Johnson & Johnson Vaccine Works. Available online: https://www.nytimes.com/interactive/2020/health/johnson-johnson-covid-19-vaccine.html (accessed on 29 January 2021).
51. COVID-19 Vaccine. Available online: https://en.wikipedia.org/wiki/COVID-19_vaccine (accessed on 29 January 2021).
52. Instaloader. Available online: https://instaloader.github.io/ (accessed on 29 January 2021).
53. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
54. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
55. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
56. Adekotujo, A.S.; Lee, J.; Enikuomehin, A.O.; Mazzara, M.; Aribisala, S.B. Bi-lingual Intent Classification of Twitter Posts: A Roadmap. In International Conference in Software Engineering for Defense Applications; Springer: Cham, Germany, 2018.
57. Purohit, H.; Dong, G.; Shalin, V.; Thirunarayan, K.; Sheth, A. Intent classification of short-text on social media. In Proceedings of the 2015 IEEE International Conference on Smart City/Socialcom/Sustaincom (Smartcity), Chengdu, China, 19–21 December 2015.
58. Saha, T.; Saha, S.; Bhattacharyya, P. Tweet act classification: A deep learning based classifier for recognizing speech acts in twitter. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019.

59. Inception. Available online: https://inception-project.github.io/ (accessed on 29 January 2021).
60. Ekman, P. Are there basic emotions? Psychol. Rev. 1992, 99, 550–553. [CrossRef] [PubMed]
61. Ekman, P. Emotions inside out. 130 years after Darwin's 'The Expression of the Emotions in Man and Animals'. Ann. N. Y. Acad. Sci. 2003, 1000, 1–6. [CrossRef]
62. Plutchik, R. The circumplex as a general model of the structure of emotions and personality. In Circumplex Models of Personality and Emotions; Plutchik, R., Conte, H.R., Eds.; American Psychological Association: Washington, DC, USA, 1997.
63. Izard, C.E.; Libero, D.Z.; Putnam, P.; Haynes, O.M. Stability of emotion experiences and their relations to traits of personality. J. Personal. Soc. Psychol. 1993, 64, 847. [CrossRef]
64. Cowen, A.S.; Keltner, D. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proc. Natl. Acad. Sci. USA 2017, 114, E7900–E7909. [CrossRef]
65. Souravlas, S.; Anastasiadou, S. Pipelined Dynamic Scheduling of Big Data Streams. Appl. Sci. 2020, 10, 4796. [CrossRef]
66. Souravlas, S.; Anastasiadou, S.; Katsavounis, S. More on Pipelined Dynamic Scheduling of Big Data Streams. Appl. Sci. 2021, 11, 61. [CrossRef]
67. Anastasiadou, S.; Kofou, I. Incorporating Web 2.0 Tools into Greek Schools. Int. J. Technol. Learn. 2013, 20, 11–23.
68. Kofou, I.; Anastasiadou, S. Language and Communication Needs Analysis in Intercultural Education. Int. J. Divers. Educ. 2013, 12, 15–64. [CrossRef]