XLIV ENCONTRO DA ANPAD - EnANPAD 2020 - Online event - October 14 to 16, 2020 - 2177-2576 online version

Hype, Something in Between or Nevermind: How Sentiment Analysis Can Capture Consumers’ Reactions about Different Types of Products

Author: GRAZIELE CAMARGO KEMMERICH - Prog de Pós-Grad em Admin/Esc de Admin - PPGA/EA/UFRGS - Universidade Federal do Rio Grande do Sul

Abstract Since its origin in Fracastoro's contagion theory (LEDERBERG, 2000), the term contagion has expanded to explain attitudes and behaviors. In consumer behavior, current studies on contagion have been concerned with understanding its association with product adoption (ARAL et al., 2009; CENTOLA; MACY, 2007; MANCHANDA et al., 2008; IYENGAR et al., 2011). However, two issues have not yet been explored in depth: the contagion potential of new products at different levels of market maturity, and consumers' immediate reactions to products that are not yet available on the market. This article analyzed three products (Samsung Ballie, PlayStation 5 and the AMD Ryzen processor) in terms of sentiment analysis and their most shared tweets. The method used the Naive Bayes algorithm to classify the sentiment contained in each tweet. The results showed which tweets were most shared and what polarity they had (positive, negative or neutral). However, the results also showed certain inconsistencies, which suggests that Naive Bayes has limitations in analyzing the overall context of a sentence.

Keywords: contagion; sentiment analysis; natural language processing; Twitter; Naive Bayes.

1. Introduction

New products are launched on the market every year. Some of them spark strong consumer interest. Others are poorly received and may even be rejected by the public. There are also products whose reception falls somewhere in between, arousing little curiosity or interest among consumers. At a time when social relations and people's interaction with the world are strongly associated with social media, these consumer reactions can be observed simply by following what people post on their online profiles. Given so many emotional reactions from consumers on the internet, how can these reactions be identified more objectively? And which types of emotional reactions have greater contagion power?

The term "contagion" goes back to the first studies that investigated the transmission of infectious diseases (LEDERBERG, 2000). Despite its origins in health, the concept has been applied across different areas of knowledge and has become an important source of explanation for events, attitudes and behaviors in contexts such as economics, sociology (LOCHER, 2002) and psychology (HATFIELD et al., 1992; ROZIN; ROYZMAN, 2001; DOHERTY, 1998). In consumer behavior, current studies on contagion have been concerned with understanding its association with product adoption (ARAL et al., 2009; CENTOLA; MACY, 2007; MANCHANDA et al., 2008; IYENGAR et al., 2011). Despite the advances in research on the subject, the contagion potential of new products at different levels of market maturity has not yet been explored in depth. Likewise, little is known about consumers' instant reactions to products that have been announced but are not yet available on the market.
Therefore, this article has two main objectives: 1) to analyze consumers' reactions (through sentiment analysis) to three technological products at different levels of market maturity; and 2) to identify which of these products has the greatest potential for contagion, taking into account the number of retweets and the polarity of the sentiment expressed.

2. Contagion and Product Adoption

The term "contagion" originated in medicine, but its application to consumption, especially to explain attitudes and behaviors related to product adoption, has attracted growing interest from researchers. Aral and Walker (2012) found that demographic variables, such as age and gender, affect both the influence a person exerts and the susceptibility to that influence. The authors found that young people are more influential, and that women have a greater power of influence than men. In addition, the authors found that the social structure to which people belong and their number of connections are largely responsible for the dissemination of products. Hu and Van den Bulte (2014) analyzed how social status and susceptibility to social contagion affect product adoption. When the innovation in a product has the potential to boost a person's social status, that is, to move the person up the social hierarchy, there is a greater propensity to be influenced by contagion and, consequently, to adopt the product. Contagion also manifests itself through the social connections in online communities. Park and colleagues (2018) demonstrated that the characteristics of this type of social structure, coupled with the type of product and the user profile, influence the buying behavior of young people who belong to these communities.

Regarding product adoption, studies of social contagion in the medical context of drug prescription have made important contributions to the theme. Manchanda and colleagues (2008) investigated the effect of marketing communication and interpersonal communication on the adoption of pharmaceutical products.
Using behavioral data and analyzing factors such as the time to adoption of the drug since its launch, average monthly samples and the average number of prescriptions in the product category per month, the authors verified the existence of contagion effects among doctors from different cities. Beyond medicines, the adoption of other types of products also benefits from social contagion. Bollinger and Gillingham (2012) observed that the social interaction caused by proximity between neighbors favored the diffusion of eco-friendly technologies, such as solar panels.

Years later, Iyengar and colleagues (2015) studied contagion from the perspective of trial and repeat use of drugs. The findings showed that peers exert different levels of influence on trial and on repeat use of the product, and that susceptibility to social influence also differs. The physicians with the greatest power to encourage other doctors to try the product are those who are a reference in their professional circle, who dominate discussions and who have a high volume of prescriptions. The doctors closest to the reference physician in the social circle tend to be more influential in both trial and repeat use. In terms of susceptibility to influence, in relation to trial, the most influential physicians are those who do not consider themselves opinion leaders, whereas physicians located in the central part of the social structure tend to be more influential in repeat use of the product (IYENGAR et al., 2015). It is important to emphasize that, unlike other studies, these authors included the self-confidence of the recipient of the influence and confidence in one's own judgment as variables capable of affecting social contagion.
The role of opinion leadership in the spread of social contagion is an important point to consider in this last perspective. Some consumers exert a disproportionate influence on the purchasing decisions of other consumers, which may be due to their position in the social circle (people in central positions tend to be more influential) or to their high level of expertise about the product (IYENGAR et al., 2011). Iyengar and colleagues (2011) comment that this greater power of influence turns these consumers into "seeds" in the process of social contagion, because they are capable of viralization, that is, they can spread the information they receive much faster and more intensely.

2. Natural Language Processing (NLP)

According to Humphreys and Wang (2017), the data available on online platforms, especially textual data, are already being analyzed with computational resources in order to compare and correlate elements and to identify patterns in information that humans usually cannot identify because of natural limitations. Machine learning techniques are examples of this type of computational resource. Machine learning is a subfield of artificial intelligence and can be defined as a system that is trained to recognize and perform tasks automatically after being presented with numerous examples relevant to the task (CHOLLET; ALLAIRE, 2017). Advances in the availability and analysis of data, as well as the existence of open-source platforms, are some of the factors that have boosted the rapid diffusion of machine learning in recent years (ATHEY, 2017). These techniques have been increasingly used to address issues related to resource optimization and to the prediction of economic and social events. Cederman and Weidmann (2017) cited the use of machine learning to try to predict the imminence of armed conflicts that threaten internal stability or state sovereignty. Clauset et al. (2017), in turn, commented that machine learning has been applied to predict the process of scientific discovery, using variables such as the researcher's academic trajectory and the number of citations of past works. Kennedy and colleagues (2017) indicate that computational resources help to predict the outcomes of specific elections. The large amount of available data, an important condition for applying computational techniques such as machine learning, finds fertile ground in social media.
Kern and colleagues (2016) comment that information obtained by processing the language used on online platforms can reveal thoughts, feelings and characteristics of individuals and communities, broadening the understanding of people's affective, cognitive and behavioral processes. Computational linguistics, also known as Natural Language Processing (NLP), is therefore the appropriate resource for this kind of analysis. NLP is an application of machine learning focused on a specific type of problem: the learning, understanding and creation of human language content (HIRSCHBERG; MANNING, 2016). NLP techniques, whose origin dates back to Cold War machine translation systems, allow the extraction of information, inference and summarization of large amounts of text through their words, sentences and excerpts of speech (HIRSCHBERG; MANNING, 2016; BIRD et al., 2009). Computational linguistics approaches encompass both simple techniques that analyze words in isolation from their context (called "bag of words") and complex techniques such as those used to detect fake news in online content (CONROY et al., 2015). In the latter case, signals of veracity are analyzed by comparing how compatible the message is with the content profile. In other words, the techniques check for contradictions and omissions of facts present in profiles of similar topics (CONROY et al., 2015).

According to Hirschberg and Manning (2016), computational linguistic systems have three main purposes: 1) to aid human-human communication; 2) to aid human-machine communication; and 3) to benefit both humans and machines through the learning of available linguistic content.

Regarding human-human communication, machine translation is one of the first examples of a non-numerical application of computers (HIRSCHBERG; MANNING, 2016). These authors point out that translation is more satisfactory and correct (i.e. closer to natural language) if the computational system can analyze, generate and understand textual content within a context. In other words, faced with the complexity and ambiguity of specific words and expressions, the NLP technique must recognize and/or generate their correct meaning.

The second main purpose of NLP concerns so-called spoken dialogue systems and conversational agents. In this case, computational linguistic techniques are employed so that machines can interact directly with humans. Large technology companies such as Google, Apple and Amazon have assistants that are activated by voice command and can answer questions, make recommendations and perform simple tasks, such as creating shopping lists. This type of NLP application also includes robots that help and perform tasks for individuals who need special care, such as elderly people and people with health problems (HIRSCHBERG; MANNING, 2016).

Finally, the third purpose brings benefits to both humans and machines because it makes use of learning from and analyzing the large amount of human language content available online (HIRSCHBERG; MANNING, 2016). Through social media, it is possible to obtain a wide variety of individual and collective information about users, such as social interactions, feelings, opinions and beliefs about events, people or products, and it is even possible to predict the spread of diseases through the links mentioned on content platforms such as Twitter (HIRSCHBERG; MANNING, 2016; KERN et al., 2016).
Regarding the speaker, it is possible to identify individual and personality characteristics (e.g. age, gender, likeability, motivations, interests), aspects related to cognitive ability, possible disorders and health problems (e.g. autism, Parkinson's disease), and even changes in the speaker's state due to drunkenness or drowsiness (HIRSCHBERG; MANNING, 2016). In addition, the researchers point out that so-called "private states" make it possible to extract opinions, speculations, beliefs and emotions from the content of a textual message, with emphasis on sentiment analysis. According to Villaroel and colleagues (2017), sentiment analysis has already been applied to online reviews by counting the frequency of emotions expressed in the text and by categorizing their valence (positive, negative and neutral).
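
The "bag of words" idea mentioned in this section can be illustrated with a minimal sketch. The function name and the tokenization rule below are assumptions made for illustration only; they are not taken from any library or from the code used in this study. The sketch shows that two sentences with opposite meanings can produce the same bag, because word order is discarded.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, ignoring case and word order (illustrative tokenizer)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


first = bag_of_words("My dog would destroy Ballie")
second = bag_of_words("Ballie would destroy my dog")

# Both sentences contain exactly the same words, so their bags are identical
# even though the meaning is reversed.
print(first == second)
print(first)
```

A representation of this kind is simple and fast, but it gives a classifier no information about who does what to whom in the sentence.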

3. Method

In order to analyze consumers' perceptions of the announcement of new products, and to verify from the reactions collected which of these products would indicate a greater propensity for contagion, a context that favored these characteristics was chosen. The Consumer Electronics Show (CES) was selected because it is a highly important technology fair, responsible for announcing the main product launches of this kind. For the purposes of this article, it was important that the event chosen as the context for data collection be very recent, so that consumers' reactions would be more spontaneous. The data collection period also depended on the period of the event. CES took place from January 6th to 10th, 2020, in Las Vegas; however, the first days of the event were open only to the press and guests. The event's exhibition space opened to the general public on January 7; therefore, the data was collected after the end of the event, to increase the chance that more people had visited the fair and, as a result, that more people had posted their reactions to the products they saw there. Twitter was the social network chosen for data collection.

Regarding the products chosen for the analysis of consumer reactions, we sought to select products with different degrees of maturity, also as a way of enabling later comparisons, in other studies, of the sentiments posted. Newspaper reports from the period before and after the event supported the choice of products. This newspaper was chosen because it is a media vehicle that conveys its message to a wide variety of audiences and, consequently, to consumers with different levels of admiration, curiosity and interest in technology. Three types of products were chosen, all new, but with different levels of market maturity. The context of this technology fair was important and favorable for this selection, making it more certain that the tweets collected would be limited to this period. The products were: Sony's PlayStation 5 console, the AMD Ryzen processor and Samsung's Ballie. This choice took into account the different degrees of maturity of each product on the market. The PlayStation 5 is the evolution of an existing and consolidated product; the AMD Ryzen processor is the evolution of a product that already exists on the market, but in a market dominated by a competing company (Intel); and, finally, Samsung's Ballie is a totally innovative product that was unexpected at the event. Each of these products was searched for in the search field of the Twitter platform so that the author could learn how consumers mentioned it. From the most common way the product was referred to, the hashtags to be placed in the code were defined.
At first, 10,000 tweets were requested through R, starting with the search for the hashtag of Samsung's Ballie. The requested total was not reached, and only 7,670 tweets were returned. To keep the amount of data balanced, the search for the other products was limited to the same number of tweets. The amount of data that could be collected on the products was the criterion for choosing the algorithm. The algorithm used for the sentiment analysis was a machine learning algorithm called Naive Bayes. Natural language processing methods for text analysis include procedures such as bag of words, support vector machines and deep learning. Neural networks, which characterize deep learning methods, are more efficient for text analysis. However, the author chose to apply a classical machine learning algorithm, as this type of algorithm works very well for a not too large amount of data. As previously mentioned, fewer than 10,000 tweets were collected for each selected product (7,670 tweets), an amount that perhaps would not justify the use of more robust methods such as those based on deep learning.
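
To make the choice of algorithm more concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the implementation used in this study: the function names and the four training sentences are hypothetical and serve only to show how the classifier combines a class prior with per-word likelihoods while treating words as independent of one another.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data, invented for illustration only.
TOY_EXAMPLES = [
    ("love this new console", "positive"),
    ("great processor great value", "positive"),
    ("this robot is creepy", "negative"),
    ("my dog would destroy it", "negative"),
]


def train_naive_bayes(examples):
    """Return class priors, per-class word counts and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    total = sum(label_counts.values())
    priors = {label: n / total for label, n in label_counts.items()}
    return priors, word_counts, vocab


def classify(text, priors, word_counts, vocab):
    """Pick the label with the highest log posterior (words outside the vocabulary are skipped)."""
    scores = {}
    for label, prior in priors.items():
        total_words = sum(word_counts[label].values())
        score = math.log(prior)
        for word in text.lower().split():
            if word not in vocab:
                continue
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TOY_EXAMPLES)
print(classify("great console", *model))
print(classify("creepy robot", *model))
```

Because each word contributes to the score independently, reordering the words of a sentence does not change the result, which is one reason why this family of classifiers can miss irony or context.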

3.1 Algorithm Application

As for the steps of applying the code to carry out the sentiment analysis, the data was first collected through the Twitter platform. Twitter allows anyone who already has a user account to access part of its data (tweets) by registering as a developer. In this process, the platform issues access keys for the API. At this stage, in addition to the total number of tweets to be collected, the language and the keyword (in this case, the hashtag) are defined. Since the access key is private, the article's code does not show the characters of the key. However, the raw database collected by the author will be available for consultation.

The library used for processing textual data is TextBlob. TextBlob uses a dataset of movie reviews (IMDB) in which reviews have previously been categorized as positive or negative. This form of sentiment classification comprises a polarity level ranging from -1 to 1. In addition to this library, the NLTK corpora were installed, which work as a compilation of a large amount of structured textual data. After these installations, a preliminary data preparation was carried out using the clean_tweet function. This step is important because, before the categorization itself, the sentences must be cleaned of expressions that are not relevant for the analysis, such as punctuation, pronouns and other characters. The sentiment contained in each tweet was classified through TextBlob according to polarity: when the polarity is greater than zero, the tweet is classified as positive; when the polarity is equal to zero, it is classified as neutral; otherwise it is classified as negative. This polarity criterion is characteristic of the Naive Bayes algorithm, since this is how the classifier works.
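
The cleaning and thresholding steps described above can be sketched as follows. This is an assumption-laden illustration, not the author's code: the regular expression in `clean_tweet` is a hypothetical cleaning rule (it strips mentions, URLs and punctuation, but not pronouns), and `label_from_polarity` and `tweet_sentiment` are names introduced here. The sketch also assumes that the score is read from `TextBlob(text).sentiment.polarity`, which is what TextBlob returns with its default analyzer; the source does not state how the analyzer was configured.

```python
import re


def clean_tweet(text):
    """Remove mentions, URLs and non-alphanumeric characters (assumed cleaning rule)."""
    cleaned = re.sub(r"(@[A-Za-z0-9_]+)|(\w+://\S+)|([^0-9A-Za-z \t])", " ", text)
    return " ".join(cleaned.split())


def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to the three classes used in the study."""
    if polarity > 0:
        return "positive"
    if polarity == 0:
        return "neutral"
    return "negative"


def tweet_sentiment(text):
    """Classify one tweet; requires the third-party textblob package."""
    from textblob import TextBlob  # imported here so the helpers above work without it

    return label_from_polarity(TextBlob(clean_tweet(text)).sentiment.polarity)


print(clean_tweet("My dog would destroy Ballie in 10 seconds https://t.co/m5kHFh3ahh"))
print(label_from_polarity(0.48), label_from_polarity(0.0), label_from_polarity(-0.06))
```

Note that with this rule any tweet whose words carry no scored sentiment receives a polarity of exactly zero and is therefore labeled neutral.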

4. Results

Applying the machine learning algorithm specified in the method made it possible to identify the public's first general perceptions of the selected products. For the AMD Ryzen processor, the number of tweets categorized as positive (4228 tweets - 55%) was much higher than the number categorized as negative (550 tweets - 7%) and neutral (2892 tweets - 37%). For Samsung's Ballie, although positive tweets formed the largest group (2690 tweets - 35%), this value was relatively close to the number of negative tweets (2573 tweets - 33%) and neutral tweets (2407 tweets - 31%). As for the PlayStation 5 console announcement, the percentage of tweets categorized as neutral was higher than the percentages of positive and negative tweets, amounting to more than half of the tweets collected. This product had 4215 neutral tweets (54.95%), followed by 2463 positive tweets (32.11%) and 992 negative tweets (12.93%).

To investigate the tweets further, a new database was created containing only the original tweets. In the first application of the method, the database also included retweets among the sentences to be analyzed. This time, only sentences from original tweets were processed. After applying this filter, the products showed a different distribution of sentiment polarity compared to the first round of text processing. The new configuration of the polarity of the tweets can be seen in the table below:

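| Product | Polarity | Percentage | Total |
|---|---|---|---|
| AMD Ryzen | Positive | 42.88% | 1608 |
| AMD Ryzen | Negative | 11.25% | 422 |
| AMD Ryzen | Neutral | 45.87% | 1720 |
| Samsung BALLIE | Positive | 39.89% | 1241 |
| Samsung BALLIE | Negative | 11.70% | 364 |
| Samsung BALLIE | Neutral | 48.41% | 1506 |
| Sony Playstation 5 | Positive | 41.80% | 1297 |
| Sony Playstation 5 | Negative | 16.60% | 515 |
| Sony Playstation 5 | Neutral | 41.60% | 1291 |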
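
The percentages in the table above follow directly from the tweet counts. As a minimal sketch, the share of each polarity can be recomputed as the class count divided by the number of original tweets for that product; the function and variable names are introduced here for illustration only, while the counts are those reported in the table.

```python
def polarity_shares(counts):
    """Return each class count as a percentage of the product's total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts of original tweets (retweets excluded), as reported in the table.
ORIGINAL_TWEETS = {
    "AMD Ryzen": {"positive": 1608, "negative": 422, "neutral": 1720},
    "Samsung Ballie": {"positive": 1241, "negative": 364, "neutral": 1506},
    "Sony Playstation 5": {"positive": 1297, "negative": 515, "neutral": 1291},
}

for product, counts in ORIGINAL_TWEETS.items():
    print(product, sum(counts.values()), polarity_shares(counts))
```

The totals per product differ (3750, 3111 and 3103 original tweets), which is why percentages rather than raw counts are the appropriate basis for comparing products.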

For the AMD Ryzen processor, the number of tweets categorized as positive (1608 tweets - 42.88%), despite being higher than the number categorized as negative (422 tweets - 11.25%), was lower than the number categorized as neutral (1720 tweets - 45.87%). For Samsung's Ballie, the difference compared to the first results is noticeable. This time, the number of neutral tweets (1506 tweets - 48.41%) is higher than the number of positive tweets (1241 tweets - 39.89%), and considerably higher than the number of negative tweets (364 tweets - 11.70%). For the PlayStation 5 console, the number of tweets categorized as positive (1297 tweets - 41.80%) was close to the number classified as neutral (1291 tweets - 41.60%), and both were higher than the number considered negative (515 tweets - 16.60%). Such results can be better visualized in the graph below:

An analysis was also carried out considering the retweets about the products. To better visualize the data, a table was created for each product showing the polarity classification performed by the algorithm (Sentiment by Naive Bayes), the polarity score assigned by the algorithm (Polarity), a classification made empirically by an individual (Sentiment by human analysis), and the number of retweets. For the AMD Ryzen processor, the most retweeted tweets have positive polarity, followed by tweets with neutral polarity. The most shared tweet (378 retweets) has positive polarity. However, some sentences were read again and categorized by an individual according to a subjective criterion (human analysis), with results that differed from those of the algorithm. The sentences classified by the algorithm as neutral were considered positive by the human criterion.

Table 2: AMD Ryzen

| # | Tweet | Sentiment (by Naive Bayes) | Polarity (by Naive Bayes) | Sentiment (by human analysis) | Retweets |
|---|---|---|---|---|---|
| 1 | Im happy that i've joined @lnaticgg! As a welcome present we give away a Gaming PC Ryzen 7 2700X - RTX 2080 Super S… https://t.co/7TUZnprxZQ | positive | 0.48 | positive | 378 |
| 2 | The AMD Ryzen 4000 Series mobile processor is @PCMag's Best PC Core Component at #CES2020! With ultra-responsive pe… https://t.co/c3DBqJsyyl | positive | 1.0 | positive | 62 |
| 3 | Intel Dual Xeon 8280 56c / 112t - $20,000 VS AMD Ryzen 3990X 64c / 128t - $3,990 CES has a lot of information th… https://t.co/VYQ8V2yp5o | neutral | 0.0 | positive | 45 |
| 4 | As seen in Club @AMD at #CES2020, @CORSAIR’s Vengeance gaming desktop Powered by Ryzen 7 3700X and @Radeon RX 5700… https://t.co/baa7fmOvLk | neutral | 0.0 | positive | 171 |
| 5 | AMD Ryzen Threadripper 3990X “sets the stage for a whole new level of both Performance and value,” says… https://t.co/xAwo1AYR4S | positive | 0.17 | positive | 158 |
| 6 | Calling it “a whole different beast entirely,” @DigitalTrends has some mighty Predictions for AMD's Ryzen Threadrip… https://t.co/qVmquTB5pt | positive | 0.15 | positive | 71 |
| 7 | Yeah AMD is going to kick butt this year once again. Intel Xeon Platinum 9282 56 Cores, 112 Threads 2.6 GHz3.8 GHz… https://t.co/0dgHMfW5NH | neutral | 0.0 | positive | 152 |
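
The comparison between the algorithm's labels and the human labels in Table 2 can be summarized as a simple agreement rate. The sketch below uses the labels and retweet counts reported in the table; the function names are hypothetical, and the agreement rate is an illustrative summary that was not computed in the original analysis.

```python
def agreement_rate(rows):
    """Fraction of tweets where the algorithm's label equals the human label."""
    matches = sum(1 for algorithm, human, _ in rows if algorithm == human)
    return matches / len(rows)


def disagreements(rows):
    """Return (row number, algorithm label, human label) for each mismatch."""
    return [
        (index, algorithm, human)
        for index, (algorithm, human, _) in enumerate(rows, start=1)
        if algorithm != human
    ]


# (label by Naive Bayes, label by human analysis, retweets), rows 1 to 7 of Table 2.
AMD_ROWS = [
    ("positive", "positive", 378),
    ("positive", "positive", 62),
    ("neutral", "positive", 45),
    ("neutral", "positive", 171),
    ("positive", "positive", 158),
    ("positive", "positive", 71),
    ("neutral", "positive", 152),
]

print(f"agreement: {agreement_rate(AMD_ROWS):.2f}")
print(disagreements(AMD_ROWS))
```

For these seven tweets, every mismatch is a tweet with polarity 0.0 that the human reader considered positive, which is the pattern described in the text.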

For Samsung's Ballie, most of the highly retweeted tweets have positive polarity. Unlike AMD Ryzen, however, the most shared tweet (2296 retweets) is of negative polarity according to the sentiment classification performed by the algorithm, whereas the classification made by a human categorized it as positive. Other inconsistencies between the algorithm's polarity classification and the human classification were also found in other highly shared tweets, as can be seen in the table below:

Table 3: Samsung Ballie

| # | Tweet | Sentiment (by Naive Bayes) | Polarity (by Naive Bayes) | Sentiment (by human analysis) | Retweets |
|---|---|---|---|---|---|
| 1 | Many people don't know that the cute little ball Ballie released by Samsung At CES2020 and the kitchen smart robot… https://t.co/uPxwV4jMT1 | positive | 0.26 | positive | 132 |
| 2 | Proud of my team. Kudos Think Tank Team for making Ballie roll this perfect. https://t.co/N5PgYfAg68 | positive | 0.9 | positive | 60 |
| 3 | Samsung introduced a new ball-shaped rolling robot assistant named Ballie. I’m really gonna need it after I break my hip tripping on it. | positive | 0.17 | negative | 89 |
| 4 | Is it me? The classical music used to promote #ballie completely invokes 2001: A Space Odyssey for me. You know,… https://t.co/aCJ4e6M4KX | positive | 0.05 | negative | 42 |
| 5 | Samsung's new Ballie robot is like your own BB-8, if BB-8 just followed you around Your house taking pictures. (via… https://t.co/B9XR5m9iem | positive | 0.37 | positive | 56 |
| 6 | Lmao my dog is gonna FUCK Ballie up tho https://t.co/HmgrG7FCzi | positive | 0.1 | negative | 64 |
| 7 | Dibs on the script for ‘Trust me, I’m Ballie’ where this thing gaslights you and Locks all the doors in your home.… https://t.co/HaAJf5oJFm | neutral | 0 | negative | 88 |
| 8 | My dog would destroy Ballie in 10 seconds https://t.co/m5kHFh3ahh | negative | -0.2 | negative | 98 |
| 9 | She said bye to Ballie but not to the dog smh https://t.co/vxfDFmJXxY | neutral | 0 | negative | 71 |
| 10 | @SamsungNewsUS BB-8 + Wall-E = Ballie ? #CES2020 https://t.co/PwJBDGyN37 | neutral | 0 | positive | 88 |
| 11 | Samsung's new robot personal assistant, a tennis ball-like device named Ballie, Is like a real-life BB-8, the lovab… https://t.co/7a2dRxeLnP | positive | 0.11 | positive | 64 |
| 12 | Samsung presented its "droid" Ballie - a ball with a camera that will monitor Security and control a smart home?… https://t.co/6rhS0x8QgJ | positive | 0.21 | neutral | 96 |
| 13 | Samsung’s new Ballie robot is like a real-life mini BB-8 https://t.co/Kd1dzuuIGz | positive | 0.17 | positive | 239 |
| 14 | Meet #Ballie, Samsung’s human-centric vision of robots that takes Personalized care to the next level. The small ro… https://t.co/MEjEjWoHFU | negative | -0.06 | positive | 2296 |

For the PlayStation 5, the most shared tweet about the product (301 retweets) was categorized by the algorithm as neutral. In fact, most of the most shared tweets were categorized as neutral, followed by positive and negative tweets. As with the other products, the results showed inconsistencies between the sentiment classifications made by the algorithm and those made by a human.

Table 4: SONY PS5 – Playstation 5

| # | Tweet | Sentiment (by Naive Bayes) | Polarity (by Naive Bayes) | Sentiment (by human analysis) | Retweets |
|---|---|---|---|---|---|
| 1 | Fan-made PS5 startup screen, courtesy of YouTube user 'Paulo Manso Animation' ? What are your thoughts? #PS5… https://t.co/Hw8kDSsWso | neutral | 0 | neutral | 100 |
| 2 | PS5 will also play games, Sony reports https://t.co/y6sSOvunPF | neutral | 0 | negative | 192 |
| 3 | So when MS sad they won't have exclusives for XSX PS5 fans laughed. Although anybody with half a brain knew this wo… https://t.co/D7ha19P8Sa | positive | 0.01 | neutral | 78 |
| 4 | There Will Be Exclusives You Can Only Play on PS5 https://t.co/rHXcDEkBMJ #Sony #PS5 https://t.co/od20Zn703S | neutral | 0 | neutral | 118 |
| 5 | Thats how #playstation gamers getting ready for #PS5 :3 https://t.co/pjMENtm2Sn | positive | 0.2 | positive | 87 |
| 6 | Ps5 will reportedly act like a normal new gen. https://t.co/DnhFrOvEp9 | positive | 0.14 | positive | 79 |
| 7 | @MiddleEastEye @guardian PS5/ For those confused by the difference between An anonymous "rumor" and a single-source… https://t.co/AKZbtO6oS0 | negative | -0.24 | neutral | 102 |
| 8 | To be clear, Playstation will only release Playstation 5 games after the launch Of Ghost Of Tsushima on PS4. Third… https://t.co/bkG4KBpOY0 | positive | 0.03 | neutral | 105 |
| 9 | PS5 Will Reportedly Have PS5-Exclusive Titles at Launch https://t.co/IQp9OhYDaU https://t.co/gnOjFo8viw | neutral | 0 | neutral | 301 |
| 10 | Jason Schreier (Kotaku Splitscreen) : "Sony will have PS5 only titles at launch" https://t.co/DLSgLn6PQN | neutral | 0 | neutral | 74 |
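The inconsistency between the algorithm and the human reading can be quantified directly from the label pairs in Tables 3 and 4. The sketch below counts matching labels and, as one possible chance-corrected measure, computes Cohen's kappa. The label pairs are transcribed from the tables; the helper names and the choice of kappa are assumptions of this illustration, not part of the study's method.

```python
from collections import Counter

# (algorithm label, human label) pairs transcribed from Table 3 (Ballie).
BALLIE = [
    ("positive", "positive"), ("positive", "positive"), ("positive", "negative"),
    ("positive", "negative"), ("positive", "positive"), ("positive", "negative"),
    ("neutral", "negative"), ("negative", "negative"), ("neutral", "negative"),
    ("neutral", "positive"), ("positive", "positive"), ("positive", "neutral"),
    ("positive", "positive"), ("negative", "positive"),
]

# (algorithm label, human label) pairs transcribed from Table 4 (PS5).
PS5 = [
    ("neutral", "neutral"), ("neutral", "negative"), ("positive", "neutral"),
    ("neutral", "neutral"), ("positive", "positive"), ("positive", "positive"),
    ("negative", "neutral"), ("positive", "neutral"), ("neutral", "neutral"),
    ("neutral", "neutral"),
]


def agreement(pairs):
    """Return (number of pairs with identical labels, total number of pairs)."""
    matches = sum(1 for algo, human in pairs if algo == human)
    return matches, len(pairs)


def cohen_kappa(pairs):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(pairs)
    observed = agreement(pairs)[0] / n
    algo_counts = Counter(algo for algo, _ in pairs)
    human_counts = Counter(human for _, human in pairs)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(
        algo_counts[label] * human_counts[label] for label in algo_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


for name, pairs in (("Ballie", BALLIE), ("PS5", PS5)):
    matches, total = agreement(pairs)
    print(f"{name}: {matches}/{total} labels agree, kappa = {cohen_kappa(pairs):.2f}")
```

With so few tweets per product, these figures are only indicative, and they describe the most shared tweets rather than the full dataset.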

5. General Discussion

The results brought important contributions. First, the creation of a new database containing only the original tweets revealed new perspectives on consumers' reactions to the products. For Ballie, for example, it showed a totally different picture compared to the first results. Whereas the first results were not very conclusive, because the percentages of the polarities were very close, after removing retweets from the analysis it was possible to identify that the innovative product pleased consumers. Playstation 5, on the other hand, had very close shares of neutral and positive reactions, which may indicate that the announcement of the product's features did not arouse people's curiosity very much. Although the number of tweets was reduced, excluding retweets from the analysis yielded more truthful reactions to the selected products.

Regarding the analysis of the most shared tweets, it was important to identify the sentiment polarity of the tweets for each product. In addition, it was important to note that the algorithm did not classify certain sentences that were highly shared on social media in the same way as a human. This demonstrates a certain weakness of the method in interpreting sentences in a more global way.
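The step of keeping only original tweets can be sketched as a simple filter. The sketch assumes that retweets are recognizable by the classic "RT @user:" prefix in the tweet text. The study does not say how retweets were identified, so the rule, the function names and the sample tweets below are all hypothetical.

```python
def is_retweet(text: str) -> bool:
    """Assumed convention: classic retweets begin with 'RT @username:'."""
    return text.lstrip().startswith("RT @")


def keep_original_tweets(texts):
    """Return only the tweets that are not retweets, preserving their order."""
    return [text for text in texts if not is_retweet(text)]


# Hypothetical sample, invented for illustration only (not the study's data).
sample = [
    "Ballie looks fun",
    "RT @someone: Ballie looks fun",
    "Waiting for the PS5 reveal",
]

originals = keep_original_tweets(sample)
print(f"kept {len(originals)} of {len(sample)} tweets")
```

Filtering this way counts each opinion once, so a heavily retweeted message no longer dominates the polarity percentages.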

6. Final Considerations

Applying Natural Language Processing, more specifically sentiment analysis, to people's reactions on social media about products that are new to the market and not yet available for purchase was an important contribution in the context of social contagion. However, like any study, this research also had limitations that may be addressed in future studies.

An important limitation, seen after analyzing the results, concerns the algorithm employed. After the creation of the new database containing only the original tweets, it was possible to identify that Naive Bayes is not a very reliable classifier with respect to the context of the sentence. The individualized analysis of some tweets showed that this classifier is unable to properly interpret the context of some situations, categorizing as negative something that would be positive. For example, the tweet about the AMD Ryzen processor “Its just sad I can build a Ryzen PC that outperforms the Mac Pro in those apps at a quarter of the price” is labelled as negative, but this tweet is positive for AMD Ryzen.

A limited understanding of data dynamics and of access to social media platforms was also a limitation of the study. It was not possible to identify, for example, whether the number of tweets initially intended to be downloaded was not reached because of platform restrictions or because there really were not 10,000 tweets about the selected product. Another limitation related to tweets is that it is not possible to ensure that they are real; that is, there is a possibility that some tweets were generated by bots. Future studies could apply specific machine learning models to detect this type of anomaly, filtering the data before analyzing it. Despite this concern, it is believed that the possibility is small, because this type of manifestation is more common in internet publications intended to manipulate public opinion on controversial subjects, such as political themes, which is not the case in this article.

Another interesting aspect that could serve as a theme for future research is the role of neutrality in the context of contagion. The results showed a significant percentage of neutral comments in the evaluation of some products. What would be the impact of this type of consumer reaction on the potential for contagion? Do these comments tend to become favorable product evaluations? Future studies will be able to investigate this aspect better.

In managerial terms, the analysis of sentiment about newly announced products can contribute to a better definition of communication strategy for companies, especially technology companies. Depending on the stage of maturity of the type of product on the market, generating buzz or creating high expectations before releasing the product may be unfavorable. In this sense, the use of computational resources such as NLP would make the process of checking consumer reactions faster and more efficient, and the evaluation of posts would be carried out in a more objective way compared to a human analysis.
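One plausible reason for the misclassification of the “Its just sad...” tweet discussed above is that a bag-of-words Naive Bayes model scores each word independently of its context. The sketch below is a minimal multinomial Naive Bayes with add-one smoothing, trained on a tiny hypothetical corpus. It is not the classifier or the training data used in the study; every training phrase is invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training set (NOT the study's data).
TRAIN = [
    ("great fast processor love it", "positive"),
    ("amazing price and performance", "positive"),
    ("sad slow and broken", "negative"),
    ("so sad it crashed again", "negative"),
]


def train(examples):
    """Count words per label, documents per label, and collect the vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability under the model."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothed word likelihood.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
tweet = ("Its just sad I can build a Ryzen PC that outperforms "
         "the Mac Pro in those apps at a quarter of the price")
print(classify(tweet, model))
```

In this toy setting the only words the model recognizes in the tweet are "sad" and "price", and "sad" outweighs "price" because it occurs more often in the negative examples. The model never registers that the sadness is directed at the competing product rather than at AMD Ryzen.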


References

ARGO, Jennifer J.; DAHL, Darren W.; MORALES, Andrea C. Positive consumer contagion: Responses to attractive others in a retail context. Journal of Marketing Research, v. 45, n. 6, p. 690-701, 2008.

ARAL, Sinan; WALKER, Dylan. Identifying influential and susceptible members of social networks. Science, v. 337, n. 6092, p. 337-341, 2012.

ATHEY, Susan. Beyond prediction: Using big data for policy problems. Science, v. 355, n. 6324, p. 483-485, 2017.

BERGER, Jonah; MILKMAN, Katherine L. What makes online content viral? Journal of Marketing Research, v. 49, n. 2, p. 192-205, 2012.

BIRD, Steven; KLEIN, Ewan; LOPER, Edward. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc., 2009.

BERGER, Jonah; SCHWARTZ, Eric M. What drives immediate and ongoing word of mouth? Journal of Marketing Research, v. 48, n. 5, p. 869-880, 2011.

BOLLINGER, Bryan; GILLINGHAM, Kenneth. Peer effects in the diffusion of solar photovoltaic panels. Marketing Science, v. 31, n. 6, p. 900-912, 2012.

CEDERMAN, Lars-Erik; WEIDMANN, Nils B. Predicting armed conflict: Time to adjust our expectations? Science, v. 355, n. 6324, p. 474-476, 2017.

CLAUSET, Aaron; LARREMORE, Daniel B.; SINATRA, Roberta. Data-driven predictions in the science of science. Science, v. 355, n. 6324, p. 477-480, 2017.

CONROY, Niall J.; RUBIN, Victoria L.; CHEN, Yimin. Automatic deception detection: Methods for finding fake news. Proceedings of the Association for Information Science and Technology, v. 52, n. 1, p. 1-4, 2015.

DU, Rex Yuxing; KAMAKURA, Wagner A. Measuring contagion in the diffusion of consumer packaged goods. Journal of Marketing Research, v. 48, n. 1, p. 28-47, 2011.

GINO, Francesca; AYAL, Shahar; ARIELY, Dan. Contagion and differentiation in unethical behavior: The effect of one bad apple on the barrel. Psychological Science, v. 20, n. 3, p. 393-398, 2009.

HATFIELD, Elaine; CACIOPPO, John T.; RAPSON, Richard L. Primitive emotional contagion. Review of Personality and Social Psychology, v. 14, p. 151-177, 1992.

HINZ, Oliver et al. Seeding strategies for viral marketing: An empirical comparison. Journal of Marketing, v. 75, n. 6, p. 55-71, 2011.

HIRSCHBERG, Julia; MANNING, Christopher D. Advances in natural language processing. Science, v. 349, n. 6245, p. 261-266, 2015.

HODAS, Nathan O.; LERMAN, Kristina. The simple rules of social contagion. Scientific Reports, v. 4, p. 4343, 2014.

HOWARD, Daniel J.; GENGLER, Charles. Emotional contagion effects on product attitudes. Journal of Consumer Research, v. 28, n. 2, p. 189-201, 2001.

HU, Yansong; VAN DEN BULTE, Christophe. Nonmonotonic status effects in new product adoption. Marketing Science, v. 33, n. 4, p. 509-533, 2014.

HUMPHREYS, Ashlee; WANG, Rebecca Jen-Hui. Automated text analysis for consumer research. Journal of Consumer Research, v. 44, n. 6, p. 1274-1306, 2017.

IYENGAR, Raghuram; VAN DEN BULTE, Christophe; VALENTE, Thomas W. Opinion leadership and social contagion in new product diffusion. Marketing Science, v. 30, n. 2, p. 195-212, 2011.

IYENGAR, Raghuram; VAN DEN BULTE, Christophe; LEE, Jae Young. Social contagion in new product trial and repeat. Marketing Science, v. 34, n. 3, p. 408-429, 2015.

MANCHANDA, Puneet; XIE, Ying; YOUN, Nara. The role of targeted communication and contagion in product adoption. Marketing Science, v. 27, n. 6, p. 961-976, 2008.

PARK, Eunho et al. Social dollars in online communities: The effect of product, user, and network characteristics. Journal of Marketing, v. 82, n. 1, p. 93-114, 2018.

SMALL, Deborah A.; VERROCHI, Nicole M. The face of need: Facial emotion expression on charity advertisements. Journal of Marketing Research, v. 46, n. 6, p. 777-787, 2009.

LE BON, Gustave. The crowd. Routledge, 2017.

LEDERBERG, Joshua. Infectious history. Science, v. 288, n. 5464, p. 287-293, 2000.

LEHMANN, Donald R. Section I: How do customers and consumers really behave? Journal of Marketing, v. 63, p. 14-18, 1999.

LOCHER, David A. Collective behavior. Upper Saddle River, NJ: Prentice Hall, 2002.

NEWMAN, Matthew L. et al. Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, v. 29, n. 5, p. 665-675, 2003.

VILLARROEL ORDENES, Francisco et al. Unveiling what is written in the stars: Analyzing explicit, implicit, and discourse patterns of sentiment in social media. Journal of Consumer Research, v. 43, n. 6, p. 875-894, 2017.

ZHANG, Jurui; LIU, Yong; CHEN, Yubo. Social learning in networks of friends versus strangers. Marketing Science, v. 34, n. 4, p. 573-589, 2015.
