Opinion Exploration of Tweets and Amazon Reviews

INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 9, ISSUE 01, JANUARY 2020 ISSN 2277-8616

S. Uma Maheswari, Dr. S. S. Dhenakaran

Abstract: Social networks play an important role in the growth of technology and generate a huge amount of data every millisecond. Twitter and Facebook are popular social networks on which customers share feelings, opinions, and posts. This work analyzes Tweets and Amazon Reviews that customers write about products sold through online shopping websites. Customer opinions vary from website to website, which makes it difficult for a customer to get an overall picture of the comments on a product. The proposed work addresses this problem by conducting sentiment analysis across different social networks: it gathers Reviews from the E-Commerce website and Tweets from Twitter and analyzes the sentiment expressed about products. Since social networks generate a huge amount of data, big data analytic techniques are used for the sentiment analysis. This work derives useful information from online shopping customers and helps online retailers identify customer buying trends.

Index Terms: Social Network, Opinion Variation, Tweets, Reviews, Decision Support System, Analysis, Big Data

1 INTRODUCTION
Online shopping customers hold different opinions about online products. Customers share their opinions, feelings, and posts on social networks such as Twitter and Facebook, and on E-Commerce websites such as Amazon and Flipkart. These opinions take the form of Reviews, Tweets, likes, ratings, comments, Emoji symbols, and shares. This information is very useful for sentiment analysis of product Reviews, but it is structured, semi-structured, and unstructured, so collecting, processing, and analyzing it in real time is complex. Since social networks deal with large volumes of data, big data techniques are used to manage this problem. This work analyzes Tweets and Reviews on the same product to generate product comments for customer and retailer use. Tweets about the Amazon product "Books" are collected from Twitter in real time and analyzed with the proposed technique. Similarly, Amazon Reviews are collected from the Amazon website in real time and analyzed independently with the proposed method. The Twitter Tweets and Amazon Reviews are also gathered into a single dataset and analyzed using the proposed technique. The performance of the analysis is then evaluated. This work is organized as follows. Section 2 discusses related work on sentiment analysis. Section 3 presents the methodology of the proposed work. Section 4 describes the results of the proposed work. Finally, the conclusion is provided in the last section.

2 RELATED WORKS

2.1 Sentiment Analysis
Benito Alvares et al. [1] have proposed opinion mining for sentiment analysis, applied to movie Reviews. First, the movie Reviews are collected by crawling e-commerce websites and stored in a database. The Reviews are then preprocessed by POS tagging and feature words are extracted. Opinion words are extracted by an algorithm proposed by Hu and Liu in [2]. Finally, a sentiment score is calculated for each sentence based on feature ranking, and the Reviews are summarized by a clustering technique.

2.2 Sentiment Analysis on Twitter Data
Avinash Kumar et al. [3] have proposed "Sentiment Analysis on Twitter Data using a Hybrid Approach". Tweets are collected through the Twitter API and preprocessed: upper case is converted to lower case, and stop words, URLs, user names starting with @, and ReTweets are removed. Feature words are extracted using TF-IDF and Count Vectorization, and the most appropriate features are then selected by the random forest method. Multinomial Naïve Bayes is used to classify sentences as positive, negative, or neutral. Accuracy, Precision, Recall, and F1-Score are used to measure the performance. Their proposed algorithm improves the Accuracy from 58% to 75.5%.

Mahalakshmi R et al. [4] have devised a method for "Social Sentiment Analysis and Data Visualization on Big Data". The Twitter data is captured using the Hadoop Flume tool, stored in HDFS, and analyzed. Features are extracted by removing unnecessary words. Sentiment analysis is done by polarity checking, which is calculated with the help of a MapReduce function. Each Tweet and its polarity status are stored in MongoDB. Finally, the polarity score is visualized using the R tool.

2.3 Predicting Consumer Product Demand on Amazon Data
Alain Yee Loong Chong et al. [5] have proposed "Predicting Consumer Product Demands via Big Data". Promotional marketing data and customers' online Reviews of electronic products are analyzed to predict sales or demand and service. Both are scraped from web pages of the Amazon E-Commerce website. Sentiment analysis is performed for the electronics products: the numbers of positive and negative scores are calculated, and influencing factors are identified from the promotional marketing data and online review data. With these influencing factors, sales or product demand are predicted by a machine learning method.

The works reviewed above establish social network analysis, but they do not combine customer sentiment and opinion across different sites such as a social network (Twitter) and an E-Commerce website (Amazon). The proposed work can therefore provide details of popular products and their quality, and sellers can also learn customers' opinions about their products and expectations.

• S. Uma Maheswari is currently pursuing a Ph.D. in the Department of Computer Science, Alagappa University, India, PH-9385983958. E-mail: [email protected]
• Dr. S. S. Dhenakaran is currently working as a Professor in the Department of Computer Science, Alagappa University, India. E-mail: [email protected]

3 PROPOSED WORK AND METHODOLOGY
In this proposed work, two different datasets are collected: Twitter Tweets about the Amazon product "Books", and data from the Amazon E-Commerce website about the same product. The Twitter API is used to collect real time Tweets about Books, which are stored in HDFS. Dictionaries and word lists are created for sentiment analysis: a positive and negative word dictionary, a slang word dictionary, and an Emoji word dictionary, together with an intensifier word list, a modifier word list, a stop word list, and a negation word list. The collected data are then preprocessed by URL (http://) removal, user name (@) removal, hash tag (#) removal, stop word removal (is, the, etc.), and special symbol (" ", :, ;, %, *, ^, etc.) removal, followed by tokenizing and stemming. POS-Tagging and polarity score calculation are then done for each Tweet and each Amazon Review. Finally, the process generates Tweets and Reviews with their corresponding sentiment scores. The same processes are applied to the combined dataset.

The Proposed Methodology
• Twitter Tweets and Amazon Reviews are collected and preprocessed
• POS-Tagging and polarity scoring are applied to the preprocessed data
• Polarity scores are calculated for each word in Tweets and Reviews
• Tweets and Reviews are classified and a prediction is made for the customer
• Combined Tweets and Reviews are classified and predicted
• Performance is evaluated for each Tweet and Review classification and prediction

Fig. 1. Architecture of the Proposed Work

Attributes of Twitter Tweets
1. Retweet count
2. Likes count
3. Emoji
4. Slang word (OMG)
5. Modifier words (Most)
6. Intensifier words (Verrryyyyy, Fabulous!, EXCELLENT)
7. Conjunction words (while, although, but)
8. Negation words (Not Excellent)
[7], [8], [9], [10].

Attributes of Amazon Reviews
1. Review's Text
2. Overall Rating

Sentiment Analysis and Classification
For sentiment analysis, sentiment words are identified in each Tweet and Review by negation word handling, emoji symbol handling, slang word handling, modifier word handling, intensifier word handling, and conjunction word handling methods [7], [8], [9], [10]. A polarity score is calculated for each sentiment word of a Tweet or Review by the set of rules defined in this work, and the polarity is aggregated into a sentiment score for each Tweet and Review. Tweets and Reviews are classified as positive when the sentiment score is above zero and as negative when it is below zero. The same principle is applied to the combined dataset.

Confusion matrix
The confusion matrix yields the classification report of the algorithm used: Accuracy, Precision, Recall, and F1-Score [6]. Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.

Accuracy
Accuracy is the fraction of all predictions that are correct. It is calculated from the confusion matrix [6].
Accuracy Rate = (TP + TN) / (TP + TN + FP + FN)

Precision
Precision is the fraction of predicted positives that are actually positive. It is calculated by the following formula [6].
Precision = TP / (TP + FP)

Recall
Recall is the fraction of actual positives that are predicted as positive. It is calculated by the following formula [6].
Recall = TP / (TP + FN)

F1-Score
F1-Score is the harmonic mean of Precision and Recall. It is calculated by the following formula [6].
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)

Performance Evaluation
The performance of the sentiment analysis is measured from the confusion matrix using Accuracy, Precision, Recall, and F1-Score. The performance is seen to be better than that of the existing work, with acceptable Accuracy.

4 RESULTS AND DISCUSSIONS
The proposed work is tested on Tweets and Reviews about the Amazon product "Books", considering three datasets: the Twitter Tweets dataset, the Amazon Reviews dataset, and the combined dataset (Tweets and Reviews). The data comprise real time Tweets and existing Amazon Reviews.

4.1 Experiment - I
This experiment is conducted with nearly 1300 Tweets and produces Accuracy 99.50%, Precision 100%, Recall 99.30%, and F1-Score 99.65%. The threshold value is 0.05. Since all performance metrics are above 90%, the sentiment analysis indicates that customer Tweets favor buying the Books product. The percentage of positive scores calculated by the proposed technique is used to recommend the product Books to customers.

Confusion Matrix for Tweets

[283 0]
[5 710]

In Experiment I, the proposed technique produced 988 preprocessed Tweets out of 1300 Tweets. Only these 988 Tweets contain sentiment words; the remaining Tweets do not, so they are eliminated by the preprocessing step of the proposed technique. The proposed Rule Based Sentiment Analysis classified 715 Tweets as positive and 283 Tweets as negative out of the 988 preprocessed Tweets. Figure 3 shows that sentiment analysis and classification on the Tweets dataset yields a higher positive rate than negative rate. Since the positive score is higher, people mostly like the product "Books" according to Twitter Tweets.
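The preprocessing described in Section 3, which also filters out texts without sentiment words, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the stop word list, the regular expressions, and the helper names `preprocess` and `has_sentiment_word` are assumptions, whole hash tags are assumed to be dropped, and stemming and POS-Tagging are omitted for brevity.

```python
import re

# Hypothetical stop word list; the paper's actual list is not given.
STOP_WORDS = {"is", "the", "a", "an", "of", "to", "and"}

URL_RE = re.compile(r"https?://\S+")   # URL (http://) removal
USER_RE = re.compile(r"@\w+")          # user name (@) removal
HASHTAG_RE = re.compile(r"#\w+")       # hash tag (#) removal (assumed: whole tag)
SYMBOL_RE = re.compile(r'["“”:;%*^]')  # special symbols named in the paper


def preprocess(text):
    """Return the cleaned tokens of one Tweet or Review."""
    for pattern in (URL_RE, USER_RE, HASHTAG_RE, SYMBOL_RE):
        text = pattern.sub(" ", text)
    tokens = text.split()  # whitespace tokenizing
    # Case is kept so that intensifiers such as EXCELLENT stay visible.
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def has_sentiment_word(tokens, lexicon):
    """True if at least one token appears in the sentiment lexicon."""
    return any(t.lower() in lexicon for t in tokens)


tokens = preprocess("@amazon The book is GREAT: 100% http://t.co/x #books")
print(tokens)  # ['book', 'GREAT', '100']
```

Texts for which `has_sentiment_word` returns False would be dropped before scoring, which corresponds to the reduction from the collected Tweets to the preprocessed Tweets reported above.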

4.2 Experiment - II
This experiment is conducted with nearly 1100 Amazon Reviews and produces Accuracy 87.34%, Precision 100%, Recall 87.22%, and F1-Score 93.17%. The threshold value is again fixed at 0.05. Judging by the performance metrics, the recommendation for buying the Books product is weaker than the one obtained from Twitter Tweets.
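The rule-based scoring and the threshold of 0.05 used in these experiments can be illustrated with a small sketch. The paper does not list its rules or dictionaries, so everything below is assumed for illustration: the word lists, the weights, the names `sentiment_score` and `classify`, and in particular the way the threshold is applied (a symmetric band around zero, with scores inside the band left unclassified).

```python
# Hypothetical dictionaries; the paper's actual entries and weights are not given.
POLARITY = {"good": 1.0, "excellent": 1.0, "bad": -1.0, "boring": -1.0}
NEGATIONS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "most": 1.5}
THRESHOLD = 0.05  # threshold value reported in the experiments


def sentiment_score(tokens):
    """Aggregate word polarities with simple negation and intensifier rules."""
    score = 0.0
    negate = False
    boost = 1.0
    for token in tokens:
        word = token.lower()
        if word in NEGATIONS:
            negate = True
            continue
        if word in INTENSIFIERS:
            boost *= INTENSIFIERS[word]
            continue
        if word in POLARITY:
            value = POLARITY[word] * boost
            score += -value if negate else value
        # Any other word ends the scope of a pending negation or intensifier.
        negate = False
        boost = 1.0
    return score


def classify(score, threshold=THRESHOLD):
    """Assumed rule: positive at or above +threshold, negative at or below -threshold."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


print(classify(sentiment_score(["not", "excellent"])))      # negative
print(classify(sentiment_score(["very", "good", "book"])))  # positive
```

A text with no dictionary word scores 0.0 and falls inside the band; in the paper such texts are removed during preprocessing rather than classified.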

Confusion Matrix for Reviews
[10 0]
[132 901]

TABLE 1
EXPERIMENTAL RESULT ON TWITTER TWEETS
Accuracy (%)   Precision (%)   Recall (%)   F1-Score (%)
99.50          100             99.30        99.65

TABLE 2
EXPERIMENTAL RESULT ON AMAZON REVIEWS
Accuracy (%)   Precision (%)   Recall (%)   F1-Score (%)
87.34          100             87.22        93.17

TABLE 3
EXPERIMENTAL RESULT ON TWEETS + REVIEWS
Accuracy (%)   Precision (%)   Recall (%)   F1-Score (%)
93.28          100             92.16        95.92
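As a consistency check, the reported metrics can be recomputed from a confusion matrix. The sketch below uses the Reviews matrix from Experiment II, [10 0] [132 901], and assumes it is laid out as [[TN, FP], [FN, TP]]; that layout is an assumption, chosen because it reproduces the values reported for Amazon Reviews. The function name `metrics` is illustrative.

```python
def metrics(tn, fp, fn, tp):
    """Return (accuracy, precision, recall, f1) from confusion matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Reviews confusion matrix, read as [[TN, FP], [FN, TP]] (assumed layout).
reviews = metrics(tn=10, fp=0, fn=132, tp=901)
print([round(100 * m, 2) for m in reviews])  # [87.34, 100.0, 87.22, 93.17]
```

Accuracy counts both correct classes (TP + TN) in the numerator, which is what makes the result agree with the 87.34% reported for Reviews.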

In Experiment II, the proposed technique produced 1033 preprocessed Reviews out of 1100 Reviews. Only these 1033 Reviews contain sentiment words; the remaining Reviews do not, so they are eliminated by the preprocessing step of the proposed technique. The proposed Rule Based Sentiment Analysis classified 1033 Reviews as positive and 10 Reviews as negative out of preprocessed 1100 Reviews. Figure 5 shows that sentiment analysis and classification on the Reviews dataset yields a higher positive rate than negative rate. Since the positive score is higher, people mostly like the Amazon product "Books".

4.3 Experiment - III
This experiment is conducted on nearly 2200 data items obtained by combining Tweets and Reviews of the Amazon Books product. Accuracy 93.28%, Precision 100%, Recall 92.16%, and F1-Score 95.92% are calculated with threshold 0.05. This result shows that the proposed work produces a better result when customer comments on the Amazon Books product are combined.

Confusion Matrix for Tweets + Reviews
[294 0]
[138 1622]

In Experiment III, the proposed technique produced 2125 preprocessed (Reviews + Tweets) out of 2400 Reviews. Only these 2125 (Reviews + Tweets) contain sentiment words; the remaining ones do not, so they are eliminated by the preprocessing step of the proposed technique. The proposed Rule Based Sentiment Analysis classified 2125 (Reviews + Tweets) as positive and 10 Reviews as negative out of preprocessed 2400 Reviews. Figure 6 shows that sentiment analysis and classification on the Tweets + Reviews dataset yields a higher positive rate than negative rate, so people mostly like the product "Books".

4.4 Experiment - IV
This experiment applies existing machine learning techniques: Multinomial Naïve Bayes, Support Vector Machine, Decision Tree, Logistic Regression, and Random Forest. The result shows that the proposed work produces the best result: compared with the existing machine learning techniques, only the proposed technique produces the best positive sentiment score percentage. In Experiment IV, the existing machine learning techniques all have the same positive sentiment score percentage, 79% of 2125 Tweets + Reviews, whereas the proposed Rule Based Sentiment Analysis has a positive sentiment score of 86% of 2055. Since the positive score percentage is above 80%, the product "Books" is recommended to customers. The proposed technique therefore performs best compared with the existing techniques.

TABLE 4
EXPERIMENTAL RESULT ON MACHINE LEARNING TECHNIQUES
Classification Techniques                              Positive   Negative   Total   Positive Score %
Multinomial Naïve Bayes                                1687       438        2125    79%
Support Vector Machine                                 1687       438        2125    79%
Decision Tree                                          1687       438        2125    79%
Logistic Regression                                    1687       438        2125    79%
Random Forest                                          1687       438        2125    79%
Rule Based Sentiment Analysis (Proposed Technique)     1761       294        2055    86%

5 CONCLUSION
In this proposed work, customer opinions about the Amazon product Books are collected from Twitter Tweets and Amazon product Reviews. The variation across Tweets and Reviews is analyzed, each item is classified as positive or negative, and sentiment is predicted using the proposed technique. Polarity scores are calculated for each word based on the defined set of rules, and a sentiment score is calculated for each Tweet and Review. Sentiment is classified and predicted based on the calculated threshold value. From the results it can be concluded that the proposed work performs well and produces the best Accuracy. The results also indicate that the Amazon product "Books" has a higher positive opinion rate than negative opinion rate.

ACKNOWLEDGMENT
This research work, "Opinion Exploration of Tweets and Amazon Reviews", has been written with the financial support of RUSA - Phase 2.0 grant sanctioned vide Letter No. F. 24-51 / 2014-U, Policy (TNMulti-Gen), Dept. of Edn., Govt. of India, Dt. 09.10.2018.

REFERENCES
[1] Benito Alvares, Nishant Thakur, Siddhi Patil, "Sentiment Analysis Using Opinion Mining", International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 5 Issue 04, April 2016.
[2] Hu and Liu, "Mining and Summarizing Customer Reviews", 2004. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.2378&rep=rep1&type=pdf
[3] Avinash Kumar, Savita Sharma, Dinesh Singh, "Sentiment Analysis on Twitter Data using a Hybrid Approach", International Journal of Computer Sciences and Engineering, Vol. 7, Issue 5, May 2019, E-ISSN: 2347-2693.
[4] Mahalakshmi R, Suseela S, "Big-SoSA: Social Sentiment Analysis and Data Visualization on Big Data", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 4, Issue 4, April 2015.
[5] Alain Yee Loong Chong, Eugene Ch'ng, Martin J. Liu, & Boying Li, "Predicting Consumer Product demands via Big Data: the roles of online promotional marketing and online Reviews", International Journal of Production Research, 2015.
[6] S. Uma Maheswari, S. S. Dhenakaran, "Sentiment Analysis on Social Media Big Data With Multiple Tweet Words", International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075 (Online), Volume 8, Issue 10, August 2019, pp. 3429-3434.
[7] Mr. Amritkumar Tupsoundarya, Prof. Padma S. Dandannavar, "Sentiment Expression via Emoticons on Social Media: Twitter", International Journal for Research in Applied Science & Engineering Technology (IJRASET), ISSN: 2321-9653, Volume 6, Issue VI, June 2018.
[8] Wareesa Sharif, Noor Azah Samsudin, Mustafa Mat Deris, Rashid Naseem, Muhammad Faheem Mushtaq, "Effect of Negation in Sentiment Analysis", International Journal of Computational Linguistics Research, Volume 8, Number 2, June 2017.
[9] Liang Wu, Fred Morstatter, Huan Liu, "SlangSD: Building and Using a Sentiment Dictionary of Slang Words for Short-Text Sentiment Classification", arXiv:1608.05129v1 [cs.CL], 17 Aug 2016.
[10] Rizal Setya Perdana and Aryo Pinandito, Journal of Telecommunication, Electronic and Computer Engineering, e-ISSN: 2289-8131, Vol. 10 No. 1-8, April 2018.