International Journal of Pure and Applied Mathematics Volume 119 No. 15 2018, 235-241 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ Special Issue http://www.acadpubl.eu/hub/

A SVM Based Sentiment Analysis Method (SBSAM) for Unigram and Tweets

S. Geetha
Assistant Professor, Department of Computer Science
Muthurangam Govt. Arts College (Autonomous), Vellore - 632001
Email: [email protected]

Dr. R. Kaniezhil
Principal
MIT College of Arts & Science for Women, Musiri - 621211
Email: [email protected]

Abstract: Social media sites are places where citizens voice their opinions without fear. There is a growing sense of urgency to understand public opinion because of the viral nature of social media, and making sense of these mass conversations in order to interact meaningfully is in demand. Sentiment analysis (SA) is the study in which sentiments are computed to reach a conclusion. SA can be applied anywhere, as public opinion on a variety of subjects can be assessed; stock market fluctuations, for example, are partly driven by public opinion. SA can help optimize promotional strategies: tactically, it can help fit marketing campaigns to target audiences, and since the success of a campaign also lies in positive discussions amongst customers, sentiment analysis plays a major role there as well. Moreover, the volume of digital information on the Internet has increased the time users need to reach items of interest. This voluminous information has to be filtered, prioritized and delivered to users to satisfy their search requirements for recommendations, and SA can support quality refinements and further research. This paper underlines the need for sentiment analysis, and for recommender systems based on sentiment analysis, for users. Further, it proposes an SVM-based Sentiment Analysis Method (SBSAM) for unigram and bigram tweets. SBSAM aims to perform sentiment analysis quickly for recommender systems that are useful to end users.

I. INTRODUCTION

Social media sites are places where citizens voice their opinions without fear. There is a growing sense of urgency to understand citizens' opinions because of the viral nature of social media, and making sense of these mass conversations in order to interact meaningfully is increasingly in demand. Sentiment analysis is the study in which sentiments are computed to reach a conclusion. The entity a sentiment refers to can be classified as a product, person, event or concept. Analysis of sentiments is a dynamic and challenging task [1]. This has created a demand for information filtering systems (IFS) that deal with information overload (recommender systems) [2]. Information retrieval systems have been able to support such analysis by prioritizing or personalizing interesting information found on the internet, but they are scarce. These systems, also called recommender systems, filter vital fragments of information from dynamically generated internet content according to users' preferences, interests or behaviors [3]. Recommender systems benefit service providers and users alike and can predict users' preferences for items based on their profiles [4]. They help improve qualitative decision-making processes [5]. Millions of users share their opinions on
various topics on social networking sites. The 140-character limit on messages in these micro-blogging sites forces users to be concise and expressive in their comments or opinions. Twitter, a social media site, has over two hundred million users, more than half of whom are active and log on daily, generating more than 200 million tweets per day [6]. Public sentiment is reflected in tweets, which can be analyzed. These expressed sentiments are vital to firms seeking responses to their products, to politicians predicting election results and to investors predicting stock prices.

Earlier studies classified sentiments as positive or negative using unigrams, hand-crafted features and a tree kernel; unigrams were combined with more than a hundred features in a tree representation [7]. In [8], tweets were first automatically separated into objective and subjective messages, and in a second step the polarity of the subjective messages was determined. The linguistic features of sentiment were studied in [9], which evaluated the usefulness of lexical resources for the informal and creative text of tweets. Parts of speech were found to be less useful for analyzing micro-blogging tweets [10]. Those authors extracted features from lexicons along with micro-blogging features and automatically classified Twitter messages as positive or negative. Their framework had two distinct components, classifiers and feature extractors, and achieved higher accuracy.

Thus, sentiment analysis of user tweets can help in many areas, yet manually extracting such useful information from this voluminous data is almost impossible. Sentiments can be categorized as positive, negative or neutral, which helps determine the attitude of the general public on a topic. This paper aims to detect sentiments from tweets as accurately as possible. The technique has three main parts: the tweet text is first preprocessed, the required features are then extracted, and finally the machine learning algorithm SVM is used for classification [11]. Thus, this paper proposes a novel sentiment analysis method, SBSAM, based on an SVM classifier, for accurate and automatic sentiment analysis of tweets.

A. Benefits of Sentiment Analysis

Sentiment analysis can be used by celebrities, organizations, governments and individuals to gauge public opinions voiced on social media. Some of the major benefits of sentiment analysis are detailed below.

Fine-tune marketing strategies: Many organizations actively use social media to promote their products. Even companies that post randomly start to plan strategically for promoting their products based on public feedback. Managerial viewpoints can be based on analyzing what customers chat and post about, together with full brand information, to gauge how customers perceive the product or brand. Social media is thus a place for optimizing marketing strategies. By listening to customers' feelings, high-level decisions can be adjusted to meet customer needs. Tactical marketing can be effected by building short-term marketing campaigns for customer requirements. By continuously having sentiment analysis in place, campaigns can be adjusted to better fit the target audiences.

Measuring ROI of Campaigns: The success of a marketing campaign may be measured by the increase in the number of followers, comments or likes. The true success, however, lies in positive discussions amongst customers about the campaign. Sentiment analysis can relate and count the positive or negative discussions that have occurred amongst audiences. By combining quantitative and qualitative measurements, the true ROI of marketing campaigns can be measured.

Developing product quality: Sentiment analysis helps complete market research by assessing customer opinions on products and services and on ways to align them with customer expectations. Products are judged by their presentation, such as pricing and package design. Ideas on developing product quality and presentation can be derived directly from customer opinions. Sentiment analysis of opinions is an alternative to structured and planned surveys. It can also be used to address customer complaints.

Improving customer service: Customers who buy products try to stay loyal to the brand as long as possible and influence others positively towards its products. Hence, it is important to have the best customer service in place to keep current customers happy. On-time delivery, responses on social media and reimbursement for defective products are some of the cues of good customer service. Sentiment analysis can identify negative discussions and thus act as an alert for improvements. Faster responses draw customers' attention and eventually earn their satisfaction. Sentiment analysis is part of a social complaint listener that keeps customers from feeling ignored and angry; Finnair responding to customers on Twitter is one example.

Crisis management: Constant monitoring of social media conversations also helps minimize the damage of a crisis arising from online communications. A product's bad quality, environmental worries or bad customer service can create a catastrophe in emerging markets. Sentiment analysis can detect
such manifestations early in customer discussions and thus help manage a crisis.

Lead generation: Accurate sentiment analysis can result in better marketing campaigns, and together with good customer service and quality it can produce new leads. Happy customers act as brand ambassadors and bring in new customers. Customers' needs and views can also pave the way for better selling content that attracts new customers.

Sales Revenue: The biggest benefit of sentiment analysis lies in boosting sales revenue. Positive discussions about a product boost its sales and thus sales revenue, while negative discussions reduce sales.

B. Challenges in Sentiment Analysis

The explosion of social media and of comments publicly voiced on it has created great scope for research in natural language processing and sentiment analysis [12]. Organizations use systems to interpret customer conversations as feedback [13]. Sentiments in public tweets can be analyzed at multiple levels, such as features, entities, sentences and documents [14], so text needs to be separated by entity before its sentiment is analyzed. Analyzing sentiments in tweets is challenging because of conventions built into Twitter communication, such as hashtags. Sentiments are subjective, but in some cases they are expressed objectively, which makes the expressed sentiment hard to understand. For example, the word "unpredictable" can be positive when describing movies but may denote a negative sentiment when describing vehicle driving. Another challenge is detecting sarcasm, where negative opinions are expressed using positive words. Simple English words like "no", "not" and "never", when combined with positive words, are difficult to identify as expressing negative sentiment. Further, research in sentiment analysis is scarce because of the restrictions and security policies on accessing social media data. A few of the challenges faced in sentiment analysis are detailed below.

Dynamic nature of Tweets: The main challenge is coping with the dynamic nature of tweets during classification. Sentiments expressed may soon lose scope and context, making the comment irrelevant. Sentiments expressed implicitly, such as political comments, are difficult to trace [15] [16]. Most machine learning approaches assume that training and test data are identically distributed, which may not hold in real-world environments. SVM is a robust model for managing such dynamic changes. Annotating tweets is an added challenge, as tweets can be short, satirical or vague.

Candidate dependency: Most sentiment analysis tools are object-independent, so when tweets contain names they are prone to poor results, and the names are often misclassified.

Identifying the user's preferences: Sentiment analysis in the political arena can forecast election outcomes. Categorizing users into groups such as left, right and independent, and gaining information about users, helps improve social-media-based predictions, but this is not easy [17].

Hashtags: Sentiment analysis on hashtags and emoticons can identify emotions for monitoring [18]. People use a plethora of hashtags in their tweets, and because of the dynamic nature of tweets, the quality, quantity and freshness of labeled data play a vital role in creating a robust classifier. Using popular hashtags for automatic labeling leads to incorrectly labeled instances, which are difficult to interpret.

Links: Tweet classifiers rely on tweet contents and may ignore the content that URLs point to. Most tweets contain URLs, which are crucial for completing the sentiment analysis of a tweet, yet there is very limited work in this area [19].

Sarcasm: Sarcasm is a major challenge because of its hidden meanings. Although neural networks have been used for sarcasm detection, it is still an open research issue, and more research is needed to overcome this challenge [20].

II. RECOMMENDER SYSTEMS

System-based recommendations are strategic decisions taken for users in complex information environments like social media [21]. They were defined for the preferences of e-commerce users. Recommender systems handle information overload to provide personalized services to users and can be content-based, collaborative, or a hybrid of collaborative and content filtering [22]. An online filtering system, Ringo, uses music album ratings to build user profiles [23]. Amazon's algorithms improve recommendations with collaborative filtering that builds a table of similar items from an item-to-item matrix and then recommends similar online products; the user's purchase history is also considered [24]. Content-based filtering techniques base their predictions only on the user's own information [25].

III. SBSAM

Twitter user content is voluminous and worthy of research in sentiment analysis [26], a complex task that involves machine learning, language processing and mining. One basic part of preprocessing tweets is part-of-speech (POS) tagging, a basic form of syntactic analysis with countless applications in natural language processing. Twitter
data poses additional problems, since its conversational text differs from standard text: it lacks conventional orthography and each tweet is limited to 140 characters. Tweets include hashtags, unusual capitalization, lengthened words, URLs, exclamation marks, question marks, internet emoticons and slang. SBSAM processes the tweet text for classification in four phases: tokenization, preprocessing, feature extraction and classification. Figure 1 depicts the flow of SBSAM.

Fig. 1 – Flow of SBSAM

Tokenization: In this phase a stream of text is broken up into words, symbols and other meaningful elements (tokens). Tokens are separated by whitespace and/or punctuation characters; they are the individual components that make up a tweet. This study used ARK Social Media Search's Twitter POS Tagger, which identifies the part of speech of each word in a sentence as noun, pronoun, adverb, adjective, verb, interjection, etc. Twitter-based features are more informal and include hashtags, retweets, word capitalization, word lengthening, etc. Figure 2 depicts a POS Tagger.

Pre-processing: In the preprocessing phase SBSAM removes all non-English tweets. Emoticons are replaced by their polarity. Word polarity can be positive, like "sweet", or negative, like "evil". Unsupervised approaches use publicly available online dictionaries for word polarity; alternatively, a custom polarity dictionary is built and the tweet words are checked against it. Polarity can also be judged by finding a middle ground between these two approaches. URLs and web-reference tokens (http and @), hashtags and numbers are removed, as only the text of the tweet has to be analyzed. Negative mentions and repeated characters are then replaced. Finally, nouns and prepositions are removed from the text stream before further processing. The overall polarity of a tweet is the sum of the individual polarities.

Feature Extraction: This phase is the basis for classifying tweets. The sentiment orientation of each word in a phrase or sentence is found and then combined to determine the sentiment of the phrase or sentence. SBSA calculates a positive, negative or neutral polarity score for each tweet based on pre-defined or compiled word lists and a sentiment function. This is also called opinion mining, that is, mining people's views on a subject, item or topic. Each word is given a score of -1 if negative, 0 if neutral and +1 if positive. Words are rated from -5 to +5, where Very Negative is -5 or -4, Negative is -3 to -1, Positive is 1 to 3 and Very Positive is 5. SBSA assesses polarity in POS tags, emoticons, words and hashtags. Because these scores focus on individual words, they may sometimes not correspond to the overall context of a tweet. In spite of this hurdle, feature extractors help in classifying the processed tweets. Figure 3 depicts the sentiment analysis of tweets.

Fig. 3 – Flow of Tweet Sentiment Analysis
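The preprocessing and lexicon-scoring steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the SBSAM implementation: the lexicon and emoticon scores are hypothetical placeholders, tokenization uses regular expressions rather than the ARK Twitter POS Tagger (so noun and preposition removal is omitted), and "replacing negative mentions" is interpreted here as flipping the polarity of the word that follows no, not or never.

```python
import re

# Hypothetical polarity lexicon on the -5..+5 scale (illustrative values only).
LEXICON = {"sweet": 2, "good": 3, "love": 3, "great": 3,
           "evil": -3, "bad": -3, "hate": -3, "awful": -3}

# Hypothetical emoticon polarities; each emoticon is replaced by its score.
EMOTICONS = {":-)": 2, ":)": 2, ":D": 2, ":-(": -2, ":(": -2}

NEGATIONS = {"no", "not", "never"}


def preprocess(tweet):
    """Clean one tweet and return (tokens, emoticon polarity)."""
    tweet = re.sub(r"https?://\S+", " ", tweet)     # drop URLs
    tweet = re.sub(r"[@#]\w+", " ", tweet)           # drop @mentions and hashtags
    emoticon_score = 0
    for emoticon, polarity in EMOTICONS.items():
        emoticon_score += tweet.count(emoticon) * polarity
        tweet = tweet.replace(emoticon, " ")
    tweet = re.sub(r"(\w)\1{2,}", r"\1\1", tweet)    # "sooooo" -> "soo"
    tokens = re.findall(r"[a-z]+", tweet.lower())    # letters only: numbers and punctuation go
    return tokens, emoticon_score


def tweet_polarity(tweet):
    """Overall polarity = sum of word and emoticon polarities.

    A negation word flips the polarity of the word right after it
    (an assumed reading of "negative mentions are replaced").
    """
    tokens, total = preprocess(tweet)
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        total += -polarity if negate else polarity
        negate = False
    return total


def label(score):
    """Map an overall score to a sentiment class by its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["I love this phone :) http://t.co/x #deal",
             "This is not good @bob",
             "Sooooo sweeeet!!!"]:
    score = tweet_polarity(text)
    print(score, label(score))
# 5 positive
# -3 negative
# 2 positive
```

Summing word and emoticon scores follows the rule that a tweet's overall polarity is the sum of its individual polarities; the sign of that sum then gives the positive, negative or neutral class.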

Fig. 2 – POS Tagger

Classification: Although trained classifiers simplify predictions on unknown data, feature extraction plays an important part before classification. The prior polarity of words and phrases is used for sentiment classification. A support vector machine (SVM) operates in an input space, analyzing the data with a kernel after defining decision boundaries [27]. The input data are two sets of vectors of size m each, and every data point, represented as a vector, is classified into one of the two classes. The margin is the distance between the decision boundary and the nearest documents of each class. The SVM chooses the boundary that maximizes this margin: a larger margin separates the two classes more clearly, which generally leads to fewer indecisive decisions and better generalization to unseen tweets.

TABLE 1. Accuracy of SBSA on Unigram and Tweets

Dataset Size    Accuracy
100             0.5245
200             0.6137
500             0.72118
1000            0.7487
5000            0.7597
10000           0.76666
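The Classification step can be illustrated with a small linear SVM. This sketch is an assumption-laden illustration, not the authors' code: the text does not state which kernel SBSAM uses, so a linear kernel (a plain dot product) is assumed, and instead of a library solver the weights are learned with a Pegasos-style stochastic subgradient method on the L2-regularized hinge loss, so that the example runs with the standard library alone. In the standard formulation the margin width is 2/||w||, so penalizing ||w|| favors a wide margin; here the bias is folded into w as a constant feature, a common simplification. The vocabulary and training tweets are hypothetical.

```python
import random


def featurize(tokens, vocab):
    """Bag-of-words counts over `vocab`, plus a constant 1.0 feature acting as the bias."""
    x = [0.0] * (len(vocab) + 1)
    for tok in tokens:
        if tok in vocab:
            x[vocab[tok]] += 1.0
    x[-1] = 1.0  # bias folded into the weight vector (a simplification)
    return x


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.1, epochs=1000, seed=0):
    """Pegasos-style SGD for a linear soft-margin SVM.

    Minimizes lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * w.x_i), labels y_i in {+1, -1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            violates = y[i] * dot(w, X[i]) < 1.0       # inside the margin or misclassified
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularizer step: shrink w
            if violates:                               # hinge-loss subgradient step
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (positive) or -1 (negative) by the side of the hyperplane w.x = 0."""
    return 1 if dot(w, x) >= 0.0 else -1


# Hypothetical toy data: tokens of already-preprocessed tweets.
vocab = {"good": 0, "love": 1, "bad": 2, "hate": 3}
docs = [["good"], ["love"], ["good", "love"], ["bad"], ["hate"], ["bad", "hate"]]
labels = [1, 1, 1, -1, -1, -1]
X = [featurize(d, vocab) for d in docs]
w = train_linear_svm(X, labels)
print([predict(w, x) for x in X])
```

In a real system the token lists would come from the preprocessing phase, and a tested library solver (for example scikit-learn's LinearSVC or SVC with a chosen kernel) would normally replace the hand-written training loop.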

IV. RESULTS AND DISCUSSION

The dataset used by SBSAM in this study was created using a tweet downloader and the Twitter API. The raw dataset was labeled and then analyzed with various feature extractors. SBSA was applied from the preprocessing stage through to the classification of tweets by sentiment. The performance of the SBSA classifications was evaluated using equations (1) and (2):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.

Fig. 5 – SBSA Precision (axes: Dataset Size, Accuracy)
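Equations (1) and (2) can be computed directly from a classifier's predictions. The minimal Python sketch below uses hypothetical labels for five tweets (not data from this study) and treats "positive" as the positive class; returning 0.0 when there are no predicted positives is a convention chosen here, since equation (2) is undefined in that case.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Count TP, TN, FP, FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Equation (1): share of all tweets classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Equation (2): share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical gold labels and predictions for five tweets.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                 # 2 1 1 1
print(accuracy(tp, tn, fp, fn))       # 0.6
print(round(precision(tp, fp), 4))    # 0.6667
```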

Figure 4 depicts a snapshot of SBSA polarizing tweets for both positive and negative polarity. Table 2 lists the accuracy achieved by SBSA on unigram and bigram tweet analysis.

TABLE 2 Perceived Accuracy of SBSA

Algorithm Accuracy (%)

SBSA with Unigram 81.668

SBSA with Bigram 83.451
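
Table 2 compares unigram and bigram representations. Assuming, as is usual, that a unigram feature is a single token and a bigram feature is a pair of adjacent tokens (the paper does not state whether its bigram setting also keeps unigrams), the two feature sets for one preprocessed tweet can be generated as in this minimal sketch; the `ngrams` helper and the example tokens are hypothetical.

```python
def ngrams(tokens, n):
    """Contiguous n-grams of a token list, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["phone", "is", "not", "good"]   # hypothetical preprocessed tweet
print(ngrams(tokens, 1))  # ['phone', 'is', 'not', 'good']
print(ngrams(tokens, 2))  # ['phone is', 'is not', 'not good']
```

The bigram "not good" keeps a negation together with the word it modifies, whereas unigram features see "not" and "good" separately; this is one plausible reason bigram features can help, although the paper does not analyze it.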

It is evident from Table 1 and Figure 5 that SVM-based classifiers perform better on larger datasets: as the dataset size grows, SBSA's precision improves. On accuracy, SBSA performs well with both unigram and bigram tweets, with accuracy increasing for bigrams. This suggests that accuracy may improve further with higher-order N-grams.

Fig. 4 – SBSA Negative Polarity Output

Fig. 5 – SBSA Positive Polarity Output

V. CONCLUSION

This paper has proposed a method to accurately analyze the sentiments of Twitter tweets using machine learning algorithms. It has shown that SVM-based methods increase sentiment analysis accuracy on voluminous internet data, and thus that cleaner and more accurate results can be obtained. Current research work focuses mostly on correctly classifying positive vs. negative tweets; tweets without sentiment also need to be classified to improve accuracy further. Though the current research work focuses mainly on English content, it can be extended to other languages, as Twitter is a global social networking site.

REFERENCES

[1] R. Xia, C. Zong, and S. Li, "Ensemble of feature sets and classification algorithms for sentiment classification," Information Sciences: an International Journal, vol. 181, no. 6, pp. 1138–1152, 2011.
[2] Konstan JA, Riedl J. Recommender systems: from algorithms to user experience. User Model User-Adapt Interact 2012;22:101–23.
[3] Pu P, Chen L, Hu R. A user-centric evaluation framework for recommender systems. In: Proceedings of the fifth ACM conference on Recommender Systems (RecSys'11), ACM, New York, NY, USA; 2011. p. 57–164.
[4] Hu R, Pu P. Potential acceptance issues of personality-based recommender systems. In: Proceedings of ACM conference on recommender systems (RecSys'09), New York City, NY, USA; October 2009. p. 22–5.
[5] Pathak B, Garfinkel R, Gopal R, Venkatesan R, Yin F. Empirical analysis of the impact of recommender systems on sales. J Manage Inform Syst 2010;27(2):159–88.
[6] Ben Parr. Twitter Has 100 Million Monthly Active Users; 50% Log In Everyday. http://mashable.com/2011/10/17/twitter-costolo-stats.
[7] Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002.
[8] Chenhao Tan, Lillian Lee, Jie Tang, Long Jiang, Ming Zhou and Ping Li. User Level Sentiment Analysis Incorporating Social Networks. In Proceedings of ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), 2011.
[9] Efthymios Kouloumpis, Theresa Wilson and Johanna Moore. Twitter Sentiment Analysis: The Good the Bad and the OMG! In Proceedings of AAAI Conference on Weblogs and Social Media (ICWSM), 2011.
[10] Hatzivassiloglou, V., & McKeown, K.R. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL, 2009.
[11] AnalyticsVidhaya: https://www.analyticsvidhya.com/blog/2015/10/understaing-support-vector-machine-example-code/ (accessed on 12/11/16).
[12] E. Cambria, et al., "New avenues in opinion mining and sentiment analysis," IEEE Intelligent Systems, vol. 28, no. 2, 2013, pp. 15-21.
[13] E. Cambria, " and sentiment analysis," IEEE Intelligent Systems 31.2 (2016): pp. 102-107.
[14] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau. Sentiment analysis of twitter data. In Proceedings of the Workshop on Languages in Social Media, LSM '11, pages 30–38, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
[15] M. Ebrahimi, et al., "Recognition of side effects as implicit-opinion words in drug reviews," Online Information Review, vol. 40, no. 7, 2016, pp. 1018-1032.
[16] A.H. Yazdavar, M. Ebrahimi, and N. Salim, "Fuzzy Based Implicit Sentiment Analysis on Quantitative Sentences," arXiv preprint arXiv:1701.00798, 2017.
[17] Chen, et al., "Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter," Sixth International AAAI Conference on Weblogs and Social Media, 2012.
[18] W. Wang, et al., "Harnessing twitter 'big data' for automatic identification," Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), IEEE, 2012.
[19] P. Anantharam, K. Thirunarayan, and A. Sheth, "Topical anomaly detection from twitter stream," Proceedings of the 4th Annual ACM Web Science Conference, ACM, 2012.
[20] S. Poria, et al., "A deeper look into sarcastic Tweets using deep convolutional neural networks," arXiv preprint arXiv:1610.08815, 2016.
[21] P. Priyanka, Deivanai Kathiresan, "A Machine Learning Approach To Mainframe Analysis", International Journal of Innovations in Scientific and Engineering Research (IJISER), Vol.4, No.1, pp.18-24, 2017.
[22] Rashid AM, Albert I, Cosley D, Lam SK, McNee SM, Konstan JA et al. Getting to know you: learning new user preferences in recommender systems. In: Proceedings of the international conference on intelligent user interfaces; 2002. p. 127–34.
[23] Jalali M, Mustapha N, Sulaiman M, Mamay A. WEBPUM: a web-based recommendation system to predict user future movement. Exp Syst Applicat 2010;37(9):6201–12.
[24] Chen LS, Hsu FH, Chen MC, Hsu YC. Developing recommender systems with the consideration of product profitability for sellers. Int J Inform Sci 2008;178(4):1032–48.
[25] Ziegler CN, McNee SM, Konstan JA, Lausen G. Improving recommendation lists through topic diversification. In: Proceedings of the 14th international conference on World Wide Web; 2005. p. 22–32.
[26] Min SH, Han I. Detection of the customer time-variant pattern for improving . Exp Syst Applicat 2010;37(4):2911–22.
[27] Alan Ritter, Colin Cherry, and Bill Dolan. 2010. Unsupervised modeling of Twitter conversations. In Proc. of NAACL.
[28] Liu, S., Li, F., Li, F., Cheng, X., & Shen, H. Adaptive co-training SVM for sentiment classification on tweets. In Proceedings of the 22nd ACM international conference on Information & knowledge management (pp. 2079-2088). ACM, 2013.
