Sentiment Analysis for IMDb Movie Review
Ang (Carl) Li
December 2019

Abstract

Sentiment analysis has long been a problem of interest in business, marketing and management, where it can add value to the decision process. Previous work on sentiment analysis has focused on document-level, sentence-level and word-level sentiment extraction, with both supervised and unsupervised approaches. In this paper, I introduce a classification model for sentiment analysis in which context information is included in the feature space. The dataset was taken from IMDb movie reviews: I sampled 1,000 instances from the full dataset and split them by a 20%-70%-10% ratio into development, cross-validation and final test sets. Through several rounds of error analysis, covering stretchy patterns, character N-grams and elimination of stop words, and a tuning procedure on the ridge parameter, the performance on the final test set reached 84% correct and a kappa statistic of 0.6806, a marginal improvement over the baseline Logistic Regression model. Some exploration of the data, and discussion of it for error analysis and tuning, is also included in the work. Future work on this model may focus on new approaches such as deep learning, and on introducing embedded context information into the feature space.

1 Introduction

Sentiment analysis, also called opinion mining, is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes (Liu, 2012). In applications such as recommender systems, "what other people think" has always been an important piece of information during the decision process (Pang, Lee, et al., 2008), for example the recommendation of clothes on Amazon based on sentiment analysis of user reviews.

Sentiment analysis has been applied in multiple areas. In social media such as Facebook and Snapchat, there are goldmines of consumer stories and opinion data (Lexalytics, 2019), which can be used for advertising, marketing, recommending new friend relationships, etc. But social posts are full of complex abbreviations, acronyms, and emoticons, which are non-trivial problems for the feature space in machine learning models. The sheer volume is a problem, too. Hopefully, a successful model can save people some valuable hours of parsing mountains of social data by hand (Lexalytics, 2019).

Such techniques can also be applied to business intelligence models. For example, sentiment analysis statistics can be used to estimate the retention rate of customers for a new product, to adjust the current marketing approach and to satisfy the customers in a better way (Gupta, 2018). Such a monitoring approach enables companies to adapt their business plan in real time, which can lead to a possible reduction of marketing costs.

In the area of movie review analysis, sentiment analysis means finding the mood of the public about how they judge a specific movie (PythonforEngineers, 2019). In detail, the documents, taken from the user reviews, are classified based on the mood they express. Some sentiment analysis tasks do a binary classification, such as positive and negative, and others may do a multi-level classification, such as positive, somewhat positive, neutral, somewhat negative and negative. The actual number of classes depends on the aims of the specific task.

Previous papers have discussed the traditional approaches for sentiment analysis on movie review datasets. In this paper, I try to extend the feature space of the movie review dataset, which is rarely seen in previous work, to achieve an increase in classification performance. Parameter tuning is also included in this process to achieve better performance.
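The abstract reports results on a 20%-70%-10% split of 1,000 sampled instances, as percentage correctness and a kappa statistic. As a minimal sketch of how these quantities relate, the Python below splits a list of instances by that ratio and computes accuracy and kappa from gold and predicted labels. It assumes the reported kappa is Cohen's kappa; the function names `split_dataset`, `accuracy` and `cohen_kappa`, the fixed random seed, and the toy labels are illustrative assumptions, not the tooling or data used in this paper.

```python
import random
from collections import Counter


def split_dataset(items, ratios=(0.2, 0.7, 0.1), seed=0):
    """Shuffle items and split them into development, cross-validation
    and final test sets (hypothetical helper; ratios follow the abstract)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * ratios[0])
    n_cv = round(len(shuffled) * ratios[1])
    dev = shuffled[:n_dev]
    cv = shuffled[n_dev:n_dev + n_cv]
    test = shuffled[n_dev + n_cv:]  # whatever remains is the final test set
    return dev, cv, test


def accuracy(gold, pred):
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohen_kappa(gold, pred):
    """Cohen's kappa: agreement corrected for chance agreement.
    Assumption: this is the kappa statistic meant in the abstract."""
    n = len(gold)
    p_observed = accuracy(gold, pred)
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Chance agreement: probability that gold and prediction coincide
    # if both were drawn independently from their label distributions.
    p_expected = sum(gold_counts[label] * pred_counts[label]
                     for label in gold_counts) / (n * n)
    if p_expected == 1.0:
        return 0.0  # kappa is undefined here; this sketch returns 0.0
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    dev, cv, test = split_dataset(range(1000))
    print(len(dev), len(cv), len(test))

    # Toy labels (not the paper's data): 100 balanced instances, 84 correct.
    gold = ["pos"] * 50 + ["neg"] * 50
    pred = ["pos"] * 42 + ["neg"] * 8 + ["neg"] * 42 + ["pos"] * 8
    print(accuracy(gold, pred), cohen_kappa(gold, pred))
```

With perfectly balanced classes and balanced predictions, chance agreement is 0.5, so 84% accuracy corresponds to a kappa of (0.84 - 0.5) / (1 - 0.5) = 0.68. This is why kappa is a stricter figure than raw accuracy on a binary task.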
This paper is organized in nine parts. The Introduction, which is this part, gives the background of the sentiment analysis problem on movie reviews. In Related Work, I cover the typical approaches in previous papers on sentiment analysis and its applications. In Data Collection, I introduce the data source and the baseline feature space of the IMDb movie review dataset I am using. Data Preparation describes the transformation of the data from its raw format into the ARFF files I use in WEKA and the CSV file I use in LightSIDE. In Baseline Performance, a baseline approach is shown, and all the following work is an improvement on this baseline. Error Analysis presents the error analysis and the approaches that make sense for improving performance. In Tuning, I use parameter tuning to improve the performance of the baseline algorithm. The result of the improvement on the final test dataset is shown in Final Result. In Discussion, I summarize the approaches I took in the project and introduce points of improvement for the future.

2 Related Work

For a long time, sentiment analysis has been handled as a Natural Language Processing task at many levels of granularity (Agarwal, Xie, Vovsha, Rambow, & Passonneau, 2011). The first form of sentiment analysis is at the document level. In the work of Turney (Turney, 2002), an unsupervised learning approach for classifying reviews as recommended (positive) or not recommended (negative) was introduced. A PoS (part-of-speech) tagger was also used to identify the phrases with adjectives or adverbs. However, an unsupervised model makes no use of the original labels of the training data. In the work of Pang and Lee (Pang & Lee, 2004), the classification task is between two labels, thumb up and thumb down, which are similar to positive and negative; the algorithm identifies the subjective sentences in a document and extracts them to determine thumb up or thumb down. This approach is shown in Figure 1.

Figure 1: Extraction-based Approach

A further step, sentiment analysis at the sentence level, comes from Hu and Liu's work (Hu & Liu, 2004). Instead of focusing on documents, they concentrate on opinions at the word level, trying to extract the words that represent the opinions expressed in a document. In Kim and Hovy's work (Kim & Hovy, 2004), a word classification model is used to calculate the polarity of all sentiment-bearing words. The calculation results are combined to evaluate the sentiment of a whole sentence. The details of the different levels of architecture are shown in Figure 2.

In recent years, some research has focused on phrase-level sentiment analysis, where a phrase is considered as a combination of words. Wilson et al.'s work presented an approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions (Wilson, Wiebe, & Hoffmann, 2005). Tagging the phrases in a document, instead of the words, takes the context of a PoS element into account and contributed to the context understanding of the rule-based classification in their work.

Agarwal et al. extended WordNet's functionality to automatically score the vast majority of words in the input, avoiding the need for manual labeling (Agarwal, Biadsy, & Mckeown, 2009). N-grams of constituents are extracted from the sentences in a document based on the evaluation performed above, and the consideration of context is also explicitly mentioned.

3 Data Collection

The dataset used in this work is from the Stanford Artificial Intelligence Laboratory. It is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets of IMDb movie reviews.
They have provided a set of 25,000 highly polar movie reviews for training, and 25,000 for testing (Maas et al., 2011). In my work, I randomly picked 1,000 instances among these documents for error analysis and parameter tuning.

Figure 2: Levels of Classifiers in Opinion Mining

The raw dataset provided on their website is divided into two directories, train and test, which serve as training data and testing data respectively. Each directory contains another two directories, pos and neg, which divide the data by label. Each of these folders contains multiple TXT files holding the content of the movie reviews, with one document per file.

These raw data were collected from the IMDb website. When people choose a movie to watch, there are a lot of factors to consider, such as the director, the actors, and the movie's budget. Most of us base our decision on a review, a short trailer, or just the movie's rating (Olteanu, 2017). IMDb is now an important source for publishing professional movie reviews, on which users can rate a movie and give their own feelings about whether it is good or not.

I want to predict the sentiment (either positive or negative) of these instances. Almost no movie reviews are neutral, and there is usually some mood that the author of the review wants to express. In my project, the goal is to dig into these text data and find features that are representative for judging whether a review is positive or negative.

Some of the data from this dataset is presented below.

and postimpressionist artists. All the Gershwin songs are beautifully staged, but the most memorable are It's Very Clear Caron and Kelly on the banks of the Seine and I Got Rhythm the kids of Paris joining Gene Kelly in Une Chanson Americaine. If you love Paris, see this movie.

Hollywood is one of the best and the