
Sentiment Analysis for IMDb Movie Reviews

Ang (Carl) Li

December 2019

Abstract

Sentiment analysis has long been an important problem in business, marketing and management, where opinions carry real value in the decision process. Previous work on sentiment analysis has focused on document-level, sentence-level and word-level sentiment extraction, with both supervised and unsupervised approaches. In this paper, I introduce a classification model for sentiment analysis that brings context information into the feature space. The dataset comes from IMDb movie reviews, from which I sampled 1,000 instances and split them by a 20%-70%-10% ratio into development, cross-validation and final test sets. Through multiple rounds of error analysis, covering stretchy patterns, character N-grams and the elimination of stop words, together with a tuning procedure on the ridge parameter, the performance on the final test set reached 84% percentage correctness and 0.6806 in kappa statistic, a marginal improvement over the baseline Logistic Regression model. Exploration of the data and discussion of the error analysis and tuning are included throughout the work. Future work on this model may focus on new approaches such as deep learning and the introduction of embedded context information in the feature space.

1 Introduction

Sentiment analysis, also called opinion mining, is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes (Liu, 2012). In applications such as recommender systems, "what other people think" has always been an important piece of information during the decision process (Pang, Lee, et al., 2008), for example the recommendation of clothes on Amazon based on sentiment analysis of user reviews.

Applications of sentiment analysis appear in multiple areas. Social media platforms such as Facebook and Snapchat are goldmines of consumer stories and opinion data (Lexalytics, 2019), which can be used for advertising, marketing, recommending new friends, and more. But social posts are full of complex abbreviations, acronyms, and emoticons, which pose non-trivial problems for the feature space of machine learning models. The sheer volume is a problem, too. Hopefully, a successful model can save human beings some valuable hours parsing mountains of social data by hand (Lexalytics, 2019). The same techniques also apply to business intelligence. For example, sentiment statistics can be used to estimate the customer retention rate for a new product, so that a company can adjust its current marketing and try to satisfy customers in a better way (Gupta, 2018). Such a monitoring approach has enabled companies to adapt their business plans in real time, which can lead to a possible reduction of marketing cost.

For movie reviews, sentiment analysis means finding the mood of the public about a specific movie (PythonforEngineers, 2019). In detail, the documents from the user reviews are classified based on the mood they express. Some sentiment analysis tasks do a binary classification into positive and negative, while others do a multi-level classification such as positive, somewhat positive, neutral, somewhat negative and negative.

The actual number of classes depends on the aims of the specific task.

Previous papers have discussed traditional approaches for sentiment analysis on movie review datasets. In this paper, I try to extend the feature space of a movie review dataset, which is rarely seen in previous work, to achieve an increase in classification performance. Parameter tuning is also included in this process to achieve better performance.

This paper is organized in nine parts. Introduction, which is this part, introduces the background of the sentiment analysis problem on movie reviews. In Related Work, I cover the typical approaches in previous papers on sentiment analysis and its applications. In Data Collection, I introduce the data source and the baseline feature space of the IMDb movie review dataset I am using. Data Preparation describes the transformation of the data from its raw format into the ARFF files I use in WEKA and the CSV files I use in LightSIDE. In Baseline Performance, a baseline approach is established, and all the following work improves on that baseline. The error analysis is presented in Error Analysis, showing the approaches that make sense for improving performance. In Tuning, I use parameter tuning to improve the performance of the baseline algorithm. The result of the improvement on the final test dataset is given in Final Result. In Discussion, I summarize the approaches taken in the project and point out directions for future improvement.

2 Related Work

For a long time, sentiment analysis has been handled as a Natural Language Processing task at many levels of granularity (Agarwal, Xie, Vovsha, Rambow, & Passonneau, 2011). The first form of sentiment analysis is at the document level. In the work of Turney (Turney, 2002), an unsupervised learning approach to classifying reviews as recommended (positive) or not recommended (negative) was introduced. A PoS (part-of-speech) tagger was also used to identify phrases with adjectives or adverbs. However, an unsupervised model fails to make use of the original labels of the training data. In the work of Pang and Lee (Pang & Lee, 2004), the classification task is between two labels, thumbs up and thumbs down, which are similar to positive and negative; the algorithm identifies the subjective sentences in a document and extracts them to determine thumbs up or thumbs down. This approach is shown in Figure 1.

Figure 1: Extraction-based Approach

Another, sentence-level, step for sentiment analysis comes from Hu and Liu's work (Hu & Liu, 2004). Instead of focusing on documents, they concentrate on opinions at the word level, trying to extract the words that represent the opinions expressed in a document. In Kim and Hovy's work (Kim & Hovy, 2004), a word classification model is used to calculate the polarity of all sentiment-bearing words, and the results are combined to evaluate the sentiment of a whole sentence. The details of the different levels of architecture are shown in Figure 2.

In recent years, some research has focused on phrase-level sentiment analysis, where a phrase is considered a combination of words. Wilson et al. presented an approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions (Wilson, Wiebe, & Hoffmann, 2005). Tagging phrases in a document, instead of words, took the context of a PoS element into account and contributed to the understanding of rule-based classification in their work.

In the work of Agarwal et al., WordNet's functionality was extended to automatically score the vast majority of words in the input, avoiding the need for manual labeling (Agarwal, Biadsy, & Mckeown, 2009). N-grams of constituents are extracted from the sentences in a document based on this scoring, and the consideration of context is also explicitly mentioned.

3 Data Collection

The dataset used in this work comes from the Stanford Artificial Intelligence Laboratory. It is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets of IMDb movie reviews: 25,000 highly polar movie reviews for training and another 25,000 for testing (Maas, 2011; Maas et al., 2011). In my work, I randomly picked 1,000 instances among these documents for error analysis and parameter tuning.

The raw dataset provided on their website is divided into two directories, train and test, which serve as training and testing data correspondingly. Each directory contains two further directories, pos and neg, which divide the data by label. Each of these folders holds multiple TXT files with the content of the movie reviews, one document per file.

These raw data were collected from the IMDb website. When people decide whether to watch a movie, there are a lot of factors to consider, such as the director, the actors, and the movie's budget. Most of us base our decision on a review, a short trailer, or just the movie's rating (Olteanu, 2017). IMDb is now an important venue for publishing movie reviews, on which users can rate a movie and describe their own feelings about whether it is good or not.

Figure 2: Levels of Classifiers in Opinion Mining

I want to predict the sentiment (either positive or negative) of these instances. Almost no movie reviews are neutral; there is usually some mood that the author of the review wants to express. In this project, my goal is to dig into these text data and find features that are representative for judging whether a review is positive or negative.

Some of the data from this dataset is presented below.

    Hollywood is one of the best and the beautiful things that had occurred in my life. I admire and am very much fascinated by the way Hollywood generates ideas and implement them. It makes me wonder about the scope of human brain. I saw Flatliners a long time back but the story, direction, cast and of all acting is still fresh in my mind. The story begins with our lead actor Sutherland saying during sunrise what a beautiful day to die. For all of us, It's a story which shows emotions that are sometimes withheld in our mind during our entire life. Never able to understand few things in life. It shows us to get motivated and to improve our quality of life. Anyway I suggest it to all that watch it once. (positive)

    I saw An American in Paris on its first release when I was still at school and fell in love with it straightaway. I went back to see it again the next day and have lost count of the number of times I have seen it since, both in the cinema and on TV. It makes fantastic use of some of the best music and songs by the greatest popular composer of the twentieth century George Gershwin and features the greatest male Gene Kelly and female Leslie Caron dancers in Hollywood history. The supporting cast of Oscar Levant as quirky as ever, Georges Guetary why didn't he make more movies ? and Nina Foch brilliant in an unsympathetic role are at the top of their form. The closing ballet, superbly choreographed to the title music, makes excellent use of the sights and sounds of Paris and of the images of impressionist and postimpressionist artists. All the Gershwin songs are beautifully staged, but the most memorable are It's Very Clear Caron and Kelly on the banks of the Seine and I Got Rhythm the kids of Paris joining Gene Kelly in Une Chanson Americaine. If you love Paris, see this movie. If you've never been to Paris in your life, see it. But see it ! (positive)

    It looks to me as if the creators of The Class Of Nuke 'Em High wanted it to become a , but it ends up as any old high school Bmovie, only tackier. The satire feels totally overshadowed by the extremely steretyped characters. It's very unfunny, even for a turkey. (negative)

    It looked cool from the movie sleeve, but after five minutes we weren't sure if it was a homosexual documentary of west side story without any female interest. The film quality was poor, and there was hardly enough gang fighting action to sustain even the drunkest person's interest for long enough to watch the entire film. May god have mercy on the souls of both the actors and the filmmakers responsible for what I can only describe as my new one and only reason why I never will want to see or trust an Australian made film again. I have to write more so I will again say that the actors were so bad that I'm positive I could make a better movie with fifteen dollars and a box of Trojans. Please don't see this movie for your own sake. (negative)

4 Data Preparation

The data come as one TXT file per document, which means they cannot be directly loaded into WEKA (which accepts the ARFF format) or LightSIDE (which accepts the CSV format). Some transformation is needed to make the data readable by these tools.

The transformation steps are shown in Figure 3.

Figure 3: Flow Diagram for Data Preparation

For the transformation, I used Python with the numpy and pandas libraries to load the TXT files by iterating over the directories mentioned above. All the documents read from the TXT files are aggregated into a pandas.DataFrame, which is a convenient structure to build a CSV from. From the DataFrame, I randomly picked 1,000 instances out of the 25,000 documents in the dataset and divided them into a development set (20%), a cross-validation set (70%) and a final test set (10%). During this process, non-ASCII characters are also removed from the data. These datasets are exported from Python in CSV format and later loaded into WEKA to export ARFF files.

Additionally, the text column of the CSV produced by pandas is initially treated as nominal values, instead of string values, in WEKA. To solve this problem, I used the NominalToString filter in the Preprocess tab of the WEKA Explorer to transform the nominal data into string data in the ARFF file. However, string data cannot be fed directly into WEKA's classification models, so I used StringToWordVector to transform the string data into vectors (that is, each document is represented by a vector).
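To make this step concrete, below is a minimal Python sketch of the preparation pipeline described above. The directory layout (train/pos, train/neg) and the split ratios follow the paper; the file path, column names and random seed are illustrative assumptions.

    import os
    import pandas as pd

    def load_reviews(root):
        """Read one review per TXT file from the pos/ and neg/ subfolders."""
        rows = []
        for label in ("pos", "neg"):
            folder = os.path.join(root, label)
            for name in sorted(os.listdir(folder)):
                with open(os.path.join(folder, name), encoding="utf-8") as f:
                    text = f.read()
                # Drop non-ASCII characters, as described above.
                text = text.encode("ascii", errors="ignore").decode("ascii")
                rows.append({"text": text, "label": label})
        return pd.DataFrame(rows)

    df = load_reviews("aclImdb/train")            # hypothetical path
    sample = df.sample(n=1000, random_state=42)   # 1,000 random instances
    dev = sample.iloc[:200]                       # 20% development set
    cv = sample.iloc[200:900]                     # 70% cross-validation set
    test = sample.iloc[900:]                      # 10% final test set
    for name, part in (("dev", dev), ("cv", cv), ("test", test)):
        part.to_csv(f"{name}.csv", index=False)

The exported CSVs can then be converted to ARFF and vectorized in WEKA with the NominalToString and StringToWordVector filters as described.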

5 Baseline Performance

To determine the baseline algorithm for the sentiment analysis task, I ran experiments on the development set to evaluate several algorithms with their default parameters. The results are shown in Table 1, in which J48 is an implementation of the decision tree algorithm in LightSIDE, and SMO is an implementation of the support vector machine (SVM). In the evaluation, I used uni-grams extracted from the text data (without punctuation) as the feature space, and all the algorithms were evaluated with their default settings in LightSIDE. To evaluate these algorithms, I used 10-fold (random) cross-validation on the development set to obtain percentage correct and kappa statistic values.

    Algorithm      Percentage Correct   Kappa Stats
    SMO            0.74                 0.4769
    Naive Bayes    0.735                0.471
    J48            0.635                0.2731
    Logistic       0.775                0.5476

Table 1: Experiment Results

Considering the numbers in the table, the available features and the error analysis ideas, I chose Logistic Regression for further improvement in my work.
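As a point of reference, this comparison can be approximated outside LightSIDE. The scikit-learn sketch below is an assumption on my part, not the tool used in the paper; it evaluates roughly analogous learners on uni-gram bag-of-words features with 10-fold cross-validation, reading the development set exported earlier.

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier

    dev = pd.read_csv("dev.csv")   # development set from the step above
    classifiers = {
        "SMO (SVM)": LinearSVC(),
        "Naive Bayes": MultinomialNB(),
        "J48 (decision tree)": DecisionTreeClassifier(),
        "Logistic": LogisticRegression(max_iter=1000),
    }
    for name, clf in classifiers.items():
        # Uni-gram bag-of-words features, as in the LightSIDE setup.
        pipe = make_pipeline(CountVectorizer(), clf)
        scores = cross_val_score(pipe, dev["text"], dev["label"], cv=10)
        print(f"{name}: mean accuracy {scores.mean():.3f}")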

After the comparison in LightSIDE, I extracted the same baseline features from the cross-validation set and evaluated the performance of Logistic Regression in WEKA. The default parameters of Logistic are shown in Table 2. The detailed statistics are shown in Table 3, and the confusion matrix for the baseline algorithm is shown in Table 4, in which P indicates positive and N indicates negative. It should be mentioned that some performance indicators are slightly different in LightSIDE: the percentage correctness there is 78.14%, and the kappa value is 0.5627. This feature space and these algorithm settings form the baseline for all the improvements later in my work.

    Parameter                     Value
    maxIts                        -1
    Batch Size                    100
    numDecimalPlaces              4
    ridge                         10^-8
    useConjugateGradientDescent   False

Table 2: Parameter Settings for Baseline Algorithm

    Statistics                         Value
    Correctly Classified Instances     518 (74%)
    Incorrectly Classified Instances   182 (26%)
    Kappa statistic                    0.4802
    Mean absolute error                0.2583
    Root mean squared error            0.4964
    Relative absolute error            51.7106%
    Root relative squared error        99.307%
    Total Number of Instances          700

Table 3: Statistics for Logistic on CV set in WEKA

                      Classified P   Classified N
    Actual Positive   261            98
    Actual Negative   84             257

Table 4: Confusion Matrix for Baseline in CV Set
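The kappa statistic in Table 3 can be reproduced directly from the confusion matrix in Table 4. The short check below computes Cohen's kappa from those four counts.

    def cohens_kappa(tp, fn, fp, tn):
        """Cohen's kappa for a binary confusion matrix."""
        n = tp + fn + fp + tn
        p_observed = (tp + tn) / n
        # Chance agreement from the row and column marginals.
        p_expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
        return (p_observed - p_expected) / (1 - p_expected)

    # Counts from Table 4: rows are actual labels, columns are predictions.
    print(round(cohens_kappa(tp=261, fn=98, fp=84, tn=257), 4))  # 0.4802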

At this stage, I think there is still room to improve model performance. After going through some of the features, I found that the current Logistic model does not do well at understanding the context of text features. To improve this, we may replace the uni-gram bag-of-words features with bi-grams, tri-grams and stretchy patterns, or even introduce word vectors from the Continuous Bag-of-Words (CBOW) or skip-gram models as word representations. Another route to understanding context would be models that extract features from blocks of data, like convolutional neural networks, or that extract sequential information, like recurrent neural networks. These approaches are worth trying to improve performance on a text classification task.

6 Error Analysis

In assignments 6, 7 and 8 over the past few weeks, approaches such as the introduction of stretchy patterns, character N-grams and the skipping of all-stop-word N-grams were applied to improve model performance. Some of these approaches increased performance, while others did not make a significant change to the result.

6.1 Stretchy Patterns

For the baseline algorithm, I used Logistic Regression for binary label classification, with a feature space of uni-gram text features only, and trained the model with L2 regularization. The cross-validation correctness percentage of this model is 0.7814, and the kappa value is 0.5627. Starting from this value, I performed error analysis on the development set to identify problematic features by their horizontal and vertical absolute differences.

I sorted the features by horizontal absolute difference in descending order. The first uni-gram I identified is "movie", which has a relatively high frequency (10) and a large horizontal absolute difference value (0.2018). Then I sorted the features by vertical absolute difference in descending order and identified the feature "out", which has a relatively high frequency (7) and a small vertical absolute difference (0.0333). These two features share the same characteristic in my context of binary classification for movie sentiment analysis: they do not make sense on their own and take their meaning from the words before and after them. The details of these two problematic features are shown in Table 5, in which "H" means horizontal absolute difference, "V" means vertical absolute difference and "Weight" means feature weight. Therefore, I planned to introduce some new features capturing the context of uni-grams.

    Feature   Frequency   H        V        Weight
    movie     10          0.2018   0.0365   0.2004
    out       7           0.075    0.0333   0.0499

Table 5: Problematic Features in Error Analysis 1

I added new stretchy patterns in LightSIDE, with pattern lengths between 2 and 4 and gaps between 1 and 2, so that the patterns bring context information into the feature space. The stretchy pattern features work well in the new model: on the cross-validation set the accuracy is 0.8214 and the kappa value is 0.6424, a significant improvement (p = 0.01, t = 2.566, as shown in the model comparison of LightSIDE).
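To illustrate what a stretchy pattern captures, the sketch below is a simplified stand-in for LightSIDE's extractor, covering only length-2 patterns: it emits token pairs whose 1 to 2 intervening words are collapsed into a [GAP] marker.

    def stretchy_bigrams(tokens, min_gap=1, max_gap=2):
        """Token pairs with 1-2 interior words collapsed into [GAP]."""
        patterns = []
        for i in range(len(tokens)):
            for gap in range(min_gap, max_gap + 1):
                j = i + gap + 1
                if j < len(tokens):
                    patterns.append(f"{tokens[i]} [GAP] {tokens[j]}")
        return patterns

    print(stretchy_bigrams("this movie really was great".split()))
    # ['this [GAP] really', 'this [GAP] was', 'movie [GAP] was',
    #  'movie [GAP] great', 'really [GAP] great']

Features like "this [GAP] was" in Table 6 below are exactly patterns of this form.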
6.2 Elimination of Stop Words

In the error analysis above, I introduced stretchy patterns in addition to the uni-gram text features and achieved an increase of classification correctness on the development set, from 0.7814 to 0.8214. From the "Explore Results" view of LightSIDE (for the improved model), I noticed many uni-grams, bi-grams and tri-grams built from words that do not make much sense in their context, such as "n't", "this [GAP] was" and "the [GAP] of the", shown in Table 6. All of these features appear frequently in the development set (13, 8, and 10 times, correspondingly). A common observation about these 3 features is that all of them have relatively large horizontal absolute difference values (0.3766, 0.4103, and 0.2788) in the feature list of the "Explore Results" tab. Note that in Table 6, "F" means frequency, "H" means horizontal absolute difference, "V" means vertical absolute difference and "W" means feature weight.

    Feature            F    H        V        W
    n't                13   0.3766   0.1159   0.246
    this [GAP] was     8    0.4103   0.3315   0.087
    the [GAP] of the   10   0.2788   0.3553   0.151

Table 6: Problematic Features in Error Analysis 2

Based on this observation, I enabled "Skip Stopwords in N-Grams" in the feature extraction process, which is shown in Figure 6. In the cross-validation process, however, it did not increase model performance: the new correctness percentage for cross-validation is 0.8114.
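One plausible reading of the "Skip Stopwords in N-Grams" option (the exact LightSIDE semantics may differ) is to discard any n-gram feature made up entirely of stop words, as in this small sketch with a toy stop-word list:

    STOPWORDS = {"this", "was", "the", "of", "a", "is", "n't"}  # toy list

    def keep_ngram(ngram_tokens):
        """Keep an n-gram only if at least one token is not a stop word."""
        return not all(tok in STOPWORDS for tok in ngram_tokens)

    print(keep_ngram(["the", "of", "the"]))   # False: all stop words, dropped
    print(keep_ngram(["movie", "was"]))       # True: kept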
6.3 Character N-grams

In the last experiment, I tried to eliminate stop-word features to remove some noise from the feature space, but it did not help the model and even decreased performance. Before that, in the first error analysis where I introduced stretchy patterns, the improved model achieved an increase of classification correctness on the development set; the new percentage of correctness was 0.8214. This time I performed error analysis on the higher-performance model built in section 6.1. From the "Explore Results" view, I noticed some words that share the same stem but were identified as different features, such as movie and movies.

    Feature   Frequency   H        V        Weight
    movies    6           0.1442   0.139    -0.0173
    movie     10          0.0994   0.0492   0.1335

Table 7: Problematic Features in Error Analysis 3

The feature movie has a horizontal absolute difference of 0.0994 and a vertical absolute difference of 0.0492; the feature movies has a horizontal absolute difference of 0.1442, which is high among the feature list, and a vertical absolute difference of 0.139. These values are listed in Table 7. Together, the two features have a frequency of 16, which is not a small number in the feature list. Here I tried introducing character N-grams into the feature space to capture features in which words share the same stem. In the evaluation process using "Build Models", shown in Figure 8, there is an improvement in the new model, which reaches a percentage correctness of 0.8229, but the improvement is not significant. Based on the three rounds of error analysis above, I take the feature space generated in section 6.1 into the next step, tuning.
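To see why character N-grams help with shared stems, the sketch below (scikit-learn here is my assumption; the paper used LightSIDE's extractor) lists the character 3-4-grams that "movie" and "movies" have in common.

    from sklearn.feature_extraction.text import CountVectorizer

    # char_wb pads each word with spaces and extracts n-grams inside words.
    vec = CountVectorizer(analyzer="char_wb", ngram_range=(3, 4))
    X = vec.fit_transform(["movie", "movies"]).toarray()
    shared = [g for g, present in
              zip(vec.get_feature_names_out(), (X > 0).all(axis=0)) if present]
    print(shared)  # [' mo', ' mov', 'mov', 'movi', 'ovi', 'ovie', 'vie']

The two word forms, disjoint as uni-grams, now share most of their features.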

7 Tuning

To optimize the performance of Logistic Regression on the current feature space with stretchy patterns, I tuned the ridge parameter of logistic regression, since it is the only parameter available for tuning in this algorithm in WEKA. With the default ridge setting of 10^-8, 5-fold cross-validation on the cross-validation set achieved 78.71% percentage correctness and a 0.5748 kappa statistic in WEKA, which is slightly lower than in LightSIDE (percentage correctness 80.86%).

I used the CVParameterSelection meta classifier in WEKA to tune the ridge parameter. After exporting the feature space built in LightSIDE as an ARFF file and importing it into WEKA, I searched R (ridge) values between 1 and 10^-10, in 5 steps. In the final result of this tuning process, after 3 stages, a ridge parameter of 10^-10 achieved the best performance, with percentage correctness 78.89% and kappa statistic 0.5812. After performing a T-test in Excel, I found no significant change from this tuning process. Therefore, in the final test, I use the default setting (ridge value = 10^-8) in this model.

Besides this insignificant result, I measured the performance of the logistic regression model under multiple settings of the ridge value. This extra experiment was performed to determine the effect of the ridge parameter on model performance. As shown in Figure 4 and Figure 5, there is only minor variation in model performance, measured in percentage correctness and kappa statistic, across the different parameter values.

Figure 4: Percentage Correctness Changes from Different Ridge Parameter

Figure 5: Kappa Changes from Different Ridge Parameter
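The same search can be mirrored outside WEKA. In the scikit-learn sketch below (an analogy, not the paper's CVParameterSelection run), WEKA's ridge corresponds to the inverse regularization strength C = 1/ridge of LogisticRegression.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    ridges = np.logspace(0, -10, num=5)   # 5 steps from 1 down to 1e-10
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": list(1.0 / ridges)},
        cv=5,                             # 5-fold CV, as in the paper
        scoring="accuracy",
    )
    # X, y: the vectorized cross-validation set (e.g., from CountVectorizer).
    # search.fit(X, y); print(search.best_params_, search.best_score_)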

8 Final Result

To obtain the final result, I trained the model on the cross-validation set and tested it on the final test set (10% of the whole sampled dataset). Using the chosen feature space configuration (stretchy patterns with pattern lengths between 2 and 4 and gaps between 1 and 2) and the default configuration of logistic regression (ridge value = 10^-8), the new model achieved 84% percentage correctness and 0.6806 in kappa statistic. Compared to the baseline performance of 76% percentage correctness and 0.5248 in kappa statistic, the new model achieved a marginal improvement (p = 0.059, t = 1.91, as shown in the model comparison in LightSIDE).

9 Discussion

In this paper, I have discussed improvements to a sentiment analysis classification task using Logistic Regression. Throughout the work, I introduced the origin of the dataset, the data preparation steps, feature space improvements, error analysis, and tuning through CVParameterSelection. The performance on the final test set reached 0.6806 in kappa statistic, a marginal improvement over the original model.

In the previous work on sentiment analysis introduced in section 2, prior research focused on multiple aspects of the feature space, from document level to word level. In my work, I tried to improve the performance of the Logistic Regression model through a feature space appropriate to the dataset itself, and through parameter tuning.

In the data preparation part, I used Python instead of WEKA filters, since I have some programming background and programming is a more efficient approach for me. With numpy and pandas, it is not hard to randomly select the data for the development, cross-validation and final test sets, while eliminating non-ASCII characters in these data frames and exporting them into CSV files readable by WEKA and LightSIDE.

The most significant part of this work is the introduction of stretchy patterns into the feature space. The normal feature space settings, whether extracted as uni-grams only in LightSIDE or through the StringToWordVector filter in WEKA, do not capture context information. From my previous experience in a machine learning engineering internship, the solution is either to capture context information, like bi-grams, tri-grams or longer patterns (possibly with gaps), in the feature space, or to embed the context information in the vector representation of words or documents, like word2vec (through skip-gram or CBOW), fastText or doc2vec. In line with the content of this course, I chose the former solution and achieved some improvement from the new feature space.

The limitation of this work is the partial failure of parameter tuning. There are not many options for tuning Logistic Regression in WEKA. Tuning the ridge value reminded me of my previous experience tuning the learning rate of multi-layer perceptrons, which was also a hard parameter to deal with; a multi-layer perceptron is, in fact, an advanced multi-layer version of the Logistic Regression model. In future work, the choice of algorithm and the selection of parameters for tuning would receive more consideration.

In the future, given the opportunity to continue working on this project, new approaches to the feature space, like word embeddings, would be introduced for possible improvement of model performance. Besides, I would try approaches like deep learning for a better baseline performance and another good starting point for tuning.

References

Agarwal, A., Biadsy, F., & Mckeown, K. R. (2009). Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams. In Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics (pp. 24–32).

Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment analysis of Twitter data. In Proceedings of the workshop on language in social media (LSM 2011) (pp. 30–38).

Gupta, S. (2018, March). Applications of sentiment analysis in business. Towards Data Science. Retrieved from https://towardsdatascience.com/applications-of-sentiment-analysis-in-business-b7e660e3de69

Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 168–177).

Kim, S.-M., & Hovy, E. (2004). Determining the sentiment of opinions. In Proceedings of the 20th international conference on computational linguistics (p. 1367).

Lexalytics. (2019, December). Top applications of sentiment analysis & text analytics. Retrieved from https://www.lexalytics.com/applications

Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.

Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th annual meeting of the Association for Computational Linguistics: Human language technologies (pp. 142–150). Portland, Oregon, USA: Association for Computational Linguistics.

Maas, A. L. (2011, May). Large movie review dataset. Retrieved from http://ai.stanford.edu/~amaas/data/sentiment/

Olteanu, A. (2017, April). Whose ratings should you trust? IMDB, Rotten Tomatoes, Metacritic, or Fandango? freeCodeCamp. Retrieved from https://www.freecodecamp.org/news/whose-reviews-should-you-trust-imdb-rotten-tomatoes-metacritic-or-fandango-7d1010c6cf19/

Pang, B., & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on Association for Computational Linguistics (p. 271).

Pang, B., Lee, L., et al. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2), 1–135.

PythonforEngineers. (2019, November). Build a sentiment analysis app with movie reviews. Retrieved from https://www.pythonforengineers.com/build-a-sentiment-analysis-app-with-movie-reviews/

Turney, P. D. (2002). Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on Association for Computational Linguistics (pp. 417–424).

Wilson, T., Wiebe, J., & Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of human language technology conference and conference on empirical methods in natural language processing.
