
A Comparison of Domain-based Polarity Estimation using different Word Embeddings

Aitor García-Pablos, Montse Cuadros, German Rigau
Vicomtech-IK4, Vicomtech-IK4, IXA research group
[email protected], [email protected], [email protected]

Abstract
A key point in Sentiment Analysis is to determine the polarity of the sentiment implied by a certain word or expression. In basic Sentiment Analysis systems, this sentiment polarity of the words is accounted and weighted in different ways to provide a degree of positivity/negativity. Currently words are also modelled as continuous dense vectors, known as word embeddings, which seem to encode interesting semantic knowledge. With regard to Sentiment Analysis, word embeddings are used as features in more complex supervised classification systems to obtain sentiment classifiers. In this paper we compare a set of existing sentiment lexicons and sentiment lexicon generation techniques. We also show a simple but effective technique to calculate a word polarity value for each word in a domain using existing continuous word embedding generation methods. Further, we show that word embeddings calculated on an in-domain corpus capture the polarity better than those calculated on a general-domain corpus.

Keywords: sentiment lexicon, sentiment analysis

1. Introduction
A key point in Sentiment Analysis is to determine the polarity of the sentiment implied by a certain word or expression (Taboada et al., 2011). In basic Sentiment Analysis systems this sentiment polarity of the words is accounted and weighted in different ways to provide a degree of positivity/negativity of, for example, a customer review. In more sophisticated systems, word polarity is employed as an additional feature for machine learning algorithms. This polarity value can be a categorical value (e.g. positive/neutral/negative) or a real value within a range (e.g. from -1.0 to +1.0), and can be plugged into supervised classification algorithms together with other lexical and semantic features to help discriminate the overall polarity of an expression or a sentence. Currently words are also modelled as continuous dense vectors, known as word embeddings, which seem to encode interesting semantic knowledge. The word vectors are usually computed using very big corpora of texts, like the English Wikipedia. One of the best known systems to obtain a dense continuous representation of words is Word2Vec (Mikolov et al., 2013c). But Word2Vec is not the only one, and in fact there are already many variants and many researchers working on different kinds of word embeddings (Le and Mikolov, 2014; Iacobacci et al., 2015; Ji et al., 2015; Hill et al., 2014; Schwartz et al., 2014). With regard to Sentiment Analysis, word embeddings are used as features in more complex supervised classification systems to obtain very precise sentiment classifiers (Tang et al., 2014a; Socher et al., 2013).
In this paper we compare a set of existing static sentiment lexicons and dynamic sentiment lexicon generation techniques. We also show a simple but competitive technique to calculate a word polarity value for each word in a domain using continuous word embeddings. Our objective is to see if word embeddings calculated on an in-domain corpus can be directly used to obtain a polarity measure of the domain vocabulary with no additional supervision. Further, we want to see to what extent word embeddings calculated on an in-domain corpus improve on those calculated on a general-domain corpus, and analyse the pros and cons of each compared method. The paper is structured as follows. Section 2. reviews several works related to the generation of sentiment lexicons, providing the context for the rest of the paper. Section 3. describes the lexicons and methods that will be used to make the comparison, focusing on the ones using continuous word representations. Section 4. presents the datasets used to generate some of the lexicons. Section 5. describes the experiments to compare the different approaches and discusses them. Finally, the last section shows the conclusions.

2. Related Work
Sentiment analysis refers to the use of NLP techniques to identify and extract subjective information in digital texts like customer reviews about products or services. Due to the growth of specialized websites that allow users to post comments and opinions, Sentiment Analysis has been a very prolific research area during the last decade (Pang and Lee, 2008; Zhang and Liu, 2014). A key point in Sentiment Analysis is to determine the polarity of the sentiment implied by a certain word or expression (Taboada et al., 2011). Usually this polarity is also known as Semantic Orientation (SO). SO indicates whether a word or an expression states a positive or a negative sentiment, and can be a continuous value in a range from very positive to very negative, or a categorical value (like the common 5-star rating used to rate products). Further, the SO of a word is a useful feature to be used within a more complex Sentiment Analysis system like machine learning algorithms (Lin et al., 2009; Jaggi et al., 2014; Tang et al., 2014a).
A collection of words and their respective SO is known as a sentiment lexicon. Sentiment lexicons can be constructed manually, by human experts that estimate the corresponding SO value for each word of interest. Obviously, this approach is usually too time consuming for obtaining a good coverage and difficult to maintain when the vocabulary evolves or a new language or domain must be analyzed.

Therefore it is necessary to devise a method to automate the process as much as possible.
Some systems employ existing lexical resources like WordNet (Fellbaum, 1998) to bootstrap a list of positive and negative words via different methods. In (Esuli and Sebastiani, 2006) the authors employ the glosses that accompany each WordNet synset[1] to perform a semi-supervised synset classification. The result consists of three scores per synset: positivity, negativity and objectivity. In (Baccianella et al., 2010) version 3.0 of SentiWordNet is introduced with improvements like a random walk approach in the WordNet graph to calculate the SO of the synsets. In (Agerri and Garcia, 2009) another system is introduced, Q-WordNet, which expands the polarities of the WordNet synsets using lexical relations like synonymy. In (Guerini et al., 2013) the authors propose and compare different approaches based on SentiWordNet to improve the polarity determination of the synsets. Other authors try different bootstrapping approaches and evaluate them on WordNets of different languages (Maks et al., 2014; Vicente et al., 2014). A problem with the approaches based on resources like WordNet is that they rely on the availability and quality of those resources for a new language. Being a general resource, WordNet also fails to capture domain-dependent semantic orientations. Likewise, other approaches using common dictionaries do not take into account the shifts between domains (Ramos and Marques, 2005).
Other methods calculate the SO of the words directly from text. In (Hatzivassiloglou and McKeown, 1997) the authors model the corpus as a graph of adjectives joined by conjunctions. Then, they generate partitions on the graph based on some intuitions, like that two adjectives joined by "and" will tend to share the same orientation while two adjectives joined by "but" will have opposite orientations. On the other hand, in (Turney, 2002) the SO is obtained calculating the Pointwise Mutual Information (PMI) between each word and a very positive word (like "excellent") and a very negative word (like "poor") in a corpus. The result is a continuous numeric value between -1 and +1. These ideas of bootstrapping SO from a corpus have been further explored and refined in more recent works (Popescu and Etzioni, 2005; Brody and Elhadad, 2010; Qiu et al., 2011).

2.1. Continuous word representations
Continuous word representations (also vector representations or word embeddings) represent each word by an n-dimensional vector. Usually, these vectors encapsulate some semantic information derived from the corpus used and the process applied to derive the vector. Some of the best known techniques for deriving vector representations of words and documents are Latent Semantic Indexing (Dumais et al., 1995) and Latent Semantic Analysis (Dumais, 2004). Currently it is becoming very common in the literature to employ Neural Networks to compute word embeddings (Bengio et al., 2003; Turian et al., 2010; Huang et al., 2012; Mikolov et al., 2013c). Word embeddings show interesting semantic properties to find related concepts, word analogies, or to be used as features in conventional machine learning algorithms (Socher et al., 2013; Tang et al., 2014b; Pavlopoulos and Androutsopoulos, 2014). Word embeddings are also explored in tasks such as deriving adjectival scales (Kim, 2013).

3. Lexicons and methods
Our aim is to compare different existing sentiment lexicons and methods to find out if continuous word embeddings can be used to easily compute accurate sentiment polarity over the words of a domain, and under which conditions. The experiments are carried out on two specific domains, in particular restaurant and laptop reviews.

3.1. General lexicons
The General Inquirer (GI) (Stone et al., 1966) is a very well-known manually crafted lexicon that includes the polarity of many English words. GI contains about 2000 positive and negative words. It has been used in many different research works over the past years.
On the other hand, we have also used the Bing Liu sentiment lexicon (Hu and Liu, 2004). According to its web page[2] it has been compiled and incremented over many years. It contains around 6800 words with an assigned categorical polarity (positive or negative).

3.2. WordNet based lexicons
SentiWordNet assigns scores to each WordNet synset[3] (Esuli and Sebastiani, 2006). SentiWordNet polarity consists of three scores per synset: positivity, negativity and objectivity. In (Baccianella et al., 2010) version 3.0 of SentiWordNet is introduced with improvements like a random walk approach in the graph of WordNet. We have also used the Q-WordNet as Personalized PageRanking Vector (QWN-PPV), which propagates and ranks polarity values on the WordNet graph starting from a few seed words (Vicente et al., 2014).

3.3. PMI based lexicons
Following the work of (Turney, 2002), we have also derived some polarity lexicons from a domain corpus using Pointwise Mutual Information (PMI). In few words, PMI is used as a measure of relatedness between two events, in this case the co-occurrence of words with known positive contexts. In the original Turney's work the value of co-occurrence was measured counting hits in a web search engine (the extinct Altavista) between words and the seed word "excellent" (for positives) and the seed word "poor" (for negatives).

SO(w) = PMI(w, POS) − PMI(w, NEG)   (1)

PMI(w1, w2) = log( p(w1, w2) / (p(w1) × p(w2)) )   (2)

Firstly, we have borrowed the lexicon generated in (Kiritchenko et al., 2014) (named NRC CANADA in the experiment tables), which was generated computing the PMI between each word and positive reviews (4 or 5 stars in a 5-star rating) and negative reviews (1 or 2 stars), for both the restaurant and laptop review datasets. Because it uses the user ratings, this approach is supervised. As a counterpart we have calculated another PMI based lexicon, in which we employ the co-occurrence of words within a five-word window with the word excellent (analogously with the word terrible for negatives) to calculate the PMI score. This is potentially less accurate but requires no supervised information apart from the two seed words.

[1] A WordNet synset is a set of synonym words that denote the same concept.
[2] https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon
[3] A WordNet synset is a set of synonym words that denote the same concept.
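To make the window-based variant more concrete, the following is a minimal sketch, not the implementation used in the paper, of how such a lexicon could be derived from a tokenised review corpus. The function names, the handling of unseen pairs and the simple relative-frequency probability estimates are our own illustrative choices.

```python
import math
from collections import Counter


def pmi_window_lexicon(docs, pos_seed="excellent", neg_seed="terrible", window=5):
    """Estimate SO(w) = PMI(w, pos_seed) - PMI(w, neg_seed) using co-occurrence
    counts inside a +/- `window` token context. `docs` is a list of token lists."""
    word_count = Counter()   # unigram counts, used for p(w)
    pair_count = Counter()   # counts of (context word, seed) co-occurrences
    total = 0
    for tokens in docs:
        word_count.update(tokens)
        total += len(tokens)
        for i, tok in enumerate(tokens):
            if tok in (pos_seed, neg_seed):
                left, right = max(0, i - window), i + window + 1
                for ctx in tokens[left:i] + tokens[i + 1:right]:
                    pair_count[(ctx, tok)] += 1

    def pmi(w, seed):
        joint = pair_count[(w, seed)]
        if joint == 0:
            return 0.0  # word never seen near the seed: contributes nothing
        # Rough relative-frequency estimates of p(w1, w2), p(w1) and p(w2).
        p_joint = joint / total
        p_w = word_count[w] / total
        p_seed = word_count[seed] / total
        return math.log(p_joint / (p_w * p_seed))

    return {w: pmi(w, pos_seed) - pmi(w, neg_seed) for w in word_count}
```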

3.4. Word embedding lexicons
We have applied the popular Word2Vec (Mikolov et al., 2013a) and the Stanford GloVe system (Pennington and Manning, 2014) to calculate word embeddings. We have computed three models for each system: one on a restaurant reviews dataset, another on a laptop reviews dataset and a third one on a much bigger general-domain dataset (consisting of the first billion characters from the English Wikipedia[4]).
Notice that the employed general-domain dataset is much bigger than the domain-based datasets. The general-domain dataset is a 700MB raw text file after cleaning it, while the restaurant and laptop datasets only weigh 28 and 40 MB respectively. General-domain datasets, like the whole Wikipedia data or news datasets from online newspapers, capture general syntactic and semantic regularities very well. However, to capture in-domain word polarities a smaller domain-focused dataset might work better (García-Pablos et al., 2015). Also notice that, at the time of writing this paper, many different techniques to calculate word embeddings are appearing that could work better than plain Word2Vec (Li and Jurafsky, 2015; Rothe et al., 2016), but due to their recent apparition they are not employed in these experiments.

Restaurant dataset computed similarities
excellent      horrible     slow
outstanding    terrible     spotty
fantastic      awful        inattentive
amazing        sucked       uncaring
exceptional    horrid       painfully
awesome        poor         neglectful
top notch      sucks        lax
great          atrocious    slower
superb         lousy        inconsistent
incredible     horrific     uneven
wonderful      yuck         iffy

Table 1: Most similar words in the word embedding space computed on the restaurant reviews dataset, according to the cosine similarity, for the words excellent, horrible and slow

Laptops dataset computed similarities
excellent      horrible        slow
outstanding    terrible        counterintuitive
exceptional    deplorable      painfully
awesome        awful           unstable
incredible     abysmal         sluggish
excelent       poor            choppy
amazing        horrid          fast
excellant      lousy           buggy
fantastic      whining         slows
terrific       horrendous      frustratingly
superb         unprofessional  flaky

Table 2: Most similar words in the word embedding space computed on the laptop reviews dataset, according to the cosine similarity, for the words excellent, horrible and slow

In Table 1 and Table 2 it can be observed how the word embeddings computed for the restaurant and laptop domains seem to capture polarity quite accurately just by using word similarity. This is because the employed datasets are customer reviews of each domain, and the kind of content present in customer reviews helps modelling the meaning and polarity of the words (adjectives in this case). The tables show the top similarities according to the cosine similarity between the word vectors computed by each model. Words like excellent and horrible are domain independent, and the most similar words are quite equivalent for both domains. But for the third word, slow, the differences between both domains are more evident. The word slow in the context of restaurants is usually employed to describe the service quality (when judging waiters' and waitresses' serving speed and skills), while in the context of laptops it refers to the performance of hardware and/or software. Another advantage versus a general-domain computed model is that domain-based models will contain domain jargon words or even commonly misspelled words (as long as they appear often enough in the corresponding dataset). A general-domain dataset is less likely to cover all the vocabulary present in any possible domain.
We have used a simple formula to assign a polarity to the words in the vocabulary, using a single positive seed word and a single negative seed word.

pol(w) = sim(w, POS) − sim(w, NEG)   (3)

In the equation, POS is the seed positive word for the domain represented by its corresponding word vector, and analogously NEG is the vector representation of the seed negative word. In the experiments we have used seed words with a very clear context- and domain-independent polarity, in particular excellent and horrible as positive and negative seeds respectively. sim stands for the cosine similarity between word vectors. Note that this simple formula provides a real number, which in a sense gives a continuous value for the polarity. Obtaining a continuous value for the polarity could be an interesting property to measure the strength of the sentiment, but for now we simply convert the polarity value to a binary label: positive if the value is greater than or equal to zero, and negative otherwise. This makes the comparison with the other examined lexicons easier.

[4] Obtained from http://mattmahoney.net/dc/enwik9.zip
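As an illustration of Equation 3, the following is a small sketch using plain NumPy. The dictionary of word vectors (which would come from the Word2Vec or GloVe models described above) and the function names are assumptions made for the example, not part of the original code.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two dense word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def polarity(word, vectors, pos_seed="excellent", neg_seed="horrible"):
    """pol(w) = sim(w, POS) - sim(w, NEG); the sign gives the binary label."""
    w = vectors[word]
    score = cosine(w, vectors[pos_seed]) - cosine(w, vectors[neg_seed])
    return score, ("positive" if score >= 0 else "negative")
```

For instance, with `vectors` holding the domain embeddings, `polarity("sluggish", vectors)` would return a negative score in the laptop domain, mirroring the similarities shown in Table 2.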

4. Domain corpora
In order to generate the lexicons with the methods that require an in-domain corpus (i.e. the PMI based one, Word2Vec and GloVe) we have used corpora from two different domains. The first corpus consists of customer reviews about restaurants. It is a 100k review subset about restaurants obtained from the Yelp dataset[5] (henceforth Yelp-restaurants). We have also used a second corpus of customer reviews about laptops. This corpus contains a subset of about 100k reviews from the Amazon electronic device review dataset from the Stanford Network Analysis Project (SNAP)[6], after selecting reviews that contain the word "laptop" (henceforth Amazon-laptops).
The corpora have been processed removing all non-content words (i.e. everything except adjectives, adverbs, verbs and nouns is removed), and words have been lowercased. For other tasks like word-analogy discovery (Mikolov et al., 2013d) or machine translation (Mikolov et al., 2013b) every word is kept (even those that are usually considered stop-words), but as our in-domain datasets are of reduced size[7] we remove the words that are less informative to model the polarity, like pronouns, articles or prepositions. After that, both corpora have been used to feed the target methods, obtaining their respective domain-aware results: in the case of the PMI based lexicon a score, and in the case of Word2Vec and GloVe a vector representation of the words for each domain. In the case of Word2Vec we have employed the implementation contained in the Apache Spark MLlib library[8]. This Word2Vec implementation is based on the Word2Vec Skip-gram architecture, and we have kept the default hyper-parameters and configuration[9].

[5] http://www.yelp.com/dataset_challenge
[6] http://snap.stanford.edu/data/web-Amazon.html
[7] Compared to the billion-word datasets employed in other works.
[8] http://spark.apache.org/mllib/
[9] Please refer to the Apache Spark MLlib Word2Vec documentation to see what the default parameters are.
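The following PySpark sketch illustrates, under our own simplifying assumptions, the kind of pipeline just described: the input file name is a placeholder, the tokenisation is a naive whitespace split, and the part-of-speech filtering mentioned above (which would require a POS tagger) is omitted for brevity.

```python
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

# Hypothetical input path; the real corpora are the Yelp-restaurants and
# Amazon-laptops review subsets described in Section 4.
sc = SparkContext(appName="domain-word2vec")
reviews = sc.textFile("yelp_restaurants_reviews.txt")

# Lowercase and split on whitespace; the paper additionally removes all
# non-content words (keeping adjectives, adverbs, verbs and nouns).
tokenised = reviews.map(lambda line: line.lower().split())

# Spark MLlib's Word2Vec implements the skip-gram architecture;
# default hyper-parameters are kept, as in the paper.
model = Word2Vec().fit(tokenised)

# Nearest neighbours of a seed word, by cosine similarity.
print(model.findSynonyms("excellent", 10))
```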
5. Experiments
We have performed two different evaluations. On the one hand, we have used the domain corpora (Yelp-restaurants and Amazon-laptops) to automatically obtain a list of domain adjectives ranked by frequency. From that list we have manually selected the first 200 adjectives with context-independent positive or negative polarity for each domain[10]. Then we have manually assigned a polarity label (positive or negative) to each of the selected adjectives. From now on we will refer to these annotated adjectives as the restaurant-adjectives-test-set and the laptop-adjectives-test-set respectively. The restaurant-adjectives-test-set contains 119 positive adjectives and 81 negative adjectives, while the laptop-adjectives-test-set contains 127 positives and 73 negatives[11].
On the other hand, we have used the SemEval 2015 task 12 datasets[12]. The first dataset contains 254 annotated reviews about restaurants (a total of 1,315 sentences). The second dataset contains 277 annotated reviews about laptops (a total of 1,739 sentences).

5.1. Manual gold-lexicon based experiments
On the restaurant-adjectives-test-set and laptop-adjectives-test-set we measure the polarity accuracy (when a lexicon assigns the correct polarity) and the coverage (when a lexicon contains a polarity for the requested word). Tables 3 and 4 show the results for restaurants and laptops respectively. In the tables, the accuracy measures how many word polarities have been correctly tagged from the ones present in each lexicon (i.e. out-of-vocabulary words are not taken as errors). The coverage measures how many words were present in each lexicon regardless of the tagged polarity.

RESTAURANTS 200 ADJ GOLD LEXICON
Name              Metric   Posit.  Neg.   Overall
General Inquirer  Acc.     0.935   0.944  0.939
                  Cover.   0.521   0.444  0.490
BingLiu           Acc.     0.935   0.979  0.952
                  Cover.   0.647   0.580  0.620
SentiWordNet      Acc.     0.725   0.746  0.733
                  Cover.   0.857   0.778  0.825
QWN-PPV           Acc.     0.821   0.609  0.746
                  Cover.   0.706   0.568  0.650
NRC CANADA        Acc.     0.933   0.753  0.860
                  Cover.   1.000   1.000  1.000
PMI WINDOW 5      Acc.     0.917   0.655  0.821
                  Cover.   0.807   0.679  0.755
W2V DOMAIN        Acc.     0.849   0.827  0.840
                  Cover.   1.000   1.000  1.000
W2V GENERAL       Acc.     0.491   0.400  0.454
                  Cover.   0.958   0.988  0.970
GloVe DOMAIN      Acc.     0.866   0.802  0.840
                  Cover.   1.000   1.000  1.000
GloVe GENERAL     Acc.     0.754   0.588  0.686
                  Cover.   0.958   0.988  0.970

Table 3: Restaurants 200 adjectives lexicon results

[10] By context-independent polarity we refer to those adjectives with unambiguous polarity not depending on the domain aspect they are modifying (e.g. superb is likely to be always positive, while small could be positive or negative depending on the context).
[11] Available at https://dl.dropboxusercontent.com/u/7852658/files/restaur_adjs_test.txt and https://dl.dropboxusercontent.com/u/7852658/files/laptops_adjs_test.txt respectively.
[12] http://alt.qcri.org/semeval2015/task12/
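The accuracy and coverage reported in Tables 3 and 4 can be computed as in the following illustrative helper; the function and variable names are ours, not part of the original evaluation code.

```python
def evaluate_lexicon(lexicon, gold):
    """`lexicon` maps word -> 'positive'/'negative' (missing words are OOV);
    `gold` maps word -> gold label for the 200-adjective test set.
    Accuracy is computed only over covered words (OOV words are not errors);
    coverage is the fraction of gold words found in the lexicon."""
    covered = [w for w in gold if w in lexicon]
    correct = sum(1 for w in covered if lexicon[w] == gold[w])
    accuracy = correct / len(covered) if covered else 0.0
    coverage = len(covered) / len(gold)
    return accuracy, coverage
```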

LAPTOPS 200 ADJ GOLD LEXICON
Name              Metric   Posit.  Neg.   Overall
General Inquirer  Acc.     0.965   0.971  0.967
                  Cover.   0.677   0.479  0.605
BingLiu           Acc.     0.971   0.984  0.976
                  Cover.   0.827   0.863  0.840
SentiWordNet      Acc.     0.795   0.833  0.809
                  Cover.   0.921   0.904  0.915
QWN-PPV           Acc.     0.895   0.661  0.814
                  Cover.   0.829   0.767  0.805
NRC CANADA        Acc.     0.890   0.712  0.825
                  Cover.   1.000   1.000  1.000
PMI WINDOW 5      Acc.     0.850   0.395  0.720
                  Cover.   0.843   0.589  0.750
W2V DOMAIN        Acc.     0.874   0.740  0.825
                  Cover.   1.000   1.000  1.000
W2V GENERAL       Acc.     0.540   0.575  0.553
                  Cover.   0.992   1.000  0.995
GloVe DOMAIN      Acc.     0.890   0.740  0.835
                  Cover.   1.000   1.000  1.000
GloVe GENERAL     Acc.     0.849   0.589  0.754
                  Cover.   0.992   1.000  0.995

Table 4: Laptops 200 adjectives lexicon results

RESTAURANTS (SEMEVAL 2015 DATASET)
Name        Class    Prec.   Rec.    F1      Acc.
GI          posit.   0.783   0.937   0.853   0.760
            neg.     0.610   0.335   0.432
BingLiu     posit.   0.810   0.958   0.878   0.799
            neg.     0.731   0.431   0.540
SWN         posit.   0.790   0.896   0.840   0.745
            neg.     0.539   0.394   0.455
QWN-PPV     posit.   0.751   0.954   0.841   0.733
            neg.     0.522   0.171   0.257
NRC CAN.    posit.   0.816   0.927   0.868   0.786
            neg.     0.648   0.471   0.546
PMI W 5     posit.   0.811   0.842   0.826   0.732
            neg.     0.493   0.503   0.498
W2V DOM     posit.   0.848   0.874   0.861   0.781
            neg.     0.582   0.605   0.593
W2V GEN     posit.   0.708   0.467   0.563   0.457
            neg.     0.228   0.488   0.311
GloVe DOM   posit.   0.792   0.940   0.860   0.770
            neg.     0.633   0.364   0.463
GloVe GEN   posit.   0.747   0.900   0.816   0.703
            neg.     0.404   0.210   0.277

Table 5: SemEval 2015 restaurants results

The experiment shows that the static lexicons like GI and Liu's assign polarities with a very high precision, but they suffer from lower coverage. A similar behaviour can be observed for the polarities based on WordNet. On the other hand, the lexicons calculated directly on the domain datasets are less accurate, but they have much higher coverage. The NRC CANADA lexicon achieves a very good result, but it must be noted that it employs supervised information. The PMI based on windows achieves a quite good result despite its simplicity, but it does not cover all the words (i.e. some words do not co-occur in the same context). The lexicons based on word embeddings calculated on the domain achieve 100% coverage, because they are modelling the whole vocabulary, and offer a reasonable precision. Word embeddings (both Word2Vec and GloVe) calculated on a general-domain corpus still cover a lot of the adjectives, since they have been trained on a very large corpus, but they show a lower accuracy capturing the polarity of the words.

5.2. SemEval 2015 datasets based experiments
The SemEval 2015 based datasets consist of quintuples of aspect-term, entity-attribute, polarity, and starting and ending position of the aspect-term. We are only interested in using the polarity slots, which refer to the polarity of a particular aspect of each sentence (not to the overall sentence polarity). We have applied the different lexicons to infer the polarity of each sentence, and then we have compared them to the gold annotations that come with the datasets. The process of assigning a polarity to each sentence using the different polarity lexicons is the following (a minimal code sketch of this procedure is given below):

• Only adjectives and verbs (e.g. hate, recommend) are taken into account to calculate polarity (common verbs like be and have are omitted).

• Negation words are taken into account to reverse the polarity of the subsequent word, in particular: no, neither, nothing, not, n't, none, any, never, without.

• The number of positive and negative words according to each lexicon is counted. If the positives count is greater than or equal to the negatives count, the polarity of all polarity slots of the sentence is assigned as positive; and negative otherwise.

Notice that this is a very naive polarity annotation process. It is not intended to obtain good results but to compare the lexicons against real sentences using the same setting. That is why, in general, the results are lower than in the experiment with the bare adjective lists. This naive polarity annotation process is repeated for every polarity lexicon so the different lexicons and methods can be compared under the same conditions on real review test sets.
Table 5 shows the results for the restaurants dataset while Table 6 shows the results for the laptops dataset. These results have been calculated using the evaluation script provided by the SemEval 2015 organizers during the competition[13]. The results show that there is no clear winner, and the best performing lexicon varies depending on the domain. Some lexicons seem to be more accurate at capturing positive words and others seem to have a better recall. It must be noted that in this case what is being annotated are whole sentences of actual reviews, so there are many factors involved apart from the mere polarity of single words. Also in this case the domain-based word embeddings capture the polarity better than their general-domain counterparts.

[13] Available at http://alt.qcri.org/semeval2015/task12/index.php?id=data-and-tools
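The sketch below is our own minimal rendering of that naive annotation procedure; it assumes the sentence tokens have already been filtered to the adjectives and verbs mentioned in the first step, and that the tokeniser splits contractions so that "n't" appears as a separate token.

```python
NEGATORS = {"no", "neither", "nothing", "not", "n't", "none", "any", "never", "without"}


def sentence_polarity(tokens, lexicon):
    """Count positive and negative content words according to `lexicon`
    (word -> 'positive'/'negative'), flipping the polarity of a word that
    directly follows a negator; ties go to positive, as described above."""
    pos, neg = 0, 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        label = lexicon.get(tok)  # None for out-of-vocabulary words
        if label is not None:
            if negate:
                label = "negative" if label == "positive" else "positive"
            if label == "positive":
                pos += 1
            else:
                neg += 1
        negate = False
    return "positive" if pos >= neg else "negative"
```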

LAPTOPS (SEMEVAL 2015 DATASET)
Name        Class    Prec.   Rec.    F1      Acc.
GI          posit.   0.631   0.939   0.755   0.651
            neg.     0.751   0.328   0.456
BingLiu     posit.   0.640   0.960   0.768   0.669
            neg.     0.821   0.343   0.484
SWN         posit.   0.630   0.903   0.742   0.638
            neg.     0.671   0.345   0.456
QWN-PPV     posit.   0.605   0.941   0.736   0.614
            neg.     0.675   0.228   0.341
NRC CAN.    posit.   0.653   0.922   0.764   0.673
            neg.     0.750   0.409   0.529
PMI W 5     posit.   0.622   0.841   0.715   0.611
            neg.     0.580   0.366   0.449
W2V DOM     posit.   0.728   0.825   0.774   0.708
            neg.     0.673   0.636   0.654
W2V GEN     posit.   0.533   0.443   0.484   0.441
            neg.     0.362   0.500   0.420
GloVe DOM   posit.   0.590   0.971   0.734   0.604
            neg.     0.762   0.159   0.263
GloVe GEN   posit.   0.571   0.932   0.708   0.567
            neg.     0.528   0.120   0.196

Table 6: SemEval 2015 laptops results

6. Conclusions
In this work we have compared different existing lexicons and methods to obtain a polarity value for words in a particular domain. We have shown a simple yet functional way to quickly get a polarity value only with unlabelled texts using continuous word representations. It is similar in essence to other existing methods that require co-occurrence computations among words, but the semantic properties of the continuous word embeddings do not require words to co-occur, and it is easier to compute. In addition, we have shown that the similarity of sentiment-bearing words (mainly adjectives) is better modelled using a smaller in-domain dataset rather than a bigger general dataset. We have observed a similar behaviour in preliminary experiments for other languages such as Spanish, French or Italian. An obvious advantage is that, provided enough unlabelled domain data, the word embeddings and polarity scores can be easily obtained for any language. As further work, we would like to experiment with these in-domain calculated word embeddings (and other variants) within more complex sentiment analysis systems to see if they improve the performance. Also, many machine learning based sentiment analysis approaches in the literature already employ word embeddings as input features, usually computed against very big general corpora. It would be interesting to see how general-domain word embeddings, which provide general language knowledge, and in-domain calculated word embeddings, which provide domain-aware information, can be combined to improve the results of such systems. Also we would like to explore if approaches with a more weakly supervised nature, like topic modelling and Latent Dirichlet Allocation based systems, that try to jointly model the polarity and other facets of documents, could benefit from the information coming from in-domain word embeddings.

7. Acknowledgements
This work has been supported by Vicomtech-IK4 and partially funded by the TUNER project (TIN2015-65308-C5-1-R).

8. Bibliographical References
Agerri, R. and Garcia, A. (2009). Q-WordNet: Extracting Polarity from WordNet Senses. Seventh Conference on International Language Resources and Evaluation, Malta.
Baccianella, S., Esuli, A., and Sebastiani, F. (2010). SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), pages 2200–2204.
Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. (2003). A Neural Probabilistic Language Model. The Journal of Machine Learning Research, 3:1137–1155.
Brody, S. and Elhadad, N. (2010). An unsupervised aspect-sentiment model for online reviews. The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 804–812.
Dumais, S., Furnas, G., Landauer, T., Deerwester, S., et al. (1995). Latent semantic indexing. In Proceedings of the Text Retrieval Conference.
Dumais, S. T. (2004). Latent semantic analysis. Annual Review of Information Science and Technology, 38(1):188–230.
Esuli, A. and Sebastiani, F. (2006). SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. Proceedings of LREC 2006, pages 417–422.
Fellbaum, C. (1998). WordNet. Wiley Online Library.
García-Pablos, A., Cuadros, M., and Rigau, G. (2015). Unsupervised word polarity tagging by exploiting continuous word representations. Procesamiento del Lenguaje Natural, 55:127–134.
Guerini, M., Gatti, L., and Turchi, M. (2013). Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet. EMNLP, pages 1259–1269.
Hatzivassiloglou, V. and McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics.

Hill, F., Cho, K., Jean, S., Devin, C., and Bengio, Y. (2014). Embedding Word Similarity with Neural Machine Translation. pages 1–12.
Hu, M. and Liu, B. (2004). Mining opinion features in customer reviews. AAAI.
Huang, E. H., Socher, R., Manning, C. D., and Ng, A. (2012). Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pages 873–882.
Iacobacci, I., Pilehvar, M. T., and Navigli, R. (2015). SensEmbed: Learning Sense Embeddings for Word and Relational Similarity. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 95–105.
Jaggi, M., Zurich, E. T. H., and Cieliebak, M. (2014). Swiss-Chocolate: Sentiment Detection using Sparse SVMs and Part-Of-Speech n-Grams. SemEval, pages 601–604.
Ji, S., Yun, H., Yanardag, P., Matsushima, S., and Vishwanathan, S. V. N. (2015). WordRank: Learning Word Embeddings via Robust Ranking. pages 1–12.
Kim, J.-k. (2013). Deriving adjectival scales from continuous space word representations. EMNLP, pages 1625–1630.
Kiritchenko, S., Zhu, X., Cherry, C., and Mohammad, S. M. (2014). NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews. SemEval, pages 437–442.
Le, Q. and Mikolov, T. (2014). Distributed Representations of Sentences and Documents. International Conference on Machine Learning - ICML 2014, 32:1188–1196.
Li, J. and Jurafsky, D. (2015). Do multi-sense embeddings improve natural language understanding? arXiv preprint arXiv:1506.01070.
Lin, C., Road, N. P., and Ex, E. (2009). Joint Sentiment/Topic Model for Sentiment Analysis. CIKM, pages 375–384.
Maks, I., Izquierdo, R., Frontini, F., Agerri, R., Azpeitia, A., and Vossen, P. (2014). Generating Polarity Lexicons with WordNet propagation in five languages. pages 1155–1161.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781, pages 1–12, January.
Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting Similarities among Languages for Machine Translation. arXiv preprint arXiv:1309.4168v1, pages 1–10.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed Representations of Words and Phrases and their Compositionality. arXiv preprint, pages 1–9, October.
Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. Proceedings of NAACL-HLT, pages 746–751.
Pang, B. and Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135.
Pavlopoulos, J. and Androutsopoulos, I. (2014). Aspect term extraction for sentiment analysis: New datasets, new evaluation measures and an improved unsupervised method. Proceedings of LASM-EACL, pages 44–52.
Pennington, J. and Manning, C. (2014). GloVe: Global Vectors for Word Representation. EMNLP 2014.
Popescu, A. and Etzioni, O. (2005). Extracting product features and opinions from reviews. Natural Language Processing and Text Mining, pages 339–346.
Qiu, G., Liu, B., Bu, J., and Chen, C. (2011). Opinion word expansion and target extraction through double propagation. Computational Linguistics, (July 2010).
Ramos, C. and Marques, N. C. (2005). Determining the Polarity of Words through a Common Online Dictionary.
Rothe, S., Ebert, S., and Schütze, H. (2016). Ultradense word embeddings by orthogonal transformation. arXiv preprint arXiv:1602.07572.
Schwartz, R., Reichart, R., and Rappoport, A. (2014). Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction. page 353.
Socher, R., Perelygin, A., Wu, J., and Chuang, J. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.
Stone, P. J., Dunphy, D. C., and Smith, M. S. (1966). The General Inquirer: A Computer Approach to Content Analysis.
Taboada, M., Brooke, J., Tofiloski, M., Voll, K., and Stede, M. (2011). Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics, 37:267–307.
Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., and Qin, B. (2014a). Learning Sentiment-Specific Word Embedding. ACL, pages 1555–1565.
Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., and Qin, B. (2014b). Learning sentiment-specific word embedding for sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1555–1565.
Turian, J., Ratinov, L., and Bengio, Y. (2010). Word Representations: A Simple and General Method for Semi-supervised Learning. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394.
Turney, P. D. (2002). Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Computational Linguistics, (July):8.
Vicente, S., Agerri, R., and Rigau, G. (2014). Simple, Robust and (almost) Unsupervised Generation of Polarity Lexicons for Multiple Languages. EACL 2014.
Zhang, L. and Liu, B. (2014). Aspect and entity extraction for opinion mining. In Data Mining and Knowledge Discovery for Big Data, pages 1–40. Springer.
