Refining Word Embeddings for Sentiment Analysis

Liang-Chih Yu (1,3), Jin Wang (2,3,4), K. Robert Lai (2,3) and Xuejie Zhang (4)
1 Department of Information Management, Yuan Ze University, Taiwan
2 Department of Computer Science & Engineering, Yuan Ze University, Taiwan
3 Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taiwan
4 School of Information Science and Engineering, Yunnan University, Yunnan, P.R. China
Contact: [email protected]

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 534–539, Copenhagen, Denmark, September 7–11, 2017. © 2017 Association for Computational Linguistics

Abstract

Word embeddings that can capture semantic and syntactic information from contexts have been extensively used for various natural language processing tasks. However, existing methods for learning context-based word embeddings typically fail to capture sufficient sentiment information. This may result in words with similar vector representations having an opposite sentiment polarity (e.g., good and bad), thus degrading sentiment analysis performance. Therefore, this study proposes a word vector refinement model that can be applied to any pre-trained word vectors (e.g., Word2vec and GloVe). The refinement model is based on adjusting the vector representations of words such that they can be closer to both semantically and sentimentally similar words and further away from sentimentally dissimilar words. Experimental results show that the proposed method can improve conventional word embeddings and outperform previously proposed sentiment embeddings for both binary and fine-grained classification on Stanford Sentiment Treebank (SST).

1 Introduction

Word embeddings are a technique to learn continuous low-dimensional vector space representations of words by leveraging the contextual information from large corpora. Examples include C&W (Collobert and Weston, 2008; Collobert et al., 2011), Word2vec (Mikolov et al., 2013a; 2013b) and GloVe (Pennington et al., 2014). In addition to the contextual information, character-level subwords (Bojanowski et al., 2016) and semantic knowledge resources (Faruqui et al., 2015; Kiela et al., 2015) such as WordNet (Miller, 1995) are also useful information for learning word embeddings. These embeddings have been successfully used for various natural language processing tasks.

In general, existing word embeddings are semantically oriented. They can capture semantic and syntactic information from unlabeled data in an unsupervised manner but fail to capture sufficient sentiment information. This makes it difficult to directly apply existing word embeddings to sentiment analysis. Prior studies have reported that words with similar vector representations (similar contexts) may have opposite sentiment polarities, as in the example of happy-sad mentioned in (Mohammad et al., 2013) and good-bad in (Tang et al., 2016). Composing these word vectors may produce sentence vectors with similar vector representations but opposite sentiment polarities (e.g., a sentence containing happy and a sentence containing sad may have similar vector representations). Building on such ambiguous vectors will affect sentiment classification performance.

To enhance the performance of distinguishing words with similar vector representations but opposite sentiment polarities, recent studies have suggested learning sentiment embeddings from labeled data in a supervised manner (Maas et al., 2011; Labutov and Lipson, 2013; Lan et al., 2016; Ren et al., 2016; Tang et al., 2016). The common goal of these methods is to capture both semantic/syntactic and sentiment information such that sentimentally similar words have similar vector representations. They typically apply an objective function to optimize word vectors based on the sentiment polarity labels (e.g., positive and negative) given by the training instances. The use of such sentiment embeddings has improved the performance of binary sentiment classification.

This study adopts another strategy to obtain both semantic and sentiment word vectors. Instead of building a new word embedding model from labeled data, we propose a word vector refinement model to refine existing semantically oriented word vectors using sentiment lexicons. That is, the proposed model can be applied to the pre-trained vectors obtained by any word representation learning models (e.g., Word2vec and GloVe) as a post-processing step to adapt the pre-trained vectors to sentiment applications. The refinement model is based on adjusting the pre-trained vector of each affective word in a given sentiment lexicon such that it can be closer to a set of both semantically and sentimentally similar nearest neighbors (i.e., those with the same polarity) and further away from sentimentally dissimilar neighbors (i.e., those with an opposite polarity).

The proposed refinement model is evaluated by examining whether our refined embeddings can improve conventional word embeddings and outperform previously proposed sentiment embeddings. To this end, several deep neural network classifiers that performed well on the Stanford Sentiment Treebank (SST) (Socher et al., 2013) are selected, including convolutional neural networks (CNN) (Kim, 2014), deep averaging network (DAN) (Iyyer et al., 2015) and long-short term memory (LSTM) (Tai et al., 2015; Looks et al., 2017). The conventional word embeddings used in these classifiers are then replaced by our refined versions and previously proposed sentiment embeddings to re-run the classification for performance comparison. The SST is chosen because it can show the effect of using different word embeddings on fine-grained sentiment classification, whereas prior studies only reported binary classification results.

The rest of this paper is organized as follows. Section 2 describes the proposed word vector refinement model. Section 3 presents the evaluation results. Conclusions are drawn in Section 4.

Target word: good (7.89)

Ranked by cosine similarity    Ranked by sentiment score (re-ranking)
great (7.50)                   excellent (7.56)
bad (3.24)                     great (7.50)
terrific (7.12)                fantastic (8.36)
decent (6.27)                  wonderful (7.41)
nice (6.95)                    terrific (7.12)
excellent (7.56)               nice (6.95)
fantastic (8.36)               decent (6.27)
solid (5.65)                   solid (5.65)
lousy (3.14)                   bad (3.24)
wonderful (7.41)               lousy (3.14)

Figure 1: Example of nearest neighbor ranking.

2 Word Vector Refinement

The refinement procedure begins by giving a set of pre-trained word vectors and a sentiment lexicon annotated with real-valued sentiment scores. Our goal is to refine the pre-trained vectors of the affective words in the lexicon such that they can capture both semantic and sentiment information. To accomplish this goal, we first calculate the semantic similarity between each affective word (target word) and the other words in the lexicon based on the cosine distance of their pre-trained vectors, and then select top-k most similar words as the nearest neighbors. These semantically similar nearest neighbors are then re-ranked according to their sentiment scores provided by the lexicon such that the sentimentally similar neighbors can be ranked higher and dissimilar neighbors lower. Finally, the pre-trained vector of the target word is refined to be closer to its semantically and sentimentally similar nearest neighbors and further away from sentimentally dissimilar neighbors. The following subsections provide a detailed description of the nearest neighbor ranking and refinement model.

2.1 Nearest Neighbor Ranking

The sentiment lexicon used in this study is the extended version of Affective Norms of English Words (E-ANEW) (Warriner et al., 2013). It contains 13,915 words and each word is associated with a real-valued score in [1, 9] for the dimensions of valence, arousal and dominance. The valence represents the degree of positive and negative sentiment, where values of 1, 5 and 9 respectively denote most negative, neutral and most positive sentiment. In Fig. 1, good has a valence score of 7.89, which is greater than 5, and thus can be considered positive. Conversely, bad has a valence score of 3.24 and is thus negative. In addition to the E-ANEW, other lexicons such as SentiWordNet (Esuli and Fabrizio, 2006), SoCal (Taboada et al., 2011), SentiStrength (Thelwall et al., 2012), Vader (Hutto et al., 2014), ANTUSD (Wang and Ku, 2016) and SCL-NMA (Kiritchenko and Mohammad, 2016) also provide real-valued sentiment intensity or strength scores like the valence scores.

For each target word to be refined, the top-k semantically similar nearest neighbors are first selected and ranked in descending order of their cosine similarities. In Fig. 1, the left ranked list shows the top 10 nearest neighbors for the target word good. The semantically ranked list is then sentimentally re-ranked based on the absolute difference of the valence scores between the target word and the words in the list. A smaller difference indicates that the word is more sentimentally similar to the target word, and thus will be ranked higher. As shown in the right ranked list in Fig. 1, the re-ranking step can rank the sentimentally similar neighbors higher and the dissimilar neighbors lower. In the refinement model, the higher ranked sentimentally similar neighbors will receive a higher weight to refine the pre-trained vector of the target word.

2.2 Refinement Model

Once the word list ranked by both cosine similarity and valence scores for each target word is obtained, its pre-trained vector will be refined to be (1) closer to its sentimentally similar neighbors, (2) further away from its dissimilar neighbors, and (3) not too far away from the original vector.

Figure 2: Conceptual diagram of word vector refinement.

[...] similar neighbors and further away from lower-ranked dissimilar neighbors, as shown in Fig. 2. To prevent too many words being moved to the same location and thereby producing too many similar vectors, we add a constraint to keep each pre-trained vector within a certain range from its original vector. The objective function is thus divided as two parts:

\arg\min \Phi(V) = [\ldots] \qquad (2)
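The two ranking steps of Section 2.1 can be illustrated with a minimal sketch. It assumes the embeddings and the lexicon are available as plain Python dictionaries. The function names (`cosine`, `top_k_neighbors`, `rerank_by_valence`) and the two-dimensional toy vectors are hypothetical and chosen only for illustration. The valence scores are those listed in Fig. 1. The sketch covers only neighbor selection and re-ranking, not the refinement update itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k_neighbors(target, vectors, k):
    """Return the k words most similar to `target`, in descending cosine similarity."""
    scored = [(word, cosine(vectors[target], vec))
              for word, vec in vectors.items() if word != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]

def rerank_by_valence(target, neighbors, valence):
    """Re-rank neighbors by absolute valence difference to the target (smallest first)."""
    return sorted(neighbors, key=lambda w: abs(valence[w] - valence[target]))

# Hypothetical 2-d toy vectors, used only to exercise top_k_neighbors.
TOY_VECTORS = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "bad": [0.8, 0.3],
    "table": [0.0, 1.0],
}
toy_neighbors = top_k_neighbors("good", TOY_VECTORS, k=2)

# Valence scores as listed in Fig. 1.
VALENCE = {
    "good": 7.89, "great": 7.50, "bad": 3.24, "terrific": 7.12,
    "decent": 6.27, "nice": 6.95, "excellent": 7.56, "fantastic": 8.36,
    "solid": 5.65, "lousy": 3.14, "wonderful": 7.41,
}

# Left-hand list of Fig. 1: neighbors of "good" ranked by cosine similarity.
SEMANTIC_RANKING = ["great", "bad", "terrific", "decent", "nice",
                    "excellent", "fantastic", "solid", "lousy", "wonderful"]

reranked = rerank_by_valence("good", SEMANTIC_RANKING, VALENCE)
print(reranked)
```

With the Fig. 1 scores, `reranked` reproduces the right-hand list of the figure: excellent moves to the top, while bad and lousy drop to the bottom.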
