
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Learning Sentiment-Specific Embedding via Global Sentiment Representation

Peng Fu,1,2 Zheng Lin,1∗ Fengcheng Yuan,1,2 Weiping Wang,1 Dan Meng1
1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
{fupeng, linzheng, yuanfengcheng, wangweiping, mengdan}@iie.ac.cn
∗Corresponding author

Abstract

Context-based word embedding learning approaches can model rich semantic and syntactic information. However, they are problematic for sentiment analysis because words with similar contexts but opposite sentiment polarities, such as good and bad, are mapped into close word vectors in the embedding space. Recently, some sentiment embedding learning methods have been proposed, but most of them are designed to work well on sentence-level texts. Directly applying those models to document-level texts often leads to unsatisfactory results. To address this issue, we present a sentiment-specific word embedding learning architecture that utilizes local context information as well as a global sentiment representation. The architecture is applicable to both sentence-level and document-level texts. We take the global sentiment representation as a simple average of the word embeddings in the text, and use a corruption strategy as a sentiment-dependent regularization. Extensive experiments conducted on several benchmark datasets demonstrate that the proposed architecture outperforms the state-of-the-art methods for sentiment classification.

Figure 1: Illustrative normal word embeddings (left) and sentiment-specific word embeddings (right) in the embedding space.

Introduction

Continuous word representations, commonly called word embeddings, attempt to represent each word as a continuous, low-dimensional and real-valued vector. Since they can capture various dimensions of semantic and syntactic information and group words with similar grammatical usages and semantic meanings, they are less susceptible to data sparsity. Therefore, word embeddings are widely used for many natural language processing tasks, such as sentiment analysis (Wang et al. 2015), machine translation (Ding et al. 2017) and question answering (Hao et al. 2017).

Existing word embedding learning approaches mostly represent each word by predicting the target word through its context (Collobert and Weston 2008; Mikolov et al. 2013) and map words of similar semantic roles into nearby points in the embedding space. For example, 'good' and 'bad' on the left of Figure 1 are mapped into close vectors in the embedding space. However, this is confusing for sentiment analysis, because these two words actually have opposite sentiment polarities. Therefore, it is desirable to propose models that can not only capture the contexts of words but also model the sentiment information of texts, like the word embeddings on the right of Figure 1.

To achieve this goal, Tang et al. (2014) proposed two models based on the C&W model (Collobert and Weston 2008) that learn sentiment-specific word embeddings from sentiment polarity labels for sentiment classification. They also extended their work with several customized loss functions (Tang et al. 2016b). These models predict or rank sentiment polarity based on word embeddings in a fixed window of words across a sentence. In addition, based on the Skip-Gram model (Mikolov et al. 2013), Zhang and Lan (2015) integrated the sentiment information by using the semantic word embeddings in the context to predict the sentiment polarity through a softmax layer, and Yang et al. (2017) proposed a model that predicts the target word and its label simultaneously. Both of them took sentiment information as a part of the local context. Due to the limitations of the design of these training methods, they can only be used in specific tasks and are less efficient for document-level text. Therefore, the integration of sentiment polarity into semantic word embeddings is still a major challenge for sentiment analysis.

In this paper, we introduce a sentiment-specific word embedding learning architecture that incorporates the local context with a global sentiment representation. In general, the local context can be regarded as a representation of the target word, while the global sentiment representation is the averaged vector of the words in the text obtained through a corruption strategy. The strategy is a biased random sampling process. Thus, the local and the global representations can be regarded as semantic and sentiment information respectively. In order to learn sentiment-specific word embeddings, the global sentiment representation is integrated into the local context by joint modeling.

Based on the proposed architecture, we develop two neural network models to learn the sentiment-specific word embeddings, which are extensions of the Continuous Bag-of-Words (CBoW) model. The prediction model (SWPredict) takes sentiment prediction as a multi-class classification task, and it can be viewed as language modeling. The ranking model (SWRank) takes sentiment prediction as a ranking problem, and it penalizes relative distances among triplets of global sentiment representations. Experiments demonstrate the effectiveness of our models, and empirical comparisons on sentence-level and document-level sentiment analysis tasks show that our architecture outperforms state-of-the-art methods.

The main contributions of this work are as follows:

• We propose a general architecture to learn sentiment-specific word embeddings, and use a global sentiment representation to model the interaction of words and sentiment polarity. The architecture is effective for both sentence-level and document-level texts.

• We develop two neural networks to learn sentiment-specific word embeddings. The prediction model takes sentiment prediction as a classification task, and the ranking model takes sentiment prediction as a ranking problem among triplets.

• To improve the efficiency of the model, we use a corruption strategy that favors informative words which have strong discrimination capability. It can be regarded as a sentiment-dependent regularization for the global sentiment representation.

Background

Modeling Contexts of Words

Many methods can encode the contexts of words into embeddings from a large collection of unlabeled data. Here we focus on the methods most relevant to our model. Bengio et al. (2003) proposed a neural language model and estimated the parameters of the network and the embeddings jointly. Because this model is quite expensive to train, Mikolov et al. (2013) proposed the word2vec toolkit, which contains the CBoW and Skip-Gram models, to learn high-quality word embeddings.

CBoW is an effective framework for modeling the contexts of words, which aims to predict the target word given its context in a sentence. It contains an input layer, a projection layer parameterized by the matrix U, and an output layer parameterized by V. The probability of the target word w_t given its local context C^t can be calculated as:

P(w_t \mid C^t) = \frac{\exp(V_{w_t}^T U C^t)}{\sum_{w \in V} \exp(V_w^T U C^t)}    (1)

Document Representation

Document representation is a fundamental problem for many natural language processing tasks, and many efforts have been made to generate concise document representations. Paragraph Vectors (Dai, Olah, and Le 2015) is an unsupervised method that explicitly learns a document representation together with the word embeddings. In the Paragraph Vectors model, a projection matrix D is introduced, and each column of D is a document representation x. The model inserts x into the standard language model, which aims at capturing the global semantic information of the document. With the document representation x, the probability of the target word w_t given its local context C^t is calculated as:

P(w_t \mid C^t, x) = \frac{\exp(V_{w_t}^T (U C^t + x))}{\sum_{w \in V} \exp(V_w^T (U C^t + x))}    (2)

However, the complexity of Paragraph Vectors grows with the size of the vocabulary and the training corpus, and it needs expensive inference to obtain the representations of unseen documents. To alleviate these problems, Chen (2017) proposed a model, called Doc2VecC, which simply represents a document as an average of word embeddings that are randomly sampled from the document. The random sampling process is a kind of drop-out corruption that can speed up the training. What's more, the corruption strategy has been shown to act as a data-dependent regularization.

Given a document D containing words {w_1, ..., w_T}, its global representation is denoted as x and the embedding of the d-th word as x_d. The corruption strategy randomly overwrites each word embedding of the original document with probability q, and it scales the uncorrupted word embeddings by 1/(1−q). Formally,

\tilde{x}_d = \begin{cases} 0 & \text{with probability } q \\ \frac{x_d}{1-q} & \text{otherwise} \end{cases}    (3)

Thus, the corrupted document representation is denoted as x = \frac{1}{T} \sum_{d=1}^{T} \tilde{x}_d, where T is the length of the document. Finally, the probability of the target word w_t is calculated in the same way as Eq. 2.
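To make the corruption in Eq. 3 concrete, the following is a minimal NumPy sketch (our illustration, not code from Chen (2017)); the function name and the toy dimensions are ours:

```python
import numpy as np

def corrupted_doc_representation(word_embeddings, q, rng):
    """Doc2VecC-style corruption (Eq. 3): zero out each word embedding with
    probability q and rescale the survivors by 1/(1-q), so the corrupted
    average stays an unbiased estimate of the clean average."""
    keep = rng.random(word_embeddings.shape[0]) >= q        # which words survive
    corrupted = np.where(keep[:, None], word_embeddings / (1.0 - q), 0.0)
    return corrupted.mean(axis=0)                           # x = (1/T) sum_d x~_d

rng = np.random.default_rng(0)
doc = rng.normal(size=(50, 150))   # toy document: 50 words, 150-dim embeddings
x = corrupted_doc_representation(doc, q=0.9, rng=rng)
print(x.shape)                     # (150,)
```

Because only a small fraction of the rows survive each sampling, a gradient step touches far fewer embeddings, which is where the speed-up comes from.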
Approach

Architecture

An intuitive solution to model the interaction of sentiment information and word embeddings is to predict the sentiment distribution of the global representation while modeling the contexts of words. The benefit of introducing the global representation is that it can learn sentiment-specific word embeddings from variable-length texts. In this paper, we propose an architecture that is an extension of the CBoW model (Mikolov et al. 2013). Based on the architecture, we develop two neural networks to learn sentiment-specific word embeddings, including a prediction model and a ranking model. The architecture consists of two components:

• The semantic component learns the semantic and syntactic information of words in an unsupervised way, as shown in the Background section.

• The sentiment component models the global sentiment representation of the text in a supervised manner, introducing a corruption strategy as the sentiment-dependent regularization described below.

Figure 2: The architecture of the prediction model. It takes sentiment prediction as a multi-class classification problem.

Figure 3: The architecture of the ranking model. It takes sentiment prediction as a ranking problem among triplets.

Corruption as Sentiment-dependent Regularization

We observe that the frequencies of many sentiment words are lower than those of the commonly used words in most cases. Consequently, we employ a biased drop-out corruption, which can be regarded as a sentiment-dependent regularization (Chen 2017) for the global text representation.

Specifically, we randomly overwrite each word embedding of the original text x with probability p. The probability p is calculated from the frequency of word w_i:

p = 1 - \left( \sqrt{\frac{\alpha}{\mathrm{freq}(w_i)}} + \frac{\alpha}{\mathrm{freq}(w_i)} \right)    (4)

where α is a threshold (we use 1e−4 in this paper) and freq(w_i) is the frequency of word w_i. We set the uncorrupted word embeddings to 1/(1−p) times their original values, as in (Chen 2017). Thus, the global text representation is calculated as:

\tilde{x}_d = \begin{cases} 0 & \text{with probability } p \\ \frac{x_d}{1-p} & \text{otherwise} \end{cases}    (5)

and x = \sum_{d=1}^{T} \tilde{x}_d. Therefore, the probability of the target word w_t, given its local context C^t as well as the text representation x, is calculated as:

P(w_t \mid C^t, x) = \frac{\exp\left(V_{w_t}^T (U C^t + \frac{1}{T} U x)\right)}{\sum_{w \in V} \exp\left(V_w^T (U C^t + \frac{1}{T} U x)\right)}    (6)

where T is the length of the text. Given the training corpus D = {D_1, ..., D_n}, the parameters U and V are learned to minimize the loss:

J_S = -\sum_{i=1}^{n} \sum_{t=1}^{T_i} f(w_i^t, C_i^t, x_i^t)    (7)

Prediction Model

The basic idea of the prediction model is to take sentiment prediction as a multi-class classification problem. It predicts positive or negative categorical probabilities from the global sentiment representation.

As illustrated in Figure 2, it consists of two components, and each component contains an input layer, a projection layer, and an output layer. The inputs of the semantic component are word tokens, while the inputs of the sentiment component are the word tokens sampled by the corruption strategy. In the semantic component, the probability of the target word is calculated as Eq. 6, while in the sentiment component, the projection layer feeds the corrupted global text representation calculated as Eq. 5 to a linear layer, which converts the vector length to the number of categories, which is 2 in this paper. Then, the output layer generates conditional probabilities over the positive and negative categories.

Given the gold sentiment polarity of the input texts in the corpus D, we use f^g(t) = [1, 0] for the positive polarity and f^g(t) = [0, 1] for the negative polarity. The cross-entropy error between the gold sentiment distribution and the predicted distribution of the output layer is:

J_{predict} = -\sum_{d \in D} \sum_{k \in \{0,1\}} f_k^g(t) \cdot \log f_k^{pred}(t)    (8)

To get sentiment-specific word embeddings, we combine the losses from the semantic and the sentiment components. The final loss function is Eq. 9, where β weights the two components:

J_{SP} = \beta \cdot J_S + (1 - \beta) \cdot J_{predict}    (9)
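As an illustration of the sentiment-dependent corruption (Eq. 4) and the joint objective (Eq. 9), here is a small sketch under our reading of the formulas (the clipping to [0, 1] is an implementation detail we assume, mirroring word2vec-style subsampling):

```python
import numpy as np

def removal_prob(freq, alpha=1e-4):
    """Eq. 4: frequent words are removed from the global representation with
    high probability; rare (often sentiment-bearing) words tend to survive."""
    ratio = alpha / freq
    # Negative values mean "always keep"; we assume clipping to [0, 1].
    return float(np.clip(1.0 - (np.sqrt(ratio) + ratio), 0.0, 1.0))

def joint_prediction_loss(j_semantic, j_predict, beta=0.5):
    """Eq. 9: weighted combination of the CBoW loss and the sentiment
    cross-entropy loss of the prediction model."""
    return beta * j_semantic + (1.0 - beta) * j_predict

print(removal_prob(1e-1))   # ~0.967: a very frequent word is almost always dropped
print(removal_prob(1e-4))   # 0.0: a rare word is always kept
```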

Ranking Model

Sentiment-specific word embedding learning needs a large scale of training data. However, it is hard to obtain an abundant dataset with carefully labeled sentiment polarity. Compared with well-labeled texts, it is easier to obtain weakly labeled texts from ratings (movie reviews) or emoticons (tweets), without much manual work. However, the weakly labeled texts may contain wrong labels, which will influence the quality of the learned sentiment-specific word embeddings. To alleviate this influence, the texts belonging to the same sentiment polarity should be as close as possible, while the texts of different sentiments should be kept far away. Therefore, we propose a ranking model based on triplets.

As Figure 3 shows, it also consists of two components. The semantic component is the same as in the prediction model, while the sentiment component generates valid ranking triplets. A valid triplet is generated as follows: given the subset P containing positive texts and the subset N containing negative texts, and taking P as the focus, two global text representations x_1 and x_2 are sampled from P, and a global text representation x_3 is sampled from N. The case with N as the focus is a mirror case. Before the output layer, we use a nonlinear layer to convert the global representations to sentiment scores y_i:

y_i = f(w x + b)    (10)

where w and b are parameters and f(·) is a sigmoid function. The training objective is defined as the following ranking loss:

J_{rank} = \sum_{d = \langle x_1, x_2, x_3 \rangle \in D} \max(0, \lambda - dst(x_1, x_3) + dst(x_1, x_2))    (11)

where λ is the margin parameter, dst(·) is the distance between global text representations, and ⟨x_1, x_2, x_3⟩ denotes a valid triplet. This means we require the distance between ⟨x_1, x_2⟩ to be smaller than that between ⟨x_1, x_3⟩ by at least λ. The distance is calculated as the difference of the scores of the two representations:

dst(x_i, x_j) = |y_i - y_j|    (12)

Compared with pairwise ranking, the triplet-based ranking can easily make representations with the same sentiment polarity close to one another, while keeping representations with different sentiment polarities apart. Because the majority of the representations have correct labels, they gather close to each other in the training process, while representations with wrong labels move towards the opposite cluster, but at slower speeds compared with pairwise ranking (Guan et al. 2016).

The final loss function is the combination of the losses from the semantic and the sentiment components:

J_{ST} = \beta \cdot J_S + (1 - \beta) \cdot J_{rank}    (13)

where β weights the two components.
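The triplet objective (Eqs. 10-12) can be sketched as follows; this is our minimal reading, with toy parameters w and b standing in for the trained network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def triplet_rank_loss(x1, x2, x3, w, b, margin):
    """Eqs. 10-12: x1 and x2 share a polarity, x3 has the opposite one.
    Each global representation is mapped to a scalar sentiment score; the
    same-polarity pair must be closer than the cross-polarity pair by at
    least `margin`, otherwise a hinge penalty is paid."""
    y1, y2, y3 = (sigmoid(w @ x + b) for x in (x1, x2, x3))   # Eq. 10
    dst_same, dst_diff = abs(y1 - y2), abs(y1 - y3)           # Eq. 12
    return max(0.0, margin - dst_diff + dst_same)             # one term of Eq. 11

rng = np.random.default_rng(1)
w, b = rng.normal(size=150), 0.0
x1, x2, x3 = (rng.normal(size=150) for _ in range(3))  # toy global representations
print(triplet_rank_loss(x1, x2, x3, w, b, margin=2.0))
```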

Training

We use two datasets to train the sentiment-specific word embeddings separately. One is movie reviews, which are considered document-level texts, while the other is tweets, which are regarded as sentence-level texts. The movie reviews are extracted from the SAR14 (Nguyen et al. 2014) dataset, which contains 233,600 IMDB reviews along with their associated ratings on a 1-10 scale. We use all reviews with scores ≤ 4 as the negative texts, and randomly select the same amount of reviews with scores ≥ 7 as the positive texts. We extract tweets from a collection of datasets¹ labeled positive or negative, and filter out the tweets with less than 7 words. The statistics of the datasets are given in Table 1.

Dataset   #Vocab.   Len_avg   #Pos.     #Neg.
Reviews   41,778    283.3     66,000    66,222
Tweets    34,165    16.7      637,728   665,432

Table 1: Statistics of the training datasets for sentiment-specific word embedding learning.

For our models, we use AdaGrad (Duchi, Hazan, and Singer 2011) to update the parameters, and the learning rate is 1.0. We empirically set the context window size to 3 and the batch size to 128. We set the hyper-parameters α = 1e−4 and p = 0.9 for corruption. We evaluate the effect of the embedding size and choose 150 for both models. Our models are implemented in TensorFlow².

Experiments

We evaluate our learned embeddings by taking them as features for sentence-level and document-level sentiment classification. We use LIBLINEAR (Fan et al. 2008) as the classifier.

Datasets

For document-level sentiment classification, we use the IMDB movie review dataset (Maas et al. 2011). It consists of 100,000 movie reviews, and half of them are labeled as either positive or negative. We use the default train/test split, and randomly select 10% of the training data as the development set.

For sentence-level sentiment classification, we use the Twitter dataset from SemEval 2013 task 2 (Nakov et al. 2013). The data can be downloaded by running a script³. We build a 2-class Twitter sentiment classifier; thus, we only use the positive and negative data. The statistics of our datasets are given in Table 2.

Dataset   Subset   #Positive   #Negative   #Total
IMDB      Train    11,250      11,250      22,500
          Dev      1,250       1,250       2,500
          Test     12,500      12,500      25,000
Twitter   Train    2,978       1,162       4,140
          Dev      328         170         498
          Test     1,306       485         1,791

Table 2: Statistics of the IMDB and the Twitter datasets for sentiment classification.

1 thinknook.com/twitter-sentiment-analysis-training-corpus-datasets-2012-09-22
2 www.tensorflow.org
3 https://cs.york.ac.uk/semeval-2013/task2/index.html
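For reference, the optimizer named in the Training subsection above is standard AdaGrad; a minimal sketch of its update with the paper's hyper-parameters (the epsilon term is our stability assumption, not stated in the paper):

```python
import numpy as np

class AdaGrad:
    """Per-parameter AdaGrad update (Duchi, Hazan, and Singer 2011)."""
    def __init__(self, shape, lr=1.0, eps=1e-8):      # lr = 1.0 as in the paper
        self.lr, self.eps = lr, eps
        self.g2 = np.zeros(shape)                     # accumulated squared gradients

    def step(self, params, grad):
        self.g2 += grad ** 2
        params -= self.lr * grad / (np.sqrt(self.g2) + self.eps)
        return params

# Training configuration reported in the paper.
CONFIG = dict(window=3, batch_size=128, embedding_dim=150, alpha=1e-4, p=0.9)
```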

IMDB Dataset

Method   Word2vec       Glove          SE-Pred        SE-HyRank      SWPredict      SWRank
Avg.     86.51 ± 0.02   84.43 ± 0.01   83.62 ± 0.02   84.13 ± 0.03   88.31 ± 0.03   88.55 ± 0.05
Conv.    87.68 ± 0.02   86.0 ± 0.01    84.73 ± 0.04   85.82 ± 0.02   88.22 ± 0.04   88.39 ± 0.04

Twitter Dataset

Method   Word2vec       Glove          SE-Pred        SE-HyRank      SWPredict      SWRank
Avg.     77.39 ± 0.02   79.23 ± 0.08   80.9 ± 0.02    81.2 ± 0.03    84.05 ± 0.04   84.02 ± 0.02
Conv.    77.53 ± 0.02   79.74 ± 0.13   81.8 ± 0.08    82.2 ± 0.02    83.73 ± 0.04   82.69 ± 0.03

Table 3: Accuracy of sentiment classification on the IMDB and the Twitter datasets. "Avg." denotes the average of the word embeddings in the text, which are the features of the SVM classifier. "Conv." denotes the concatenation of vectors derived from max, min and average convolutional layers, which are the features of the SVM classifier. We mark the best results by boldface.

Setup

We apply the word embeddings to sentiment classification under a supervised learning framework. Given the learned word embedding matrix U, we represent each global sentiment representation as an average of the word embeddings in the text, as Chen (2017) did:

z(x) = \frac{1}{T} \sum_{w \in D} U_w

We also adopt max, min and average convolutional layers to represent the global sentiment representation (Socher et al. 2011; Tang et al. 2014). This representation is the concatenation of the vectors derived from the different convolutional layers:

z(x) = [z_{max}(x), z_{min}(x), z_{avg}(x)]

where z(x) is the representation of text x. Taking z(x) as features, we use a Support Vector Machine to build the sentiment classifiers. We train them on the training set, tune the parameters on the development set, and evaluate the models on the test set. A small sketch of this feature construction follows the baseline list below.

Baseline Methods

We compare our models with the following baseline word embedding learning methods:

• Word2vec⁴: Mikolov et al. (2013) developed the widely used toolkit word2vec, which contains the CBoW and Skip-Gram algorithms to learn word embeddings. We use CBoW in the experiments and train it with negative sampling.

• Glove⁵: Pennington et al. (2014) released a popular algorithm for obtaining word embeddings. It is a log-bilinear model that uses AdaGrad (Duchi, Hazan, and Singer 2011) to minimize a weighted squared error on global co-occurrence counts. The method is comparable to Word2vec.

• SE-Pred: A sentiment-specific word embedding learning model based on C&W (Collobert and Weston 2008), proposed by Tang et al. (2016b). It regards sentiment prediction as a multi-class classification task and performs well for twitter sentiment classification.

• SE-HyRank: A sentiment-specific word embedding learning model based on a ranking loss, also proposed by Tang et al. (2016b). It learns context information and sentiment information simultaneously through a neural network with a pairwise ranking loss. It is the latest model that learns sentiment word embeddings.

Here we focus on comparisons of the model architectures. For a fair comparison, we train all competing methods on the same datasets using a context window of three words. For the baseline methods, we use the default settings of the provided implementations or those described in their papers.

4 https://code.google.com/p/word2vec/
5 http://nlp.stanford.edu/projects/glove/
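To make the feature construction concrete, here is a minimal NumPy sketch (our illustration, not the paper's code) of the "Avg." and "Conv." features fed to the SVM; `text` stands for the matrix of word embeddings of one text:

```python
import numpy as np

def avg_features(text):
    """'Avg.': the average of the word embeddings in the text."""
    return text.mean(axis=0)

def conv_features(text):
    """'Conv.': concatenation of max, min and average pooling over the
    word embeddings, following Socher et al. (2011) and Tang et al. (2014)."""
    return np.concatenate([text.max(axis=0), text.min(axis=0), text.mean(axis=0)])

rng = np.random.default_rng(2)
text = rng.normal(size=(20, 150))   # toy text: 20 words, 150-dim embeddings
print(avg_features(text).shape)     # (150,) -> features for the linear SVM
print(conv_features(text).shape)    # (450,)
```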
Sentiment Classification Results

Table 3 shows the performance of the learned word embeddings on both the sentence-level and the document-level sentiment classification tasks.

On the IMDB dataset, SWPredict and SWRank outperform all baselines significantly, and SWRank achieves the best result. This indicates that our models learn better word embeddings for document-level sentiment classification. Compared with SE-Pred and SE-HyRank, Word2vec and Glove get higher results. The underlying reason is that SE-Pred and SE-HyRank predict or rank sentiment polarity based on word embeddings in a fixed window of words across a sentence, which is not appropriate for document-level texts. We address this problem by encoding sentiment information from the global text representation. SWRank achieving a better result than SWPredict indicates the effectiveness of ranking among triplets for document-level sentiment classification with weakly labeled texts.

On the Twitter dataset, the sentiment-specific word embedding learning models obtain better performance, which shows the importance of sentiment information for sentiment classification. Our models get higher results than SE-Pred and SE-HyRank, which implies that our models can capture the most distinguishable information when learning sentiment-specific word embeddings. The main difference between our models and SE-Pred as well as SE-HyRank is that we encode sentiment information from the global text representation and use a corruption strategy as a sentiment-dependent regularization, which makes our models learn more discriminable word embeddings. One of the differences between SWRank and SE-HyRank is that we use a triplet rather than a pair in the ranking loss.

The results show that triplet-based ranking is more effective than pairwise ranking for sentence-level sentiment classification.

For our models, the accuracy with averaged word embeddings as features is slightly higher than with the concatenation of the outputs of the convolutional layers as features. This indicates that our models can learn the most discriminative word embeddings and penalize the common or non-discriminative word embeddings through the corruption strategy.

Effect of β in the Models

In this experiment, we investigate the effect of β on the classification performance on the evaluation datasets. The parameter β weights the semantic and sentiment information; a small value of β means the models pay more attention to sentiment information.

Figure 4: Effect of the parameter β with different values on the sentiment classification performance on the IMDB (a) and the Twitter (b) datasets.

As shown in Figure 4, for the IMDB dataset, SWPredict and SWRank obtain the best results when setting β to 0.5 for both global representation methods, while for the Twitter dataset, both models get the best results when β is 0.6. This means both models perform better when β is in the range of [0.5, 0.6], which balances the semantic and the sentiment information. We can see a sharp decline when β is 1, which indicates the essential role of sentiment information for sentiment classification.

Effect of λ in the Ranking Model

The margin parameter λ controls the distance between positive and negative texts. Here, we investigate the effect of λ on the classification performance. We change λ from 1 to 10, and the performance results are shown in Figure 5.

Figure 5: Effect of the margin λ with different values on the sentiment classification performance of the SWRank model on the IMDB and the Twitter datasets.

For the IMDB dataset, both global representation methods obtain the best results when λ is 3, and the performance drops steadily after that. For the Twitter dataset, the best results are achieved at λ = 2, and the performance of the Conv. method drops quickly. Moreover, when λ is set to a large value, the networks are more easily trapped in saturating regions (Bengio et al. 2013) after long training. In addition, the overall training time increases when enlarging the margin λ. Therefore, we set λ = 3 on the movie review training dataset and λ = 2 on the tweet training dataset for sentiment-specific word embedding learning.

Comparison to Document Representation

We compare the performance of our models to Paragraph Vectors (Dai, Olah, and Le 2015), Skip-Thought Vectors (Kiros et al. 2015) and Doc2VecC (Chen 2017). We directly cite the results from Chen (2017), since we both use SVM classifiers and evaluate on the same IMDB dataset (Maas et al. 2011). The main difference is that we only use the labeled training set that contains 25,000 reviews, while they also use the unlabeled set that contains 50,000 reviews.

Model                    Accuracy %
Paragraph Vectors†       87.9
Skip-Thought Vectors†    82.6
Doc2VecC†                88.3
SWPredict                89.4
SWRank                   89.6

Table 4: Comparison with previous work on document representation for sentiment classification. † denotes results cited from (Chen 2017).

Table 4 shows that our models obtain slightly higher results with less data. The reason is that the other document representation methods do not model the sentiment polarity information into the representation directly, while our approach incorporates the sentiment polarity information into the vectors in a supervised manner. In addition, a biased corruption process which favors informative sentiment words is used when calculating the document representation.

Related Work

Our work is inspired by two fields of research: integrating sentiment information into semantic word embeddings and learning continuous feature representations for variable-sized texts.

Sentiment-specific Word Embedding

The common approach to generating word embeddings is based on language modeling. Mikolov et al. (2013) simplified the structure of the neural probabilistic language model (Bengio et al. 2003) and introduced two efficient models called CBoW and Skip-Gram, while Collobert and Weston (2008) proposed the C&W model that trains word embeddings in a ranking fashion with a hinge loss function. However, a key limitation of the above models for sentiment analysis is that they map words with similar contexts but opposite sentiment polarities into close word vectors in the embedding space. To avoid this limitation, Labutov and Lipson (2013) re-embedded existing word embeddings with logistic regression by regarding the sentiment supervision of sentences as a regularization item. In contrast, Tang et al. (2014) introduced a neural network based on the C&W model to incorporate the supervision from the sentiment polarity of texts for twitter sentiment classification. In addition, they extended their work and developed a number of neural networks with tailored loss functions to learn sentiment-specific word embeddings (Tang et al. 2016b). These models focus on Twitter sentiment classification and predict or rank sentiment polarity based on word embeddings in a fixed window of words across a sentence. Based on the Skip-Gram model, Zhang and Lan (2015) proposed a model for word-level and sentence-level sentiment analysis, and Yang et al. (2017) proposed a model that predicts the target word and its label simultaneously. Both of them took sentiment information as a part of the local context. Unlike their work, we develop an architecture that takes sentiment information as a global representation and learns sentiment-specific embeddings for both sentence-level and document-level texts.

Feature Representations of Texts

There are two major ways to learn text representations: supervised and unsupervised. For supervised learning, researchers have explored different neural networks for sentence-level and document-level text representations for sentiment classification, such as the Convolutional Neural Network (Kim 2014) and its variants (Kalchbrenner, Grefenstette, and Blunsom 2014; Yin and Schütze 2015), Recursive Neural Network models which learn compositional vector representations for phrases and sentences (Socher et al. 2012), and Long Short-Term Memory (Tai, Socher, and Manning 2015; Tang et al. 2016a). For unsupervised learning, Skip-Thought Vectors (Kiros et al. 2015) uses an encoder-decoder to model the surrounding sentences of the encoded passage and maps similar sentences into vectors, while Paragraph Vectors (Dai, Olah, and Le 2015) explicitly learns a document representation with the word embeddings. However, the complexity of these models grows with the size of the training corpus. To alleviate this problem, Chen (2017) proposed Doc2VecC, which simply represents a document as an average of word embeddings that are randomly sampled from the document. We borrow this idea and take the text representation as the global sentiment information, which is jointly learned with the local contexts. With the help of the sentiment-dependent corruption strategy, our models can learn informative word embeddings which have strong discrimination capability for sentiment classification.

Conclusion

In this paper, we propose a sentiment-specific word embedding learning architecture via global sentiment representation. Different from the existing studies that learn sentiment-specific word embeddings from sentence-level texts for a specific classification task, our architecture can learn sentiment-specific word embeddings from both sentence-level and document-level texts. We utilize the global sentiment representation as well as the local context to learn word embeddings. We take the global sentiment representation as a simple average of the word embeddings in the text with a corruption strategy. The corruption strategy can be seen as a sentiment-dependent regularization for the global sentiment representation. It makes our models favor informative sentiment words and reduces the complexity of our models. Based on the proposed architecture, we introduce two neural networks to effectively learn sentiment-specific word embeddings. We evaluate the effectiveness of the learned sentiment-specific word embeddings on sentence-level and document-level sentiment classification. Experimental results show that our architecture is adept at learning sentiment-specific word embeddings and outperforms the baseline methods.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61502478, No. 61602467), the National Key Research and Development Program of China (2016YFB1000604) and the National HeGaoJi Key Project (2013ZX01039-002-001-001). The authors would like to thank the anonymous reviewers and the area chairs for their constructive comments.

References

Bengio, Y.; Courville, A.; and Vincent, P. 2013. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35:1798–1828.

Bengio, Y.; Ducharme, R.; Vincent, P.; and Janvin, C. 2003. A neural probabilistic language model. Journal of Machine Learning Research 3:1137–1155.

Chen, M. 2017. Efficient vector representation for documents through corruption. In Proceedings of the 5th International Conference on Learning Representations.

Collobert, R., and Weston, J. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, 160–167.

Dai, A. M.; Olah, C.; and Le, Q. V. 2015. Document embedding with paragraph vectors. In Proceedings of the Neural Information Processing Systems Workshop.

Ding, Y.; Liu, Y.; Luan, H.; and Sun, M. 2017. Visualizing and understanding neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 1150–1159.

Duchi, J.; Hazan, E.; and Singer, Y. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12:2121–2159.

Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; and Lin, C.-J. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9:1871–1874.

Guan, Z.; Chen, L.; Zhao, W.; Zheng, Y.; Tan, S.; and Cai, D. 2016. Weakly-supervised deep learning for customer review sentiment classification. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 3719–3725.

Hao, Y.; Zhang, Y.; Liu, K.; He, S.; Liu, Z.; Wu, H.; and Zhao, J. 2017. An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 221–231.

Kalchbrenner, N.; Grefenstette, E.; and Blunsom, P. 2014. A convolutional neural network for modelling sentences. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 655–665.

Kim, Y. 2014. Convolutional neural networks for sentence classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1746–1751.

Kiros, R.; Zhu, Y.; Salakhutdinov, R.; Zemel, R.; Torralba, A.; Urtasun, R.; and Fidler, S. 2015. Skip-thought vectors. In Advances in Neural Information Processing Systems 28, 3294–3302.

Labutov, I., and Lipson, H. 2013. Re-embedding words. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 489–493.

Maas, A. L.; Daly, R. E.; Pham, P. T.; Huang, D.; Ng, A. Y.; and Potts, C. 2011. Learning word vectors for sentiment analysis. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 142–150.

Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013. Efficient estimation of word representations in vector space. In Proceedings of the 1st International Conference on Learning Representations.

Nakov, P.; Rosenthal, S.; Kozareva, Z.; Stoyanov, V.; Ritter, A.; and Wilson, T. 2013. SemEval-2013 task 2: Sentiment analysis in Twitter. In Proceedings of the International Workshop on Semantic Evaluation.

Nguyen, D. Q.; Nguyen, D. Q.; Vu, T.; and Pham, S. B. 2014. Sentiment classification on polarity reviews: An empirical study using rating-based features. In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 128–135.

Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1532–1543.

Socher, R.; Huang, E. H.; Pennington, J.; Ng, A. Y.; and Manning, C. D. 2011. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems, 801–809.

Socher, R.; Huval, B.; Manning, C. D.; and Ng, A. Y. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1201–1211.

Tai, K. S.; Socher, R.; and Manning, C. D. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 1556–1566.

Tang, D.; Wei, F.; Yang, N.; Zhou, M.; Liu, T.; and Qin, B. 2014. Learning sentiment-specific word embedding for twitter sentiment classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 1555–1565.

Tang, D.; Qin, B.; Feng, X.; and Liu, T. 2016a. Effective LSTMs for target-dependent sentiment classification. In Proceedings of the International Conference on Computational Linguistics, 3298–3307.

Tang, D.; Wei, F.; Qin, B.; Yang, N.; Liu, T.; and Zhou, M. 2016b. Sentiment embeddings with applications to sentiment analysis. IEEE Transactions on Knowledge and Data Engineering 28:496–509.

Wang, X.; Liu, Y.; Sun, C.; Wang, B.; and Wang, X. 2015. Predicting polarities of tweets by composing word embeddings with long short-term memory. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 1343–1353.

Yang, L.; Chen, X.; Liu, Z.; and Sun, M. 2017. Improving word representations with document labels. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25(4):863–870.

Yin, W., and Schütze, H. 2015. Multichannel variable-size convolution for sentence classification. In Proceedings of the SIGNLL Conference on Computational Natural Language Learning, 204–214.

Zhang, Z., and Lan, M. 2015. Learning sentiment-inherent word embedding for word-level and sentence-level sentiment analysis. In Proceedings of the 2015 International Conference on Asian Language Processing, 94–97.
