Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

Weakly-Supervised Deep Embedding for Customer Review Sentiment Classification

Ziyu Guan1, Long Chen1, Wei Zhao2, Yi Zheng3, Shulong Tan4, Deng Cai3
1Northwest University, Xi'an, China    2Xidian University, Xi'an, China
3Zhejiang University, Hangzhou, China    4Baidu Big Data Laboratory, Sunnyvale, USA
1{ziyuguan@, longchen@stumail.}nwu.edu.cn, [email protected]
3{delostik@, dengcai@cad.}zju.edu.cn, [email protected]

Abstract

Sentiment analysis is one of the key challenges for mining online user generated content. In this work, we focus on customer reviews, which are an important form of opinionated content. The goal is to identify the semantic orientation (e.g. positive or negative) of each sentence in a review. Traditional sentiment classification methods often involve substantial human effort, e.g. lexicon construction and feature engineering. In recent years, deep learning has emerged as an effective means for solving sentiment classification problems. A neural network intrinsically learns a useful representation automatically without human effort. However, the success of deep learning highly relies on the availability of large-scale training data. In this paper, we propose a novel deep learning framework for review sentiment classification which employs prevalently available ratings as weak supervision signals. The framework consists of two steps: (1) learn a high level representation (embedding space) which captures the general sentiment distribution of sentences through rating information; (2) add a classification layer on top of the embedding layer and use labeled sentences for supervised fine-tuning. Experiments on review data obtained from Amazon show the efficacy of our method and its superiority over baseline methods.

1 Introduction

With the booming of Web 2.0 and e-commerce, more and more people consume online and leave comments about their purchase experiences on merchant/review Websites. These opinionated contents are valuable resources both to future customers for decision-making and to merchants for improving their products and/or service. However, as the volume of reviews grows rapidly, people face a severe information overload problem. To alleviate this problem, many opinion mining techniques have been proposed, e.g. opinion summarization [Hu and Liu, 2004; Ding et al., 2008], comparative analysis [Liu et al., 2005] and opinion polling [Zhu et al., 2011]. A key component of these opinion mining techniques is a sentiment classifier for natural sentences.

Popular sentiment classification methods generally fall into two categories: (1) lexicon-based methods and (2) machine learning methods. Lexicon-based methods [Turney, 2002; Hu and Liu, 2004; Ding et al., 2008] typically first construct a sentiment lexicon of opinion words (e.g. "good", "bad"), and then design classification rules based on the opinion words that appear and on prior syntactic knowledge. Despite their effectiveness, these methods require substantial effort in lexicon construction and rule design. Furthermore, lexicon-based methods cannot handle implicit opinions well, i.e. objective statements such as "I bought the mattress a week ago, and a valley appeared today". As pointed out in [Feldman, 2013], this is also an important form of opinion: factual information is usually more helpful than subjective feelings. Lexicon-based methods can only deal with implicit opinions in an ad-hoc way [Zhang and Liu, 2011].

A pioneering work [Pang et al., 2002] for machine learning based sentiment classification applied standard machine learning algorithms (e.g. Support Vector Machines) to the problem. After that, most research in this direction revolved around feature engineering for better classification performance. Different kinds of features have been explored, e.g. n-grams [Dave et al., 2003], part-of-speech (POS) information and syntactic relations [Mullen and Collier, 2004]. Feature engineering also costs a lot of human effort, and a feature set suitable for one domain may not generate good performance for other domains [Pang and Lee, 2008].

In recent years, deep learning has emerged as an effective means for solving sentiment classification problems [Glorot et al., 2011; Kim, 2014; Tang et al., 2015; Socher et al., 2011; 2013]. A deep neural network intrinsically learns a high level representation of the data [Bengio et al., 2013], thus avoiding laborious work such as feature engineering. A second advantage is that deep models have exponentially stronger expressive power than shallow models. However, the success of deep learning heavily relies on the availability of large-scale training data [Bengio et al., 2013; Bengio, 2009]. Constructing large-scale labeled training datasets for sentence level sentiment classification is still very laborious.

Fortunately, most merchant/review Websites allow customers to summarize their opinions by an overall rating score (typically on a 5-stars scale).

Figure 1: A negative sentence in a 5-stars review.

Ratings reflect the overall sentiment of customer reviews and have already been exploited for sentiment analysis [Maas et al., 2011; Qu et al., 2012]. Nevertheless, review ratings are not reliable labels for the constituent sentences: a 5-stars review can contain negative sentences, and we may also see positive words occasionally in 1-star reviews. An example is shown in Figure 1. Therefore, treating binarized ratings as sentiment labels could confuse a sentiment classifier for review sentences.

In this work, we propose a novel deep learning framework for review sentence sentiment classification. The framework leverages weak supervision signals provided by review ratings to train deep neural networks. For example, with a 5-stars scale we can deem ratings above/below 3-stars as positive/negative weak labels respectively. It consists of two steps. In the first step, rather than predicting sentiment labels directly, we try to learn an embedding space (a high level layer in the neural network) which reflects the general sentiment distribution of sentences, from a large number of weakly labeled sentences. That is, we force sentences with the same weak labels to be near each other, while sentences with different weak labels are kept away from one another. To reduce the impact of sentences with rating-inconsistent orientation (hereafter called wrong-labeled sentences), we propose to penalize the relative distances among sentences in the embedding space through a ranking loss. In the second step, a classification layer is added on top of the embedding layer, and we use labeled sentences to fine-tune the deep network. Regarding the network, we adopt the Convolutional Neural Network (CNN) as the basic structure since it achieved good performance for sentence sentiment classification [Kim, 2014]. We further customize it by taking aspect information (e.g. screen of cell phones) as an additional context input. The framework is dubbed Weakly-supervised Deep Embedding (WDE). Although we adopt CNN in this paper, WDE also has the potential to work with other types of neural networks. To verify the effectiveness of WDE, we collect reviews from Amazon.com to form a weakly labeled set of 1.1M sentences and a manually labeled set of 11,754 sentences. Experimental results show that WDE is effective and outperforms baseline methods.

2 Related Work

Sentiment analysis is a long standing research topic. Readers can refer to [Liu, 2012] for a recent survey. Sentiment classification is one of the key tasks in sentiment analysis and can be roughly categorized as document level, sentence level and aspect level. Our work falls into the last category since we consider aspect information. In what follows we review two subtopics closely related to our work.

2.1 Deep Learning for Sentiment Classification

In recent years, deep learning has received more and more attention in the sentiment analysis community. Researchers have explored different deep models for sentiment classification. Glorot et al. used stacked denoising auto-encoders to train review representations in an unsupervised fashion, in order to address the domain adaptation problem of sentiment classification [Glorot et al., 2011]. Socher et al. [Socher et al., 2011; 2012; 2013] proposed a series of Recursive Neural Network (RecNN) models for sentiment classification. These methods learn vector representations of variable-length sentences through recursive compositional computation. Kim investigated using CNN for sentence sentiment classification and found it outperformed RecNN [Kim, 2014]. A variant CNN with dynamic k-max pooling and multiple convolutional layers was proposed in [Kalchbrenner et al., 2014]. Researchers have also investigated using sequential models such as the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) for sentiment classification [Tang et al., 2015].

However, none of the above works tried to use review ratings to train deep sentiment classifiers for sentences. This is not a trivial problem since ratings are too noisy to be used directly as sentence labels (see Section 3 and the experiments for discussions of this issue). To our knowledge, the WDE framework is the first attempt to make use of rating information for training deep sentence sentiment classifiers. Note that although we choose CNN as the deep model due to its competitive performance on sentiment classification [Kim, 2014], the idea of WDE could also be applied to other types of deep models. The major contribution of this work is a weakly-supervised deep learning framework, rather than specific deep models.

2.2 Exploiting Ratings in Sentiment Classification

Rating information has been exploited in sentiment classification before. Qu et al. incorporated ratings as weak labels in a probabilistic framework for sentence level sentiment classification [Qu et al., 2012]. However, their method still required careful feature design and relied on base predictors, while our method automatically learns a meaningful sentence representation for sentiment classification. Täckström and McDonald used conditional random fields to combine review level and sentence level sentiment labels for sentence sentiment analysis [Täckström and McDonald, 2011]. This method also required feature engineering. Maas et al. [Maas et al., 2011] proposed to learn sentiment-bearing word vectors by incorporating rating information in a probabilistic model. For sentiment classification, they simply averaged the word vectors of a document as its representation. A similar work is [Tang et al., 2014], which developed a variant of the C&W neural model [Collobert et al., 2011] for learning sentiment-bearing word vectors from weak tweet labels derived from emoticons. The tweet representation was obtained by min, max and avg pooling on word vectors. Although this kind of method can generate sentence representations automatically, the representations are derived by simple pooling of the learned word vectors. In comparison, our method generates a sentence representation by feeding word vectors through an expressive deep neural network. Moreover, we directly optimize the sentence representation, rather than word vectors. We take the above two methods as baselines in our experiments.

3 Weakly-supervised Deep Embedding

Classic deep learning methods take an "unsupervised pre-training then supervised fine-tuning" scheme, where restricted Boltzmann machines (RBMs) or auto-encoders are used to pre-train network parameters from large quantities of unlabeled data [Bengio, 2009]. This works well when the data distribution is correlated with label prediction [Bengio, 2009]. Nevertheless, in sentiment analysis the word co-occurrence information is usually not well correlated with sentiment prediction [Maas et al., 2011], which motivates us to exploit large-scale rating data for training deep sentiment classifiers.

Figure 3: The CNN network architecture for sentence sentiment classification.

However, ratings are noisy labels for review sentences and would mislead classifier training if used directly as supervision. In this paper, we adopt a simple rule to assign weak labels to sentences under the 5-stars rating scale:

    \ell(s) = \begin{cases} \text{pos}, & \text{if } s \text{ is in a 4- or 5-stars review} \\ \text{neg}, & \text{if } s \text{ is in a 1- or 2-stars review} \end{cases}    (1)

where \ell(s) denotes the weak sentiment label of sentence s. Figure 2 shows the percentages of wrong-labeled sentences by \ell(s), estimated on our labeled review dataset (a detailed description of the dataset is in Section 4.1). We can see the noise level is moderate but not ignorable.

Figure 2: Percentages of wrong-labeled sentences by ratings in our labeled review dataset. The overall percentage is 13.4%.

The general idea behind WDE is that we use large quantities of weakly labeled sentences to train a good embedding space, so that a linear classifier would suffice to accurately make sentiment predictions. Here a good embedding means that, in the space, sentences with the same sentiment labels are close to one another, while those with different labels are kept away from each other. In the following, we first present the network architecture, then discuss how to train it with large-scale rating data, followed by supervised fine-tuning on labeled sentences.

3.1 Network Architecture

The network architecture, depicted in Figure 3, is a variant of the CNNs described in [Collobert et al., 2011; Kim, 2014]. In what follows, we use upper case bold letters such as W to denote matrices and lower case bold letters such as x to denote column vectors. The i-th element in vector x is denoted by x(i).

Input Layer. An input sentence of length t is a word sequence s = \langle w_1 w_2 \ldots w_t \rangle. Each word w in the vocabulary is described by a word vector x. Let k be the length of x and n be the total number of words in the vocabulary. The trainable word lookup table X is then a k \times n matrix with word vectors as its columns. The input layer simply maps s = \langle w_1 w_2 \ldots w_t \rangle to its word vector representation \langle x_1 x_2 \ldots x_t \rangle. The lookup table is initialized with the publicly available 300-dimensional word2vec vectors trained on the 100-billion-word Google News corpus [Mikolov et al., 2013]. Out-of-sample words are initialized randomly.

Convolutional Layer and Max Pooling Layer. The convolutional layer applies a set of filters to the sentence. Each filter w \in \mathbb{R}^{hk} is applied to a window of h words to produce a local feature value:

    u(i) = f(\mathbf{w}^T \mathbf{x}_{i:(i+h-1)} + b),    (2)

where \mathbf{x}_{i:(i+h-1)} represents the concatenated vector [\mathbf{x}_i^T \; \mathbf{x}_{i+1}^T \ldots \mathbf{x}_{i+h-1}^T]^T, u(i) is the computed feature value at position i, b is the bias of the current filter, and f(\cdot) is a non-linear activation function such as the hyperbolic tangent.

Computing u(i) at all possible positions in s yields a (t-h+1)-dimensional feature vector (i.e. a feature map) \mathbf{u} = [u(1)\; u(2) \ldots u(t-h+1)]^T. The max pooling layer then performs a max operation over each feature map u_j to find the most salient value of the filter's corresponding feature as its final value [Collobert et al., 2011]:

    v(j) = \max_i \{ u_j(i) \}.    (3)

This pooling scheme keeps the most important indicator of a feature and naturally leads to a fixed-length vector output v at the max pooling layer.
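To make the computation in Eqs. (2) and (3) concrete, below is a minimal PyTorch sketch of the convolution and max-over-time pooling, using the dimensions reported later in Section 3.2 (k = 300, 200 filters per window size h = 1, 2, 3). PyTorch and all variable names are our illustration; the paper does not provide an implementation.

```python
# A minimal sketch of Eqs. (2)-(3), assuming k = 300 and 200 filters
# per window size h = 1, 2, 3 as in the paper; names are illustrative.
import torch
import torch.nn as nn

k, num_filters = 300, 200

# One Conv1d per window size: each output channel is one filter w in R^{hk},
# and conv(x) computes w^T x_{i:(i+h-1)} + b at every position i (Eq. 2).
convs = nn.ModuleList(
    nn.Conv1d(in_channels=k, out_channels=num_filters, kernel_size=h)
    for h in (1, 2, 3)
)

def conv_and_pool(x):
    """x: (batch, k, t) stacked word vectors of length-t sentences."""
    feature_maps = [torch.tanh(conv(x)) for conv in convs]  # f = tanh, Eq. (2)
    pooled = [u.max(dim=2).values for u in feature_maps]    # Eq. (3): max over positions
    return torch.cat(pooled, dim=1)                         # fixed-length v (here 600-dim)

v = conv_and_pool(torch.randn(8, k, 40))  # toy batch of 8 sentences, t = 40
```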

A filter with window size h is intrinsically a feature extractor which performs "feature selection" over the h-gram features of a sentence. When an input h-gram matches the filter's w, we obtain a high feature value, indicating that this h-gram activates the feature. This resembles traditional feature selection in sentiment classification [Pang and Lee, 2008], but is done automatically by the network. Since traditional machine learning methods often exploit unigrams, bigrams and trigrams [Pang and Lee, 2008], we also employ filters with different window sizes, i.e. h = 1, 2, 3.

Hidden Layer and Embedding Layer. The fixed-length feature vector v is then fed to the fully connected hidden layer and embedding layer to extract nonlinear higher level features. For the hidden layer, the computation is straightforward, with weight matrix W_h and bias vector b_h:

    \mathbf{h} = f(\mathbf{W}_h \mathbf{v} + \mathbf{b}_h).    (4)

The embedding layer gets its input from two sources: the output of the hidden layer h, and the context vector a_s of sentence s. A context vector is the semantic representation of an aspect that customers can comment on with respect to a sort of entities. For instance, battery life is an aspect of cell phones. The motivation for incorporating aspect information as the context of a sentence is that similar comments in different contexts could be of opposite orientations, e.g. "the screen is big" vs. "the size is big". The context vectors of all aspects constitute the context lookup table A (as columns). The embedding layer output is computed as

    \mathbf{y} = f\left(\mathbf{W}_e \begin{bmatrix} \mathbf{h} \\ \mathbf{a}_s \end{bmatrix} + \mathbf{b}_e\right).    (5)

Classification Layer. This layer (and the connections below it) is drawn with dotted lines in Figure 3 since it is not added to the network until the supervised fine-tuning phase. We defer its description to Section 3.3.

3.2 Embedding Training with Ratings

With the weak label definition in Eq. (1), we can divide review sentences into two sets: P = \{s \mid \ell(s) = \text{pos}\} and N = \{s \mid \ell(s) = \text{neg}\}. Since P and N contain wrong-labeled sentences, they cannot be used directly to train a classifier. Therefore, we propose to first train an embedding space that captures the general sentiment distribution of sentences. Intuitively, we should let sentences within P/N stick together, while keeping P and N separated. A straightforward training scheme could be adapted from [Weston et al., 2008] using stochastic gradient descent (SGD): sample sentence pairs, reduce distances for same-label pairs and increase distances for opposite-label pairs. However, when wrong-labeled sentences are sampled, there is still a relatively high chance that we make a wrong move. To alleviate this issue, we propose to penalize relative distances over sentence triplets. The training objective is defined as a ranking loss [Collobert et al., 2011]:

    \mathcal{L}_{weak} = \sum_{\langle s_1, s_2, s_3 \rangle} \max\left(0,\; \lambda - dst(s_1, s_3) + dst(s_1, s_2)\right),    (6)

where \lambda is the margin parameter and dst(\cdot,\cdot) is the Euclidean distance between sentences computed from their embedding layer representations:

    dst(s_i, s_j) = \|\mathbf{y}_i - \mathbf{y}_j\|_2,    (7)

and \langle s_1, s_2, s_3 \rangle denotes a valid triplet with \ell(s_1) = \ell(s_2) \neq \ell(s_3). Eq. (6) means we require the distance between same-label sentences s1 and s2 to be shorter than that between s1 and a sentence s3 with the opposite label by at least \lambda. A sample triplet is generated as follows. First, we randomly choose P or N as the focus. Suppose we choose P; then two sentences s1 and s2 are sampled from P in turn, and a sentence s3 is sampled from N. The case for N as the focus is a mirror case.

Figure 4: Comparison between (a) pair-based training and (b) triplet-based training. Please see the text for detailed explanations.

Figure 4 illustrates the advantages of triplet-based training over pair-based training via a toy example. We use circles and triangles to represent sentences in P and N respectively. Black nodes denote wrong-labeled sentences. Since the majority of sentences have correct labels, they gather together during training. Wrong-labeled sentences move towards the wrong clusters, but at slower speeds. In both training methods, undesirable moves can happen when wrong-labeled sentences are sampled. For clarity, we show just three such cases that are representative for the respective methods. The three cases in Figure 4(a) all result in undesirable moves: sentences with different orientations become closer (cases 1 and 2), while same-orientation sentences become more separated (case 3). In Figure 4(b), case 1 generates only undesirable moves: since s3 (black triangle) is closer to s1 (white circle) than s2 (black circle), the algorithm will drag s3 away from s1 and drag s2 toward s1. Cases 2 and 3 lead to mixed behavior: one move is desirable while the other is not. Therefore, cases 2 and 3 in Figure 4(b) are not as harmful as the cases in Figure 4(a). Furthermore, in triplet-based training there will be no move if the difference in distances exceeds the margin \lambda, since the derivative of \mathcal{L}_{weak} becomes 0. This is useful in that we will not make things too bad.
For example, in case 2 of Figure 4(b), s2 is actually a negative sentence and should not be too close to s1. Notice that s3 is far away from s1; hence, the distance difference may already exceed \lambda and there will be no move for this triplet. In comparison, cases 1 and 2 in Figure 4(a) will continually move s1 and s2 toward each other until their distance becomes 0, which is the worst result.
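As an illustration of the ranking loss in Eqs. (6)-(7) and of the sampling procedure above, here is a hedged PyTorch sketch; the function names are ours, and the margin default follows the value chosen in Section 4.5.

```python
# A minimal sketch of the triplet ranking loss (Eqs. 6-7) and triplet sampling;
# assumes embedding-layer outputs of shape (batch, 300). Names are illustrative.
import random
import torch

def weak_loss(y1, y2, y3, margin=5.0):
    """Eq. (6) for a batch of triplets <s1, s2, s3> with l(s1) = l(s2) != l(s3)."""
    d_same = torch.norm(y1 - y2, dim=1)  # Eq. (7): dst(s1, s2) = ||y1 - y2||_2
    d_diff = torch.norm(y1 - y3, dim=1)  # dst(s1, s3)
    # Hinge on the distance difference: once opposite-label distance exceeds
    # same-label distance by the margin, loss and gradient are zero (no move).
    return torch.clamp(margin - d_diff + d_same, min=0).mean()

def sample_triplet(P, N):
    """Pick P or N as the focus set, two sentences from it, one from the other."""
    focus, other = (P, N) if random.random() < 0.5 else (N, P)
    s1, s2 = random.sample(focus, 2)  # same weak label
    s3 = random.choice(other)         # opposite weak label
    return s1, s2, s3
```

For batched training, PyTorch's built-in nn.TripletMarginLoss (with p=2) computes the same hinge.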

Training. Embedding training is done by taking the derivative of \mathcal{L}_{weak} in Eq. (6) with respect to all the parameters under the Embedding Layer (Figure 3). We run SGD over sampled sentence triplets with the AdaGrad update rule [Duchi et al., 2011]. We empirically set the context vector size to 50, the number of filters for each window size to 200, and both the hidden layer size and embedding layer size to 300. The hyperbolic tangent is employed as the activation function for all layers. Training is accelerated using a GPU (28 min for processing 1M triplets on an Nvidia GTX 980 Ti).

3.3 Supervised Fine-tuning

After obtaining a good enough sentence representation at the embedding layer, we add a classification layer on top (Figure 3) to further train the network using labeled sentences. The classification layer simply performs a standard affine transformation of the embedding layer output y and then applies a softmax activation function [Bishop, 2006] to the result for label prediction. In this work, we focus on binary sentiment prediction (i.e. positive or negative) since we only consider sentences which comment on specific aspects of an entity; such sentences are hardly ever neutral. Nevertheless, WDE could also be adapted to multi-class prediction problems. For binary prediction, the classification layer is equivalent to a logistic regression model. We train the network using standard SGD, since AdaGrad can easily "forget" the prior model learned in the first phase.
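The fine-tuning step can be sketched as follows, assuming a module embed_net (a hypothetical name) that wraps the pre-trained network up to the embedding layer; the learning rate is our assumption, as the paper does not report one.

```python
# A minimal sketch of the fine-tuning step (Section 3.3); names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(300, 2)  # affine transform; with softmax this is logistic regression

def fine_tune_step(embed_net, optimizer, sentences, aspects, labels):
    y = embed_net(sentences, aspects)       # embedding-layer output (Eq. 5)
    logits = classifier(y)                  # classification layer (dotted in Figure 3)
    loss = F.cross_entropy(logits, labels)  # softmax + negative log-likelihood
    optimizer.zero_grad()
    loss.backward()                         # gradients also reach all layers below
    optimizer.step()                        # plain SGD, per the paper (not AdaGrad)
    return loss.item()

# params = list(embed_net.parameters()) + list(classifier.parameters())
# optimizer = torch.optim.SGD(params, lr=0.01)  # lr is illustrative
```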
4 Experiments

In this section, we present an empirical evaluation of WDE on reviews collected from Amazon.com.

4.1 Data and Preprocessing

We collected Amazon customer reviews from 3 domains: digital cameras, cell phones and laptops. All unlabeled reviews were extracted from the Amazon product dataset [McAuley et al., 2015]. In particular, we extracted all the reviews from 12 leaf categories closely related to the above three domains (3-stars reviews were ignored). For the labeled dataset, we crawled the latest reviews in 2015 for random products in the above 12 categories, in order to be disjoint from the unlabeled data. We tried to keep a balance between reviews with 4 & 5-stars and those with 1 & 2-stars. We then summarized product aspects and their keywords by a traditional method [Ding et al., 2008] with manual calibration. A total of 107 aspects were extracted from the obtained reviews. Next, all reviews were split into sentences and those with no aspect keywords were discarded. In case a sentence mentioned multiple aspects, we tried to split it into multiple single-aspect sentences. After preprocessing, we obtained a vocabulary of 148,183 terms, an unlabeled set of 1,143,721 sentences with rating information only (named the 1.1M dataset), and 11,754 sentences for labeling. We labeled each sentence with respect to its subjectivity and orientation. Three students were instructed to do each labeling task. The statistics of the labeled dataset are shown in Table 1; the dataset is roughly balanced. For the subjectivity and orientation labeling tasks, we achieved Fleiss's kappa values of 0.81 and 0.79 respectively. The labeled dataset was randomly split into a training set (50%), validation set (20%) and test set (30%), maintaining the proportions shown in Table 1.

Table 1: Statistics of the labeled dataset.

             Positive  Negative  Total
Subjective   3750      2024      5774
Objective    1860      4120      5980
Total        5610      6144      11754

4.2 Baselines and Evaluation Settings

We compare WDE with the following baseline methods:

Lexicon: the popular lexicon-based method proposed in [Ding et al., 2008].

SVM: a support vector machine with n-gram features [Pang et al., 2002], widely employed as a baseline for sentiment classification. We use up to tri-grams since this setting is shown to yield good performance for product reviews. Liblinear [Fan et al., 2008] is used to train the classifier.

NBSVM: NBSVM combines Naive Bayes and an NB-enhanced SVM to predict sentiment labels [Wang and Manning, 2012]. It yields good performance on many sentiment classification datasets.

SSWE: SSWE learns sentiment-bearing word vectors with a neural network applied to weakly labeled data. We use min, max and avg pooling [Tang et al., 2014] on word vectors to generate the sentence representation, which is then fed to a classifier.

SentiWV: the sentiment word vector learning method on rating data described in Section 2 [Maas et al., 2011]. We also use the aforementioned three pooling functions to generate sentence vectors. Liblinear is used to train the classifiers for SentiWV and SSWE.

CNN-rand: the same CNN network (Figure 3) trained on labeled data with random parameter initialization.

CNN-weak: the same CNN network trained on the 1.1M dataset by treating the weak labels defined in Eq. (1) as real labels. This baseline answers whether rating data can be used to train sentence sentiment classifiers directly.

All methods (except CNN-weak and Lexicon) are trained on the training set and evaluated on the test set. WDE, SSWE and SentiWV have a pre-training phase on the 1.1M dataset. The validation set is used for parameter tuning of all methods and for early stopping of CNN training. We employ Accuracy and Macro-F1 as the evaluation metrics.
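Both metrics are standard; the following sketch assumes scikit-learn and uses toy labels in place of real predictions.

```python
# Accuracy and Macro-F1 as used in the evaluation; a minimal scikit-learn sketch.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0]  # toy gold labels: 1 = positive, 0 = negative
y_pred = [1, 0, 0, 1, 0]  # toy predictions
print(accuracy_score(y_true, y_pred))             # fraction of correct predictions
print(f1_score(y_true, y_pred, average="macro"))  # unweighted mean of per-class F1
```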

4.3 Performance Comparison

The results are shown in Table 2. We also report performance for subjective sentences and objective sentences separately.

Table 2: Performance comparison.

                 Accuracy              Macro-F1
Method      All    Subj   Obj     All    Subj   Obj
Lexicon     .722   .827   .621    .721   .812   .613
SVM         .818   .838   .800    .818   .821   .765
NBSVM       .826   .844   .808    .825   .831   .773
SSWE        .835   .857   .815    .834   .826   .804
SentiWV     .808   .806   .809    .807   .786   .771
CNN-rand    .847   .861   .835    .847   .848   .802
CNN-weak    .771   .773   .770    .771   .755   .741
WDE         .877   .886   .868    .876   .875   .843

The key observations are as follows. Lexicon performs poorly on objective sentences, since factual statements tend not to contain opinion words; when no opinion word is detected, it can only make random predictions. The machine learning methods all achieve acceptable performance on both subjective and objective sentences.

One exception is CNN-weak, which is trained on weakly labeled sentences: its validation performance fluctuates drastically during training. This indicates that directly binarizing ratings as labels for supervised training is not a good idea. SSWE performs better than the traditional classifiers (SVM and NBSVM) by applying a neural model to the 1.1M dataset. However, it only uses word vectors to encode the useful information in the 1.1M dataset; the classifier is still a "shallow" linear model. In comparison, WDE encodes both the large number of weak supervision signals and the supervision signals in a deep neural network, and beats all the baselines.

4.4 Varying the Size of the Training Set

Next we examine the impact of the amount of labeled training data on each method's performance. CNN-weak and Lexicon are not involved since they do not depend on labeled training data. We randomly select d% of the training data to train the classifiers and test them on the test set, with d ranging from 10 to 90. For each d, we generate the training set 30 times and report the averaged performance. Figure 5 shows the results.

Figure 5: Impact of labeled training data size on each method's performance.

We can see that as the number of available training instances decreases, the performance of CNN-rand, NBSVM and SVM drops faster than that of WDE, SSWE and SentiWV. This should be because the latter methods have gained prior knowledge about the sentiment distribution through pre-training, though with different capabilities. With a 10% training set (nearly 600 instances), WDE can still achieve around 80% accuracy on the test set. According to a t-test, WDE significantly outperforms the other methods with p-value < 0.01.

4.5 Effect of \lambda in WDE

The margin parameter \lambda in Eq. (6) controls the extent to which we require weakly labeled positive instances to be separated from weakly labeled negative ones. A small value of \lambda may not effectively capture the sentiment distribution, while a too-large \lambda could amplify the impact of wrong-labeled sentences (Figure 4). Here we investigate \lambda's impact on classification performance. Recall that the embedding layer output is a 300-dimensional vector and the output range of its neural nodes is [-1, 1]. It forms a hypercube in which the maximal distance between any two points is dia = \sqrt{1200} \approx 35. Hence, we vary \lambda from 1 to 30. Figure 6 plots the performance curve; we also show the best baseline performance, achieved by CNN-rand, for comparison.

Figure 6: Impact of \lambda on classification performance.

We find that performance drops quickly when \lambda > 15, whereas for \lambda < 15 we can easily find a value leading to good performance. Moreover, when \lambda is set to a relatively high value (> 0.5 dia), the network is more easily trapped in saturating regions [Bengio et al., 2013] after long training. In this paper we set \lambda = 5.

5 Conclusions

In this work we proposed a novel deep learning framework named Weakly-supervised Deep Embedding for review sentence sentiment classification. WDE trains deep neural networks by exploiting the rating information of reviews, which is prevalently available on many merchant/review Websites. Training is a 2-step procedure: first we learn an embedding space which tries to capture the sentiment distribution of sentences by penalizing relative distances among sentences according to weak labels inferred from ratings; then a softmax classifier is added on top of the embedding layer and we fine-tune the network on labeled data. Experiments on reviews collected from Amazon.com show that WDE is effective and outperforms baseline methods. For future work, we will investigate applying WDE to other types of deep networks and to other problems involving weak labels.
Acknowledgments

This research was supported by the National Natural Science Foundation of China under Grant No. 61522206, the National Basic Research Program of China (973 Program) under Grant No. 2013CB336500, and the Program for Changjiang Scholars and Innovative Research Team in University (No. IRT13090). The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. The authors would like to thank Prof. Jinye Peng and Prof. Jianping Fan for their helpful suggestions for this work.

References

[Bengio et al., 2013] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE TPAMI, 35(8):1798-1828, 2013.

[Bengio, 2009] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, 2009.

[Bishop, 2006] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[Collobert et al., 2011] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. JMLR, 12:2493-2537, 2011.

[Dave et al., 2003] Kushal Dave, Steve Lawrence, and David M. Pennock. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In WWW, pages 519-528, 2003.

[Ding et al., 2008] Xiaowen Ding, Bing Liu, and Philip S. Yu. A holistic lexicon-based approach to opinion mining. In WSDM, pages 231-240, 2008.

[Duchi et al., 2011] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12:2121-2159, 2011.

[Fan et al., 2008] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. JMLR, 9:1871-1874, 2008.

[Feldman, 2013] Ronen Feldman. Techniques and applications for sentiment analysis. Communications of the ACM, 56(4):82-89, 2013.

[Glorot et al., 2011] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In ICML, pages 513-520, 2011.

[Hu and Liu, 2004] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In SIGKDD, pages 168-177, 2004.

[Kalchbrenner et al., 2014] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. A convolutional neural network for modelling sentences. In ACL, 2014.

[Kim, 2014] Yoon Kim. Convolutional neural networks for sentence classification. In EMNLP, pages 1746-1751, 2014.

[Liu et al., 2005] Bing Liu, Minqing Hu, and Junsheng Cheng. Opinion observer: Analyzing and comparing opinions on the web. In WWW, pages 342-351, 2005.

[Liu, 2012] Bing Liu. Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.

[Maas et al., 2011] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In ACL, pages 142-150, 2011.

[McAuley et al., 2015] Julian McAuley, Rahul Pandey, and Jure Leskovec. Inferring networks of substitutable and complementary products. In SIGKDD, pages 785-794, 2015.

[Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111-3119, 2013.

[Mullen and Collier, 2004] Tony Mullen and Nigel Collier. Sentiment analysis using support vector machines with diverse information sources. In EMNLP, volume 4, pages 412-418, 2004.

[Pang and Lee, 2008] Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135, 2008.

[Pang et al., 2002] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up?: Sentiment classification using machine learning techniques. In EMNLP, pages 79-86, 2002.

[Qu et al., 2012] Lizhen Qu, Rainer Gemulla, and Gerhard Weikum. A weakly supervised model for sentence-level semantic orientation analysis with multiple experts. In EMNLP-CoNLL, pages 149-159, 2012.

[Socher et al., 2011] Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. Semi-supervised recursive autoencoders for predicting sentiment distributions. In EMNLP, pages 151-161, 2011.

[Socher et al., 2012] Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Semantic compositionality through recursive matrix-vector spaces. In EMNLP-CoNLL, pages 1201-1211, 2012.

[Socher et al., 2013] Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, pages 1631-1642, 2013.

[Täckström and McDonald, 2011] Oscar Täckström and Ryan McDonald. Semi-supervised latent variable models for sentence-level sentiment analysis. In ACL, pages 569-574, 2011.

[Tang et al., 2014] Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. Learning sentiment-specific word embedding for twitter sentiment classification. In ACL, volume 1, pages 1555-1565, 2014.

[Tang et al., 2015] Duyu Tang, Bing Qin, and Ting Liu. Deep learning for sentiment analysis: Successful approaches and future challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 5(6):292-303, 2015.

[Turney, 2002] Peter D. Turney. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In ACL, pages 417-424, 2002.

[Wang and Manning, 2012] Sida Wang and Christopher D. Manning. Baselines and bigrams: Simple, good sentiment and topic classification. In ACL, pages 90-94, 2012.

[Weston et al., 2008] Jason Weston, Frédéric Ratle, and Ronan Collobert. Deep learning via semi-supervised embedding. In ICML, pages 1168-1175, 2008.

[Zhang and Liu, 2011] Lei Zhang and Bing Liu. Identifying noun product features that imply opinions. In ACL, pages 575-580, 2011.

[Zhu et al., 2011] Jingbo Zhu, Huizhen Wang, Muhua Zhu, Benjamin K. Tsou, and Matthew Ma. Aspect-based opinion polling from customer reviews. IEEE Transactions on Affective Computing, 2(1):37-49, 2011.
