
Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis

Stefanos Angelidis and Mirella Lapata
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
[email protected], [email protected]

Abstract

We consider the task of fine-grained sentiment analysis from the perspective of multiple instance learning (MIL). Our neural model is trained on document sentiment labels, and learns to predict the sentiment of text segments, i.e. sentences or elementary discourse units (EDUs), without segment-level supervision. We introduce an attention-based polarity scoring method for identifying positive and negative text snippets and a new dataset which we call SPOT (as shorthand for Segment-level POlariTy annotations) for evaluating MIL-style sentiment models like ours. Experimental results demonstrate superior performance against multiple baselines, whereas a judgement elicitation study shows that EDU-level opinion extraction produces more informative summaries than sentence-based alternatives.

[Rating: ★★] I had a very mixed experience at The Stand. The burger and fries were good. The chocolate shake was divine: rich and creamy. The drive-thru was horrible. It took us at least 30 minutes to order when there were only four cars in front of us. We complained about the wait and got a half-hearted apology. I would go back because the food is good, but my only hesitation is the wait.

+ The burger and fries were good
+ The chocolate shake was divine
+ I would go back because the food is good
– The drive-thru was horrible
– It took us at least 30 minutes to order

Figure 1: An EDU-based summary of a 2-out-of-5 star review with positive and negative snippets.

1 Introduction

Sentiment analysis has become a fundamental area of research in Natural Language Processing thanks to the proliferation of user-generated content in the form of online reviews, blogs, internet forums, and social media. A plethora of methods have been proposed in the literature that attempt to distill sentiment information from text, allowing users and service providers to make opinion-driven decisions.

The success of neural networks in a variety of applications (Bahdanau et al., 2015; Le and Mikolov, 2014; Socher et al., 2013) and the availability of large amounts of labeled data have led to an increased focus on sentiment classification. Supervised models are typically trained on documents (Johnson and Zhang, 2015a; Johnson and Zhang, 2015b; Tang et al., 2015; Yang et al., 2016), sentences (Kim, 2014), or phrases (Socher et al., 2011; Socher et al., 2013) annotated with sentiment labels and used to predict sentiment in unseen texts. Coarse-grained document-level annotations are relatively easy to obtain due to the widespread use of opinion grading interfaces (e.g., star ratings accompanying reviews). In contrast, the acquisition of sentence- or phrase-level sentiment labels remains a laborious and expensive endeavor, despite its relevance to various opinion mining applications, e.g., detecting or summarizing consumer opinions in online product reviews. The usefulness of finer-grained sentiment analysis is illustrated in the example of Figure 1, where snippets of opposing polarities are extracted from a 2-star restaurant review. Although, as a whole, the review conveys negative sentiment, aspects of the reviewer's experience were clearly positive. This goes largely unnoticed when focusing solely on the review's overall rating.

In this work, we consider the problem of segment-level sentiment analysis from the perspective of Multiple Instance Learning (MIL; Keeler and Rumelhart, 1992).


Transactions of the Association for Computational Linguistics, vol. 6, pp. 17–31, 2018. Action Editor: Ani Nenkova. Submission batch: 7/2017; Revision batch: 11/2017; Published 1/2018. © 2018 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Instead of learning from individually labeled segments, our model only requires document-level supervision and learns to introspectively judge the sentiment of constituent segments. Beyond showing how to utilize document collections of rated reviews to train fine-grained sentiment predictors, we also investigate the granularity of the extracted segments. Previous research (Tang et al., 2015; Yang et al., 2016; Cheng and Lapata, 2016; Nallapati et al., 2017) has predominantly viewed documents as sequences of sentences. Inspired by recent work in summarization (Li et al., 2016) and sentiment classification (Bhatia et al., 2015), we also represent documents via Rhetorical Structure Theory's (Mann and Thompson, 1988) Elementary Discourse Units (EDUs). Although definitions for EDUs vary in the literature, we follow standard practice and take the elementary units of discourse to be clauses (Carlson et al., 2003). We employ a state-of-the-art discourse parser (Feng and Hirst, 2012) to identify them.

Our contributions in this work are three-fold: a novel multiple instance learning neural model which utilizes document-level sentiment supervision to judge the polarity of its constituent segments; the creation of SPOT, a publicly available dataset which contains Segment-level POlariTy annotations (for sentences and EDUs) and can be used for the evaluation of MIL-style models like ours; and the empirical finding (through automatic and human-based evaluation) that neural multiple instance learning is superior to more conventional neural architectures and other baselines at detecting segment sentiment and extracting informative opinions in reviews.¹

2 Background

Our work lies at the intersection of multiple research areas, including sentiment classification, opinion mining, and multiple instance learning. We review related work in these areas below.

Sentiment Classification Sentiment classification is one of the most popular tasks in sentiment analysis. Early work focused on unsupervised methods and the creation of sentiment lexicons (Turney, 2002; Hu and Liu, 2004; Wiebe et al., 2005; Baccianella et al., 2010), based on which the overall polarity of a text can be computed (e.g., by aggregating the sentiment scores of constituent words). More recently, Taboada et al. (2011) introduced SO-CAL, a state-of-the-art method that combines a rich sentiment lexicon with carefully defined rules over syntax trees to predict sentence sentiment.

Machine learning techniques have subsequently dominated the literature (Pang et al., 2002; Pang and Lee, 2005; Qu et al., 2010; Xia and Zong, 2010; Wang and Manning, 2012; Le and Mikolov, 2014), thanks to user-generated sentiment labels or large-scale crowd-sourcing efforts (Socher et al., 2013). Neural network models in particular have achieved state-of-the-art performance on various sentiment classification tasks due to their ability to alleviate feature engineering. Kim (2014) introduced a very successful CNN architecture for sentence-level classification, whereas other work (Socher et al., 2011; Socher et al., 2013) uses recursive neural networks to learn sentiment for segments of varying granularity (i.e., words, phrases, and sentences). We describe Kim's (2014) approach in more detail, as it is also used as part of our model.

Let x_i denote the k-dimensional embedding of the i-th word in text segment s of length n. The segment's input representation is the concatenation of word embeddings x_1, ..., x_n, resulting in word matrix X. Let X_{i:i+j} refer to the concatenation of embeddings x_i, ..., x_{i+j}. A convolution filter W ∈ R^{l×k}, applied to a window of l words, produces a new feature c_i = ReLU(W ∘ X_{i:i+l−1} + b), where ReLU is the Rectified Linear Unit non-linearity, '∘' denotes the entrywise product followed by a sum over all elements, and b ∈ R is a bias term. Applying the same filter to every possible window of word vectors in the segment produces a feature map c = [c_1, c_2, ..., c_{n−l+1}]. Multiple feature maps for varied window sizes are applied, resulting in a fixed-size segment representation v via max-over-time pooling. We will refer to the application of convolution to an input word matrix X as CNN(X). A final sentiment prediction is produced using a softmax classifier, and the model is trained via back-propagation using sentence-level sentiment labels.
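For concreteness, the following is a minimal PyTorch sketch of this encoder; module and parameter names are ours and do not come from the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNSegmentEncoder(nn.Module):
    """Kim-style convolutional segment encoder: for each window size l, a bank
    of filters is convolved over the word matrix X, and max-over-time pooling
    yields a fixed-size segment vector."""
    def __init__(self, emb_dim=300, window_sizes=(3, 4, 5), feature_maps=100):
        super().__init__()
        # One Conv1d per window size; each filter plays the role of W in
        # c_i = ReLU(W o X_{i:i+l-1} + b).
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, feature_maps, l) for l in window_sizes]
        )

    def forward(self, X):
        # X: (batch, n_words, emb_dim); Conv1d expects (batch, emb_dim, n_words)
        X = X.transpose(1, 2)
        pooled = []
        for conv in self.convs:
            c = F.relu(conv(X))                 # feature map: (batch, maps, n-l+1)
            pooled.append(c.max(dim=2).values)  # max-over-time pooling
        return torch.cat(pooled, dim=1)         # segment vector v

# v = CNNSegmentEncoder()(torch.randn(2, 20, 300))  # -> (2, 300)
```

Because of the max-over-time pooling, the output size depends only on the number of filters, so segments of different lengths map to vectors of the same dimensionality.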

¹Our code and SPOT dataset are publicly available at: https://github.com/stangelid/milnet-sent

The availability of large-scale datasets (Diao et al., 2014; Tang et al., 2015) has also led to the development of document-level sentiment classifiers which exploit hierarchical neural representations.

These are obtained by first building representations of sentences and aggregating those into a document feature vector (Tang et al., 2015). Yang et al. (2016) further acknowledge that words and sentences are differentially important in different contexts. They present a model which learns to attend (Bahdanau et al., 2015) to individual text parts when constructing document representations. We describe such an architecture in more detail, as we use it as a point of comparison with our own model.

Given document d comprising segments (s_1, ..., s_m), a Hierarchical Network with attention (henceforth HIERNET; based on Yang et al., 2016) produces segment representations (v_1, ..., v_m) which are subsequently fed into a bidirectional GRU module (Bahdanau et al., 2015), whose resulting hidden vectors (h_1, ..., h_m) are used to produce attention weights (a_1, ..., a_m) (see Section 3.2 for more details on the attention mechanism). A document is represented as the weighted average of the segments' hidden vectors, v_d = Σ_i a_i h_i. A final sentiment prediction is obtained using a softmax classifier, and the model is trained via back-propagation using document-level sentiment labels. The architecture is illustrated in Figure 2(a). In their proposed model, Yang et al. (2016) use bidirectional GRU modules to represent segments as well as documents, whereas we use a more efficient CNN encoder to compose words into segment vectors (i.e., v_i = CNN(X_i)).² Note that models like HIERNET do not naturally predict sentiment for individual segments; we discuss how they can be used for segment-level opinion extraction in Section 5.2.

Our own work draws inspiration from representation learning (Tang et al., 2015; Kim, 2014), especially the idea that not all parts of a document convey sentiment-worthy clues (Yang et al., 2016). Our model departs from previous approaches in that it provides a natural way of predicting the polarity of individual text segments without requiring segment-level annotations. Moreover, our attention mechanism directly facilitates opinion detection, rather than simply aggregating sentence representations into a single document vector.

Opinion Mining A standard setting for opinion mining and summarization (Lerman et al., 2009; Carenini et al., 2006; Ganesan et al., 2010; Di Fabbrizio et al., 2014; Gerani et al., 2014) assumes a set of documents that contain opinions about some entity of interest (e.g., a camera). The goal of the system is to generate a summary that is representative of the average opinion and speaks to its important aspects (e.g., picture quality, battery life, value). Output summaries can be extractive (Lerman et al., 2009) or abstractive (Gerani et al., 2014; Di Fabbrizio et al., 2014), and the underlying systems exhibit varying degrees of linguistic sophistication, from identifying aspects (Lerman et al., 2009) to using RST-style discourse analysis and manually defined templates (Gerani et al., 2014; Di Fabbrizio et al., 2014). Our proposed method departs from previous work in that it focuses on detecting opinions in individual documents. Given a review, we predict the polarity of every segment, allowing for the extraction of sentiment-heavy opinions. We explore the usefulness of EDU segmentation inspired by Li et al. (2016), who show that EDU-based summaries align with near-extractive summaries constructed by news editors. Importantly, our model is trained in a weakly-supervised fashion on large-scale document classification datasets, without recourse to fine-grained labels or gold-standard opinion summaries.

Multiple Instance Learning Our models adopt a Multiple Instance Learning (MIL) framework. MIL deals with problems where labels are associated with groups of instances, or bags (documents in our case), while instance labels (segment-level polarities) are unobserved. An aggregation function is used to combine instance predictions and assign labels on the bag level. The goal is either to label bags (Keeler and Rumelhart, 1992; Dietterich et al., 1997; Maron and Ratan, 1998) or to simultaneously infer bag and instance labels (Zhou et al., 2009; Wei et al., 2014; Kotzias et al., 2015). We view segment-level sentiment analysis as an instantiation of the latter variant.

Initial MIL efforts for binary classification made the strong assumption that a bag is negative only if all of its instances are negative, and positive otherwise (Dietterich et al., 1997; Maron and Ratan, 1998; Zhang et al., 2002; Andrews and Hofmann, 2004; Carbonetto et al., 2008).

²When applied to the YELP'13 and IMDB document classification datasets, the use of CNNs results in a relative performance decrease of < 2% compared to Yang et al.'s (2016) model.

Subsequent work relaxed this assumption, allowing for prediction combinations better suited to the tasks at hand. Weidmann et al. (2003) introduced a generalized MIL framework, where a combination of instance types is required to assign a bag label. Zhou et al. (2009) used graph kernels to aggregate predictions, exploiting relations between instances in object and text categorization. Xu and Frank (2004) proposed a multiple-instance logistic regression classifier where instance predictions were simply averaged, assuming equal and independent contribution toward bag classification. More recently, Kotzias et al. (2015) used sentence vectors obtained by a pre-trained hierarchical CNN (Denil et al., 2014) as features under an unweighted average MIL objective. Prediction averaging was further extended by Pappas and Popescu-Belis (2014; 2017), who used a weighted summation of predictions, an idea which we also adopt in our work.

Applications of MIL are many and varied. MIL was first explored by Keeler and Rumelhart (1992) for recognizing handwritten post codes, where the position and value of individual digits was unknown. MIL techniques have since been applied to drug activity prediction (Dietterich et al., 1997), image retrieval (Maron and Ratan, 1998; Zhang et al., 2002), object detection (Zhang et al., 2006; Carbonetto et al., 2008; Cour et al., 2011), text classification (Andrews and Hofmann, 2004), image captioning (Wu et al., 2015), paraphrase detection (Xu et al., 2014), and information extraction (Hoffmann et al., 2011).

When applied to sentiment analysis, MIL takes advantage of supervision signals on the document level in order to train segment-level sentiment predictors. Although their work is not couched in the framework of MIL, Täckström and McDonald (2011) show how sentence sentiment labels can be learned as latent variables from document-level annotations using hidden conditional random fields. Pappas and Popescu-Belis (2014) use a multiple instance regression model to assign sentiment scores to specific aspects of products. The Group-Instance Cost Function (GICF), proposed by Kotzias et al. (2015), averages sentence sentiment predictions during training, while ensuring that similar sentences receive similar polarity labels. Their work uses a pre-trained hierarchical CNN to obtain sentence embeddings, but is not trainable end-to-end, in contrast with our proposed network. Additionally, none of the aforementioned efforts explicitly evaluate opinion extraction quality.

3 Methodology

In this section we describe how multiple instance learning can be used to address some of the drawbacks seen in previous approaches, namely the need for expert knowledge in lexicon-based sentiment analysis (Taboada et al., 2011), expensive fine-grained annotation on the segment level (Kim, 2014; Socher et al., 2013), or the inability to naturally predict segment sentiment (Yang et al., 2016).

3.1 Problem Formulation

Under multiple instance learning (MIL), a dataset D is a collection of labeled bags, each of which is a group of unlabeled instances. Specifically, each document d is a sequence (bag) of segments (instances). This sequence d = (s_1, s_2, ..., s_m) is obtained from a document segmentation policy (see Section 4 for details). A discrete sentiment label y_d ∈ [1, C] is associated with each document, where the labelset is ordered and classes 1 and C correspond to maximally negative and maximally positive sentiment. It is assumed that y_d is an unknown function of the unobserved segment-level labels:

y_d = f(y_1, y_2, ..., y_m)   (1)

Probabilistic sentiment classifiers will produce document-level predictions ŷ_d by selecting the most probable class according to class distribution p_d = ⟨p_d^(1), ..., p_d^(C)⟩. In a non-MIL framework, a classifier would learn to predict the document's sentiment by directly conditioning on its segments' feature representations or their aggregate:

p_d = f̂_θ(v_1, v_2, ..., v_m)   (2)

In contrast, a MIL classifier will produce a class distribution p_i for each segment and additionally learn to combine these into a document-level prediction:

p_i = ĝ_θs(v_i) ,   (3)
p_d = f̂_θd(p_1, p_2, ..., p_m) .   (4)

In this work, ĝ and f̂ are defined using a single neural network, described below.
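The division of labor in Equations (3) and (4) can be made concrete with a small NumPy sketch; the uniform-average f̂ below is only a toy stand-in for the learned, attention-based combination of Section 3.2:

```python
import numpy as np

def segment_distributions(V, W_c, b_c):
    """g_theta(v_i): a shared softmax classifier applied to every segment."""
    scores = V @ W_c + b_c                     # (m, C)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)    # rows are p_1 .. p_m

m, d, C = 4, 300, 5                            # segments, vector size, classes
rng = np.random.default_rng(0)
P = segment_distributions(rng.normal(size=(m, d)),
                          rng.normal(size=(d, C)) * 0.01,
                          np.zeros(C))
p_doc = P.mean(axis=0)                         # simplest f: uniform average
print(p_doc.argmax() + 1)                      # predicted document class in [1, C]
```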

Figure 2: A Hierarchical Network (HIERNET) for document-level sentiment classification and our proposed Multiple Instance Learning Network (MILNET). The models use the same attention mechanism to combine segment vectors and predictions, respectively.

3.2 Multiple Instance Learning Network

Hierarchical neural models like HIERNET have been used to predict document-level polarity by first encoding sentences and then combining these representations into a document vector. Hierarchical vector composition produces powerful sentiment predictors, but lacks the ability to introspectively judge the polarity of individual segments.

Our Multiple Instance Learning Network (henceforth MILNET) is based on the following intuitive assumptions about opinionated text. Each segment conveys a degree of sentiment polarity, ranging from very negative to very positive. Additionally, segments have varying degrees of importance, in relation to the overall opinion of the author. The overarching polarity of a text is an aggregation of segment polarities, weighted by their importance. Thus, our model attempts to predict the polarity of segments and decides which parts of the document are good indicators of its overall sentiment, allowing for the detection of sentiment-heavy opinions. An illustration of MILNET is shown in Figure 2(b); the model consists of three components: a CNN segment encoder, a softmax segment classifier, and an attention-based prediction weighting module.

Segment Encoding An encoding v_i = CNN(X_i) is produced for each segment, using the CNN architecture described in Section 2.

Segment Classification Obtaining a separate representation v_i for every segment in a document allows us to produce individual segment sentiment predictions p_i = ⟨p_i^(1), ..., p_i^(C)⟩. This is achieved using a softmax classifier:

p_i = softmax(W_c v_i + b_c) ,   (5)

where W_c and b_c are the classifier's parameters, shared across all segments. Individual distributions p_i are shown in Figure 2(b) as small bar-charts.

Document Classification In the simplest case, document-level predictions can be produced by taking the average of segment class distributions: p_d^(c) = 1/m Σ_i p_i^(c), c ∈ [1, C]. This is, however, a crude way of combining segment sentiment, as not all parts of a document convey important sentiment clues. We opt for a segment attention mechanism which rewards text units that are more likely to be good sentiment predictors.

Our attention mechanism is based on a bidirectional GRU component (Bahdanau et al., 2015) and inspired by Yang et al. (2016). However, in contrast to their work, where attention is used to combine sentence representations into a single document vector, we utilize a similar technique to aggregate individual sentiment predictions.
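A minimal PyTorch rendering of the segment classifier of Equation (5), together with the crude unweighted average mentioned above (names are ours):

```python
import torch
import torch.nn as nn

class SegmentClassifier(nn.Module):
    """Equation (5): a single softmax layer, shared across all segments,
    maps each segment vector v_i to a class distribution p_i."""
    def __init__(self, dim=300, num_classes=5):
        super().__init__()
        self.linear = nn.Linear(dim, num_classes)  # W_c, b_c

    def forward(self, V):
        # V: (m, dim) segment vectors -> (m, C) distributions p_1 .. p_m
        return torch.softmax(self.linear(V), dim=-1)

P = SegmentClassifier()(torch.randn(7, 300))
p_doc_uniform = P.mean(dim=0)   # the crude unweighted-average document prediction
```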

[Figure 3 shows three EDUs from a restaurant review — "The starters were quite bland." (att: 0.3), "I didn't enjoy most of them," (att: 0.2), "but the burger was brilliant!" (att: 0.5) — with their class probability distributions (top) and resulting polarity scores in [−1, 1] (bottom).]

Figure 3: Polarity scores (bottom) obtained from class probability distributions for three EDUs (top) extracted from a restaurant review. Attention weights (top) are used to fine-tune the obtained polarities.

We first use separate GRU modules to produce forward and backward hidden vectors, which are then concatenated:

→h_i = →GRU(v_i) ,   (6)
←h_i = ←GRU(v_i) ,   (7)

h_i = [→h_i, ←h_i] ,  i ∈ [1, m] .   (8)

The importance of each segment is measured with the aid of a vector h_a, as follows:

h′_i = tanh(W_a h_i + b_a) ,   (9)
a_i = exp(h′_i^T h_a) / Σ_i exp(h′_i^T h_a) ,   (10)

where Equation (9) defines a one-layer MLP that produces an attention vector for the i-th segment. Attention weights a_i are computed as the normalized similarity of each h′_i with h_a. Vector h_a, which is randomly initialized and learned during training, can be thought of as a trained key, able to recognize sentiment-heavy segments. The attention mechanism is depicted in the dashed box of Figure 2, with attention weights shown as shaded circles.

Finally, we obtain a document-level distribution over sentiment labels as the weighted sum of segment distributions (see top of Figure 2(b)):

p_d^(c) = Σ_i a_i p_i^(c) ,  c ∈ [1, C] .   (11)

Training The model is trained end-to-end on documents with user-generated sentiment labels. We use the negative log likelihood of the document-level prediction as an objective function:

L = − Σ_d log p_d^(y_d)   (12)
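A compact PyTorch sketch of the attention-based aggregation of Equations (6)–(12); the layer names and the single-document batching are our own simplifications:

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """A bidirectional GRU runs over the segment vectors, a one-layer MLP
    plus a trained key h_a scores each segment, and the resulting weights
    combine segment distributions p_i into a document distribution p_d."""
    def __init__(self, dim=300, gru_dim=50, att_dim=100):
        super().__init__()
        self.gru = nn.GRU(dim, gru_dim, bidirectional=True, batch_first=True)
        self.mlp = nn.Linear(2 * gru_dim, att_dim)      # W_a, b_a of Eq. (9)
        self.key = nn.Parameter(torch.randn(att_dim))   # h_a, learned

    def forward(self, V, P):
        # V: (1, m, dim) segment vectors; P: (m, C) segment distributions
        H, _ = self.gru(V)                       # h_i = [fwd; bwd], Eqs. (6)-(8)
        Hp = torch.tanh(self.mlp(H.squeeze(0)))  # h'_i, Eq. (9)
        a = torch.softmax(Hp @ self.key, dim=0)  # attention weights a_i, Eq. (10)
        p_d = (a.unsqueeze(1) * P).sum(dim=0)    # weighted sum, Eq. (11)
        return p_d, a

V = torch.randn(1, 6, 300)                       # six segment vectors
P = torch.softmax(torch.randn(6, 5), dim=-1)     # six segment distributions
p_d, a = AttentionAggregator()(V, P)
# Training objective of Eq. (12): negative log likelihood of the gold class,
# e.g. loss = -torch.log(p_d[y_gold])
```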

4 Polarity-based Opinion Extraction

After training, our model can produce segment-level sentiment predictions for unseen texts in the form of class probability distributions. A direct application of our method is opinion extraction, where highly positive and negative snippets are selected from the original document, producing extractive sentiment summaries, as described below.

Polarity Scoring In order to extract opinion summaries, we need to rank segments according to their sentiment polarity. We introduce a method that takes our model's confidence in the prediction into account, by reducing each segment's class probability distribution p_i to a single real-valued polarity score. To achieve this, we first define a real-valued class weight vector w = ⟨w^(1), ..., w^(C)⟩, w^(c) ∈ [−1, 1], that assigns uniformly-spaced weights to the ordered labelset, such that w^(c+1) − w^(c) = 2/(C−1). For example, in a 5-class scenario, the class weight vector would be w = ⟨−1, −0.5, 0, 0.5, 1⟩. We compute the polarity score of a segment as the dot-product of the probability distribution p_i with vector w:

polarity(s_i) = Σ_c p_i^(c) w^(c) ∈ [−1, 1]   (13)

Gated Polarity As a way of increasing the effectiveness of our method, we introduce a gated extension that uses the attention mechanism of our model to further differentiate between segments that carry significant sentiment cues and those that do not:

gated-polarity(s_i) = a_i · polarity(s_i) ,   (14)

where a_i is the attention weight assigned to the i-th segment. This forces the polarity scores of segments the model does not attend to closer to 0.

An illustration of our polarity scoring function is provided in Figure 3, where the class predictions (top) of three restaurant review segments are mapped to their corresponding polarity scores (bottom). We observe that our method produces the desired result; segments 1 and 2 convey negative sentiment and receive negative scores, whereas the third segment is mapped to a positive score. Although the same discrete class label is assigned to the first two, the second segment's score is closer to 0 (neutral), as its class probability mass is more evenly distributed.

                      Yelp'13    IMDB
Documents             335,018    348,415
Average #Sentences    8.90       14.02
Average #EDUs         19.11      37.38
Average #Words        152        325
Vocabulary Size       211,245    115,831
Classes               1–5        1–10

Table 1: Document-level sentiment classification datasets used to train our models.

               Yelp'13seg          IMDBseg
               Sent.     EDUs      Sent.    EDUs
#Segments      1,065     2,110     1,029    2,398
#Documents     100                 97
Classes        {–, 0, +}           {–, 0, +}

Table 2: SPOT dataset: numbers of documents and segments with polarity annotations.
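The polarity machinery of Equations (13) and (14) is small enough to sketch directly; the class distributions and attention weights below are hypothetical values chosen to mimic the three EDUs of Figure 3:

```python
import numpy as np

def class_weights(C):
    """Uniformly-spaced class weights of Section 4, e.g. C=5 -> [-1, -0.5, 0, 0.5, 1]."""
    return np.linspace(-1.0, 1.0, C)

def polarity(p):
    """Equation (13): dot product of a class distribution with w."""
    return p @ class_weights(len(p))

def gated_polarity(p, a):
    """Equation (14): attention-gated polarity score."""
    return a * polarity(p)

# Hypothetical distributions/attention for the three EDUs of Figure 3:
P = np.array([[.60, .25, .10, .04, .01],   # "The starters were quite bland."
              [.35, .30, .20, .10, .05],   # "I didn't enjoy most of them,"
              [.02, .05, .13, .35, .45]])  # "but the burger was brilliant!"
att = np.array([0.3, 0.2, 0.5])
scores = [gated_polarity(p, a) for p, a in zip(P, att)]
```

Ranking segments by these scores is all that is needed for the extractive summaries evaluated in Section 6.2.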

Segmentation Policies As mentioned earlier, one of the hypotheses investigated in this work regards the use of subsentential units as the basis of extraction. Specifically, our model was applied to sentences and Elementary Discourse Units (EDUs), obtained from a Rhetorical Structure Theory (RST) parser (Feng and Hirst, 2012). According to RST, documents are first segmented into EDUs, corresponding roughly to independent clauses, which are then recursively combined into larger discourse spans. This results in a tree representation of the document, where connected nodes are characterized by discourse relations. We only utilize RST's segmentation, and leave the potential use of the tree structure to future work.

The example in Figure 3 illustrates why EDU-based segmentation might be beneficial for opinion extraction. The second and third EDUs correspond to the sentence: I didn't enjoy most of them, but the burger was brilliant. Taken as a whole, the sentence conveys mixed sentiment, whereas the EDUs clearly convey opposing sentiment.

5 Experimental Setup

In this section we describe the data used to assess the performance of our model. We also give details on model training and comparison systems.

5.1 Datasets

Our models were trained on two large-scale sentiment classification collections. The Yelp'13 corpus was introduced in Tang et al. (2015) and contains customer reviews of local businesses, each associated with human ratings on a scale from 1 (negative) to 5 (positive). The IMDB corpus of movie reviews was obtained from Diao et al. (2014); each review is associated with user ratings ranging from 1 to 10. Both datasets are split into training (80%), validation (10%), and test (10%) sets. A summary of statistics for each collection is provided in Table 1.

In order to evaluate model performance on the segment level, we constructed a new dataset named SPOT (as a shorthand for Segment POlariTy) by annotating documents from the Yelp'13 and IMDB collections. Specifically, we sampled reviews from each collection such that all document-level classes are represented uniformly, and the document lengths are representative of the respective corpus. Documents were segmented into sentences and EDUs, resulting in two segment-level datasets per collection. Statistics are summarized in Table 2.

Each review was presented to three Amazon Mechanical Turk (AMT) annotators who were asked to judge the sentiment conveyed by each segment (i.e., sentence or EDU) as negative, neutral, or positive.

We assigned labels using a majority vote, or a fourth annotator in the rare cases of no agreement (< 5%). Figure 4 shows the distribution of segment labels for each document-level class. As expected, documents with positive labels contain a larger number of positive segments compared to documents with negative labels, and vice versa. Neutral segments are distributed in an approximately uniform manner across document classes. Interestingly, the proportion of neutral EDUs is significantly higher compared to neutral sentences. The observation reinforces our argument in favor of EDU segmentation, as it suggests that a sentence with positive or negative overall polarity may still contain neutral EDUs. Discarding neutral EDUs could therefore lead to more concise opinion extraction compared to relying on entire sentences.

[Figure 4 comprises four bar-chart panels — Yelp'13 Sentences, Yelp'13 EDUs, IMDB Sentences, IMDB EDUs — plotting the proportion of negative, neutral, and positive segments against the document class.]

Figure 4: Distribution of segment-level labels per document-level class on the SPOT datasets.

We further experimented on two collections introduced by Kotzias et al. (2015) which also originate from the YELP'13 and IMDB datasets. Each collection consists of 1,000 randomly sampled sentences annotated with binary sentiment labels.

5.2 Model Comparison

On the task of segment classification we compared MILNET, our multiple instance learning network, against the following methods:

Majority: Majority class applied to all instances.

SO-CAL: State-of-the-art lexicon-based system that classifies segments into positive, neutral, and negative classes (Taboada et al., 2011).

Seg-CNN: Fully-supervised CNN segment classifier trained on SPOT's labels (Kim, 2014).

GICF: The Group-Instance Cost Function model introduced in Kotzias et al. (2015). This is an unweighted average prediction aggregation MIL method that uses sentence features from a pre-trained convolutional neural model.

HIERNET: HIERNET does not explicitly generate individual segment predictions. Segment polarity scores are obtained by assigning the document-level prediction to every segment. We can then produce finer-grained polarity distinctions via gating, using the model's attention weights.

We further illustrate the differences between HIERNET and MILNET in Figure 5, which includes short descriptions and simplified equations for each model. MILNET naturally produces distinct segment polarities, while HIERNET assigns a single polarity score to every segment. In both cases, gating is a further means of identifying neutral segments.

Finally, we differentiate between variants of HIERNET and MILNET according to:

Polarity source: Controls whether we assign polarities via segment-specific or document-wide predictions. HIERNET only allows for document-wide predictions. MILNET can use both.

Attention: We use models without gating (no subscript), with gating (gt subscript), as well as models trained with the attention mechanism disabled, falling back to simple averaging (avg subscript).

5.3 Model Training and Evaluation

We trained MILNET and HIERNET using Adadelta (Zeiler, 2012) for 25 epochs. Mini-batches of 200 documents were organized based on the reviews' segment and document lengths, so that the amount of padding was minimized. We used 300-dimensional pre-trained embeddings. We tuned hyper-parameters on the validation sets of the document classification collections, resulting in the following configuration (unless otherwise noted).

Figure 5: System pipelines for HIERNET and MILNET showing 4 distinct phases for sentiment analysis.

For the CNN segment encoder, we used window sizes of 3, 4, and 5 words, with 100 feature maps per window size, resulting in 300-dimensional segment vectors. The GRU hidden vector dimensions for each direction were set to 50 and the attention vector dimensionality to 100. We used L2-normalization and dropout to regularize the softmax classifiers, and additional dropout on the internal GRU connections.

Real-valued polarity scores produced by the two models are mapped to discrete labels using two appropriate thresholds t_1, t_2 ∈ [−1, 1], so that a segment s is classified as negative if polarity(s) < t_1, positive if polarity(s) > t_2, or neutral otherwise.³ To evaluate performance, we use macro-averaged F1, which is unaffected by class imbalance. We select optimal thresholds using 10-fold cross-validation and report mean scores across folds.

The fully-supervised convolutional segment classifier (Seg-CNN) uses the same window size and feature map configuration as our segment encoder. Seg-CNN was trained on SPOT using segment labels directly and 10-fold cross-validation (identical folds as in our main models). Seg-CNN is not directly comparable to MILNET (or HIERNET) due to differences in supervision type (segment vs. document labels) and training size (1K–2K segment labels vs. ∼250K document labels). However, the comparison is indicative of the utility of fine-grained sentiment predictors that do not rely on expensive segment-level annotations.

³The discretization of polarities is only used for evaluation purposes and is not necessary for summary extraction, where we only need a relative ranking of segments.
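A sketch of the evaluation mapping just described, assuming polarity scores in [−1, 1]; the thresholds below are illustrative, whereas the paper selects them by 10-fold cross-validation:

```python
import numpy as np
from sklearn.metrics import f1_score

def discretize(scores, t1, t2):
    """Map real-valued polarity scores to {negative, neutral, positive}
    using two thresholds t1 < t2, as in Section 5.3."""
    scores = np.asarray(scores)
    labels = np.full(len(scores), "neutral", dtype=object)
    labels[scores < t1] = "negative"
    labels[scores > t2] = "positive"
    return labels

gold = np.array(["negative", "neutral", "positive", "positive"])
pred = discretize([-0.7, -0.05, 0.4, 0.9], t1=-0.2, t2=0.2)
print(f1_score(gold, pred, average="macro"))  # macro-averaged F1
```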

6 Results

We evaluated models in two ways. We first assessed their ability to classify segment polarity in reviews using the newly created SPOT dataset and, additionally, the sentence corpora of Kotzias et al. (2015). Our second suite of experiments focused on opinion extraction: we conducted a judgment elicitation study to determine whether extracts produced by MILNET are useful and of higher quality compared to HIERNET and other baselines. We were also interested to find out whether EDUs provide a better basis for opinion extraction than sentences.

6.1 Segment Classification

Table 3 summarizes our results. The first block in the table reports the performance of the majority class baseline. The second block considers models that do not utilize segment-level predictions, namely HIERNET, which assigns polarity scores to segments using its document-level predictions, as well as the variant of MILNET which similarly uses document-level predictions only (Equation (11)). In the third block, MILNET's segment-level predictions are used. Each block further differentiates between three levels of attention integration, as previously described. The final block shows the performance of SO-CAL and the Seg-CNN classifier.

                        Yelp'13seg            IMDBseg
Method                  Sent      EDU         Sent      EDU
Majority                19.02†    17.03†      18.32†    21.52†
Document
  HIERNETavg            54.21†    50.90†      46.99†    49.02†
  HIERNET               55.33†    51.43†      48.47†    49.70†
  HIERNETgt             56.64†    58.75       62.12     57.38†
  MILNETavg             58.43†    48.63†      53.40†    51.81†
  MILNET                52.73†    53.59†      48.75†    47.18†
  MILNETgt              59.74†    59.47       61.83†    58.24†
Segment
  MILNETavg             51.79†    46.77†      45.69†    38.37†
  MILNET                61.41     59.58       59.99†    57.71†
  MILNETgt              63.35     59.85       63.97     59.87
SO-CAL                  56.53†    58.16†      53.21†    60.40
Seg-CNN                 56.18†    59.96       58.32†    62.95†

Table 3: Segment classification results (in macro-averaged F1). † indicates that the system in question is significantly different from MILNETgt (approximate randomization test (Noreen, 1989), p < 0.05).

Neutral Segments      Non-Gtd   Gated
Sent  HIERNET         4.67      36.60
      MILNET          39.61     44.60
EDU   HIERNET         2.39      55.38
      MILNET          52.10     56.60

Table 4: F1 scores for neutral segments (Yelp'13).

Method       Yelp    IMDB
GICF         86.3    86.0
GICF_HN      92.9    86.5
GICF_MN      93.2    91.0
MILNET       94.0    91.9

Table 5: Accuracy scores on the sentence classification datasets introduced in Kotzias et al. (2015).

When considering models that use document-level supervision, MILNET with gated, segment-specific polarities obtains the best classification performance across all four datasets. Interestingly, it performs comparably to Seg-CNN, the fully-supervised segment classifier, which provides additional evidence that MILNET can effectively identify segment polarity without the need for segment-level annotations. Our model also outperforms the strong SO-CAL baseline in all but one dataset, which is remarkable given the expert knowledge and linguistic information used to develop the latter. Document-level polarity predictions result in lower classification performance across the board. Differences between the standard hierarchical and multiple instance networks are less pronounced in this case, as MILNET loses the advantage of producing segment-specific sentiment predictions. Models without attention perform worse in most cases. The use of gated polarities benefits all model configurations, indicating the method's ability to selectively focus on segments with significant sentiment cues.

We further analyzed the polarities assigned by MILNET and HIERNET to positive, negative, and neutral segments. Figure 6 illustrates the distribution of polarity scores produced by the two models on the Yelp'13 dataset (sentence segmentation). In the case of negative and positive sentences, both models demonstrate appropriately skewed distributions. However, the neutral class appears to be particularly problematic for HIERNET, where polarity scores are scattered across a wide range of values. In contrast, MILNET is more successful at identifying neutral sentences, as its corresponding distribution has a single mode near zero. Attention gating addresses this issue by moving the polarity scores of sentiment-neutral segments towards zero. This is illustrated in Table 4, where we observe that gated variants of both models do a better job at identifying neutral segments. The effect is very significant for HIERNET, while MILNET benefits slightly and remains more effective overall. Similar trends were observed in all four SPOT datasets.

[Figure 6 shows density plots of predicted polarity scores in [−1, 1] for negative, neutral, and positive sentences, one row per model (HierNet, MILNet).]

Figure 6: Distribution of predicted polarity scores across three classes (Yelp'13 sentences).

In order to examine the effect of training size, we trained multiple models using subsets of the original document collections.

We trained on five random subsets for each training size, ranging from 100 documents to the full training set, and tested segment classification performance on SPOT. The results, averaged across trials, are presented in Figure 7. With the exception of the IMDB EDU-segmented dataset, MILNET only requires a few thousand training documents to outperform the supervised Seg-CNN. HIERNET follows a similar curve, but is inferior to MILNET. A reason for MILNET's inferior performance on the IMDB corpus (EDU-split) can be low-quality EDUs, due to the noisy and informal style of language used in IMDB reviews.

[Figure 7 comprises four panels — Yelp Sentences, Yelp EDUs, IMDB Sentences, IMDB EDUs — plotting macro-F1 (roughly 40–70) against training size (0–250,000 documents) for MILNet, HierNet, and Seg-CNN.]

Figure 7: Performance of HIERNETgt and MILNETgt for varying training sizes.

Finally, we compared MILNET against the GICF model (Kotzias et al., 2015) on their Yelp and IMDB sentence sentiment datasets.⁴ Their model requires sentence embeddings from a pre-trained neural model. We used the hierarchical CNN from their work (Denil et al., 2014) and, additionally, pre-trained HIERNET and MILNET sentence embeddings. The results in Table 5 show that MILNET outperforms all variants of GICF. Our models also seem to learn better sentence embeddings, as they improve GICF's performance on both collections.

⁴GICF only handles binary labels, which makes it unsuitable for the full-scale comparisons in Table 3. Here, we binarize our training datasets and use same-sized sentence embeddings for all four models (R^150 for Yelp, R^72 for IMDB).

6.2 Opinion Extraction

In our opinion extraction experiments, AMT workers (all native English speakers) were shown an original review and a set of extractive, bullet-style summaries produced by competing systems using a 30% compression rate. Participants were asked to decide which summary was best according to three criteria: Informativeness (Which summary best captures the salient points of the review?), Polarity (Which summary best highlights positive and negative comments?), and Coherence (Which summary is more coherent and easier to read?). Subjects were allowed to answer "Unsure" in cases where they could not discriminate between summaries. We used all reviews from our SPOT dataset and collected three responses per document. We ran four judgment elicitation studies: one comparing HIERNET and MILNET when summarizing reviews segmented as sentences; a second one comparing the two models with EDU segmentation; a third which compares EDU- and sentence-based summaries produced by MILNET; and a fourth where EDU-based summaries from MILNET were compared to a LEAD (the first N words from each document) and a RANDOM (random EDUs) baseline.

Method          Informativeness   Polarity   Coherence
HIERNETsent     43.7              33.6       43.5
MILNETsent      45.7              36.7       44.6
Unsure          10.7              29.6       11.8

HIERNETedu      34.2†             28.0†      48.4
MILNETedu       53.3              61.1       45.0
Unsure          12.5              11.0       6.6

MILNETsent      35.7†             33.4†      70.4†
MILNETedu       55.0              51.5       23.7
Unsure          9.3               15.2       5.9

LEAD            34.0              19.0†      40.3
RANDOM          22.9†             19.6†      17.8†
MILNETedu       37.4              46.9       33.3
Unsure          5.7               14.6       8.6

Table 6: Human evaluation results (in percentages). † indicates that the system in question is significantly different from MILNET (sign-test, p < 0.01).

Table 6 summarizes our results, showing the proportion of participants that preferred each system.

[Rating: ★★★★] As with any family-run hole in the wall, service can be slow. What the staff lacked in speed, they made up for in charm. The food was good, but nothing wowed me. I had the Pierogis while my friend had swedish meatballs. Both dishes were tasty, as were the sides. One thing that was disappointing was that the food was a a little cold (lukewarm). The restaurant itself is bright and clean. I will go back again when i feel like eating outside the box.

Extracted via HIERNETgt (EDU-based):
(0.13) [+0.26] The food was good +
(0.10) [+0.26] but nothing wowed me. +
(0.09) [+0.26] The restaurant itself is bright and clean +
(0.13) [+0.26] Both dishes were tasty +
(0.18) [+0.26] I will go back again +

Extracted via MILNETgt (EDU-based):
(0.16) [+0.12] The food was good +
(0.12) [+0.43] The restaurant itself is bright and clean +
(0.19) [+0.15] I will go back again +
(0.09) [–0.07] but nothing wowed me. −
(0.10) [–0.10] the food was a a little cold (lukewarm) −

Extracted via HIERNETgt (sentence-based):
(0.12) [+0.23] Both dishes were tasty, as were the sides +
(0.18) [+0.23] The food was good, but nothing wowed me +
(0.22) [+0.23] One thing that was disappointing was that the food was a a little cold (lukewarm) +

Extracted via MILNETgt (sentence-based):
(0.13) [+0.26] Both dishes were tasty, as were the sides +
(0.20) [+0.59] I will go back again when I feel like eating outside the box +
(0.18) [–0.12] The food was good, but nothing wowed me −

(number): attention weight   [number]: non-gated polarity score   text+: extracted positive opinion   text−: extracted negative opinion

Figure 8: Example EDU- and sentence-based opinion summaries produced by HIERNETgt and MILNETgt.

The first block in the table shows a slight preference for MILNET across criteria. The second block shows significant preference for MILNET against HIERNET on informativeness and polarity, whereas HIERNET was more often preferred in terms of coherence, although the difference is not statistically significant. The third block compares sentence and EDU summaries produced by MILNET. EDU summaries were perceived as significantly better in terms of informativeness and polarity, but not coherence. This is somewhat expected, as EDUs tend to produce more terse and telegraphic text and may seem unnatural due to segmentation errors. In the fourth block we observe that participants find MILNET more informative and better at distilling polarity compared to the LEAD and RANDOM (EDUs) baselines. We should point out that the LEAD system is not a strawman; it has proved hard to outperform by more sophisticated methods (Nenkova, 2005), particularly in the newswire domain.

Example EDU- and sentence-based summaries produced by the gated variants of HIERNET and MILNET are shown in Figure 8, with attention weights and polarity scores of the extracted segments shown in round and square brackets, respectively. For both granularities, HIERNET's positive document-level prediction results in a single polarity score assigned to every segment, further adjusted using the corresponding attention weights. The extracted segments are informative, but fail to capture the negative sentiment of some segments. In contrast, MILNET is able to detect positive and negative snippets via individual segment polarities. Here, EDU segmentation produced a more concise summary with a clearer grouping of positive and negative snippets.

7 Conclusions

In this work, we presented a neural network model for fine-grained sentiment analysis within the framework of multiple instance learning. Our model can be trained on large-scale sentiment classification datasets, without the need for segment-level labels. As a departure from the commonly used vector-based composition, our model first predicts sentiment at the sentence or EDU level and subsequently combines predictions up the document hierarchy. An attention-weighted polarity scoring technique provides a natural way to extract sentiment-heavy opinions. Experimental results demonstrate the superior performance of our model against more conventional neural architectures. Human evaluation studies also show that MILNET opinion extracts are preferred by participants and are effective at capturing informativeness and polarity, especially when using EDU segments. In the future, we would like to focus on multi-document, aspect-based extraction (Cao et al., 2017) and on ways of improving the coherence of our summaries by taking into account more fine-grained discourse information (Daumé III and Marcu, 2002).

Acknowledgments

The authors gratefully acknowledge the support of the European Research Council (award number 681760). We thank TACL action editor Ani Nenkova and the anonymous reviewers whose feedback helped improve the present paper, as well as Charles Sutton, Timothy Hospedales, and members of EdinburghNLP for helpful discussions and suggestions.

References

Stuart Andrews and Thomas Hofmann. 2004. Multiple instance learning via disjunctive programming boosting. In Advances in Neural Information Processing Systems 16, pages 65–72. Curran Associates, Inc.

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the 5th Conference on International Language Resources and Evaluation, volume 10, pages 2200–2204, Valletta, Malta.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, California, USA.

Parminder Bhatia, Yangfeng Ji, and Jacob Eisenstein. 2015. Better document-level sentiment analysis from RST discourse parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2212–2218, Lisbon, Portugal.

Ziqiang Cao, Wenjie Li, Sujian Li, and Furu Wei. 2017. Improving multi-document summarization via text classification. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 3053–3058, San Francisco, California, USA.

Peter Carbonetto, Gyuri Dorkó, Cordelia Schmid, Hendrik Kück, and Nando de Freitas. 2008. Learning to recognize objects with little supervision. International Journal of Computer Vision, 77(1):219–237.

Giuseppe Carenini, Raymond Ng, and Adam Pauls. 2006. Multidocument summarization of evaluative text. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 305–312, Trento, Italy.

Lynn Carlson, Daniel Marcu, and Mary Ellen Okurowski. 2003. Building a discourse-tagged corpus in the framework of rhetorical structure theory. In Current and New Directions in Discourse and Dialogue, pages 85–112. Springer.

Jianpeng Cheng and Mirella Lapata. 2016. Neural summarization by extracting sentences and words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 484–494, Berlin, Germany.

Timothee Cour, Ben Sapp, and Ben Taskar. 2011. Learning from partial labels. Journal of Machine Learning Research, 12(May):1501–1536.

Hal Daumé III and Daniel Marcu. 2002. A noisy-channel model for document compression. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 449–456, Philadelphia, Pennsylvania, USA.

Misha Denil, Alban Demiraj, and Nando de Freitas. 2014. Extraction of salient sentences from labelled documents. Technical report, University of Oxford.

Giuseppe Di Fabbrizio, Amanda Stent, and Robert Gaizauskas. 2014. A hybrid approach to multi-document summarization of opinions in reviews. In Proceedings of the 8th International Natural Language Generation Conference (INLG), pages 54–63, Philadelphia, Pennsylvania, USA.

Qiming Diao, Minghui Qiu, Chao-Yuan Wu, Alexander J. Smola, Jing Jiang, and Chong Wang. 2014. Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 193–202, New York, NY, USA.

Thomas G. Dietterich, Richard H. Lathrop, and Tomás Lozano-Pérez. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1):31–71.

Wei Vanessa Feng and Graeme Hirst. 2012. Text-level discourse parsing with rich linguistic features. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 60–68, Jeju Island, Korea.

Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. 2010. Opinosis: A graph based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 340–348, Beijing, China.

Shima Gerani, Yashar Mehdad, Giuseppe Carenini, Raymond T. Ng, and Bita Nejat. 2014. Abstractive summarization of product reviews using discourse structure. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1602–1613, Doha, Qatar.

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 541–550, Portland, Oregon, USA.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168–177, Seattle, Washington, USA.

Rie Johnson and Tong Zhang. 2015a. Effective use of word order for text categorization with convolutional neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 103–112, Denver, Colorado, USA.

Rie Johnson and Tong Zhang. 2015b. Semi-supervised convolutional neural networks for text categorization via region embedding. In Advances in Neural Information Processing Systems 28, pages 919–927. Curran Associates, Inc.

Jim Keeler and David E. Rumelhart. 1992. A self-organizing integrated segmentation and recognition neural net. In Advances in Neural Information Processing Systems 4, pages 496–503. Morgan-Kaufmann.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1746–1751, Doha, Qatar.

Dimitrios Kotzias, Misha Denil, Nando De Freitas, and Padhraic Smyth. 2015. From group to individual labels using deep features. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 597–606, Sydney, Australia.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning, pages 1188–1196, Beijing, China.

Kevin Lerman, Sasha Blair-Goldensohn, and Ryan McDonald. 2009. Sentiment summarization: Evaluating and learning user preferences. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 514–522, Athens, Greece.

Junyi Jessy Li, Kapil Thadani, and Amanda Stent. 2016. The role of discourse units in near-extractive summarization. In Proceedings of the SIGDIAL 2016 Conference, the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 137–147, Los Angeles, California, USA.

William C. Mann and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text – Interdisciplinary Journal for the Study of Discourse, 8(3):243–281.

Oded Maron and Aparna Lakshmi Ratan. 1998. Multiple-instance learning for natural scene classification. In Proceedings of the 15th International Conference on Machine Learning, volume 98, pages 341–349, San Francisco, California, USA.

Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 3075–3081, San Francisco, California.

Ani Nenkova. 2005. Automatic text summarization of newswire: Lessons learned from the document understanding conference. In Proceedings of the 20th AAAI, pages 1436–1441, Pittsburgh, Pennsylvania, USA.

Eric Noreen. 1989. Computer-intensive Methods for Testing Hypotheses: An Introduction. Wiley.

Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 115–124. Association for Computational Linguistics.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pages 79–86, Pittsburgh, Pennsylvania, USA.

Nikolaos Pappas and Andrei Popescu-Belis. 2014. Explaining the stars: Weighted multiple-instance learning for aspect-based sentiment analysis. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 455–466, Doha, Qatar.

Nikolaos Pappas and Andrei Popescu-Belis. 2017. Explicit document modeling through weighted multiple-instance learning. Journal of Artificial Intelligence Research, 58:591–626.

Lizhen Qu, Georgiana Ifrim, and Gerhard Weikum. 2010. The bag-of-opinions method for review rating prediction from sparse text patterns. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 913–921, Beijing, China.

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 151–161, Edinburgh, Scotland, UK.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA.

Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2):267–307.

Oscar Täckström and Ryan McDonald. 2011. Discovering fine-grained sentiment with latent variable structured prediction models. In Proceedings of the 39th European Conference on Information Retrieval, pages 368–374, Aberdeen, Scotland, UK.

Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1422–1432, Lisbon, Portugal.

Peter D. Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 417–424, Pittsburgh, Pennsylvania, USA.

Sida Wang and Christopher D. Manning. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2, pages 90–94, Jeju Island, Korea.

Xiu-Shen Wei, Jianxin Wu, and Zhi-Hua Zhou. 2014. Scalable multi-instance learning. In Proceedings of the IEEE International Conference on Data Mining, pages 1037–1042, Shenzhen, China.

Nils Weidmann, Eibe Frank, and Bernhard Pfahringer. 2003. A two-level learning method for generalized multi-instance problems. In Proceedings of the 14th European Conference on Machine Learning, pages 468–479, Dubrovnik, Croatia.

Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2):165–210.

Jiajun Wu, Yinan Yu, Chang Huang, and Kai Yu. 2015. Deep multiple instance learning for image classification and auto-annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3460–3469, Boston, Massachusetts, USA.

Rui Xia and Chengqing Zong. 2010. Exploring the use of word relation features for sentiment classification. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 1336–1344, Beijing, China.

Xin Xu and Eibe Frank. 2004. Logistic regression and boosting for labeled bags of instances. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 272–281. Springer-Verlag.

Wei Xu, Alan Ritter, Chris Callison-Burch, William B. Dolan, and Yangfeng Ji. 2014. Extracting lexically divergent paraphrases from Twitter. Transactions of the Association for Computational Linguistics, 2:435–448.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California, USA.

Matthew D. Zeiler. 2012. ADADELTA: An adaptive learning rate method. CoRR, abs/1212.5701.

Qi Zhang, Sally A. Goldman, Wei Yu, and Jason E. Fritts. 2002. Content-based image retrieval using multiple-instance learning. In Proceedings of the 19th International Conference on Machine Learning, volume 2, pages 682–689, Sydney, Australia.

Cha Zhang, John C. Platt, and Paul A. Viola. 2006. Multiple instance boosting for object detection. In Advances in Neural Information Processing Systems 18, pages 1417–1424. MIT Press.

Zhi-Hua Zhou, Yu-Yin Sun, and Yu-Feng Li. 2009. Multi-instance learning by treating instances as non-i.i.d. samples. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1249–1256, Montréal, Québec.