A Weak-Supervision Method for Automating Training Set Creation in Multi-Domain Aspect Sentiment Classification
Massimo Ruffolo (1,2) and Francesco Visalli (1)
1 High Performance Computing and Networking Institute of the National Research Council (ICAR-CNR), Via Pietro Bucci 8/9C, Rende (CS), 87036, Italy
2 Altilia.ai, Technest - University of Calabria, Piazza Vermicelli, Rende (CS), 87036, Italy
massimo.ruffolo, [email protected]
ORCID: https://orcid.org/0000-0002-4094-4810

Keywords: Weak-supervision, Data Programming, Deep Learning, Aspect Based Sentiment Analysis, Transformers, Natural Language Processing.

Abstract: Aspect Based Sentiment Analysis (ABSA) is receiving growing attention from the research community because it has applications in several real-world use cases. Training deep learning models for ABSA in vertical domains can be a laborious process requiring significant human effort to create proper training sets. In this work we present initial studies on the definition of an easy-to-use, flexible, and reusable weakly-supervised method for the Aspect Sentiment Classification task of ABSA. Our method mainly consists in a process where templates of Labeling Functions automatically annotate sentences, and the generative model of Snorkel then constructs a probabilistic training set. In order to test the effectiveness and applicability of our method, we trained machine learning models whose loss function is informed about the probabilistic nature of the labels. In particular, we fine-tuned BERT models on two well-known disjoint SemEval datasets related to laptops and restaurants.

1 INTRODUCTION

Aspect Based Sentiment Analysis (ABSA) is a Natural Language Processing (NLP) problem that is receiving growing attention from the research community because it can be productively used in many different real-world use cases (Hu and Liu, 2004). For example, ABSA enables extracting relevant features, along with buyers' opinions, from product reviews available online.

ABSA comes in two main variants (Pontiki et al., 2014); one of these provides for two subtasks, Aspect Extraction (AE) and Aspect Sentiment Classification (ASC). Figure 1 shows two sentences extracted from the laptops and restaurants domains, respectively. Aspects are highlighted in blue; AE takes care of retrieving them from sentences, while ASC identifies the polarity of the terms that refer to the aspect and express a sentiment/opinion. ABSA is a difficult problem to address across multiple domains because an opinion term that is positive in one domain may not be in another.

For example, in Figure 1, the term "hot" expresses a Negative opinion about the aspect "battery", whereas the same term referred to the aspect "food" assumes a Positive connotation. A battery that gets hot is not desirable, while hot, tasty food is good.

Figure 1: Examples of aspects and related sentiments.
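Concretely, the ASC task maps sentence-aspect pairs to polarity labels. The following minimal Python sketch illustrates this input/output structure; the sentences are our own paraphrases of the Figure 1 examples, not taken from the SemEval data:

# Each ASC instance is a (sentence, aspect) pair; the output is a polarity.
# The same opinion word "hot" flips polarity across domains and aspects.
asc_examples = [
    ("The battery gets hot after an hour of use.", "battery", "Negative"),
    ("The food arrived hot and tasty.", "food", "Positive"),
]
for sentence, aspect, polarity in asc_examples:
    print(f"aspect={aspect!r} in {sentence!r} -> {polarity}")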
In recent years, ABSA methods based on deep learning have become mainstream. Deep learning automates the feature engineering process. Since 2012, when AlexNet (Krizhevsky et al., 2012) won the ImageNet Large Scale Visual Recognition Competition (http://www.image-net.org/challenges/LSVRC/), deep neural network architectures have spread to the NLP area as well, becoming the state of the art for many tasks in this field (Devlin et al., 2019; Yang et al., 2019), including ABSA (Xu et al., 2019; Rietzler et al., 2019). Despite such success, developing enterprise-grade deep learning-based applications still poses many challenges.

In particular, deep learning models are often hungry for data, as they are rich in parameters. Training these models with supervised learning methods requires a large amount of annotated examples. Such training sets are enormously expensive to create, especially when domain expertise is required. Moreover, the specifications of a system often change, requiring the re-labeling of the datasets. Therefore, it is not always possible to rely on subject matter experts for labeling the data. This is one of the most costly bottlenecks to a wide and pervasive adoption of deep learning methods in real-world use cases. Hence, alleviating the cost of human annotation is a major issue in supervised learning.

To tackle this problem, various approaches have been proposed, such as transfer learning, semi-supervised learning, where both unsupervised and supervised learning methods are exploited, and weak supervision. Transfer learning methods such as (Ding et al., 2017; Wang and Pan, 2018) rely on the fact that a model trained on a specific domain (source) can be exploited to do the same task on another domain (target), thus reducing the need for labeled data in the target domain.

One of the most important examples of semi-supervised learning recently appeared in the literature is Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019), which strongly reduces the volume of labeled data needed. BERT is mainly a model that provides strong contextual word embeddings learned in an unsupervised way by training on large text corpora. But BERT is also a generic deep learning architecture for many NLP tasks because it can be fine-tuned in a supervised way to learn various downstream NLP tasks. The idea behind fine-tuning is that most of the features have already been learned and the model just needs to be specialized for the specific NLP task. This way, fine-tuning BERT for a specific NLP problem, such as ABSA, requires much less annotated data than learning the entire task from scratch.

Weak supervision simplifies the annotation process to make it more automatic and scalable, even though it is less accurate and noisier. Weakly supervised methods rely on several different data annotation techniques, such as heuristics, distant supervision, pattern matching, weak classifiers, and so on. Recently, (Ratner et al., 2016) proposed data programming as a paradigm for semi-automatic dataset labeling, and Snorkel (Ratner et al., 2017) as the system that implements it. Data programming is based on the concept of Labeling Functions (LFs), where LFs are procedures that automatically assign labels to data on the basis of domain knowledge embedded in the form of annotation rules. (Bach et al., 2019) at Google extended Snorkel in order to achieve better scalability and knowledge base re-usability for enterprise-grade training set labeling.
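To make the data programming workflow concrete, here is a minimal sketch using the current open-source Snorkel API (v0.9+, which differs from the release described in (Ratner et al., 2017)). The lexicons, field names, and label set are illustrative assumptions of ours, not the LF templates the paper defines:

import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function
from snorkel.labeling.model import LabelModel

# Label space assumed here: Negative, Neutral, Positive; -1 means abstain.
ABSTAIN, NEGATIVE, NEUTRAL, POSITIVE = -1, 0, 1, 2

# Toy, domain-specific opinion lexicons (e.g., for the laptops domain).
POS_WORDS = {"great", "fast", "quiet"}
NEG_WORDS = {"hot", "slow", "noisy"}

def opinion_near_aspect(x, lexicon, window=3):
    # True if a lexicon word occurs within `window` tokens of the aspect.
    tokens = x.text.lower().split()
    if x.aspect not in tokens:
        return False
    i = tokens.index(x.aspect)
    return any(t in lexicon for t in tokens[max(0, i - window):i + window + 1])

@labeling_function()
def lf_positive_lexicon(x):
    return POSITIVE if opinion_near_aspect(x, POS_WORDS) else ABSTAIN

@labeling_function()
def lf_negative_lexicon(x):
    return NEGATIVE if opinion_near_aspect(x, NEG_WORDS) else ABSTAIN

# Unlabeled sentence-aspect pairs (toy data).
df_train = pd.DataFrame({
    "text": ["the battery gets hot quickly", "the keyboard is great"],
    "aspect": ["battery", "keyboard"],
})

applier = PandasLFApplier(lfs=[lf_positive_lexicon, lf_negative_lexicon])
L_train = applier.apply(df=df_train)  # matrix of LF votes, one column per LF

# Snorkel's generative label model combines the noisy LF votes into one
# probability distribution over labels for each sentence-aspect pair.
label_model = LabelModel(cardinality=3, verbose=False)
label_model.fit(L_train, n_epochs=500, seed=123)
probabilistic_labels = label_model.predict_proba(L_train)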
In this paper we propose a weakly-supervised approach to the ASC task in ABSA problems. In our approach we leverage the BERT fine-tuning method for sentiment classification as in (Xu et al., 2019), and Snorkel (Ratner et al., 2017) to apply data programming principles to reviews. The main contributions of this work are:
• The definition of a set of easy-to-use, flexible, and reusable LF templates that make the weakly-supervised method viable for the ASC task;
• The experimental evaluation of the generality, effectiveness, and robustness of the weakly-supervised method proposed in (Ratner et al., 2017) when applied to complex problems like the ASC task.

In order to prove the flexibility and the robustness of our approach, we tested it on two disjoint domains, laptops and restaurants. In particular, we used the datasets of SemEval task 4 subtask 2 (Pontiki et al., 2014). Finally, we compared the obtained results with those of supervised learning methods. The results appear remarkable for a weakly supervised system: they show that our approach can be used for practical purposes on multiple domains.

The rest of this paper is organized as follows: in Section 2 we introduce a list of ABSA works that try to reduce the need for human effort; in Section 3 we present our method in terms of the LF templates we have defined; in Section 4 the experiments carried out and the results obtained are shown and discussed; finally, in Section 5 conclusions are drawn and future work is presented.

2 RELATED WORK

While there is a large corpus of scientific papers related to the ABSA problem, to the best of our knowledge there are very few works that propose the application of weak-supervision methods to ABSA. (Pablos et al., 2014) uses some variations of (Qiu et al., 2009) and (Qiu et al., 2011) to perform AE and ASC.

In (Pablos et al., 2015) the AE task is done by bootstrapping a list of candidate domain aspect terms and using them to annotate the reviews of the same domain. The polarity detection is performed using a polarity lexicon exploiting the Word2Vec model (Mikolov et al., 2013) for each domain (however, the task is a bit different from ASC: they classify Entity-Attribute pairs, where Entity and Attribute belong to predefined lists, e.g. food, price, and location for Entity, and food-price and food-quality for Attribute).

(Pablos et al., 2018) presents a fully "almost unsupervised" ABSA system. Starting from a customer reviews dataset and a short seed list of aspect words, they extract a list of words per aspect and two lists of positive and negative words for every selected aspect. The system is based on a topic modelling approach combined with continuous word embeddings and a Maximum Entropy classifier.

(Purpura et al., 2018) performs the AE phase with a topic modeling technique called Non-negative Matrix Factorization.

Our method, designed for the ASC task of the ABSA problem, is grounded on data programming (Ratner et al., 2016), a weakly-supervised paradigm based on the concept of Labeling Functions (LFs), where LFs are procedures, designed by data scientists and/or subject matter experts, that automatically assign labels to data on the basis of domain knowledge embedded in the form of annotation rules. More in detail, our method consists of a set of predefined, easy-to-use, and flexible LFs capable of automatically assigning a sentiment to sentence-aspect pairs.
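The abstract notes that the downstream classifier's loss function is informed about the probabilistic nature of the labels. A common way to realize this is a noise-aware cross-entropy computed against the label distributions produced by the generative model; the PyTorch sketch below is our illustration of that idea, not the paper's exact formulation:

import torch
import torch.nn.functional as F

def noise_aware_cross_entropy(logits, soft_labels):
    # logits: (batch, num_classes) raw classifier outputs (e.g., from BERT).
    # soft_labels: (batch, num_classes) probabilistic labels, rows sum to 1,
    # e.g., the output of Snorkel's LabelModel.predict_proba.
    log_probs = F.log_softmax(logits, dim=-1)
    # Expected negative log-likelihood under the label distribution.
    return -(soft_labels * log_probs).sum(dim=-1).mean()

# Toy usage with 2 examples and 3 classes (Negative, Neutral, Positive).
logits = torch.tensor([[2.0, 0.1, -1.0], [0.2, 0.0, 1.5]])
soft_labels = torch.tensor([[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]])
print(noise_aware_cross_entropy(logits, soft_labels))

When the rows of soft_labels are one-hot, this reduces to the standard cross-entropy used in fully supervised fine-tuning, so the same training loop covers both settings.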