Sentiment Tagging with Partial Labels Using Modular Architectures

Xiao Zhang, Purdue University, [email protected]
Dan Goldwasser, Purdue University, [email protected]

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 579–590, Florence, Italy, July 28 - August 2, 2019. © 2019 Association for Computational Linguistics

Abstract

Many NLP learning tasks can be decomposed into several distinct sub-tasks, each associated with a partial label. In this paper we focus on a popular class of learning problems, sequence prediction applied to several sentiment analysis tasks, and suggest a modular learning approach in which different sub-tasks are learned using separate functional modules, combined to perform the final task while sharing information. Our experiments show this approach helps constrain the learning process and can alleviate some of the supervision efforts.

1 Introduction

Many natural language processing tasks attempt to replicate complex human-level judgments, which often rely on a composition of several sub-tasks into a unified judgment. For example, consider the Targeted-Sentiment task (Mitchell et al., 2013), assigning a sentiment polarity score to entities depending on the context that they appear in. Given the sentence “according to a CNN poll, Green Book will win the best movie award”, the system has to identify both entities, and associate the relevant sentiment value with each one (neutral with CNN, and positive with Green Book). This task can be viewed as a combination of two tasks: entity identification, locating contiguous spans of words corresponding to relevant entities, and sentiment prediction, specific to each entity based on the context it appears in. Despite the fact that this form of functional task decomposition is natural for many learning tasks, it is typically ignored and learning is defined as a monolithic process, combining the tasks into a single learning problem.

Our goal in this paper is to take a step towards modular learning architectures that exploit the learning tasks’ inner structure, and as a result simplify the learning process and reduce the annotation effort. We introduce a novel task decomposition approach, learning with partial labels, in which the task output labels decompose hierarchically into partial labels capturing different aspects, or sub-tasks, of the final task. We show that learning with partial labels can help support weakly-supervised learning when only some of the partial labels are available.

Given the popularity of sequence labeling tasks in NLP, we demonstrate the strength of this approach over several sentiment analysis tasks, adapted for sequence prediction. These include target-sentiment prediction (Mitchell et al., 2013), aspect-sentiment prediction (Pontiki et al., 2016) and subjective text span identification and polarity prediction (Wilson et al., 2013). To ensure the broad applicability of our approach to other problems, we extend the popular LSTM-CRF (Lample et al., 2016) model that was applied to many sequence labeling tasks¹.

¹ We also provide analysis for NER in the appendix.

The modular learning process corresponds to a task decomposition, in which the prediction label, $y$, is deconstructed into a set of partial labels $\{y^0, \ldots, y^k\}$, each defining a sub-task, capturing a different aspect of the original task. Intuitively, the individual sub-tasks are significantly easier to learn, suggesting that if their dependencies are modeled correctly when learning the final task, they can constrain the learning problem, leading to faster convergence and a better overall learning outcome. In addition, the modular approach helps alleviate the supervision problem, as providing full supervision for the overall task is often costly, while providing additional partial labels is significantly easier. For example, annotating entity segments syntactically is considerably easier than determining their associated sentiment, which requires understanding the nuances of the context they appear in semantically. By exploiting modularity, the entity segmentation partial labels can be used to help improve that specific aspect of the overall task.

Our modular task decomposition approach is partially inspired by findings in cognitive neuroscience, namely the two-streams hypothesis, a widely accepted model for neural processing of cognitive information in vision and hearing (Eysenck and Keane, 2005), suggesting the brain processes information in a modular way, split between a “where” (dorsal) pathway, specialized for locating objects, and a “what” (ventral) pathway, associated with object representation and recognition (Mishkin et al., 1983; Geschwind and Galaburda, 1987; Kosslyn, 1987; Rueckl et al., 1989). Jacobs et al. (1991) provided a computational perspective, investigating the “what” and “where” decomposition on a computer vision task. We observe that this task decomposition naturally fits many NLP tasks and borrow the notation. In the target-sentiment tasks we address in this paper, the segmentation tagging task can be considered as a “where”-task (i.e., the location of entities), and the sentiment recognition as the “what”-task.

Our approach is related to multi-task learning (Caruana, 1997), which has been extensively applied in NLP (Toshniwal et al., 2017; Eriguchi et al., 2017; Collobert et al., 2011; Luong, 2016; Liu et al., 2018). However, instead of simply aggregating the objective functions of several different tasks, we suggest decomposing a single task into multiple inter-connected sub-tasks and then integrating the representation learned into a single module for the final decision. We study several modular neural architectures, which differ in the way information is shared between tasks, the learning representation associated with each task and the way the dependency between decisions is modeled.

Our experiments were designed to answer two questions. First, can the task structure be exploited to simplify a complex learning task by using a modular approach? Second, can partial labels be used effectively to reduce the annotation effort?

To answer the first question, we conduct experiments over several sequence prediction tasks, and compare our approach to several recent models for deep structured prediction (Lample et al., 2016; Ma and Hovy, 2016; Liu et al., 2018), and when available, previously published results (Mitchell et al., 2013; Zhang et al., 2015; Li and Lu, 2017; Ma et al., 2018). We show that modular learning indeed helps simplify the learning task compared to traditional monolithic approaches. To answer the second question, we evaluate our model’s ability to leverage partial labels in two ways. First, by restricting the amount of full labels, and observing the improvement when providing increasing amounts of partial labels for only one of the sub-tasks. Second, we learn the sub-tasks using completely disjoint datasets of partial labels, and show that the knowledge learned by the sub-task modules can be integrated into the final decision module using a small amount of full labels.

Our contributions: (1) We provide a general modular framework for sequence learning tasks. While we focus on sentiment analysis tasks, the framework is broadly applicable to many other tagging tasks, for example, NER (Carreras et al., 2002; Lample et al., 2016) and SRL (Zhou and Xu, 2015), to name a few. (2) We introduce a novel weakly supervised learning approach, learning with partial labels, that exploits the modular structure to reduce the supervision effort. (3) We evaluated our proposed model, in both the fully-supervised and weakly supervised scenarios, over several sentiment analysis tasks.

2 Related Works

From a technical perspective, our task decomposition approach is related to multi-task learning (Caruana, 1997), specifically, when the tasks share information using a shared deep representation (Collobert et al., 2011; Luong, 2016). However, most prior works aggregate multiple losses on either different pre-defined tasks at the final layer (Collobert et al., 2011; Luong, 2016), or on a language model at the bottom level (Liu et al., 2018). This work suggests decomposing a given task into sub-tasks whose integration comprises the original task. To the best of our knowledge, Ma et al. (2018), focusing on targeted sentiment, is most similar to our approach. They suggest a joint learning approach, modeling a sequential relationship between two tasks, entity identification and target sentiment. We take a different approach, viewing each of the model components as a separate module, predicted independently and then integrated into the final decision module. As we demonstrate in our experiments, this approach leads to better performance and increased flexibility, as it allows us to decouple the learning process and learn the tasks independently.

Other modular neural architectures were recently studied for tasks combining vision and language analysis (Andreas et al., 2016; Hu et al., 2017; Yu et al., 2018), and were tailored for the grounded language setting. To help ensure the broad applicability of our framework, we provide a general modular network formulation for sequence labeling tasks by adapting a neural-CRF to capture the task structure. This family of models, combining structured prediction with deep learning, showed promising results (Gillick et al., 2015; Lample et al., 2016; Ma and Hovy, 2016; Zhang et al., 2015; Li and Lu, 2017), by using rich representations through neural models to generate decision candidates, while

3.1 CRF Layer

A CRF model describes the probability of predicted labels $y$, given a sequence $x$ as input, as

$$P_\Lambda(y \mid x) = \frac{e^{\Phi(x, y)}}{Z},$$

where $Z = \sum_{\tilde{y}} e^{\Phi(x, \tilde{y})}$ is the partition function that marginalizes over all possible assignments to the predicted labels of the sequence, and $\Phi(x, y)$ is the scoring function, which is defined as:

$$\Phi(x, y) = \sum_t \big( \phi(x, y_t) + \psi(y_{t-1}, y_t) \big).$$

The partition function $Z$ can be computed efficiently via the forward-backward algorithm.
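To make the CRF layer definitions concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the per-position tag scores and the tag-to-tag transition scores are already given as small tables (the hypothetical names `emissions` and `transitions`, filled with arbitrary toy values), and it assumes no transition term is applied at the first position, a boundary choice the text above does not specify. Only the forward pass of forward-backward is needed to obtain the partition function, so that is all the sketch implements; a brute-force sum over every tag sequence is included purely as a sanity check.

```python
import itertools
import math


def score(emissions, transitions, tags):
    """Scoring function Phi(x, y): tag scores plus transition scores."""
    # Assumption: no transition term at the first position (t = 0).
    total = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        total += emissions[t][tags[t]] + transitions[tags[t - 1]][tags[t]]
    return total


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions, transitions):
    """log Z via the forward recursion: O(T * K^2) rather than O(K^T)."""
    num_tags = len(transitions)
    # alpha[j] = log of the summed exp-scores of all prefixes ending in tag j.
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            for j in range(num_tags)
        ]
    return logsumexp(alpha)


def log_prob(emissions, transitions, tags):
    """log P(y | x) = Phi(x, y) - log Z."""
    return score(emissions, transitions, tags) - log_partition(emissions, transitions)


# Toy scores (hypothetical): 3 positions, 3 tags.
emissions = [[1.0, 0.2, -0.5], [0.3, 1.5, 0.0], [-1.0, 0.4, 2.0]]
transitions = [[0.5, -0.2, 0.1], [0.0, 0.8, -1.0], [-0.3, 0.2, 0.6]]

# Sanity check: enumerate all 3^3 tag sequences explicitly.
all_sequences = list(itertools.product(range(3), repeat=3))
brute_force_log_z = logsumexp(
    [score(emissions, transitions, tags) for tags in all_sequences]
)
total_probability = sum(
    math.exp(log_prob(emissions, transitions, tags)) for tags in all_sequences
)
```

In a neural-CRF the `emissions` table would be produced by the network for each input position and `transitions` would be a learned parameter matrix; the recursion itself is unchanged, and working in log-space avoids overflow on longer sequences.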
