Semantic Role Labeling with Neural Network Factors

Semantic Role Labeling with Neural Network Factors Nicholas FitzGeraldz∗ Oscar Täckströmy Kuzman Ganchevy Dipanjan Dasy zDepartment of Computer Science and Engineering, University of Washington y Google, New York [email protected] {oscart,kuzman,dipanjand}@google.com Abstract network is learned to capture correlations of the respective embedding dimensions to create argument We present a new method for semantic role and role representations. The similarity of these labeling in which arguments and seman- two representations, as measured by their dot prod- tic roles are jointly embedded in a shared uct, is used to score possible roles for candidate vector space for a given predicate. These arguments within a graphical model. This graphical embeddings belong to a neural network, model jointly models the assignment of semantic whose output represents the potential func- roles to all arguments of a predicate, subject to tions of a graphical model designed for structural linguistic constraints. the SRL task. We consider both local Our model has several advantages. Compared and structured learning methods and ob- to linear multiclass classifiers used in prior work, tain strong results on standard PropBank vector embeddings of the predictions overcome the and FrameNet corpora with a straightfor- assumption of modeling each semantic role as a ward product-of-experts model. We fur- discrete label, thus capturing fine-grained label sim- ther show how the model can learn jointly ilarity. Moreover, since predictions and inputs are from PropBank and FrameNet annotations embedded in the same vector space, and features to obtain additional improvements on the extracted from inputs and outputs are decoupled, smaller FrameNet dataset. our approach is amenable to joint learning of multi- 1 Introduction ple annotation conventions, such as PropBank and FrameNet, in a single model. Finally, as with other Semantic role labeling (SRL) is the task of iden- neural network approaches, our model obviates the tifying the semantic arguments of a predicate and need to manually engineer feature conjunctions. labeling them with their semantic roles. A key chal- Our underlying inference algorithm for SRL lenge in this task is sparsity of labeled data: a given follows Täckström et al. (2015), who presented predicate-role instance may only occur a handful a dynamic program for structured SRL; it is tar- of times in the training set. Most existing SRL geted towards the prediction of full argument spans. systems model each semantic role as an atomic Hence, we present empirical results on three span- unit of meaning, ignoring finer-grained semantic based SRL datasets: CoNLL 2005 and 2012 data similarity between roles that can be leveraged to annotated with PropBank conventions, as well as share context between similar labels, both within FrameNet 1.5 data. We also evaluate our system and across annotation conventions. on the dependency-based CoNLL 2009 shared task Low-dimensional embedding representations by assuming single word argument spans, that rep- have been shown to be successful in overcoming resent semantic dependencies, and limit our ex- sparsity and representing label similarity across a periments to English. On all datasets, our model wide range of tasks (Weston et al., 2011; Sriku- performs on par with a strong linear model base- mar and Manning, 2014; Hermann et al., 2014; line that uses hand-engineered conjunctive features. Lei et al., 2015). In this paper, we present a new Due to random parameter initialization and stochas- model for SRL that embeds candidate arguments ticity in the online learning algorithm used to train and semantic roles (in context of a predicate frame) our models, we observed considerable variance in in a shared vector space. A feed-forward neural performance across datasets. To resolve this vari- ∗Work carried out during an internship at Google. ance, we adopt a product-of-experts model that Theft steal.01 steal.V steal.V contains several hundred core and non-core roles John stole my car . John stole my car . that are shared across frames. For example, the Perpetrator Goods A0 A1 FrameNet frame Theft could be evoked by the verbs Theft lift.02 lift.V lift.V steal, pickpocket, or lift, while PropBank has dis- tinct frames for each of them. The Theft frame also Mary lifted a purse . Mary lifted a purse . Perpetrator Goods A0 A1 contains the core roles Goods and Perpetrator that (a) (b) additionally belong to the Commercial_transaction Figure 1: FrameNet (a) and PropBank (b) annota- and Committing_crime frames respectively. tions for two sentences. A typical SRL dataset consists of sentence-level annotations that identify (possibly multiple) target predicates in each sentence, a disambiguated frame combines multiple randomly-initialized instances for each predicate, and the associated argument of our model to achieve state-of-the-art results on spans (or single word argument heads) labeled with the CoNLL 2009 and FrameNet datasets, while their respective semantic roles. coming close to the previous best published results on the other two. Finally, we present even stronger 2.2 Related Work results for FrameNet data (which is scarce) by SRL using PropBank conventions (Palmer et al., jointly training the model with PropBank-annotated 2005) has been widely studied. There have been data. two shared tasks at CoNLL 2004-2005 to identify the phrasal arguments of verbal predicates (Car- 2 Background reras and Màrquez, 2004; Carreras and Màrquez, In this section, we briefly describe the SRL task 2005). The CoNLL 2008-2009 shared tasks in- and discuss relevant prior work. troduced a variant where semantic dependencies are annotated rather than phrasal arguments (Sur- 2.1 Semantic Role Labeling deanu et al., 2008; Hajicˇ et al., 2009). Similar SRL annotations rely on a frame lexicon containing approaches (Das et al., 2014; Hermann et al., 2014) frames that could be evoked by one or more lexical have been applied to frame-semantic parsing us- units. A lexical unit consists of a word lemma con- ing FrameNet conventions (Baker et al., 1998). We joined with its coarse-grained part-of-speech tag.1 treat PropBank and FrameNet annotations in a com- Each frame is further associated with a set of pos- mon framework, similar to Hermann et al. (2014). sible core and non-core semantic roles which are Most prior work on SRL rely on syntactic parses used to label its arguments. This description of a provided as input and use locally estimated classifiers for each span-role pair that are only combined frame lexicon covers both PropBank and FrameNet 2 conventions, but there are some differences out- at prediction time. This is done by picking the lined below. See Figure 1 for example annotations. highest scoring role for each span, subject to a set of structural constraints, such as avoiding overlap- PropBank defines frames that are essentially ping arguments and repeated core roles. Typically, sense distinctions of a given lexical unit. The set of these constraints have been enforced by integer lin- PropBank roles consists of seven generic core roles ear programming (ILP), as in Punyakanok et al. (labeled A0-A5 and AA) that assume different se- (2008). Täckström et al. (2015) interpreted this mantics for different frames, each associating with as a graphical model with local factors for each a subset of the core roles. In addition, there are 21 span-role pair, and global factors that encode the non-core roles that encapsulate further arguments structural constraints. They derived a dynamic pro- of a frame, such as temporal (AM-TMP) and locative gram (DP) that enforces most of the constraints (AM-LOC) adjuncts. The non-core roles are shared proposed by Punyakanok et al. and showed how between all frames and assume similar meaning. the DP can be used to take these constraints into In contrast, a FrameNet frame often associates account during learning. Here, we use an identical with multiple lexical units and the frame lexicon graphical model, but extend the model of Täck- 1We borrow the term “lexical unit” from the frame seman- ström et al. by replacing its linear potential func- tics literature. The CoNLL 2005 dataset is restricted to verbal lexical units, while the CoNLL 2009 and 2012 datasets con- 2Some recent work have successfully proposed joint mod- tains both verbal and nominal lexical units. FrameNet has els for syntactic parsing and SRL instead of a pipeline ap- lexical units of several coarse syntactic categories. proach (Lewis et al., 2015). tions with a multi-layer neural network. A similar tured graphical models (Srikumar and Manning, use of non-linear potential functions in a structured 2014), and in techniques to learn joint embeddings model was proposed by Do and Artières (2010) of predicate words and their semantic frames in a for speech recognition, and by Durrett and Klein vector space (Hermann et al., 2014). (2015) for syntactic phrase-structure parsing. Feature-based approaches to SRL employ hand- 3 Model engineered linguistically-motivated feature tem- Our model for SRL performs inference separately plates to represent the semantic structure. Some for each marked predicate in a sentence. It assumes recent work has focused on low-dimensional repre- that the predicate has been automatically disam- sentations that reduce the need for intensive feature biguated to a semantic frame drawn from a frame engineering and lead to better generalization in lexicon, and the semantic roles of the frame are the face of data sparsity. Lei et al. (2015) employ used for labeling the candidate arguments in the low-rank tensor factorization to induce a compact sentence. Formally, we are given a sentence x in representation of the full cross-product of atomic which a predicate t, with lexical unit `, has been features; akin to this work, they represent seman- marked. Assuming that the semantic frame f of the tic roles as real-valued vectors, but use a different predicate has already been identified (see §4.2 for scoring formulation for modeling potential argu- this step), we seek to predict the set of spans that ments.

Semantic Role Labeling with Neural Network Factors

Classifying Relevant Social Media Posts During Disasters Using Ensemble of Domain-Agnostic and Domain-Speciﬁc Word Embeddings

Injection of Automatically Selected Dbpedia Subjects in Electronic

Semantic Role Labeling 2

Information Extraction Based on Named Entity for Tourism Corpus

ACL 2019 Social Media Mining for Health Applications (#SMM4H)

A Second Look at Word Embeddings W4705: Natural Language Processing

Cw2vec: Learning Chinese Word Embeddings with Stroke N-Gram Information

Effects of Pre-Trained Word Embeddings on Text-Based Deception Detection

Syntax Role for Neural Semantic Role Labeling

Semantic Role Labeling Tutorial NAACL, June 9, 2013

Efficient Pairwise Document Similarity Computation in Big Datasets

Cross-Lingual Semantic Role Labeling with Model Transfer Hao Fei, Meishan Zhang, Fei Li and Donghong Ji