arXiv:1605.07515v2 [cs.CL] 18 Jul 2016

Neural Semantic Role Labeling with Dependency Path Embeddings

Michael Roth and Mirella Lapata
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
{mroth,mlap}@inf.ed.ac.uk

Abstract

This paper introduces a novel model for semantic role labeling that makes use of neural sequence modeling techniques. Our approach is motivated by the observation that complex syntactic structures and related phenomena, such as nested subordinations and nominal predicates, are not handled well by existing models. Our model treats such instances as sub-sequences of lexicalized dependency paths and learns suitable embedding representations. We experimentally demonstrate that such embeddings can improve results over previous state-of-the-art semantic role labelers, and showcase qualitative improvements obtained by our method.

1 Introduction

The goal of semantic role labeling (SRL) is to identify and label the arguments of semantic predicates in a sentence according to a set of predefined relations (e.g., "who" did "what" to "whom"). Semantic roles provide a layer of abstraction beyond syntactic dependency relations, such as subject and object, in that the provided labels are insensitive to syntactic alternations and can also be applied to nominal predicates. Previous work has shown that semantic roles are useful for a wide range of natural language processing tasks, with recent applications including statistical machine translation (Aziz et al., 2011; Xiong et al., 2012), plagiarism detection (Osman et al., 2012; Paul and Jamal, 2015), and multi-document abstractive summarization (Khan et al., 2015).

The task of semantic role labeling was pioneered by Gildea and Jurafsky (2002). In their work, features based on syntactic constituent trees were identified as valuable for labeling predicate-argument relationships. Later work confirmed the importance of syntactic parse features (Pradhan et al., 2005; Punyakanok et al., 2008) and found dependency parse trees to provide a better form of representation to assign role labels to arguments (Johansson and Nugues, 2008).

Most semantic role labeling approaches to date rely heavily on lexical and syntactic indicator features. Through the availability of large annotated resources, such as PropBank (Palmer et al., 2005), statistical models based on such features achieve high accuracy. However, results often fall short when the input to be labeled involves instances of linguistic phenomena that are relevant for the labeling decision but appear infrequently at training time. Examples include control and raising verbs, nested conjunctions or other recursive structures, as well as rare nominal predicates. The difficulty lies in that simple lexical and syntactic indicator features are not able to model interactions triggered by such phenomena. For instance, consider the sentence He had trouble raising funds and the analyses provided by four publicly available tools in Table 1 (mate-tools, Björkelund et al. (2010); mateplus, Roth and Woodsend (2014); TensorSRL, Lei et al. (2015); and easySRL, Lewis et al. (2015)). Despite all systems claiming state-of-the-art or competitive performance, none of them is able to correctly identify He as the agent argument of the predicate raise. Given the complex dependency path between the predicate and its argument, none of the systems actually identifies He as an argument at all.

System      Analysis
mate-tools  *He had [trouble]A0 raising [funds]A1.
mateplus    *He had [trouble]A0 raising [funds]A1.
TensorSRL   *He had trouble raising [funds]A1.
easySRL     *He had trouble raising [funds]A1.
This work   [He]A0 had trouble raising [funds]A1.

Table 1: Outputs of SRL systems for the sentence He had trouble raising funds. Arguments of raise are shown with predicted roles as defined in PropBank (A0: getter of money; A1: money). Asterisks mark flawed analyses that miss the argument He.

[Figure 1: Dependency path (dotted) between the predicate raising and the argument he.]

In this paper, we develop a new neural network model that can be applied to the task of semantic role labeling. The goal of this model is to better handle control predicates and other phenomena that can be observed from the dependency structure of a sentence. In particular, we aim to model the semantic relationships between a predicate and its arguments by analyzing the dependency path between the predicate word and each argument head word. We consider lexicalized paths, which we decompose into sequences of individual items, namely the words and dependency relations on a path. We then apply long-short term memory networks (Hochreiter and Schmidhuber, 1997) to find a recurrent composition function that can reconstruct an appropriate representation of the full path from its individual parts (Section 2). To ensure that representations are indicative of semantic relationships, we use semantic roles as target labels in a supervised setting (Section 3).

By modeling dependency paths as sequences of words and dependencies, we implicitly address the data sparsity problem. This is the case because we use single words and individual dependency relations as the basic units of our model. In contrast, previous SRL work only considered full syntactic paths. Experiments on the CoNLL-2009 benchmark dataset show that our model is able to outperform the state-of-the-art in English (Section 4), and that it improves SRL performance in other languages, including Chinese, German and Spanish (Section 5).

2 Dependency Path Embeddings

In the context of neural networks, the term embedding refers to the output of a function f within the network, which transforms an arbitrary input into a real-valued vector output. Word embeddings, for instance, are typically computed by forwarding a one-hot vector representation from the input layer of a neural network to its first hidden layer, usually by means of matrix multiplication and an optional non-linear function whose parameters are learned during neural network training.

Here, we seek to compute real-valued vector representations for dependency paths between a pair of words ⟨wi, wj⟩. We define a dependency path to be the sequence of nodes (representing words) and edges (representing relations between words) to be traversed on a dependency parse tree to get from node wi to node wj. In the example in Figure 1, the dependency path from raising to he is raising −NMOD→ trouble −OBJ→ had ←SBJ− he.

Analogously to how word embeddings are computed, the simplest way to embed paths would be to represent each sequence as a one-hot vector. However, this is suboptimal for two reasons: Firstly, we expect only a subset of dependency paths to be attested frequently in our data and therefore many paths will be too sparse to learn reliable embeddings for them. Secondly, we hypothesize that dependency paths which share the same words, word categories or dependency relations should impact SRL decisions in similar ways. Thus, the words and relations on the path should drive representation learning, rather than the full path on its own. The following sections describe how we address representation learning by means of modeling dependency paths as sequences of items in a recurrent neural network.

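Concretely, the path from Figure 1 can be decomposed into such a sequence of items. The following sketch is illustrative only (the helper name is our own; the exact item scheme used by the model is specified in Section 2.2):

```python
def path_to_items(path):
    """Decompose a lexicalized dependency path into the flat sequence of
    items (word category, word form, dependency relation, ...) consumed
    by the recurrent model. `path` alternates (POS, word) nodes with
    relation labels. Illustrative sketch only."""
    items = []
    for step in path:
        if isinstance(step, tuple):   # a word node: category first, then form
            items.extend(step)
        else:                         # a dependency relation on an edge
            items.append(step)
    return items

# The path of Figure 1, from the predicate "raising" to the argument "he":
path = [("V", "raising"), "NMOD", ("N", "trouble"), "OBJ",
        ("V", "had"), "SBJ", ("N", "he")]
print(path_to_items(path))
# ['V', 'raising', 'NMOD', 'N', 'trouble', 'OBJ', 'V', 'had', 'SBJ', 'N', 'he']
```

Each item is then looked up as a one-hot indicator and fed to the recurrent layer one step at a time, so that distinct paths sharing words or relations share parameters.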
2.1 Recurrent Neural Networks

The recurrent model we use in this work is a variant of the long-short term memory (LSTM) network. It takes a sequence of items X = x1, ..., xn as input, recurrently processes each item xt ∈ X at a time, and finally returns one embedding state en for the complete input sequence. For each time step t, the LSTM model updates an internal memory state mt that depends on the current input as well as the previous memory state mt−1. In order to capture long-term dependencies, a so-called gating mechanism controls the extent to which each component of a memory cell state will be modified. In this work, we employ input gates i, output gates o and (optional) forget gates f. We formalize the state of the network at each time step t as follows:

it = σ([W^mi mt−1] + W^xi xt + b^i)      (1)
ft = σ([W^mf mt−1] + W^xf xt + b^f)      (2)
mt = it ⊙ (W^xm xt) + ft ⊙ mt−1 + b^m    (3)
ot = σ([W^mo mt] + W^xo xt + b^o)        (4)
et = ot ⊙ σ(mt)                          (5)

In each equation, W describes a matrix of weights to project information between two layers, b is a layer-specific vector of bias terms, and σ is the logistic function. Superscripts indicate the corresponding layers or gates. Some models described in Section 3 do not make use of forget gates or memory-to-gate connections. In case no forget gate is used, we set ft = 1. If no memory-to-gate connections are used, the terms in square brackets in (1), (2), and (4) are replaced by zeros.

2.2 Embedding Dependency Paths

We define the embedding of a dependency path to be the final memory output state of a recurrent LSTM layer that takes a path as input, with each input step representing a binary indicator for a part-of-speech tag, a word form, or a dependency relation. In the context of semantic role labeling, we define each path as a sequence from a predicate to its potential argument.1 Specifically, we define the first item x1 to correspond to the part-of-speech tag of the predicate word wi, followed by its actual word form, and the relation to the next word wi+1. The embedding of a dependency path corresponds to the state en returned by the LSTM layer after the input of the last item, xn, which corresponds to the word form of the argument head word wj. An example is shown in Figure 2.

[Figure 2: Example input and embedding computation for the path from raising to he, given the sentence he had trouble raising funds. LSTM time steps are displayed from right to left.]

The main idea of this model and representation is that word forms, word categories and dependency relations can all influence role labeling decisions. The word category and word form of the predicate first determine which roles are plausible and what kinds of path configurations are to be expected. The relations and words seen on the path can then manipulate these expectations. In Figure 2, for instance, the verb raising complements the phrase had trouble, which makes it likely that the subject he is also the logical subject of raising.

By using word forms, categories and dependency relations as input items, we ensure that specific words (e.g., those which are part of complex predicates) as well as various relation types (e.g., subject and object) can appropriately influence the representation of a path. While learning corresponding interactions, the network is also able to determine which phrases and dependency relations might not influence a role assignment decision (e.g., coordinations).

1 We experimented with different sequential orders and found this to lead to the best validation set results.
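A minimal NumPy sketch of the gated updates in equations (1)–(5), including the optional forget gate and memory-to-gate connections (weight names follow the superscripts in the text; the function itself is our own illustration, not the toolkit implementation used in the experiments):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_embed(xs, W, b, forget_gate=True, mem_to_gates=True):
    """Run the gated updates of equations (1)-(5) over a sequence of
    item vectors xs and return the final embedding state e_n.
    W/b hold weights named by the superscripts in the text, e.g.
    W["xi"] projects the input x_t into the input gate i_t."""
    d = b["m"].shape[0]
    m = np.zeros(d)                                    # memory state m_0
    e = np.zeros(d)
    for x in xs:
        mg = m if mem_to_gates else np.zeros(d)        # bracketed [W m] terms
        i = sigmoid(W["mi"] @ mg + W["xi"] @ x + b["i"])          # (1)
        if forget_gate:
            f = sigmoid(W["mf"] @ mg + W["xf"] @ x + b["f"])      # (2)
        else:
            f = np.ones(d)                             # no forget gate: f_t = 1
        m = i * (W["xm"] @ x) + f * m + b["m"]                    # (3)
        o = sigmoid((W["mo"] @ m if mem_to_gates else 0)
                    + W["xo"] @ x + b["o"])                       # (4)
        e = o * sigmoid(m)                                        # (5)
    return e
```

Setting `forget_gate=False` or `mem_to_gates=False` reproduces the reduced model variants mentioned above, where f_t = 1 or the bracketed terms are zeroed out.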

2.3 Joint Embedding and Feature Learning

Our SRL model consists of four components depicted in Figure 3: (1) an LSTM component takes lexicalized dependency paths as input, (2) an additional input layer takes binary features as input, (3) a hidden layer combines dependency path embeddings and binary features using rectified linear units, and (4) a softmax classification layer produces output based on the hidden layer state as input. We therefore learn path embeddings jointly with feature detectors based on traditional, binary indicator features.

Given a dependency path X, with steps xk ∈ {x1, ..., xn}, and a set of binary features B as input, we use the LSTM formalization from equations (1–5) to compute the embedding en at time step n and formalize the state of the hidden layer h and softmax output sc for each class category c as follows:

h = max(0, W^Bh B + W^eh en + b^h)                              (6)
sc = (W^es_c en + W^hs_c h + b^s_c) / Σi (W^es_i en + W^hs_i h + b^s_i)   (7)

[Figure 3: Neural model for joint learning of path embeddings and higher-order features: The path sequence x1 ... xn is fed into a LSTM layer, a hidden layer h combines the final embedding en and binary input features B, and an output layer s assigns the highest probable class label c.]

3 System Architecture

The overall architecture of our SRL system closely follows that of previous work (Toutanova et al., 2008; Björkelund et al., 2009) and is depicted in Figure 4. We use a pipeline that consists of the following steps: predicate identification and disambiguation, argument identification, argument classification, and re-ranking. The neural-network components introduced in Section 2 are used in the last three steps. The following sub-sections describe all components in more detail.

[Figure 4: Pipeline architecture of our SRL system.]

3.1 Predicate Identification and Disambiguation

Given a syntactically analyzed sentence, the first two steps in an end-to-end SRL system are to identify and disambiguate the semantic predicates in the sentence. Here, we focus on verbal and nominal predicates but note that other syntactic categories have also been construed as predicates in the NLP literature (e.g., prepositions; Srikumar and Roth (2013)). For both identification and disambiguation steps, we apply the same logistic regression classifiers used in the SRL components of mate-tools (Björkelund et al., 2010). The classifiers for both tasks make use of a range of lexico-syntactic indicator features, including predicate word form, its predicted part-of-speech tag as well as dependency relations to all syntactic children.

3.2 Argument Identification and Classification

Given a sentence and a set of sense-disambiguated predicates in it, the next two steps of our SRL system are to identify all arguments of each predicate and to assign suitable role labels to them. For both steps, we train several LSTM-based neural network models as described in Section 2. In particular, we train separate networks for nominal and verbal predicates and for identification and classification. Following the findings of earlier work (Xue and Palmer, 2004), we assume that different feature sets are relevant for the respective tasks and hence different embedding representations should be learned. As binary input features, we use the following sets from the SRL literature (Björkelund et al., 2010).

Lexico-syntactic features Word form and word category of the predicate and candidate argument; dependency relations from predicate and argument to their respective syntactic heads; full dependency path sequence from predicate to argument.

Local context features Word forms and word categories of the candidate argument's and predicate's syntactic siblings and children words.
The Other features Relative position of the candi- types and ranges of hyperparameters considered date argument with respect to the predicate (left, are as follows: learning rate α ∈ [0.00006,0.3], self, right); sequence of part-of-speech tags of all dropout rate d ∈ [0.0,0.5], and hidden layer sizes words between the predicate and the argument. |e| ∈ [0,100], |h| ∈ [0,500]. In addition, we experimented with different gating mechanisms 3.3 Reranker (with/without forget gate) and memory access As all argument identification (and classification) settings (with/without connections between all decisions are independent of one another, we gates and the memory layer, cf. Section 2). apply as the last step of our pipeline a global The best parameters were chosen using the reranker. Given a predicate p, the reranker takes Spearmint hyperparameter optimization toolkit as input the n best sets of identified arguments as (Snoek et al., 2012), applied for approx. 200 well as their n best label assignments and predicts iterations, and are summarized in Table 2. the best overall argument structure. We implement Results The results of our in- and out-of-domain the reranker as a logistic regression classifier, with experiments are summarized in Tables 3 and 5, re- hidden and embedding layer states of identified spectively. We present results for different system arguments as features, offset by the argument la- configurations: ‘local’ systems make classification bel, and a binary label as output (1: best predicted decisions independently, whereas ‘global’ systems structure, 0: any other structure). At test time, we include a reranker or other global inference mech- select the structure with the highest overall score, anisms; ‘single’ refers to one model and ‘ensem- which we compute as the geometric mean of the ble’ refers to combinations of multiple models. global regression and all argument-specific scores. 
4 Experiments

In this section, we demonstrate the usefulness of dependency path embeddings for semantic role labeling. Our hypotheses are that (1) modeling dependency paths as sequences will lead to better representations for the SRL task, thus increasing labeling precision overall, and that (2) embeddings will address the problem of data sparsity, leading to higher recall. To test both hypotheses, we experiment on the in-domain and out-of-domain test sets provided in the CoNLL-2009 shared task (Hajič et al., 2009) and compare results of our system, henceforth PathLSTM, with systems that do not involve path embeddings. We compute precision, recall and F1-score using the official CoNLL-2009 scorer.2 The code is available at https://github.com/microth/PathLSTM.

Model selection We train argument identification and classification models using the XLBP toolkit for neural networks (Monner and Reggia, 2012). The hyperparameters for each step were selected based on the CoNLL 2009 development set. For direct comparison with previous work, we use the same preprocessing models and predicate-specific SRL components as provided with mate-tools (Bohnet, 2010; Björkelund et al., 2010). The types and ranges of hyperparameters considered are as follows: learning rate α ∈ [0.00006, 0.3], dropout rate d ∈ [0.0, 0.5], and hidden layer sizes |e| ∈ [0, 100], |h| ∈ [0, 500]. In addition, we experimented with different gating mechanisms (with/without forget gate) and memory access settings (with/without connections between all gates and the memory layer, cf. Section 2). The best parameters were chosen using the Spearmint hyperparameter optimization toolkit (Snoek et al., 2012), applied for approx. 200 iterations, and are summarized in Table 2.

Results The results of our in- and out-of-domain experiments are summarized in Tables 3 and 5, respectively. We present results for different system configurations: 'local' systems make classification decisions independently, whereas 'global' systems include a reranker or other global inference mechanisms; 'single' refers to one model and 'ensemble' refers to combinations of multiple models.

In the in-domain setting, our PathLSTM model achieves 87.7% (single) and 87.9% (ensemble) F1-score, outperforming previously published best results by 0.4 and 0.2 percentage points, respectively. At a F1-score of 86.7%, our local model (using no reranker) reaches the same performance as state-of-the-art local models. Note that differences in results between systems might originate from the application of different preprocessing techniques, as each system comes with its own syntactic components. For direct comparison, we evaluate against mate-tools, which use the same preprocessing techniques as PathLSTM. In comparison, we see improvements of +0.8–1.0 percentage points absolute in F1-score.

2 Some recently proposed SRL models are only evaluated on the CoNLL 2005 and 2012 data sets, which lack nominal predicates or dependency annotations. We do not list any results from those models here.
3 Results are taken from Lei et al. (2015).
Argument labeling step   forget gate  memory→gates  |e|  |h|   α       dropout rate
Identification (verb)    −            +             25   90    0.0006  0.42
Identification (noun)    −            +             16   125   0.0009  0.25
Classification (verb)    +            −             5    300   0.0155  0.50
Classification (noun)    −            −             88   500   0.0055  0.46

Table 2: Hyperparameters selected for best models and training procedures.

System (local, single)       P     R     F1
Björkelund et al. (2010)     87.1  84.5  85.8
Lei et al. (2015)            −     −     86.6
FitzGerald et al. (2015)     −     −     86.7
PathLSTM w/o reranker        88.1  85.3  86.7

System (global, single)      P     R     F1
Björkelund et al. (2010)     88.6  85.2  86.9
Roth and Woodsend (2014)3    −     −     86.3
FitzGerald et al. (2015)     −     −     87.3
PathLSTM                     90.0  85.5  87.7

System (global, ensemble)    P     R     F1
FitzGerald et al., 10 models −     −     87.7
PathLSTM, 3 models           90.3  85.7  87.9

Table 3: Results on the CoNLL-2009 in-domain test set. All numbers are in percent.

System (local, single)       P     R     F1
Björkelund et al. (2010)     75.7  72.2  73.9
Lei et al. (2015)            −     −     75.6
FitzGerald et al. (2015)     −     −     75.2
PathLSTM w/o reranker        76.9  73.8  75.3

System (global, single)      P     R     F1
Björkelund et al. (2010)     77.9  73.6  75.7
Roth and Woodsend (2014)3    −     −     75.9
FitzGerald et al. (2015)     −     −     75.2
PathLSTM                     78.6  73.8  76.1

System (global, ensemble)    P     R     F1
FitzGerald et al., 10 models −     −     75.5
PathLSTM, 3 models           79.7  73.6  76.5

Table 5: Results on the CoNLL-2009 out-of-domain test set. All numbers are in percent.

In the out-of-domain setting, our system achieves new state-of-the-art results of 76.1% (single) and 76.5% (ensemble) F1-score, outperforming the previous best system by Roth and Woodsend (2014) by 0.2 and 0.6 absolute points, respectively. In comparison to mate-tools, we observe absolute improvements in F1-score of +0.4–0.8%.

PathLSTM                 P (%)  R (%)  F1 (%)
w/o path embeddings      65.7   87.3   75.0
w/o binary features      73.2   33.3   45.8

Table 4: Ablation tests in the in-domain setting.

Discussion To determine the sources of individual improvements, we test PathLSTM models without specific feature types and directly compare PathLSTM and mate-tools, both of which use the same preprocessing methods. Table 4 presents in-domain test results for our system when specific feature types are omitted. The overall low results indicate that a combination of dependency path embeddings and binary features is required to identify and label arguments with high precision.

Figure 5 shows the effect of dependency path embeddings at mitigating sparsity: if the path between a predicate and its argument has not been observed at training time or only infrequently, conventional methods will often fail to assign a role. This is represented by the recall curve of mate-tools, which converges to zero for arguments with unseen paths. The higher recall curve for PathLSTM demonstrates that path embeddings can alleviate this problem to some extent. For unseen paths, we observe that PathLSTM improves over mate-tools by an order of magnitude, from 0.9% to 9.6%. The highest absolute gain, from 12.8% to 24.2% recall, can be observed for dependency paths that occurred between 1 and 10 times during training.

[Figure 5: Results on in-domain test instances, grouped by the number of training instances that have an identical (unlexicalized) dependency path.]

Figure 7 plots role labeling performance for sentences with varying number of words. There are two categories of sentences in which the improvements of PathLSTM are most noticeable: Firstly, it better handles short sentences that contain expletives and/or nominal predicates (+0.8% absolute in F1-score). This is probably due to the fact that our learned dependency path representations are lexicalized, making it possible to model argument structures of different nominals and distinguishing between expletive occurrences of 'it' and other subjects. Secondly, it improves performance on longer sentences (up to +1.0% absolute in F1-score). This is mainly due to the handling of dependency paths that involve complex structures, such as coordinations, control verbs and nominal predicates.

[Figure 7: Results by sentence length. Improvements over mate-tools shown in parentheses.]

We collect instances of different syntactic phenomena from the development set and plot the learned dependency path representations in the embedding space (see Figure 6). We obtain a projection onto two dimensions using t-SNE (Van der Maaten and Hinton, 2008). Interestingly, we can see that different syntactic configurations are clustered together in different parts of the space and that most instances of the PropBank roles A0 and A1 are separated. Example phrases in the figure highlight predicate-argument pairs that are correctly labeled by PathLSTM but not by mate-tools. Path embeddings are essential for handling these cases as indicator features do not generalize well enough.

[Figure 6: Dots correspond to the path representation of a predicate-argument instance in 2D space. White/black color indicates A0/A1 gold argument labels. Dotted ellipses denote instances exhibiting related syntactic phenomena, among them (nested) subject control (KeatingA0 has conceded attempted to buy), coordinations involving A0, complements of nominal predicates (treasuryA0 's threat to trash), coordinations involving A1 (tradingA1 was stopped and did not resume), and relative clauses (the firmA1, which was involved). Example phrases show actual output produced by PathLSTM.]
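The sparsity analysis underlying Figure 5 groups gold test arguments by how often their (unlexicalized) dependency path was seen in training and measures recall per group. A self-contained sketch of that bookkeeping (the paths, counts, and bucket bounds below are made up, and this is not the paper's evaluation code):

```python
from collections import Counter

def recall_by_path_frequency(train_paths, test_instances, bounds=(0, 10, 100)):
    """Bucket gold test arguments by the training frequency of their
    unlexicalized dependency path and compute recall per bucket.
    `test_instances` are (path, correctly_recovered) pairs. Sketch only."""
    freq = Counter(train_paths)
    hits, totals = Counter(), Counter()
    for path, recovered in test_instances:
        n = freq[path]
        # first bucket whose upper bound covers the observed frequency
        bucket = next((b for b in bounds if n <= b), ">%d" % bounds[-1])
        totals[bucket] += 1
        hits[bucket] += int(recovered)
    return {b: hits[b] / totals[b] for b in totals}

train = ["SBJ", "SBJ", "OBJ", "NMOD|OBJ|SBJ"]
test = [("NMOD|OBJ|SBJ", True), ("COORD|SBJ", False), ("SBJ", True)]
print(recall_by_path_frequency(train, test))
# {10: 1.0, 0: 0.0}
```

The bucket with bound 0 corresponds to paths never seen in training, which is where conventional indicator-feature systems lose most recall.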

verb / A1 91.0 91.9 +0.0 +1.1 German P R F1 verb / A2 84.3 76.9 +1.5 +0.0 PathLSTM 81.8 78.5 80.1 verb / AM 82.2 72.4 +2.9 −2.0 Bj¨orkelund et al. (2009) 81.2 78.3 79.7 noun / A0 86.9 78.2 +0.8 +3.3 Che et al. (2009) 82.1 75.4 78.6 noun / A1 87.5 84.4 +2.6 +2.2 noun / A2 82.4 76.8 +1.0 +2.1 Spanish P R F1 noun / AM 79.5 69.2 +0.9 −2.8 Zhao et al. (2009) 83.1 78.0 80.5 PathLSTM 83.2 77.4 80.2 Table 6: Results by word category and role label. Bj¨orkelund et al. (2009) 78.9 74.3 76.5 role labels. In comparison to mate-tools, we can Table 7: Results (in percentage) on the CoNLL- see that PathLSTM improves precision for all ar- 2009 test sets for Chinese, German and Spanish. gument types of nominal predicates. For ver- bal predicates, improvements can be observed in 6 Related Work terms of recall of proto-agent (A0) and proto- patient (A1) roles, with slight gains in precision Neural Networks for SRL for the A2 role. Overall, PathLSTM does slightly Collobert et al. (2011) pioneered neural net- worse with respect to modifier roles, which it la- works for the task of semantic role labeling. They bels with higher precision but at the cost of recall. developed a feed-forward network that uses a convolution function over windows of words 5 Path Embeddings in other Languages to assign SRL labels. Apart from constituency boundaries, their system does not make use of any In this section, we report results from additional syntactic information. Foland and Martin (2015) experiments on Chinese, German and Spanish extended their model and showcased significant data. The underlying question is to which extent improvements when including binary indicator the improvements of our SRL system for English features for dependency paths. Similar features also generalize to other languages. To answer this were used by FitzGerald et al. 
(2015), who in- question, we train and test separate SRL mod- clude role labeling predictions by neural networks els for each language, using the system architec- as factors in a global model. ture and hyperparameters discussed in Sections 3 These approaches all make use of binary fea- and 4, respectively. tures derived from syntactic parses either to indi- We train our models on data from the cate constituency boundaries or to represent full CoNLL-2009 shared task, relying on the same dependency paths. An extreme alternative has features as one of the participating systems been recently proposed in Zhou and Xu (2015), (Bj¨orkelund et al., 2009), and evaluate with the of- who model SRL decisions with a multi-layered ficial scorer. For direct comparison, we rely on LSTM network that takes word sequences as in- the (automatic) syntactic preprocessing informa- put but no syntactic parse information at all. tion provided with the CoNLL test data and com- Our approach falls in between the two extremes: pare our results with the best two systems for each we rely on syntactic parse information but rather language that make use of the same preprocessing than solely making using of sparse binary features, information. we explicitly model dependency paths in a neural The results, summarized in Table 7, indicate network architecture. that PathLSTM performs better than the system by Bj¨orkelund et al. (2009) in all cases. For German Other SRL approaches Within the SRL lit- and Chinese, PathLSTM achieves the best overall erature, recent alternatives to neural network F1-scores of 80.1% and 79.4%, respectively. architectures include sigmoid belief networks (Henderson et al., 2013) as well as low-rank ten- 7 Conclusions sor models (Lei et al., 2015). Whereas Lei et al. We introduced a neural network architecture for only make use of dependency paths as binary in- semantic role labeling that jointly learns embed- dicator features, Henderson et al. 
propose a joint dings for dependency paths and feature combina- model for syntactic and semantic that tions. Our experimental results indicate that our learns and applies incremental dependency path model substantially increases classification perfor- representations to perform SRL decisions. The lat- mance, leading to new state-of-the-art results. In a ter form of representation is closest to ours, how- qualitive analysis, we found that our model is able ever, we do not build syntactic parses incremen- to cover instances of various linguistic phenomena tally. Instead, we take syntactically preprocessed that are missed by other methods. text as input and focus on the SRL task only. Beyond SRL, we expect dependency path em- Apart from more powerful models, most recent beddings to be useful in related tasks and down- progress in SRL can be attributed to novel fea- stream applications. For instance, our represen- tures. For instance, Deschacht and Moens (2009) tations may be of direct benefit for semantic and and Huang and Yates (2010) use latent variables, discourse parsing tasks. The jointly learned fea- learned with a hidden markov model, as fea- ture space also makes our model a good starting tures for representing words and word se- point for cross-lingual transfer methods that rely quences. Zapirain et al. (2013) propose dif- on feature representation projection to induce new ferent selection preference models in order models (Kozhevnikov and Titov, 2014). to deal with the sparseness of lexical fea- tures. Roth and Woodsend (2014) address the Acknowledgements We thank the three anony- same problem with word embeddings and compo- mous ACL referees whose feedback helped to sitions thereof. Roth and Lapata (2015) recently substantially improve the present paper. The introduced features that model the influence of dis- support of the Deutsche Forschungsgemeinschaft course on role labeling decisions. 
(Research Fellowship RO 4848/1-1; Roth) and Rather than coming up with completely new the European Research Council (award number features, in this work we proposed to revisit some 681760; Lapata) is gratefully acknowledged. well-known features and represent them in a novel way that generalizes better. Our proposed model References is inspired both by the necessity to overcome the problems of sparse lexico-syntactic features and Wilker Aziz, Miguel Rios, and Lucia Specia. 2011. by the recent success of SRL models based on neu- Shallow semantic trees for smt. In Proceedings of the Sixth Workshop on Statistical Machine Transla- ral networks. tion, pages 316–322, Edinburgh, Scotland. Dependency-based embeddings The idea of Anders Bj¨orkelund, Love Hafdell, and Pierre Nugues. embedding dependency structures has previously 2009. Multilingual semantic role labeling. In Pro- ceedings of the Thirteenth Conference on Compu- been applied to tasks such as relation classifica- tational Natural Language Learning: Shared Task, tion and . Xu et al. (2015) and pages 43–48, Boulder, Colorado. Liu et al. (2015) use neural networks to embed de- Anders Bj¨orkelund, Bernd Bohnet, Love Hafdell, and pendency paths between entity pairs. To identify Pierre Nugues. 2010. A high-performance syn- the relation that holds between two entities, their tactic and semantic dependency parser. In Coling approaches make use of pooling layers that de- 2010: Demonstration Volume, pages 33–36, Beijing, tect parts of a path that indicate a specific rela- China. tion. In contrast, our work aims at modeling an Bernd Bohnet. 2010. Top accuracy and fast depen- individual path as a complete sequence, in which dency parsing is not a contradiction. In Proceedings every item is of relevance. Tai et al. (2015) and of the 23rd International Conference on Computa- tional Linguistics, pages 89–97, Beijing, China. Ma et al. 
(2015) learn embeddings of dependency structures representing full sentences, in a sentiment classification task. In our model, embeddings are learned jointly with other features, and as a result problems that may result from erroneous parse trees are mitigated.

embeddings for dependency paths and feature combinations. Our experimental results indicate that our model substantially increases classification performance, leading to new state-of-the-art results. In a qualitative analysis, we found that our model is able to cover instances of various linguistic phenomena that are missed by other methods.

Beyond SRL, we expect dependency path embeddings to be useful in related tasks and downstream applications. For instance, our representations may be of direct benefit for semantic and discourse parsing tasks. The jointly learned feature space also makes our model a good starting point for cross-lingual transfer methods that rely on feature representation projection to induce new models (Kozhevnikov and Titov, 2014).

Acknowledgements We thank the three anonymous ACL referees whose feedback helped to substantially improve the present paper. The support of the Deutsche Forschungsgemeinschaft (Research Fellowship RO 4848/1-1; Roth) and the European Research Council (award number 681760; Lapata) is gratefully acknowledged.

References

Wilker Aziz, Miguel Rios, and Lucia Specia. 2011. Shallow semantic trees for SMT. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 316–322, Edinburgh, Scotland.

Anders Björkelund, Love Hafdell, and Pierre Nugues. 2009. Multilingual semantic role labeling. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, pages 43–48, Boulder, Colorado.

Anders Björkelund, Bernd Bohnet, Love Hafdell, and Pierre Nugues. 2010. A high-performance syntactic and semantic dependency parser. In Coling 2010: Demonstration Volume, pages 33–36, Beijing, China.

Bernd Bohnet. 2010. Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 89–97, Beijing, China.

Wanxiang Che, Zhenghua Li, Yongqiang Li, Yuhang Guo, Bing Qin, and Ting Liu. 2009. Multilingual dependency-based syntactic and semantic parsing. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, pages 49–54, Boulder, Colorado.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.

Mikhail Kozhevnikov and Ivan Titov. 2014. Cross-lingual model transfer using feature representation projection. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 579–585, Baltimore, Maryland.
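The contrast drawn in the related work above, modeling an individual dependency path as a complete sequence rather than pooling over selected parts of it, can be sketched in code. The following is a minimal illustration and not the authors' implementation: it uses a plain Elman-style recurrent update where the paper relies on an LSTM, the weights are random and untrained, and the ASCII markers `^`/`_` for upward/downward arcs are an invented notation. The example path corresponds to the sentence "He had trouble raising funds" from Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Vocabulary over words and (direction-marked) dependency relations
# appearing on the path; illustrative, not the paper's inventory.
vocab = {tok: i for i, tok in enumerate(
    ["had", "SBJ^", "he", "OBJ_", "trouble", "NMOD_", "raising"])}

dim = 8                                            # embedding/hidden size
E = rng.normal(scale=0.1, size=(len(vocab), dim))  # token embeddings
W_x = rng.normal(scale=0.1, size=(dim, dim))       # input weights
W_h = rng.normal(scale=0.1, size=(dim, dim))       # recurrent weights

def embed_path(path):
    """Encode a lexicalized dependency path into a single vector by
    reading every item in sequence; the final state is the embedding."""
    h = np.zeros(dim)
    for tok in path:
        x = E[vocab[tok]]
        h = np.tanh(W_x @ x + W_h @ h)  # simple recurrent state update
    return h

# Path from the predicate "raising" up to the candidate argument "he":
path = ["raising", "NMOD_", "trouble", "OBJ_", "had", "SBJ^", "he"]
v = embed_path(path)
print(v.shape)  # (8,)
```

Because every path item updates the state, rare constructions such as nested predicates still contribute to the representation, unlike sparse indicator features that simply never fire for unseen paths.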

Koen Deschacht and Marie-Francine Moens. 2009. Semi-supervised semantic role labeling using the Latent Words Language Model. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 21–29, Singapore.

Nicholas FitzGerald, Oscar Täckström, Kuzman Ganchev, and Dipanjan Das. 2015. Semantic role labeling with neural network factors. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 960–970, Lisbon, Portugal.

William Foland and James Martin. 2015. Dependency-based semantic role labeling using convolutional neural networks. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, pages 279–288, Denver, Colorado.

Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, et al. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–18, Boulder, Colorado.

James Henderson, Paola Merlo, Ivan Titov, and Gabriele Musillo. 2013. Multilingual joint parsing of syntactic and semantic dependencies with a latent variable model. Computational Linguistics, 39(4):949–998.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Fei Huang and Alexander Yates. 2010. Open-domain semantic role labeling by modeling word spans. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 968–978, Uppsala, Sweden.

Richard Johansson and Pierre Nugues. 2008. The effect of syntactic representation on semantic role labeling. In Proceedings of the 22nd International Conference on Computational Linguistics, pages 393–400, Manchester, United Kingdom.

Tao Lei, Yuan Zhang, Lluís Màrquez, Alessandro Moschitti, and Regina Barzilay. 2015. High-order low-rank tensors for semantic role labeling. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1150–1160, Denver, Colorado.

Mike Lewis, Luheng He, and Luke Zettlemoyer. 2015. Joint A* CCG parsing and semantic role labelling. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1444–1454, Lisbon, Portugal.

Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, and Houfeng Wang. 2015. A dependency-based neural network for relation classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 285–290, Beijing, China.

Mingbo Ma, Liang Huang, Bowen Zhou, and Bing Xiang. 2015. Dependency-based convolutional neural networks for sentence embedding. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 174–179, Beijing, China.

Derek Monner and James A. Reggia. 2012. A generalized LSTM-like training algorithm for second-order recurrent neural networks. Neural Networks, 25:70–83.

Ahmed Hamza Osman, Naomie Salim, Mohammed Salem Binwahlan, Rihab Alteeb, and Albaraa Abuobieda. 2012. An improved plagiarism detection scheme based on semantic role labeling. Applied Soft Computing, 12(5):1493–1502.

Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.

Merin Paul and Sangeetha Jamal. 2015. An improved SRL based plagiarism detection technique using sentence ranking. Procedia Computer Science, 46:223–230.
Atif Khan, Naomie Salim, and Yogan Jaya Kumar. 2015. A framework for multi-document abstractive summarization based on semantic role labelling. Applied Soft Computing, 30:737–747.

Sameer Pradhan, Kadri Hacioglu, Wayne Ward, James H. Martin, and Daniel Jurafsky. 2005. Semantic role chunking combining complementary syntactic views. In Proceedings of the Ninth Conference on Computational Natural Language Learning, pages 217–220, Ann Arbor, Michigan.

Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 2008. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2):257–287.

Beñat Zapirain, Eneko Agirre, Lluís Màrquez, and Mihai Surdeanu. 2013. Selectional preferences for semantic role classification. Computational Linguistics, 39(3):631–663.

Michael Roth and Mirella Lapata. 2015. Context-aware frame-semantic role labeling. Transactions of the Association for Computational Linguistics, 3:449–460.

Michael Roth and Kristian Woodsend. 2014. Composition of word representations improves semantic role labelling. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 407–413, Doha, Qatar.

Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, Lake Tahoe, Nevada.

Hai Zhao, Wenliang Chen, Jun'ichi Kazama, Kiyotaka Uchimoto, and Kentaro Torisawa. 2009. Multilingual dependency learning: Exploiting rich features for tagging syntactic and semantic dependencies. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 61–66, Boulder, Colorado.

Jie Zhou and Wei Xu. 2015. End-to-end learning of semantic role labeling using recurrent neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 1127–1137, Beijing, China.

Vivek Srikumar and Dan Roth. 2013. Modeling semantic relations expressed by prepositions. Transactions of the Association for Computational Linguistics, 1:231–242.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 1556–1566, Beijing, China.

Kristina Toutanova, Aria Haghighi, and Christopher Manning. 2008. A global joint model for semantic role labeling. Computational Linguistics, 34(2):161–191.

Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605.

Deyi Xiong, Min Zhang, and Haizhou Li. 2012. Modeling the translation of predicate-argument structure for SMT. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 902–911, Jeju Island, Korea.

Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, and Zhi Jin. 2015. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1785–1794, Lisbon, Portugal.

Nianwen Xue and Martha Palmer. 2004. Calibrating features for semantic role labeling. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 88–94, Barcelona, Spain.