
Projecting Semantic Roles via Tai Mappings

Hector-Hugo Franco-Penya and Martin Emms
Trinity College Dublin, Dublin, Ireland.
[email protected], [email protected]

Abstract

This work takes the paradigm of projecting annotations within labelled data into unlabelled data, via a mapping, and applies it to Semantic Role Labelling. The projections are amongst dependency trees and the mappings are the Tai mappings that underlie the well-known tree edit-distance algorithm. The system was evaluated on seven different languages. A number of variants are explored, relating to the amount of information attended to in aligning nodes, whether the scoring is distance-based or similarity-based, and the relative ease with which nodes can be ignored. We find that all of these have statistically significant impacts on the outcomes, mostly in language-independent ways, but sometimes language-dependently.

1 Introduction

There are a number of pattern recognition scenarios that have the characteristic that one has some kind of structured test data (sequence, tree, graph, grid) within which some annotation is missing, and one wants to infer the missing annotation by exploiting fully annotated training data. A possible approach is to seek to define alignments between training and test cases, and to use these to project annotation from the training to the test cases. This has been successfully used in computational biology, for example, to project annotation via sequence alignments (Marchler-Bauer et al., 2002) and graph alignments (Kolar et al., 2008). Semantic Role Labeling (SRL) can be seen as a further instance of this pattern recognition scenario, and we will describe in this paper an SRL system which works by projecting annotations over an alignment. This paper extends results reported by Franco-Penya (2010). In the remainder of this section we give a brief overview of SRL. Section 2 then describes our system, followed in sections 3 and 4 by discussion of its evaluation.

For a number of languages, a layer of semantic role information has been added to an existing syntactic treebank: the evolution of the Penn Treebank into PropBank is an example (Palmer et al., 2005). Role-label inventories and annotation principles vary widely, but our system has been applied to data annotated along the same lines as exemplified by PropBank. The example below illustrates a PropBank role-labelling.¹

[Revenue]A1 edged [up]A5 [3.4%]A2 [to $904 million]A4 [from $874 million]A3 [in last year's third quarter]TMP

A lexical item (such as edge) is given a frameset of enumerated core argument roles (A0 ... A5). In the example, A3 is the start point of the movement, and a minimal PropBank commitment is that A3 and the other enumerated role identifiers are used consistently across different tokens of edge. Across different lexical items, commitments concerning continuity in the use of the enumerated arguments are harder to state – see the conclusions in section 5. There are also named roles (such as TMP above), whose use across different items is intended to be consistent.

¹ To save space, this shows simply a labelling of subsequences, omitting syntactic information. Figure 2 shows annotation added to a syntactic structure.


Arising from the CoNLL-2009 SRL evaluation shared task (Hajič et al., 2009), for seven languages (Catalan, Chinese, Czech, English, German, Japanese, Spanish) there is data consisting of a role-annotation layer added to syntactic information. The syntactic information is expressed as dependency trees. In some cases this is derived from a primary constituent-structure representation (e.g. English), and in other cases it is the 'native' representation (e.g. Czech). For each language, tree nodes have four kinds of syntactic information: FORM: a word form; LEMMA: the lemma of the word form; POS: a part-of-speech tag (tag sets are language-specific); DEPREL: the dependency relation to the node's head word (the relation-sets are language-specific). Additionally, in each tree T a number of nodes are identified as predicate nodes. Each predicate node p^T is linked to a set of argument nodes, args(p^T). For each a_i^T ∈ args(p^T), the link (p^T, a_i^T) is labelled with a role. The role-labelling follows the PropBank approach.

2 An Alignment-based SRL system

Sub-tree extraction A preliminary to the role-labelling process itself is to extract a sub-tree that is relevant to a given predicate and its arguments. Where p is a particular predicate node of a tree T, let sub_tree(T, p) stand for the relevant sub-tree of T. There is considerable latitude in how to define this, and we define it in a simple way, via the least upper bound, lub, of p and args(p): sub_tree(T, p) includes all nodes on paths down from lub to p and args(p). This is the same sub-tree notion as used by Moschitti et al. (2008). Figure 1 illustrates. Henceforth all trees will be assumed to have been extracted in this way.

[Figure 1: Sub-tree extraction: a1, a2 = args(p), u = lub(a1, a2, p).]
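The lub-based extraction just described is straightforward to implement. The following is a minimal sketch, assuming a parent-pointer tree representation; the `Node` class and function names are our illustrative choices, not taken from the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)          # identity-based hashing, so nodes can go in sets
class Node:
    label: str
    parent: "Node" = None
    children: list = field(default_factory=list)

def path_to_root(n):
    """The nodes from n up to the root, inclusive."""
    out = []
    while n is not None:
        out.append(n)
        n = n.parent
    return out

def lub(nodes):
    """Least upper bound: the deepest ancestor common to all the nodes
    (a node counts as an ancestor of itself here)."""
    common = set(path_to_root(nodes[0]))
    for n in nodes[1:]:
        common &= set(path_to_root(n))
    return max(common, key=lambda n: len(path_to_root(n)))

def sub_tree(p, args):
    """sub_tree(T, p): all nodes on the paths down from lub(p, args(p))
    to p and to each argument (collected here by walking upward)."""
    targets = [p] + list(args)
    u = lub(targets)
    keep = {u}
    for t in targets:
        while t is not u:
            keep.add(t)
            t = t.parent
    return keep
```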

Alignment and Projection Let an alignment of trees S and T be a 1-to-1, partial mapping α : S → T, from the nodes of S into the nodes of T. If S is role-labelled, such an alignment projects a role for a test tree argument a_i^T if it is aligned to a training tree argument, a_j^S, and the predicate nodes p^S and p^T are aligned: the role of a_j^S is projected to a_i^T. Such a role-projecting tree will be termed 'usable for T'. Fig. 2 shows an example alignment between sub-trees, with the aligned sub-trees shown in the context of the trees S and T from which they come. Argument nodes T4, T6 and T7 would receive projected labels A1, A2, and A3 from S7, S12 and S13. The first two are correct, whilst T7's annotation should be A4.

[Figure 2: An example alignment.]
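As a concrete rendering of this projection rule, here is a small sketch in which an alignment is a dict from training-tree nodes to test-tree nodes; all names are illustrative, not the paper's own.

```python
def project_roles(alignment, p_S, roles_S, p_T, args_T):
    """Project role labels over an alignment, per the rule above: a test
    argument gets the role of the training argument it is aligned to,
    provided the two predicate nodes are also aligned.

    alignment: dict mapping training-tree nodes to test-tree nodes
    roles_S:   dict mapping training argument nodes to role labels
    Returns a dict from test argument nodes to predicted roles
    (empty if the training tree is not 'usable' for this test tree).
    """
    if alignment.get(p_S) is not p_T:
        return {}                      # predicates not aligned: unusable
    inverse = {t: s for s, t in alignment.items()}
    predictions = {}
    for a_T in args_T:
        a_S = inverse.get(a_T)
        if a_S in roles_S:             # aligned to a labelled argument
            predictions[a_T] = roles_S[a_S]
    return predictions
```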

Algorithm outline Let [∆]_{d_i} be an equivalence class containing all training samples at distance d_i to T. The training set can then be thought of as a sequence of equivalence classes [∆]_{d_1}, [∆]_{d_2}, ..., for increasing distance.² The algorithm works with a PANEL of nearest neighbours, which is always a prefix of this sequence of equivalence classes. Where T is defined by predicate p and arguments a_1 ... a_n, the algorithm to predict a label for each a_i is:

1. sort the training trees and set the PANEL of nearest neighbours to be the first equivalence class [∆]_{d_1}

2. (i) make a set of predictions from the usable members of the PANEL; (ii) if there is a most frequent prediction, return it; (iii) if there is a tie, or no usable members, add the next equivalence class to the PANEL if possible and go to (i), else return 'unclassified'

² Or alternatively a sequence of similarity equivalence classes, at decreasing similarities to T.
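The following is a sketch of this panel loop, under the assumption that distances come from a function `dist(S, T)` and projections from a routine like `project_roles` sketched earlier (abstracted here as `predict`); the tie-handling mirrors steps 2(ii)–(iii).

```python
from collections import Counter

def label_argument(test_arg, test_tree, training_trees, dist, predict):
    """Panel-based nearest-neighbour labelling, following the outline:
    widen the panel one equivalence class at a time until a unique
    most frequent prediction emerges.
    predict(S, T, a) -> projected label for argument a, or None if the
    training tree S is not 'usable' for T."""
    # group training trees into equivalence classes of equal distance
    by_d = {}
    for S in training_trees:
        by_d.setdefault(dist(S, test_tree), []).append(S)
    panel = []
    for d in sorted(by_d):
        panel.extend(by_d[d])                      # step 1 / step 2(iii)
        votes = Counter()
        for S in panel:                            # step 2(i)
            label = predict(S, test_tree, test_arg)
            if label is not None:
                votes[label] += 1
        if votes:
            ranked = votes.most_common()
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                return ranked[0][0]                # step 2(ii): unique winner
        # tie or no usable members: widen the panel and retry
    return "unclassified"
```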


Tai mappings In this work, alignments are restricted to be so-called Tai mappings (Tai, 1979): amongst all possible 1-to-1, partial mappings from S into T, α : S → T, these are the mappings which respect left-to-right order and ancestry.³ Then, to select a preferred alignment, a score is assigned to it. The definitions relevant to this which are given below follow closely those of Emms and Franco-Penya (2012).

Because a mapping α is partial on S and into T, there is a set D ⊆ S ('deletions') of those i ∈ S which are not mapped to anything, and a set I ⊆ T ('insertions') of those j ∈ T which are not mapped from anything, and the alignment scorings make reference to the sets α, D and I.

We consider both a 'distance' scoring, ∆(α : S → T), whose minimum value is used to select the alignment, and a 'similarity' scoring, Θ(α : S → T), whose maximum value is used. The 'distance' scoring, ∆(α : S → T), is given by

$$\Delta(\alpha) = \sum_{(i,j)\in\alpha} C^{\Delta}(i^{\gamma}, j^{\gamma}) + \sum_{i\in D} C^{\Delta}(i^{\gamma}, \lambda) + \sum_{j\in I} C^{\Delta}(\lambda, j^{\gamma})$$

Here ·^γ gives the label of a node, and a function C^∆ is used to give label-dependent costs to the members of α, D and I. This is the major parameter of the distance scoring, and the various settings for it which were considered are detailed below; at a general level, to accord minimally with intuition, it should always be the case that C^∆(x, y) ≥ 0, and C^∆(x, y) ≥ C^∆(x, x) for non-identical x and y. A 'similarity' scoring, Θ(α : S → T), is given by

$$\Theta(\alpha) = \sum_{(i,j)\in\alpha} C^{\Theta}(i^{\gamma}, j^{\gamma}) - \sum_{i\in D} C^{\Theta}(i^{\gamma}, \lambda) - \sum_{j\in I} C^{\Theta}(\lambda, j^{\gamma})$$

where C^Θ is a function defining costs for the members of α, D and I. To accord minimally with intuition, C^Θ(x, y) ≤ C^Θ(x, x). Additionally, in this work, we also assume that C^Θ(x, λ) = C^Θ(λ, x) = 0.

Besides ranking alternative alignments between fixed S and T, the minimum distance alignment score defines a 'distance', ∆(S, T), for the pair (S, T), and the maximum similarity alignment score defines a 'similarity', Θ(S, T), and these are used to rank alternative neighbours for a tree to be labelled. The algorithm to calculate ∆(S, T) and Θ(S, T) follows very closely that of Zhang and Shasha (1989): although originally proposed in the context of 'distance' and minimisation, it is straightforwardly adaptable to the context of 'similarity' and maximisation.

³ More precisely, where anc(x, y) and left(x, y) denote the 'ancestor of' and 'to the left of' relations, ∀(i, j) ∈ α, ∀(i′, j′) ∈ α, the mapping must satisfy (i) left(i, i′) iff left(j, j′) and (ii) anc(i, i′) iff anc(j, j′).
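The paper computes ∆(S, T) with the Zhang–Shasha dynamic program; purely as an aid to intuition, the sketch below computes the same minimum-cost Tai mapping by a naive memoised recursion over ordered forests (encoded as nested tuples). The encoding and the uniform unit costs are our illustrative assumptions, not the paper's implementation.

```python
from functools import lru_cache

# A tree is (label, children) where children is a tuple of trees;
# a forest is a tuple of trees.

DEL = 1.0                                  # C(x, lambda): delete/insert cost

def swap(x, y):                            # stand-in for the B/T/H/FT
    return 0.0 if x == y else 1.0          # swap-cost settings given below

def tree_size(t):
    return 1 + sum(tree_size(c) for c in t[1])

@lru_cache(maxsize=None)
def forest_dist(F1, F2):
    """Minimum 'distance' score over all Tai mappings between two ordered
    forests (exponential without memoisation; Zhang-Shasha computes the
    same quantity in polynomial time)."""
    if not F1 and not F2:
        return 0.0
    if not F1 or not F2:
        # only deletions (or only insertions) remain
        return DEL * sum(tree_size(t) for t in (F1 or F2))
    (v, vkids), L1 = F1[-1], F1[:-1]
    (w, wkids), L2 = F2[-1], F2[:-1]
    return min(
        forest_dist(L1 + vkids, F2) + DEL,        # v unmatched (deletion)
        forest_dist(F1, L2 + wkids) + DEL,        # w unmatched (insertion)
        forest_dist(L1, L2) + forest_dist(vkids, wkids)
                            + swap(v, w))         # align v with w

def tree_dist(t1, t2):
    return forest_dist((t1,), (t2,))

# e.g. tree_dist(("a", (("b", ()),)), ("a", ())) == 1.0   (delete "b")
```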
Cost settings On this data-set the label is in general a 4-tuple (p, d, l, f) of part-of-speech, dependency-relation, lemma, and word form. Four settings for the swap costs, C^∆(x, y), are considered: B ('binary'), T ('ternary'), H ('hamming') and FT ('frame ternary'), based on the matches/mis-matches on these features. For any given feature a, let a^δ represent match/mis-match on a, with 1 for mis-match and 0 for match. The different swap settings are then defined as below:

    setting   C^∆(x, y)                          values
    B         p^δ × d^δ                          {0, 1}
    T         1/2 [p^δ + d^δ]                    {0, 1/2, 1}
    H         1/4 [p^δ + d^δ + l^δ + f^δ]        {0, 1/4, 1/2, 3/4, 1}
    FT        1/2 [p^δ + d^δ] + fr^δ             {0, 1/2, 1, 3/2, 2}

For FT, fr refers to a synthesised attribute: for predicate nodes, fr is the frame identifier, and otherwise fr takes a null value. The effect is that fr^δ = 1 if one node is a predicate and the other is not, or if both are predicates but from different frames. Besides these swap settings, the deletion cost C^∆(x, λ) = C^∆(λ, x) was varied between 1 and 0.5. From each swap setting C^∆(x, y), two 'similarity' settings were derived by C^Θ(x, y) = δ − C^∆(x, y), for δ = 1 or 2.

Some aspects of these choices are based on results concerning distance and similarity which we previously established (Emms and Franco-Penya, 2012), where we showed that the conversion

$$C^{\Theta}(x, y) = 2\kappa - C^{\Delta}(x, y), \qquad C^{\Theta}(x, \lambda) = C^{\Delta}(x, \lambda) - \kappa, \qquad C^{\Theta}(\lambda, y) = C^{\Delta}(\lambda, y) - \kappa$$

converts C^∆ to C^Θ such that the same ordering is induced over the alternative alignments for any given pair (S, T). Call such settings A-duals.
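A direct transcription of the four swap settings follows, assuming node labels carry the (p, d, l, f) tuple and, for FT, a frame attribute that is None on non-predicate nodes; the `NodeLabel` container and all names are our illustrative choices.

```python
from collections import namedtuple

# p: POS, d: DEPREL, l: lemma, f: word form, fr: frame id or None
NodeLabel = namedtuple("NodeLabel", "p d l f fr")

def mm(a, b):
    """The a-delta indicator: 1 for mis-match, 0 for match."""
    return 0 if a == b else 1

def cost_B(x, y):      # binary: 1 only when POS *and* DEPREL both mis-match
    return mm(x.p, y.p) * mm(x.d, y.d)

def cost_T(x, y):      # ternary
    return (mm(x.p, y.p) + mm(x.d, y.d)) / 2

def cost_H(x, y):      # hamming over all four features
    return (mm(x.p, y.p) + mm(x.d, y.d) + mm(x.l, y.l) + mm(x.f, y.f)) / 4

def cost_FT(x, y):     # frame ternary; with fr None on non-predicates,
    return cost_T(x, y) + mm(x.fr, y.fr)   # fr-delta = 1 across the
                                           # predicate/non-predicate divide

def a_dual_swap(c_delta, kappa):
    """The A-dual similarity swap cost 2*kappa - C_delta(x, y); with the
    deletion cost equal to kappa, the dual similarity deletion cost is 0,
    matching the pairing of del=1 with sim(2-swap) and of del=0.5 with
    sim(1-swap) described below."""
    return lambda x, y: 2 * kappa - c_delta(x, y)
```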


For a given choice from B/T/H/FT, the four above-mentioned settings for C^∆ and C^Θ thus stand in the following A-duality relationships:

    distance                A-dual    similarity
    (a) C^∆(x, λ) = 1         ↔       (c) C^Θ(x, y) = 2 − C^∆(x, y)
    (b) C^∆(x, λ) = 0.5       ↔       (d) C^Θ(x, y) = 1 − C^∆(x, y)

For such dual settings, the labelling potential of a given training sample is necessarily identical under the two settings. However, this A-duality property is distinct, potentially, from N-duality, which is the property that the two settings induce the same ranking of candidate neighbours {S_1 ... S_N} to a given T. Because the role-labelling system is driven both by neighbour and alignment ordering, it is an empirical question whether or not A-dual settings will deliver the same outcomes.

We will refer to settings (a) and (b) as 'dist(del=1)' and 'dist(del=0.5)', and settings (c) and (d) as 'sim(2-swap)' and 'sim(1-swap)'.

3 Experiment Procedure

The seven languages of the CoNLL-2009 SRL task were used (Hajič et al., 2009), with the same division into training sets and evaluation sets. Due to our computational limitations, the English training data set was limited to the first 20,000 sentences and the Czech training data set was limited to the first 10,000 sentences.

A simple labelling accuracy score is reported. When a two-way contrast is considered, the significance of a difference in outcomes under the two settings was determined by the McNemar test (McNemar, 1947), at levels of significance p < 0.05 and p < 0.001, as do Fürstenau and Lapata (2011) in their role-labelling work.
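The paper does not spell out the test computation; the sketch below is the standard chi-square form of McNemar's test with continuity correction (an exact binomial form is also common), applied to per-item correctness under two settings.

```python
from math import erfc, sqrt

def mcnemar(correct_1, correct_2):
    """McNemar's test (with continuity correction) on paired outcomes,
    comparing two settings evaluated on the same test items.
    correct_1, correct_2: parallel lists of booleans, one per test item.
    Returns the two-sided p-value."""
    b = sum(c1 and not c2 for c1, c2 in zip(correct_1, correct_2))
    c = sum(c2 and not c1 for c1, c2 in zip(correct_1, correct_2))
    if b + c == 0:
        return 1.0                       # the settings never disagree
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    return erfc(sqrt(chi2 / 2))          # survival of chi-square, 1 d.f.
```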
4 Results

4.1 Contrasting Swap Settings

Figure 3 shows a graph with the accuracy for the seven languages on the evaluation data set, using the dist(del=1) setting.⁴ The languages are sorted by the accuracy of the H setting. The baseline (first bar) corresponds to the accuracy that would be attained by always choosing the label that is most frequent for that language.

[Figure 3: Accuracy of tree distance across the seven languages.]

Looking first at the H setting, the performance across the languages ranges from 73% (Czech) to 84% (Chinese), which in all cases substantially out-performs the majority-class baseline (which is in the range 20% (Spanish) – 40% (German)). The other cost-settings also clearly out-perform this baseline. It also seems that the variation in accuracies across the languages is not relatable to the variation in this majority-class baseline.

The performance under the B and T settings is perhaps surprising, when one recalls that these two settings pay no attention at all to lexical differences between nodes, and refer only to the dependency relations and the parts of speech. Nonetheless, even in this setting of tolerance to node exchange, it seems that in concert with the structure-respecting requirements of Tai mappings, surprisingly high accuracies can be obtained.

Chinese has the highest overall accuracy, with even the very simplest settings reaching relatively high accuracy. It is interesting to note that amongst the highest-performing other systems which have used the same data, performance on the Chinese data-set has tended to be below that of other languages, with Che et al. (2009) and Dai et al. (2009) reporting over 81 F1 in all languages except Chinese, where the F1 reported was 76.38 and 76.23. This suggests that tree distance and labelling by alignment methods may be especially suitable for the Chinese data set.

Japanese reports the worst results, especially for the B/T/FT settings.

⁴ The full table of values, including the 3 out-of-domain cases, appears as the first column in Table 4.


This is probably due to the fact that, on inspection, the Japanese data gives 96.1% of syntactic dependencies the same dependency relation, practically cancelling the contribution of the DEPREL feature.

The outcomes for the Spanish and Catalan data sets are very similar to each other. This is not unexpected, as for the most part one is a translation of the other, and they were annotated in the same way with the same set of labels.

Table 1 summarises the outcomes of pair-wise comparisons of the swap settings. Under '1st > 2nd', !l, *m, **n appears if there were l data-sets on which the 1st setting out-performed the 2nd setting, for m of these the outcomes were significantly different (p = 0.05), and for n of these the difference holds at a stricter significance level (p = 0.001).

    settings   1st > 2nd        2nd > 1st          avge 2nd − 1st
    B-T        !4, *2, **1      !6, *3, **3        0.341%
    B-FT       !0, *0, **0      !10, *10, **10     7.45%
    B-H        !0, *0, **0      !10, *10, **9      7.779%
    T-FT       !0, *0, **0      !10, *10, **10     7.109%
    T-H        !0, *0, **0      !10, *9, **9       7.438%
    FT-H       !7, *6, **4      !3, *3, **3        0.329%

Table 1: Comparing swap-settings, for dist(del=1), on the 7 evaluation data-sets and 3 out-of-domain data-sets.

The T and B settings turn out to give rather similar outcomes. Table 1 shows that T's margin over B averages out to 0.341%. There are 3 strictly significant cases where T out-performs B, and 1 case in the other direction.

In its turn, the H setting always out-performs the T setting, 9 times out of 10 at the strictest significance level, with the average margin being 7.438%. Thus penalising lexical difference seems always to improve performance, and nearly always substantially, though for the Chinese and out-of-domain English data sets the margin for H over T falls to less than 1%.

The outcomes with FT are more language-dependent. FT out-performs H more often (!7, *6) than H out-performs FT (!3, *3), and English is the one data-set on which the two are not significantly different. Japanese shows the highest margin in favour of H (11.5%) whilst German shows the highest margin in favour of FT (5.2%). The poor relative performance of FT for Japanese is again probably a function of the uninformative nature of its dependency annotation.

For all languages FT out-performs T, at the strictest significance level, with the margin averaging out to 7.1%. For German the margin is especially large, at 17.8%.

4.2 Contrasting Representations

Table 2 compares the results obtained with trees to results obtained with a linearised version, using just a node sequence corresponding to the sequence of words that spans the predicate and argument nodes. Encoding these as linear trees, the tree-matching in this case reduces to standard Levenshtein matching.

    settings   Tree > Lev.      Lev. > Tree    avge Lev. − Tree
    B          *8, **6          *0, **0        -2.83%
    T          *10, **10        *0, **0        -4.574%
    FT         *10, **10        *0, **0        -7.758%
    H          *10, **10        *0, **0        -8.113%

Table 2: Comparing Tree and Levenshtein outcomes.

As is evident, for all languages and all swap-settings, the alignment on the linear representation gives substantially poorer results than the alignment on the trees. For T/FT/H, in every single experiment the tree version produced a better score than the linear version, and for B no statistical advantage of the linear version over the tree version was ever detected. This indicates that the Tai-mapping constraints on the tree representation definitely leverage information that is beneficial to this labelling task.

[Figure 4: Tree vs Levenshtein distance (T & H).]
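For concreteness, the linearised baseline of this subsection reduces to the classic weighted edit-distance dynamic program over the extracted node sequence; the sketch below reuses whichever swap-cost setting is in play, with names that are ours rather than the paper's.

```python
def levenshtein(seq1, seq2, swap_cost, del_cost=1.0):
    """Weighted edit distance between two node-label sequences: the
    'linear tree' baseline, with the same swap and deletion costs as
    the tree case but no tree structure constraining the alignment."""
    n, m = len(seq1), len(seq2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost
    for j in range(1, m + 1):
        d[0][j] = j * del_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + del_cost,              # delete
                          d[i][j - 1] + del_cost,              # insert
                          d[i - 1][j - 1] + swap_cost(seq1[i - 1],
                                                      seq2[j - 1]))
    return d[n][m]
```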


4.3 Contrasting Distance and Similarity

Table 3 compares the best results of distance and similarity. In the case of distance, the best value is between dist(del=1) and dist(del=0.5), and in the case of similarity, the best value is between sim(2-swap) and sim(1-swap). Figure 5 plots this contrast for the T and FT swap settings.

[Figure 5: Distance outcomes compared to Similarity outcomes (T and FT).]

                 setting   max(dist)    max(sim)    difference
    Chinese      B **      82.82%       83.46%!     0.6459%
                 T =       83.98%!      83.86%      -0.1155%
                 FT *      85.88%!      85.53%      -0.3464%
                 H **      85.06%       85.88%!     0.8227%
    German       B *       67.41%!      65.74%      -1.676%
                 T *       68.25%!      65.74%      -2.514%
                 FT **     87.71%!      85.01%      -2.7%
                 H =       81.38%!      80.82%      -0.5587%
    o-German     B *       69.51%!      67.59%      -1.926%
                 T *       69.1%!       67.09%      -2.01%
                 FT *      75.46%!      73.03%      -2.429%
                 H =       72.7%        73.53%!     0.8375%
    English      B **      69.34%       70.21%!     0.8674%
                 T =       70.79%!      70.48%      -0.3092%
                 FT =      78.75%!      78.49%      -0.2619%
                 H *       78.72%       79.43%!     0.7171%
    o-English    B =       63.08%       63.67%!     0.5944%
                 T =       66.26%!      65.14%      -1.119%
                 FT =      69.83%!      69.51%      -0.3147%
                 H =       66.68%       67.52%!     0.8392%
    Catalan      B **      65.7%!       64.11%      -1.587%
                 T **      65.7%!       63.88%      -1.818%
                 FT **     79.35%!      77.75%      -1.596%
                 H =       75.8%!       75.58%      -0.2217%
    Spanish      B **      65.5%!       64.23%      -1.268%
                 T **      65.7%!       63.98%      -1.717%
                 FT **     77.25%!      76.2%       -1.049%
                 H =       75.16%       75.64%!     0.4736%
    Czech        B =       66.28%!      66.21%      -0.07138%
                 T **      66.69%!      66%         -0.6935%
                 FT *      70.05%!      69.53%      -0.5226%
                 H **      74.05%       74.77%!     0.7138%
    o-Czech      B *       64.42%       64.92%!     0.497%
                 T =       65.08%!      64.73%      -0.3529%
                 FT **     69.24%!      68.17%      -1.066%
                 H *       72.56%       73.36%!     0.7995%
    Japanese     B =       57.93%!      57.31%      -0.6207%
                 T **      57.97%!      55.86%      -2.106%
                 FT **     61.52%!      58.94%      -2.577%
                 H **      75.34%!      71.32%      -4.025%
    total        B         *4, **2      *3, **2     -0.4545%
                 T         *6, **4      *0, **0     -1.276%
                 FT        *8, **5      *0, **0     -1.286%
                 H         *1, **1      *4, **2     0.03983%
                 all       *19, **12    *7, **4     -0.7441%

Table 3: max(dist) vs max(sim): max(dist) is the best of dist(del=1) and dist(del=0.5), max(sim) is the best of sim(2-swap) and sim(1-swap); '!' marks the better of the two values.

For T, overall the tree distance performs better than tree similarity: this occurs 4 times at the strictest significance level, 6 times at the less strict level, there is no language where similarity outperforms distance, and the average margin is 1.28%. For B, this trend is less clear, with just 2 cases where distance out-performed similarity at the strictest significance level. The margins are small and average out at 0.45% in favour of the distance setting.

For FT, distance out-performs similarity 5 times at the strictest significance level, 8 times at the less strict level, there is no language where similarity outperforms distance, and the average margin is 1.29%. German, Catalan, Spanish and Japanese show the largest margins, whilst Chinese, English and Czech the smallest.

For the H setting, overall the margin between distance and similarity is small, 0.04%. Similarity out-performs distance 2 times at the strictest significance level and 4 times at the less strict level. Japanese is unusual, being the only case where the comparison is statistically significantly in favour of distance, by a margin of 4.03%.

Recall from section 2 that the distance and similarity settings can be paired off as alignment-duals, namely 'dist(del=1)' with 'sim(2-swap)' and 'dist(del=0.5)' with 'sim(1-swap)'. For such dual settings the labelling potential of a given training sample is necessarily identical under the two settings. We noted that this does not theoretically guarantee identical system outcomes: A-duality does not imply N-duality, that is, identical neighbour-ordering. The experiments actually show that it is indeed not the case for these data-sets that the A-dual settings are also N-dual settings. Space precludes giving details of this,


but it is implied by Table 3 and Fig. 5: for accuracies a_1 and a_2 attained by the distance alternatives, if N-duality held, the dual similarity settings would also attain accuracies a_1 and a_2, and we would observe no differences between maximum distance outcomes and maximum similarity outcomes.

4.4 Contrasting two Distance settings

Table 4 contrasts the two tree-distance settings⁵, one where the deletion cost is 1 and the other where the deletion cost is 0.5.

                 cost setting   del=1      del=.5     difference
    Chinese      B **           82.14%     82.82%!    0.6748%
                 T **           83%        83.98%!    0.9743%
                 FT **          84.78%     85.88%!    1.093%
                 H **           83.93%     85.06%!    1.129%
    German       B *            67.41%!    65.83%     -1.583%
                 T =            68.25%!    67.41%     -0.838%
                 FT *           86.03%     87.71%!    1.676%
                 H =            80.82%     81.38%!    0.5587%
    o-German     B *            69.51%!    67.76%     -1.759%
                 T =            69.1%!     68.84%     -0.2513%
                 FT =           74.62%     75.46%!    0.8375%
                 H =            72.53%     72.7%!     0.1675%
    English      B =            69.34%!    69.18%     -0.1675%
                 T **           70.12%     70.79%!    0.6656%
                 FT **          77.98%     78.75%!    0.7687%
                 H **           77.57%     78.72%!    1.147%
    o-English    B =            63.08%!    63.01%     -0.06993%
                 T =            65.77%     66.26%!    0.4895%
                 FT *           69.83%!    68.39%     -1.434%
                 H *            66.68%!    64.9%      -1.783%
    Catalan      B **           65.12%     65.7%!     0.5764%
                 T *            65.23%     65.7%!     0.47%
                 FT **          76.77%     79.35%!    2.572%
                 H **           74.76%     75.8%!     1.038%
    Spanish      B **           64.97%     65.5%!     0.5243%
                 T **           64.89%     65.7%!     0.8118%
                 FT **          75.04%     77.25%!    2.216%
                 H **           73.95%     75.16%!    1.209%
    Czech        B **           66.28%!    65.47%     -0.8158%
                 T **           65.84%     66.69%!    0.8464%
                 FT =           69.99%     70.05%!    0.06119%
                 H **           73.28%     74.05%!    0.7699%
    o-Czech      B **           64.42%!    63.21%     -1.21%
                 T =            64.73%     65.08%!    0.3457%
                 FT **          69.24%!    68.19%     -1.044%
                 H =            72.56%!    72.23%     -0.3313%
    Japanese     B =            57.93%!    57.4%      -0.5266%
                 T *            56.69%     57.97%!    1.279%
                 FT *           60.43%     61.52%!    1.091%
                 H **           71.92%     75.34%!    3.423%
    total        B              *4, **2    *3, **3    -0.4356%
                 T              *0, **0    *6, **4    0.4793%
                 FT             *2, **1    *6, **4    0.7837%
                 H              *1, **0    *6, **6    0.7327%

Table 4: Tree distance with del=1 vs del=0.5 ('!' marks the better value).

The main observation is that there is a tendency for the del=0.5 setting to out-perform del=1. The differences are small, usually less than one per cent. Figure 6 plots the outcomes for T and FT, and Figure 7 shows the H outcomes.

For B, del=0.5 was strictly statistically better than del=1 3 times, but the converse was also the case 2 times, with the margin averaging out at 0.44% in favour of del=1.

For T, del=0.5 often out-performed del=1, and significantly so (*6, **4), whilst del=1 never significantly out-performed del=0.5. Averaged over all the data-sets, there is a small margin for del=0.5: 0.48%.

For FT, again del=0.5 often out-performed del=1, and significantly so (*6, **4), with the margins being a little larger than in the case of T. For two of the out-of-domain data-sets (o-Cz and o-En), the relationship reverses, with del=1 statistically significantly out-performing the del=0.5 setting. Averaging, there is a margin in favour of del=0.5 of 0.78%, with the largest margins shown by Spanish (2.21%) and Catalan (2.57%). See Figure 6.

For H, the effect of switching from del=0.5 to del=1 follows a very similar pattern to that found for FT, and where del=0.5 exceeds del=1, the effect is a little more pronounced than it was for FT, with 6 cases strictly statistically significant. Again for o-Cz the relationship is reversed, and for the other two out-of-domain data-sets either del=1 is significantly better or statistically indistinguishable. Averaged across all the data-sets, the margin is 0.73% in favour of del=0.5 for H.

Concerning the out-of-domain data-sets, one could speculate that the reason why the del=0.5 setting does not improve over the del=1 setting is that this setting makes swaps relatively more costly, and that the out-of-domain data-sets require a greater tolerance to swaps.

⁵ The higher of the two values is used in Table 3 for max(dist).


[Figure 6: Comparing dist(del=1) to dist(del=0.5) (T and FT).]

[Figure 7: Comparing dist(del=1) to dist(del=0.5) (H).]

5 Conclusions and future work

In an approach to projecting annotations over Tai mappings, we have explored the effects of varying a number of possible parameters. We have considered a number of settings concerning the costing of swaps, referring to greater and lesser amounts of the available information. With the H setting, using syntactic and lexical information, the performance across the languages ranges from 74% to 85%, and the performance under the B and T settings, which attend to no lexical information, was also substantial. The linearised representation was shown to be clearly out-performed by the tree representation. Distance- and similarity-based alignment scoring were shown to give different outcomes, with distance overall out-performing similarity on the B, T and FT settings, but with this no longer the case for the H setting.

There were also some language-specific findings, amongst them that the Japanese results with the B/T/FT settings were noticeably poor, almost certainly due to that data-set's lack of variation in dependency labels, and that whilst lowering the cost of deletion mostly improved performance, this was not the case for the out-of-domain data-sets.

The non-negligible performance under the B and T settings, which attend to no lexical information, is perhaps worth further scrutiny. If, across different lexical items, PropBank aimed for no continuity of use of the enumerated core arguments, intuition would suggest that in these B and T settings performance should be very low. Concerning the A0 and A1 roles, PropBank commits to some consistency concerning possession of prototypically agent and patient properties, and for other enumerated roles, to consistency within certain verb groups. One direction for future research would be to investigate what aspects of performance are due to consistent use across lexical items, by automatically introducing inconsistency via permutations amongst the identifiers for a particular item.

There has been little comparable work using an alignment approach to SRL, an exception being Fürstenau and Lapata (2011), though there are significant differences: they work with FrameNet (Fillmore et al., 2004), and use an alignment not constrained by ancestry or linear order. Also, rather than taking each unlabelled item, T, in turn, and using its nearest labelled neighbours, {S_1, ..., S_n}, their aim is to take each labelled exemplar, S, from a frame lexicon, in turn, and use it to project annotation to its nearest neighbours {T_1, ..., T_n} in an unlabelled corpus. For all of these reasons, we cannot at the moment make any meaningful quantitative comparisons with their work. Nonetheless, it seems reasonable to expect the contrasts we have described concerning cost settings and distance versus similarity to apply to the kind of data-set expansion scenario they discuss, and investigating whether this is so is a potential avenue for further research. Conversely, it would be interesting to see how our findings are affected if we were to replace our notion of alignments, which must be Tai mappings, with the notion of alignment from their work.

Acknowledgements

All calculations were performed on the Lonsdale cluster maintained by the Trinity Centre for High Performance Computing. This cluster was funded through grants from Science Foundation Ireland. This research is supported by Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at Trinity College Dublin.


References

Wanxiang Che, Zhenghua Li, Yongqiang Li, Yuhang Guo, Bing Qin, and Ting Liu. 2009. Multilingual dependency-based syntactic and semantic parsing. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 49–54, Boulder, Colorado, June. Association for Computational Linguistics.

Qifeng Dai, Enhong Chen, and Liu Shi. 2009. An iterative approach for joint dependency parsing and semantic role labeling. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 19–24, Boulder, Colorado, June. Association for Computational Linguistics.

Martin Emms and Hector-Hugo Franco-Penya. 2012. On order equivalences between distance and similarity measures on sequences and trees. In Proceedings of ICPRAM 2012, International Conference on Pattern Recognition Applications and Methods.

C.J. Fillmore, J. Ruppenhofer, and C.F. Baker. 2004. Framenet and representing the link between semantic and syntactic relations. In Frontiers in Linguistics.

Hector-Hugo Franco-Penya. 2010. Edit tree distance alignments for semantic role labelling. In Proceedings of the ACL 2010 Student Research Workshop, pages 79–84, Uppsala, Sweden. Association for Computational Linguistics.

Hagen Fürstenau and Mirella Lapata. 2011. Semi-supervised semantic role labeling via structural alignment. Computational Linguistics, 38:135–171.

Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Adam Meyers, Joakim Nivre, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL-2009).

Michal Kolar, Michael Lassig, and Johannes Berg. 2008. From protein interactions to functional annotation: graph alignment in Herpes. BMC Systems Biology, 2(1).

Aron Marchler-Bauer, Anna R. Panchenko, Benjamin A. Shoemaker, Paul A. Thiessen, Lewis Y. Geer, and Stephen H. Bryant. 2002. CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Research, 30(1):281–283.

Q. McNemar. 1947. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2):153–157.

Alessandro Moschitti, Daniele Pighin, and Roberto Basili. 2008. Tree kernels for semantic role labeling. Computational Linguistics, 34(2):193–224.

Martha Palmer, Paul Kingsbury, and Daniel Gildea. 2005. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.

Kuo-Chung Tai. 1979. The tree-to-tree correction problem. Journal of the ACM (JACM), 26(3):422–433.

Kaizhong Zhang and Dennis Shasha. 1989. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18(6):1245–1262.

