Context-aware Frame-Semantic Role Labeling

Michael Roth and Mirella Lapata
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
{mroth,mlap}@inf.ed.ac.uk

Transactions of the Association for Computational Linguistics, vol. 3, pp. 449–460, 2015. Action Editor: Diana McCarthy. Submission batch: 5/2015; Revision batch: 7/2015; Published 8/2015. © 2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Abstract

Frame semantic representations have been useful in several applications ranging from text-to-scene generation, to question answering and social network analysis. Predicting such representations from raw text is, however, a challenging task and corresponding models are typically only trained on a small set of sentence-level annotations. In this paper, we present a semantic role labeling system that takes into account sentence and discourse context. We introduce several new features which we motivate based on linguistic insights and experimentally demonstrate that they lead to significant improvements over the current state-of-the-art in FrameNet-based semantic role labeling.

1 Introduction

The goal of semantic role labeling (SRL) is to identify and label the arguments of semantic predicates in a sentence according to a set of predefined relations (e.g., "who" did "what" to "whom"). In addition to providing definitions and examples of role labeled text, resources like FrameNet (Ruppenhofer et al., 2010) group semantic predicates into so-called frames, i.e., conceptual structures describing the background knowledge necessary to understand a situation, event or entity as a whole as well as the roles participating in it. Accordingly, semantic roles are defined on a per-frame basis and are shared among predicates.

In recent years, frame representations have been successfully applied in a range of downstream tasks, including question answering (Shen and Lapata, 2007), text-to-scene generation (Coyne et al., 2012), stock price prediction (Xie et al., 2013), and social network extraction (Agarwal et al., 2014). Whereas some tasks directly utilize information encoded in the FrameNet resource, others make use of FrameNet indirectly through the output of SRL systems that are trained on data annotated with frame-semantic representations. While advances in machine learning have recently given rise to increasingly powerful SRL systems following the FrameNet paradigm (Hermann et al., 2014; Täckström et al., 2015), little effort has been devoted to improving such models from a linguistic perspective.

In this paper, we explore insights from the linguistic literature suggesting a connection between discourse and role labeling decisions and show how to incorporate these in an SRL system. Although early theoretical work (Fillmore, 1976) has recognized the importance of discourse context for the assignment of semantic roles, most computational approaches have shied away from such considerations. To see how context can be useful, consider as an example the DELIVERY frame, which states that a THEME can be handed off to either a RECIPIENT or "more indirectly" to a GOAL. While the distinction between the latter two roles might be clear for some fillers (e.g., people vs. locations), there are others where both roles are equally plausible and additional information is required to resolve the ambiguity (e.g., countries). If we hear about a letter being delivered to Greece, for instance, reliable cues might be whether the sender is a person or a country and whether Greece refers to the geographic region or to the Greek government.

The example shows that context can generally influence the choice of correct role label. Accordingly, we assume that modeling contextual information, such as the meaning of a word in a given situation, can improve semantic role labeling performance. To validate this assumption, we explore different ways of incorporating contextual cues in an SRL model and provide experimental support that demonstrates the usefulness of such additional information.

The remainder of this paper is structured as follows. In Section 2, we present related work on semantic role labeling and the various features applied in traditional SRL systems. In Section 3, we provide additional background on the FrameNet resource. Sections 4 and 5 describe our baseline system and contextual extensions, respectively, and Section 6 presents our experimental results. We conclude the paper by discussing in more detail the output of our system and highlighting avenues for future work.

2 Related Work

Early work in SRL dates back to Gildea and Jurafsky (2002), who were the first to model role assignment to verb arguments based on FrameNet. Their model makes use of lexical and syntactic features, including binary indicators for the words involved, syntactic categories, dependency paths as well as position and voice in a given sentence. Most subsequent work in SRL builds on Gildea and Jurafsky's feature set, often with the addition of features that describe relevant syntactic structures in more detail, e.g., the argument's leftmost/rightmost dependent (Johansson and Nugues, 2008).

More sophisticated features include the use of convolution kernels (Moschitti, 2004; Croce et al., 2011) in order to represent predicate-argument structures and their lexical similarities more accurately. Beyond lexical and syntactic information, a few approaches employ additional semantic features based on annotated word senses (Che et al., 2010) and selectional preferences (Zapirain et al., 2013). Deschacht and Moens (2009) and Huang and Yates (2010) use sentence-internal sequence information, in the form of latent states in a hidden Markov model. More recently, a few approaches (Roth and Woodsend, 2014; Lei et al., 2015; Foland and Martin, 2015) explore ways of using low-rank vector and tensor approximations to represent lexical and syntactic features as well as combinations thereof.

To the best of our knowledge, there exists no prior work where features based on discourse context are used to assign roles on the sentence level. Discourse-like features have been previously applied in models that deal with so-called implicit arguments, i.e., roles which are not locally realized but resolvable within the greater discourse context (Ruppenhofer et al., 2010; Gerber and Chai, 2012). Successful features for resolving implicit arguments include the distance between mentions and any discourse relations occurring between them (Gerber and Chai, 2012), roles assigned to mentions in the previous context, the discourse prominence of the denoted entity (Silberer and Frank, 2012), and its centering status (Laparra and Rigau, 2013). None of these features have been used in a standard SRL system to date (and trivially, not all of them will be helpful as, for example, the number of sentences between a predicate and an argument is always zero within a sentence). In this paper, we extend the contextual features used for resolving implicit arguments to the SRL task and show how a set of discourse-level enhancements can be added to a traditional sentence-level SRL model.

3 FrameNet

The Berkeley FrameNet project (Ruppenhofer et al., 2010) develops a semantic lexicon and an annotated example corpus based on Fillmore's (1976) theory of frame semantics. Annotations consist of frame-evoking elements (i.e., words in a sentence that are associated with a conceptual frame) and frame elements (i.e., instantiations of semantic roles, which are defined per frame and filled by words or word sequences in a given sentence). For example, the DELIVERY frame describes a scene or situation in which a DELIVERER hands off a THEME to a RECIPIENT or a GOAL. [1] In total, there are 1,019 frames and 8,886 frame elements defined in the latest publicly available version of FrameNet. [2]

[1] See https://framenet2.icsi.berkeley.edu/ for a comprehensive list of frames and their definitions.
[2] Version 1.5, released September 2010.


An average number of 11.6 different frame-evoking elements are provided for each frame (11,829 in total). Following previous work on FrameNet-based SRL, we use the full text annotation data set, which contains 23,087 frame instances.

Semantic annotations for frame instances and fillers of frame elements are generally provided at the level of word sequences, which can be single words, complete or incomplete phrases, and entire clauses (Ruppenhofer et al., 2010, Chapter 4). An instance of the DELIVERY frame, with annotations of the frame-evoking element (underlined) and instantiated frame elements (in brackets), is given in the example below:

(1) The Soviet Union agreed to speed up [oil]_THEME deliveries_DELIVERY [to Yugoslavia]_RECIPIENT.

Note that the oil deliveries here concern Yugoslavia as a geopolitical entity and hence the RECIPIENT role is assigned. If Yugoslavia was referred to as the location of a delivery, the GOAL role would be assigned instead. In general, roles can be restricted by so-called semantic types (e.g., every filler of the THEME element in the DELIVERY frame needs to be a physical object). However, not all roles are typed and whether a specific phrase is a suitable filler largely depends on context.

4 Baseline Model

As a baseline for implementing contextual enhancements to an SRL model, we use the semantic role labeling components provided by the mate-tools (Björkelund et al., 2010). Given a frame-evoking element in a sentence and its associated frame (i.e., a predicate and its sense), the mate-tools form a pipeline of logistic regression classifiers that identify and label frame elements which are instantiated within the same sentence (i.e., a given predicate's arguments).

The adopted SRL system has been developed for PropBank/NomBank-style role labeling and we make several changes to adapt it to FrameNet. Specifically, we change the argument labeling procedure from predicate-specific to frame-specific roles and implement I/O methods to read and generate FrameNet XML files. For direct comparison with the previous state-of-the-art for FrameNet-based SRL, we further implement additional features used in the SEMAFOR system (Das et al., 2014) and combine the role labeling components of mate-tools with SEMAFOR's preprocessing toolchain. [3] All features used in our system are listed in Table 1.

[3] We note that better results have been reported in Hermann et al. (2014) and Täckström et al. (2015). However, both of these more recent approaches rely on a custom frame identification component as well as proprietary tools and models for tagging and parsing which are not publicly available.

The main differences between our adaptation of mate-tools and SEMAFOR are as follows: whereas the latter implements identification and labeling of role fillers in one step, mate-tools follow the insight that these two steps are conceptually different (Xue and Palmer, 2004) and should be modeled separately. Accordingly, mate-tools contain a global reranking component which takes into account identification and labeling decisions while SEMAFOR only uses reranking techniques to filter overlapping argument predictions and other constraints (see Das et al., 2014 for details). We discuss the advantage of a global reranker for our setting in Section 5.

Argument identification and classification
  Lemma form of f
  POS tag of f
  Any syntactic dependents of f*
  Subcat frame of f*
  Voice of a*
  Any lemma in a*
  Number of words in a
  First word and POS tag in a
  Second word and POS tag in a
  Last word and POS tag in a
  Relation from first word in a to its parent
  Relation from second word in a to its parent
  Relation from last word in a to its parent
  Relative position of a with respect to p
  Voice of a and relative position with respect to p*

Identification only
  Lemma form of the first word in a
  Lemma form of the syntactic head of a
  Lemma form of the last word in a
  POS tag of the first word in a
  POS tag of the syntactic head of a
  POS tag of the last word in a
  Relation from syntactic head of a to its parent
  Dependency path from a to f
  Length of dependency path from a to f
  Number of words between a and f

Table 1: Features from Das et al. (2014) which we adopt in our model; a denotes the argument span under consideration, f refers to the corresponding frame-evoking element. Identification features are instantiated as binary indicator features. Features marked with an asterisk are role specific. All other features apply to combinations of role and frame.
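The separation into identification and labeling can be pictured as two classifiers applied in sequence. The following is a minimal, hypothetical sketch of such a pipeline using scikit-learn's logistic regression; the candidate spans, feature dictionaries and helper structure are illustrative stand-ins, not the actual mate-tools implementation.

```python
# Hypothetical sketch of a two-stage SRL pipeline in the spirit of
# Xue and Palmer (2004): argument identification first, labeling second.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

class TwoStageRoleLabeler:
    def __init__(self):
        self.id_vec, self.id_clf = DictVectorizer(), LogisticRegression(max_iter=1000)
        self.lbl_vec, self.lbl_clf = DictVectorizer(), LogisticRegression(max_iter=1000)

    def train(self, instances):
        # instances: (feature dict, is_argument, role label) triples,
        # one per candidate span of a frame-evoking element
        feats = [f for f, _, _ in instances]
        self.id_clf.fit(self.id_vec.fit_transform(feats),
                        [is_arg for _, is_arg, _ in instances])
        arg_feats = [f for f, is_arg, _ in instances if is_arg]
        arg_roles = [r for _, is_arg, r in instances if is_arg]
        self.lbl_clf.fit(self.lbl_vec.fit_transform(arg_feats), arg_roles)

    def predict(self, candidate_feats):
        # Step 1: decide which candidate spans are arguments at all.
        is_arg = self.id_clf.predict(self.id_vec.transform(candidate_feats))
        # Step 2: assign a frame element label to the identified spans.
        labels = []
        for f, keep in zip(candidate_feats, is_arg):
            labels.append(self.lbl_clf.predict(self.lbl_vec.transform([f]))[0]
                          if keep else None)
        return labels
```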


5 Extensions based on Context

Context can be relevant for semantic role labeling in various different ways. In this section, we motivate and describe four extensions over previous approaches.

The first extension is a set of features that model document-specific aspects of word meaning using distributional semantics. The motivation for this feature class stems from the insight that the meaning of a word in context can influence correct role assignment. While concepts such as polysemy, homonymy and metonymy are all relevant here, the scarce training data available for FrameNet-based SRL calls for a light-weight model that can be applied without large amounts of labeled data. We therefore employ distributional word representations which we critically adapt based on document content. We describe our contribution in Section 5.1.

Entities that fill semantic roles are sometimes mentioned multiple times in discourse. Given a specific mention for which a role is to be predicted, we can also directly use previous role assignments as classification cues. We describe our implementation of this feature in Section 5.2.

The filler of a semantic role is often a word or phrase which occurs only once or a few times in a document. If neither syntax nor aspects of lexical meaning provide cues indicating a unique role, useful information can still be derived from the discourse salience of the denoted entity. Our model makes use of a simple salience indicator that can be reliably derived from automatically computed coreference chains. We describe the motivation and actual implementation of this feature in Section 5.3.

The aforementioned features will influence role labeling decisions directly; however, further improvements can be gained by considering interactions between labeling decisions. As discussed in Das et al. (2014), role annotations in FrameNet are unique with respect to a frame instance in more than 96% of cases. This means that even if a feature is not a positive indicator for a candidate role filler, knowing that it would be a better cue for another candidate can still prevent a model from assigning a frame element label incorrectly. While this kind of knowledge has been successfully implemented as constraints in recent FrameNet-based SRL models (Hermann et al., 2014; Täckström et al., 2015), earlier work on PropBank-based role labeling suggests that better performance can be achieved with a re-ranking component which has the potential to learn such constraints and other interactions implicitly (Toutanova et al., 2005; Björkelund et al., 2010). In our model, we adopt the latter method and extend it with additional frame-based features. We describe this approach in more detail in Section 5.4.

5.1 Modeling Word Meaning in Context

The underlying idea of distributional models of semantics is that meaning can be acquired based on distributional properties (typically represented by co-occurrence counts) of linguistic entities such as words and phrases (Sahlgren, 2008). Although the absolute meaning of distributional representations remains unclear, they have proven highly successful for modeling relative aspects of meaning, as required for instance in word similarity tasks (Mikolov et al., 2013; Pennington et al., 2014). Given their ability to model lexical similarity, it is not surprising that such representations are also successful at representing similar words in semantic tasks related to role labeling (Pennacchiotti et al., 2008; Croce et al., 2010; Zapirain et al., 2013).

Although distributional representations can be used directly as features for role labeling (Padó et al., 2008; Gorinski et al., 2013; Roth and Woodsend, 2014, inter alia), further gains should be possible when considering document-specific properties such as genre and context. This is particularly true in the context of FrameNet, where different senses are observed across a diverse range of texts including spoken dialogue and debate transcripts as well as travel guides and newspaper articles.


Country names, for example, can be observed as fillers for different roles depending on the text genre and its perspective. Whereas some text may talk about a country as an interesting holiday destination (e.g., "Berlitz Intro to Jamaica"), others may discuss what a country is good at or interested in (e.g., "Iran [Nuclear] Introduction"). The most frequent roles assigned to different country names are displayed in Table 2.

Country   Frame               Frame Element
Iran      Supply              RECIPIENT
          Commerce buy        BUYER
China     Supply              SUPPLIER
          Commerce sell       SELLER
Iraq      Locative relation   GROUND
          Arriving            GOAL

Table 2: Most frequent roles assigned to country names appearing in FrameNet texts: whereas Iran and China are mostly mentioned in an economic context, references to Iraq are mainly found in a news article about a politician's visit to the country.

Previous approaches model word meaning in context (Thater et al., 2010; Dinu and Lapata, 2010, inter alia) using sentence-level information which is already available in traditional SRL systems in the form of explicit features. Here, we go one step further and define a simple model in which word meaning representations are adapted to each document. As a starting point, we use the GloVe toolkit (Pennington et al., 2014) [4] for learning representations and apply it to the Wikipedia corpus made available by the Westbury Lab. [5] The learned representations can be seen as word vectors whose components encode basic bits of related encyclopaedic knowledge. We adapt these general representations to the actual meaning of a word in a particular text by running additional iterations of the GloVe toolkit using document-specific co-occurrences as input and Wikipedia-based representations for initialization.

[4] We selected this toolkit in our work due to its flexibility: as it directly operates over co-occurrence matrices, we can manipulate counts prior to word vector computation and easily take into account multiple matrices.
[5] http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html

To make up for the large difference in data size between the Wikipedia corpus and a single document, we normalize co-occurrence counts based on the ratio between the absolute numbers of co-occurrences in both resources.

Given co-occurrence matrices C_wiki and C_d, and the vocabulary V, we formally define the features of our SRL model as the components of the vectors \vec{w}_i of words w_i (1 ≤ i ≤ |V|) occurring in document d. The representations are learned by applying GloVe to optimize the following objective for n iterations (1 ≤ t ≤ n):

  J_t = \sum_{i,j} f(X_{ij}) \left( \vec{w}_i^{\top} \vec{w}_j - \log X_{ij} \right)^2    (2)

where

  X = \begin{cases} C_{\mathrm{wiki}} & \text{if } t < t_d \\ C_d & \text{otherwise} \end{cases}    (3)

The weighting function f scales the impact of each word pair such that unseen pairs do not contribute to the overall objective and frequent co-occurrences are not overweighted. In our experiments, we use the same weighting function and parametrization as defined in Pennington et al. (2014). We further set the number of iterations to be performed on each co-occurrence matrix following results of an initial cross-validation experiment on our training data (t_d = 50, n = 100).
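The two-phase optimization of Equations (2) and (3) can be made concrete with the following minimal sketch. It assumes co-occurrence matrices stored as dictionaries of pair counts, rescales the document counts as described above, and runs plain gradient updates on the squared objective, switching from C_wiki to C_d after t_d iterations. The actual experiments use the GloVe toolkit itself with its standard weighting function and hyperparameters; this NumPy version, which follows the bias-free objective exactly as printed in Equation (2), is illustrative only.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    # Weighting function from Pennington et al. (2014): down-weights rare
    # pairs and caps the influence of very frequent co-occurrences.
    return (x / x_max) ** alpha if x < x_max else 1.0

def adapt_vectors(C_wiki, C_doc, dim=50, n_iter=100, t_d=50, lr=0.05, seed=0):
    """Sketch of Eqs. (2)-(3): optimize the squared GloVe-style objective,
    switching from the Wikipedia matrix to the document matrix at t_d.
    C_wiki, C_doc: dicts mapping (i, j) word-index pairs to counts."""
    # Rescale document counts by the ratio of total co-occurrence mass,
    # compensating for the difference in corpus size (Section 5.1).
    ratio = sum(C_wiki.values()) / max(sum(C_doc.values()), 1.0)
    C_doc = {ij: c * ratio for ij, c in C_doc.items()}

    vocab = {i for ij in list(C_wiki) + list(C_doc) for i in ij}
    rng = np.random.default_rng(seed)
    W = {i: (rng.random(dim) - 0.5) / dim for i in vocab}

    for t in range(1, n_iter + 1):
        X = C_wiki if t < t_d else C_doc          # Eq. (3)
        for (i, j), x_ij in X.items():
            # Gradient step on f(X_ij) * (w_i . w_j - log X_ij)^2   (Eq. 2)
            diff = W[i].dot(W[j]) - np.log(x_ij)
            g = glove_weight(x_ij) * diff
            W[i], W[j] = W[i] - lr * g * W[j], W[j] - lr * g * W[i]
    return W   # the components of W[i] serve as SRL features for word i
```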


5.2 Co-occurring Roles

If an entity is mentioned several times in discourse, it is likely that it also fills several roles. Whereas the distributional model described in Section 5.1 provides us with information regarding the role assignments suitable for an entity given co-occurring words, we can also explicitly consider previous role assignments to the same entity. As shown in Table 2, a country that fills the SUPPLIER role is more likely to also fill the role of a SELLER than that of a BUYER. Given the high number of different frame elements in FrameNet, only a small fraction of pairs can be found in the training data, which entails that directly utilizing role co-occurrences might not be helpful. In order to benefit from previous role assignments in discourse, we follow related work on resolving implicit arguments (Ruppenhofer et al., 2011; Silberer and Frank, 2012) and consider the semantic types of role assignments (see Section 3) as features instead of the role labels themselves. This tremendously reduces the feature space from more than 8,000 options (number of defined frame elements) to just 27 (number of semantic types observed for frame elements in the training data).

In practice, we define one binary indicator feature f_s for each semantic type s observed at training time. Given a potential filler, we set the feature value of f_s to 1 (otherwise 0) if and only if there exists a co-referent entity mention annotated as a frame element filler with semantic type s. Since texts in FrameNet do not contain any manual mark-up of coreference relations, we rely on entity mentions and coreference chains predicted by the Stanford Coreference Resolution system (Lee et al., 2013).
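A minimal sketch of these indicator features is given below; the data structures (mention ids, predicted chains, a record of previously assigned fillers and a mapping from frame elements to semantic types) are hypothetical simplifications of the actual pipeline.

```python
def semantic_type_features(candidate_mention, coref_chains, prior_fillers,
                           semantic_types):
    """Binary features f_s (Section 5.2): f_s = 1 iff a mention coreferent
    with the candidate filler was previously annotated as filling a frame
    element whose semantic type is s.

    candidate_mention: id of the candidate filler's entity mention
    coref_chains:      list of sets of mention ids (predicted chains)
    prior_fillers:     dict mapping mention id -> (frame, frame element)
                       for role assignments made earlier in the document
    semantic_types:    dict mapping (frame, frame element) -> semantic type
                       (the 27 types observed in the training data)
    """
    features = {}
    chain = next((c for c in coref_chains if candidate_mention in c), set())
    for mention in chain - {candidate_mention}:
        if mention in prior_fillers:
            s = semantic_types.get(prior_fillers[mention])
            if s is not None:
                features["coref_filler_type=" + s] = 1.0
    return features
```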
5.3 Discourse Newness

Our third contextual feature type is based on the observation that the salience of a discourse entity and its semantic prominence are interrelated. Previous work (Rose, 2011) showed that semantic prominence, as signalled by semantic roles, can better explain subsequent phenomena related to discourse salience (such as pronominalization) than syntactic indicators. Our question here is whether this insight can also be applied in reverse. Can information on discourse salience be useful as an indicator for semantic roles?

For this feature, we make use of the same coreference chains as predicted for determining co-occurring roles. Unfortunately, automatically predicted mentions and coreference chains are noisy. To identify particularly reliable indicators for discourse salience, we inspected held-out development data. One such indicator is whether an entity is mentioned for the first time (discourse-new) or has been mentioned before (discourse-old). Let w denote an entity and R_1 ... R_n the set of all coreference chains with mentions r_1 ... r_m ∈ R_i (1 ≤ i ≤ n) ordered by their appearance in text. We define discourse newness based on head words r.head as:

  \mathrm{new}(w) = \begin{cases} 0 & \text{if } \exists\, r_j \in R_i : j > 1 \wedge r_j.\mathrm{head} \equiv w \\ 1 & \text{otherwise} \end{cases}    (4)

Although this feature is a simple binary indicator, it can be very useful for distinguishing between roles that are more or less likely to be assigned to new entities. For example, it is easy to imagine that the RESULT of a CAUSATION is more likely to be discourse-new than the EFFECT that caused it. Table 3 provides an overview of frames found in the training and development data which have roles with substantially different likelihoods for discourse-new fillers.

Frame                  Frame Element    new/old
Statement              SPEAKER          43.8
                       MESSAGE          99.1
                       MEDIUM           80.0
Leadership             LEADER           78.0
                       GOVERNED         93.4
Intentionally create   CREATOR          58.8
                       CREATED ENTITY   90.1

Table 3: Frequent frames that have elements with different likelihoods of discourse-new vs. discourse-old fillers; new/old ratios as observed on the development set.
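Equation (4) amounts to a simple lookup over the predicted chains. A sketch, assuming chains are stored as lists of mention head words in textual order, is shown below.

```python
def discourse_new(entity_head, coref_chains):
    """Binary newness indicator of Eq. (4): 0 if some non-first mention in a
    predicted coreference chain shares its head word with the entity
    (i.e., the entity has been mentioned before), 1 otherwise.

    entity_head:  head word of the candidate filler
    coref_chains: list of chains; each chain is a list of mention head words
                  ordered by appearance in the text
    """
    for chain in coref_chains:
        for j, head in enumerate(chain, start=1):
            if j > 1 and head == entity_head:
                return 0   # discourse-old
    return 1               # discourse-new
```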


5.4 Frame-based Reranking

Our goal is to learn a better model for FrameNet-based semantic role labeling using linguistically inspired features such as those described in the previous sections. To do this, we need a framework for representing single role assignments and a model of how such assignments depend on each other within a frame instance. Inspired by previous work on reranking in SRL, we assume that we can find the correct filler of a frame element based on the top k roles predicted for each candidate word sequence. We leverage this assumption to train a reranking model that considers the top predictions for each candidate and uses all relevant features to select the best overall structure.

Our implementation of the reranking model is an adaptation of the reranker made available in the mate-tools (see Section 4), which we extend to deal with frame-specific features and arbitrary role labels. As features for the global component, we apply all local features and additionally use the following two types of indicator features on the whole frame structure:

• Total number of roles in the predicted structure
• Ordered set of predicted role labels

At test time, the reranker takes as input the n-best labels for the m-best fillers of a frame structure, computes a global score for each of the n × m possible combinations and returns the structure with the highest overall score as its prediction output. Based on initial experiments on our training data, we set these parameters to m = 8 and n = 4.
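The following sketch illustrates the reranking step. How the m filler hypotheses and the n label hypotheses are enumerated, and how local and global scores are combined, is abstracted away behind placeholder callables; only the two frame-structure indicator features listed above are spelled out.

```python
from itertools import product

def global_features(structure):
    # Frame-structure features used by the global reranker (Section 5.4):
    # the number of realized roles and the ordered sequence of role labels.
    labels = [role for _, role in structure if role is not None]
    return {
        "num_roles=%d" % len(labels): 1.0,
        "label_seq=" + "+".join(labels): 1.0,
    }

def rerank(filler_hypotheses, label_hypotheses, local_score, global_model,
           local_features):
    """Sketch of the reranking step: enumerate combinations of filler
    hypotheses and label hypotheses, score each resulting structure with
    local and global features, and return the argmax.

    filler_hypotheses: m alternative lists of argument spans, e.g. (start, end)
    label_hypotheses:  n alternative label assignments (span -> role)
    local_score:       callable giving the pipeline's score for a structure
    global_model:      callable mapping a feature dict to a real-valued score
    local_features:    callable mapping a structure to a feature dict
    """
    best, best_score = None, float("-inf")
    for spans, labeling in product(filler_hypotheses, label_hypotheses):
        structure = [(span, labeling.get(span)) for span in spans]
        feats = dict(local_features(structure))
        feats.update(global_features(structure))
        score = local_score(structure) + global_model(feats)
        if score > best_score:
            best, best_score = structure, score
    return best
```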
6 Experiments

In this section, we demonstrate the usefulness of contextual features for FrameNet-based SRL models. Our hypothesis is that contextual information can considerably improve an existing semantic role labeling system. Accordingly, we test this hypothesis based on the output of three different systems. The first system, henceforth called Framat (short for FrameNet-adapted mate-tools), is the baseline system described in Section 4. The second system, henceforth Framat+context, is an enhanced version of the baseline that additionally uses all extensions described in Section 5. Finally, we also consider the output of SEMAFOR (Das et al., 2014), a state-of-the-art model for frame-semantic role labeling. Although all systems are provided with entire documents as input, SEMAFOR and Framat process each document sentence-by-sentence whereas Framat+context also uses features over all sentences.

For evaluation, we use the same FrameNet training and evaluation texts as established in Das and Smith (2011). We compute precision, recall and F1-score using the modified SemEval-2007 scorer from the SEMAFOR website. [6]

[6] http://www.ark.cs.cmu.edu/SEMAFOR/eval/

Frames    SRL model        P     R     F1
gold      SEMAFOR [7]      78.4  73.1  75.7*
gold      Framat           80.3  71.7  75.8*
gold      Framat+context   80.4  73.0  76.5
SEMAFOR   SEMAFOR          69.2  65.1  67.1*
SEMAFOR   Framat           71.1  63.7  67.2*
SEMAFOR   Framat+context   71.1  64.8  67.8

Table 4: Full structure prediction results using gold (top) and predicted frames (bottom). All numbers are percentages. An asterisk (*) marks results significantly different (p<0.05) from Framat+context.

[7] Results produced by running SEMAFOR on the exact same frame instances for training and testing as our own models.

Results  Table 4 summarizes our results with Framat, Framat+context, and SEMAFOR using gold and predicted frames (see the upper and lower half of the table, respectively). Although differences in system architecture lead to different precision/recall trade-offs for Framat and SEMAFOR, both systems achieve comparable F1 (for both gold and predicted frames). Compared to Framat, we can see that the contextual enhancements implemented in our Framat+context model lead to immediate gains of 1.3 points in recall, corresponding to a significant increase of 0.7 points in F1. Framat+context's recall is slightly below that of SEMAFOR (73.0% vs. 73.1%), however, it achieves a much higher level of precision (80.4% vs. 78.4%).

We examined whether differences in performance among the three systems are significant using an approximate randomization test over sentences (Yeh, 2000). SEMAFOR and Framat perform significantly worse (p<0.05) compared to Framat+context both when gold and predicted frames are used. In the remainder of this section we discuss results based on gold frames, since the focus of this work lies primarily on the role labeling task.
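A standard sentence-level approximate randomization test in the spirit of Yeh (2000) can be sketched as follows; the per-sentence count tuples and the metric callable are assumptions about the data layout, not a description of the exact evaluation scripts used here.

```python
import random

def approximate_randomization(per_sentence_a, per_sentence_b, metric,
                              trials=10000, seed=1):
    """Two-sided approximate randomization test over sentences (Yeh, 2000).

    per_sentence_a/b: lists of per-sentence count tuples for systems A and B,
                      e.g. (num_correct, num_predicted, num_gold)
    metric:           callable aggregating such a list into a score (e.g. F1)
    Returns an estimate of the p-value for the observed score difference.
    """
    rng = random.Random(seed)
    observed = abs(metric(per_sentence_a) - metric(per_sentence_b))
    at_least_as_large = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(per_sentence_a, per_sentence_b):
            # Swap the two systems' outputs for this sentence with prob. 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(metric(shuffled_a) - metric(shuffled_b)) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (trials + 1)
```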


Model/added feature     P     R     F1
Framat w/o reranker     77.5  72.5  74.9
+discourse newness      77.6  72.3  74.9
+word meaning vectors   77.9  72.7  75.2
+cooccurring roles      77.9  72.8  75.3
+reranker               80.6  72.7  76.4
+frame structure        80.4  73.0  76.5

Table 5: Full structure prediction results using gold frames, Framat and different sets of context features. All numbers are percentages.

Impact of Individual Features  We demonstrate the effect of adding individual context-based features to the Framat model in a separate experiment. Whereas all models in the previous experiment used a reranker for direct comparability, here we start with the Framat baseline (without a reranker) and add each enhancement described in Section 5 incrementally. As summarized in Table 5, the baseline without a reranker achieves a precision and recall of 77.5% and 72.5%, respectively. Addition of our discourse newness feature increases precision (+0.1%), but also reduces recall (−0.2%). Adding word meaning vectors compensates for the loss in recall (+0.4%) and further increases precision (+0.3%). Information about role assignments to coreferring mentions increases recall (+0.1%) while retaining the same level of precision. Finally, we can see that jointly considering role labeling decisions in a global reranker with additional features on frame structure leads to the strongest boost in performance, with combined additional gains in precision and recall of +2.5% and +0.2%, respectively. Interestingly, the gains realized here are much higher compared to when adding the reranker to the Framat model without contextual features, which corresponds to a +2.8% increase in precision but a −0.8% reduction in recall.

General vs. Document-specific Vectors  We also assessed the impact of adapting vectors to documents (see Table 6). Specifically, we compared a version of the Framat+context model without any vectors against a model using the adaptation technique presented in Section 5.1 and a simpler alternative which obtains GloVe representations trained on the Wikipedia corpus and FrameNet texts. The latter model does not explicitly take document information into account, but it should be able to yield vectors representative of the FrameNet domains, merely by being trained on them. As shown in Table 6, our adaptation technique is superior to learning word representations based on Wikipedia and all FrameNet texts at once. Using the components of document-specific vectors as features improves precision and recall by +0.7 percentage points over Framat+context without vectors. Word representations trained on Wikipedia and FrameNet improve precision by +0.2 percentage points and recall by +0.6.

Model/word representations       P     R     F1
Framat+context without vectors   79.7  72.2  75.8
+document-specific vectors       80.4  73.0  76.5
+general (Wiki+FN) vectors       79.9  72.8  76.2

Table 6: Full structure prediction results using gold frames, Framat+context and different vector representations. All numbers are percentages.

Qualitative Improvements  In addition to quantitative gains, we also observe qualitative improvements when considering contextual features. A set of example predictions by different models is listed in Table 7. The annotations show that Framat and SEMAFOR mislabel several cases that are correctly classified by Framat+context.

In the first example, only Framat+context is able to predict that on Dec. 1 fills the frame element TIME. This may seem trivial at first glance but is actually remarkable as the word token Dec neither occurs in the training data nor is well represented as a time expression in Wikipedia. The only way the model is able to label this phrase correctly is by finding that corresponding word tokens are similarly distributed across the test document as other time expressions are in the training data. In the second and third examples, correct assignments require some form of world knowledge which is not expressed within the respective sentences but might be approximated based on context. For example, knowing that aunt, uncle and grandmother are role fillers of a KINSHIP frame means that they are of the semantic type human and thus only compatible with the frame element RECIPIENT, not with GOAL. Similarly, correctly classifying the relation between Clinton and stooge in the last example is only possible if the model has access to some information that makes Clinton a likely filler of the SUPERIOR role. We conjecture that document-specific word vector representations provide such information given that Clinton co-occurs in the document with words such as president, chief, and claim.

Overall, we find that the features introduced in Section 5 model a fair amount of contextual information which can help a semantic role labeling model make better decisions.

7 Discussion

In this section, we discuss the extent to which our model leverages the full potential of contextual features for semantic role labeling. We manually examine role assignments to frame elements which seem particularly sensitive to context. We analyze such frame elements based on differences in label assignment between Framat and Framat+context that can be traced back to factors such as agency in discourse and word meaning in context. We investigate whether our model captures these factors successfully and showcase examples while reporting absolute changes in precision and recall.


SEMAFOR          *Can [he]_THEME go_MOTION [to Paris]_GOAL on Dec. 1 ?
Framat           *Can [he]_THEME go_MOTION [to Paris on Dec. 1]_GOAL ?
Framat+context    Can [he]_THEME go_MOTION [to Paris]_GOAL [on Dec. 1]_TIME ?

SEMAFOR          *Send_SENDING [my regards]_THEME to my aunt , uncle and grandmother .
Framat           *Send_SENDING [my regards]_THEME [to my aunt , uncle and grandmother]_GOAL .
Framat+context    Send_SENDING [my regards]_THEME [to my aunt , uncle and grandmother]_RECIPIENT .

SEMAFOR          *Stephanopoulos does n't want to seem a Clinton stooge_SUBORDINATES_AND_SUPERIORS
Framat           *Stephanopoulos does n't want to seem a [Clinton]_DESCRIPTOR stooge_SUBORDINATES_AND_SUPERIORS
Framat+context    Stephanopoulos does n't want to seem a [Clinton]_SUPERIOR stooge_SUBORDINATES_AND_SUPERIORS

Table 7: Examples of frame structures that are labeled incorrectly (marked by asterisks) without contextual features.

7.1 Agency and Discourse

Many frame elements in FrameNet indicate agency, a property that we expect to highly correlate with contextual features on semantic types of assigned roles (see Section 5.2) and discourse salience (see Section 5.3). Analysis of system output revealed that such features indeed affect and generally improve role labeling. Considering all AGENT elements across frames, we observe absolute improvements of 4% in precision and 3% in recall. In the following, we provide a more detailed analysis of two specific frame elements: the low-frequency AGENT element of the PROJECT frame and the highly frequent SPEAKER element in the STATEMENT frame.

The AGENT of a PROJECT is defined as the "individual or organization that carries out the PROJECT". The main difficulty in identifying instances of this frame element is that the frame-evoking target word is typically a noun such as project, plan, or program and hence syntactic features on word-word dependencies do not provide sufficient cues. We found several cases where context provided missing cues, leading to an increase in recall from 56% to 78%. In cases where additional features did not help, we identified two types of errors: firstly, the filler was too far from the target word and therefore could not be identified as a filler at all ("[North Korea]_AGENT is developing ... program_PROJECT"), and secondly, earlier mentions indicating agency were not detected by the coreference resolution system ("The IAEA assisted Syria (...) This study was part of an IAEA_AGENT ... program_PROJECT").

The SPEAKER of a STATEMENT is defined as "the sentient entity that produces [a] MESSAGE". Instances of the STATEMENT frame are frequently evoked by verbs such as say, mention, and claim. The SPEAKER role can be hard to identify in subject position as an unknown entity could also fill the MEDIUM role. For example, "a report claims that ..." should be analyzed differently from "a person claims". Our contextual features improve role labeling in cases where the subject can be classified based on previous role assignments. On the negative side, we found our model to be too conservative in some cases where a subject is discourse-new. Additional gains would be possible with improved coreference chains that include pronouns such as some and I. Such chains could be established through a better preprocessing pipeline or by utilizing additional linguistic resources.

7.2 Word Meaning and Context

As discussed earlier, we expect that the meaning of a word in context provides valuable cues regarding potential frame elements. Two types of words are of particular interest here: ambiguous words, for which different senses might apply depending on context, and out-of-vocabulary words, for which no clear sense could be established during training. In the following, we take a closer look at differences in role assignment between Framat and Framat+context for such fillers.


Ambiguous words that occur as fillers of different frame elements in the test set include party, power, program, and view. We find occurrences of these words in two broad types of contexts: political and non-political. Within political contexts, party and power fill frame elements such as POSSESSION and LEADER. Outwith political contexts, we find frame elements such as ELECTRICITY and SOCIAL_EVENT to be far more likely. The Framat model exhibits a general bias towards the political domain, often missing instances of frame elements that are more common in non-political contexts (e.g., "the six-[party]_INTERLOCUTORS talks_DISCUSSION"). Framat+context, in contrast, shows less of a bias and provides better classification based on context features for all frame elements. Overall, precision for the four ambiguous words is improved from 86% to 93%, with a few errors remaining due to rare dependency paths (e.g., [program]_ACT <-NMOD- which <-SBAR- is <-PRD- violation_COMPLIANCE) and differences between frame elements that depend on factors such as number (COGNIZER vs. COGNIZER_1).

A frequently observed error by the baseline model is to assign peripheral frame elements such as TIME to role fillers that actually are not time expressions. This happens because words which have not been seen frequently during training but appear in adverbial positions are generally likely to fill the frame element TIME. We find that the use of document-specific word vector representations drastically reduces the number of such errors (e.g., "to give_GIVING [generously]_MANNER vs. *TIME"), with absolute gains in precision and recall of 14% and 9%, respectively, presumably because non-time expressions are often distributed differently across a document than time expressions. Document-specific word vector representations also improve recall for out-of-vocabulary words, as seen with the example of Dec discussed in Section 6. However, such representations by themselves might be insufficient to determine which aspects of a word sense are applicable across a document as occurrences in specific contexts may also be misleading (e.g., "... changes [throughout the community]" vs. "... [throughout the ages]_TIME"). Some of these cases could be resolved using higher-level features that explicitly model interactions between (predicted) word meaning in context and other factors; however, we leave this to future work.

8 Conclusions

In this paper, we enriched a traditional semantic role labeling model with additional information from context. The corresponding features we defined can be grouped into three categories: (1) discourse-level features that directly utilize discourse knowledge in the form of coreference chains (newness, prior role assignments), (2) sentence-level features that model properties of a frame structure as a whole, and (3) lexical features that can be computed using methods from distributional semantics and an adaptation to model document-specific word meaning.

To implement our discourse-level enhancements, we modified a semantic role labeling system developed for PropBank/NomBank which we found to achieve competitive performance on FrameNet-based annotations. Our main contribution lies in extending this system to the discourse level. Our experiments revealed that discourse-aware features can significantly improve semantic role labeling performance, leading to gains of over +2.0 percentage points in precision and state-of-the-art results in terms of F1. Analysis of system output revealed two reasons for improvement. Firstly, contextual features provide necessary additional information to understand and assign roles on the sentence level, and secondly, some of our discourse-level features generalize better than traditional lexical and syntactic features. We further found that additional gains can be achieved using improved preprocessing tools and a more sophisticated model for feature interactions. In the future, we are planning to assess whether discourse-level features generalize cross-linguistically. We would also like to investigate whether semantic role labeling can benefit from recognizing coreference and high-level discourse relations. Our code is publicly available under http://github.com/microth/mateplus.

Acknowledgements

We are grateful to Diana McCarthy and three anonymous referees whose feedback helped to substantially improve the present paper. The research presented in this paper was funded by a DFG Research Fellowship (RO 4848/1-1).


References

Apoorv Agarwal, Sriramkumar Balasubramanian, Anup Kotalwar, Jiehan Zheng, and Owen Rambow. 2014. Frame semantic tree kernels for social network extraction from text. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 211–219, Gothenburg, Sweden, 26–30 April 2014.

Anders Björkelund, Bernd Bohnet, Love Hafdell, and Pierre Nugues. 2010. A high-performance syntactic and semantic dependency parser. In Coling 2010: Demonstration Volume, pages 33–36, Beijing, China.

Wanxiang Che, Ting Liu, and Yongqiang Li. 2010. Improving semantic role labeling with word sense. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 246–249, Los Angeles, California, 1–6 June 2010.

Bob Coyne, Alex Klapheke, Masoud Rouhizadeh, Richard Sproat, and Daniel Bauer. 2012. Annotation tools and knowledge representation for a text-to-scene system. In Proceedings of the 24th International Conference on Computational Linguistics, pages 679–694, Mumbai, India, 8–15 December 2012.

Danilo Croce, Cristina Giannone, Paolo Annesi, and Roberto Basili. 2010. Towards open-domain semantic role labeling. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 237–246, Uppsala, Sweden, 11–16 July 2010.

Danilo Croce, Alessandro Moschitti, and Roberto Basili. 2011. Structured lexical similarity via convolution kernels on dependency trees. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1034–1046, Edinburgh, United Kingdom.

Dipanjan Das and Noah A. Smith. 2011. Semi-supervised frame-semantic parsing for unknown predicates. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, 19–24 June 2011.

Dipanjan Das, Desai Chen, André F. T. Martins, Nathan Schneider, and Noah A. Smith. 2014. Frame-semantic parsing. Computational Linguistics, 40(1):9–56.

Koen Deschacht and Marie-Francine Moens. 2009. Semi-supervised semantic role labeling using the Latent Words Language Model. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 21–29, Singapore, 2–7 August 2009.

Georgiana Dinu and Mirella Lapata. 2010. Measuring distributional similarity in context. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1162–1172, Cambridge, Massachusetts, 9–11 October 2010.

Charles J. Fillmore. 1976. Frame semantics and the nature of language. In Annals of the New York Academy of Sciences: Conference on the Origin and Development of Language and Speech, volume 280, pages 20–32.

William Foland and James Martin. 2015. Dependency-based semantic role labeling using convolutional neural networks. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, pages 279–288, Denver, Colorado.

Matthew Gerber and Joyce Chai. 2012. Semantic role labeling of implicit arguments for nominal predicates. Computational Linguistics, 38(4):755–798.

Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Philip Gorinski, Josef Ruppenhofer, and Caroline Sporleder. 2013. Towards weakly supervised resolution of null instantiations. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers, pages 119–130, Potsdam, Germany, 19–22 March 2013.

Karl Moritz Hermann, Dipanjan Das, Jason Weston, and Kuzman Ganchev. 2014. Semantic frame identification with distributed word representations. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1448–1458, Baltimore, Maryland, 23–25 June 2014.

Fei Huang and Alexander Yates. 2010. Open-domain semantic role labeling by modeling word spans. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 968–978, Uppsala, Sweden, 11–16 July 2010.

Richard Johansson and Pierre Nugues. 2008. The effect of syntactic representation on semantic role labeling. In Proceedings of the 22nd International Conference on Computational Linguistics, pages 393–400, Manchester, United Kingdom, 18–22 August 2008.

Egoitz Laparra and German Rigau. 2013. Sources of evidence for implicit argument resolution. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers, pages 155–166, Potsdam, Germany, 19–22 March 2013.

Heeyoung Lee, Angel Chang, Yves Peirsman, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2013. Deterministic coreference resolution based on entity-centric, precision-ranked rules. Computational Linguistics, 39(4):885–916.

Tao Lei, Yuan Zhang, Lluís Màrquez, Alessandro Moschitti, and Regina Barzilay. 2015. High-order low-rank tensors for semantic role labeling. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1150–1160, Denver, Colorado.


Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746–751, Atlanta, Georgia, 9–15 June 2013.

Alessandro Moschitti. 2004. A study on convolution kernels for shallow statistic parsing. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Main Volume, pages 335–342, Barcelona, Spain.

Sebastian Padó, Marco Pennacchiotti, and Caroline Sporleder. 2008. Semantic role assignment for event nominalisations by leveraging verbal data. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 665–672, Manchester, United Kingdom.

Marco Pennacchiotti, Diego De Cao, Roberto Basili, Danilo Croce, and Michael Roth. 2008. Automatic induction of FrameNet lexical units. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 457–465, Honolulu, Hawaii, USA, 25–27 October 2008.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, Doha, Qatar, 25–29 October 2014.

Ralph L. Rose. 2011. Joint information value of syntactic and semantic prominence for subsequent pronominal reference. Salience: Multidisciplinary Perspectives on Its Function in Discourse, 227:81–103.

Michael Roth and Kristian Woodsend. 2014. Composition of word representations improves semantic role labelling. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 407–413, Doha, Qatar, 25–29 October 2014.

Josef Ruppenhofer, Michael Ellsworth, Miriam R. L. Petruck, Christopher R. Johnson, and Jan Scheffczyk. 2010. FrameNet II: Extended Theory and Practice. Technical report, International Computer Science Institute, 14 September 2010.

Josef Ruppenhofer, Philip Gorinski, and Caroline Sporleder. 2011. In search of missing arguments: A linguistic approach. In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011, pages 331–338, Hissar, Bulgaria, 12–14 September 2011.

Magnus Sahlgren. 2008. The distributional hypothesis. Italian Journal of Linguistics, 20(1):33–54.

Dan Shen and Mirella Lapata. 2007. Using semantic roles to improve question answering. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 12–21, Prague, Czech Republic.

Carina Silberer and Anette Frank. 2012. Casting implicit role linking as an anaphora resolution task. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM 2012), pages 1–10, Montréal, Canada, 7–8 June.

Oscar Täckström, Kuzman Ganchev, and Dipanjan Das. 2015. Efficient inference and structured learning for semantic role labeling. Transactions of the Association for Computational Linguistics, 3:29–41.

Stefan Thater, Hagen Fürstenau, and Manfred Pinkal. 2010. Contextualizing semantic representations using syntactically enriched vector models. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 948–957, Uppsala, Sweden, 11–16 July 2010.

Kristina Toutanova, Aria Haghighi, and Christopher Manning. 2005. Joint learning improves semantic role labeling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 589–596, Ann Arbor, Michigan, 29–30 June 2005.

Boyi Xie, Rebecca J. Passonneau, Leon Wu, and Germán G. Creamer. 2013. Semantic frames to predict stock price movement. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 873–883, Sofia, Bulgaria, 4–9 August 2013.

Nianwen Xue and Martha Palmer. 2004. Calibrating features for semantic role labeling. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 88–94, Barcelona, Spain, July.

Alexander Yeh. 2000. More accurate tests for the statistical significance of result differences. In Proceedings of the 18th International Conference on Computational Linguistics, pages 947–953, Saarbrücken, Germany.

Beñat Zapirain, Eneko Agirre, Lluís Màrquez, and Mihai Surdeanu. 2013. Selectional preferences for semantic role classification. Computational Linguistics, 39(3):631–663.

