Evaluating FrameNet-style semantic parsing: the role of coverage gaps in FrameNet

Alexis Palmer and Caroline Sporleder
Computational Linguistics, Saarland University
{apalmer, csporled}@coli.uni-saarland.de

Abstract

Supervised semantic role labeling (SRL) systems are generally claimed to have accuracies in the range of 80% and higher (Erk and Padó, 2006). These numbers, though, are the result of highly-restricted evaluations, i.e., typically evaluating on hand-picked lemmas for which training data is available. In this paper we consider performance of such systems when we evaluate at the document level rather than on the lemma level. While it is well-known that coverage gaps exist in the resources available for training supervised SRL systems, what we have been lacking until now is an understanding of the precise nature of this coverage problem and its impact on the performance of SRL systems. We present a typology of five different types of coverage gaps in FrameNet. We then analyze the impact of the coverage gaps on performance of a supervised semantic role labeling system on full texts, showing an average oracle upper bound of 46.8%.

1 Introduction

A lot of progress has been made in semantic role labeling over the past years, but the performance of state-of-the-art systems is still relatively low, especially for deep, FrameNet-style semantic parsing. Furthermore, many of the reported performance figures are somewhat unrealistic because system performance is evaluated on hand-selected lemmas, usually under the implicit assumptions that (i) all relevant senses (frames) of each lemma are known, and (ii) there is a suitable amount of training data for each sense. This approach to evaluation arises from the limited coverage of the available hand-coded data against which to evaluate. More realistic evaluations test systems on full text, but these same coverage limitations mean that the assumptions made in more restricted evaluations do not necessarily hold for full text. This paper provides an analysis of the extent and nature of the coverage gaps in FrameNet. A more precise understanding of the limitations of existing resources with respect to robust semantic analysis of texts is an important foundational component both for improving existing systems and for developing future systems, and it is in this spirit that we make our analysis.

Full-text semantic analysis

Automated frame-semantic analysis aims to extract from text the key event-denoting predicates and the semantic argument structure for those predicates. The semantic argument structure of a predicate describing an event encodes relationships between the participants involved in the event, e.g. who did what to whom. Knowledge of semantic argument structure is essential for language understanding and thus important for applications such as information extraction (Moschitti et al., 2003; Surdeanu et al., 2003), question answering (Shen and Lapata, 2007), or recognizing textual entailment (Burchardt et al., 2009).

Evaluating an existing system for its ability to aid such tasks is unrealistic if the evaluation is lemma-based rather than text-based. Consequently, there continues to be significant interest in developing semantic role labeling (SRL) systems able to automatically compute the semantic argument structures in an input text.

Performance on the full text task, though, is typically much lower than for the more restricted evaluations.

The SemEval 2007 Task on "Frame Semantic Structure Extraction," for example, required systems to identify key predicates in texts, assign a semantic frame to the relevant predicates, identify the semantic arguments for the predicates, and finally label those arguments with their semantic roles. The systems participating in this task only obtained F-Scores between 55% and 78% for frame assignment, despite the fact that the task organizers adopted a lenient evaluation scheme which gave partial credit for near-misses (Baker et al., 2007). For the combined task of frame assignment and role labeling the performance was even lower, ranging from 35% to 54% F-Score.

Note that this distinction between evaluation schemes for SRL systems corresponds to the distinction between "lexical sample" and "all words" evaluations in word sense disambiguation, where results for the latter scheme are also typically lower (McCarthy, 2009).

The low performances are at least partly due to coverage problems. For example, Baker et al. (2007) annotated three new texts for their SemEval 2007 task. Although these new texts overlap in domain with existing FrameNet data, the task organizers had to create 40 new frames in order to complete annotation. The new frames were for word senses found in the test set but missing from FrameNet. The test set contained only 272 frames (types), meaning that nearly 15% of the frames therein were not yet defined in FrameNet. Obviously, coverage issues of this degree make full SRL a difficult task, but this is a realistic scenario that will be encountered in real applications as well.

As mentioned above, for many tasks it is necessary to compute the semantic argument structures for the whole text, or at least for multi-sentence passages. Due to non-local relations between argument structures this is also true for tasks like question answering, where it might be possible to automatically determine a subset of lemmas which are relevant for the task. For example, in (1) it might be possible to determine that the second sentence contains the answer to the question "Was Thomas Preston acquitted of theft?" However, to correctly answer this question, it is necessary to resolve the null instantiation of the CHARGES role of the VERDICT frame. This null instantiation links back to the previous sentence, and resolving it might require obtaining an analysis of the word tried.

(1) [Captain Thomas Preston]_Defendant(i) was tried_TRY_DEFENDANT(i) for [murder]_Charges(i,j). In the end [he]_Defendant(j) was acquitted_VERDICT(j) [Ø]_Charges(j).

Performance levels obtained for full text are usually not sufficient for this kind of real-world task. FrameNet-style semantic role labeling has been shown to, in principle, be beneficial for applications that need to generalise over individual lemmas, such as recognizing textual entailment or question answering. However, studies also found that state-of-the-art FrameNet-style SRL systems perform too poorly to provide any substantial benefit to real applications (Burchardt et al., 2009; Shen and Lapata, 2007).

Extending the value of automated semantic parsing for a variety of applications requires improving the ability of systems to process unrestricted text. Several methods have been proposed to address different aspects of the coverage problem, ranging from automatic data expansion and semi-supervised semantic role labelling (Fürstenau and Lapata, 2009b; Fürstenau and Lapata, 2009a; Deschacht and Moens, 2009; Gordon and Swanson, 2007; Padó et al., 2008) to systems which can infer missing word senses (Pennacchiotti et al., 2008b; Pennacchiotti et al., 2008a; Cao et al., 2008; Burchardt et al., 2005). However, so far there has not been a detailed analysis of the problem. In this paper we provide that detailed analysis, by defining different types of coverage problems and performing analysis of both coverage and performance of an automated SRL system on three different data sets.

Section 2 of the paper provides an introduction to FrameNet and introduces the basic terminology. Section 4 describes our approach to coverage evaluation, Section 3 discusses the texts analyzed, and the analysis itself appears in Section 5. Section 6 then looks at one possibility for addressing the coverage problem. The final section presents some discussion and conclusions.


Figure 1: Terminology: (a) Frame with core frame elements (FEs) and frame-evoking elements (FEEs) (b) Target with possible frame assignments and resultant lexical units (LUs)

2 FrameNet

Manual annotation of corpora with semantic argument structure information has enabled the development of statistical and supervised machine learning techniques for semantic role labeling (Toutanova et al., 2008; Moschitti et al., 2008; Gildea and Jurafsky, 2002).

The two main resources are PropBank (Palmer et al., 2005) and FrameNet (Ruppenhofer et al., 2006). PropBank aims to provide a semantic role annotation for every verb in the Penn Treebank (Marcus et al., 1994) and assigns roles on a verb-by-verb basis, without making higher-level generalizations. Whether two distinct usages of a given verb are viewed as different senses or not is thus driven by both syntax (namely, differences in syntactic argument structure) and semantics (via basic, easily-discernible differences in meaning).

FrameNet [1] is a lexicographic project whose aim it is to create a lexicon documenting the valence structures for different word senses and their possible mappings to underlying semantic argument structure (Ruppenhofer et al., 2006). In contrast to PropBank, FrameNet is primarily semantically driven; word senses (frames) [2] are defined mainly based on sometimes-subtle meaning differences and can thus generalise across individual lemmas, and often also across different parts-of-speech. Because FrameNet focusses on semantics it is not restricted to verbs but also provides semantic argument annotations for nouns, adjectives, adverbs, prepositions and even multi-word expressions. For example, the sentence in (2) and the NP in (3) have identical argument structures because the verb speak and the noun comment evoke the same frame STATEMENT.

(2) [The politician]_Speaker spoke_STATEMENT [about recent developments on the labour market]_Topic.

(3) [The politician's]_Speaker comments_STATEMENT [on recent developments on the labour market]_Topic

Since FrameNet annotations are semantically driven they are considerably more time-consuming to create than PropBank annotations. However, FrameNet also provides 'deeper' and more informative annotations than PropBank analyses (Ellsworth et al., 2004). For instance, the fact that (2) and (3) refer to the same state-of-affairs is not captured by PropBank sense distinctions.

[1] http://framenet.icsi.berkeley.edu/
[2] We follow Erk (2005) in treating frame assignment as a word sense disambiguation task. Thus in this paper we use the terms frame and sense interchangeably.

FrameNet Terminology

The English FrameNet data consist of an inventory of frames (i.e. word senses), a set of lexical entries, and a set of annotated examples exemplifying different syntactic realizations for selected frames (known as the lexicographic annotations). Frames are conceptual structures that describe types of situations or events together with their participants. Frame-evoking elements (FEEs) are predicate usages which evoke a particular frame.

A given lemma can evoke different frames in different contexts; each instance of the lemma is a separate target for semantic analysis. For example, (4) and (5) illustrate two different frames of the lemma speak.

(4) [The politician]_Speaker spoke_STATEMENT [about recent developments on the labour market]_Topic.

(5) [She]_Interlocutor_1 doesn't speak_CHATTING to [anyone]_Interlocutor_2.

In this paper we follow standard use of FrameNet terminology, with the possible exception of the term lexical unit. Figure 1 illustrates our use of FrameNet-related terminology, focussing on (a) the CAUSE_TO_MAKE_NOISE frame and (b) the target verb lemma ring.

The definition of a frame determines the available roles (frame elements or FEs) of the semantic argument structure for the particular use of the predicate, as well as the status—core or peripheral—of those roles. For example, the FE TOPIC is a core role under the STATEMENT frame, but a peripheral role under the CHATTING frame.

The lexical entry of a lemma in FrameNet specifies a list of frames which the lemma can evoke, and the pairing of a word with a particular frame is called a lexical unit (LU). Ideally there should be annotated examples for each lexical unit, exemplifying different syntactic constructions which can realize this LU. However, as we will see later (Section 5), annotated examples can be missing. Also, because FrameNet is a lexicographic project, the examples were extracted to illustrate particular usages, i.e., they are not meant to be statistically representative.
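The terminology above maps directly onto programmatic queries against the FrameNet data. As a rough illustration (ours, not part of the original study), the sketch below uses NLTK's FrameNet corpus reader to list the frames evocable by the lemma speak, its lexical units, and the frame elements of STATEMENT; note that NLTK distributes FrameNet 1.7 rather than the FN1.3 release analyzed here, so the exact inventories and counts differ.

```python
# Sketch: querying frames, lexical units (LUs), and frame elements (FEs)
# via NLTK's FrameNet reader. Assumes nltk is installed; the data shipped
# is FrameNet 1.7, not FN1.3, so results are only indicative.
import nltk
nltk.download("framenet_v17", quiet=True)
from nltk.corpus import framenet as fn

# Frames evocable by the lemma "speak": the candidate senses when frame
# assignment is treated as word sense disambiguation.
for frame in fn.frames_by_lemma(r"(?i)^speak\."):
    print(frame.name)                      # e.g. Statement, Chatting

# Lexical units: pairings of a word with a particular frame, plus the
# number of annotated (lexicographic) example sentences for each.
for lu in fn.lus(r"(?i)^speak\.v$"):
    print(lu.name, "->", lu.frame.name, "|", len(lu.exemplars), "examples")

# Frame elements of the Statement frame, with core/peripheral status.
for fe_name, fe in fn.frame("Statement").FE.items():
    print(fe_name, fe.coreType)            # Topic is Core here
```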
3 Data

Having introduced the basic FrameNet terminology, we now describe in more detail the data sets used in the analysis. FrameNet Release 1.3 (FN1.3), the latest release from the Berkeley FrameNet project, includes both a corpus of lexicographic annotations (FNL), which we referred to in Section 2, and a corpus of texts fully annotated with frames and semantic role labels (FNF). Annotations in the two corpora of course cover different sets of predicates and frames, and FNL is the corpus commonly used as the basis for training supervised FrameNet-based SRL systems (Erk and Padó, 2006).

In our analysis, we look at three data sets: the lexicographic annotations from FN1.3, the full text annotations from FN1.3, and a new data set of running text that was annotated for the SemEval 2010 Task-10 (see Table 1 for details).

FrameNet Lexicographic (FNL)
FrameNet started as a lexicographic project, aiming to draw up an inventory of frames and lexical units, supported by corpus evidence, to document the range of syntactic and semantic usages of each lexical unit. The annotated example sentences in this part of FN1.3 are taken from the British National Corpus (BNC). BNC is a balanced corpus, hence FNL covers, in principle, a variety of domains.

For each LU, a subset of the sentences in which it occurs was selected for annotation, and in each extracted sentence, only the target LU was annotated. The sentences were not chosen randomly but with a set of lexicographic constraints in mind. In particular, the sentences should exemplify different usages; ideally, selected sentences would be easy to understand and not too long or complex. As a consequence of this linguistically-driven selection procedure, the annotated sentences are not statistically representative in any way. FNL provides annotations for just under 140,000 FEEs (tokens). On average, around 20 sentences are annotated for each LU. FrameNet's frame inventory contains 722 frames. [3]

[3] Only lexical frames are included in this number. In addition to those, FrameNet 1.3 defines another 74 frames which cannot be lexicalised but are included because they provide useful generalisations in the frame hierarchy.

FrameNet Full Texts (FNF)
Starting with release 1.3, FrameNet also provides annotations of running texts. In this annotation mode, all LUs in a sentence and all sentences in a text are annotated. FN1.3 contains two subsets of full text annotations. The first of these (PB) contains five texts which were also annotated by the PropBank project.

While all texts come from the Wall Street Journal, they are not prototypical examples of the financial domain; rather, they are longer essays covering a wide variety of general interest topics (ranging from 'Bell Ringing' to 'Earthquakes'). The second subset (NTI) contains 12 texts from the Nuclear Threat Initiative website. [4] These texts are intelligence reports which summarize and discuss the status of various countries with regard to the development of weapons and missile systems. Statistics for both data sets are given in Table 1.

[4] http://www.nti.org

SemEval 2010 Task-10 Full Texts (SE)
While the FrameNet full texts allow us to estimate coverage gaps that arise from limited training data, they do not allow us to gauge coverage problems arising from missing frames in the FN1.3 inventory. The reason for this is that the frame inventory reflects the annotations of both the lexicographic and the full text part of FN1.3, i.e., every frame annotated in one of these subsets will also be part of the inventory. To estimate the frame coverage problem on completely new texts, we therefore included a third (full text) data set that was annotated for the SemEval 2010 Task 10 on "Linking Events and Their Participants in Discourse" (Ruppenhofer et al., 2009). [5] The text is taken from Arthur Conan Doyle's "The Adventure of Wisteria Lodge". It thus comes from the fiction domain.

[5] The data set is available from http://semeval2.fbk.eu/semeval2.php?location=data.

The text was manually annotated with frame-semantic argument structure by two experienced annotators. Similar to the FNF texts, the annotators aimed to annotate all LUs in the text. To do so, some new frames had to be created for previously un-encountered LUs. These new frames are not part of FN1.3 and we can thus use them to estimate coverage problems arising from missing frames. Details for the data set can be found in Table 1. This data set is very similar to the PB set in terms of size, FEE type-token ratio and number of frames (types).

Data   Genre / Domain              FEE Tokens   FEE Types   Frame Types
FNL    mixed                       139,439      8370        722
PB     essays, general interest    1580         680         319
NTI    reports, foreign affairs    8271         1305        434
SE     fiction, crime              1530         680         320

Table 1: Statistics for the three data sets

4 Types of Coverage Gaps

Semantic role labelling systems have to perform two sub-tasks: (i) identifying the correct frame for a given lemma and context, and (ii) identifying and labeling the frame elements. The most severe coverage problems typically arise with the first subtask. Furthermore, coverage problems related to frame identification have a knock-on effect on role identification and labeling because the choice of the correct frame determines which roles are available. Therefore, we focus on the frame identification task in this paper.

Attempts to do automated frame assignment on unrestricted text invariably encounter problems associated with limited coverage of frame-evoking elements in FrameNet. However, not every coverage gap is the same, and the precise nature of a coverage gap influences potential strategies for addressing it. In this section we describe the different types of coverage gaps. We proceed from less problematic coverage gaps to more problematic ones, in the sense that the former can be addressed more straightforwardly by automated systems than can the latter.

4.1 NOTR gaps

Some coverage gaps occur when lexical units (LUs) defined in FrameNet lack corresponding annotated examples; these gaps are the result of lacking training data, hence we call them NOTR gaps. To give a sense of the abundance of such gaps, of the 10,191 LUs defined in FN1.3, annotated examples are available for only 6727.

NOTR-LU: lexical unit with no training data. In many cases, an LU — a specific pairing of a target lemma with one frame — may be defined in FrameNet, thus potentially accessible to an automated system, but lacking labeled training material. For example, FrameNet defines two LUs for the noun ringer: with the frames CAUSE_TO_MAKE_NOISE and SIMILARITY. It is clear that the occurrence of ringer in (6) belongs to the former LU, even given a very limited context. The lexicographic annotations, though, provide training material only for the SIMILARITY frame.

(6) Then, at a signal, the ringers begin varying the order in which the bells sound without altering the steady rhythm of the striking.

NOTR-LU gaps pose particular problems to a fully-supervised SRL system, because such a system cannot learn anything about the context in which the CAUSE_TO_MAKE_NOISE frame is more appropriate. A NOTR-LU gap is identified for an LU even if training data is available for other senses (i.e. other LUs) of the target lemma.

NOTR-TGT: target with no training data. In other cases, a target lemma may be defined as participating in one or more LUs, but with no training data available for any of them. In other words, a supervised automated system trained only on the available annotated examples will fail to learn any potential frame assignments for the target lemma. Such is the case for art, which in FrameNet is assigned the single frame CRAFT, but for which FNL contains no training data.

(7) The art of change-ringing is peculiar to the English, and, like most English peculiarities, unintelligible to the rest of the world.

Whereas a NOTR-LU gap obscures a particular frame assignment for a target lemma, a NOTR-TGT gap indicates a complete absence in the lexicographic corpus of annotated data for the lemma.

4.2 UNDEF gaps

The previous coverage problems arise from a lack of annotated data, an issue which conceivably could be addressed through further annotation. More serious problems arise when a text contains word senses, words, or frames not contained in FrameNet. We call such elements 'undefined'; specifically, they receive no treatment in FN1.3.

UNDEF-LU: lexical unit not defined. Coverage gaps of this sort occur when the frame inventory for a given lemma is not complete. In other words, at least one LU for the lemma exists in FrameNet, but one or more other LUs are missing. For example, the noun installation occurs in FrameNet with the frames LOCALE_BY_USE and INSTALLING. The sense of an art installation, which is an instance of the frame PHYSICAL_ARTWORKS, is missing.

UNDEF-TGT: target not addressed. In the worst case, all LUs for a target lemma might be missing, i.e., the lemma does not occur in the FrameNet lexicon at all. The noun fabric is an example. Though it has at least two distinct senses—that of cloth or material and that of a framework (e.g. the fabric of society)—FrameNet provides no help for determining appropriate frames for instances of this lemma.

UNDEF-FR: frame not defined. Finally, it may be not only that the LU is missing, but that there is no definition in FrameNet for the correct frame given the context. For example, in the sports domain the lemma ringer can have the sense of a horseshoe thrown so that it encircles the peg; to our knowledge, this sense is not available in FrameNet.

5 Coverage gaps and automated processing

With the exception of work on extending coverage, most FrameNet-style semantic role labeling studies draw both training and evaluation data from FNL. This is an unrealistic evaluation scenario for full-text semantic analysis, as such evaluation limits the domain for which prediction can occur to those lexical entries treated in FNL. For systems which do not attempt any generalization beyond those lexical entries with training data, this limits the system to 5864 lemmas for which it can make predictions regarding frame assignment and role labeling.

Disregarding whether annotations have yet been provided for the lexical units in FNL still limits us to 8370 frame-evoking elements (targets). To better understand the potential of current frame-semantic resources for semantic analysis of unrestricted text, we evaluate coverage of the FNL annotations against the texts in FNF, as well as against the SemEval text. We then analyze the performance of an off-the-shelf, supervised SRL system, Shalmaneser (Erk and Padó, 2006), on the same texts, with a focus on the types of errors made and the upper bound on performance for this system.
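To make the typology operational, the sketch below (ours, not part of the original study) shows one way each gold-standard target could be assigned to one of the five categories, given a lookup of the frames FN1.3 defines per lemma and the frames for which FNL has annotated examples. The lemmas, frame sets, and decision order are toy stand-ins reflecting our reading of Section 4; percentages computed this way correspond to the columns of Table 2.

```python
# Sketch: assigning each (lemma, gold frame) target to one of the five
# coverage categories. The dictionaries are hypothetical stand-ins for the
# FN1.3 lexicon and the FNL training annotations.
from collections import Counter

defined_lus = {   # lemma -> frames defined in FN1.3
    "ringer.n": {"Cause_to_make_noise", "Similarity"},
    "art.n": {"Craft"},
    "installation.n": {"Locale_by_use", "Installing"},
}
trained_lus = {   # lemma -> frames with annotated FNL examples
    "ringer.n": {"Similarity"},
}
frame_inventory = set().union(*defined_lus.values()) | {"Physical_artworks"}

def coverage_category(lemma, gold_frame):
    defined = defined_lus.get(lemma, set())
    trained = trained_lus.get(lemma, set())
    if gold_frame in trained:
        return "TR-LU"                      # correct frame seen with training data
    if gold_frame in defined:
        # LU defined; the gap type depends on whether any sense of the lemma
        # has training data at all
        return "NOTR-LU" if trained else "NOTR-TGT"
    if gold_frame not in frame_inventory:
        return "UNDEF-FR"                   # the frame itself is not defined
    return "UNDEF-LU" if defined else "UNDEF-TGT"

targets = [
    ("ringer.n", "Similarity"),              # TR-LU
    ("ringer.n", "Cause_to_make_noise"),     # NOTR-LU
    ("art.n", "Craft"),                      # NOTR-TGT
    ("installation.n", "Physical_artworks"), # UNDEF-LU
    ("fabric.n", "Craft"),                   # UNDEF-TGT (lemma not in lexicon)
    ("ringer.n", "Horseshoe_toss"),          # UNDEF-FR (made-up frame name)
]
counts = Counter(coverage_category(lemma, frame) for lemma, frame in targets)
for category, n in counts.most_common():
    print(f"{category:10s} {100.0 * n / len(targets):5.1f}%")
```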

Dataset   TR-LU   NOTR-LU   NOTR-TGT   UNDEF-LU   UNDEF-FR
PB        42.66   9.56      47.78      –          –
NTI       46.77   7.77      45.46      –          –
SE        51.64   6.86      26.01      3.40       12.09

Table 2: FrameNet coverage for analyzed texts

Dataset   Correct   Type (i)   Type (ii)   Type (iii)
PB        36.71     5.95       9.56        47.78
NTI       41.22     5.55       7.77        45.46
SE        46.67     4.97       6.86        41.50

Table 3: Shalmaneser performance on texts

5.1 FrameNet coverage

As described in Section 4, in many cases a lexical unit, a frame-evoking element, or a frame may simply not be represented in FrameNet. In other cases, the entity may be in FN1.3 but lacking training data. Of the 722 frames defined in FN1.3, for example, annotations exist for 502.

For the three data sets analyzed, Table 2 shows the degree of coverage provided by FNL for the gold-standard frame annotations. First, the TR-LU column shows the non-problematic cases, for which the correct frame annotation is available in FrameNet, with training data. The next two columns represent gaps related to lack of training data: NOTR-LU are cases for which training data exists for the target, but not for the correct sense of the target, and NOTR-TGT instances are those for which no training data at all exists for the target.

Because all targets annotated in the FNF texts (i.e. PB and NTI above) are incorporated in FN1.3, gaps due to missing LUs, targets, or frames do not exist for those texts. The same does not hold for the SemEval (SE) text. For 3.4% of the annotated SemEval targets, an LU is entirely missing from the lemma's frame inventory in FrameNet, and in just over 12% of cases both the lemma and the frame are missing. In total, more than 15% of LUs appearing in the gold-standard SemEval annotations are not defined at all within FrameNet. This figure accords with that found by Baker et al. (2007).

5.2 Error analysis of full-text frame assignment

Here we examine the errors made by Shalmaneser for frame assignment on the three data sets. The upper bound on apparent performance is fixed by the number of targets for which Shalmaneser has seen training data, namely the sum of TR-LU and NOTR-LU in Table 2. [6]

[6] By 'apparent performance' we mean the system's own evaluation of its accuracy on frame assignment.

We consider three categories of errors: (i) normal or true errors are misclassifications when the correct label has been seen in the training data. In this category we also count errors resulting from incorrect lemmatization. (ii) label-not-seen errors are misclassifications when the correct label does not appear in the training data and thus is unavailable to the classifier. Finally, (iii) no-chance errors occur when the system has no information for either a given target or a given frame. Table 3 shows the prevalence of each error type for each data set, given as the percentage of all frame-assignment targets.

It can be seen that the frame assignment accuracy is relatively low for all three texts (between 37% and 47%). However, only a relatively small proportion of the misclassifications are due to true errors made by the system. Furthermore, a large proportion of errors (41% to 48%, with an average of 46.8%) is due to cases where important information is missing from FrameNet (Type (iii) errors). Consequently, improving the semantic role labeller by optimising the feature space or the machine learning framework is going to have very little effect. A much more promising path would be to investigate methods which might enable the SRL system to deal gracefully with unseen data. One possible strategy is discussed in the next section.
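Read against Table 2, the three error categories follow mechanically from what the classifier has seen in training. The small sketch below (ours, with a hypothetical stub for the per-lemma training inventory and without the frame-level check) illustrates that bookkeeping:

```python
# Sketch: sorting frame-assignment outcomes into the error types of Section
# 5.2. `trained_frames` is a hypothetical stub standing in for the frames
# the classifier has seen in training for a given lemma; the paper's Type
# (iii) additionally covers gold frames the system has never seen at all.
def trained_frames(lemma):
    stub = {"ringer.n": {"Similarity"},
            "bank.n": {"Businesses", "Relational_natural_features"}}
    return stub.get(lemma, set())

def outcome(lemma, gold_frame, predicted_frame):
    seen = trained_frames(lemma)
    if predicted_frame == gold_frame:
        return "correct"
    if not seen:
        return "Type (iii)"   # no-chance: no information about this target
    if gold_frame not in seen:
        return "Type (ii)"    # label-not-seen: correct frame never in training
    return "Type (i)"         # true error: correct frame was available

print(outcome("ringer.n", "Cause_to_make_noise", "Similarity"))           # Type (ii)
print(outcome("art.n", "Craft", "Similarity"))                            # Type (iii)
print(outcome("bank.n", "Businesses", "Relational_natural_features"))     # Type (i)
```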

6 Frame and lemma overlap

One potential strategy for improving full-text semantic analysis without performing additional annotation is to take advantage of semantic overlap as it is represented in FrameNet. We can look at two different types of overlap in FrameNet: lemma overlap and frame overlap.

6.1 Lemma overlap

The approach of treating frame assignment as a word sense disambiguation task (as, e.g., by Shalmaneser) relies on the overlap of LUs with the same lemma and trains lemma-based classifiers on all training instances for all LUs involving that lemma. One way to use labeled material in FrameNet to improve performance on targets for which we have no labeled material is to generalize over lemmas associated with the same frame. The idea is to use training instances from related lemmas to build a larger training set for lemmas with little or no annotated data.

Of the 8370 lemmas in FN, 8358 share a single frame with at least one other lemma, 890 overlap on two frames with at least one other lemma, and 111 have 3-frame overlap with at least one other lemma. Only 16 lemmas show an overlap of four or more frames. These groupings are:

1. clang.v, clatter.v, click.v, thump.v
2. hit.v, smack.v, swing.v, turn.v
3. drop.v, rise.v
4. remember.v, forget.v
5. examine.v, examination.n
6. withdraw.v, withdrawal.n

The first two groupings are sets of words that are closely semantically related, the second two are opposite pairs, and the third two are verb-nominalization pairs. The lemma overlap groups differ with respect to how much training data they make accessible.
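The overlap counts above can be derived directly from the lexical unit inventory. The following sketch (ours, with a toy inventory standing in for the full FN1.3 lexicon) computes, for each lemma, the largest number of frames it shares with any other lemma, which is the statistic behind the 8358 / 890 / 111 / 16 breakdown:

```python
# Sketch: measuring lemma overlap as in Section 6.1. `lu_inventory` maps
# each lemma to the set of frames it can evoke; the entries below are toy
# stand-ins for the full FN1.3 lexical unit inventory.
from itertools import combinations

lu_inventory = {
    "clang.v":   {"Cause_to_make_noise", "Make_noise", "Impact", "Motion_noise"},
    "clatter.v": {"Cause_to_make_noise", "Make_noise", "Impact", "Motion_noise"},
    "drop.v":    {"Motion_directional", "Change_position_on_a_scale"},
    "rise.v":    {"Motion_directional", "Change_position_on_a_scale"},
    "speak.v":   {"Statement", "Chatting"},
    "comment.n": {"Statement"},
}

def max_shared_frames(inventory):
    """For each lemma, the largest number of frames shared with another lemma."""
    best = {lemma: 0 for lemma in inventory}
    for a, b in combinations(inventory, 2):
        shared = len(inventory[a] & inventory[b])
        best[a] = max(best[a], shared)
        best[b] = max(best[b], shared)
    return best

best = max_shared_frames(lu_inventory)
for k in (1, 2, 3, 4):
    n = sum(1 for v in best.values() if v >= k)
    print(f"lemmas sharing >= {k} frame(s) with another lemma: {n}")
```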

6.2 Frame overlap

Another possibility to be considered is generalization over all instances of a given frame. For the 502 frames with annotated examples, the number of annotated instances ranges from one (SAFE_SITUATION, BOARD_VEHICLE, and ACTIVITY_START) to 6233 (SELF_MOTION), with an average of 278 training instances per frame.

In future work we will examine the effectiveness of binary frame-based classifiers, abstracting away from individual predicates to predict whether a given lemma belongs to the frame in question (for a related study see Johansson and Nugues (2007)). A potential drawback to this approach is the loss of predicate-specific information. We know, for example, about verbs that they tend to have typical argument structures and typical syntactic realizations of those argument structures.

In addition to this frame-overlap approach, we will consider the impact on coverage of using coarser-grained versions of FrameNet in which frames have been merged according to frame relations defined over the FrameNet hierarchy, using the FrameNet Transformer tool described in Ruppenhofer et al. (2010).

7 Conclusions

Although it is clear that the capability to do shallow semantic analysis on unrestricted text, and on complete documents or text passages, would help performance on a number of key tasks, currently-available resources seriously limit our potential for achieving this with supervised systems. The analysis in this paper aims for a better understanding of the precise nature of these limitations in order to address them more deliberately and with a principled understanding of the coverage problems faced by current systems.

To this end, we outline a typology of coverage gaps and analyze both coverage of FrameNet and performance of a supervised semantic role labeling system on three different full-text data sets, totaling over 150,000 frame-assignment targets. We find that, on average, 46.8% of targets are not covered under straight supervised-classification approaches to frame assignment.

Acknowledgments

This research has been funded by the German Research Foundation DFG under the MMCI Cluster of Excellence. Thanks to the anonymous reviewers, Josef Ruppenhofer, Ines Rehbein, and Hagen Fürstenau for interesting and helpful comments and discussions, and to Collin Baker for assistance with data.

References

C. Baker, M. Ellsworth, K. Erk. 2007. SemEval-2007 Task 19: Frame semantic structure extraction. In Proceedings of SemEval-2007.

A. Burchardt, K. Erk, A. Frank. 2005. A WordNet Detour to FrameNet. In Proceedings of the GLDV-05 Workshop GermaNet II.

A. Burchardt, M. Pennacchiotti, S. Thater, M. Pinkal. 2009. Assessing the impact of frame semantics on textual entailment. Journal of Natural Language Engineering, Special Issue on Textual Entailment, 15(4):527–550.

D. D. Cao, D. Croce, M. Pennacchiotti, R. Basili. 2008. Combining word sense and usage for modeling frame semantics. In Proceedings of STEP-08.

K. Deschacht, M.-F. Moens. 2009. Semi-supervised Semantic Role Labeling Using the Latent Words Language Model. In Proceedings of EMNLP-09.

M. Ellsworth, K. Erk, P. Kingsbury, S. Padó. 2004. PropBank, SALSA, and FrameNet: How Design Determines Product. In Proceedings of the LREC 2004 Workshop on Building Lexical Resources from Semantically Annotated Corpora.

K. Erk, S. Padó. 2006. Shalmaneser – a toolchain for shallow semantic parsing. In Proceedings of LREC-06.

K. Erk. 2005. Frame assignment as word sense disambiguation. In Proceedings of IWCS 6.

H. Fürstenau, M. Lapata. 2009a. Graph alignment for semi-supervised semantic role labeling. In Proceedings of EMNLP 2009.

H. Fürstenau, M. Lapata. 2009b. Semi-supervised semantic role labeling. In Proceedings of EACL 2009.

D. Gildea, D. Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

A. Gordon, R. Swanson. 2007. Generalizing semantic role annotations across syntactically similar verbs. In Proceedings of ACL 2007.

R. Johansson, P. Nugues. 2007. Using WordNet to extend FrameNet coverage. In Proceedings of the Workshop on Building Frame-semantic Resources for Scandinavian and Baltic Languages, NODALIDA.

M. Marcus, G. Kim, M. A. Marcinkiewicz, R. MacIntyre, A. Bies, M. Ferguson, K. Katz, B. Schasberger. 1994. The Penn Treebank: Annotating predicate argument structure. In ARPA Human Language Technology Workshop.

D. McCarthy. 2009. Word Sense Disambiguation: An Overview. Language and Linguistics Compass, 3(2):537–558.

A. Moschitti, P. Morarescu, S. Harabagiu. 2003. Open-domain information extraction via automatic semantic labeling. In Proceedings of FLAIRS.

A. Moschitti, D. Pighin, R. Basili. 2008. Tree Kernels for Semantic Role Labeling. Computational Linguistics, 34(2).

S. Padó, M. Pennacchiotti, C. Sporleder. 2008. Semantic role assignment for event nominalisations by leveraging verbal data. In Proceedings of Coling 2008.

M. Palmer, D. Gildea, P. Kingsbury. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1):71–105.

M. Pennacchiotti, D. D. Cao, R. Basili, D. Croce, M. Roth. 2008a. Automatic induction of FrameNet lexical units. In Proceedings of EMNLP-08.

M. Pennacchiotti, D. D. Cao, P. Marocco, R. Basili. 2008b. Towards a Vector Space Model for FrameNet-like Resources. In Proceedings of LREC-08.

J. Ruppenhofer, M. Ellsworth, M. R. L. Petruck, C. R. Johnson, J. Scheffczyk. 2006. FrameNet II: Extended Theory and Practice.

J. Ruppenhofer, C. Sporleder, R. Morante, C. Baker, M. Palmer. 2009. SemEval-2010 Task 10: Linking Events and Their Participants in Discourse. In Proceedings of SEW-2009.

J. Ruppenhofer, M. Pinkal, J. Sunde. 2010. Generating FrameNets of various granularities. In Proceedings of LREC 2010.

D. Shen, M. Lapata. 2007. Using semantic roles to improve question answering. In Proceedings of EMNLP-2007.

M. Surdeanu, S. Harabagiu, J. Williams, P. Aarseth. 2003. Using predicate-argument structures for information extraction. In Proceedings of ACL 2003.

K. Toutanova, A. Haghighi, C. D. Manning. 2008. A Global Joint Model for Semantic Role Labeling. Computational Linguistics, 34(2).
