Out-of-domain FrameNet Semantic Role Labeling
Silvana Hartmann§†, Ilia Kuznetsov†, Teresa Martin§†, Iryna Gurevych§†
§ Research Training Group AIPHES
† Ubiquitous Knowledge Processing (UKP) Lab
Department of Computer Science, Technische Universität Darmstadt
http://www.ukp.tu-darmstadt.de

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 471–482, Valencia, Spain, April 3-7, 2017. © 2017 Association for Computational Linguistics

Abstract

Domain dependence of NLP systems is one of the major obstacles to their application in large-scale text analysis, and it also restricts the applicability of FrameNet semantic role labeling (SRL) systems. Yet, current FrameNet SRL systems are still only evaluated on a single in-domain test set. For the first time, we study the domain dependence of FrameNet SRL on a wide range of benchmark sets. We create a novel test set for FrameNet SRL based on user-generated web text and find that the major bottleneck for out-of-domain FrameNet SRL is the frame identification step. To address this problem, we develop a simple, yet efficient system based on distributed word representations. Our system closely approaches the state-of-the-art in-domain while outperforming the best available frame identification system out-of-domain. We publish our system and test data for research purposes.¹

¹ www.ukp.tu-darmstadt.de/ood-fn-srl

1 Introduction

Domain dependence is a major problem for supervised NLP tasks such as FrameNet semantic role labeling (SRL): systems generally exhibit a strong performance drop when applied to test data from a different distribution than the training data. This prohibits their large-scale use in language technology applications.

The same problems are expected for FrameNet SRL, but due to a lack of datasets, state-of-the-art FrameNet SRL is only evaluated on a single in-domain test set, see e.g. Das et al. (2014) and FitzGerald et al. (2015).

In this work, we present the first comprehensive study of the domain dependence of FrameNet SRL on a range of benchmark datasets. This is crucial as the demand for semantic textual analysis of large-scale web data keeps growing.

Based on FrameNet (Fillmore et al., 2003), FrameNet SRL extracts frame-semantic structures on the sentence level that describe a specific situation centered around a semantic predicate, often a verb, and its participants, typically syntactic arguments or adjuncts of the predicate. The predicate is assigned a frame label, essentially a word sense label, that defines the situation and determines the semantic roles of the participants. The following sentence from FrameNet provides an example of the Grinding frame and its roles:

[The mill]_Grinding_cause grinds_Grinding [the malt]_Patient [to grist]_Result.

FrameNet SRL consists of two steps: frame identification (frameId), which assigns a frame to the current predicate, and role labeling (roleId), which identifies the participants and assigns them role labels licensed by the frame. The frameId step reduces the hundreds of role labels in FrameNet to a manageable set of up to 30 roles. Thus, FrameNet SRL differs from PropBank SRL (Carreras and Màrquez, 2005), which only uses a small set of 26 syntactically motivated role labels and puts less weight on the predicate sense. The advantage of FrameNet SRL is that it results in a more fine-grained and rich interpretation of the input sentences, which is crucial for many applications, e.g. reasoning in online debates (Berant et al., 2014).

Domain dependence is a well-studied topic for PropBank SRL. However, to the best of our knowledge, there exists no analysis of the performance of modern FrameNet SRL systems when applied to data from new domains.

In this work, we address this problem as follows: we introduce a new benchmark dataset YAGS (Yahoo! Answers Gold Standard), which is based on user-generated questions and answers and exemplifies an out-of-domain application use case. We use YAGS, along with other out-of-domain test sets, to perform a detailed analysis of the domain dependence of FrameNet SRL using Semafor (Das et al., 2014; Kshirsagar et al., 2015) to identify which of the stages of FrameNet SRL, frameId or roleId, is particularly sensitive to domain shifts. Our results confirm that the major bottleneck in FrameNet SRL is the frame identification step. Motivated by that, we develop a simple, yet efficient frame identification method based on distributed word representations that promise better domain generalization. Our system's performance matches the state-of-the-art in-domain (Hermann et al., 2014), despite using a simpler model, and improves on the out-of-domain performance of Semafor.

The contributions of the present work are two-fold: 1) we perform the first comprehensive study of the domain generalization capabilities of open-source FrameNet SRL, and 2) we propose a new frame identification method based on distributed word representations that enhances out-of-domain performance of frame identification. To enable our study, we created YAGS, a new, substantially sized benchmark dataset for the out-of-domain testing of FrameNet SRL; we publish the annotations for the YAGS benchmark set and our frame identification system for research purposes.

2 Related work

The domain dependence of FrameNet SRL systems has only been studied sparsely; however, there exists a large body of work on out-of-domain PropBank SRL, as well as on general domain adaptation methods for NLP. This section briefly introduces some of the relevant approaches in these areas, and then summarizes the state-of-the-art in FrameNet frame identification.

Domain adaptation in NLP
Low out-of-domain performance is a problem common to many supervised machine learning tasks. The goal of domain adaptation is to improve model performance on test data originating from a different distribution than the training data (Søgaard, 2013). For NLP, domain adaptation has been studied for various tasks such as POS-tagging and syntactic parsing (Daumé III, 2007; Blitzer et al., 2006). For the complex task of SRL, it is strongly associated with PropBank, because the corresponding CoNLL shared tasks promote out-of-domain evaluation (Surdeanu et al., 2008; Hajič et al., 2009). In the shared tasks, in-domain newspaper text from the WSJ Corpus is contrasted with out-of-domain data from fiction texts in the Brown Corpus. Most of the participants in the shared tasks do not consider domain adaptation and report systematically lower scores for the out-of-domain data (Hajič et al., 2009).

Representation learning has been successfully used to improve on the CoNLL shared task results (Huang and Yates, 2010; FitzGerald et al., 2015; Yang et al., 2015). Yang et al. (2015) report the smallest performance difference (5.5 points in F1) between in-domain and out-of-domain test data, leading to the best results to date on the CoNLL 2009 out-of-domain test. Their system learns common representations for in-domain and out-of-domain data based on deep belief networks.

Domain dependence of FrameNet SRL
The FrameNet 1.5 fulltext corpus, used as a standard dataset for training and evaluating FrameNet SRL systems, contains texts from several domains (Ruppenhofer et al., 2010). However, the standard data split used to evaluate modern systems (Das and Smith, 2011) ensures the presence of all domains in the training as well as the test data and cannot be used to assess the systems' ability to generalize. Moreover, all the texts in the FrameNet fulltext corpus, based on newspaper and literary texts, are post-edited and linguistically well-formed. The FrameNet test setup thus cannot provide information on SRL performance on less edited out-of-domain data, e.g. user-generated web data.

There are few studies related to the out-of-domain generalization of FrameNet SRL. Johansson and Nugues (2008) evaluate the impact of different parsers on FrameNet SRL using the Nuclear Threats Initiative (NTI) data as an out-of-domain test set. They observe low domain generalization abilities of their supervised system, but find that using dependency parsers instead of constituency parsers is beneficial in the out-of-domain scenario. Croce et al. (2010) use a similar in-domain/out-of-domain split to evaluate their approach to open-domain FrameNet SRL. They integrate a distributional model into their SRL system to generalize lexicalized features to previously unseen arguments and thus create an SRL system with a smaller performance gap between in-domain and out-of-domain test data (only 4.5 percentage points F1). Note that they only evaluate the role labeling step.
It is unclear how their results would transfer to current state-of-the-art SRL systems that already integrate methods to improve generalization, for instance distributed representations.

Palmer and Sporleder (2010) analyze the FrameNet 1.3 training data coverage and the performance of the Shalmaneser SRL system (Erk and Padó, 2006) for frame identification on several test sets across domains, i.e. the PropBank and NTI parts of the FrameNet fulltext corpus and

The frame identification system of Semafor relies on an elaborate feature set based on syntactic and lexical features, using the WordNet hierarchy as a source of lexical information, and a label propagation-based approach to take unknown predicates into account. Semafor is not specifically designed for out-of-domain use: the WordNet coverage is limited, and the quality of syntactic parsing might drop when the system is applied to out-of-domain data, especially in the case of non-standard user-generated texts.
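
To make the notion of lexical coverage and unknown predicates more concrete, the following minimal Python sketch computes the share of test predicates that a frame lexicon can handle at all. It is an illustration under stated assumptions, not the method of Semafor or of any system cited here: the lexicon entries, the predicate lists, and the function name predicate_coverage are hypothetical toy data chosen for this example, and the resulting numbers are not results from the paper.

```python
from collections import Counter

# Toy frame lexicon (hypothetical): maps a predicate, written as
# "lemma.POS", to the frames it may evoke. A real lexicon would be
# derived from FrameNet; these three entries are for illustration only.
FRAME_LEXICON = {
    "grind.v": ["Grinding"],
    "buy.v": ["Commerce_buy"],
    "sell.v": ["Commerce_sell"],
}


def predicate_coverage(predicates, lexicon):
    """Return (token-level coverage, Counter of predicates not in the lexicon)."""
    if not predicates:
        return 0.0, Counter()
    unseen = Counter(p for p in predicates if p not in lexicon)
    covered = len(predicates) - sum(unseen.values())
    return covered / len(predicates), unseen


# Hypothetical predicate occurrences from an in-domain and an
# out-of-domain test set (toy data, not taken from any corpus).
in_domain = ["buy.v", "sell.v", "grind.v", "buy.v"]
out_of_domain = ["buy.v", "google.v", "unfriend.v", "google.v"]

in_cov, in_unseen = predicate_coverage(in_domain, FRAME_LEXICON)
ood_cov, ood_unseen = predicate_coverage(out_of_domain, FRAME_LEXICON)

print(f"in-domain coverage:     {in_cov:.2f}")
print(f"out-of-domain coverage: {ood_cov:.2f}")
print("unseen out-of-domain predicates:", dict(ood_unseen))
```

A predicate that is missing from the lexicon cannot be assigned a frame by lookup alone, which is why a system needs some fallback for unknown predicates when it is applied to user-generated text.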