Robust Semantic Analysis for Unseen Data in FrameNet
Alexis Palmer, Afra Alishahi, Caroline Sporleder
Computational Linguistics and Phonetics
Saarland University, Germany
{apalmer,afra,csporled}@coli.uni-saarland.de

Proceedings of Recent Advances in Natural Language Processing, pages 628–633, Hissar, Bulgaria, 12-14 September 2011.

Abstract

We present a novel method for FrameNet-based semantic role labeling (SRL), focusing on limitations posed by the limited coverage of available annotated data. Our SRL model is based on Bayesian clustering and has the advantage of being very robust in the face of unseen and incomplete data. Frame labeling and role labeling are modeled in like fashions, allowing cascading classification scenarios. The model is shown to perform especially well on unseen data. In addition, we show that for seen data, predicting semantic types for roles improves role labeling performance.

1 Introduction

The majority of recent work in semantic role labeling (SRL) has been carried out on PropBank-style semantic argument annotations (Palmer et al., 2005), rather than on FrameNet-style annotations (Ruppenhofer et al., 2006). FrameNet differs from PropBank in that FrameNet annotations are more strongly semantically driven. FrameNet generalizes over different parts of speech and can assign the same sense (frame) to a noun and a verb as in (1), where both competition and play are assigned the COMPETITION frame. Also, FrameNet assigns semantic roles not only to syntactic arguments of the target but also to constituents which are not directly syntactically dependent on the target but can be semantically understood as filling a role, e.g., Wivenhoe Town in (1a).

(1) a. [Wivenhoe Town]_Participant1 have never won the competition_Competition.
    b. [Olympiakos]_Participant1 plays_Competition [against Aris Salonica]_Participant1 [in Piraeus]_Place.

A major challenge for FrameNet-style SRL is posed by the limited coverage of available annotated data. The FrameNet lexicographic corpus was annotated on a frame-by-frame basis, selecting individual example sentences for each lexical unit (LU), or pairing of lemma and frame. This means that many common lemmas are missing from FrameNet, and for those that are included the number of example sentences is often relatively small and not in accordance with distributions found in naturally-occurring texts.

FrameNet's well-known coverage gaps translate directly to drops in labeling performance, motivating the development of systems which are more robust in the face of sparse data. For example, the supervised SRL system Shalmaneser (Erk and Padó, 2006) obtains a frame labeling accuracy of 93% on FrameNet 1.2 (with a 90-10 training-test split), but the same system's performance drops to 47% accuracy when trained on FrameNet 1.3 and tested on texts with full frame-semantic annotations (Palmer and Sporleder, 2010). Similarly, Das et al. (2010) report a 60% frame labeling F-Score on SemEval-07 data, but of 210 unseen lemmas, their system predicts just four frames correctly.¹

¹ Under the SemEval-07 partial matching scheme, a majority of the other frame predictions receive partial credit.

In general the term unseen could refer to unseen frames, unseen lemmas, or unseen LUs. As further discussed in Section 4, we are interested in unseen LUs: cases in which the system has not been exposed to a particular pairing of lemma and frame. We propose a novel method for SRL based on Bayesian clustering. The model is well suited to deal with incomplete data, both in terms of missing feature values and in terms of feature-label combinations not seen in the training data.

2 Related Work

While early FrameNet-style SRL systems (Gildea and Jurafsky, 2002; Erk and Padó, 2006, among others) are unable to make predictions for LUs not seen in the training data, several more recent studies have addressed the coverage issue. For example, Das et al. (2010) introduce a latent variable ranging over seen targets, allowing them to infer likely frames for unseen words, and the SRL system of Johansson and Nugues (2007) uses WordNet to generalise to unseen lemmas. In a similar vein, Burchardt et al. (2005) propose a system that generalizes over WordNet synsets to guess frames for unknown words. Pennacchiotti et al. (2008) compare WordNet-based and distributional approaches to inferring frames and conclude that a combination of the two leads to the best results, while Cao et al. (2008) discuss how different distributional models can be utilised. Several approaches have also addressed other coverage problems, e.g., how to automatically expand the number of example sentences for a given lexical unit (Padó et al., 2008; Fürstenau and Lapata, 2009).

Another related approach is that of generalizing over semantic roles. Baldewein et al. (2004) use the FrameNet hierarchy to model the similarity of roles, boosting seldom-seen instances by reusing training data for similar roles, though without significant gains in performance. The most extensive study on role generalization to date (Matsubayashi et al., 2009) compares different ways of grouping roles (exploiting hierarchical relations in FrameNet, generalizing via role names, utilising role types, and using thematic roles from VerbNet), with the best results from using all groups together.

3 Model

We formalize frame and role assignment using an extended version of the construction learning model of Alishahi and Stevenson (2010). The model uses Bayesian clustering for learning argument structure constructions: each construction is a grouping of individual predicate usages which probabilistically share form-meaning associations. These groupings typically correspond to general constructions in the language such as intransitive, transitive, and ditransitive. By detecting similar usages and clustering them into constructions, the model forms probabilistic associations between syntactic positions of arguments with respect to the predicate, and the lexical semantic properties of the predicate and the arguments.

We model frame and role assignment in this fashion, where the most probable values for a missing frame or the semantic roles of arguments are predicted based on the acquired constructions (or clusters), and the extracted features from the corpus. This strategy provides a number of advantages. First, the model can easily deal with incomplete data; that is, input instances for which any number of features are missing can be seamlessly clustered or considered for prediction, based on the similarity of their features with those in the existing clusters. Moreover, a single core prediction mechanism is used for a variety of tasks (e.g. predicting a missing frame label, role, or role type), which can lead to cascading prediction. For example, for a partial (i.e. unannotated) frame instance, the best role type for each argument can be predicted based on the available features, and then argument roles can be predicted based on those features and the predicted role types.

An important characteristic of this model is its generalizability. It uses a full Bayesian prediction model, which takes into account the contribution of every cluster to predicting the best value for a missing feature. This way, there is no built-in difference between predicting a frame label or semantic role for seen versus unseen instances. Naturally, the outcome of prediction will be more accurate if the model has seen several instances similar to a test instance (i.e., from the same lexical unit or lemma). But even for unseen instances, the model is still capable of generalizing the properties of the training instances given that there are similarities between their available features, such as the syntactic pattern and the semantic properties of the predicate and the arguments.

3.1 Clustering Frame Instances

From the FrameNet corpus, we extract for each instance the nine features shown in Table 1. Different subsets of these features are used for the experiments reported in Section 5.

An incremental Bayesian clustering process groups each extracted frame instance with the most similar existing cluster of instances. If no existing cluster has sufficiently high probability for the new frame instance, a new cluster is created. Adding a frame instance $X$ to a cluster $c$ is formulated as finding the $c$ with the maximum probability given $X$, where $c$ ranges over the indices of all clusters, with index 0 representing recognition of a new cluster. Using Bayes' rule, and dropping $P(X)$ which is constant for all $c$:

$$P(c \mid X) = \frac{P(c)\,P(X \mid c)}{P(X)} \propto P(c)\,P(X \mid c) \qquad (2)$$

The prior probability $P(c)$ is given by the relative frequency of the frame instances it contains, over all observed instances. The posterior probability of an instance $X$ is expressed in terms of the individual probabilities of its features, which we assume are independent, thus yielding a simple product of feature probabilities:

$$P(X \mid c) = \prod_{i \in \mathrm{Features}(X)} P(X_i \mid c) \qquad (3)$$

This probability is estimated using smoothed maximum likelihood:

$$P(X_i \mid c) = \frac{\sum_{X' \in c} \mathrm{match}(X_i, X'_i) + \lambda}{n_c + \alpha_i \lambda} \qquad (4)$$

The conditional probabilities $P(X \mid c)$ and $P(X_i \mid c)$ are determined as in the learning module. Ranging over the possible values $X_i$ of feature $i$, the value of an unobserved feature can be predicted by maximizing $P(X_i \mid X)$:

$$\mathrm{BestValue}(X, i) = \operatorname*{argmax}_{X_i} P(X_i \mid X) \qquad (7)$$

Identifying a frame can be simulated as finding the frame label $X_{\mathrm{frame}}$ with the highest $P(X_{\mathrm{frame}} \mid X)$, or estimating BestValue(X, frame). Similarly, assigning roles or role types to the arguments of an instance $X$ is modeled as estimating BestValue(X, role) or BestValue(X, role type), respectively.
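
As an illustration only, the clustering and prediction steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `Cluster`, `FrameClusterer` and `LAMBDA` are hypothetical. The sketch reads match() in Equation (4) as exact equality, n_c as the number of instances in cluster c, and alpha_i as the number of possible values of feature i. It gives a new cluster the prior 1/(N+1), and it computes P(X_i | X) by summing P(X_i | c) P(c | X) over all clusters. None of these choices is confirmed by the text above, and the toy instances and feature names are invented.

```python
from collections import Counter, defaultdict

LAMBDA = 1.0  # assumed smoothing constant (lambda in Eq. 4); not a value from the paper


class Cluster:
    """Feature-value counts for the frame instances grouped in one cluster."""

    def __init__(self):
        self.n = 0  # n_c, read here as the number of instances in the cluster
        self.counts = defaultdict(Counter)  # feature -> value -> count

    def add(self, instance):
        self.n += 1
        for feature, value in instance.items():
            self.counts[feature][value] += 1

    def p_value(self, feature, value, alpha):
        # Eq. 4, with match() assumed to be exact equality of feature values.
        matches = self.counts[feature][value]
        return (matches + LAMBDA) / (self.n + alpha[feature] * LAMBDA)

    def likelihood(self, instance, alpha):
        # Eq. 3: product over the features present in the instance, so
        # missing features simply do not contribute.
        p = 1.0
        for feature, value in instance.items():
            p *= self.p_value(feature, value, alpha)
        return p


class FrameClusterer:
    def __init__(self, alpha):
        self.alpha = alpha  # alpha_i, assumed: number of possible values per feature
        self.clusters = []
        self.total = 0  # N: instances observed so far

    def learn(self, instance):
        # Eq. 2: score every existing cluster by P(c) P(X|c).
        scores = [c.n / (self.total + 1) * c.likelihood(instance, self.alpha)
                  for c in self.clusters]
        # The "new cluster" option (index 0 in the text) is stored last here;
        # its 1/(N+1) prior is an assumption.
        scores.append(1 / (self.total + 1) * Cluster().likelihood(instance, self.alpha))
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == len(self.clusters):
            self.clusters.append(Cluster())
        self.clusters[best].add(instance)
        self.total += 1

    def best_value(self, instance, feature, values):
        # Eq. 7, assuming P(X_i|X) = sum_c P(X_i|c) P(c|X) over all clusters.
        weights = [c.n * c.likelihood(instance, self.alpha) for c in self.clusters]
        z = sum(weights)

        def p(value):
            return sum(w / z * c.p_value(feature, value, self.alpha)
                       for w, c in zip(weights, self.clusters))

        return max(values, key=p)


# Invented toy data: three features per instance instead of the paper's nine.
model = FrameClusterer(alpha={"lemma": 6, "pattern": 2, "frame": 2})
for inst in [
    {"lemma": "play", "pattern": "NP V PP", "frame": "Competition"},
    {"lemma": "compete", "pattern": "NP V PP", "frame": "Competition"},
    {"lemma": "eat", "pattern": "NP V NP", "frame": "Ingestion"},
    {"lemma": "drink", "pattern": "NP V NP", "frame": "Ingestion"},
]:
    model.learn(inst)

# Frame prediction for a lemma that never occurred in training.
print(model.best_value({"lemma": "win", "pattern": "NP V PP"},
                       "frame", ["Competition", "Ingestion"]))
```

Because the likelihood is a product over only the features an instance carries, an instance with a missing frame label needs no special handling. The same `best_value` call could be reused for roles or role types, which is the property the cascading scenario relies on.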