Manifold Learning for the Semi-Supervised Induction of FrameNet Predicates: An Empirical Investigation

Danilo Croce and Daniele Previtali
{croce,previtali}@info.uniroma2.it
Department of Computer Science, Systems and Production, University of Roma, Tor Vergata

Abstract

This work focuses on the empirical investigation of distributional models for the automatic acquisition of frame-inspired predicates. While several semantic spaces, both word-based and syntax-based, are employed, the impact of geometric representations based on dimensionality reduction techniques is investigated. Data statistics are accordingly studied along two orthogonal perspectives: Latent Semantic Analysis exploits global properties, while Locality Preserving Projection emphasizes the role of local regularities. The latter is employed by embedding prior FrameNet-derived knowledge in the corresponding non-Euclidean transformation. The empirical investigation reported here sheds some light on the role played by these spaces as complex kernels for supervised (i.e. Support Vector Machine) algorithms: their use configures, as a novel way to semi-supervised lexical learning, a highly appealing research direction for knowledge-rich scenarios like FrameNet-based semantic parsing.

1 Introduction

Automatic Semantic Role Labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations and identifies the semantic roles conveyed by sentential constituents (Gildea and Jurafsky, 2002). Several NLP applications have exploited this kind of semantic representation, ranging from Information Extraction (Surdeanu et al., 2003; Moschitti et al., 2003) to Question Answering (Shen and Lapata, 2007), Paraphrase Identification (Pado and Erk, 2005), and the modeling of Textual Entailment relations (Tatu and Moldovan, 2005). Large scale annotated resources have been used by Semantic Role Labeling methods: they are commonly developed using a supervised learning paradigm where a classifier learns to predict role labels based on features extracted from annotated training data. One prominent resource has been developed under the Berkeley FrameNet project as a semantic lexicon for the core vocabulary of English, according to the so-called frame semantic model (Fillmore, 1985). Here, a frame is a conceptual structure modeling a prototypical situation, evoked in texts through the occurrence of its lexical units (LUs), which linguistically express the situation of the frame. Lexical units of the same frame share semantic arguments. For example, the frame KILLING has lexical units such as assassin, assassinate, blood-bath, fatal, murderer, kill or suicide, which share semantic arguments such as KILLER, INSTRUMENT, CAUSE and VICTIM. The current FrameNet release contains about 700 frames and 10,000 LUs. A corpus of 150,000 annotated example sentences from the British National Corpus (BNC) is also part of FrameNet.

Despite its size, this resource is still under development and hence incomplete: several frames are not represented by evoking words, and the number of annotated sentences is unbalanced across frames. This is one of the main reasons for the performance drop of supervised SRL systems in out-of-domain scenarios (Baker et al., 2007; Johansson and Nugues, 2008). The limited coverage of the FrameNet corpus is even more noticeable for the LU dictionary: it contains only 10,000 lexical units, far fewer than the 210,000 entries in WordNet 3.0. For example, the lexical unit crown, according to the annotations, evokes the ACCOUTREMENTS frame. It refers to a particular sense: according to WordNet, it is "an ornamental jeweled headdress signifying sovereignty".

According to the same resource, this LU has 12 lexical senses, and the first one (i.e. "The Crown (or the reigning monarch) as the symbol of the power and authority of a monarchy") could evoke other frames, like LEADERSHIP. In (Pennacchiotti et al., 2008) and (De Cao et al., 2008), the problem of automatic LU induction has been treated in a semi-supervised fashion. First, LUs are modeled by exploiting the distributional analysis of an unannotated corpus and the lexical information of WordNet. These representations are then used to find the frames potentially evoked by novel words, extending the FrameNet dictionary while limiting the effort of manual annotation.

In this work the distributional model of LUs is further developed. As in (Pennacchiotti et al., 2008), several word spaces (Pado and Lapata, 2007) are investigated in order to find the most suitable representation of the properties which characterize a frame. Two dimensionality reduction techniques are applied in this context. Latent Semantic Analysis (Landauer and Dumais, 1997) uses the Singular Value Decomposition to find the best subspace approximation of the original word space, in the sense of minimizing the global reconstruction error, by projecting data along the directions of maximal variance. Locality Preserving Projection (He and Niyogi, 2003) is a linear approximation of the nonlinear Laplacian Eigenmap algorithm: its locality preserving properties allow adding a set of constraints that force LUs belonging to the same frame to be near in the resulting space after the transformation. LSA performs a global analysis of a corpus, capturing relations between LUs and removing the noise introduced by spurious directions; however, it risks ignoring lexical senses that are poorly represented in the corpus. In (De Cao et al., 2008) external knowledge about LUs is provided by their lexical senses from a lexical resource (e.g. WordNet). In this work, prior knowledge about the target problem is instead directly embedded into the space through the LPP transformation, by exploiting locality constraints. Then a Support Vector Machine is employed to provide a robust acquisition of lexical units, combining the global information provided by LSA and the local information provided by LPP into a complex kernel function.

In Section 2 related work is presented. In Section 3 the investigated distributional model of LUs is presented, as well as the dimensionality reduction techniques. Then, in Section 4 the experimental investigation and comparative evaluations are reported. Finally, in Section 5 we draw conclusions and outline future work.

2 Related Work

As defined in (Pennacchiotti et al., 2008), LU induction is the task of assigning a generic lexical unit not yet present in the FrameNet database (the so-called unknown LU) to the correct frame(s). The number of possible classes (i.e. frames) and the multiple assignment problem make it a challenging task. LU induction was included at SemEval-2007 as part of the Frame Semantic Structure Extraction shared task (Baker et al., 2007), where systems are requested to assign the correct frame to a given LU, even when the LU is not yet present in FrameNet. Several approaches show low coverage (Johansson and Nugues, 2007) or low accuracy, like (Burchardt et al., 2005). The task is also addressed in (Pennacchiotti et al., 2008) and (De Cao et al., 2008), where two different models which combine distributional and paradigmatic (i.e. lexical) information are discussed: the distributional model is used to select a list of frames suggested by the corpus evidence, and the plausible lexical senses of the unknown LU are then used to re-rank the proposed frames.

In order to exploit the prior information provided by the frame theory, the underlying idea is that semantic knowledge can be embedded from external sources (i.e. the FrameNet database) into the distributional model of unannotated corpora. In (Basu et al., 2006) limited prior knowledge is exploited in several clustering tasks, in terms of pairwise constraints (i.e., pairs of instances labeled as belonging to the same or different clusters). Several existing algorithms enhance clustering quality by applying supervision in the form of constraints. These algorithms typically utilize the pairwise constraints either to modify the clustering objective function or to learn the clustering distortion measure. The approach discussed in (Basu et al., 2006) employs Hidden Markov Random Fields (HMRFs) as a probabilistic generative model for semi-supervised clustering, providing a principled framework for incorporating constraint-based supervision into prototype-based clustering.

Another possible approach is to directly embed the prior knowledge into the data representations. The main idea is to employ effective and efficient algorithms for constructing nonlinear low-dimensional manifolds from sample data points embedded in high-dimensional spaces.

Several such algorithms have been defined, including Isometric Feature Mapping (ISOMAP) (Tenenbaum et al., 2000), Locally Linear Embedding (LLE) (Roweis and Saul, 2000), Local Tangent Space Alignment (LTSA) (Zhang and Zha, 2004) and Locality Preserving Projection (LPP) (He and Niyogi, 2003), and they have been successfully applied to several computer vision and pattern recognition problems. In (Yang et al., 2006) it is demonstrated that basic nonlinear dimensionality reduction algorithms, such as LLE, ISOMAP and LTSA, can be modified by taking into account prior information on the exact mapping of certain data points. The sensitivity analysis of these algorithms shows that prior information improves the stability of the solution. In (Goldberg and Elhadad, 2009), a strategy to incorporate lexical features into classification models is proposed. Another possible approach is the strategy pursued in recent work applying deep learning techniques to NLP tasks. In (Collobert and Weston, 2008) a unified architecture for NLP is presented that learns features relevant to the task at hand given very limited prior knowledge. It embodies the idea that a multitask learning architecture coupled with semi-supervised learning can be effectively applied even to complex linguistic tasks such as Semantic Role Labeling. In particular, (Collobert and Weston, 2008) proposes an embedding of lexical information using Wikipedia as source, and exploits the resulting language model for the multitask learning process. The extensive use of unlabeled texts allows a significant level of lexical generalization to be achieved, in order to better capitalize on the smaller annotated data sets.

3 Geometrical Embeddings as Models of Frame Semantics

The aim of this distributional approach is to model frames in semantic spaces where words are represented through the distributional analysis of their co-occurrences over a corpus. Semantic spaces are widely used in NLP for representing the meaning of words or other lexical entities. They have been successfully applied in several tasks, such as information retrieval (Salton et al., 1975) and harvesting thesauri (Lin, 1998). The fundamental intuition is that the meaning of a word can be described by the set of textual contexts in which it appears (the Distributional Hypothesis, as described in (Harris, 1964)), and that words with similar vectors are semantically related. Contexts are words appearing together with a LU: such a space models a generic notion of semantic relatedness, i.e. two LUs spatially close in the space are likely to be either in a paradigmatic or a syntagmatic relation, as in (Sahlgren, 2006). Here, LUs delimit subspaces modeling the prototypical semantics of the corresponding evoked frames, and novel LUs can be induced by exploiting their projections.

Since a semantic space reflects the language in use through corpus statistics, in an unsupervised fashion, vectors representing LUs can be characterized by different distributions. For example, LUs of the frame KILLING, such as blood-bath, crucify or fratricide, are statistically much rarer in a corpus than a wide-spanning term such as kill. Moreover, other ambiguous LUs, such as liquidate or terminate, can appear in sentences evoking different frames. These problems of data sparseness and distribution noise can be overcome by applying space transformation techniques that augment the expressiveness of the space in modeling frame semantics. Semantic space models map words into vector spaces (with as many dimensions as words in the dictionary) and LU collections into distributions of data points. Every distribution implicitly expresses two orthogonal facets: global properties, such as the occurrence scores computed for terms across the entire collection (irrespective of their word senses or evoking situations), and local regularities, for example the existence of subsets of terms that tend to be used every time a frame manifests. The latter also tend to be closer in the space and should remain close in the transformed space too. Another important aspect that a transformation could account for is external semantic information. In the new space, prior knowledge can be exploited to obtain a more regular LU representation and a clearer separation between subspaces representing different frame semantics.

In the following sections the investigated distributional model of LUs is discussed. As many criteria can be adopted to define a LU context, one of the goals of this investigation is to find a co-occurrence model that better captures the notion of frame, as described in Section 3.1. Then, two dimensionality reduction techniques, which exploit semantic space distributions to improve frame representation, are discussed: in Section 3.2 the role of global properties of the data statistics is investigated through Latent Semantic Analysis, while in Section 3.3 the Locality Preserving Projection algorithm is discussed as a way to combine prior knowledge about frames with local regularities of LUs observed in text.

3.1 Choosing the space

Different types of context define spaces with different semantic properties. Such spaces model a generic notion of semantic relatedness: two LUs close in the space are likely to be related by some type of generic semantic relation, either paradigmatic (e.g. synonymy, hyperonymy, antonymy) or syntagmatic (e.g. meronymy, conceptual and phrasal association), as observed in (Sahlgren, 2006). The target of this work is the construction of a space able to capture the properties which characterize a frame, assuming that LUs in the same frame tend to be either co-occurring or substitutional words (e.g. murder/kill). Two traditional word-based co-occurrence models capture the above property:

Word-based space: Contexts are words, as lemmas, appearing in an n-window of the LU. The window width n is a parameter that allows the space to capture different aspects of a frame: higher values risk introducing noise, since a frame may not cover an entire sentence, while lower values lead to sparse representations.

Syntax-based space: Context words are enriched with information about syntactic relations (e.g. X-VSubj-killer, where X is the LU), as described in (Pado and Lapata, 2007). Two LUs close in this space are likely to be in a paradigmatic relation, i.e. to be close in an IS-A hierarchy (Budanitsky and Hirst, 2006; Lin, 1998). Indeed, as contexts are syntactic relations, targets with the same part of speech are much closer than targets of different types.

3.2 Latent Semantic Analysis

Latent Semantic Analysis (LSA) is an algorithm presented in (Furnas et al., 1988) and later popularized by Landauer (Landauer and Dumais, 1997): it can be seen as a variant of the Principal Component Analysis idea. LSA aims to find the best subspace approximation to the original word space, in the sense of minimizing the global reconstruction error, by projecting data along the directions of maximal variance. It captures term (semantic) dependencies by applying a matrix decomposition process called Singular Value Decomposition (SVD). The original term-by-term matrix M is transformed into the product of three new matrices U, S and V, so that M = U S V^T. M is then approximated by M_l = U_l S_l V_l^T, in which only the first l columns of U and V are used, and only the l greatest singular values are considered. This approximation supplies a way to project term vectors into the l-dimensional space using Y_terms = U_l S_l^(1/2). Notice that the SVD process accounts for the eigenvectors of the entire original distribution (matrix M): LSA is thus an example of a decomposition process strongly dependent on a global property. The original statistical information about M is captured by the new l-dimensional space, which preserves the global structure while removing low-variant dimensions, i.e. distribution noise. These newly derived features may be thought of as artificial concepts, each one representing an emerging meaning component as a linear combination of many different words (i.e. contexts). Such contextual usages can be used instead of the words themselves to represent texts. This technique has two main advantages. First, the overall computational cost of the model is reduced, as similarities are computed in a space with far fewer dimensions. Second, it allows second-order relations among LUs to be captured, thus improving the quality of the similarity measure.
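As an illustration, this projection can be sketched in a few lines of Python (our own sketch, not code from the paper; the function name lsa_projection is hypothetical, and a dense numpy matrix is assumed, whereas a realistic term-by-term matrix would be sparse):

import numpy as np

def lsa_projection(M, l=100):
    # Truncated SVD of the term-by-term matrix: M ~ U_l S_l V_l^T.
    # For a large sparse M, scipy.sparse.linalg.svds would be used instead.
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    U_l, s_l = U[:, :l], s[:l]       # keep the l largest singular values
    return U_l * np.sqrt(s_l)        # Y_terms = U_l S_l^(1/2), one row per term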
3.3 The Locality Preserving Projection Method

An alternative to LSA, much more tied to the local properties of the data, is the Locality Preserving Projection (LPP), a linear approximation of the non-linear Laplacian Eigenmap algorithm introduced in (He and Niyogi, 2003). LPP is a linear dimensionality reduction method whose goal is, given a set of LUs x_1, x_2, .., x_m in R^n, to find a transformation matrix A that maps these m points into a set of points y_1, y_2, .., y_m in R^k (k << n). LPP achieves this result through the cascade of processing steps described hereafter.

Construction of an adjacency graph. Let G denote a graph with m nodes. Nodes i and j are given a weighted connection if the vectors x_i and x_j are close, according to an arbitrary measure of similarity. There are many ways to build an adjacency graph; here the cosine graph with a cosine weighting scheme is explored: given two vectors x_i and x_j, the weight w_ij between them is set by

    w_ij = max{ 0, (cos(x_i, x_j) − τ) / |cos(x_i, x_j) − τ| · cos(x_i, x_j) }    (1)

where a cosine threshold τ is necessary.
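A possible rendering of this weighting scheme is sketched below (a hypothetical helper of our own; LU vectors are assumed to be the rows of a numpy array, and for a non-negative threshold τ the sign factor of Equation 1 reduces to a simple cut-off):

import numpy as np

def cosine_adjacency(X, tau=0.5):
    # Pairwise cosine similarities between the row vectors of X.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    C = Xn @ Xn.T
    # Equation 1 (for tau >= 0): keep cos(x_i, x_j) as the edge weight only above tau.
    W = np.where(C > tau, C, 0.0)
    np.fill_diagonal(W, 0.0)         # no self-connections in the graph
    return W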

The adjacency graph can be represented by a symmetric m×m adjacency matrix, named W, whose element W_ij contains the weight between nodes i and j. The method of constructing an adjacency graph outlined above is correct if the data actually lie on a low-dimensional manifold. Once such an adjacency graph is obtained, LPP will try to optimally preserve it in choosing the projections.

Solve an Eigenmap problem. Compute the eigenvectors and eigenvalues of the generalized eigenvector problem

    X L X^T a = λ X D X^T a

where X is the n×m matrix whose columns are the original m vectors in R^n, D is a diagonal m×m matrix whose entries are the column (or row) sums of W, i.e. D_ii = Σ_j W_ij, and L = D − W is the Laplacian matrix. The solution of this problem is the set of eigenvectors a_0, a_1, .., a_{n−1}, ordered according to their eigenvalues λ_0 < λ_1 < .. < λ_{n−1}. The LPP projection matrix A is obtained by selecting the k eigenvectors corresponding to the k smallest eigenvalues: it is therefore an n×k matrix whose columns are the k selected n-dimensional eigenvectors. The final projection of the original vectors into R^k can be performed linearly by Y = A^T X. This transformation provides a valid kernel that can be efficiently embedded into a classifier.
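A direct, dense rendering of this step with scipy is sketched below (our own sketch; the small ridge added to the right-hand side is a common numerical safeguard of ours, not part of the method):

import numpy as np
from scipy.linalg import eigh

def lpp_projection_matrix(X, W, k=100):
    # X is n x m (columns are LU vectors), W the m x m adjacency matrix.
    D = np.diag(W.sum(axis=1))               # D_ii = sum_j W_ij
    L = D - W                                # graph Laplacian
    lhs, rhs = X @ L @ X.T, X @ D @ X.T
    rhs += 1e-8 * np.eye(rhs.shape[0])       # keep rhs positive definite (assumption)
    vals, vecs = eigh(lhs, rhs)              # eigenvalues in ascending order
    return vecs[:, :k]                       # A: eigenvectors of the k smallest eigenvalues

The reduced representation then follows as Y = A.T @ X, and a linear kernel over Y yields the K_LPP used in Section 4.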
As the construction of an ad- tions KLSA and KLP P can be induced through jacency graph G can be based on any principle, its the dimensionality reduction techniques φLSA and definition could account on some external infor- φLP P respectively, as described in sections 3.2 mation reflecting prior knowledge available about and 3.3. Kernel methods are advantageous be- the task. cause the combination of of kernel functions can In this work, prior knowledge about LUs is em- be integrated into the SVM as they are still kernels. bedded by exploiting their membership to frame Consequently, the kernel combination αKLSA + dictionaries, thus removing from the graph all con- βKLP P linearly combines the global properties nections between LUs xi and xj that do not evoke captured by LSA and the locality constraints im- the same prototypical situation. More formally posed by the LPP transformation. Here, parame- Equation 1 can be rewritten more formally as: ters α and β weight the combination of the two kernels. The evoking frame for a novel LU is the one whose corresponding SVM has the high- cos(xi, xj ) − τ wij = max{0, · cos(xi, xj ) · δ(i, j)} est (possibly negative) margin, according to a one- |cos(xi, xj ) − τ|

4 Empirical Analysis

In this section the empirical evaluation of the distributional models applied to the task of inducing LUs is presented. Different spaces obtained through the dimensionality reduction techniques imply different kernel functions, used to independently train different SVMs. Our aim is to investigate the impact of these kernels in capturing both frame and LU properties, as well as the effectiveness of their possible combination.

The problem of LU induction is here treated as a multi-classification problem, where each LU is considered as a positive or negative instance of a frame. We use Support Vector Machines (SVMs) (Joachims, 1999), a maximum-margin classifier that realizes a linear discriminative model. In the case of non-linearly separable examples, convolution functions φ(·) can be used to transform the initial feature space into another one, where a hyperplane separating the data with the widest margin can be found. Here new similarity measures, the kernel functions, can be defined through the dot product K(o_i, o_j) = ⟨φ(o_i) · φ(o_j)⟩ over the new representation. In this way, kernel functions K_LSA and K_LPP can be induced through the dimensionality reduction techniques φ_LSA and φ_LPP respectively, as described in Sections 3.2 and 3.3. Kernel methods are advantageous because combinations of kernel functions are still kernels, and can therefore be integrated into the SVM. Consequently, the kernel combination αK_LSA + βK_LPP linearly combines the global properties captured by LSA and the locality constraints imposed by the LPP transformation; the parameters α and β weight the combination of the two kernels. The evoking frame for a novel LU is the one whose corresponding SVM has the highest (possibly negative) margin, according to a one-vs-all scheme.

In order to evaluate the quality of the presented models, accuracy is measured as the percentage of LUs that are correctly re-assigned to their original (gold-standard) frame. As the system can suggest more than one frame, different accuracy levels can be obtained: a LU is considered correctly assigned if its correct frame (according to FrameNet) belongs to the set of the best b proposals by the system (i.e. the first b scores from the underlying SVMs). Assigning different values to b, we obtain different levels of accuracy, i.e. the percentage of LUs correctly assigned among the first b proposals, as shown in Table 3.
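Concretely, the kernel combination and the one-vs-all, best-b scoring can be sketched as follows (our own sketch: the paper uses SVM-Light-TK, while here scikit-learn's precomputed-kernel SVC stands in, with linear kernels over the reduced spaces in place of the RBF variants also explored below; all function names are hypothetical):

import numpy as np
from sklearn.svm import SVC

def combined_kernel(A_lsa, B_lsa, A_lpp, B_lpp, alpha, beta):
    # alpha * K_LSA + beta * K_LPP between the row sets A and B of each space.
    return alpha * (A_lsa @ B_lsa.T) + beta * (A_lpp @ B_lpp.T)

def best_b_frames(K_train, K_test, y_train, b=3):
    # One SVM per frame (one-vs-all); rank frames by decision margin.
    frames = sorted(set(y_train))
    margins = []
    for f in frames:
        svm = SVC(kernel="precomputed")
        svm.fit(K_train, [1 if y == f else -1 for y in y_train])
        margins.append(svm.decision_function(K_test))
    margins = np.column_stack(margins)           # shape: (n_test, n_frames)
    order = np.argsort(-margins, axis=1)[:, :b]  # indices of the b best frames
    return [[frames[j] for j in row] for row in order]

Accuracy at level b is then the fraction of test LUs whose gold-standard frame appears among the b frames returned for them.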
Each frame’s list of LUs is split into train (60%), tuning 4.2 Results (20%) and test set (20%) and LUs having Part-of- speech different from verb, noun or adjective are In these experiments the impact of the lexical removed. In Table 1 the number of LUs for each knowledge gathered by different word-spaces is set, as well as the maximum and the average num- evaluated over the LU induction task. Moreover, ber per frame, are summarized. the improvements achieved through LSA and LPP Four different approaches for the Word Space is measured. SVM classifiers are trained over the semantic spaces produced through the dimension- 1For example the SELF MOTION frame counts 6,248 ex- amples while 119 frames are represented by less than 10 ex- 3The Minimal context provided by the De- amples pendency Vectors tool is used. It is available at 2The entire database contains 10,228 LUs and the number http://www.nlpado.de/∼sebastian/dv.html of evoking word is 7,230, without taking in account multiple 4SVM-Light-TK is available at the url frame assignments. http://disi.unitn.it/∼moschitt/Tree-Kernel.htm

α/β     1.0/0.0  .9/.1  .8/.2  .7/.3  .6/.4  .5/.5  .4/.6  .3/.7  .2/.8  .1/.9  0.0/1.0     τ    c
W5        0.668  0.669  0.672  0.673  0.669  0.662  0.649  0.632  0.612  0.570    0.033  0.55    5
W10       0.615  0.619  0.618  0.612  0.604  0.597  0.580  0.575  0.565  0.528    0.048  0.65    3
Sent      0.557  0.567  0.580  0.584  0.574  0.564  0.561  0.545  0.523  0.496    0.048  0.80    5
SyntB     0.654  0.664  0.662  0.652  0.651  0.647  0.649  0.634  0.627  0.592    0.056  0.40    3

Table 2: Accuracy at different combination weights of the kernel αK_LSA + βK_LPP, with the best τ and c per space (baseline: 0.043)

                  b-1    b-2    b-3    b-4    b-5    b-6    b-7    b-8    b-9   b-10      α/β
W5 orig         0.563  0.685  0.733  0.770  0.801  0.835  0.841  0.854  0.868  0.879        -
W10 orig        0.510  0.634  0.707  0.776  0.810  0.830  0.841  0.857  0.865  0.875        -
Sent orig       0.479  0.618  0.680  0.734  0.764  0.793  0.813  0.837  0.845  0.852        -
SyntB orig      0.585  0.741  0.803  0.840  0.866  0.874  0.886  0.903  0.907  0.913        -
W5 LSA+LPP      0.673  0.781  0.831  0.865  0.881  0.891  0.906  0.912  0.926  0.938  0.7/0.3
W10 LSA+LPP     0.619  0.739  0.786  0.818  0.849  0.865  0.878  0.888  0.901  0.909  0.9/0.1
Sent LSA+LPP    0.584  0.705  0.766  0.798  0.825  0.835  0.848  0.864  0.876  0.889  0.7/0.3
SyntB LSA+LPP   0.664  0.791  0.840  0.864  0.878  0.893  0.901  0.903  0.907  0.911  0.9/0.1

Table 3: Accuracy of the original word-space models (orig) and of the semantic space models (LSA+LPP) on the best b proposed frames

4.2 Results

In these experiments the impact of the lexical knowledge gathered by the different word spaces is evaluated on the LU induction task, and the improvements achieved through LSA and LPP are measured. SVM classifiers are trained over the semantic spaces produced through the dimensionality reduction transformations. The representations of the two semantic spaces are linearly combined as αK_LSA + βK_LPP, where the kernel weights α and β are estimated over the tuning set. Each kernel is also used on its own: the ratio α = 1.0/β = 0.0 denotes the LSA kernel alone, while α = 0.0/β = 1.0 denotes the LPP kernel alone. Table 2 shows the best results, obtained with an RBF kernel. The Window5 model achieves the highest accuracy, i.e. 67% correct classification, against a baseline of 4.3% estimated by assigning LUs to the most likely frame in the training set (i.e. the one containing the highest number of LUs). Wider windows achieve lower classification accuracy, confirming that most of the lexical information tied to a frame lies near the LU. The syntax-based word space does not outperform the accuracy of a word-based space. The combination of the two kernels always provides the best outcome, and the LSA space appears more accurate and expressive than the LPP one, as shown in Figure 1. In particular, LPP alone is extremely unstable, suggesting that the constraints imposed by the prior knowledge are orthogonal to the corpus statistics.

[Figure 1: Accuracy at different combination weights of the kernel αK_LSA + βK_LPP (x-axis: α_LSA/β_LPP weights; one curve per model: Window5, Window10, Sentence, SyntaxBased).]

Further experiments were carried out using the original co-occurrence space models, to assess the improvements due to the LSA and LPP kernels. In this investigation a linear kernel achieved the best results, as predicted in (Bengio et al., 2005), where the sensitivity to the curse of dimensionality of a large class of modern learning algorithms (e.g. SVMs) based on local kernels (e.g. RBF) is argued. As shown in Table 3, the performance drop of the original (orig) models against the best kernel combination of LSA and LPP is significant, i.e. ∼10%, showing how the latent semantic spaces better capture the properties of frames, avoiding data sparseness, dimensionality problems and low regularities of the data distribution.

Moreover, Table 3 shows how the accuracy largely increases when more than one frame is considered: at level b = 3, i.e. when a novel LU counts as correctly classified if one of its original frames is comprised in the list of three frames proposed by the system, accuracy is 0.84 (the SyntaxBased model), while at b = 10 accuracy is nearly 0.94 (the Window5 model).

LU (# WN senses)   frame 1             frame 2          frame 3              Correct frames
boil.v (5)         FOOD                FLUIDIC MOTION   CONTAINERS           CAUSE HARM
clap.v (7)         SOUNDS              MAKE NOISE       COMMUNICATION NOISE  BODY MOVEMENT
crown.n (12)       LEADERSHIP          ACCOUTREMENTS    PLACING              ACCOUTREMENTS, OBSERVABLE BODYPARTS
school.n (7)       EDUCATION TEACHING  BUILDINGS        LOCALE BY USE        EDUCATION TEACHING, LOCALE BY USE, AGGREGATE
threat.n (4)       HOSTILE ENCOUNTER   IMPACT           COMMITMENT           COMMITMENT
tragedy.n (2)      TEXT                KILLING          EMOTION DIRECTED     TEXT

Table 4: The 3 frames proposed for each LU (ordered by SVM score) and the correct frames provided by the FrameNet dictionary. In parentheses, the number of distinct WordNet lexical senses for each LU.

This is high enough to support tasks such as the semi-automatic creation of new FrameNets. An error analysis indicates that many misclassifications are induced by gaps in the frame annotations, especially those concerning polysemous LUs [5]. Table 4 reports the analysis of a subset of LUs, showing the first 3 frames proposed for each evoking word, ranked by the margin of the SVMs. The last column contains the frames evoked by each LU according to the FrameNet dictionary; frame names in bold indicate a correct classification. Some LUs, like threat (characterized by 4 lexical senses), seem to be misclassified: in this case the FrameNet annotation regards a specific sense that evokes the COMMITMENT frame (e.g. "There was a real threat that she might have to resign"), without taking into account other senses, like WordNet's "menace, threat (something that is a source of danger)", which could evoke the HOSTILE ENCOUNTER frame. In other cases the proposed frames seem to enrich the LU dictionary, like BUILDINGS, here evoked by school.

[5] According to WordNet, an average of 3.6 lexical senses per LU is estimated in our dataset.

5 Conclusions

The core purpose of this work was to present an empirical investigation of the impact of different distributional models on the lexical unit induction task. The employed word spaces, based on different co-occurrence models (both context- and syntax-driven), are used as vector models of LU semantics. On these spaces, two dimensionality reduction techniques have been applied. Latent Semantic Analysis (LSA) exploits global properties of the data distributions and results in a global model of lexical semantics. On the other hand, the Locality Preserving Projection (LPP) method, which exploits regularities in the neighborhood of each lexical predicate, is employed in a semi-supervised manner: local constraints expressing prior knowledge about frames are defined in the adjacency graph. The resulting embedding is therefore expected to determine a new space where regions for the LUs of a given frame can be more easily discovered. Experiments have been run using the resulting spaces as task-dependent kernels in an SVM learning setting. The application to the 100 best-represented frames of the FrameNet KB showed that the combined use of the global and local models made available by LSA and LPP, respectively, achieves the best results, as 67.3% of the LUs recover the same frames as the annotated dictionary. This is a significant improvement with respect to the previous results achieved by the pure distributional model reported in (Pennacchiotti et al., 2008).

Future work is required to increase the level of constraints made available in the semi-supervised setting of LPP: syntactic information, as well as role-related evidence, can both be accommodated by the adjacency constraints imposed for LPP. This constitutes a significant area of research towards a comprehensive semi-supervised model of frame semantics, entirely based on manifold learning methods, of which this study on LSA and LPP is just a starting point.

Acknowledgement

We want to acknowledge Prof. Roberto Basili because this work would not exist without his ideas, inspiration and invaluable support.

References

Collin Baker, Michael Ellsworth, and Katrin Erk. 2007. Semeval-2007 task 19: Frame semantic structure extraction. In Proceedings of SemEval-2007, pages 99–104, Prague, Czech Republic, June. Association for Computational Linguistics.

Sugato Basu, Mikhail Bilenko, Arindam Banerjee, and Raymond Mooney. 2006. Probabilistic semi-supervised clustering with constraints. In Semi-Supervised Learning, pages 73–102. MIT Press.

Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux. 2005. The curse of dimensionality for local kernel machines. Technical report, Departement d'Informatique et Recherche Operationnelle.

Alexander Budanitsky and Graeme Hirst. 2006. Evaluating WordNet-based measures of semantic distance. Computational Linguistics, 32(1):13–47.

Aljoscha Burchardt, Katrin Erk, and Anette Frank. 2005. A WordNet Detour to FrameNet. In Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen, volume 8 of Computer Studies in Language and Speech. Peter Lang, Frankfurt/Main.

Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of ICML '08, pages 160–167, New York, NY, USA. ACM.

Diego De Cao, Danilo Croce, Marco Pennacchiotti, and Roberto Basili. 2008. Combining word sense and usage for modeling frame semantics. In Proceedings of STEP 2008, Venice, Italy.

Charles J. Fillmore. 1985. Frames and the semantics of understanding. Quaderni di Semantica, 4(2):222–254.

G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. A. Harshman, L. A. Streeter, and K. E. Lochbaum. 1988. Information retrieval using a singular value decomposition model of latent semantic structure. In Proceedings of SIGIR '88, New York, USA.

Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Yoav Goldberg and Michael Elhadad. 2009. On the role of lexical features in sequence labeling. In Proceedings of EMNLP '09, pages 1142–1151, Singapore.

Zellig Harris. 1964. Distributional structure. In Jerrold J. Katz and Jerry A. Fodor, editors, The Philosophy of Linguistics, New York. Oxford University Press.

Xiaofei He and Partha Niyogi. 2003. Locality preserving projections. In Proceedings of NIPS 2003, Vancouver, Canada.

T. Joachims. 1999. Making large-scale SVM learning practical. MIT Press, Cambridge, MA.

Richard Johansson and Pierre Nugues. 2007. Using WordNet to extend FrameNet coverage. In Proceedings of the Workshop on Building Frame-semantic Resources for Scandinavian and Baltic Languages, at NODALIDA, Tartu, Estonia, May 24.

Richard Johansson and Pierre Nugues. 2008. The effect of syntactic representation on semantic role labeling. In Proceedings of COLING, Manchester, UK, August 18-22.

Tom Landauer and Sue Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104.

Dekang Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of COLING-ACL, Montreal, Canada.

Alessandro Moschitti, Paul Morarescu, and Sanda M. Harabagiu. 2003. Open domain information extraction via automatic semantic labeling. In FLAIRS Conference, pages 397–401.

Sebastian Pado and Katrin Erk. 2005. To cause or not to cause: Cross-lingual semantic matching for paraphrase modelling. In Proceedings of the Cross-Language Knowledge Induction Workshop, Cluj-Napoca, Romania.

Sebastian Pado and Mirella Lapata. 2007. Dependency-based construction of semantic space models. Computational Linguistics, 33(2):161–199.

Marco Pennacchiotti, Diego De Cao, Roberto Basili, Danilo Croce, and Michael Roth. 2008. Automatic induction of FrameNet lexical units. In Proceedings of EMNLP 2008, Waikiki, Honolulu, Hawaii.

S. T. Roweis and L. K. Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326.

Magnus Sahlgren. 2006. The Word-Space Model. Ph.D. thesis, Stockholm University.

G. Salton, A. Wong, and C. Yang. 1975. A vector space model for automatic indexing. Communications of the ACM, 18:613–620.

Dan Shen and Mirella Lapata. 2007. Using semantic roles to improve question answering. In Proceedings of EMNLP-CoNLL, pages 12–21, Prague.

Mihai Surdeanu, Sanda Harabagiu, John Williams, and Paul Aarseth. 2003. Using predicate-argument structures for information extraction. In Proceedings of ACL 2003.

Marta Tatu and Dan I. Moldovan. 2005. A semantic approach to recognizing textual entailment. In HLT/EMNLP.

J. B. Tenenbaum, V. de Silva, and J. C. Langford. 2000. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323.

Xin Yang, Haoying Fu, Hongyuan Zha, and Jesse Barlow. 2006. Semi-supervised nonlinear dimensionality reduction. In Proceedings of the 23rd International Conference on Machine Learning, pages 1065–1072, New York, NY, USA. ACM Press.

Zhenyue Zhang and Hongyuan Zha. 2004. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing, 26(1):313–338.
