Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Danilo Croce, University of Tor Vergata, 00133 Roma, Italy ([email protected])
Alessandro Moschitti, University of Trento, 38123 Povo (TN), Italy ([email protected])
Roberto Basili, University of Tor Vergata, 00133 Roma, Italy ([email protected])
Martha Palmer, University of Colorado at Boulder, Boulder, CO 80302, USA ([email protected])

Abstract

In this paper, we propose innovative representations for the automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture meaningful syntactic/semantic structures, which allows for improving on the state of the art.

1 Introduction

Verb classification is a fundamental topic of computational linguistics research, given its importance for understanding the role of verbs in conveying the semantics of natural language (NL). Additionally, generalization based on verb classification is central to many NL applications, ranging from shallow semantic parsing to semantic search and information extraction. Considerable interest has recently been paid to two verb categorization schemes, VerbNet (Schuler, 2005) and FrameNet (Baker et al., 1998), which has also fostered the production of many automatic approaches to predicate argument extraction. Such work has shown that syntax is necessary to help predict the roles of verb arguments and consequently their verb sense (Gildea and Jurafsky, 2002; Pradhan et al., 2005; Gildea and Palmer, 2002). However, the definition of models that optimally combine lexical and syntactic constraints is still far from being accomplished. In particular, the exhaustive design and experimentation of lexical and syntactic features for learning verb classification appears to be computationally problematic. For example, the verb order can belong to two VerbNet classes:

– The class 60.1, i.e., order someone to do something, as in: The Illinois Supreme Court ordered the commission to audit Commonwealth Edison's construction expenses and refund any unreasonable expenses.

– The class 13.5.1, i.e., order or request something, as in: ... Michelle blabs about it to a sandwich man while ordering lunch over the phone.

Clearly, the syntactic realization can be used to discern the two cases above, but it would not be enough to correctly classify the following verb occurrence: .. ordered the lunch to be delivered .., which belongs to class 13.5.1. For such a case, selectional restrictions are needed. These have also been shown to be useful for semantic role classification (Zapirain et al., 2010). Note that encoding them in learning algorithms is rather complex: we need to take into account syntactic structures, which may require an exponential number of syntactic features (i.e., all their possible substructures). Moreover, these have to be enriched with lexical information to trigger lexical preferences.

In this paper, we tackle the problem above by studying innovative representations for automatic verb classification according to VerbNet and FrameNet. We define syntactic and semantic structures capturing essential lexical and syntactic properties of verbs. Then, we apply similarity functions between such structures, i.e., kernel functions, which can also exploit distributional lexical semantics, to train automatic classifiers. The basic idea of such functions is to compute the similarity between two verbs in terms of all the possible substructures of their syntactic frames, of which we define and automatically extract a lexicalized approximation. We then apply kernel functions that jointly model structural and lexical similarity, so that syntactic properties are combined with generalized lexemes. The nice property of kernel functions is that they can be used in place of the scalar product of feature vectors to train algorithms such as Support Vector Machines (SVMs). This way, SVMs can learn the association between target verb classes and syntactic (sub)structures whose lexical arguments are generalized, i.e., they can also learn selectional restrictions.
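To make the kernel idea concrete, here is a minimal sketch in Python. Verb occurrences are encoded as toy syntactic trees, a simple fragment-counting kernel scores the substructures two trees share, and the resulting Gram matrix is passed to an SVM in place of feature vectors. Everything here is invented for illustration: the fragment definition (depth-truncated subtrees), the toy trees, and the labels are deliberately much simpler than the lexicalized structures and semantic tree kernels actually defined in this paper.

```python
from collections import Counter

import numpy as np
from sklearn.svm import SVC


def fragments(tree):
    """Count depth-truncated subtree signatures rooted at every node.

    A tree is a pair (label, list_of_children). This toy fragment set
    stands in for the much richer fragment spaces of real tree kernels.
    """
    def sig(node, depth):
        label, kids = node
        if depth == 0 or not kids:
            return label
        return "(%s %s)" % (label, " ".join(sig(k, depth - 1) for k in kids))

    counts = Counter()

    def visit(node):
        depth = 0
        while True:
            s = sig(node, depth)
            counts[s] += 1
            if s == sig(node, depth + 1):  # subtree is fully expanded
                break
            depth += 1
        for kid in node[1]:
            visit(kid)

    visit(tree)
    return counts


def tree_kernel(t1, t2):
    """Similarity = number of (toy) substructures the two trees share."""
    f1, f2 = fragments(t1), fragments(t2)
    return float(sum(c * f2[s] for s, c in f1.items()))


# Invented occurrences of "order": class 60.1 takes an object plus a clausal
# complement (order someone to do something); class 13.5.1 takes a bare object.
trees = [
    ("VP", [("V", [("order", [])]), ("NP", [("commission", [])]), ("S", [("audit", [])])]),
    ("VP", [("V", [("order", [])]), ("NP", [("panel", [])]), ("S", [("report", [])])]),
    ("VP", [("V", [("order", [])]), ("NP", [("lunch", [])])]),
    ("VP", [("V", [("order", [])]), ("NP", [("pizza", [])])]),
]
y = ["60.1", "60.1", "13.5.1", "13.5.1"]

# The kernel replaces the scalar product: the SVM only ever sees this matrix.
gram = np.array([[tree_kernel(a, b) for b in trees] for a in trees])
clf = SVC(kernel="precomputed").fit(gram, y)
print(clf.predict(gram))  # sanity check on the training examples
```

Note that the kernel above requires lexical nodes to match exactly; the semantic tree kernels discussed later instead let similar but different lexemes (e.g., commission and panel) contribute in proportion to their distributional similarity.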
We carried out extensive experiments on verb class and frame detection, which showed that our models greatly improve on the state of the art (up to about 13% relative error reduction). Such results are nicely confirmed by manual inspection of the most important substructures used by the classifiers, as these largely correlate with the syntactic frames defined in VerbNet.

In the rest of the paper, Sec. 2 reports on related work, Sec. 3 and Sec. 4 describe previous and our models for syntactic and semantic similarity, respectively, Sec. 5 illustrates our experiments, Sec. 6 discusses the output of the models in terms of error analysis and important structures, and finally Sec. 7 derives the conclusions.

2 Related work

Our target task is verb classification, but at the same time our models exploit distributional models as well as structural kernels. The next three subsections report related work in these areas.

Verb Classification. The introductory verb classification example has intuitively shown the complexity of defining a comprehensive feature representation. Hereafter, we report on analyses carried out in previous work.

It has often been observed that verb senses tend to show different selectional constraints in a specific argument position, and the above verb order is a clear example. In the direct object position of the example sentence for the first sense of order, 60.1, we find commission in the PATIENT role of the predicate. It clearly satisfies the +ANIMATE/+ORGANIZATION restriction on the PATIENT role. This is not true for the direct object dependency of the alternative sense 13.5.1, which usually expresses the THEME role, with unrestricted type selection. When properly generalized, the direct object information has thus been shown to be highly predictive of verb sense distinctions.

In (Brown et al., 2011), so-called dynamic dependency neighborhoods (DDN), i.e., the set of verbs that typically collocate with a direct object, are shown to be more helpful than lexical information (e.g., WordNet). The set of typical verbs taking a noun n as a direct object is in fact a strong characterization of semantic similarity, as the nouns m similar to n tend to collocate with the same verbs. This is true also for other syntactic dependencies, among which the direct object dependency is possibly the strongest cue (as shown, for example, in (Dligach and Palmer, 2008)).

In order to generalize the above DDN feature, distributional models are ideal, as they are designed to model all the collocations of a given noun according to large-scale corpus analysis. Their ability to capture lexical similarity is well established in WSD tasks (e.g., (Schutze, 1998)), thesaurus harvesting (Lin, 1998), and semantic role labeling (Croce et al., 2010), as well as in information retrieval (e.g., (Furnas et al., 1988)).
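Stated as code, the DDN intuition is tiny. The sketch below scores two nouns by the Jaccard overlap of the verbs taking them as direct objects; the dobj_verbs table is invented for illustration (a real model would harvest such counts from a parsed corpus, and this is our reading of the feature, not code from (Brown et al., 2011)).

```python
# Verbs observed with each noun in the direct-object slot (hypothetical data).
dobj_verbs = {
    "commission": {"order", "appoint", "instruct", "disband"},
    "committee":  {"appoint", "instruct", "disband", "chair"},
    "lunch":      {"order", "eat", "serve", "skip"},
}


def ddn_similarity(n1, n2):
    """Jaccard overlap of the verb sets taking n1 and n2 as direct objects."""
    v1, v2 = dobj_verbs[n1], dobj_verbs[n2]
    return len(v1 & v2) / len(v1 | v2)


print(ddn_similarity("commission", "committee"))  # 0.6: shared verb contexts
print(ddn_similarity("commission", "lunch"))      # ~0.14: mostly disjoint
```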
Distributional Models (DMs). These models follow the distributional hypothesis (Firth, 1957) and characterize lexical meanings in terms of contexts of use (Wittgenstein, 1953). By inducing geometrical notions of vectors and norms through corpus analysis, they provide a topological definition of semantic similarity, i.e., distance in a space. DMs can capture the similarity between words such as delegation, deputation or company and commission. In the case of sense 60.1 of the verb order, DMs can be used to suggest that the PATIENT role can be filled by all these words, as suitable Organizations. In supervised language learning, when few examples are available, DMs support cost-effective lexical generalizations, often outperforming knowledge-based resources (such as WordNet, as in (Pantel et al., 2007)). Obviously, the choice of the context type determines the type of targeted semantic properties. Wider contexts (e.g., entire documents) have been shown to suggest topical relations. Smaller contexts tend to capture more specific semantic aspects, e.g., syntactic behavior, and better capture paradigmatic relations, such as synonymy. In particular, word space models, as described in (Sahlgren, 2006), define contexts as the words appearing in an n-sized window centered around a target word. Co-occurrence counts are thus collected in a word-by-word matrix, where each element records the number of times two words co-occur within a single window of word tokens. Moreover, robust weighting schemas are used to smooth counts against too-frequent co-occurrence pairs: Pointwise Mutual Information (PMI) scores (Turney and Pantel, 2010) are commonly adopted.

Structural Kernels. [...]

[...] are just related (so they can be different). The contribution of (ii) is proportional to the lexical similarity of the tree lexical nodes, where the latter can be evaluated according to distributional models or lexical resources, e.g., WordNet.

In the following, we define our models based on previous work on LSA and SPTKs.

3.1 LSA as lexical similarity model

Robust representations can be obtained through intelligent dimensionality reduction methods. In LSA, the original word-by-context matrix M is decomposed through Singular Value Decomposition (SVD) (Landauer and Dumais, 1997; Golub and Kahan, 1965) into the product of three new matrices U, S, and V, such that S is diagonal and M = USV^T. M is then approximated by M_k = U_k S_k V_k^T, where only the first k columns of U and V are used [...]
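As a minimal sketch tying these pieces together: build a small word-by-context matrix, reweight it with positive PMI, and reduce it with a truncated SVD as in M_k = U_k S_k V_k^T above. The co-occurrence counts and context words are invented, and representing word i by row i of U_k S_k is one common convention, not necessarily the exact projection used in this paper.

```python
import numpy as np

words = ["delegation", "commission", "lunch", "pizza"]
contexts = ["appoint", "meeting", "audit", "eat", "tasty", "order"]

# Hypothetical counts of each word co-occurring with each context word
# inside an n-sized window.
C = np.array([[5., 6., 4., 0., 0., 1.],   # delegation
              [4., 5., 6., 0., 0., 2.],   # commission
              [0., 1., 0., 6., 4., 5.],   # lunch
              [0., 0., 0., 5., 6., 4.]])  # pizza

# PMI weighting: pmi(w, c) = log( p(w, c) / (p(w) * p(c)) ); negative and
# undefined scores are clipped to 0 (the common "positive PMI" variant).
total = C.sum()
p_wc = C / total
p_w = C.sum(axis=1, keepdims=True) / total
p_c = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
pmi[~np.isfinite(pmi)] = 0.0
M = np.maximum(pmi, 0.0)

# LSA step: M is approximated by M_k = U_k S_k V_k^T using only the first
# k singular values; each word is then represented by a row of U_k S_k.
k = 3
U, S, Vt = np.linalg.svd(M, full_matrices=False)
word_vecs = U[:, :k] * S[:k]


def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)


# Semantic similarity as proximity in the reduced space.
print(cosine(word_vecs[0], word_vecs[1]))  # delegation vs. commission: ~0.9
print(cosine(word_vecs[0], word_vecs[2]))  # delegation vs. lunch: ~0.0
```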