Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 223-230. Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information Sabine Schulte im Walde Chris Brew Institut für Maschinelle Sprachverarbeitung Department of Linguistics Universität Stuttgart The Ohio State University Azenbergstraße 12, 70174 Stuttgart, Germany Columbus, USA, OH 43210-1298 [email protected] [email protected] Abstract since it provides a principled basis for filling gaps in available lexical knowledge. For example, the En- The paper describes the application of k- glish verb classification has been used for applica- Means, a standard clustering technique, tions such as machine translation (Dorr, 1997), word to the task of inducing semantic classes sense disambiguation (Dorr and Jones, 1996), and for German verbs. Using probability document classification (Klavans and Kan, 1998). distributions over verb subcategorisation Various attempts have been made to infer conve- frames, we obtained an intuitively plausi- niently observable morpho-syntactic and semantic ble clustering of 57 verbs into 14 classes. properties for English verb classes (Dorr and Jones, The automatic clustering was evaluated 1996; Lapata, 1999; Stevenson and Merlo, 1999; against independently motivated, hand- Schulte im Walde, 2000; McCarthy, 2001). constructed semantic verb classes. A series of post-hoc cluster analyses ex- To our knowledge this is the first work to ob- plored the influence of specific frames and tain German verb classes automatically. We used frame groups on the coherence of the verb a robust statistical parser (Schmid, 2000) to ac- classes, and supported the tight connec- quire purely syntactic subcategorisation information tion between the syntactic behaviour of for verbs. The information was provided in form the verbs and their lexical meaning com- of probability distributions over verb frames for ponents. each verb. There were two conditions: the first with relatively coarse syntactic verb subcategorisa- tion frames, the second a more delicate classifica- 1 Introduction tion subdividing the verb frames of the first con- A long-standing linguistic hypothesis asserts a tight dition using prepositional phrase information (case connection between the meaning components of a plus preposition). In both conditions verbs were verb and its syntactic behaviour: To a certain ex- clustered using k-Means, an iterative, unsupervised, tent, the lexical meaning of a verb determines its be- hard clustering method with well-known properties, haviour, particularly with respect to the choice of its cf. (Kaufman and Rousseeuw, 1990). The goal of a arguments. The theoretical foundation has been es- series of cluster analyses was (i) to find good values tablished in extensive work on semantic verb classes for the parameters of the clustering process, and (ii) such as (Levin, 1993) for English and (Vázquez to explore the role of the syntactic frame descrip- et al., 2000) for Spanish: each verb class contains tions in verb classification, to demonstrate the im- verbs which are similar in their meaning and in their plicit induction of lexical meaning components from syntactic properties. syntactic properties, and to suggest ways in which From a practical point of view, a verb classifi- the syntactic information might further be refined. cation supports Natural Language Processing tasks, Our long term goal is to support the development of high-quality and large-scale lexical resources. sitional phrases, according to their frequencies in the corpus. Prepositional phrases are referred to by 2 Syntactic Descriptors for Verb Frames case and preposition, such as ‘Dat.mit’, ‘Akk.für’. The resulting lexical subcategorisation for reden and The syntactic subcategorisation frames for German the frame type np whose total joint probability is verbs were obtained by unsupervised learning in a 0.35820, is displayed in Table 2 (for probability val- statistical grammar framework (Schulte im Walde et ues 1%). al., 2001): a German context-free grammar contain- ing frame-predicting grammar rules and information Refined Frame Prob about lexical heads was trained on 25 million words np:Akk.über acc / ‘about’ 0.11981 np:Dat.von dat / ‘about’ 0.11568 of a large German newspaper corpus. The lexi- np:Dat.mit dat / ‘with’ 0.06983 calised version of the probabilistic grammar served np:Dat.in dat / ‘in’ 0.02031 as source for syntactic descriptors for verb frames Table 2: Refined np distribution for reden (Schulte im Walde, 2002b). The verb frame types contain at most three The subcategorisation frame descriptions were for- arguments. Possible arguments in the frames mally evaluated by comparing the automatically are nominative (n), dative (d) and accusative (a) generated verb frames against manual definitions in noun phrases, reflexive pronouns (r), prepositional the German dictionary Duden – Das Stilwörterbuch phrases (p), expletive es (x), non-finite clauses (i), (Dudenredaktion, 2001). The F-score was 65.30% finite clauses (s-2 for verb second clauses, s-dass for with and 72.05% without prepositional phrase in- dass-clauses, s-ob for ob-clauses, s-w for indirect formation: the automatically generated data is both wh-questions), and copula constructions (k). For easy to produce in large quantities and reliable example, subcategorising a direct (accusative case) enough to serve as proxy for human judgement object and a non-finite clause would be represented (Schulte im Walde, 2002a). by nai. We defined a total of 38 subcategorisation frame types, according to the verb subcategorisa- 3 German Semantic Verb Classes tion potential in the German grammar (Helbig and Semantic verb classes have been defined for sev- Buscha, 1998), with few further restrictions on ar- eral languages, with dominant examples concern- gument combination. ing English (Levin, 1993) and Spanish (Vázquez et We extracted verb-frame distributions from the al., 2000). The basic linguistic hypothesis underly- trained lexicalised grammar. Table 1 shows an ing the construction of the semantic classes is that example distribution for the verb glauben ‘to verbs in the same class share both meaning compo- think/believe’ (for probability values 1%). nents and syntactic behaviour, since the meaning of Frame Prob a verb is supposed to influence its behaviour in the ns-dass 0.27945 sentence, especially with regard to the choice of its ns-2 0.27358 np 0.09951 arguments. n 0.08811 We hand-constructed a concise classification with na 0.08046 14 semantic verb classes for 57 German verbs before ni 0.05015 nd 0.03392 we initiated any clustering experiments. We have on nad 0.02325 hand a larger set of verbs and a more elaborate clas- nds-2 0.01011 sification, but choose to work on the smaller set for Table 1: Probability distribution for glauben the moment, since an important component of our research program is an informative post-hoc analysis We also created a more delicate version of subcate- which becomes infeasible with larger datasets. The gorisation frames that discriminates between differ- semantic aspects and majority of verbs are closely ent kinds of pp-arguments. This was done by dis- related to Levin’s English classes. They are consis- tributing the frequency mass of prepositional phrase tent with the German verb classification in (Schu- frame types (np, nap, ndp, npr, xp) over the prepo- macher, 1986) as far as the relevant verbs appear in his less extensive semantic ‘fields’. 4 Clustering Methodology 1. Aspect: anfangen, aufhören, beenden, begin- Clustering is a standard procedure in multivariate nen, enden data analysis. It is designed to uncover an inher- 2. Propositional Attitude: ahnen, denken, ent natural structure of the data objects, and the glauben, vermuten, wissen equivalence classes induced by the clusters provide 3. Transfer of Possession (Obtaining): bekom- a means for generalising over these objects. In our men, erhalten, erlangen, kriegen 4. Transfer of Possession (Supply): bringen, case, clustering is realised on verbs: the data objects liefern, schicken, vermitteln, zustellen are represented by verbs, and the data features for 5. Manner of Motion: fahren, fliegen, rudern, describing the objects are realised by a probability segeln distribution over syntactic verb frame descriptions. 6. Emotion: ärgern, freuen Clustering is applicable to a variety of areas in 7. Announcement: ankündigen, bekanntgeben, Natural Language Processing, e.g. by utilising eröffnen, verkünden class type descriptions such as in machine transla- 8. Description: beschreiben, charakterisieren, tion (Dorr, 1997), word sense disambiguation (Dorr darstellen, interpretieren and Jones, 1996), and document classification (Kla- 9. Insistence: beharren, bestehen, insistieren, vans and Kan, 1998), or by applying clusters for pochen smoothing such as in machine translation (Prescher 10. Position: liegen, sitzen, stehen et al., 2000), or probabilistic grammars (Riezler et 11. Support: dienen, folgen, helfen, unterstützen al., 2000). 12. Opening: öffnen, schließen We performed clustering by the k-Means algo- 13. Consumption: essen, konsumieren, lesen, rithm as proposed by (Forgy, 1965), which is an un- saufen, trinken supervised hard clustering method assigning data 14. Weather: blitzen, donnern, dämmern, nieseln, objects to exactly ¡ clusters. Initial verb clusters are regnen, schneien iteratively re-organised by assigning each verb to its The class size is between 2 and 6, no verb