Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 223-230.

Inducing German Semantic Classes from Purely Syntactic Subcategorisation Information

Sabine Schulte im Walde
Institut für Maschinelle Sprachverarbeitung
Universität Stuttgart
Azenbergstraße 12, 70174 Stuttgart, Germany
[email protected]

Chris Brew
Department of Linguistics
The Ohio State University
Columbus, OH 43210-1298, USA
[email protected]

Abstract

The paper describes the application of k-Means, a standard clustering technique, to the task of inducing semantic classes for German verbs. Using probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 57 verbs into 14 classes. The automatic clustering was evaluated against independently motivated, hand-constructed semantic verb classes. A series of post-hoc cluster analyses explored the influence of specific frames and frame groups on the coherence of the verb classes, and supported the tight connection between the syntactic behaviour of the verbs and their lexical meaning components.

1 Introduction

A long-standing linguistic hypothesis asserts a tight connection between the meaning components of a verb and its syntactic behaviour: to a certain extent, the lexical meaning of a verb determines its behaviour, particularly with respect to the choice of its arguments. The theoretical foundation has been established in extensive work on semantic verb classes such as (Levin, 1993) for English and (Vázquez et al., 2000) for Spanish: each verb class contains verbs which are similar both in their meaning and in their syntactic properties.

From a practical point of view, a verb classification supports Natural Language Processing tasks, since it provides a principled basis for filling gaps in available lexical knowledge. For example, the English verb classification has been used for applications such as machine translation (Dorr, 1997), word sense disambiguation (Dorr and Jones, 1996), and document classification (Klavans and Kan, 1998). Various attempts have been made to infer conveniently observable morpho-syntactic and semantic properties for English verb classes (Dorr and Jones, 1996; Lapata, 1999; Stevenson and Merlo, 1999; Schulte im Walde, 2000; McCarthy, 2001).

To our knowledge this is the first work to obtain German verb classes automatically. We used a robust statistical parser (Schmid, 2000) to acquire purely syntactic subcategorisation information for verbs. The information was provided in the form of probability distributions over verb frames for each verb. There were two conditions: the first used relatively coarse syntactic verb subcategorisation frames, the second a more delicate classification subdividing the verb frames of the first condition using prepositional phrase information (case plus preposition). In both conditions verbs were clustered using k-Means, an iterative, unsupervised, hard clustering method with well-known properties, cf. (Kaufman and Rousseeuw, 1990). The goal of a series of cluster analyses was (i) to find good values for the parameters of the clustering process, and (ii) to explore the role of the syntactic frame descriptions in verb classification, to demonstrate the implicit induction of lexical meaning components from syntactic properties, and to suggest ways in which the syntactic information might further be refined. Our long-term goal is to support the development of high-quality and large-scale lexical resources.
2 Syntactic Descriptors for Verb Frames

The syntactic subcategorisation frames for German verbs were obtained by unsupervised learning in a statistical grammar framework (Schulte im Walde et al., 2001): a German context-free grammar containing frame-predicting grammar rules and information about lexical heads was trained on 25 million words of a large German newspaper corpus. The lexicalised version of the probabilistic grammar served as source for syntactic descriptors for verb frames (Schulte im Walde, 2002b).

The verb frame types contain at most three arguments. Possible arguments in the frames are nominative (n), dative (d) and accusative (a) noun phrases, reflexive pronouns (r), prepositional phrases (p), expletive es (x), non-finite clauses (i), finite clauses (s-2 for verb second clauses, s-dass for dass-clauses, s-ob for ob-clauses, s-w for indirect wh-questions), and copula constructions (k). For example, a verb subcategorising a direct (accusative) object and a non-finite clause would be represented by nai. We defined a total of 38 subcategorisation frame types, according to the verb subcategorisation potential in the German grammar of Helbig and Buscha (1998), with few further restrictions on argument combination.

We extracted verb-frame distributions from the trained lexicalised grammar. Table 1 shows an example distribution for the verb glauben 'to think/believe' (for probability values ≥ 1%).

Frame     Prob
ns-dass   0.27945
ns-2      0.27358
np        0.09951
n         0.08811
na        0.08046
ni        0.05015
nd        0.03392
nad       0.02325
nds-2     0.01011

Table 1: Probability distribution for glauben

We also created a more delicate version of subcategorisation frames that discriminates between different kinds of pp-arguments. This was done by distributing the frequency mass of the prepositional phrase frame types (np, nap, ndp, npr, xp) over the prepositional phrases, according to their frequencies in the corpus. Prepositional phrases are referred to by case and preposition, such as 'Dat.mit', 'Akk.für'. The resulting lexical subcategorisation for reden 'to talk' and the frame type np, whose total joint probability is 0.35820, is displayed in Table 2 (for probability values ≥ 1%).

Refined Frame                 Prob
np:Akk.über   acc / 'about'   0.11981
np:Dat.von    dat / 'about'   0.11568
np:Dat.mit    dat / 'with'    0.06983
np:Dat.in     dat / 'in'      0.02031

Table 2: Refined np distribution for reden

The subcategorisation frame descriptions were formally evaluated by comparing the automatically generated verb frames against manual definitions in the German dictionary Duden – Das Stilwörterbuch (Dudenredaktion, 2001). The F-score was 65.30% with and 72.05% without prepositional phrase information: the automatically generated data is both easy to produce in large quantities and reliable enough to serve as a proxy for human judgement (Schulte im Walde, 2002a).
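The refinement step itself is a simple proportional split. The following minimal Python sketch illustrates it; the corpus counts are invented for the example (with the true corpus counts, the split yields the values in Table 2):

```python
# Sketch of the condition 2 refinement: the probability mass of a coarse
# frame type (np for reden, joint probability 0.35820) is distributed over
# its case+preposition variants in proportion to their corpus frequencies.
# The counts below are hypothetical.

def refine_pp_frame(frame_prob, pp_counts):
    """Split frame_prob over PPs according to their relative frequencies."""
    total = sum(pp_counts.values())
    return {pp: frame_prob * count / total for pp, count in pp_counts.items()}

np_prob = 0.35820                        # joint probability of np for reden
pp_counts = {"Akk.über": 167, "Dat.von": 161, "Dat.mit": 97,
             "Dat.in": 28, "Dat.an": 9}  # hypothetical corpus counts

for pp, p in sorted(refine_pp_frame(np_prob, pp_counts).items(),
                    key=lambda kv: -kv[1]):
    if p >= 0.01:                        # report probability values >= 1%
        print(f"np:{pp}\t{p:.5f}")
```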
verbs in the same class share both meaning compo- think/believe’ (for probability values 1%). nents and syntactic behaviour, since the meaning of Frame Prob a verb is supposed to influence its behaviour in the ns-dass 0.27945 sentence, especially with regard to the choice of its ns-2 0.27358 np 0.09951 arguments. n 0.08811 We hand-constructed a concise classification with na 0.08046 14 semantic verb classes for 57 German verbs before ni 0.05015 nd 0.03392 we initiated any clustering experiments. We have on nad 0.02325 hand a larger set of verbs and a more elaborate clas- nds-2 0.01011 sification, but choose to work on the smaller set for Table 1: Probability distribution for glauben the moment, since an important component of our research program is an informative post-hoc analysis We also created a more delicate version of subcate- which becomes infeasible with larger datasets. The gorisation frames that discriminates between differ- semantic aspects and majority of verbs are closely ent kinds of pp-arguments. This was done by dis- related to Levin’s English classes. They are consis- tributing the frequency mass of prepositional phrase tent with the German verb classification in (Schu- frame types (np, nap, ndp, npr, xp) over the prepo- macher, 1986) as far as the relevant verbs appear in his less extensive semantic ‘fields’. 4 Clustering Methodology 1. Aspect: anfangen, aufhören, beenden, begin- Clustering is a standard procedure in multivariate nen, enden data analysis. It is designed to uncover an inher- 2. Propositional Attitude: ahnen, denken, ent natural structure of the data objects, and the glauben, vermuten, wissen equivalence classes induced by the clusters provide 3. Transfer of Possession (Obtaining): bekom- a means for generalising over these objects. In our men, erhalten, erlangen, kriegen 4. Transfer of Possession (Supply): bringen, case, clustering is realised on verbs: the data objects liefern, schicken, vermitteln, zustellen are represented by verbs, and the data features for 5. Manner of Motion: fahren, fliegen, rudern, describing the objects are realised by a probability segeln distribution over syntactic verb frame descriptions. 6. Emotion: ärgern, freuen Clustering is applicable to a variety of areas in 7. Announcement: ankündigen, bekanntgeben, Natural Language Processing, e.g. by utilising eröffnen, verkünden class type descriptions such as in machine transla- 8. Description: beschreiben, charakterisieren, tion (Dorr, 1997), word sense disambiguation (Dorr darstellen, interpretieren and Jones, 1996), and document classification (Kla- 9. Insistence: beharren, bestehen, insistieren, vans and Kan, 1998), or by applying clusters for pochen smoothing such as in machine translation (Prescher 10. Position: liegen, sitzen, stehen et al., 2000), or probabilistic grammars (Riezler et 11. Support: dienen, folgen, helfen, unterstützen al., 2000). 12. Opening: öffnen, schließen We performed clustering by the k-Means algo- 13. Consumption: essen, konsumieren, lesen, rithm as proposed by (Forgy, 1965), which is an un-

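For later use in the evaluation sketches of Section 5, the gold standard can be written down directly as a partition of the 57 verbs (a sketch; class labels abbreviated from the list above):

```python
# The hand-constructed gold standard: class label -> verbs.
GOLD = {
    "Aspect": ["anfangen", "aufhören", "beenden", "beginnen", "enden"],
    "Propositional Attitude": ["ahnen", "denken", "glauben", "vermuten", "wissen"],
    "Obtaining": ["bekommen", "erhalten", "erlangen", "kriegen"],
    "Supply": ["bringen", "liefern", "schicken", "vermitteln", "zustellen"],
    "Manner of Motion": ["fahren", "fliegen", "rudern", "segeln"],
    "Emotion": ["ärgern", "freuen"],
    "Announcement": ["ankündigen", "bekanntgeben", "eröffnen", "verkünden"],
    "Description": ["beschreiben", "charakterisieren", "darstellen", "interpretieren"],
    "Insistence": ["beharren", "bestehen", "insistieren", "pochen"],
    "Position": ["liegen", "sitzen", "stehen"],
    "Support": ["dienen", "folgen", "helfen", "unterstützen"],
    "Opening": ["öffnen", "schließen"],
    "Consumption": ["essen", "konsumieren", "lesen", "saufen", "trinken"],
    "Weather": ["blitzen", "donnern", "dämmern", "nieseln", "regnen", "schneien"],
}
```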
4 Clustering Methodology

Clustering is a standard procedure in multivariate data analysis. It is designed to uncover an inherent natural structure of the data objects, and the equivalence classes induced by the clusters provide a means for generalising over these objects. In our case, clustering is performed on verbs: the data objects are represented by verbs, and the data features describing the objects are realised by a probability distribution over syntactic verb frame descriptions. Clustering is applicable to a variety of areas in Natural Language Processing, e.g. by utilising class type descriptions, as in machine translation (Dorr, 1997), word sense disambiguation (Dorr and Jones, 1996) and document classification (Klavans and Kan, 1998), or by applying clusters for smoothing, as in machine translation (Prescher et al., 2000) or probabilistic grammars (Riezler et al., 2000).

We performed clustering with the k-Means algorithm as proposed by Forgy (1965), an unsupervised hard clustering method that assigns the data objects to exactly k clusters. Initial verb clusters are iteratively re-organised by assigning each verb to its closest cluster (centroid) and re-calculating the cluster centroids until no further changes take place.

One parameter of the clustering process is the distance measure used. Standard choices include the cosine, Euclidean distance, Manhattan metric, and variants of the Kullback-Leibler (KL) divergence, given in Equation (1). We concentrated on two variants of KL: the information radius, cf. Equation (2), and the skew divergence, recently shown to be an effective measure for distributional similarity (Lee, 2001), cf. Equation (3).

$D(p \| q) = \sum_x p(x) \, \log \frac{p(x)}{q(x)}$   (1)

$d_{irad}(p, q) = D\Big(p \,\Big\|\, \frac{p+q}{2}\Big) + D\Big(q \,\Big\|\, \frac{p+q}{2}\Big)$   (2)

$d_{skew}(p, q) = D\big(p \,\big\|\, w \cdot q + (1-w) \cdot p\big)$   (3)

Measures (2) and (3) can tolerate zero values in the probability distributions, because they work with a weighted average of the two distributions compared. For the skew divergence, we set the weight $w$ to 0.9, as was done by Lee.
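In code, the three measures look as follows (a minimal Python sketch; distributions are represented as equal-length probability vectors over the frame types, and the function names are ours):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q), Equation (1).
    Assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def irad(p, q):
    """Information radius, Equation (2): KL of each distribution against
    their average, which is non-zero wherever p or q is non-zero."""
    avg = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, avg) + kl(q, avg)

def skew(p, q, w=0.9):
    """Skew divergence, Equation (3): KL against a w-weighted mixture,
    so zeroes in q are smoothed by p itself (w = 0.9, following Lee)."""
    mix = [w * qi + (1 - w) * pi for pi, qi in zip(p, q)]
    return kl(p, mix)
```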

Because the k-Means algorithm is sensitive to its starting clusters, we furthermore explored the option of initialising the cluster centres based on other clustering algorithms. We performed agglomerative hierarchical clustering on the verbs, which first assigns each verb to its own cluster and then iteratively determines the two closest clusters and merges them, until the specified number of clusters is left. We tried several amalgamation methods: single-linkage, complete-linkage, average verb distance, distance between cluster centroids, and Ward's method.

The clustering was performed as follows: the 57 verbs were associated with probability distributions over frame types¹ (in condition 1 there were 38 frame types, while in the more delicate condition 2 there were 171, with a concomitant increase in data sparseness), and assigned to starting clusters (randomly or by hierarchical clustering). The k-Means algorithm was then allowed to run for as many iterations as it takes to reach a fixed point, and the resulting clusters were interpreted and evaluated against the manual classes.

¹ We also tried various transformations and variations of the probabilities, such as frequencies and binarisation, but none proved as effective as the probabilities.

Related work on English verb classification or clustering utilised supervised learning by decision trees (Stevenson and Merlo, 1999), or a method related to hierarchical clustering (Schulte im Walde, 2000).
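For concreteness, the procedure described above can be sketched as follows (a minimal sketch, not the original implementation; taking cluster centroids to be the means of the member distributions is our reading of "re-calculating cluster centroids", and distance is one of the functions from the sketch earlier in this section):

```python
import random

def kmeans(verbs, dists, k, distance, max_iter=100):
    """Hard k-Means over verb-frame probability distributions.
    verbs: list of verb names; dists[v]: probability vector for verb v."""
    shuffled = verbs[:]
    random.shuffle(shuffled)                   # random initialisation
    clusters = [shuffled[i::k] for i in range(k)]
    for _ in range(max_iter):
        # a centroid is the component-wise mean of its members' distributions
        centroids = [[sum(col) / len(c) for col in zip(*(dists[v] for v in c))]
                     for c in clusters if c]
        new = [[] for _ in centroids]
        for v in verbs:                        # re-assign each verb to the
            best = min(range(len(centroids)),  # closest cluster centroid
                       key=lambda j: distance(dists[v], centroids[j]))
            new[best].append(v)
        if [sorted(c) for c in new] == [sorted(c) for c in clusters if c]:
            break                              # fixed point reached
        clusters = new
    return [c for c in clusters if c]
```

Hierarchical initialisation simply replaces the random shuffle by the output of the agglomerative step.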

5 Clustering Evaluation

The task of evaluating the result of a cluster analysis against the known gold standard of hand-constructed verb classes requires us to assess the similarity between two sets of equivalence relations. As noted by Strehl et al. (2000), it is useful to have an evaluation measure that does not depend on the choice of similarity measure or on the original dimensionality of the input data, since that allows meaningful comparison of results for which these parameters vary. This is similar to the perspective of Vilain et al. (1995), who present, in the context of the MUC co-reference evaluation scheme, a model-theoretic measure of the similarity between equivalence classes.

Strehl et al. consider a clustering $C$ that partitions the $n$ objects $x_1, \ldots, x_n$ into $k$ clusters; the clusters $C_u$ of $C$ are the sets $\{x_i \mid C(x_i) = u\}$. We call the cluster result $C$ and the desired gold standard $K$, with classes $K_v$, $v = 1, \ldots, l$. For measuring the quality of an individual cluster, the cluster purity of each cluster $C_u$ is defined by its largest $n_u^v$, the number of members of $C_u$ that are projected into the same class $K_v$. The measure is biased towards small clusters, with the extreme case of singleton clusters, which is an undesired property for our (linguistic) needs.

To capture the quality of a whole clustering, Strehl et al. combine the mutual information between $C$ and $K$ (based on the shared verb memberships $n_u^v$) with a scaling factor corresponding to the numbers of verbs in the respective clusters, $n_u = |C_u|$ and $n^v = |K_v|$:

$MI(C, K) = \frac{1}{n} \sum_{u=1}^{k} \sum_{v=1}^{l} n_u^v \, \log_{k \cdot l} \left( \frac{n_u^v \cdot n}{n_u \cdot n^v} \right)$   (4)

This manipulation is designed to remove the bias towards small clusters:² using the 57 verbs from our study we generated 50 random clusterings for each cluster size between 1 and 57, and evaluated the results against the gold standard, returning the best result for each replication. We found that even with the scaling factor the measure favours smaller clusters. But this bias is strongest at the extremes of the range, and does not appear to impact too heavily on our results.

² In the absence of the penalty, mutual information would attain its maximum (which is the entropy of $K$) not only when $C$ is correct but also when $C$ contains only singleton clusters.

Unfortunately, none of Strehl et al.'s measures have all the properties which we intuitively require from a measure of linguistic cluster quality. For example, if we restrict attention to the case in which all verbs in an inferred cluster are drawn from the same actual class, we would like the evaluation measure to be a monotonically increasing function of the size of the inferred cluster. We therefore introduced an additional, more suitable measure for the evaluation of individual clusters, based on the representation of equivalence classes as sets of pairs. It turns out that pairwise precision and recall have some of the counter-intuitive properties that we objected to in Strehl et al.'s measures, so we adjust pairwise precision with a scaling factor based on the size of the hypothesised cluster:

$APP(C_u) = \frac{\text{number of correct pairs in } C_u}{\text{number of verbs in } C_u + 1}$   (5)

We call this measure APP, for adjusted pairwise precision. As with any other measure of individual cluster quality, we can associate a quality value with a whole clustering, which assigns each of the items $x_i$ to a cluster $C(x_i)$, by taking a weighted average over the qualities of the individual clusters:

$APP(C) = \frac{1}{n} \sum_{i=1}^{n} APP(C(x_i))$   (6)
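Both whole-clustering measures can be computed directly from the definitions above (a minimal sketch with our function names; clusters and gold classes are lists of verb lists, e.g. list(GOLD.values()) from Section 3):

```python
import math

def scaled_mi(clusters, gold):
    """Equation (4): mutual information between clustering and gold
    standard, scaled (log base k*l) to penalise degenerate clusterings."""
    k, l = len(clusters), len(gold)
    n = sum(len(c) for c in clusters)
    base = math.log(k * l)
    score = 0.0
    for c in clusters:
        for g in gold:
            n_uv = len(set(c) & set(g))           # shared members n_u^v
            if n_uv:
                score += n_uv * math.log(n_uv * n / (len(c) * len(g))) / base
    return score / n

def app(clusters, gold):
    """Equations (5) and (6): adjusted pairwise precision per cluster,
    averaged over all verbs (i.e. weighted by cluster size)."""
    klass = {v: i for i, g in enumerate(gold) for v in g}
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        correct = sum(1 for i, v in enumerate(c) for w in c[i + 1:]
                      if klass[v] == klass[w])    # correct pairs in cluster
        total += len(c) * correct / (len(c) + 1)  # n_u * APP(C_u)
    return total / n
```

Note that a singleton cluster contains no pairs and so scores 0, while for a pure cluster APP grows monotonically with cluster size, which is exactly the property demanded above; the per-cluster APP values reappear in parentheses in the cluster listing of Section 6.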

Figures 1 and 2 summarise the two evaluation measures for overall cluster quality, showing the variation with the KL-based distance measures and with different strategies for seeding the initial cluster centres in the k-Means algorithm. Figure 1 displays quality scores for the coarse condition 1 subcategorisation frame types; Figure 2 refers to the clustering results obtained with verb descriptions based on the more delicate condition 2 subcategorisation frame types including PP information. Baseline values are 0.017 (APP) and 0.229 (MI), calculated as the average over the evaluation of 10 random clusterings. Optimum values, as calculated on the manual classification, are 0.291 (APP) and 0.493 (MI). The evaluation function is extremely non-linear, which leads to a severe loss of quality with the first few clustering mistakes, but does not penalise later mistakes to the same extent.

                          k-Means cluster centre initialisation
                                   |            hierarchical
distance   evaluation   random    | single   complete   average   centroid   ward
irad       APP          0.125     | 0.043    0.087      0.079     0.073      0.101
irad       MI           0.328     | 0.226    0.277      0.262     0.250      0.304
skew       APP          0.111     | 0.043    0.091      0.067     0.062      0.102
skew       MI           0.315     | 0.226    0.281      0.256     0.252      0.349

Figure 1: Cluster quality variation based on condition 1 verb descriptions

                          k-Means cluster centre initialisation
                                   |            hierarchical
distance   evaluation   random    | single   complete   average   centroid   ward
irad       APP          0.144     | 0.107    0.123      0.118     0.081      0.151
irad       MI           0.357     | 0.229    0.319      0.298     0.265      0.332
skew       APP          0.114     | 0.104    0.126      0.118     0.081      0.159
skew       MI           0.320     | 0.289    0.330      0.298     0.265      0.372

Figure 2: Cluster quality variation based on condition 2 verb descriptions

From the methodological point of view, the clustering evaluation gave interesting insights into k-Means' behaviour on the syntactic frame data. The more delicate verb-frame classification, i.e. the refinement of the syntactic verb frame descriptions by prepositional phrase specification, improved the clustering results. This does not go without saying: there was potential for a sparse data problem, since even frequent verbs can only be expected to inhabit a few frames. For example, the verb anfangen, with a corpus frequency of 2,554, has zero counts for 138 of the 171 frames. Whether the improvement really matters in an application task is left to further research.

We found that randomised starting clusters usually give better results than initialisation from a hierarchical clustering. Hierarchies imposing a strong structure on the clustering (such as single-linkage: the output clusterings contain few very large and many singleton clusters) are hardly improved by k-Means; their evaluation results are noticeably below those for random clusters. But initialisation using Ward's method, which produces tighter clusters and a narrower range of cluster sizes, does outperform random cluster initialisation. Presumably the issue is that the other hierarchical clustering methods place k-Means in a local minimum from which it cannot escape, and that uniformly shaped cluster initialisation gives k-Means a better chance of avoiding local minima, even with a high degree of perturbation.

6 Linguistic Investigation

The clustering setup, procedure and results provide a basis for a linguistic investigation concerning the German verbs, their syntactic properties and semantic classification.

The following clustering result is an intuitively plausible semantic verb classification, accompanied by the cluster quality scores APP (Equation (5)) and class labels illustrating the majority vote of the verbs in the cluster.³ The cluster analysis was obtained by running k-Means on a random cluster initialisation, with information radius as distance measure; the verb description contained condition 2 subcategorisation frame types with PP information.

a) ahnen, vermuten, wissen (0.75) Propositional Attitude
b) denken, glauben (0.33) Propositional Attitude
c) anfangen, aufhören, beginnen, beharren, enden, insistieren, rudern (0.88) Aspect
d) liegen, sitzen, stehen (0.75) Position
e) dienen, folgen, helfen (0.75) Support
f) nieseln, regnen, schneien (0.75) Weather
g) dämmern (0.00) Weather
h) blitzen, donnern, segeln (0.25) Weather
i) bestehen, fahren, fliegen, pochen (0.40) Insistence or Manner of Motion
j) freuen, ärgern (0.33) Emotion
k) essen, konsumieren, saufen, trinken, verkünden (1.00) Consumption
l) bringen, eröffnen, lesen, liefern, schicken, schließen, vermitteln, öffnen (0.78) Supply
m) ankündigen, beenden, bekanntgeben, bekommen, beschreiben, charakterisieren, darstellen, erhalten, erlangen, interpretieren, kriegen, unterstützen (1.00) Description and Obtaining
n) zustellen (0.00) Supply

³ Verbs that are part of the majority are shown in bold face, others in plain text. Where there is no clear majority, both class labels are given.

We compared the clustering to the gold standard and examined the underlying verb frame distributions. We undertook a series of post-hoc cluster analyses to explore the influence of specific frames and frame groups on the formation of verb classes, asking questions such as: what is the difference in the clustering result (on the same starting clusters) if we delete all frame types containing an expletive es (frame types including x)? Space limitations allow us only a few insights.
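Technically, each such analysis removes a frame group from the verb descriptions and re-runs k-Means from the same starting clusters; a minimal sketch of the manipulation (renormalising the remaining probability mass is our assumption):

```python
def delete_frame_group(dists, frames, matches):
    """Drop all frame types for which matches(frame) is true, and
    renormalise each verb's distribution over the remaining frames."""
    keep = [i for i, f in enumerate(frames) if not matches(f)]
    reduced = {}
    for verb, p in dists.items():
        q = [p[i] for i in keep]
        mass = sum(q)
        reduced[verb] = [x / mass for x in q] if mass > 0 else q
    return [frames[i] for i in keep], reduced

# e.g. delete all frame types containing expletive es:
# frames_nox, dists_nox = delete_frame_group(dists, frames, lambda f: "x" in f)
```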

Clusters (a) and (b) are pure sub-classes of the semantic verb class Propositional Attitude. The verbs agree in their syntactic subcategorisation of a direct object (na) and finite clauses (ns-2, ns-dass); denken and glauben are assigned to a different cluster, because they also appear as intransitives, subcategorise the prepositional phrase Akk.an, and show especially strong probabilities for ns-2. Deleting na or frames containing s from the verb description destroys the coherent clusters.

Cluster (c) contains two sub-classes from Aspect and Insistence, polluted by the verb rudern 'to row'. All Aspect verbs show a 50% preference for an intransitive usage, and a minor 20% preference for the subcategorisation of non-finite clauses. By mistake, the infrequent verb rudern (corpus frequency 49) shows a similar preference for ni in its frame distribution and therefore appears within the same cluster as the Aspect verbs. The frame confusion has been caused by parsing mistakes for the infrequent verb; ni is not among the frames possibly subcategorised by rudern. Even though the verbs beharren and insistieren have the characteristic frames np:Dat.auf and ns-2, they share an affinity for n with the Aspect verbs. When eliminating n from the feature description of the verbs, the cluster is reduced to those verbs using ni.

Cluster (d) is correct: Position. The syntactic usage of the three verbs is rather individual, with strong probabilities for n, np:Dat.auf and np:Dat.in. Even the elimination of any of the three frame features does not cause a separation of the verbs in the clustering.

Cluster (j) represents the semantic class Emotion which, in German, has a highly characteristic signature in its strong association with reflexive frames; the cluster evaporates if we remove the distinctions made in the r feature group.

zustellen in cluster (n) represents a singleton because of its extraordinarily strong preference (≥ 50%) for the ditransitive usage. Eliminating the ditransitive frame from the verb description assigns zustellen to the same cluster as the other verbs of Transfer of Possession (Supply).

Recall that we used two different sets of syntactic frames, the second of which makes more delicate distinctions in the area of prepositional phrases. As pointed out in Section 5, refining the syntactic verb information by PPs was helpful for the semantic clustering. But, contrary to our original intuitions, the detailed prepositional phrase information is less useful in the clustering of verbs with obligatory PP arguments than in the clustering of verbs where the PPs are optional. We performed a first test on the role of PP information: eliminating all PP information from the verb descriptions (not only the delicate PP information in condition 2, but also the PP argument information in the coarse condition 1 frames) produced obvious deficiencies in most of the semantic classes, among them Weather and Support, whose verbs do not require PPs as arguments. A second test confirmed the finding: we augmented our coarse-grained verb frame repertoire with a much reduced set of PPs, those commonly assumed as argument PPs. This provides some but not all of the PP information in condition 2. The clustering result is deficient mainly in its classification of the verbs of Propositional Attitude, Support and Opening, and few of these subcategorise for PPs.

Clusters such as (k) and (l) suggest directions in which it might be desirable to subdivide the verb frames, for example by adding a limited amount of information about selectional preferences. Previous work has shown that sparse data issues preclude across-the-board incorporation of selectional information (Schulte im Walde, 2000), but a rough distinction such as physical object vs. abstraction on the direct object slot could, for example, help to split verkünden from the other verbs in cluster (k).

The linguistic investigation gives some insight into the reasons for the success of our (rather simple) clustering technique. We successfully exploited the connection between the syntactic behaviour of a verb and its meaning components. The clustering result shows a good match to the manually defined semantic verb classes, and in many cases it is clear which frames are influential in the creation of which clusters, and how. We showed that we acquired implicit components of meaning through syntactic extraction from a corpus, since the semantic verb classes are strongly related to the patterns in the syntactic descriptors. Everything in this study suggests that the move to larger datasets is an appropriate next step.

7 Conclusion

The paper presented the application of k-Means to the task of inducing semantic classes for German verbs. Based on purely syntactic probability distributions over verb subcategorisation frames, we obtained an intuitively plausible clustering of 57 verbs into 14 classes. The automatic clustering was evaluated against hand-constructed semantic verb classes. A series of post-hoc cluster analyses explored the influence of specific frames and frame groups on the coherence of the verb classes, and supported the tight connection between the syntactic behaviour of the verbs and their meaning components.

Future work will concern the extension of the clustering experiments to a larger number of verbs, both for the scientific purpose of refining our understanding of the semantic and syntactic status of verb classes and for the more applied goal of creating a large, reliable and high-quality lexical resource for German. For this task, we will need to further refine our verb classes, further develop the repertoire of syntactic frames which we use, perhaps improve the statistical grammar from which the frames were extracted, and find techniques which allow us to selectively include such information about selectional preferences as is warranted by the availability of training data and the capabilities of clustering technology.

References

Bonnie J. Dorr and Doug Jones. 1996. Role of Word Sense Disambiguation in Lexical Acquisition: Predicting Semantics from Syntactic Cues. In Proceedings of the 16th International Conference on Computational Linguistics, Copenhagen, Denmark.

Bonnie Dorr. 1997. Large-Scale Dictionary Construction for Foreign Language Tutoring and Interlingual Machine Translation. Machine Translation, 12(4):271–322.

Dudenredaktion, editor. 2001. DUDEN – Das Stilwörterbuch. Number 2 in 'Duden in zwölf Bänden'. Dudenverlag, Mannheim, 8th edition.

E. W. Forgy. 1965. Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classifications. Biometrics, 21:768–780.

Gerhard Helbig and Joachim Buscha. 1998. Deutsche Grammatik. Langenscheidt – Verlag Enzyklopädie, 18th edition.

Leonard Kaufman and Peter J. Rousseeuw. 1990. Finding Groups in Data – An Introduction to Cluster Analysis. Probability and Mathematical Statistics. John Wiley and Sons, Inc.

Judith L. Klavans and Min-Yen Kan. 1998. The Role of Verbs in Document Analysis. In Proceedings of the 17th International Conference on Computational Linguistics, Montreal, Canada.

Maria Lapata. 1999. Acquiring Lexical Generalizations from Corpora: A Case Study for Diathesis Alternations. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 397–404.

Lillian Lee. 2001. On the Effectiveness of the Skew Divergence for Statistical Language Analysis. Artificial Intelligence and Statistics, pages 65–72.

Beth Levin. 1993. English Verb Classes and Alternations. The University of Chicago Press, Chicago, 1st edition.

Diana McCarthy. 2001. Lexical Acquisition at the Syntax-Semantics Interface: Diathesis Alternations, Subcategorization Frames and Selectional Preferences. Ph.D. thesis, University of Sussex.

Detlef Prescher, Stefan Riezler, and Mats Rooth. 2000. Using a Probabilistic Class-Based Lexicon for Lexical Ambiguity Resolution. In Proceedings of the 18th International Conference on Computational Linguistics, Saarbrücken.

Stefan Riezler, Detlef Prescher, Jonas Kuhn, and Mark Johnson. 2000. Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong.

Helmut Schmid. 2000. LoPar: Design and Implementation. Arbeitspapiere des Sonderforschungsbereichs 340 'Linguistic Theory and the Foundations of Computational Linguistics' 149, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Sabine Schulte im Walde, Helmut Schmid, Mats Rooth, Stefan Riezler, and Detlef Prescher. 2001. Statistical Grammar Models and Lexicon Acquisition. In Christian Rohrer, Antje Rossdeutscher, and Hans Kamp, editors, Linguistic Form and its Computation. CSLI Publications, Stanford, CA.

Sabine Schulte im Walde. 2000. Clustering Verbs Semantically According to their Alternation Behaviour. In Proceedings of the 18th International Conference on Computational Linguistics, pages 747–753, Saarbrücken, Germany.

Sabine Schulte im Walde. 2002a. Evaluating Verb Subcategorisation Frames learned by a German Statistical Grammar against Manual Definitions in the Duden Dictionary. In Proceedings of the 10th EURALEX International Congress, Copenhagen, Denmark. To appear.

Sabine Schulte im Walde. 2002b. A Subcategorisation Lexicon for German Verbs induced from a Lexicalised PCFG. In Proceedings of the 3rd Conference on Language Resources and Evaluation, Las Palmas de Gran Canaria, Spain. To appear.

Helmut Schumacher. 1986. Verben in Feldern. de Gruyter, Berlin.

Suzanne Stevenson and Paola Merlo. 1999. Automatic Verb Classification Using Distributions of Grammatical Features. In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, pages 45–52.

Alexander Strehl, Joydeep Ghosh, and Raymond Mooney. 2000. Impact of Similarity Measures on Web-page Clustering. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI 2000): Workshop of Artificial Intelligence for Web Search, Austin, Texas.

Gloria Vázquez, Ana Fernández, Irene Castellón, and M. Antonia Martí. 2000. Clasificación Verbal: Alternancias de Diátesis. Number 3 in Quaderns de Sintagma. Universitat de Lleida.

Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, and Lynette Hirschman. 1995. A Model-Theoretic Coreference Scoring Scheme. In Proceedings of the 6th Message Understanding Conference, pages 45–52, San Francisco.