Proceedings of the Society for Computation in Linguistics

Volume 2 Article 16

2019 Identifying Participation of Individual Verbs or VerbNet Classes in the Causative Alternation Esther Seyffarth Heinrich Heine University Düsseldorf, [email protected]

Follow this and additional works at: https://scholarworks.umass.edu/scil Part of the Computational Linguistics Commons

Recommended Citation Seyffarth, Esther (2019) "Identifying Participation of Individual Verbs or VerbNet Classes in the Causative Alternation," Proceedings of the Society for Computation in Linguistics: Vol. 2 , Article 16. DOI: https://doi.org/10.7275/efvz-jy59 Available at: https://scholarworks.umass.edu/scil/vol2/iss1/16

This Paper is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Proceedings of the Society for Computation in Linguistics by an authorized editor of ScholarWorks@UMass Amherst. For more information, please contact [email protected]. Identifying Participation of Individual Verbs or VerbNet Classes in the Causative Alternation

Esther Seyffarth Heinrich Heine University Düsseldorf Düsseldorf, Germany [email protected]

Abstract meaning. The choice of a transitive or intransitive syntactic frame has an impact on the semantic roles Verbs that participate in diathesis alternations that are part of the meaning of the verb. Consider have different semantics in their different syn- sentences (1) and (2). tactic environments, which need to be distin- guished in order to process these verbs and (1) The number of students has decreased. their contexts correctly. We design and imple- ment 8 approaches to the automatic identifica- (2) We have decreased the number of students. tion of the causative alternation in English (3 based on VerbNet classes, 5 based on individ- Both sentences make a statement about the number ual verbs). For verbs in this alternation, the se- of students having changed. In (1), the only se- mantic roles that contribute to the meaning of mantic role that is explicitly encoded is the THEME the verb can be associated with different syn- (the thing that has decreased). In addition to this tactic slots. Our most successful approaches role, sentence (2) also specifies an AGENT that has use distributional vectors and achieve an F1 score of up to 79% on a balanced test set. We causative control over the event. also apply our approaches to the distinction be- The automatic identification of verbs that par- tween the causative alternation and the unex- ticipate in the causative alternation is not trivial, pressed object alternation. Our best system for because the semantic roles involved in the events this is based on syntactic information, with an described by the different uses of these verbs are F1 score of 75% on a balanced test set. encoded in the syntactic frames in different ways. In the sentences above, the THEME is located in the 1 Introduction syntactic subject position in (1), but in the syntactic English verbs impose syntactic and semantic re- direct object position in (2). strictions on their arguments, but some verbs are It is insufficient to use the presence or absence more flexible than others. A number of verbs in of syntactic arguments, such as the direct object, English have different syntactic frames (subcate- as indicators for or against a verb’s participation gorization frames, SCFs) that are associated with in the alternation, since verbs can occur with dif- different semantics. This behavior of a subset of ferent sets of arguments for other reasons. Many the verbs in a language is known as diathesis alter- verbs in English have optional direct objects; in the nations, or verb alternations (Levin, 1993). terminology of Levin (1993), they participate in Verbs that participate in one or more alternations the unexpressed object alternation. Distinguishing are potentially problematic in the context of natural between different alternations is essential for tasks language processing tasks. In order to be able to that rely on correct semantic analyses of a verb and process an instance of an alternating verb in a text, its arguments. Examples for applications where it is necessary to distinguish between the different this is important are , informa- possible uses, so that the correct meaning can be tion extraction, or summarization. assigned to the given instance. This paper describes, compares and evaluates a The causative alternation is one of the regular total of 8 approaches to the automatic identification verb alternations in English. Verbs in this alter- of verbs in the causative alternation. Of particular nation can be used intransitively, with an inchoat- interest to us is the comparison of different evalua- ive meaning, or transitively, leading to a causative tion conditions and different test sets, which give

146 Proceedings of the Society for Computation in Linguistics (SCiL) 2019, pages 146-155. New York City, New York, January 3-6, 2019 a wide range of accuracy scores depending on the decide whether each verb participates in the alterna- setup. tion. The author describes several different setups; Our results show that some of our setups are the highest accuracy on her test set for the causative more robust than others against different evaluation alternation is 73%. conditions. The different evaluation conditions are Merlo and Stevenson (2001) distinguish three useful because they can expose problems of indi- types of optionally intransitive verbs using a num- vidual setups, such as a tendency to assign false ber of features. Their distinction is between unerga- positive labels. tive, unaccusative, and object-drop verbs. Their We also evaluate the performance of our systems features include semantic features like animacy or on the distinction between verbs in the causative causativity, as well as syntactic features like pas- alternation and verbs in the unexpressed-object al- sive voice or presence of a VBN tag. The com- ternation. Although these alternations resemble bination of all features leads to a classification each other in the SCFs they allow, our results show accuracy of 69.8% on a test set of 20 unergative that some systems perform well on both classifica- verbs, 19 unaccusative verbs, and 20 object-drop tion tasks. verbs. From their experiments with human anno- While English alternating verbs can be identi- tators, they derive an “expert-based upper bound” fied by looking them up in one of the resources accuracy around 86.5% for the task. that exist for such purposes, we find it desirable Sun et al. (2013) present an unsupervised ap- to create dynamic systems for the identification of proach to the semantic classification of verbs that alternating verbs, for two main reasons. First, the uses approximations of diathesis alternations as phenomenon may be productive to a certain degree, features. While they evaluate their system on verb which means that resources can become outdated; class induction tasks, their method for approximat- and second, there are other languages with similar ing diathesis alternations is also of interest in isola- phenomena for which no (large) resources like this tion from the clustering results. The approximation exist. A system that does well on English alterna- takes into account the subcategorization frames that tion identification can be helpful in building this are observed for each verb in a corpus and the like- type of resource for other languages. lihood of individual verbs and individual SCFs. As each diathesis alternation gives rise to several SCFs, 2 Related Work the joint probability can be used to predict whether While diathesis alternations have been a topic of the SCFs of a verb are observed by chance or due linguistic discussion for some time, the idea of to the verb’s participation in an alternation. identifying these alternations automatically has An area of research that is related to, but distinct mostly been discussed after the publication of from the task we discuss here is the clustering of Levin (1993). Her lists of verb alternations and verbs into Levin classes, VerbNet classes, or other of verb classes made it possible to design systems semantic classes. Examples for this are presented in that cluster verbs automatically and to evaluate the Lapata and Brew (1999); Schulte Im Walde (2000); outputs of those systems against Levin’s data. Joanis (2002); Joanis and Stevenson (2003); Steven- The creation of resources like WordNet (Miller, son and Joanis (2003); Schulte Im Walde (2006); 1998) and VerbNet (Kipper et al., 2000) made this Joanis et al. (2008); Sun et al. (2008); Korhonen even easier. For instance, approaches like the one (2009). While verb classes are associated with by McCarthy (2000, 2001) use the WordNet hi- diathesis alternations, there is no one-to-one rela- erarchy to predict whether a verb participates in tion between a verb class and an alternation. Thus, the causative or conative alternation. Using a sub- strategies that are useful for verb class prediction categorization lexicon derived from the BNC, she cannot always be used in the same way for the calculates the similarity of the role fillers for each prediction of verb alternations. In this paper, we position in the verb’s syntactic frame to identify develop systems for the identification of diathesis cases where different slots have a systematic over- alternations because we are specifically interested lap in their semantic preferences, which is seen as in properties of verbs at the syntax-semantics inter- an indicator for the alternation. McCarthy (2000) face. hand-picks a set of 46 positive and 53 negative One of the contributions of our work that has verbs that are being classified. Human annotators not been discussed in depth in the literature is the

147 comparison of the performance of different classifi- Our strategy for collecting verbs outside the cation methods on the lists by Levin (1993) and on causative alternation creates a negative set that is VerbNet classes, and on different subsets of verbs not too large and contains only verbs that exhibit from those lists. We show that the scores of each behavior similar to alternating verbs. Effectively, method sometimes vary by a large margin, which we select verbs that participate in the unexpressed- makes it difficult to compare the performance of object alternation (see also Section 6). previous approaches when the evaluation technique One difficulty in evaluating our classification is not described in detail. systems is that verbs are not distinguished by verb sense. For instance, Levin lists advance as an alter- 3 Methods nating verb in its (CAUSED-)CHANGE-OF-STATE sense, but also as a non-alternating verb in its We develop three class-based systems and five verb- FUTURE-HAVING sense. We plan to examine these based systems for the classification of English verbs issues and their implications for the task more regarding their participation in the causative alterna- closely in future work. For now, in order to achieve tion. Class-based systems classify verbs that belong a clear separation of alternating and non-alternating to the same VerbNet class together. Verb-based sys- verbs, we decide to drop verbs with multiple class tems classify each verb lemma individually. membership from the test sets for our experiments, leading to a set of 358 positive and 236 negative 3.1 Data examples3. Syntactic patterns and frequency counts are derived While the positive examples from both sources from a dependency-parsed version of the BNC cor- share a specific syntactic and semantic behavior, the 1 pus (Burnard, 2007) . The approaches that classify negative examples are less homogeneous, and it is individual verbs refer to the dependency annota- not necessarily useful to treat the negative examples tions in the corpus and in some cases to distribu- as a closed set. Theoretically, all verbs that are tional vectors, but do not make use of any not listed as positive examples should be usable external, manually-created resources. as negative examples, provided the list of positive For the class-based approaches, we collect in- examples is complete and covers all verbs in the formation about individual verbs from verb class language that participate in the alternation. definitions in VerbNet 3.3 (Kipper et al., 2000). However, using all unlisted verbs as negative ex- The information we gather from the resource is amples would make the classification much more limited to determining which verbs form a class difficult to evaluate, due to the overwhelming ma- together; nothing else is being read from VerbNet. jority of negative items. Instead, we use the sets We evaluate our systems on two different test described above and propose three test conditions 2 sets , each including examples of alternating verbs to base our evaluations on. The test conditions are as well as examples of non-alternating verbs. One described in Section 4. of the test sets is the list given by Levin (1993, pp. 27–32), where Levin lists 365 examples of verbs 3.2 Class-Based Methods that participate in the alternation and 238 examples of verbs that do not participate. The other test set is We define three class-based approaches to the task. extracted from VerbNet; the set of alternating verbs The information these approaches rely on is ex- consists of all members of all VerbNet classes that tracted from VerbNet 3.3. The measures used to are marked as having causative and inchoative uses. calculate the likelihood of each verb to alternate are As negative examples for this test set, we use verbs inspired by Bonial et al. (2011). There, the authors that are marked in VerbNet as having transitive define scores that are used as indicators for how and intransitive uses (i.e., they are syntactically likely the members of each VerbNet class are to flexible), but not as having causative and inchoative occur in the CAUSED-MOTION construction. In a uses. This leads to 469 positive examples and 454 similar way, we define scores that indicate the like- negative examples. lihood of participation in the causative alternation for the members of each VerbNet class. 1112,298,424 tokens in 6,026,307 sentences. Parsed with the Stanford CoreNLP Library (Manning et al., 2014). 3Levin also lists some verbs more than once, such as bleed. 2The full test sets are available from the author upon re- We group with the same surface forms together without quest. performing word sense disambiguation.

148 The motivation for the approaches in this section 3.2.3 VNToken is that the causative alternation is closely related to Our third approach, VNTOKEN, is a token-based the question whether a verb can be used transitively variation of VNRANK, assigning a score following and intransitively. Although transitive and intransi- Equation 3. tive uses can also be an indicator of optional com- plements or of the unexpressed object alternation, min(c(vtrans0 ), c(vintrans0 )) v C we implement three setups based on the transitivity a (v C)= 02 (3) 3 2 P behavior of verbs to determine how indicative the c(v0) v C use of transitive and intransitive SCFs is for the 02 P causative alternation. a3 assigns each verb v an alternatability score that is based on the frequency of transitive or in- 3.2.1 VNType transitive uses (whichever is lower) of verbs in the Our first classifier, VNTYPE, assigns the alternat- same VerbNet class C. ing label to all members of the VerbNet class if all Since we use the lower of the two numbers as members occur in both SCFs at least once in the indicator for how strong the verb’s ability to al- corpus, following the rule in Equation 1. ternate is, the highest possible score that verbs in a VerbNet class C can achieve according to a3 is

a1(v C)=1 iff v0 C : c(vtrans0 ) > 0 0.5, meaning that the overall number of transitive 2 8 2 (1) occurrences of members of C in the corpus was c(vintrans0 ) > 0 ^ exactly as high as the overall number of intransitive C 0 The alternatability score 1 (verb participates in occurrences of verbs in . The lowest score is , C the alternation) is given to the verb v iff all verbs which is the case when all members of occur in the same VerbNet class C occur at least once in exclusively transitively or exclusively intransitively a transitive SCF and at least once in an intransitive in the corpus. SCF. Verbs that do not occur in the corpus do not im- pact the score calculated by a , since they do not In all other cases, including the case that one or 3 add anything to either the numerator or the denom- more verbs in the class are not found in the cor- inator in the equation. pus at all, the classifier assigns the non-alternating label to all members of the VerbNet class. This 3.3 Verb-Based Methods approach is type-based: Frequencies of transitive The following methods classify or rank verbs indi- and intransitive uses are not taken into account. vidually, without taking their VerbNet classes into account. 3.2.2 VNRank Our second approach, VNRANK, derives the alter- 3.3.1 SCFFlag natability score from the percentage of verb types Our first verb-based approach, SCFFLAG, is a vari- in a VerbNet class that were observed at least once ation of VNTYPE that classifies verbs individually, transitively and at least once intransitively, follow- not based on their VerbNet classes. Equation 4 ing Equation 2. shows how the score is calculated.

a (v)=1 iff c(v ) > 0 c(v ) > 0 (4) 4 trans ^ intrans v0 C : c(vtrans0 )>0 c(vintrans0 )>0 a2(v C)= |{ 2 ^ }| (2) 2 C The alternatability score 1 (verb participates in | | the alternation) is given to a verb v iff v is observed a2 assigns each verb v an alternatability score at least once with a transitive SCF and at least once between 0 and 1 that reflects the percentage of with an intransitive SCF. verbs in the same VerbNet class C that occur at Since this approach is purely syntactic, it is prone least once in a transitive SCF and at least once in to misclassifying verbs that have optional direct an intransitive SCF. objects as verbs that participate in the causative If a class C contains one or more verbs v0 that are alternation. We include it here in order to compare not observed in the corpus, the likelihood for that other systems against it. Unattested verbs are la- class and its member v to be labeled as alternating beled as non-alternating, as they do not fulfill the becomes lower. criteria defined here for alternating verbs.

149 3.3.2 SCFRatio 3.3.3 CentroidDistance Our second verb-based approach, SCFRATIO, is Our third verb-based approach, CENTROIDDIS- a variation on SCFFLAG that takes the frequency TANCE, makes use of distributional information of each syntactic frame into account, as shown in about the arguments observed for each verb in the Equation 5. BNC corpus. Our source for distributional informa- tion is a set of pre-trained word vectors derived

c(vtrans) from a 100-billion word portion of the Google a5(v)= (5) News dataset4. The vector set covers 3 million c(vintrans) words and phrases and represents each item with a 300-dimensional vector created with The result of a5 is a continuous value that re- flects the relation between the number of transitive (Mikolov et al., 2013). The formula that calculates and intransitive occurrences of v. In the special the alternatability score is given in Equation 6. case that the denominator is zero, a value of 0 is a6(v)=cos(objects ! , intr - subjects ! ) (6) assigned. While SCFFLAG only checks whether both tran- a6 derives a verb’s alternatability score from the sitive and intransitive frames are possible, this ap- difference between the centroid of the verb’s direct proach looks at their relative frequency, as observed objects in the corpus, and the centroid of its intran- in the corpus. If a verb is mostly used transitively sitive subjects. The difference is represented by and is only intransitive in one instance, it does not the cosine similarity between the centroids. The necessarily participate in the causative alternation; method is an unsupervised approximation of the the one intransitive instance might be due to an WordNet-based approach by McCarthy (2001). error or a coerced usage of the verb. If it occurs Verbs that are unattested in the corpus or occur transitively and intransitively with similar frequen- with only one SCF are treated as unlikely to alter- cies, it is more likely for that verb to participate in nate. Some verb argument slot fillers, particularly the causative alternation. proper names, are not covered by the Google News Verbs that are not found in the corpus are treated vectors. These arguments are disregarded. like syntactically inflexible verbs, that is, they are 3.3.4 CentroidSubjVsObj unlikely to alternate. Like SCFFLAG, we expect Our fourth verb-based approach, CENTROIDSUBJ- this setup to be prone to misclassifying verbs with VSOBJ, extends the CENTROIDDISTANCE ap- optional objects as participating in the causative proach by another criterion. Verbs can impose alternation. similar selectional preferences on different argu- The method is similar to that of Sun et al. (2013), ment slots, e.g. when they require animate subjects but unlike that system, it does not take the over- as well as animate direct objects. CENTROIDDIS- all frequency of each SCF over all verbs in the TANCE may not be successful in classifying such corpus into account. In SCFFLAG and SCF- verbs. This approach takes that fact into account RATIO, the SCFs are simplified and distinguished by comparing the centroid distance calculated in by (in)transitivity; the presence or absence of other a6 above to the distance between the centroids for dependents and the order of the items in the SCF transitive and intransitive subjects of the given verb, are disregarded. The main reason for this simpli- as shown in Equation 7. fication is that it minimizes sparsity problems by generalizing over different SCFs that share the prop- a (v)=a (v) cos(tr- subjects ! , intr - subjects ! ) 7 6 erty of being (in)transitive. (7) The following approaches explore the idea that The assumption is that verbs whose subjects tend verbs in the alternation will impose the same selec- to be more similar to each other than their intransi- tional preferences on their direct objects as on their tive subjects are to their objects are likely to partici- intransitive subjects, as illustrated by examples (1) pate in the object-drop alternation, not the causative and (2). Therefore, the set of possible direct ob- alternation. Unattested verbs and arguments that jects should be more similar to the set of possible cannot be associated with a vector are handled as intransitive subjects if a verb alternates than it is in the CENTROIDDISTANCE approach. for verbs that do not alternate. We implement two 4Available from https://code.google.com/archive/p/ vector-based approaches to test this. word2vec/.

150 3.3.5 RNN-LM The first test condition, ALL, evaluates the accu- Our fifth verb-based approach, RNN-LM, deter- racy of each setup on the full verb sets. It uses all mines the alternatability score of a verb based on positive and negative examples from Levin, and all automatically-generated acceptability scores for positive examples and syntactically-flexible nega- the verb’s uses in transitive and intransitive envi- tive examples from VerbNet. ronments. We extract ordered, lemma-based argu- The second test condition, FREQ, evaluates the ment sequences from the corpus and use a script accuracy of each setup on the 300 most frequent to turn transitive sentences into intransitive sen- verbs from the full Levin and VerbNet sets. The tences by deleting the direct object, and to turn selection is based on the number of occurrences intransitive sentences into transitive sentences by of each verb in the BNC corpus. For the Levin adding dummy arguments in the form of personal set, the frequency cut-off leads to a slight majority pronouns. Thus, the sentence “Pat invites Kim” be- of alternating verbs in the test set (177 alternating, comes the (transitive) argument sequence “invite- 123 non-alternating). For the VerbNet set, it leads Pat-Kim” and is turned into the (intransitive) se- to a slight majority of non-alternating verbs (138 quence “invite-Pat”. The sentence “Sandy slept” alternating, 162 non-alternating). becomes “sleep-Sandy”, and is turned into the se- The third test condition, BALANCED, evaluates quence “sleep-Sandy-her”. the accuracy of each setup on the 150 most frequent Having extended the input data with these al- verbs from each class for each source, so that we ternated sentences, the system now compares the force a balance of 150 positive and 150 negative average acceptability of all transitive and intran- examples per source, where each verb occurs with sitive uses of each verb, as shown in Equation 8. high frequency in the corpus. Since the ALL condition has the largest majority 1 of alternating verbs, the setups that perform better a (v)= 8 avg_acc(v ) avg_acc(v ) in this condition are the ones that are more likely | trans intrans | (8) to assign the alternating label. We find that the BALANCED condition gives the best indicator of Acceptability is approximated by a modified ver- the accuracy of a setup. sion of the Recurrent Neural Network from Lau Table 1 shows the F1 scores of our setups. For et al. (2015). That work is concerned with the comparison, we also include the results of a random prediction of acceptability judgments that are nor- baseline. Bold font indicates the highest scores malized for factors like sentence length or word achieved in each condition. frequency. This makes the measure more useful For the systems that employ a ranking to classify for our task than the mere probability of an input the verbs, we report the scores achieved by split- sequence (because changing the transitivity in a ting the ranked lists at a pre-defined index. The sentence always changes the length of the sentence, results reported here correspond to the split index and thus, its probability). For details on the al- that leads to the same number of alternating and gorithm, see the original paper (Lau et al., 2015). non-alternating verbs as in the test sets. By using argument sequences instead of sentences, we reduce noise. Unattested verbs are treated as Initially, we had split the ranked lists at the unlikely to alternate. index leading to the optimal score. This intro- We expect that the artificially-alternated sen- duced a strong bias in favor of the ranking approa- tences will receive lower acceptability scores for ches. In another setup, we used the mean ranking non-alternating verbs, while verbs that participate score to separate the classes, which sometimes also in the alternation are more acceptable in their gen- achieved better results than the fixed-index split. erated alternate uses. Thus, the difference in ac- We decided to report the score as described above ceptability scores should be lower for alternating because we find that it best reflects the ability of verbs. each approach to predict the data in the test sets.

4 Results 5 Discussion We report the scores of our systems evaluated in For those of our systems that use corpus data for three different conditions, to show the impact of the classification, the frequency of individual verbs the size and contents of the test set on the score. has an impact on the likelihood of the system to

151 Levin VerbNet ALL FREQ BALANCED ALL FREQ BALANCED RANDOM BASELINE 0.51 0.54 0.52 0.53 0.47 0.56 VNTYPE 0.20 0.31 0.32 0.10 0.18 0.17 VNRANK 0.67 0.63 0.52 0.60 0.42 0.52 VNTOKEN 0.61 0.59 0.50 0.83 0.68 0.71 SCFFLAG 0.71 0.74 0.67 0.59 0.63 0.67 SCFRATIO 0.71 0.72 0.65 0.68 0.57 0.60 CENTROIDDISTANCE 0.62 0.60 0.62 0.64 0.78 0.79 CENTROIDSUBJVSOBJ 0.63 0.63 0.57 0.64 0.79 0.79 RNN-LM 0.66 0.69 0.59 0.66 0.78 0.79

Table 1: F1 scores of all setups assign the correct label. Verbs that are listed in infrequent in the BNC. the Levin or VerbNet sets as participating in the The approaches SCFFLAG and SCFRATIO per- alternation cannot be classified correctly with these form surprisingly well on the task, particularly approaches if they are unattested in the corpus. on the Levin test sets. These approaches derive All setups except VNTYPE outperform the ran- a verb’s alternatability score from a comparison dom baseline. While this approach had a precision of the numbers of transitive and intransitive oc- of roughly 50% in each of the setups, the recall currences of the verb in the corpus. The fact that was extremely low, leading to the low F1 scores SCFFLAG is one of the best-scoring approaches shown in the table. VNTYPE is generally prone on the Levin test sets, while its accuracy is less to mislabeling alternating verbs as non-alternating. impressive for the VerbNet test sets, indicates one The reason for the prevalence of such errors is the of the main differences in the two test sets. The ap- lack of tolerance in assigning the alternating label. proach simply checks whether the verb was used at Recall that a verb is only labeled as alternating by least once in each SCF type. That it performs well this setup if it is a member of a VerbNet class in on the Levin set mainly tells us that the syntactic which all members are observed in transitive and flexibility of the negative examples used in that test intransitive SCFs in the corpus. Data sparsity may set is lower than that of the positive examples. lead to individual VerbNet class members not be- SCFFLAG and SCFRATIO are also discussed ing attested at all, or being attested only in one below, in Section 6. syntactic configuration, even when the other one is On the VerbNet test set, the systems that rely possible. Thus, the criterion that all members of on word vectors achieved good scores, together the class need to be observed in both types of SCFs with the RNN-LM setup. In some other evaluation is apparently too strict for successful classification. setups, the RNN method slightly outperformed the Relaxing this condition leads to the approach vector-based ones. VNRANK, where a high percentage of syntacti- The RNN-LM approach generally preferred cally flexible members in a VerbNet class leads to transitive verb uses over intransitive ones. This all members of that class being classified as alter- was even the case when lexically-intransitive verbs nating verbs, but it is not necessary for all verbs in were involved. For instance, “John-sleep” received the class to be observed with both types of SCFs. a lower acceptability score than “John-sleep-it”, Table 1 shows that relaxing the conditions for syn- even though the verb to sleep rarely takes a di- tactic flexibility leads to high performance gains. rect object in English. This shows that the way the Compared to VNTYPE, this approach assigns a RNN-LM scores were calculated is not necessarily lower number of false negatives. a good approximation of acceptability. VNTOKEN performs slightly worse than VN- Thus, even though the RNN setup was among the RANK on the Levin test sets, but is among the best best-scoring ones on the balanced VerbNet test set, setups on the VerbNet test sets. A closer look at we find the vector-based approaches preferable. For its false positives and false negatives shows that the Levin test sets, the SCFFLAG setup achieves they consist mainly of verbs that are unattested or the highest scores throughout the test conditions.

152 A common source of errors for many of our ap- ideal, since the ability to distinguish between the proaches were mistakes in the parse trees in the causative alternation and the unexpressed-object corpus. They influenced our results in two ways. alternation is one of the motivations for the alter- First, some tokens were erroneously annotated as nation identification task. If the identification of arguments of a verb. These cases added noise to one of these alternations relies on features that both the modeling of the verb’s behavior and argument classes of verbs share, that distinction may not be preferences. They may also be responsible for the feasible. preference of the RNN-LM approach for transi- The following section shows how our approa- tive sentences; possibly, the preference is simply ches perform when they do not assign alternat- a result of the dependency parser having the same ing and non-alternating labels, but instead separate preference. Second, some verbs in our test sets verbs in the causative alternation from verbs in the were systematically not annotated as being verbs, unexpressed-object alternation. including verbs that have noun homographs, such as circle or drip, and verbs that have adjective ho- 6 Distinguishing Causative and mographs, such as yellow or awake. Since our Unexpressed-Object Alternation approaches rely on the dependency trees observed in the corpus in order to model the behavior of each The way we collected the VerbNet test set already verb, the absence of correctly-parsed sentences for filters the verbs in such a way that optionally- these verbs had a negative impact on our overall per- transitive verbs that do not participate in the formance. These issues may be resolved by adding causative alternation make up the negative set. This a word-sense disambiguation step to the process. criterion has the result that we effectively sepa- We will explore this in future work. rate causatively-alternating verbs from verbs in the unexpressed-object alternation. Our vector-based systems perform much better Table 2 shows the performance of our systems on VerbNet than on the Levin test set. This is prob- on the task of distinguishing the causative alter- ably because the VerbNet classes are organized se- nation from the unexpressed-object alternation on mantically, which means that even when one verb is verbs from Levin. The verbs that participate in not attested frequently enough to draw conclusions the unexpressed-object alternation were taken from about it based on its arguments, the availability of Levin (1993, pp. 33–40). The full set contained vectors for other verbs in the same VerbNet class 343 verbs, which we reduced for the FREQ and improves the score by a large margin. BALANCED conditions as in the previous task. We expected our class-based approaches that rely The performance of our systems on this task dif- on VerbNet class groupings (VNTYPE, VNRANK, fers only slightly from the scores reported above. VNTOKEN) to generally perform better when eval- Against our expectations, the vector-based approa- uated against VerbNet than when evaluated against ches CENTROIDDISTANCE and CENTROIDSUBJ- the Levin test set. This is the case for VNTOKEN, VSOBJ perform slightly worse when applied to the but the other approaches in the set achieve better distinction between the two alternations. These scores on the Levin set. Particularly surprising approaches are designed to take each verb’s seman- is the better performance on the Levin set of the tic preferences for its argument slots into account. VNTYPE method. However, since this approach Since CENTROIDSUBJVSOBJ performs better than achieved scores well below the random baseline, CENTROIDDISTANCE, the poor performance of the we do not explore it further. latter is probably due to overlapping semantic pref- Our strategy for assembling the negative test set erences for the different slots. for the VerbNet setup by collecting verbs from To assess the merit of our approaches, we com- classes that are annotated as having both transi- pare them to the work of Merlo and Stevenson tive and intransitive uses, but not as having both (2001). They achieve an accuracy of 69% on the causative and inchoative uses, has an influence on three-way distinction of unaccusative, unergative, the results reported in Table 1. In particular, the ap- and unexpressed-object verbs (our distinction com- proaches that use the presence or absence of transi- bines the first two of these). Our verb-based ap- tive and intransitive SCFs (SCFFLAG, SCFRATIO, proach SCFRATIO outperforms that of Merlo and VNTYPE, VNRANK, VNTOKEN) have a high Stevenson by 6%. Our setup is simpler than that of risk of false positives on this test set. This is not Merlo and Stevenson, requiring only a dependency-

153 all freq balanced RANDOM BASELINE 0.48 0.50 0.49 VNTYPE 0.19 0.30 0.30 VNRANK 0.79 0.67 0.68 VNTOKEN 0.61 0.51 0.51 SCFFLAG 0.64 0.66 0.67 SCFRATIO 0.68 0.73 0.75 CENTROIDDISTANCE 0.53 0.55 0.55 CENTROIDSUBJVSOBJ 0.59 0.61 0.61 RNN-LM 0.58 0.63 0.63

Table 2: F1 scores of all setups when applied to the causative-versus-unexpressed-object task, on verbs from Levin (1993) parsed corpus from which to derive SCF statistics, on English verbs because gold data for the task whereas their system makes use of features like was readily available for the two alternations in animacy or causativity, which pose problems when English. However, since many of our approaches a verb is attested infrequently. do not rely on manually-compiled resources, they Our best-performing setup SCFRATIO assigned can be applied to other languages with little effort, the causative-alternation label for verbs that occur as long as dependency parsers and corpora to use in transitive or intransitive SCFs with very dissimi- as a source for distributional information are avail- lar frequencies (that is, where one of the SCFs was able. However, the quality of available dependency dominant in the corpus), while verbs whose fre- parsers for the language will always influence the quencies with transitive and intransitive SCFs were performance of our methods. closer to each other were more likely to participate The methods presented here may be applied to in the unexpressed-object alternation. any role-switching alternation (McCarthy, 2001) in any language for which the necessary preprocess- 7 Conclusion ing tools are available. Comparing the performance of our approaches to the performance they achieve Our results are difficult to compare to those re- on other languages will be informative from a ty- ported in previous work. Early work, such as pological perspective. We plan to conduct similar McCarthy (2000); Schulte Im Walde (2000); Mc- experiments on at least one language from another Carthy (2001); Merlo and Stevenson (2001), tends language family in the future. to evaluate on small test sets. Some of our approa- Due to the interest in transferring our findings ches exceed the performance of the systems for the to similar phenomena in other languages, we find identification of the causative alternation presented that verb-based methods are preferable for this task. in these publications. Note that our test conditions As Table 1 shows, they perform reasonably well achieve varying scores even with minimal changes in comparison with our other systems, and the in the test sets; previous work in this area often did cases where class-based methods perform better not specify the exact conditions of the evaluation. (VNTOKEN on the VerbNet test sets) are likely the More recent work in this area has focused less result of overfitting on the structure of VerbNet. on the identification of diathesis alternations, and more on the induction of Levin-like verb classes, Acknowledgments with a stronger focus on semantics. In contrast to this, our work focuses on the classification of verbs The work presented in this paper was financed based on properties at the syntax-semantics inter- by the Deutsche Forschungsgemeinschaft (DFG) face. The good results of the SCFRATIO approach within the CRC 991 “The Structure of Representa- on the distinction between verbs in the causative tions in Language, Cognition, and Science”. The alternation and verbs in the unexpressed-object al- author wishes to thank Laura Kallmeyer, Kilian ternation shows that this system learns something Evang, Jakub Waszczuk, and three anonymous re- about the verbs that are being classified. viewers for their valuable feedback and helpful The experiments in this paper were performed comments.

154 References Diana McCarthy. 2000. Using Semantic Preferences to Identify Verbal Participation in Role Switching Al- Claire Bonial, Susan Windisch Brown, Jena D Hwang, ternations. In 1st Meeting of the North American Christopher Parisien, Martha Palmer, and Suzanne Chapter of the Association for Computational Lin- Stevenson. 2011. Incorporating Coercive Construc- guistics, pages 256–263. Association for Computa- tions into a Verb Lexicon. In Proceedings of the tional Linguistics. ACL 2011 Workshop on Relational Models of Seman- tics, pages 72–80. Association for Computational Diana McCarthy. 2001. Lexical Acquisition at the Linguistics. Syntax-Semantics Interface: Diathesis Alternations, Subcategorization Frames and Selectional Prefer- Lou Burnard. 2007. Reference Guide for the British ences. Ph.D. thesis, University of Sussex. National Corpus (XML Edition) . Paola Merlo and Suzanne Stevenson. 2001. Auto- Eric Joanis. 2002. Automatic Verb Classification matic Verb Classification Based on Statistical Distri- Using a General Feature Space. Master’s the- butions of Argument Structure. Computational Lin- sis, Department of Computer Science, University of guistics, 27(3):373–408. Toronto. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Cor- Eric Joanis and Suzanne Stevenson. 2003. A General rado, and Jeff Dean. 2013. Distributed Representa- Feature Space for Automatic Verb Classification. In tions of Words and Phrases and their Compositional- 10th Conference of the European Chapter of the As- ity. In Advances in Neural Information Processing sociation for Computational Linguistics. Systems, pages 3111–3119. Curran Associates, Inc.

Eric Joanis, Suzanne Stevenson, and David James. George Miller. 1998. WordNet: An electronic lexical 2008. A general feature space for automatic database. MIT press. verb classification. Natural Language Engineering, 14(3):337–367. Sabine Schulte Im Walde. 2000. Clustering Verbs Semantically According to their Alternation Be- Karin Kipper, Hoa Trang Dang, and Martha Palmer. haviour. In COLING 2000 Volume 2: The 18th Inter- 2000. Class-Based Construction of a Verb Lexicon. national Conference on Computational Linguistics, In Proceedings of the Seventeenth National Confer- pages 747–753. Association for Computational Lin- ence on Artificial Intelligence, pages 691–696. guistics.

Anna Korhonen. 2009. Automatic Lexical Classifica- Sabine Schulte Im Walde. 2006. Experiments on tion - Balancing between Machine Learning and Lin- the Automatic Induction of German Semantic Verb guistics. In Proceedings of the 23rd Pacific Asia Classes. Computational Linguistics, 32(2):159– Conference on Language, Information and Compu- 194. tation, Volume 1. Suzanne Stevenson and Eric Joanis. 2003. Semi- Maria Lapata and Chris Brew. 1999. Using Subcatego- supervised Verb Class Discovery Using Noisy Fea- rization to Resolve Verb Class Ambiguity. In 1999 tures. In Proceedings of the Seventh Conference on Joint SIGDAT Conference on Empirical Methods in Natural Language Learning at HLT-NAACL 2003, Natural Language Processing and Very Large Cor- pages 71–78. Association for Computational Lin- pora, pages 266–274. guistics. Lin Sun, Anna Korhonen, and Yuval Krymolowski. Jey Han Lau, Alexander Clark, and Shalom Lappin. 2008. Verb Class Discovery from Rich Syntac- 2015. Unsupervised Prediction of Acceptability tic Data. In International Conference on Intelli- Judgements. In Proceedings of the 53rd Annual gent and Computational Linguistics, Meeting of the Association for Computational Lin- pages 16–27. Springer. guistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Lin Sun, Diana McCarthy, and Anna Korhonen. 2013. Papers), pages 1618–1628. Association for Compu- Diathesis alternation approximation for verb cluster- tational Linguistics. ing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Vol- Beth Levin. 1993. English Verb Classes and Alter- ume 2: Short Papers), pages 736–741. Association nations: A Preliminary Investigation. University of for Computational Linguistics. Chicago press.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Lin- guistics: System Demonstrations, pages 55–60. As- sociation for Computational Linguistics.

155