Modeling the Dative Alternation with Automatically Extracted Features∗
Huayan Zhong and Amanda Stent
Computer Science Department
Stony Brook University
Stony Brook, NY 11794-4400
huayan, [email protected]

Mary Swift
Computer Science Department
University of Rochester
Rochester, NY 14627
[email protected]

∗We would like to thank the anonymous reviewers for their helpful comments. This work was supported by the Defense Advanced Research Projects Agency (DARPA), through the Department of the Interior, NBC, Acquisition Services Division, under Contract No. NBCHD030010, via subcontract to SRI #03-000223. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

We show that generation of contextually appropriate syntactic variation can be improved using a model based on automatically extracted features. We adapt a model for predicting the dative alternation from (Bresnan et al. 2005); this model incorporates lexical, syntactic, semantic and pragmatic features. We evaluate the effect of using different types of feature on this classification task and show that the most predictive features for text differ from those for dialog. Finally, we show that modeling this type of syntactic variation improves the performance of surface realization for both text and dialog.

Introduction

The ability to produce contextually appropriate syntactic variation contributes to the coherence and naturalness of natural language generation (NLG). The factors that contribute to such variation can be wide-ranging and complex, and require models that go beyond corpus frequencies (e.g. (Creswell & Kaiser 2004)). In this paper we model one type of variation that affects the placement of arguments within the VP: the dative alternation. The dative alternation is the variation between two different syntactic realizations of the arguments of a ditransitive verb, i) the dative NP:

  Agent Verb Recipient Theme
  I gave the dog a bone

and ii) the dative PP:

  Agent Verb Theme [to Recipient]
  I gave a bone to the dog

In recent work, (Bresnan et al. 2005) describe a combination of lexical, syntactic, semantic and pragmatic features that models the dative alternation with high accuracy. For their experiments, they use hand-annotated features. In this paper, we show that it is possible to achieve comparable accuracy using automatically extracted training data and features. We compare the effect of different types of feature on this classification task, and show that the most predictive features for text differ from those for dialog. Finally, we show that modeling this type of syntactic variation improves the performance of statistical surface realization for both text and dialog.

The rest of this paper is structured as follows: first, we discuss previous approaches to predicting the dative alternation, and give a brief description of research on surface realization. Second, we describe our data. Third, we describe our model for predicting the dative alternation. Fourth, we describe our surface realization experiments. Finally, we summarize our results and describe future work.

Related Work

Predicting Dative Alternation

One analysis of the dative alternation assumes that the two structures have different semantic representations (e.g. (Pesetsky 1995; Harley 2000)). A contrasting approach (e.g. (Baker 1997)) argues that the two structures have the same semantic representation, and that the surface variation is motivated by the discourse-theoretic status of the theme and recipient arguments. Previous research has explored the influence of factors such as animacy, definiteness and length of the arguments (syntactic weight) on the surface realization of the dative construction (e.g. (Collins 1995; Halliday 1970; Thompson 1990)). (Bresnan et al. 2005) use three different models predicting from multiple variables to explore the question of what really drives the dative alternation. Model A is a logistic regression model using 14 features (features are described in Section ); it shows that certain properties of the recipient and theme (discourse accessibility, animacy, definiteness, pronominality and syntactic weight) are mutually irreducible. Model B is a multilevel logistic regression variant of Model A which shows that the features remain significant when conditioned on different verb senses. Model C is a variant of Model B adapted to the features available in the data; it shows that the features are predictive when tested on different corpora (Switchboard and Wall Street Journal). Bresnan et al. report the performance of the models as follows: Model A, 92%; Model B, 94%; and Model C, 93%.

In this paper we explore the utility of such models for statistical NLG. We adapt Model A using automatically extracted data and features, evaluate its performance on data from both text and dialog corpora, and examine which features are most predictive. We then evaluate the impact of our model on surface realization of the dative alternation.
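To make the shape of such a classifier concrete, the sketch below fits a binary logistic regression over a few recipient/theme properties of the kind Model A uses. This is a minimal illustration written with scikit-learn, not Bresnan et al.'s implementation; the feature names and toy data are our own.

```python
# Minimal sketch (not Bresnan et al.'s code): binary logistic
# regression predicting the observed dative construction from a
# few recipient/theme properties. Feature names and toy data are
# illustrative only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# One feature dict per clause; all values are automatically
# extractable. weight_diff approximates syntactic weight as
# length(theme) - length(recipient) in words.
clauses = [
    {"rec_pronominal": 1, "rec_animate": 1, "theme_definite": 0,
     "weight_diff": -2},  # "I gave the dog a bone" (recipient first)
    {"rec_pronominal": 0, "rec_animate": 1, "theme_definite": 0,
     "weight_diff": 1},   # "I gave a bone to the dog" (theme first)
]
labels = ["V-NP-NP", "V-NP-PP"]  # the construction actually observed

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(clauses)
clf = LogisticRegression().fit(X, labels)

# Predict the preferred construction for an unseen clause.
unseen = {"rec_pronominal": 1, "rec_animate": 1, "theme_definite": 1,
          "weight_diff": -3}
print(clf.predict(vec.transform([unseen])))  # e.g. ['V-NP-NP']
```

The full model uses 14 features; the sketch shows only the mechanics of encoding clauses and fitting the regression.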
Surface Realization

Surface realizers generate text from an input representation that may be semantic or syntactic in nature. Traditionally, surface realizers were built using hand-written grammars or templates (e.g. (Elhadad & Robin 1997; Baptist & Seneff 2000; Channarukul 1999)). However, most recent work on surface realization has used a two-stage approach (e.g. (Langkilde 2002; Rambow & Bangalore 2000; Ratnaparkhi 2000)). The first stage uses a simple hand-written grammar that overgenerates (i.e. produces both syntactically correct and incorrect outputs). The second stage uses a statistical language model to rank the sentences output by the first stage. These surface realizers run quickly and produce high-quality output. However, the interaction between the grammar and the language model, and the lack of modeling of semantics and context in the language model, can make it hard for these surface realizers to produce valid syntactic variants in a principled way. In previous work, we built a statistical surface realizer in which word order and word frequency probabilities are folded into the grammar rather than into a second-stage language model (Zhong & Stent 2005). With this surface realizer, we can model syntactically valid variation more effectively.

(Creswell & Kaiser 2004) argue that statistical NLG systems based only on probabilities cannot capture fine distinctions in meaning; discourse context and meaning must be taken into account when selecting a construction for NLG purposes. They demonstrate their point by outlining an algorithm that determines the surface form of dative constructions based on manually annotated discourse status (hearer-old and hearer-new) and heaviness of arguments, and that would reduce the error rate by 25% compared to choosing the surface form by corpus frequency. They argue that labeling data with pragmatic information such as discourse status is crucial for statistical NLG. Bresnan et al. (Bresnan et al. 2005) provide evidence for this by showing that some pragmatic features are useful for predicting the dative. We adapt Bresnan's model, but use only automatically extracted features. Unlike (Creswell & Kaiser 2004), we incorporate this model into a statistical NLG system and achieve improved performance on dative constructions.

(Stede 1998) proposed a general approach to handling verb alternations in NLG for English and German. This approach focused on representing the possible alternations rather than on having a surface realizer determine which one to use in a given set of circumstances.

Data

We used two data sets in these experiments; each data set has two parts, one treebanked and one not. The Dialog data set comprises the following two corpora: the ROCH data set, a collection of corpora of spoken dialogs collected at the University of Rochester; and the treebanked portion of the Switchboard (SWBD) corpus of spoken dialogs (Marcus, Santorini, & Marcinkiewicz 1995). The Text data set comprises the following two corpora: the APW-Dec-99 portion of the English Gigaword (GW) corpus of raw text (Graff et al. 2005); and the Penn Treebank (PT) corpus of parsed text, including the Brown and Wall Street Journal subsets (Marcus, Santorini, & Marcinkiewicz 1995).

We used automatic methods to prepare each corpus for our experiments as follows. First, we split each treebanked sentence into independent clauses. We parsed non-treebanked sentences into independent clauses using the Collins parser (Collins 1999). Then we automatically annotated each clause with semantic information:

• Verb frames from VerbNet (Kipper, Dang, & Palmer 2000).
• Noun/verb hypernyms from WordNet (Fellbaum 1998); see the sketch after this list. As a backup, we used Comlex (Grishman, Macleod, & Meyers 1994).
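As a rough illustration of the hypernym lookup, the sketch below uses NLTK's WordNet interface to collect hypernyms for an argument's head noun. The helper name and the most-frequent-sense heuristic are our own simplifications; the paper does not describe its actual lookup code, and the Comlex fallback is indicated only by a comment.

```python
# Illustrative hypernym annotation with NLTK's WordNet interface;
# the helper name and most-frequent-sense heuristic are assumptions,
# not the paper's actual code. Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def hypernym_chain(noun, depth=3):
    """Collect up to `depth` hypernyms of the most frequent noun sense."""
    synsets = wn.synsets(noun, pos=wn.NOUN)
    if not synsets:
        return []  # this is where the paper falls back to Comlex
    chain, synset = [], synsets[0]
    for _ in range(depth):
        hypernyms = synset.hypernyms()
        if not hypernyms:
            break
        synset = hypernyms[0]
        chain.append(synset.name())
    return chain

# Annotate the arguments of "I gave the dog a bone":
print(hypernym_chain("dog"))   # ['canine.n.02', 'carnivore.n.01', ...]
print(hypernym_chain("bone"))
```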
We extracted from the treebanked corpora verbs that might be examples of the dative: verbs appearing in sentences of the form V-NP-PP:to with the prepositional phrase's role labeled as -DTV. We augmented this list with the verbs that appear in classes with dative frames in VerbNet. Then, from all four of our corpora, we automatically extracted the clauses containing any of these verbs in either form of the dative alternation.

Table 1 shows, for each of our data sets, the number of clauses having either the V-NP-NP or the V-NP-PP construction, i.e. the number of examples of each form of the dative alternation we were able to extract automatically. We included all examples of either form of the dative, even if a particular verb appeared with only one form in our data.

  Genre    Corpus    Dative examples
                     V-NP-NP    V-NP-PP
  Dialog   ROCH           99        274
           SWBD          401        147
  Text     PT            818        408
           GW           4067       4117

  Table 1: Data used in our experiments

Because these data are automatically extracted, they include some sentences that are not examples of the dative. In particular, in our dialog corpora there are verbs, like 'take', that take the dative alternation and also have a transitive form with a destination; some of these sentences show up in our data if the semantic type of the destination is mislabeled or ambiguous. Also, some incomplete sentences are included in our dialog data set because of sentence-splitting errors.

Modeling The Dative Alternation

(Bresnan et al. 2005) demonstrated that pragmatic features