Comparing Computational Cognitive Models of Generalization in a Language Acquisition Task

Libby Barak, Adele E. Goldberg, Suzanne Stevenson
Psychology Department, Princeton University, Princeton, NJ, USA
Department of Computer Science, University of Toronto, Toronto, Canada
{lbarak,adele}@princeton.edu  [email protected]

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 96–106, Austin, Texas, November 1-5, 2016. ©2016 Association for Computational Linguistics

Abstract

Natural language acquisition relies on appropriate generalization: the ability to produce novel sentences, while learning to restrict productions to acceptable forms in the language. Psycholinguists have proposed various properties that might play a role in guiding appropriate generalizations, looking at learning of verb alternations as a testbed. Several computational cognitive models have explored aspects of this phenomenon, but their results are hard to compare given the high variability in the linguistic properties represented in their input. In this paper, we directly compare two recent approaches, a Bayesian model and a connectionist model, in their ability to replicate human judgments of appropriate generalizations. We find that the Bayesian model more accurately mimics the judgments due to its richer learning mechanism, which can exploit distributional properties of the input in a manner consistent with human behaviour.

1 Introduction

Native speakers of a language are mostly able to generalize appropriately beyond the observed data while avoiding overgeneralizations. A testbed area for studying generalization behavior in language acquisition is verb alternations – i.e., learning the patterns of acceptability of alternative constructions for expressing similar meanings. For example, English speakers readily use a new verb like text in both the double-object (DO) construction ("text me the details") and the prepositional-dative (PD) ("text the details to me") – an instance of the dative alternation. However, speakers avoid overgeneralizing the DO construction to verbs such as explain that resist its use ("?explain me the details"), even though they occur with analogous arguments in the PD alternative ("explain the details to me"). Psycholinguistic studies have focused on the possible properties of natural language that enable such generalization while constraining it to acceptable forms.

Initially, children are linguistically conservative: they generally use verbs in constructions that are very close to exemplars in the input (Lieven et al., 1997; Akhtar, 1999; Tomasello, 2003; Boyd and Goldberg, 2009). Children reach adult-like competence by gradually forming more general associations of constructions to meaning that allow them to extend verb usages to unwitnessed forms. Much work has emphasized the role of verb classes that capture the regularities across semantically similar verbs, enabling appropriate generalization (e.g., Pinker, 1989; Fisher, 1999; Levin, 1993; Ambridge et al., 2008). Usage-based approaches have argued that such class-based behaviour can arise in learning through the clustering of observed usages that share semantic and syntactic properties (e.g., Bybee, 2010; Tomasello, 2003; Goldberg, 2006). A number of studies also reveal that the statistical properties of the language play a central role in limiting generalization (e.g., Bresnan and Ford, 2010; Ambridge et al., 2012, 2014). Individual verbs often show statistical biases that favor their appearance in one construction over another (Ford et al., 1982; MacDonald et al., 1994; Garnsey et al., 1997; Trueswell et al., 1993; Losiewicz, 1992; Gahl and Garnsey, 2004). For example, while both give and push can occur in either DO or PD constructions, give strongly favors the DO construction ("give me the box"), while push strongly favors the PD ("push the box to me") (Wasow, 2002). Generally, the more frequent a verb is overall, the less likely speakers are to extend it to an unobserved construction (Braine and Brooks, 1995). In addition, when a verb repeatedly occurs in one construction when an alternative construction could have been appropriate, speakers appear to learn that the verb is inappropriate in the alternative, regardless of its overall frequency (Goldberg, 2011).

Given these observations, it has been argued that both the semantic and statistical properties of a verb underlie its degree of acceptability in alternating constructions (e.g., Braine and Brooks, 1995; Theakston, 2004; Ambridge et al., 2014). Recently, Ambridge and Blything (2015) proposed a computational model designed to study the role of verb semantics and frequency in the acquisition of the dative alternation. However, they evaluate their model's preferences for only one of the two constructions, which does not provide a full picture of the alternation behaviour; moreover, they incorporate certain assumptions about the input that may not match the properties of naturalistic data.

In this paper, we compare the model of Ambridge and Blything (2015) to the Bayesian model of Barak et al. (2014), which offers a general framework of verb construction learning. We replicate the approach taken in Ambridge and Blything (2015) in order to provide appropriate comparisons, but we also extend the experimental settings and analysis to enable a more complete evaluation, on data with more naturalistic statistical properties. Our results show that the Bayesian model provides a better fit to the psycholinguistic data, which we suggest is due to its richer learning mechanism: its two-level clustering approach can exploit distributional properties of the input in a manner consistent with human generalization behaviour.

2 Related Work

Acquisition of the dative alternation – use of the DO and PD constructions with analogous semantic arguments – has been studied in several computational cognitive models because it illustrates how people learn to appropriately generalize linguistic constructions in the face of complex, interacting factors. As noted by Ambridge et al. (2014), such models should capture influences of the verb such as its semantic properties, its overall frequency, and its frequency in various constructions.

A focus of computational models has been to show under what conditions a learner generalizes to the DO construction having observed a verb in the PD, and vice versa. For example, the hierarchical Bayesian models of Perfors et al. (2010) and Parisien and Stevenson (2010) show the ability to generalize from one construction to the other. However, both models are limited in their semantic representations. Perfors et al. (2010) use semantic properties that directly (albeit noisily) encode knowledge of the alternating and non-alternating (DO-only or PD-only) classes. The model of Parisien and Stevenson (2010) addresses this limitation by learning alternation classes from the data (including the dative), but it uses only syntactic slot features that can be gleaned automatically from a corpus. In addition, both models use batch processing, failing to address how learning to generalize across an alternation might be achieved incrementally.

Alishahi and Stevenson (2008) present an incremental Bayesian model shown to capture various aspects of verb argument structure acquisition (Alishahi and Pyykkönen, 2011; Barak et al., 2012, 2013b; Matusevych et al., 2016), but the model is unable to mimic alternation learning behaviour. Barak et al. (2014) extend this construction-learning model to incrementally learn both constructions and classes of alternating verbs, and show the role of the classes in learning the dative. However, like Parisien and Stevenson (2010), the input to the model in this study is limited to syntactic properties, not allowing for a full analysis of the relevant factors that influence acquisition of alternations.

Ambridge and Blything (2015) propose the first computational model of this phenomenon to include a rich representation of the verb/construction semantics, drawn from human judgments. In evaluation, however, they only report the ability of the model to predict the DO usage (i.e., only one pair of the alternation), which does not give the full picture of the alternation behaviour. Moreover, their assumptions about the nature of the input – including the use of raw vs. log frequencies and the treatment of …

Figure 1: A visual representation of the feed-forward network used by the AB model. (The figure is adapted from output of the OXlearn package of Ruh and Westermann (2009).) The input nodes correspond to the semantic properties of the verbs, the verb lexemes, and a "transfer" node (explained in the text). The output …

… a 1-hot pattern across output nodes, each of which represents the use of the verb in the associated construction. The possible constructions are DO, PD, or other, representing all other constructions the verb appears in. Training presents the slate of input features with the appropriate output node activated, representing the construction the verb appears in. In a full sweep of training, the model observes all verbs in proportion to their frequency in the input; for each verb, the proportion of training trials with 1 in each of the output nodes corresponds to the frequency of the verb in each of those constructions. During testing, only the input nodes are activated (corresponding to a verb and its semantics), and the activation …
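The frequency-proportional training regime described above can be sketched in code. The following is a minimal illustration, not the AB model itself: the verbs, semantic features, and usage counts are invented for the example, and a single-layer softmax network stands in for whatever architecture Ambridge and Blything (2015) actually use. It shows only the scheme the text describes: input nodes for semantics plus a 1-hot verb lexeme, output nodes for DO/PD/other, training trials in proportion to each verb's construction frequencies, and testing by activating only the input nodes.

```python
import numpy as np

# Hypothetical toy input (NOT the AB model's real data): three verbs,
# two made-up semantic features, and made-up counts of DO / PD / other usages.
verbs = ["give", "push", "explain"]
semantics = np.array([
    [1.0, 0.0],   # e.g., a "caused possession" feature
    [0.0, 1.0],   # e.g., a "caused motion" feature
    [1.0, 0.0],
])
counts = np.array([
    [80, 20, 10],  # give: strongly favors DO
    [10, 80, 10],  # push: strongly favors PD
    [0,  50, 50],  # explain: never appears in DO
])

# Input layer: semantic features + 1-hot verb lexeme + a constant bias node
# (a rough stand-in for the paper's "transfer" node, whose actual role is
# described in the text we do not reproduce here).
X = np.hstack([semantics, np.eye(len(verbs)), np.ones((len(verbs), 1))])
n_out = 3  # output nodes: DO, PD, other

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# One full "sweep": each verb is presented in proportion to its frequency,
# with a 1-hot target marking the construction of each observed usage.
rows, targets = [], []
for v, row in enumerate(counts):
    for c, n in enumerate(row):
        for _ in range(n):
            rows.append(X[v])
            targets.append(np.eye(n_out)[c])
data, T = np.array(rows), np.array(targets)

# Train a single-layer network by gradient descent on cross-entropy
# (a simplification chosen for this sketch).
W = np.zeros((X.shape[1], n_out))
for _ in range(2000):
    P = softmax(data @ W)
    W -= 0.1 * data.T @ (P - T) / len(data)

# Testing: activate only the input nodes for each verb and read off the
# output activations, which approximate the verb's construction biases.
probs = softmax(X @ W)
for v, p in zip(verbs, probs):
    print(v, p.round(2))
```

Because each verb has its own 1-hot input node, the network can drive each verb's output activations toward its empirical construction proportions, so the learned biases (give toward DO, push toward PD, explain away from DO) mirror the training frequencies.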