Poverty of the Stimulus? A Rational Approach

Amy Perfors¹, Joshua B. Tenenbaum¹, and Terry Regier²
¹Department of Brain and Cognitive Sciences, MIT; ²Department of Psychology, University of Chicago

Abstract

The Poverty of the Stimulus (PoS) argument holds that children do not receive enough evidence to infer the existence of core aspects of language, such as the dependence of linguistic rules on hierarchical phrase structure. We reevaluate one version of this argument with a Bayesian model of grammar induction, and show that a rational learner without any initial language-specific biases could learn this dependency given typical child-directed input. This choice enables the learner to master aspects of syntax, such as the auxiliary fronting rule in interrogative formation, even without having heard directly relevant data (e.g., interrogatives containing an auxiliary in a relative clause in the subject NP).

Introduction

Modern linguistics was strongly influenced by Chomsky's observation that language learners make grammatical generalizations that do not appear justified by the evidence in the input (Chomsky, 1965, 1980). The notion that these generalizations can best be explained by innate knowledge, known as the argument from the Poverty of the Stimulus (henceforth PoS), has led to an enduring debate that is central to many of the key issues in cognitive science and linguistics.

The original formulation of the Poverty of Stimulus argument rests critically on assumptions about simplicity, the nature of the input children are exposed to, and how much evidence is sufficient to support the generalizations that children make. The phenomenon of auxiliary fronting in interrogative sentences is one example of the PoS argument; here, the argument states that children must be innately biased to favor structure-dependent rules that operate using grammatical constructs like phrases and clauses over structure-independent rules that operate only on the sequence of words.

English interrogatives are formed from declaratives by fronting the main clause auxiliary. Given a declarative sentence like "The dog in the corner is hungry", the interrogative is formed by moving the is to make the sentence "Is the dog in the corner hungry?" Chomsky considered two types of operation that can explain auxiliary fronting (Chomsky, 1965, 1971). The simplest (linear) rule is independent of the hierarchical phrase structure of the sentence: take the leftmost (first) occurrence of the auxiliary in the sentence and move it to the beginning. The structure-dependent (hierarchical) rule – move the auxiliary from the main clause of the sentence – is more complex since it operates over a sentence's phrasal structure and not just its sequence of elements.

The "poverty" part of this form of the PoS argument claims that children do not see the data they would need in order to rule out the structure-independent (linear) hypothesis. An example of such data would be an interrogative sentence such as "Is the man who is hungry ordering dinner?". In this sentence, the main clause auxiliary is fronted in spite of the existence of another auxiliary that would come first in the corresponding declarative sentence. Chomsky argued that this type of data is not accessible in child speech, maintaining that "it is quite possible for a person to go through life without having heard any of the relevant examples that would choose between the two principles" (Chomsky, 1971).

It is mostly accepted that children do not appear to go through a period where they consider the linear hypothesis (Crain and Nakayama, 1987). However, two other aspects of the PoS argument are the topic of much debate. The first considers what evidence there is in the input and what constitutes "enough" (Pullum and Scholz, 2002; Legate and Yang, 2002). Unfortunately, this approach is inconclusive: while there is some agreement that the critical forms are rare in child-directed speech, they do occur (Legate and Yang, 2002; Pullum and Scholz, 2002). Lacking a clear specification of how a child's language learning mechanism might work, it is difficult to determine whether that input is sufficient.

The second issue concerns the nature of the stimulus, suggesting that regardless of whether there is enough direct syntactic evidence available, there may be sufficient distributional and statistical regularities in language to explain children's behavior (Redington et al., 1998; Lewis and Elman, 2001; Reali and Christiansen, 2004). Most of the work focusing specifically on auxiliary fronting uses connectionist simulations or n-gram models to argue that child-directed language contains enough information to predict the grammatical status of aux-fronted interrogatives (Reali and Christiansen, 2004; Lewis and Elman, 2001).

While both of these approaches are useful and the research on statistical learning in particular is promising, there are still notable shortcomings. First of all, the statistical models do not engage with the primary intuition and issue raised by the PoS argument. The intuition is that language has a hierarchical structure – it uses symbolic notions like syntactic categories and phrases that are hierarchically organized within sentences, which are recursively generated by a grammar. The issue is whether knowledge about this structure is learned or innate.

An approach that lacks an explicit representation of structure has two problems addressing this issue. First of all, many linguists and cognitive scientists tend to discount these results because they ignore a principal feature of linguistic knowledge, namely that it is based on structured symbolic representations. Secondly, connectionist networks and n-gram models tend to be difficult to understand analytically. For instance, the models used by Reali and Christiansen (2004) and Lewis and Elman (2001) measure success by whether they predict the next word in a sequence, rather than based on examination of an explicit grammar. Though the models perform above chance, it is difficult to tell why and what precisely they have learned.

In this work we present a Bayesian account of linguistic structure learning in order to engage with the PoS argument on its own terms – taking the existence of structure seriously and asking whether and to what extent knowledge of that structure can be inferred by a rational statistical learner. This is an ideal learnability analysis: our question is not whether a learner without innate language-specific biases must be able to infer that linguistic structure is hierarchical, but rather whether it is possible to make that inference. It thus addresses the exact challenge posed by the PoS argument, which holds that such an inference is not possible.

The Bayesian approach provides the capability of combining structured representation with statistical inference, which enables us to achieve a number of important goals. (1) We demonstrate that a learner equipped with the capacity to explicitly represent both hierarchical and linear grammars – but without any initial biases – could infer that the hierarchical grammar is a better fit to typical child-directed input. (2) We show that inferring this hierarchical grammar results in the mastery of aspects of auxiliary fronting, even if no direct evidence is available. (3) Our approach provides a clear and objectively sensible metric of simplicity, as well as a way to explore what sort of data and how much is required to make these hierarchical generalizations. And (4) our results suggest that PoS arguments are sensible only when phenomena are considered as part of a linguistic system, rather than taken in isolation.

Because this analysis takes place within an ideal learning framework, we assume that the learner is able to effectively search over the joint space of G and T for grammars that maximize the Bayesian scoring criterion. We do not focus on the question of whether the learner can successfully search the space, instead presuming that an ideal learner can learn a given G, T pair if it has a higher score than the alternatives. Because we only compare grammars that can parse our corpus, we first consider the corpus before explaining the grammars.

The corpus

The corpus consists of the sentences spoken by adults in the Adam corpus (Brown, 1973) in the CHILDES database (MacWhinney, 2000). In order to focus on grammar learning rather than lexical acquisition, each word is replaced by its syntactic category.¹ Ungrammatical sentences and the most grammatically complex sentence types are removed.² The final corpus contains 21792 individual sentence tokens corresponding to 2338 unique sentence types out of 25876 tokens in the original corpus.³ Removing the complicated sentence types, done to improve the tractability of the analysis, is if anything a conservative move since the hierarchical grammar is more preferred as the input grows more complicated.

In order to explore how the preference for a grammar is dependent on the level of evidence in the input, we create six smaller corpora as subsets of the main corpus. Under the reasoning that the most frequent sentences are most available as evidence,⁴ different corpus Levels contain only those sentence forms that occur with a certain frequency in the full corpus. The levels are: Level 1 (contains all forms occurring 500 or more times, corresponding to 8 unique types); Level 2 (300 times, 13 types); Level 3 (100 times, 37 types); Level 4 (50 times, 67 types); Level 5 (10 times, 268 types); and the complete corpus, Level 6, with 2338 unique types, including interrogatives, wh-questions, relative clauses, prepositional and adverbial phrases, command forms, and auxiliary as well as non-auxiliary verbs.

¹Parts of speech used included determiners (det), nouns (n), adjectives (adj), comments like "mmhm" (c, sentence fragments only), prepositions (prep), pronouns (pro), proper
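
The frequency-based corpus Levels described in the corpus section can be illustrated with a short sketch. This is not the authors' preprocessing code: the function name `build_levels`, the toy sentence types, and the threshold of 1 for Level 6 (standing in for "the complete corpus") are assumptions made for illustration only. Sentence types are represented as tuples of syntactic-category labels taken from the footnote.

```python
from collections import Counter

# Minimum number of occurrences for a sentence type to enter each Level,
# as listed in the text. Level 6 is the complete corpus; the threshold
# of 1 used for it here is an assumption of this sketch.
LEVEL_THRESHOLDS = {1: 500, 2: 300, 3: 100, 4: 50, 5: 10, 6: 1}


def build_levels(sentence_tokens):
    """Map each Level to the set of sentence types frequent enough for it.

    sentence_tokens: iterable of sentence tokens, each a tuple of
    syntactic-category labels (one label per word).
    """
    type_counts = Counter(sentence_tokens)
    return {
        level: {stype for stype, n in type_counts.items() if n >= threshold}
        for level, threshold in LEVEL_THRESHOLDS.items()
    }


# Toy data (hypothetical counts, not taken from the Adam corpus).
toy_corpus = (
    [("pro", "adj")] * 600
    + [("det", "n")] * 120
    + [("det", "adj", "n")] * 12
    + [("prep", "det", "n")] * 1
)

levels = build_levels(toy_corpus)
for level in sorted(levels):
    print(level, len(levels[level]))
```

Because every threshold is lower than the one before it, each Level contains all sentence types of the Levels above it, which is what lets the Levels act as increasing amounts of evidence.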
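
The two candidate auxiliary-fronting rules from the Introduction can also be made concrete. The sketch below is only an illustration under strong assumptions: the auxiliary list `AUX` is a small hypothetical set, and the hierarchical rule is handed a hand-made split of the sentence into subject NP, main-clause auxiliary, and predicate instead of computing a parse. The function names are invented for this example.

```python
# Small hypothetical set of auxiliaries, for illustration only.
AUX = {"is", "are", "was", "were", "can", "will"}


def front_linear(words):
    """Linear rule: move the leftmost auxiliary to the front."""
    i = next(k for k, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]


def front_hierarchical(subject_np, main_aux, predicate):
    """Structure-dependent rule: front the main-clause auxiliary.

    The split into subject NP, main-clause auxiliary, and predicate is
    supplied by hand here; a real learner would need a parse to get it.
    """
    return [main_aux] + subject_np + predicate


# Simple declarative: both rules yield the same interrogative.
simple = "the dog in the corner is hungry".split()
print(" ".join(front_linear(simple)))
print(" ".join(front_hierarchical(simple[:5], "is", simple[6:])))

# Auxiliary inside a relative clause in the subject NP: the rules diverge.
complex_decl = "the man who is hungry is ordering dinner".split()
print(" ".join(front_linear(complex_decl)))
print(" ".join(front_hierarchical(complex_decl[:5], "is", complex_decl[6:])))
```

On the simple sentence the two rules are indistinguishable, which is why the PoS argument focuses on sentences with an auxiliary inside the subject NP: only there does the linear rule produce the ungrammatical "is the man who hungry is ordering dinner".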