A Selectionist Theory of Language Acquisition

Charles D. Yang*
Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, MA 02139
charles@ai.mit.edu

Abstract

This paper argues that developmental patterns in child language be taken seriously in computational models of language acquisition, and proposes a formal theory that meets this criterion. We first present developmental facts that are problematic for statistical learning approaches, which assume no prior knowledge of grammar, and for traditional learnability models, which assume the learner moves from one UG-defined grammar to another. In contrast, we view language acquisition as a population of grammars associated with "weights", which compete in a Darwinian selectionist process. Selection is made possible by the variational properties of individual grammars; specifically, their differential compatibility with the primary linguistic data in the environment. In addition to a convergence proof, we present empirical evidence from child language development that a learner is best modeled as multiple grammars in co-existence and competition.

1 Learnability and Development

A central issue in linguistics and cognitive science is the problem of language acquisition: How does a human child come to acquire her language with such ease, yet without high computational power or favorable learning conditions? It is evident that any adequate model of language acquisition must meet the following empirical conditions:

• Learnability: such a model must converge to the target grammar used in the learner's environment, under plausible assumptions about the learner's computational machinery, the nature of the input data, sample size, and so on.

• Developmental compatibility: the learner modeled in such a theory must exhibit behaviors that are analogous to the actual course of language development (Pinker, 1979).

* I would like to thank Julie Legate, Sam Gutmann, Bob Berwick, Noam Chomsky, John Frampton, and John Goldsmith for comments and discussion. This work is supported by an NSF graduate fellowship.

It is worth noting that the developmental compatibility condition has been largely ignored in the formal studies of language acquisition. In the rest of this section, I show that if this condition is taken seriously, previous models of language acquisition have difficulties explaining certain developmental facts in child language.

1.1 Against Statistical Learning

An empiricist approach to language acquisition has (re)gained popularity in computational linguistics and cognitive science; see Stolcke (1994), Charniak (1995), Klavans and Resnik (1996), de Marcken (1996), Bates and Elman (1996), Seidenberg (1997), among numerous others. The child is viewed as an inductive and "generalized" data processor such as a neural network, designed to derive structural regularities from the statistical distribution of patterns in the input data, without prior (innate) specific knowledge of natural language. Most concrete proposals of statistical learning employ expensive and specific computational procedures such as compression, Bayesian inference, and propagation of learning errors, and usually require a large corpus of (sometimes pre-processed) data. These properties immediately challenge the psychological plausibility of the statistical learning approach. In the present discussion, however, we are not concerned with this, but simply grant that someday, someone might devise a statistical learning scheme that is psychologically plausible and also succeeds in converging to the target language. We show that even if such a scheme were possible, it would still face serious challenges from the important but often ignored requirement of developmental compatibility.

One of the most significant findings in child language research of the past decade is that different aspects of syntactic knowledge are learned at different rates. For example, consider the placement of finite verbs in French, where inflected verbs precede negation and adverbs:

Jean voit souvent/pas Marie.
Jean sees often/not Marie.

This property of French is mastered as early as

the 20th month, as evidenced by the extreme rarity of incorrect verb placement in child speech (Pierce, 1992). In contrast, some aspects of language are acquired relatively late. For example, the requirement of using a sentential subject is not mastered by English children until as late as the 36th month (Valian, 1991), when English children stop producing a significant number of subjectless sentences.

When we examine adult speech to children (transcribed in the CHILDES corpus; MacWhinney and Snow, 1985), we find that more than 90% of English input sentences contain an overt subject, whereas only 7-8% of all French input sentences contain an inflected verb followed by negation/adverb. A statistical learner, one which builds knowledge purely on the basis of the distribution of the input data, predicts that English obligatory subject use should be learned (much) earlier than French verb placement - exactly the opposite of the actual findings in child language.

Further evidence against statistical learning comes from the Root Infinitive (RI) stage (Wexler, 1994; inter alia) in children acquiring certain languages. Children in the RI stage produce a large number of sentences where matrix verbs are not finite - ungrammatical in adult language and thus appearing infrequently in the primary linguistic data, if at all. It is not clear how a statistical learner would induce non-existent patterns from the training corpus. In addition, in the acquisition of verb-second (V2) in Germanic grammars, it is known (e.g. Haegeman, 1994) that at an early stage, children use a large proportion (50%) of verb-initial (V1) sentences, a marked pattern that appears only sparsely in adult speech. Again, an inductive learner purely driven by corpus data has no explanation for these disparities between child and adult languages.

Empirical evidence as such poses a serious problem for the statistical learning approach. It seems a mistake to view language acquisition as an inductive procedure that constructs linguistic knowledge, directly and exclusively, from the distributions of input data.

1.2 The Transformational Approach

Another leading approach to language acquisition, largely in the tradition of generative linguistics, is motivated by the fact that although child language is different from adult language, it is different in highly restrictive ways. Given the input to the child, there are logically possible and computationally simple inductive rules to describe the data that are never attested in child language. Consider the following well-known example. Forming a question in English involves inversion of the auxiliary verb and the subject:

Is the man t tall?

where "is" has been fronted from the position t, the position it assumes in a declarative sentence. A possible inductive rule to describe the above sentence is this: front the first auxiliary verb in the sentence. This rule, though logically possible and computationally simple, is never attested in child language (Chomsky, 1975; Crain and Nakayama, 1987; Crain, 1991); that is, children are never seen to produce sentences like:

* Is the cat that the dog t chasing is scared?

where the first auxiliary is fronted (the first "is"), instead of the auxiliary following the subject of the sentence (here, the second "is" in the sentence).

Acquisition findings like these lead linguists to postulate that the human language capacity is constrained in a finite prior space, the Universal Grammar (UG). Previous models of language acquisition in the UG framework (Wexler and Culicover, 1980; Berwick, 1985; Gibson and Wexler, 1994) are transformational, borrowing a term from evolution (Lewontin, 1983), in the sense that the learner moves from one hypothesis/grammar to another as input sentences are processed.¹ Learnability results can be obtained for some psychologically plausible algorithms (Niyogi and Berwick, 1996). However, the developmental compatibility condition still poses serious problems.

Since at any time the state of the learner is identified with a particular grammar defined by UG, it is hard to explain (a) the inconsistent patterns in child language, which cannot be described by any single adult grammar (e.g. Brown, 1973); and (b) the smoothness of language development (e.g. Pinker, 1984; Valian, 1991; inter alia), whereby the child gradually converges to the target grammar, rather than the abrupt jumps that would be expected from binary changes in hypotheses/grammars.

¹ Note that the transformational approach is not restricted to UG-based models; for example, Brill's influential work (1993) is a corpus-based model which successively revises a set of syntactic rules upon presentation of partially bracketed sentences. Note, however, that the state of the learning system at any time is still a single set of rules, that is, a single "grammar".

Having noted the inadequacies of the previous approaches to language acquisition, we will propose a theory that aims to meet the language learnability and language development conditions simultaneously. Our theory draws inspiration from Darwinian evolutionary biology.

2 A Selectionist Model of Language Acquisition

2.1 The Dynamics of Darwinian Evolution

Essential to Darwinian evolution is the concept of variational thinking (Lewontin, 1983). First, differences among individuals are viewed as "real", as opposed to deviants from some idealized archetypes, as in pre-Darwinian thinking. Second, such differences result in variance in operative functions among individuals in a population, thus allowing forces of evolution such as natural selection to operate. Evolutionary changes are therefore changes in the distribution of variant individuals in the population. This contrasts with Lamarckian transformational thinking, in which individuals themselves undergo direct changes (transformations) (Lewontin, 1983).

2.2 A population of grammars

Learning, including language acquisition, can be characterized as a sequence of states in which the learner moves from one state to another. Transformational models of language acquisition identify the state of the learner as a single grammar/hypothesis. As noted in section 1, this makes it difficult to explain the inconsistency in child language and the smoothness of language development.

We propose that the learner be modeled as a population of "grammars", the set of all principled language variations made available by the biological endowment of the human language faculty. Each grammar Gi is associated with a weight pi, 0 ≤ pi ≤ 1, with Σi pi = 1. In a linguistic environment E, the weight pi(E, t) is a function of E and the time variable t, the time since the onset of language acquisition. We say that

Definition: Learning converges if, for every ε with 0 < ε < 1 and every Gi,

    |pi(E, t+1) − pi(E, t)| < ε

That is, learning converges when the composition and distribution of the grammar population are stabilized. Particularly, in a monolingual environment E_T in which a target grammar T is used, we say that learning converges to T if lim(t→∞) p_T(E_T, t) = 1.

2.3 A Learning Algorithm

Write E → s to indicate that a sentence s is an utterance in the linguistic environment E. Write s ∈ G if a grammar G can analyze s, which, in a narrow sense, is parsability (Wexler and Culicover, 1980; Berwick, 1985). Suppose that there are altogether N grammars in the population. For simplicity, write pi for pi(E, t) at time t, and pi' for pi(E, t+1) at time t+1. Learning takes place as follows, where 0 < γ < 1 is the learning rate:

The Algorithm: Given an input sentence s, the child, with probability pi, selects a grammar Gi:

• if s ∈ Gi:  pi' = pi + γ(1 − pi),  and  pj' = (1 − γ)pj for j ≠ i

• if s ∉ Gi:  pi' = (1 − γ)pi,  and  pj' = γ/(N − 1) + (1 − γ)pj for j ≠ i

Comment: The algorithm is the Linear reward-penalty (LR-P) scheme (Bush and Mosteller, 1958), one of the earliest and most extensively studied stochastic algorithms in the psychology of learning. It is real-time and on-line, and thus reflects the rather limited computational capacity of the child language learner, by avoiding sophisticated data processing and the need for a large memory to store previously seen examples. Many variants and generalizations of this scheme are studied in Atkinson et al. (1965), and their thorough mathematical treatments can be found in Narendra and Thathachar (1989).

The algorithm operates in a selectionist manner: grammars that succeed in analyzing input sentences are rewarded, and those that fail are punished. In addition to the psychological evidence for such a scheme in animal and human learning, there is neurological evidence (Hubel and Wiesel, 1962; Changeux, 1983; Edelman, 1987; inter alia) that the development of neural substrate is guided by exposure to specific stimulus in the environment in a Darwinian selectionist fashion.

2.4 A Convergence Proof

For simplicity but without loss of generality, assume that there are two grammars (N = 2), the target grammar T1 and a pretender T2. The results presented here generalize to the N-grammar case; see Narendra and Thathachar (1989).

Definition: The penalty probability of grammar Ti in a linguistic environment E is

    ci = Pr(s ∉ Ti | E → s)

In other words, ci represents the probability that the grammar Ti fails to analyze an incoming sentence s and gets punished as a result. Notice that the penalty probability, essentially a fitness measure of individual grammars, is an intrinsic property of a UG-defined grammar relative to a particular linguistic environment E, determined by the distributional patterns of linguistic expressions in E. It is not explicitly computed, as in (Clark, 1992), which uses the Genetic Algorithm (GA).²

² Clark's model and the present one share an important feature: the outcome of acquisition is determined by the differential compatibilities of individual grammars. The choice of the GA introduces various psychological and linguistic assumptions that can not be justified; see Dresher (1999) and Yang (1999). Furthermore, no formal proof of convergence is given.

The main result is as follows:

    Theorem: lim(t→∞) p1(t) = c2 / (c1 + c2),  if |1 − γ(c1 + c2)| < 1    (1)

Proof sketch: Computing E[p1(t+1) | p1(t)] as a function of p1(t) and taking expectations on both

sides gives

    E[p1(t+1)] = [1 − γ(c1 + c2)] E[p1(t)] + γc2    (2)

Solving (2) yields (1).

Comment 1: It is easy to see that p1 → 1 (and p2 → 0) when c1 = 0 and c2 > 0; that is, the learner converges to the target grammar T1, which, by definition, has a penalty probability of 0 in a monolingual environment. Learning is robust. Suppose that there is a small amount of noise in the input, i.e. sentences such as speaker errors which are not compatible with the target grammar. Then c1 > 0. If c1 << c2, convergence to T1 is still ensured by (1). Consider a non-uniform linguistic environment in which the linguistic evidence does not unambiguously identify any single grammar; an example of this is a population in contact with two languages (grammars), say, T1 and T2. Since c1 > 0 and c2 > 0, (1) entails that p1 and p2 reach a stable equilibrium at the end of language acquisition; that is, language learners are essentially bi-lingual speakers as a result of language contact. Kroch (1989) and his colleagues have argued convincingly that this is what happened in many cases of diachronic change. In Yang (1999), we have been able to extend the acquisition model to a population of learners, and formalize Kroch's idea of grammar competition over time.

Comment 2: In the present model, one can directly measure the rate of change in the weight of the target grammar, and compare it with developmental findings. Suppose T1 is the target grammar, hence c1 = 0. The expected increase of p1, Δp1, is computed as follows:

    E[Δp1] = c2 p1 p2    (3)

Since p2 = 1 − p1, Δp1 in (3) is obviously a quadratic function of p1(t). Hence, the growth of p1 will produce the S-shape curve familiar in the psychology of learning. There is evidence for an S-shape pattern in child language development (Clahsen, 1986; Wijnen, 1999; inter alia), which, if true, suggests that a selectionist learning algorithm such as the one adopted here might indeed be what the child learner employs.

2.5 Unambiguous Evidence is Unnecessary

One way to ensure convergence is to assume the existence of unambiguous evidence (cf. Fodor, 1998): sentences that are compatible only with the target grammar and not with any other grammar. Unambiguous evidence is, however, not necessary for the proposed model to converge. It follows from the theorem (1) that even if no evidence can unambiguously identify the target grammar from its competitors, it is still possible to ensure convergence as long as all competing grammars fail on some proportion of input sentences; i.e. they all have positive penalty probabilities.

Consider the acquisition of the target, a German V2 grammar, in a population of grammars below:

1. German: SVO, OVS, XVSO
2. English: SVO, XSVO
3. Irish: VSO, XVSO
4. Hixkaryana: OVS, XOVS

We have used X to denote non-argument categories such as adverbs, adjuncts, etc., which can quite freely appear in sentence-initial positions. Note that none of the patterns in (1) could conclusively distinguish German from the other three grammars. Thus, no unambiguous evidence appears to exist. However, if SVO, OVS, and XVSO patterns appear in the input data at positive frequencies, the German grammar has a higher overall "fitness value" than the other grammars by virtue of being compatible with all input sentences. As a result, German will eventually eliminate the competing grammars.

2.6 Learning in a Parametric Space

Suppose that natural language grammars vary in a parametric space, as cross-linguistic studies suggest.³ We can then study the dynamical behaviors of grammar classes that are defined in these parametric dimensions. Following (Clark, 1992), we say that a sentence s expresses a parameter α if a grammar must have set α to some definite value in order to assign a well-formed representation to s. Convergence to the target value of α can be ensured by the existence of evidence (s) defined in the sense of parameter expression. The convergence to a single grammar can then be viewed as the intersection of parametric grammar classes, converging in parallel to the target values of their respective parameters.

³ Although different theories of grammar, e.g. GB, HPSG, LFG, TAG, have different ways of instantiating this idea.

3 Some Developmental Predictions

The present model makes two predictions that cannot be made in the standard transformational theories of acquisition:

1. As the target gradually rises to dominance, the child entertains a number of co-existing grammars. This will be reflected in distributional patterns of child language, under the null hypothesis that the grammatical knowledge (in our model, the population of grammars and their respective weights) used in production is that used in analyzing linguistic evidence. For grammatical phenomena that are acquired relatively late, child language consists of the output of more than one grammar.

2. Other things being equal, the rate of development is determined by the penalty probabilities of the competing grammars relative to the input data in the linguistic environment, as in (3).

In this paper, we present longitudinal evidence concerning the prediction in (2).⁴ To evaluate developmental predictions, we must estimate the penalty probabilities of the competing grammars in a particular linguistic environment. Here we examine the developmental rate of French verb placement, an early acquisition (Pierce, 1992); that of English subject use, a late acquisition (Valian, 1991); and that of the Dutch V2 parameter, also a late acquisition (Haegeman, 1994).

⁴ In Yang (1999), we show that a child learner, en route to her target grammar, entertains multiple grammars. For example, a significant portion of English child language shows characteristics of a topic-drop optional subject grammar like Chinese, before children learn that subject use in English is obligatory, at around the 3rd birthday.

Using the idea of parameter expression (section 2.6), we estimate the frequency of sentences that unambiguously identify the target value of a parameter. For example, sentences that contain finite verbs preceding adverb or negation ("Jean voit souvent/pas Marie") are an unambiguous indication of the [+] value of the verb raising parameter. A grammar with the [−] value for this parameter is incompatible with such sentences and, if probabilistically selected by the learner for grammatical analysis, will be punished as a result. Based on the CHILDES corpus, we estimate that such sentences constitute 8% of all French adult utterances to children. This suggests that unambiguous evidence amounting to 8% of all input data is sufficient for a very early acquisition: in this case, the target value of the verb-raising parameter is correctly set. We therefore have a direct explanation of Brown's (1973) observation that in the acquisition of fixed word order languages such as English, word order errors are "triflingly few". For example, English children are never seen to produce word order variations other than SVO, the target grammar, nor do they fail to front Wh-words in question formation. Virtually all English sentences display rigid word order, e.g. the verb almost always (immediately) precedes the object, which gives a very high rate of unambiguous evidence (perhaps close to 100%, far greater than the 8% that is sufficient for a very early acquisition in the case of French verb raising), sufficient to drive out other word order grammars very early on.

Consider then the acquisition of the subject parameter in English, which requires a sentential subject. Languages like Italian, Spanish, and Chinese, on the other hand, have the option of dropping the subject. Therefore, sentences with an overt subject are not necessarily useful in distinguishing English from optional subject languages.⁵ However, there exists a certain type of English sentence that is indicative (Hyams, 1986):

There is a man in the room.
Are there toys on the floor?

⁵ Notice that this presupposes the child's prior knowledge of and access to both obligatory and optional subject grammars.

The subject of these sentences is "there", a non-referential lexical item that is present for purely structural reasons - to satisfy the requirement in English that the pre-verbal subject position must be filled. Optional subject languages do not have this requirement, and do not have expletive-subject sentences. Expletive sentences therefore express the [+] value of the subject parameter. Based on the CHILDES corpus, we estimate that expletive sentences constitute 1% of all English adult utterances to children.

Note that before the learner eliminates optional subject grammars on the cumulative basis of expletive sentences, she has probabilistic access to multiple grammars. This is fundamentally different from stochastic grammar models, in which the learner has probabilistic access to generative rules. A stochastic grammar is not a developmentally adequate model of language acquisition. As discussed in section 1.1, more than 90% of English sentences contain a subject: a stochastic grammar model will overwhelmingly bias toward the rule that generates a subject. English children, however, go through a long period of subject drop. In the present model, child subject drop is interpreted as the presence of the true optional subject grammar, in co-existence with the obligatory subject grammar.

Lastly, we consider the setting of the Dutch V2 parameter. As noted in section 2.5, there appears to be no unambiguous evidence for the [+] value of the V2 parameter: SVO, VSO, and OVS grammars, members of the [−V2] class, are each compatible with certain proportions of expressions produced by the target V2 grammar. However, observe that despite its compatibility with some input patterns, an OVS grammar cannot survive long in the population of competing grammars. This is because an OVS grammar has an extremely high penalty probability. Examination of CHILDES shows that OVS patterns constitute only 1.3% of all input sentences to children, whereas SVO patterns constitute about 65% of all utterances, and XVSO, about 34%. Therefore, only the SVO and VSO grammars, members of the [−V2] class, are "contenders" alongside the (target) V2 grammar, by virtue of being compatible with significant portions of the input data. But notice that OVS patterns do penalize both SVO and VSO grammars, and are compatible only with the [+V2] grammars. Therefore, OVS patterns are effectively unambiguous evidence (among the contenders) for the V2 parameter, which eventually drive the SVO and VSO grammars out of the population.

In the selectionist model, the rarity of OVS sentences predicts that the acquisition of the V2 parameter in Dutch is a relatively late phenomenon. Furthermore, because the frequency (1.3%) of Dutch OVS sentences is comparable to the frequency (1%) of English expletive sentences, we expect that the Dutch V2 grammar is successfully acquired roughly at the same time as English children attain adult-level subject use (around age 3; Valian, 1991). Although I am not aware of any report on the timing of the correct setting of the Dutch V2 parameter, there is evidence from the acquisition of German, a similar language, that children are considered to have successfully acquired V2 by the 36-39th month (Clahsen, 1986). Under the model developed here, this is not a coincidence.

4 Conclusion

To recapitulate, this paper first argues that considerations of language development must be taken seriously to evaluate computational models of language acquisition. Once we do so, both statistical learning approaches and traditional UG-based learnability studies are empirically inadequate. We proposed an alternative model which views language acquisition as a selectionist process in which grammars form a population and compete to match linguistic expressions present in the environment. The course and outcome of acquisition are determined by the relative compatibilities of the grammars with input data; such compatibilities, expressed in penalty probabilities and unambiguous evidence, are quantifiable and empirically testable, allowing us to make direct predictions about language development.

The biologically endowed linguistic knowledge enables the learner to go beyond unanalyzed distributional properties of the input data. We argued in section 1.1 that it is a mistake to model language acquisition as directly learning the probabilistic distribution of the linguistic data. Rather, language acquisition is guided by particular input evidence that serves to disambiguate the target grammar from the competing grammars. The ability to use such evidence for grammar selection is based on the learner's linguistic knowledge. Once such knowledge is assumed, the actual process of language acquisition is no more remarkable than generic psychological models of learning. The selectionist theory, if correct, shows an example of the interaction between domain-specific knowledge and domain-neutral mechanisms, which combine to explain properties of language and cognition.

References

Atkinson, R., G. Bower, and E. Crothers (1965). An Introduction to Mathematical Learning Theory. New York: Wiley.

Bates, E. and J. Elman (1996). Learning rediscovered: A perspective on Saffran, Aslin, and Newport. Science 274: 5294.

Berwick, R. (1985). The acquisition of syntactic knowledge. Cambridge, MA: MIT Press.

Brill, E. (1993). Automatic grammar induction and parsing free text: a transformation-based approach. ACL Annual Meeting.

Brown, R. (1973). A first language. Cambridge, MA: Harvard University Press.

Bush, R. and F. Mosteller (1958). Stochastic models for learning. New York: Wiley.

Charniak, E. (1995). Statistical language learning. Cambridge, MA: MIT Press.

Chomsky, N. (1975). Reflections on language. New York: Pantheon.

Changeux, J.-P. (1983). L'Homme Neuronal. Paris: Fayard.

Clahsen, H. (1986). Verbal inflections in German child language: Acquisition of agreement markings and the functions they encode. Linguistics 24: 79-121.

Clark, R. (1992). The selection of syntactic knowledge. Language Acquisition 2: 83-149.

Crain, S. and M. Nakayama (1987). Structure dependency in grammar formation. Language 63: 522-543.

Dresher, E. (1999). Charting the learning path: cues to parameter setting. Linguistic Inquiry 30: 27-67.

Edelman, G. (1987). Neural Darwinism: The theory of neuronal group selection. New York: Basic Books.

Fodor, J. D. (1998). Unambiguous triggers. Linguistic Inquiry 29: 1-36.

Gibson, E. and K. Wexler (1994). Triggers. Linguistic Inquiry 25: 355-407.

Haegeman, L. (1994). Root infinitives, clitics, and truncated structures. Language Acquisition.

Hubel, D. and T. Wiesel (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology 160: 106-154.

Hyams, N. (1986). Language acquisition and the theory of parameters. Dordrecht: Reidel.

Klavans, J. and P. Resnik (eds.) (1996). The balancing act. Cambridge, MA: MIT Press.

Kroch, A. (1989). Reflexes of grammar in patterns of language change. Language Variation and Change 1: 199-244.

Lewontin, R. (1983). The organism as the subject and object of evolution. Scientia 118: 65-82.

de Marcken, C. (1996). Unsupervised language acquisition. Ph.D. dissertation, MIT.

MacWhinney, B. and C. Snow (1985). The Child Language Data Exchange System. Journal of Child Language 12: 271-296.

Narendra, K. and M. Thathachar (1989). Learning automata. Englewood Cliffs, NJ: Prentice Hall.

Niyogi, P. and R. Berwick (1996). A language learning model for finite parameter space. Cognition 61: 162-193.

Pierce, A. (1992). Language acquisition and syntactic theory: a comparative analysis of French and English child grammar. Boston: Kluwer.

Pinker, S. (1979). Formal models of language learning. Cognition 7: 217-283.

Pinker, S. (1984). Language learnability and language development. Cambridge, MA: Harvard University Press.

Seidenberg, M. (1997). Language acquisition and use: Learning and applying probabilistic constraints. Science 275: 1599-1604.

Stolcke, A. (1994). Bayesian Learning of Probabilistic Language Models. Ph.D. thesis, University of California at Berkeley, Berkeley, CA.

Valian, V. (1991). Syntactic subjects in the early speech of American and Italian children. Cognition 40: 21-82.

Wexler, K. (1994). Optional infinitives, head movement, and the economy of derivation in child language. In Lightfoot, D. and N. Hornstein (eds.), Verb movement. Cambridge: Cambridge University Press.

Wexler, K. and P. Culicover (1980). Formal principles of language acquisition. Cambridge, MA: MIT Press.

Wijnen, F. (1999). Verb placement in Dutch child language: A longitudinal analysis. Ms., University of Utrecht.

Yang, C. (1999). The variational dynamics of natural language: Acquisition and use. Technical report, MIT AI Lab.
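As a concrete illustration of the learning algorithm of section 2.3, the Linear reward-penalty scheme and the limit in (1) can be sketched as a short simulation. This is a minimal sketch, not code from the paper: the penalty probabilities and the learning rate gamma are arbitrary illustrative values, and grammar failure is modeled as an independent coin flip with probability ci.

```python
import random

def lrp_step(p, i, succeeded, gamma):
    """One Linear reward-penalty (LR-P) update of the weight vector p,
    after grammar i was selected to analyze an input sentence."""
    n = len(p)
    q = list(p)
    if succeeded:
        # reward the selected grammar, decay all the others
        q[i] = p[i] + gamma * (1 - p[i])
        for j in range(n):
            if j != i:
                q[j] = (1 - gamma) * p[j]
    else:
        # punish the selected grammar, redistribute weight to the others
        q[i] = (1 - gamma) * p[i]
        for j in range(n):
            if j != i:
                q[j] = gamma / (n - 1) + (1 - gamma) * p[j]
    return q

def acquire(c, gamma=0.01, steps=50000, seed=1):
    """Simulate acquisition, where c[i] is the penalty probability of
    grammar i, i.e. Pr(it fails on an input sentence). Returns the
    final weight vector."""
    rng = random.Random(seed)
    p = [1.0 / len(c)] * len(c)
    for _ in range(steps):
        i = rng.choices(range(len(c)), weights=p)[0]  # select Gi w.p. p[i]
        p = lrp_step(p, i, rng.random() >= c[i], gamma)
    return p

# Target T1 never fails (c1 = 0); pretender T2 fails on 20% of input (c2 = 0.2).
# By the theorem, p1 should approach c2 / (c1 + c2) = 1.
weights = acquire([0.0, 0.2])
print(weights)
```

With both penalty probabilities positive (e.g. in a language-contact environment), the same simulation settles near the stable mixture c2/(c1 + c2) of Comment 1 rather than at a single grammar.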

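The grammar competition of section 2.5 admits a similar sketch: four grammars compete on input drawn with the rough CHILDES-based frequencies cited in section 3 (SVO about 65%, XVSO about 34%, OVS about 1.3%). German, the only grammar compatible with all three patterns, should come to dominate despite the absence of unambiguous evidence. The frequencies, learning rate, and step count below are illustrative assumptions, not values from the paper.

```python
import random

# Section 2.5: the word-order patterns each grammar can analyze.
GRAMMARS = {
    "German":     {"SVO", "OVS", "XVSO"},
    "English":    {"SVO", "XSVO"},
    "Irish":      {"VSO", "XVSO"},
    "Hixkaryana": {"OVS", "XOVS"},
}

# Rough CHILDES-based input frequencies cited in section 3 (renormalized).
PATTERNS = ["SVO", "XVSO", "OVS"]
FREQS = [0.65, 0.34, 0.013]

def compete(gamma=0.02, steps=40000, seed=7):
    names = list(GRAMMARS)
    p = {g: 1.0 / len(names) for g in names}
    rng = random.Random(seed)
    for _ in range(steps):
        s = rng.choices(PATTERNS, weights=FREQS)[0]               # an input sentence
        g = rng.choices(names, weights=[p[n] for n in names])[0]  # selected grammar
        if s in GRAMMARS[g]:
            # reward: the selected grammar gains weight, the others decay
            for n in names:
                p[n] = p[n] + gamma * (1 - p[n]) if n == g else (1 - gamma) * p[n]
        else:
            # punish: the selected grammar loses weight to the others
            for n in names:
                p[n] = (1 - gamma) * p[n] if n == g else gamma / 3 + (1 - gamma) * p[n]
    return p

final = compete()
print(sorted(final.items(), key=lambda kv: -kv[1]))
```

Because German has a penalty probability of zero in this environment while English, Irish, and Hixkaryana fail on roughly 35%, 66%, and 99% of the input respectively, its weight rises toward 1 even though no single input pattern identifies it unambiguously.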