A Selectionist Theory of Language Acquisition

Charles D. Yang*
Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, MA 02139
charles@ai.mit.edu

Abstract

This paper argues that developmental patterns in child language be taken seriously in computational models of language acquisition, and proposes a formal theory that meets this criterion. We first present developmental facts that are problematic for statistical learning approaches, which assume no prior knowledge of grammar, and for traditional learnability models, which assume the learner moves from one UG-defined grammar to another. In contrast, we view language acquisition as a population of grammars associated with "weights", which compete in a Darwinian selectionist process. Selection is made possible by the variational properties of individual grammars; specifically, their differential compatibility with the primary linguistic data in the environment. In addition to a convergence proof, we present empirical evidence from child language development that a learner is best modeled as multiple grammars in co-existence and competition.

1 Learnability and Development

A central issue in linguistics and cognitive science is the problem of language acquisition: How does a human child come to acquire her language with such ease, yet without high computational power or favorable learning conditions? It is evident that any adequate model of language acquisition must meet the following empirical conditions:

• Learnability: such a model must converge to the target grammar used in the learner's environment, under plausible assumptions about the learner's computational machinery, the nature of the input data, sample size, and so on.

• Developmental compatibility: the learner modeled in such a theory must exhibit behaviors that are analogous to the actual course of language development (Pinker, 1979).

* I would like to thank Julie Legate, Sam Gutmann, Bob Berwick, Noam Chomsky, John Frampton, and John Goldsmith for comments and discussion. This work is supported by an NSF graduate fellowship.

It is worth noting that the developmental compatibility condition has been largely ignored in the formal studies of language acquisition. In the rest of this section, I show that if this condition is taken seriously, previous models of language acquisition have difficulties explaining certain developmental facts in child language.

1.1 Against Statistical Learning

An empiricist approach to language acquisition has (re)gained popularity in computational linguistics and cognitive science; see Stolcke (1994), Charniak (1995), Klavans and Resnik (1996), de Marcken (1996), Bates and Elman (1996), Seidenberg (1997), among numerous others. The child is viewed as an inductive and "generalized" data processor such as a neural network, designed to derive structural regularities from the statistical distribution of patterns in the input data, without prior (innate) specific knowledge of natural language. Most concrete proposals of statistical learning employ expensive and specific computational procedures such as compression, Bayesian inference, and propagation of learning errors, and usually require a large corpus of (sometimes pre-processed) data. These properties immediately challenge the psychological plausibility of the statistical learning approach. In the present discussion, however, we are not concerned with this, but simply grant that someday, someone might devise a statistical learning scheme that is psychologically plausible and also succeeds in converging to the target language. We show that even if such a scheme were possible, it would still face serious challenges from the important but often ignored requirement of developmental compatibility.

One of the most significant findings in child language research of the past decade is that different aspects of syntactic knowledge are learned at different rates. For example, consider the placement of finite verbs in French, where inflected verbs precede negation and adverbs:

Jean voit souvent/pas Marie.
Jean sees often/not Marie.

This property of French is mastered as early as

the 20th month, as evidenced by the extreme rarity of incorrect verb placement in child speech (Pierce, 1992). In contrast, some aspects of language are acquired relatively late. For example, the requirement of using a sentential subject is not mastered by English children until as late as the 36th month (Valian, 1991), when English children stop producing a significant number of subjectless sentences.

When we examine adult speech to children (transcribed in the CHILDES corpus; MacWhinney and Snow, 1985), we find that more than 90% of English input sentences contain an overt subject, whereas only 7-8% of all French input sentences contain an inflected verb followed by negation/adverb. A statistical learner, one which builds knowledge purely on the basis of the distribution of the input data, predicts that English obligatory subject use should be learned (much) earlier than French verb placement - exactly the opposite of the actual findings in child language.

Further evidence against statistical learning comes from the Root Infinitive (RI) stage (Wexler, 1994; inter alia) in children acquiring certain languages. Children in the RI stage produce a large number of sentences where matrix verbs are not finite - ungrammatical in adult language and thus appearing infrequently in the primary linguistic data, if at all. It is not clear how a statistical learner would induce non-existent patterns from the training corpus. In addition, in the acquisition of verb-second (V2) in Germanic grammars, it is known (e.g. Haegeman, 1994) that at an early stage, children use a large proportion (50%) of verb-initial (V1) sentences, a marked pattern that appears only sparsely in adult speech. Again, an inductive learner purely driven by corpus data has no explanation for these disparities between child and adult languages.

Empirical evidence as such poses a serious problem for the statistical learning approach. It seems a mistake to view language acquisition as an inductive procedure that constructs linguistic knowledge, directly and exclusively, from the distributions of input data.

1.2 The Transformational Approach

Another leading approach to language acquisition, largely in the tradition of generative linguistics, is motivated by the fact that although child language is different from adult language, it is different in highly restrictive ways. Given the input to the child, there are logically possible and computationally simple inductive rules to describe the data that are never attested in child language. Consider the following well-known example. Forming a question in English involves inversion of the auxiliary verb and the subject:

Is the man t tall?

where "is" has been fronted from the position t, the position it assumes in a declarative sentence. A possible inductive rule to describe the above sentence is this: front the first auxiliary verb in the sentence. This rule, though logically possible and computationally simple, is never attested in child language (Chomsky, 1975; Crain and Nakayama, 1987; Crain, 1991); that is, children are never seen to produce sentences like:

* Is the cat that the dog t chasing is scared?

where the first auxiliary is fronted (the first "is"), instead of the auxiliary following the subject of the sentence (here, the second "is" in the sentence).

Acquisition findings like these lead linguists to postulate that the human language capacity is constrained in a finite prior space, the Universal Grammar (UG). Previous models of language acquisition in the UG framework (Wexler and Culicover, 1980; Berwick, 1985; Gibson and Wexler, 1994) are transformational, borrowing a term from evolution (Lewontin, 1983), in the sense that the learner moves from one hypothesis/grammar to another as input sentences are processed.¹ Learnability results can be obtained for some psychologically plausible algorithms (Niyogi and Berwick, 1996). However, the developmental compatibility condition still poses serious problems.

Since at any time the state of the learner is identified with a particular grammar defined by UG, it is hard to explain (a) the inconsistent patterns in child language, which cannot be described by any single adult grammar (e.g. Brown, 1973); and (b) the smoothness of language development (e.g. Pinker, 1984; Valian, 1991; inter alia), whereby the child gradually converges to the target grammar, rather than the abrupt jumps that would be expected from binary changes in hypotheses/grammars.

¹ Note that the transformational approach is not restricted to UG-based models; for example, Brill's influential work (1993) is a corpus-based model which successively revises a set of syntactic rules upon presentation of partially bracketed sentences. Note, however, that the state of the learning system at any time is still a single set of rules, that is, a single "grammar".

Having noted the inadequacies of the previous approaches to language acquisition, we will propose a theory that aims to meet the language learnability and language development conditions simultaneously. Our theory draws inspiration from Darwinian evolutionary biology.

2 A Selectionist Model of Language Acquisition

2.1 The Dynamics of Darwinian Evolution

Essential to Darwinian evolution is the concept of variational thinking (Lewontin, 1983). First, differences among individuals are viewed as "real", as opposed to deviants from some idealized archetypes, as in pre-Darwinian thinking. Second, such differences result in variance in operative functions among individuals in a population, thus allowing forces of evolution such as natural selection to operate. Evolutionary changes are therefore changes in the distribution of variant individuals in the population. This contrasts with Lamarckian transformational thinking, in which individuals themselves undergo direct changes (transformations) (Lewontin, 1983).

2.2 A population of grammars

Learning, including language acquisition, can be characterized as a sequence of states in which the learner moves from one state to another. Transformational models of language acquisition identify the state of the learner as a single grammar/hypothesis. As noted in section 1, this makes it difficult to explain the inconsistency in child language and the smoothness of language development.

We propose that the learner be modeled as a population of "grammars", the set of all principled language variations made available by the biological endowment of the human language faculty. Each grammar Gi is associated with a weight pi, 0 ≤ pi ≤ 1, with Σi pi = 1. In a linguistic environment E, the weight pi(E, t) is a function of E and the time variable t, the time since the onset of language acquisition. We say that

Definition: Learning converges if, for every ε with 0 < ε < 1 and every Gi,

    |pi(E, t+1) − pi(E, t)| < ε

That is, learning converges when the composition and distribution of the grammar population are stabilized. Particularly, in a monolingual environment E_T in which a target grammar T is used, we say that learning converges to T if lim(t→∞) p_T(E_T, t) = 1.

2.3 A Learning Algorithm

Write E → s to indicate that a sentence s is an utterance in the linguistic environment E. Write s ∈ G if a grammar G can analyze s, which, in a narrow sense, is parsability (Wexler and Culicover, 1980; Berwick, 1985). Suppose that there are altogether N grammars in the population. For simplicity, write pi for pi(E, t) at time t, and pi' for pi(E, t+1) at time t+1. Learning takes place as follows, where 0 < γ < 1 is the learning rate:

The Algorithm: Given an input sentence s, the child, with probability pi, selects a grammar Gi:

• if s ∈ Gi:  pi' = pi + γ(1 − pi),  and  pj' = (1 − γ)pj for j ≠ i

• if s ∉ Gi:  pi' = (1 − γ)pi,  and  pj' = γ/(N − 1) + (1 − γ)pj for j ≠ i

Comment: The algorithm is the Linear reward-penalty (LR-P) scheme (Bush and Mosteller, 1958), one of the earliest and most extensively studied stochastic algorithms in the psychology of learning. It is real-time and on-line, and thus reflects the rather limited computational capacity of the child language learner, by avoiding sophisticated data processing and the need for a large memory to store previously seen examples. Many variants and generalizations of this scheme are studied in Atkinson et al. (1965), and their thorough mathematical treatments can be found in Narendra and Thathachar (1989).

The algorithm operates in a selectionist manner: grammars that succeed in analyzing input sentences are rewarded, and those that fail are punished. In addition to the psychological evidence for such a scheme in animal and human learning, there is neurological evidence (Hubel and Wiesel, 1962; Changeux, 1983; Edelman, 1987; inter alia) that the development of neural substrate is guided by exposure to specific stimulus in the environment in a Darwinian selectionist fashion.

2.4 A Convergence Proof

For simplicity but without loss of generality, assume that there are two grammars (N = 2), the target grammar T1 and a pretender T2. The results presented here generalize to the N-grammar case; see Narendra and Thathachar (1989).

Definition: The penalty probability of grammar Ti in a linguistic environment E is

    ci = Pr(s ∉ Ti | E → s)

In other words, ci represents the probability that the grammar Ti fails to analyze an incoming sentence s and gets punished as a result. Notice that the penalty probability, essentially a fitness measure of individual grammars, is an intrinsic property of a UG-defined grammar relative to a particular linguistic environment E, determined by the distributional patterns of linguistic expressions in E. It is not explicitly computed, as in (Clark, 1992), which uses the Genetic Algorithm (GA).²

² Clark's model and the present one share an important feature: the outcome of acquisition is determined by the differential compatibilities of individual grammars. The choice of the GA introduces various psychological and linguistic assumptions that can not be justified; see Dresher (1999) and Yang (1999). Furthermore, no formal proof of convergence is given.

The main result is as follows:

    Theorem: lim(t→∞) p1(t) = c2 / (c1 + c2),  if |1 − γ(c1 + c2)| < 1    (1)

Proof sketch: Computing E[p1(t+1) | p1(t)] as a function of p1(t) and taking expectations on both

sides gives

    E[p1(t+1)] = [1 − γ(c1 + c2)] E[p1(t)] + γc2    (2)

Solving (2) yields (1).

Comment 1: It is easy to see that p1 → 1 (and p2 → 0) when c1 = 0 and c2 > 0; that is, the learner converges to the target grammar T1, which, by definition, has a penalty probability of 0 in a monolingual environment. Learning is robust. Suppose that there is a small amount of noise in the input, i.e. sentences such as speaker errors which are not compatible with the target grammar. Then c1 > 0. If c1 << c2, convergence to T1 is still ensured by (1). Consider a non-uniform linguistic environment in which the linguistic evidence does not unambiguously identify any single grammar; an example of this is a population in contact with two languages (grammars), say, T1 and T2. Since c1 > 0 and c2 > 0, (1) entails that p1 and p2 reach a stable equilibrium at the end of language acquisition; that is, language learners are essentially bi-lingual speakers as a result of language contact. Kroch (1989) and his colleagues have argued convincingly that this is what happened in many cases of diachronic change. In Yang (1999), we have been able to extend the acquisition model to a population of learners, and formalize Kroch's idea of grammar competition over time.

Comment 2: In the present model, one can directly measure the rate of change in the weight of the target grammar, and compare it with developmental findings. Suppose T1 is the target grammar, hence c1 = 0. The expected increase of p1, Δp1, is computed as follows:

    E[Δp1] = c2 p1 p2    (3)

Since p2 = 1 − p1, Δp1 in (3) is obviously a quadratic function of p1(t). Hence, the growth of p1 will produce the S-shape curve familiar in the psychology of learning. There is evidence for an S-shape pattern in child language development (Clahsen, 1986; Wijnen, 1999; inter alia), which, if true, suggests that a selectionist learning algorithm such as the one adopted here might indeed be what the child learner employs.

2.5 Unambiguous Evidence is Unnecessary

One way to ensure convergence is to assume the existence of unambiguous evidence (cf. Fodor, 1998): sentences that are compatible only with the target grammar and not with any other grammar. Unambiguous evidence is, however, not necessary for the proposed model to converge. It follows from the theorem (1) that even if no evidence can unambiguously identify the target grammar from its competitors, it is still possible to ensure convergence as long as all competing grammars fail on some proportion of input sentences; i.e. they all have positive penalty probabilities.

Consider the acquisition of the target, a German V2 grammar, in a population of grammars below:

1. German: SVO, OVS, XVSO
2. English: SVO, XSVO
3. Irish: VSO, XVSO
4. Hixkaryana: OVS, XOVS

We have used X to denote non-argument categories such as adverbs, adjuncts, etc., which can quite freely appear in sentence-initial positions. Note that none of the patterns in (1) could conclusively distinguish German from the other three grammars. Thus, no unambiguous evidence appears to exist. However, if SVO, OVS, and XVSO patterns appear in the input data at positive frequencies, the German grammar has a higher overall "fitness value" than the other grammars by virtue of being compatible with all input sentences. As a result, German will eventually eliminate the competing grammars.

2.6 Learning in a Parametric Space

Suppose that natural language grammars vary in a parametric space, as cross-linguistic studies suggest.³ We can then study the dynamical behaviors of grammar classes that are defined in these parametric dimensions. Following (Clark, 1992), we say that a sentence s expresses a parameter α if a grammar must have set α to some definite value in order to assign a well-formed representation to s. Convergence to the target value of α can be ensured by the existence of evidence (s) defined in the sense of parameter expression. The convergence to a single grammar can then be viewed as the intersection of parametric grammar classes, converging in parallel to the target values of their respective parameters.

³ Although different theories of grammar, e.g. GB, HPSG, LFG, TAG, have different ways of instantiating this idea.

3 Some Developmental Predictions

The present model makes two predictions that cannot be made in the standard transformational theories of acquisition:

1. As the target gradually rises to dominance, the child entertains a number of co-existing grammars. This will be reflected in distributional patterns of child language, under the null hypothesis that the grammatical knowledge (in our model, the population of grammars and their respective weights) used in production is that used in analyzing linguistic evidence. For grammatical phenomena that are acquired relatively late, child language consists of the output of more than one grammar.

2. Other things being equal, the rate of development is determined by the penalty probabilities of the competing grammars relative to the input data in the linguistic environment, as in (3).

In this paper, we present longitudinal evidence concerning the prediction in (2).⁴ To evaluate developmental predictions, we must estimate the penalty probabilities of the competing grammars in a particular linguistic environment. Here we examine the developmental rate of French verb placement, an early acquisition (Pierce, 1992); that of English subject use, a late acquisition (Valian, 1991); and that of the Dutch V2 parameter, also a late acquisition (Haegeman, 1994).

⁴ In Yang (1999), we show that a child learner, en route to her target grammar, entertains multiple grammars. For example, a significant portion of English child language shows characteristics of a topic-drop optional subject grammar like Chinese, before children learn that subject use in English is obligatory, at around the 3rd birthday.

Using the idea of parameter expression (section 2.6), we estimate the frequency of sentences that unambiguously identify the target value of a parameter. For example, sentences that contain finite verbs preceding adverb or negation ("Jean voit souvent/pas Marie") are an unambiguous indication of the [+] value of the verb raising parameter. A grammar with the [−] value for this parameter is incompatible with such sentences and, if probabilistically selected by the learner for grammatical analysis, will be punished as a result. Based on the CHILDES corpus, we estimate that such sentences constitute 8% of all French adult utterances to children. This suggests that unambiguous evidence amounting to 8% of all input data is sufficient for a very early acquisition: in this case, the target value of the verb-raising parameter is correctly set. We therefore have a direct explanation of Brown's (1973) observation that in the acquisition of fixed word order languages such as English, word order errors are "triflingly few". For example, English children are never seen to produce word order variations other than SVO, the target grammar, nor do they fail to front Wh-words in question formation. Virtually all English sentences display rigid word order, e.g. the verb almost always (immediately) precedes the object, which gives a very high rate of unambiguous evidence (perhaps close to 100%, far greater than the 8% that is sufficient for a very early acquisition in the case of French verb raising), sufficient to drive out other word order grammars very early on.

Consider then the acquisition of the subject parameter in English, which requires a sentential subject. Languages like Italian, Spanish, and Chinese, on the other hand, have the option of dropping the subject. Therefore, sentences with an overt subject are not necessarily useful in distinguishing English from optional subject languages.⁵ However, there exists a certain type of English sentence that is indicative (Hyams, 1986):

There is a man in the room.
Are there toys on the floor?

⁵ Notice that this presupposes the child's prior knowledge of and access to both obligatory and optional subject grammars.

The subject of these sentences is "there", a non-referential lexical item that is present for purely structural reasons - to satisfy the requirement in English that the pre-verbal subject position must be filled. Optional subject languages do not have this requirement, and do not have expletive-subject sentences. Expletive sentences therefore express the [+] value of the subject parameter. Based on the CHILDES corpus, we estimate that expletive sentences constitute 1% of all English adult utterances to children.

Note that before the learner eliminates optional subject grammars on the cumulative basis of expletive sentences, she has probabilistic access to multiple grammars. This is fundamentally different from stochastic grammar models, in which the learner has probabilistic access to generative rules. A stochastic grammar is not a developmentally adequate model of language acquisition. As discussed in section 1.1, more than 90% of English sentences contain a subject: a stochastic grammar model will overwhelmingly bias toward the rule that generates a subject. English children, however, go through a long period of subject drop. In the present model, child subject drop is interpreted as the presence of the true optional subject grammar, in co-existence with the obligatory subject grammar.

Lastly, we consider the setting of the Dutch V2 parameter. As noted in section 2.5, there appears to be no unambiguous evidence for the [+] value of the V2 parameter: SVO, VSO, and OVS grammars, members of the [−V2] class, are each compatible with certain proportions of expressions produced by the target V2 grammar. However, observe that despite its compatibility with some input patterns, an OVS grammar cannot survive long in the population of competing grammars. This is because an OVS grammar has an extremely high penalty probability. Examination of CHILDES shows that OVS patterns constitute only 1.3% of all input sentences to children, whereas SVO patterns constitute about 65% of all utterances, and XVSO, about 34%. Therefore, only the SVO and VSO grammars, members of the [−V2] class, are "contenders" alongside the (target) V2 grammar, by virtue of being compatible with significant portions of the input data. But notice that OVS patterns do penalize both SVO and VSO grammars, and are compatible only with the [+V2] grammars. Therefore, OVS patterns are effectively unambiguous evidence (among the contenders) for the V2 parameter, which eventually drive the SVO and VSO grammars out of the population.

In the selectionist model, the rarity of OVS sentences predicts that the acquisition of the V2 parameter in Dutch is a relatively late phenomenon. Furthermore, because the frequency (1.3%) of Dutch OVS sentences is comparable to the frequency (1%) of English expletive sentences, we expect that the Dutch V2 grammar is successfully acquired roughly at the same time as English children attain adult-level subject use (around age 3; Valian, 1991). Although I am not aware of any report on the timing of the correct setting of the Dutch V2 parameter, there is evidence from the acquisition of German, a similar language, that children are considered to have successfully acquired V2 by the 36-39th month (Clahsen, 1986). Under the model developed here, this is not a coincidence.

4 Conclusion

To recapitulate, this paper first argues that considerations of language development must be taken seriously to evaluate computational models of language acquisition. Once we do so, both statistical learning approaches and traditional UG-based learnability studies are empirically inadequate. We proposed an alternative model which views language acquisition as a selectionist process in which grammars form a population and compete to match linguistic expressions present in the environment. The course and outcome of acquisition are determined by the relative compatibilities of the grammars with input data; such compatibilities, expressed in penalty probabilities and unambiguous evidence, are quantifiable and empirically testable, allowing us to make direct predictions about language development.

The biologically endowed linguistic knowledge enables the learner to go beyond unanalyzed distributional properties of the input data. We argued in section 1.1 that it is a mistake to model language acquisition as directly learning the probabilistic distribution of the linguistic data. Rather, language acquisition is guided by particular input evidence that serves to disambiguate the target grammar from the competing grammars. The ability to use such evidence for grammar selection is based on the learner's linguistic knowledge. Once such knowledge is assumed, the actual process of language acquisition is no more remarkable than generic psychological models of learning. The selectionist theory, if correct, shows an example of the interaction between domain-specific knowledge and domain-neutral mechanisms, which combine to explain properties of language and cognition.

References

Atkinson, R., G. Bower, and E. Crothers (1965). An Introduction to Mathematical Learning Theory. New York: Wiley.

Bates, E. and J. Elman (1996). Learning rediscovered: A perspective on Saffran, Aslin, and Newport. Science 274: 5294.

Berwick, R. (1985). The acquisition of syntactic knowledge. Cambridge, MA: MIT Press.

Brill, E. (1993). Automatic grammar induction and parsing free text: a transformation-based approach. ACL Annual Meeting.

Brown, R. (1973). A first language. Cambridge, MA: Harvard University Press.

Bush, R. and F. Mosteller (1958). Stochastic models for learning. New York: Wiley.

Charniak, E. (1995). Statistical language learning. Cambridge, MA: MIT Press.

Chomsky, N. (1975). Reflections on language. New York: Pantheon.

Changeux, J.-P. (1983). L'Homme Neuronal. Paris: Fayard.

Clahsen, H. (1986). Verbal inflections in German child language: Acquisition of agreement markings and the functions they encode. Linguistics 24: 79-121.

Clark, R. (1992). The selection of syntactic knowledge. Language Acquisition 2: 83-149.

Crain, S. and M. Nakayama (1987). Structure dependency in grammar formation. Language 63: 522-543.

Dresher, E. (1999). Charting the learning path: cues to parameter setting. Linguistic Inquiry 30: 27-67.

Edelman, G. (1987). Neural Darwinism: The theory of neuronal group selection. New York: Basic Books.

Fodor, J. D. (1998). Unambiguous triggers. Linguistic Inquiry 29: 1-36.

Gibson, E. and K. Wexler (1994). Triggers. Linguistic Inquiry 25: 355-407.

Haegeman, L. (1994). Root infinitives, clitics, and truncated structures. Language Acquisition.

Hubel, D. and T. Wiesel (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology 160: 106-154.

Hyams, N. (1986). Language acquisition and the theory of parameters. Dordrecht: Reidel.

Klavans, J. and P. Resnik (eds.) (1996). The balancing act. Cambridge, MA: MIT Press.

Kroch, A. (1989). Reflexes of grammar in patterns of language change. Language Variation and Change 1: 199-244.

Lewontin, R. (1983). The organism as the subject and object of evolution. Scientia 118: 65-82.

de Marcken, C. (1996). Unsupervised language acquisition. Ph.D. dissertation, MIT.

MacWhinney, B. and C. Snow (1985). The Child Language Data Exchange System. Journal of Child Language 12: 271-296.

Narendra, K. and M. Thathachar (1989). Learning automata. Englewood Cliffs, NJ: Prentice Hall.

Niyogi, P. and R. Berwick (1996). A language learning model for finite parameter space. Cognition 61: 162-193.

Pierce, A. (1992). Language acquisition and syntactic theory: a comparative analysis of French and English child grammar. Boston: Kluwer.

Pinker, S. (1979). Formal models of language learning. Cognition 7: 217-283.

Pinker, S. (1984). Language learnability and language development. Cambridge, MA: Harvard University Press.

Seidenberg, M. (1997). Language acquisition and use: Learning and applying probabilistic constraints. Science 275: 1599-1604.

Stolcke, A. (1994). Bayesian Learning of Probabilistic Language Models. Ph.D. thesis, University of California at Berkeley, Berkeley, CA.

Valian, V. (1991). Syntactic subjects in the early speech of American and Italian children. Cognition 40: 21-82.

Wexler, K. (1994). Optional infinitives, head movement, and the economy of derivation in child language. In Lightfoot, D. and N. Hornstein (eds.), Verb movement. Cambridge: Cambridge University Press.

Wexler, K. and P. Culicover (1980). Formal principles of language acquisition. Cambridge, MA: MIT Press.

Wijnen, F. (1999). Verb placement in Dutch child language: A longitudinal analysis. Ms., University of Utrecht.

Yang, C. (1999). The variational dynamics of natural language: Acquisition and use. Technical report, MIT AI Lab.
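As a concrete illustration of the learning algorithm of section 2.3, the Linear reward-penalty scheme and the limit in (1) can be sketched as a short simulation. This is a minimal sketch, not code from the paper: the penalty probabilities and the learning rate gamma are arbitrary illustrative values, and grammar failure is modeled as an independent coin flip with probability ci.

```python
import random

def lrp_step(p, i, succeeded, gamma):
    """One Linear reward-penalty (LR-P) update of the weight vector p,
    after grammar i was selected to analyze an input sentence."""
    n = len(p)
    q = list(p)
    if succeeded:
        # reward the selected grammar, decay all the others
        q[i] = p[i] + gamma * (1 - p[i])
        for j in range(n):
            if j != i:
                q[j] = (1 - gamma) * p[j]
    else:
        # punish the selected grammar, redistribute weight to the others
        q[i] = (1 - gamma) * p[i]
        for j in range(n):
            if j != i:
                q[j] = gamma / (n - 1) + (1 - gamma) * p[j]
    return q

def acquire(c, gamma=0.01, steps=50000, seed=1):
    """Simulate acquisition, where c[i] is the penalty probability of
    grammar i, i.e. Pr(it fails on an input sentence). Returns the
    final weight vector."""
    rng = random.Random(seed)
    p = [1.0 / len(c)] * len(c)
    for _ in range(steps):
        i = rng.choices(range(len(c)), weights=p)[0]  # select Gi w.p. p[i]
        p = lrp_step(p, i, rng.random() >= c[i], gamma)
    return p

# Target T1 never fails (c1 = 0); pretender T2 fails on 20% of input (c2 = 0.2).
# By the theorem, p1 should approach c2 / (c1 + c2) = 1.
weights = acquire([0.0, 0.2])
print(weights)
```

With both penalty probabilities positive (e.g. in a language-contact environment), the same simulation settles near the stable mixture c2/(c1 + c2) of Comment 1 rather than at a single grammar.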

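The grammar competition of section 2.5 admits a similar sketch: four grammars compete on input drawn with the rough CHILDES-based frequencies cited in section 3 (SVO about 65%, XVSO about 34%, OVS about 1.3%). German, the only grammar compatible with all three patterns, should come to dominate despite the absence of unambiguous evidence. The frequencies, learning rate, and step count below are illustrative assumptions, not values from the paper.

```python
import random

# Section 2.5: the word-order patterns each grammar can analyze.
GRAMMARS = {
    "German":     {"SVO", "OVS", "XVSO"},
    "English":    {"SVO", "XSVO"},
    "Irish":      {"VSO", "XVSO"},
    "Hixkaryana": {"OVS", "XOVS"},
}

# Rough CHILDES-based input frequencies cited in section 3 (renormalized).
PATTERNS = ["SVO", "XVSO", "OVS"]
FREQS = [0.65, 0.34, 0.013]

def compete(gamma=0.02, steps=40000, seed=7):
    names = list(GRAMMARS)
    p = {g: 1.0 / len(names) for g in names}
    rng = random.Random(seed)
    for _ in range(steps):
        s = rng.choices(PATTERNS, weights=FREQS)[0]               # an input sentence
        g = rng.choices(names, weights=[p[n] for n in names])[0]  # selected grammar
        if s in GRAMMARS[g]:
            # reward: the selected grammar gains weight, the others decay
            for n in names:
                p[n] = p[n] + gamma * (1 - p[n]) if n == g else (1 - gamma) * p[n]
        else:
            # punish: the selected grammar loses weight to the others
            for n in names:
                p[n] = (1 - gamma) * p[n] if n == g else gamma / 3 + (1 - gamma) * p[n]
    return p

final = compete()
print(sorted(final.items(), key=lambda kv: -kv[1]))
```

Because German has a penalty probability of zero in this environment while English, Irish, and Hixkaryana fail on roughly 35%, 66%, and 99% of the input respectively, its weight rises toward 1 even though no single input pattern identifies it unambiguously.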