Explicit Causal Connections between the Acquisition of Linguistic Tiers: Evidence from Dynamical Systems Modeling

Daniel Spokoyny and Jeremy Irvin Fermín Moscoso del Prado Martín College of Creative Studies Department of Linguistics University of California, Santa Barbara University of California, Santa Barbara Santa Barbara, CA 93106, USA Santa Barbara, CA 93106, USA [email protected] [email protected] [email protected]

Abstract Tomasello, 1992; Tomasello, 2005) propose that there is a gradual increase in the generality of the In this study, we model the causal links be- structures learned by the child, which are slowly tween the of different macro- acquired through distributional learning. Such a scopic aspects of child language. We con- picture is strongly supported by the remarkably lit- sider pairs of sequences of measurements tle creativity exhibited by children, most of whose of the quantity and diversity of the lexi- utterances are often literal repetitions of those that cal and grammatical properties. Each pair they have previously heard (Lieven et al., 1997; of sequences is taken as the trajectory of a Pine and Lieven, 1993), with little or no general- high-dimensional , some ization in the early stages. It appears as though of whose dimensions are unknown. We children progressively and conservatively increase use Multispatial Convergent Cross Map- the level at which they generalize linguistic con- ping to ascertain the directions of causal- structions, building from the words upwards, in ity between the pairs of sequences. Our what some have termed ‘lexically-based positional results provide support for the hypothesis analysis’ (Lieven et al., 1997). that children learn grammar through dis- The Theory of Dynamical Systems offers pow- tributional learning, where the generaliza- erful tools for modeling human development (e.g, tion of smaller structures enables general- Smith and Thelen, 2003; van Geert, 1991). It pro- izations at higher levels, consistent with vides a mathematical framework for implementing the proposals of construction-based ap- the principle that development involves the mutual proaches to language. and continuous interaction of multiple levels of the developing system, which simultaneously un- 1 Introduction fold over many time-scales. Typically, a dynam- A crucial question in language acquisition con- ical system is described by a system of coupled cerns how (or, according to some, whether) chil- differential equations governing the temporal evo- dren learn the grammars of their native languages. lution of multiple parts of the system and their in- Some researchers, mainly coming from the gener- terrelations. One difficulty that arises when trying ative tradition, argue that, although the grammati- to model a dynamical system as complex as the cal rules are possibly ‘innate’ (e.g., Pinker, 1994), development of language is that many factors that children still need to learn how to map the dif- are important for the evolution of the system might ferent semantic/grammatical roles onto the differ- not be available or might not be easily measurable ent options offered by Universal Grammar (e.g., or –even worse– there are additional variables rel- ‘parameter-setting’). The evidence, however, does evant for the system of which the modeler is not not seem to support this hypothesis. For instance, even aware. In this respect, a crucial development Bowerman (1990) notes that the type of seman- was the discovery that, in a deterministic coupled tic aspects learned by the child do not match well dynamical system –even in the presence of noise– into the prototypical roles that would be required the dynamics of the whole system can be satisfac- to map into hard linguistic rules (e.g., learning an torily recovered using measurements of a single of AGENT category to map onto the SUBJECT syntac- the system’s variables (Takens’ Embedding Theo- tic role). Other researchers (e.g., Goldberg, 2003; rem; Takens, 1981).

73 Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning, pages 73–81, Berlin, Germany, August 11, 2016. c 2016 Association for Computational Linguistics

The finding above opens an interesting avenue that is valid for non-separable systems, is capa- for understanding the processes involved in lan- ble of identifying weakly coupled variables even guage acquisition. In the same way that systems in the presence of noise, and –crucially– can dis- of differential equations can be used to model the tinguish direct causal relations between variables evolution of ecosystems (e.g., predator-prey sys- from effects of shared driving variables (i.e., in tems), one could take measurements of the de- possibility (d) above, CCM would not find causal- tailed properties of child language, and build a ity). detailed system of equations capturing the macro- scopic dynamics of the process. However, in order to achieve this, it is necessary to ascertain the ways in which different measured variables in the sys- tem affect each other. This problem goes beyond estimating correlations (as could be obtained, for instance, using regression models), as one needs to detect asymmetrical causal relations between the variables of interest, so that these causal influences can be incorporated into the models. In this study, we investigate the causal relations different macroscopic-level measures characteriz- ing the level of development of different tiers child Figure 1: Reconstructed manifold for Lorenz’s language (i.e., number of words produced, lexical system (M; top), as well as the shadow manifolds diversity, inflectional diversity and mean length of reconstructed considering only X (MX ; bottom- utterances), using the longitudinal data provided left) and Y (MY ; bottom-right) (reprinted with in the Manchester Corpus (Theakston et al., 2001). permission from Sugihara et al., 2012). In order to detect causal relations between the dif- For instance, consider E. Lorenz’s often stud- ferent measures, we make use of state space re- ied dynamical system, which includes three cou- construction relying on Takens (1981)’s Embed- pled variables X(t), Y (t), and Z(t) whose co- ding Theorem, and recently developed techniques evolution is described by the system of differential for assessing the strength of causal relations in dy- equations namical systems (Multispatial Convergent Cross Mapping; Clark et al., 2015). Here, we provide dX = σ(Y X) a detailed picture of the causal connections be- dt  − tween the development of different aspects of a dY  = X(ρ Z) Y . (1) child’s language (while acquiring English). Our   dt − − result provide support for theories that advocate dZ distributional learning of linguistic constructions = XY βZ  dt − by gradual generalizations from the level of words   to larger scale constructions. The first equation in this system indicates that there is a relation by which Y causes X, as the 2 Causality Detection in Dynamical change in X (i.e., its future value) depends on Systems the value of Y (i.e., the future of X depends on the past of Y even after the past of X itself has Whenever two variables are correlated, there must been considered), a causal relation whose strength exist some causal link between them. Namely, if is indexed by parameter σ. The manifold defined variables A and B are found to be correlated, then by these three variables (Lorenz’s famous strange one of four possibilities must be true: (a) A causes ), which we can denote by M, is plot- B, (b) B causes A, (c) A and B form a feedback ted in the top of Fig. 1. In many circumstances, loop, each causing the other, or (d) there is a third however, not all variables of the system are avail- variable C causing both A and B. For studying the able (some might be difficult to measure, or we interactions of species within ecosystems, Sugi- might not even be aware of their relevance). It hara et al. (2012) introduced Convergent Cross is at this point that Takens (1981)’s Embedding Mapping (CCM), a causality-detection technique Theorem comes into play. Informally speaking,

74 the theorem states that the properties of a coupled vergence then becomes the necessary condition dynamical system’s attractor can be recovered us- for inferring causation. Using both artificial sys- ing only measurements from a single one of its tems and ecological time-series with known dy- variables. This is achieved by considering mul- namics, Sugihara and his colleagues demonstrated tiple versions of the same variable lagged in time, that this technique successfully recovers true di- that is, instead of plotting (X[t],Y [t],Z[t]), when rectional causal relations when these are present, only measurements of X are available, we can plot and –crucially– is able to discard spurious causa- (X[t],X[t + τ],...,X[t + (E 1)τ]). These re- tion in the case when both variables are causally − constructed manifolds are termed “shadow” man- driven by a third, unknown, variable, but there is ifolds. MX denotes the shadow manifold of M no true direct causation between them. reconstructed on the basis of X alone. There are An inconvenience of CCM, and in general of well-studied techniques for finding the appropri- techniques that rely on manifold reconstruction, ate values for the parameters for the lag τ and is that they generally require that relatively long the number of dimensions E (c.f., Abarbanel et time-series of the behavior of the system are avail- al., 1993) so that the properties of the original able. Such long series are, however, very diffi- manifold M are recovered by the shadow mani- cult, if not impossible, to obtain in many fields, fold MX . Fig. 1 illustrates this point by plotting including of course language acquisition. One can the shadow manifolds MX (bottom-left) and MY however obtain multiple short from dif- (bottom-right) for the . Notice how ferent instances of a similar dynamical system. both shadow manifolds recover much of the origi- In ecology, for instance, one can obtain short se- nal’s structure, using only knowledge of one of its quences of measurements of the population den- three variables. sities of a group of species measured at differ- ent places and times. In language acquisition, we Each point in the original manifold M maps might have multiple, relatively short longitudinal onto points in its shadow manifolds, as is illus- sequences of measurements from different chil- trated by the points labelled m(t), x(t), and y(t) dren. With this in mind Clark et al. (2015) devel- in Fig. 1. The preservation of the topological oped Multispatial CCM (mCCM), an extension of properties of the original manifold in its shadow CCM able to infer causal relations from multiple manifolds entails that points that are close-by in short time-series measured at different sites, mak- the original manifold will also be close-by in its ing use of dewdrop regression (Hsieh et al., 2008) shadow versions. This implies that, for causally to take the additional heterogeneity into account. linked variables within the same dynamical sys- tem, the state of one variable can identify the states 3 Materials and Methods of the others. Sugihara et al. (2012) noticed that, when one variable X stochastically drives another 3.1 Materials variable Y , information about the states of X can We obtained from the CHILDES database be recovered from Y , but not vice-versa. This is (MacWhinney, 2000) the transcriptions contained the basic insight of the CCM method. To test for in the Manchester Corpus (Theakston et al., 2001). causality from X to Y , CCM looks for the signa- This corpus contains annotated transcripts of au- ture of X in Y ’s time series by seeing whether the dio recordings from a longitudinal study of 12 time indices of nearby points on MY can be used British English-speaking children (6 girls and 6 to identify nearby points on MX . Crucially, in or- boys) between the ages of approximately two and der to distinguish causation from mere correlation, three years. The children were recorded at their CCM requires convergence, that is, that cross- homes for an hour while they engaged in normal mapped estimates improve in estimation accuracy play activities with their mothers. Each child was with the sample size (i.e., “library size”) used for recorded on two separate occasions in every three- reconstructing the manifolds. As the library size week period for one year. Each recording ses- increases, the trajectories defining the manifolds sion is divided into two half-hour periods. The fill in, resulting in closer nearest neighbors and annotations include the lemmatized form of the declining estimation error, which is reflected in a words produced by the children (incomplete words higher correlation coefficient between the points in and small word-internal errors were manually cor- the neighborhoods of the shadow manifolds. Con- rected in the lemmatization).

75 In order to increase the sample size in each pe- Prado Martín et al., 2004) as well as in child lan- riod, we followed a sliding window technique of guage acquisition (Stoll et al., 2012). The inflec- (Irvin et al., in press): We computed measures for tional entropy of a lemma ` (H[W `]) is the in- | the samples contained in overlapping windows of formation entropy of the inflected variants of that three consecutive corpus files. In this way, at each lemma. Our inflectional diversity is just the av- point we obtained samples originating from two erage value of inflectional entropy across all lem- files from the same recording session, and a file mas, from either the previous or the next recording ses- sion. H[W L] = H[W, L] H[L], (3) | − 3.2 Measures of Linguistic Development where H[L] is the lexical diversity measure de- As in previous studies (Irvin et al., in press; scribed above, and H[W, L] is the joint entropy Moscoso del Prado Martín, in press), in order to between the inflected word forms and their corre- measure the overall amount of speech produced by sponding lemmas, each child, we counted the total number of word 1 tokens produced by each child in each temporal H[W, L] = p(w, `) log , (4) · p(w, `) window. We refer to this measure as the child’s ` L w W X∈ X∈ loquacity. In order to measure the diversity of the words where L denotes the set of all distinct lemmas en- used by the children, we use the lexical diversity countered in the sample, W is the set of all distinct measure (Irvin et al., in press; Moscoso del Prado inflected word forms encountered, and p(w, `) is Martín, in press). This is just the information en- the joint probability with which lemma ` occurs tropy (Shannon, 1948) of the probability distribu- as the specific inflected form w. Inflectional di- tion of word lemmas found in the sample, versity takes non-negative values, measuring how large are the average inflectional paradigms used 1 in the language sample. Estimating H[W, L] us- H[L] = p(`) log , (2) · p(`) ing Eq. 4 is subject to the same estimation biases ` L X∈ that were described for lexical diversity. There- where L refers to the set of word lemmas found fore, we also follow Moscoso del Prado Martín (in in a sample, and p(`) is the probability with which press) in using the bias-adjusted estimate (Chao et the particular lemma ` is found in that sample. En- al., 2013, see Appendix A) for this magnitude, and tropy estimates obtained using Eq. 2 using maxi- then combining it with the lexical diversity using mum likelihood estimates of the probabilities are Eq. 3 to obtain our inflectional diversity estimates. known to be strongly biased (Miller, 1955), with Finally, in order to measure the degree of syn- the bias magnitude correlating with the size of tactic development of the children we used their the sample used. Importantly, the sample size is mean length of utterances (MLU). Instead of mea- nothing else than the loquacity measure described suring MLU in morphemes (Brown, 1973), we above. Therefore, using this plain maximum like- used the simpler, but equally accurate measure in lihood method would result in spurious correla- number of words (c.f., Parker and Bronson, 2005). tions. For this reason, Moscoso del Prado Martín In these ages, MLUs are well known to provide (in press) recommends using the bias-adjusted en- an accurate measure of the syntactic richness of tropy estimator (Chao et al., 2013, see Appendix the utterances produced (Brown, 1973), and in A) instead of Eq. 2. fact correlate almost perfectly with explicit mea- In order to measure the acquisition of inflec- surements of grammatical diversity (Moscoso del tional morphological paradigms, we make use Prado Martín, in press). of the inflectional diversity measure (Moscoso del Prado Martín, in press). This is a 3.3 Reconstruction of Shadow Manifolds macroscopic generalization of inflectional entropy Using the windowing technique, for each child we (Moscoso del Prado Martín et al., 2004), a mea- obtained four time series, one corresponding to sure that is known to index morphological in- each of the four measures described above: lo- fluences on adult lexical processing (Baayen and quacity, lexical diversity, inflectional diversity, and Moscoso del Prado Martín, 2005; Moscoso del MLU. These time series are plotted in Fig. 2

76 (a) (b)

(c) (d)

Figure 2: Evolution of the measures studied as a function of the children’s ages (in days). (a) Evolution of the loquacity (measured in number of word tokens produced) for each of the twelve children. (b) Evo- lution of the lexical diversity (measured in nats per word) for each of the twelve children. (c) Evolution of the inflectional diversity (measured in nats per word) for each of the twelve children. (d) Evolution of the MLU (in number of words per utterance) for each of the twelve children.

Lexical Inflectional Parameter Loquacity MLU the shadow from each of these collec- Diversity Diversity τ 3 2 3 3 tions of time series. The optimal time-lags (τ) E 3 3 2 4 for constructing the shadow manifolds were es- timated as the first local minimum of the lagged Table 1: Parameter values used in the reconstruc- self-information in each of the time series (c.f., tion of the shadow attractors based on each of the Abarbanel et al., 1993). The optimal embedding four measures. dimensionalities (E) were estimated by optimiz- ing next-step prediction accuracy. The estimates In order to ensure that applying the non-linear were not found to differ significantly across chil- dynamics techniques on these time series was dren, and therefore for each measure, we used a sensible, the series were checked to ensure that single estimate of (τ, E) for all children. The es- they contained non-linear signal not dominated by timated optimal parameter values used for the re- noise. This was achieved using a prediction test construction of each shadow attractor are given in (Clark et al., 2015): We ensured that, for all four Table 1. variables, the ability to predict future values sig- nificantly decreased as one increases the distance in the future at which the predictions are being 3.4 Detection of Causal Relationships made. This increasing unpredictability is the hall- mark of non-linear dynamical systems. Therefore, The presence of directional causal relations was we could safely proceed to reconstruct the shadow tested for each of the six possible pairs of vari- attractors. ables using mCCM. We performed 1,000 boot- Following Clark et al. (2015), we reconstructed strapping iterations for assessing the p-values of

77 the relations.1 Finally, to account for our lack the FDR procedure). of a priori predictions on the causal directions to Using the p-values in Fig. 3 enables the recon- be tested, the p-values were adjusted for multiple struction of the network of causal relations de- comparisons using the false discovery rate pro- picted in Fig. 4. In this graph, the causal relation cedure for correlated data (FDR; Benjamini and between the loquacity and the lexical diversity is Yekutieli, 2001). considered weaker than the rest. The reason for this is that the comparisons reported here are in 4 Results fact part of a larger study considering many more comparisons (including many factors of the moth- As plotted in Fig. 2, the four groups of time-series ers as well), on which we did not have any clear considered here exhibit different patterns of devel- a priori predictions on the relations that would be opment. On the one hand, the loquacity, inflec- found. When applying the FDR method on the tional diversity, and MLU series show evidence of whole set of 56 comparisons that we actually con- a more or less linear increase along the child’s de- sidered, the relation plotted by the dashed arrow velopment, with their values towards the end of the is in fact not significant. In short, one should not studied interval being close to what was found for trust the reliability of that particular relation. their mothers in those same conversations. On the other hand, the lexical diversity measure exhibits Considering only the fully reliable relations, quite constant patterns across all children, with one finds that, as was suspected from the curves in their values being pretty much indistinguishable Fig. 2, there is an explicit causal relation between from those observed for their mothers. This lat- the development of vocabulary richness (i.e., lex- ter pattern is slighly different in two chilren (Ruth ical diversity) and the acquisition of inflectional and Nick), who seem to be experiencing their ‘vo- paradigms (i.e., inflectional diversity). The in- cabulary burst’ later than the rest of the children crease in lexical diversity indeed causes the de- did. In fact, if one examines panel (c) in detail, one velopment of inflectional paradigms. In turn, that sees that the inflectional diversity curves for these the inflectional paradigms begin to be in place en- two children only begin their linear increases af- ables the child to begin generalizing more syntac- ter the children have experienced their vocabulary tic relationships (as is reflected by the feedback bursts. A similar pattern can be seen in the MLU loop found between the inflectional diversity and curves (panel (d)) for these two particular chil- MLU manifolds). Importantly, that the inflectional dren, with syntactic development apparently be- paradigms are developed is also strongly coupled ing delayed by their late vocabulary bursts. These (i.e., forms a feedback loop) with the increase the two patterns suggest that the development of both children’s overall loquacity; once children begin to grammatical components of their language (inflec- get a hold of grammar (inflection and syntax) they tional morphology and syntax) depends on hav- are enabled to speak more, which in turn furthers ing attained a certain degree of vocabulary rich- their ability to generalize morphological relations ness. However, just examining these curves does and –by association– syntactic relations. not provide explicit evidence on whether these hy- pothesized causal relationships are actually reli- 5 Discussion able ones or they are just statistical mirages. The In this study, we have –for the first time– docu- mCCM method addresses such question directly. mented the explicit causal relations between dif- Fig. 3 plots the results of mCCM for each pair of ferent tiers of children’s linguistic development. reconstructed shadow manifolds. The curves plot At a macroscopic level, we find strictly causal re- how the correlations between nearest neighbors lations between the acquisition of vocabulary, in- across shadow attractors evolve as one considers flectional paradigms, and syntactic relationships. increasingly larger library sizes. The p-values re- As schematized in Fig. 4, the development of a port whether these correlation values are signifi- sufficiently large vocabulary is a crucial trigger cantly increasing (the p-values are obtained by a for the successful acquisition of the grammatical Monte Carlo method with 1,000 resamplings, and aspects of language, which are in turn necessary further corrected for the twelve comparisons using for children to be able to speak more. These re- 1All computations were done using R package sults are consistent with theories advocating the multispatialCCM (Clark et al., 2015). importance distributional learning for the acqui-

78 Figure 3: Results of mCCM between each pair of variables. The p-values are FDR-corrected considering the twelve comparisons reported here. sition of constructions (Lieven et al., 1997; Pine This study also stresses the importance of and Lieven, 1993; Tomasello, 1992; Tomasello, macroscopic level linguistic analyses. Whereas 2005). It shows how the level of generality of much research in language acquisition has focused the constructions is progressively increased (Gold- on the acquisition of specific individual construc- berg, 2003) by the use of ‘lexically-based posi- tions (microscopic level) or groups thereof (meso- tional analysis’ (Lieven et al., 1997) to achieve scopic level), the investigation of the properties of early grammatical generalizations. the whole lexicon, inflectional and syntactic sys- The picture of causal relations observed here tems uncovers relations which are difficult to pin- could be put into an informal narrative as follows: point at the other levels. This fits in well with The acquisition of sufficient lexical forms enables the multiscale investigation of language develop- children to generalize their relations into inflec- ment proposed from the point of view of the The- tional paradigms. When a sufficient command of ory of Dynamical Systems (van Geert, 1991). In- the language’s inflectional morphology has arisen, deed, one can see, at the mesoscopic level, that children are able to begin generalizing syntactic –also consistent with the distributional learning relations. The presence of these early syntactic hypothesis– there is a causal chain by which the developments in turn serves to increase the child’s development of single word utterances triggers awareness of the functional roles served by differ- the development of two-word utterances, which ent paradigm members. From this point, one ob- in turn trigger three-word utterances, and so forth serves the strong bidirectional coupling between (Bassano and van Geert, 2007). The macro- the development of syntax and inflectional mor- scopic analyses provided here complement that phology. An increasing awareness of the func- picture by indicating how that evolution of utter- tional roles of the individual forms within these ance lengths is strongly coupled with the develop- paradigms, and noticing the formal relations be- ment (or ‘growth’ in van Geert’s terms) of gram- tween them, in turn trigger further generalizations matical knowledge. of the paradigms into inflectional classes (Milin An innovative aspect of the methods we have et al., 2009), further increasing the productivity of developed in this paper is that they provide an ex- the inflectional morphology system. plicit procedure for testing whether there are ex-

79 3 Lexical Diversity p=.031

p=.012 p=.012 p=.01

!  z Loquacity Inflectional Diversity Syntactic Diversity (MLU) a :

p<.001 p<.001

Figure 4: Reconstructed network of causal relations between the different measures of the children’s linguistic performances. The p-values indicated on the causal arrows are FDR-corrected. The dashed- line denotes a relationship that does not survive FDR correction considering a larger set of variables. plicit causal relations between the development Yoav Benjamini and Daniel Yekutieli. 2001. The con- of different aspects of language. Here we have trol of the false discovery rate in multiple testing un- used the methods at a macroscopic level, but it der dependency. Annals of Statistics, 29:1165–1188. would be equally possible to apply them to both Melissa Bowerman. 1990. Mapping thematic roles microscopic- or mesoscopic-level time series. Pre- onto syntactic functions: are children helped by in- vious research on dynamical systems on language nate linking rules? Linguistics, 28:1253–1289. acquisition (e.g., Bassano and van Geert, 2007; Roger Brown. 1973. A first language: the early stages. Steenbeek and van Geert, 2007; van Geert, 1991) Harvard University Press, Cambridge, MA. relies on proposing different candidate models in terms of systems of differential equations, each in- Anne Chao, Y. T. Wang, and Lou Jost. 2013. Entropy and the species accumulation curve: a novel entropy cluding different sets of causal relations and cou- estimator via discovery rates of new species. Meth- plings between time series. Our methods, us- ods in Ecology and Evolution, 4:1091–1100. ing techniques for explicitly testing causal rela- tions borrowed from ecology (a field whose study Adam Thomas Clark, Hao Ye, Forest Isbell, Ethan R. Deyle, Jane Cowles, G. David Tilman, and George bears uncanny similarities with the study of hu- Sugihara. 2015. Spatial convergent cross mapping man development), complement the curve-fitting to detect causal relationships from short time series. by explicitly testing which couplings and causali- Ecology, 96:1174–1181. ties should be included in the models, thus signif- Adele E. Goldberg. 2003. Constructions: a new theo- icantly reducing the model space that needs to be retical approach to language. TRENDS in Cognitive explored. Sciences, 7:219–224.

Chih-hao Hsieh, Christian Anderson, and George Sug- References ihara. 2008. Extending nonlinear analysis to short ecological time series. The American Naturalist, Henry D. I. Abarbanel, Reggie Brown, John J. 171:71–80. Sidorowich, and Lev Sh. Tsimring. 1993. The anal- ysis of observed chaotic data in physical systems. Jeremy Irvin, Daniel Spokoyny, and Fermín Moscoso Reviews of Modern Physics, 65:1331–1392. del Prado Martín. in press. Dynamical systems modeling of the child–mother dyad: Causality be- R. Harald Baayen and Fermín Moscoso del tween child-directed language and lan- Prado Martín. 2005. Semantic density and guage development. In Anna Papafragou, John past-tense formation in three Germanic languages. Trueswell, Dan Grodner, and Dan Mirman, editors, Language, 81:666–698. Proceedings of the 38th Annual Conference of the Cognitive Science Society. Cognitive Science Soci- Dominique Bassano and Paul L. C. van Geert. 2007. ety, Austin, TX. Modeling continuity and discontinuity in utter- ance length: a quantitative approach to changes, Elena V. M. Lieven, Julian M. Pine, and Gilliam Bald- transitions and intraindividual variability in early win. 1997. Lexically-based learning and early grammatical development. Developmental Science, grammatical development. Journal of Child Lan- 10:588–512. guage, 24:187–219.

80 Brian MacWhinney. 2000. The CHILDES Project: Floris Takens. 1981. Detecting strange attractors in Tools for analyzing talk, volume 2: The database. turbulence. In D. A. Rand and L.-S. Young, editors, Lawrence Erlbaum Associates, Mahwah, NJ, 3rd Dynamical Systems and Turbulence, pages 366–381. edition. Springer Verlag, Berlin, Germany.

Petar Milin, Dušica Filipovic´ Ðurdevi¯ c,´ and Fermín Anna L. Theakston, Elena V. M. Lieven, Julian M. Moscoso del Prado Martín. 2009. The simultane- Pine, and Caroline F. Rowland. 2001. The role of ous effects of inflectional paradigms and classes on performance limitations in the acquisition of verb- lexical recognition: Evidence from Serbian. Journal argument structure: an alternative account. Journal of Memory and Language, 60:50–64. of Child Language, 28:127–152. Michael Tomasello. 1992. First Verbs: a Case Study of George Miller. 1955. Note on the bias of informa- Early Grammatical Development. Cambridge Uni- tion estimates. In Henry Quastler, editor, Informa- versity Press, Cambridge, England. tion Theory in Psychology: Problems and Methods, pages 95–100. Free Press, Glencoe, IL. Michael Tomasello. 2005. Beyond formalities: The case of language acquisition. The Linguistic Review, Fermín Moscoso del Prado Martín, Aleksandar Kostic,´ 22:183–197. and R. Harald Baayen. 2004. Putting the bits to- gether: an information theoretical perspective on Paul L. C. van Geert. 1991. A dynamic systems model morphological processing. Cognition, 94:1–18. of cognitive and language growth. Psychological Review, 98:3–53. Fermín Moscoso del Prado Martín. in press. Vocabu- lary, grammar, sex, and aging. Cognitive Science. A Bias-Adjusted Entropy Estimator Matthew D. Parker and Kent Brorson. 2005. A com- The bias-adjusted entropy estimator (Chao et al., parative study between mean length of utterance in 2013) relies on properties of the accumulation morphemes (MLUm) and mean length of utterance curve of the number of distinct words observed in words (MLUw). First Language, 25:365–376. (i.e., the species accumulation curve in the biolog- Julian M. Pine and Elena V. M. Lieven. 1993. Re- ical terms of the original paper). The estimator is analysing rote-learned phrases: individual differ- given by ences in the transition to multiword speech. Journal of Child Language, 20:43–60. n 1 F − 1 Hˆ = i n k Steven Pinker. 1994. The language instinct: How the 1 F n 1 " k=F !# ≤ Xi≤ − Xi mind creates language. Harper-Collins, New York, n 1 NY. f1 1 n − 1 r (1 A) − log(A) + (1 A) , − n − r − " r=1 # Claude E. Shannon. 1948. A mathematical theory of X communication. The Bell System Technical Journal, where F are the word frequencies observed in the 27:379–423, 623–656. i sample, n is the number of tokens in the corpus, Linda Smith and Esther Thelen. 2003. Development as and a dynamic system. TRENDS in Cognitive Sciences, 2f 7:343–348. 2 if f > 0, (n 1)f + 2f 2  − 1 2 Hederien W. Steenbeek and Paul L. C. van Geert. 2007. A =  2 . A theory and dynamic model of dyadic interaction:  if f2 = 0, f1 > 0, (n 1)(f1 1) + 2 Concerns, appraisals, and contagiousness in a devel-  − − opmental context. Developmental Review, 27:1–40. 1 if f1 = f2 = 0,   Sabine Stoll, Bathasar Bickel, Elena Lieven, Netra P. with f1 and f2 being the number of word types that Paudyal, Goma Banjade, Toya N. Bhatta, Mar- were encountered exactly once or twice respec- tin Gaenszle, Judith Pettigrew, Ichchha Purna Rai, Manoj Rai, and Novel Kishore Rai. 2012. Nouns tively (i.e., the numbers of hapax legomena and and verbs in Chintang: Children’s usage and sur- dis legomena). This estimator is demonstrated to rounding adult speech. Journal of Child Language, be accurate and unbiased for word frequency dis- 39:284–321. tributions (Moscoso del Prado Martín, in press). George Sugihara, Robert May, Hao Ye, Chih-hao Hsieh, Ethan R. Deyle, Michael Fogarty, and Stephan Munch. 2012. Detecting causality in com- plex ecosystems. Science, 338:496–500.

81