Evidence from Dynamical Systems Modeling

Explicit Causal Connections between the Acquisition of Linguistic Tiers: Evidence from Dynamical Systems Modeling Daniel Spokoyny and Jeremy Irvin Fermín Moscoso del Prado Martín College of Creative Studies Department of Linguistics University of California, Santa Barbara University of California, Santa Barbara Santa Barbara, CA 93106, USA Santa Barbara, CA 93106, USA [email protected] [email protected] [email protected] Abstract Tomasello, 1992; Tomasello, 2005) propose that there is a gradual increase in the generality of the In this study, we model the causal links be- structures learned by the child, which are slowly tween the complexities of different macro- acquired through distributional learning. Such a scopic aspects of child language. We con- picture is strongly supported by the remarkably lit- sider pairs of sequences of measurements tle creativity exhibited by children, most of whose of the quantity and diversity of the lexi- utterances are often literal repetitions of those that cal and grammatical properties. Each pair they have previously heard (Lieven et al., 1997; of sequences is taken as the trajectory of a Pine and Lieven, 1993), with little or no general- high-dimensional dynamical system, some ization in the early stages. It appears as though of whose dimensions are unknown. We children progressively and conservatively increase use Multispatial Convergent Cross Map- the level at which they generalize linguistic con- ping to ascertain the directions of causal- structions, building from the words upwards, in ity between the pairs of sequences. Our what some have termed ‘lexically-based positional results provide support for the hypothesis analysis’ (Lieven et al., 1997). that children learn grammar through dis- The Theory of Dynamical Systems offers pow- tributional learning, where the generaliza- erful tools for modeling human development (e.g, tion of smaller structures enables general- Smith and Thelen, 2003; van Geert, 1991). It pro- izations at higher levels, consistent with vides a mathematical framework for implementing the proposals of construction-based ap- the principle that development involves the mutual proaches to language. and continuous interaction of multiple levels of the developing system, which simultaneously un- 1 Introduction fold over many time-scales. Typically, a dynam- A crucial question in language acquisition con- ical system is described by a system of coupled cerns how (or, according to some, whether) chil- differential equations governing the temporal evo- dren learn the grammars of their native languages. lution of multiple parts of the system and their in- Some researchers, mainly coming from the gener- terrelations. One difficulty that arises when trying ative tradition, argue that, although the grammati- to model a dynamical system as complex as the cal rules are possibly ‘innate’ (e.g., Pinker, 1994), development of language is that many factors that children still need to learn how to map the dif- are important for the evolution of the system might ferent semantic/grammatical roles onto the differ- not be available or might not be easily measurable ent options offered by Universal Grammar (e.g., or –even worse– there are additional variables rel- ‘parameter-setting’). The evidence, however, does evant for the system of which the modeler is not not seem to support this hypothesis. For instance, even aware. In this respect, a crucial development Bowerman (1990) notes that the type of seman- was the discovery that, in a deterministic coupled tic aspects learned by the child do not match well dynamical system –even in the presence of noise– into the prototypical roles that would be required the dynamics of the whole system can be satisfac- to map into hard linguistic rules (e.g., learning an torily recovered using measurements of a single of AGENT category to map onto the SUBJECT syntac- the system’s variables (Takens’ Embedding Theo- tic role). Other researchers (e.g., Goldberg, 2003; rem; Takens, 1981). 73 Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning, pages 73–81, Berlin, Germany, August 11, 2016. c 2016 Association for Computational Linguistics The finding above opens an interesting avenue that is valid for non-separable systems, is capa- for understanding the processes involved in lan- ble of identifying weakly coupled variables even guage acquisition. In the same way that systems in the presence of noise, and –crucially– can dis- of differential equations can be used to model the tinguish direct causal relations between variables evolution of ecosystems (e.g., predator-prey sys- from effects of shared driving variables (i.e., in tems), one could take measurements of the de- possibility (d) above, CCM would not find causal- tailed properties of child language, and build a ity). detailed system of equations capturing the macroscopic dynamics of the process. However, in order to achieve this, it is necessary to ascertain the ways in which different measured variables in the system affect each other. This problem goes beyond estimating correlations (as could be obtained, for instance, using regression models), as one needs to detect asymmetrical causal relations between the variables of interest, so that these causal influences can be incorporated into the models. In this study, we investigate the causal relations different macroscopic-level measures characteriz- ing the level of development of different tiers child Figure 1: Reconstructed manifold for Lorenz’s language (i.e., number of words produced, lexical system (M; top), as well as the shadow manifolds diversity, inflectional diversity and mean length of reconstructed considering only X (MX ; bottom- utterances), using the longitudinal data provided left) and Y (MY ; bottom-right) (reprinted with in the Manchester Corpus (Theakston et al., 2001). permission from Sugihara et al., 2012). In order to detect causal relations between the dif- For instance, consider E. Lorenz’s often stud- ferent measures, we make use of state space re- ied dynamical system, which includes three cou- construction relying on Takens (1981)’s Embed- pled variables X(t), Y (t), and Z(t) whose co- ding Theorem, and recently developed techniques evolution is described by the system of differential for assessing the strength of causal relations in dy- equations namical systems (Multispatial Convergent Cross Mapping; Clark et al., 2015). Here, we provide dX = σ(Y X) a detailed picture of the causal connections be- dt − tween the development of different aspects of a dY = X(ρ Z) Y . (1) child’s language (while acquiring English). Our dt − − result provide support for theories that advocate dZ distributional learning of linguistic constructions = XY βZ dt − by gradual generalizations from the level of words to larger scale constructions. The first equation in this system indicates that there is a relation by which Y causes X, as the 2 Causality Detection in Dynamical change in X (i.e., its future value) depends on Systems the value of Y (i.e., the future of X depends on the past of Y even after the past of X itself has Whenever two variables are correlated, there must been considered), a causal relation whose strength exist some causal link between them. Namely, if is indexed by parameter σ. The manifold defined variables A and B are found to be correlated, then by these three variables (Lorenz’s famous strange one of four possibilities must be true: (a) A causes attractor), which we can denote by M, is plot- B, (b) B causes A, (c) A and B form a feedback ted in the top of Fig. 1. In many circumstances, loop, each causing the other, or (d) there is a third however, not all variables of the system are avail- variable C causing both A and B. For studying the able (some might be difficult to measure, or we interactions of species within ecosystems, Sugi- might not even be aware of their relevance). It hara et al. (2012) introduced Convergent Cross is at this point that Takens (1981)’s Embedding Mapping (CCM), a causality-detection technique Theorem comes into play. Informally speaking, 74 the theorem states that the properties of a coupled vergence then becomes the necessary condition dynamical system’s attractor can be recovered us- for inferring causation. Using both artificial sys- ing only measurements from a single one of its tems and ecological time-series with known dy- variables. This is achieved by considering mul- namics, Sugihara and his colleagues demonstrated tiple versions of the same variable lagged in time, that this technique successfully recovers true di- that is, instead of plotting (X[t],Y [t],Z[t]), when rectional causal relations when these are present, only measurements of X are available, we can plot and –crucially– is able to discard spurious causa- (X[t],X[t + τ],...,X[t + (E 1)τ]). These re- tion in the case when both variables are causally − constructed manifolds are termed “shadow” man- driven by a third, unknown, variable, but there is ifolds. MX denotes the shadow manifold of M no true direct causation between them. reconstructed on the basis of X alone. There are An inconvenience of CCM, and in general of well-studied techniques for finding the appropri- techniques that rely on manifold reconstruction, ate values for the parameters for the lag τ and is that they generally require that relatively long the number of dimensions E (c.f., Abarbanel et time-series of the behavior of the system are avail- al., 1993) so that the properties of the original able. Such long series are, however, very diffi- manifold M are recovered by the shadow mani- cult, if not impossible, to obtain in many fields, fold MX . Fig. 1 illustrates this point by plotting including of course language acquisition. One can the shadow manifolds MX (bottom-left) and MY however obtain multiple short time series from dif- (bottom-right) for the Lorenz system.

Evidence from Dynamical Systems Modeling

Robust Learning of Chaotic Attractors

Writing the History of Dynamical Systems and Chaos

Causal Inference for Process Understanding in Earth Sciences

Fundamental Theorems in Mathematics

Arxiv:1812.05143V1 [Math.AT] 28 Nov 2018 While Studying a Simpliﬁed Model (1) for Weather Forecasting [18]

A Probabilistic Takens Theorem

Turbulence DAVID RUELLE and FLORIS TAKENS* I.H.E.S., Bures-Sur-Yvette, France

From Chaos to Early Warnings NAW 5/2 Nr

Finite Approximation of Sinai-Bowen-Ruelle Measures For

On Floris Takens and Our Joint Mathematical Work

The Development of Nonlinear Science£

A Tribute to Dr. Edward Norton Lorenz