1 Toward a New Grand Theory of Development? Connectionism
Total Page:16
File Type:pdf, Size:1020Kb
Toward a New Grand Theory of Development? Connectionism and Dynamic Systems Theory Re-Considered J.P. Spencer, M. S. C. Thomas, & J. McClelland (Editors). To be published by Oxford University Press. Contributors and Book Structure Part I: Overview Introduction: Spencer, Thomas & McClelland Chapter 1: Jay McClelland—overview of connectionist approaches [Note: Rough draft only] Chapter 2: Gregor Schöner—overview of dynamic systems approaches Part II: Case Studies Chapter 3: Karl Newell, Liu & Mayer-Kress Chapter 4: Whit Tabor Chapter 5: John Spencer, Jeff Johnson & John Lipinski [Note: Outline only] Chapter 6: Matthew Schlesinger Chapter 7: Yuko Munakata & Bruce Morton Chapter 8: Linda Smith [Note: Abstract only] Chapter 9: Daniela Corbetta Chapter 10: Han van der Maas & Maartje Raijmakers Chapter 11: Denis Mareschal, Robert Leech & Rick Cooper Chapter 12: Bob McMurray, Jessica Horst & Larissa Samuelson Part III: Reactions from the “outside” Chapter 13: Guy Van Orden & Heidi Kloos Chapter 14: Lisa Oakes, Nora Newcombe & Jodie Plumert Chapter 15: Tim Johnston & Bob Lickliter Part IV: Reactions from the “inside” Chapter 16: Jay McClelland Chapter 17: Gregor Schöner Chapter 18: Kim Plunkett Chapter 19: Michael Thomas & Fiona Richardson Chapter 20: Paul van Geert & Kurt Fischer Conclusion: Thomas, Spencer & McClelland 1 McClelland and Vallabha 7/23/2006 Page 1 Connectionist Models of Development: Mechanistic Dynamical Models with Emergent Dynamical Properties James L. McClelland and Gautam Vallabha The symbolic paradigm of cognitive modeling, championed by Minsky and Papert, Newell and Simon, and other pioneers of the 1950’s and 1960’s, remains very much alive and well today. Yet an alternative paradigm, first championed in the 1950’s by Rosenblatt, in which cognitive processes are viewed as emergent functions of a complex stochastic dynamical system, has continued to have adherents. A cooling of interest in such approaches in the 1960’s did not deter Grossberg (1973, 1976) or James Anderson (1973), and the approach emerged again full force in the 1980’s. By the end of that decade, many others had begun to explore the implications of this paradigm for development (McClelland, 1989; Plunkett and Marchman, 19??). A parallel and closely related movement, emerging from the physical sciences, also began to gain adherents during the same decade (Schoner & Kelso, 1988) and began to attract the interest of developmental psychologists in the early 1990s (Thelen and Smith, 19??). The present chapter represents our attempt to underscore the common ground between connectionist and dynamical systems approaches. Central to both is the emergent nature of system-level behavior and changes to such behavior through development. Essential to this exploration of emergentism is a framework that allows consideration of the dynamics of state variables over time. Mechanistic vs Emergent dynamics In what follows, we will distinguish between mechanistic and emergent dynamics. This distinction parallels a distinction offered in their seminal article on dynamical systems by Schoner and Kelso (1988; Schoner, this volume). Importantly, for both connectionist models and the models of the type investigated by Schoner and Kelso (1988), the dynamical systems involved can be described at two different levels: A macroscopic or emergent level, and a microscopic of mechanistic level. This differentiates both McClelland and Vallabha 7/23/2006 Page 2 approaches from some others in which the focus has been restricted to a descriptive characterization on the macroscopic dynamics. For connectionist models, the state evolutions at the emergent, macroscopic, or systems level are determined by several things, among them the mechanistic microdynamics specified by the modeler and used in simulations. Other crucial factors are the network architecture and the ensemble of experiences on which the network is trained. These state evolutions exhibit emergent properties congruent with those postulated by Dynamical Systems Theory (DST), as articulated by Schoner and Kelso (1988, Schoner, this volume). To give a few examples: The cascade model (McClelland, 1979) describes the time-evolution of activation across the units in a linear, feed-forward network in terms of a simple differential equation: Δai/dt = λ(ai – neti) In words, the change in activation of a unit is driven by the difference between its current activation and the net input it receives from other units. The purpose of the model was to explore the consequences for overall system behavior of changes in parameters of these equations at the mechanistic or microstructural level. The analysis reveal, for example, that changes in the rate constants at two different layers in a cascaded, feedforward network had additive effects on reaction time at the systems level. More complex equations are often used in other models (e.g., Grossberg, 1976) Δai/dt = αE(M-ai) + γI(ai-m) + γ(ai-r) Grossberg’s formulation separates the excitatory input E and the inhibitory input I, and lets each drive the activation up or down by an amount proportional to the distance from the current activation to the maximum M or minimum m; also operating is a restoring force driving the activation of units back toward rest r; each force has its own strength parameter (α, β, and γ). A similar function was used in the interactive activation model (McClelland and Rumelhart, 1981). In his seminal work in the 1970’s Grossberg examined in detail the McClelland and Vallabha 7/23/2006 Page 3 conditions under which these microdynamic equations gave rise to stable emergent behaviors such as attractors and various other important emergent structures (Grossberg, 1978). But these equations simple express the dynamics of a connectionist network as it responds to an input in real time. Typically, such equations are used to model activation processes that are assumed to be complete in under a second, such as, for example, the process involved in settling on a perceptual interpretation of a visual input, such as a line drawing or a word. But hidden in the equations, as presented thus far, are other dynamical variables that really form the heart of these models: their connections. For example, the net input in our first equation above represents the sum, over all connections projecting to a given receiving unit r, of the activation of the unit sending the projection, s, times the weight to r from s: netr = Σs wrs as And similarly the E and I in Grossberg’s equation are essentially the sum of the excitatory inputs (mediated by excitatory connections) and inhibitory inputs (mediated my inhibitory connections). While some units receive direct stimulation from the outside world, most units receive their input from other units, and thus, what determines their activation is the the values of their incoming connection weights. According to the connectionist approach to cognitive development, it is largely the time evolution of the strengths of the connection weights that governs the process of change over developmental time. Connectionist ‘learning rules’ are essentially differential equations that describe the changes of these connections. For example Linsker suggested that the connection weight to a receiving neuron r from a sending neuron s might obey a Hebbian-type learning rule: dwrs/dt = ε(ar – Θr) (as – Θs) – βwrs Here ar and as are the activations of the receiving and sending units, β is a constant regulating a tendency for weights to decay toward 0, ε is the learning rate, and Θr and Θs McClelland and Vallabha 7/23/2006 Page 4 are critical values that may themselves be adaptive (another word for ‘governed by mechanistic differential equations indicating how the variables change as a function of their own or other variables’). As with activation functions, many variations have been proposed for connectionist learning rules. What is crucial for the moment is that the time- evolution of the connections in the system --- the parameters which essentially determine both the outcome and the time-course of processing --- are themselves defined in terms of differential equations. The emergent results of these connection dynamics are emergent cognitive structures such as feature detectors, phonetic and other types of categories, and behavioral competencies, as well as emergent dynamical trajectories of change in such properties. Within PDP approaches to development, the mechanistic and emergent dynamical properties of connectionist models are relevant at two time scales. On a short time scale, connection weights are usually treated as fixed, contributing along with the architecture of the system and the specified mechanistic dynamics of the activation process to the emergent time-course and outcome of a single act of information processing and/or behavior, such as reaching to one of two locations in which an object may have been hidden by an experimenter. On a longer time scale, connection dynamics within the same system are relevant to the gradual change in the way the system responds to inputs over developmental time, usually measured in months or years. Many connectionist models simplify the activation dynamics in ways that may be misleading about the underlying theory. For example, in most connectionist models of past-tense verb inflection or single word reading, an input pattern is propagated forward through one or more layers of connection weights (and 0 or more hidden layers of units) to an output layer. This output represents the asymptotic state that would be achieved in an identical cascaded feedforward network or even a recurrent network (see Plaut et al, 1996, for an explicit comparison of one-pass feedforward and recurrent versions of the a network for single- word reading). Likewise, while intrinsic variability is assumed always to be at work in processing (McClelland, 1991, 1993), it is often left out of specific models for simplicity. A similar approach is often taken in simple recurrent networks (Elman, 1990), which can be viewed as chunking time into relatively large-scale, discrete units, and setting the state for the next discrete unit based on the state of the one before.