

Part II

COGNITIVE MODELING PARADIGMS 

The chapters in Part II introduce the reader to broadly influential and foundational approaches to computational cognitive modeling. Each of these chapters describes in detail one particular approach and provides examples of its use in computational cognitive modeling.



CHAPTER 2

Connectionist Models of Cognition

Michael S. C. Thomas and James L. McClelland

1. Introduction

In this chapter, computer models of cognition that have focused on the use of neural networks are reviewed. These architectures were inspired by research into how computation works in the brain, and subsequent work has produced models of cognition with a distinctive flavor. Processing is characterized by patterns of activation across simple processing units connected together into complex networks. Knowledge is stored in the strength of the connections between units. It is for this reason that this approach to understanding cognition has gained the name of connectionism.

2. Background

Over the last twenty years, connectionist modeling has formed an influential approach to the computational study of cognition. It is distinguished by its appeal to principles of neural computation to inspire the primitives that are included in its cognitive level models. Also known as artificial neural network (ANN) or parallel distributed processing (PDP) models, connectionism has been applied to a diverse range of cognitive abilities, including models of memory, attention, perception, action, language, concept formation, and reasoning (see, e.g., Houghton, 2005). Although many of these models seek to capture adult function, connectionism places an emphasis on learning internal representations. This has led to an increasing focus on developmental phenomena and the origins of knowledge. Although, at its heart, connectionism comprises a set of computational formalisms, it has spurred vigorous theoretical debate regarding the nature of cognition. Some theorists have reacted by dismissing connectionism as mere implementation of preexisting verbal theories of cognition, whereas others have viewed it as a candidate to replace the Classical Computational Theory of Mind and as carrying profound implications for the way human knowledge is acquired and represented; still others have viewed connectionism as a subclass of statistical models involved in universal function approximation and data clustering.



This chapter begins by placing connectionism in its historical context, leading up to its formalization in Rumelhart and McClelland's two-volume Parallel Distributed Processing (1986), written in combination with members of the Parallel Distributed Processing Research Group. Then, three important early models that illustrate some of the key properties of connectionist systems are discussed, as well as how the novel theoretical contributions of these models arose from their key computational properties. These three models are the Interactive Activation model of letter recognition (McClelland & Rumelhart, 1981; Rumelhart & McClelland, 1982), Rumelhart and McClelland's (1986) model of the acquisition of the English past tense, and Elman's (1991) simple recurrent network for finding structure in time. Finally, the chapter considers how twenty-five years of connectionist modeling has influenced wider theories of cognition.

2.1. Historical Context

Connectionist models draw inspiration from the notion that the information-processing properties of neural systems should influence our theories of cognition. The possible role of neurons in generating the mind was first considered not long after the existence of the nerve cell was accepted in the latter half of the nineteenth century (Aizawa, 2004). Early neural network theorizing can therefore be found in some of the associationist theories of mental processes prevalent at the time (e.g., Freud, 1895; James, 1890; Meynert, 1884; Spencer, 1872). However, this line of theorizing was quelled when Lashley presented data appearing to show that the performance of the brain degraded gracefully depending only on the quantity of damage. This argued against the specific involvement of neurons in particular cognitive processes (see, e.g., Lashley, 1929).

In the 1930s and 1940s, there was a resurgence of interest in using mathematical techniques to characterize the behavior of networks of nerve cells (e.g., Rashevsky, 1935). This culminated in the work of McCulloch and Pitts (1943), who characterized the function of simple networks of binary threshold neurons in terms of logical operations. In his 1949 book The Organization of Behavior, Donald Hebb proposed a cell assembly theory of cognition, including the idea that specific synaptic changes might underlie psychological principles of learning. A decade later, Rosenblatt (1958, 1962) formulated a learning rule for two-layered neural networks, demonstrating mathematically that the perceptron convergence rule could adjust the weights connecting an input layer and an output layer of simple neurons to allow the network to associate arbitrary binary patterns. With this rule, learning converged on the set of connection values necessary to acquire any two-layer-computable function relating a set of input-output patterns. Unfortunately, Minsky and Papert (1969, 1988) demonstrated that the set of two-layer computable functions was somewhat limited – that is, these simple artificial neural networks were not particularly powerful devices. While more computationally powerful networks could be described, there was no algorithm to learn the connection weights of these systems. Such networks required the postulation of additional internal, or "hidden," processing units, which could adopt intermediate representational states in the mapping between input and output patterns. An algorithm (backpropagation) able to learn these states was discovered independently several times. A key paper by Rumelhart, Hinton, and Williams (1986) demonstrated the usefulness of networks trained using backpropagation for addressing key computational and cognitive challenges facing neural networks.

In the 1970s, serial processing and the Von Neumann computer metaphor dominated cognitive psychology. Nevertheless, a number of researchers continued to work on the computational properties of neural systems. Some of the key themes identified by these researchers included the role of competition in processing and learning (e.g., Grossberg, 1976; Kohonen, 1984), the properties of distributed representations (e.g., Anderson, 1977; Hinton & Anderson, 1981), and the possibility of content-addressable memory in networks with attractor states, formalized using the mathematics of statistical physics (Hopfield, 1982).



Figure 2.1. A simplified schematic showing the historical evolution of neural network architectures. Simple binary networks (McCulloch & Pitts, 1943) are followed by two-layer feedforward networks (perceptrons; Rosenblatt, 1958). Three subtypes then emerge: three-layer feedforward networks (Rumelhart & McClelland, 1986), competitive or self-organizing networks (e.g., Grossberg, 1976; Kohonen, 1984), and interactive networks (Hopfield, 1982; Hinton & Sejnowski, 1986). Adaptive interactive networks have precursors in detector theories of perception (Logogen: Morton, 1969; Pandemonium: Selfridge, 1959) and in handwired interactive models (interactive activation: McClelland & Rumelhart, 1981; interactive activation and competition: McClelland, 1981; stereopsis: Marr & Poggio, 1976; Necker cube: Feldman, 1981; Rumelhart et al., 1986). Feedforward pattern associators have produced multiple subtypes: for capturing temporally extended activation states, cascade networks in which states monotonically asymptote (e.g., Cohen, Dunbar, & McClelland, 1990), and attractor networks in which states cycle into stable configurations (e.g., Plaut & McClelland, 1993); for processing sequential information, recurrent networks (Jordan, 1986; Elman, 1991); and for systems that alter their structure as part of learning, constructivist networks (e.g., cascade correlation: Fahlman & Lebiere, 1990; Shultz, 2003). SRN = simple recurrent network.

A fuller characterization of the many historical influences in the development of connectionism can be found in Rumelhart and McClelland (1986, Chapter 1), Bechtel and Abrahamsen (1991), McLeod, Plunkett, and Rolls (1998), and O'Reilly and Munakata (2000). Figure 2.1 depicts a selective schematic of this history and demonstrates the multiple types of neural network systems that have latterly come to be used in building models of cognition. Although diverse, they are unified on the one hand by the proposal that cognition comprises processes of constraint satisfaction, energy minimization, and pattern recognition, and on the other hand that adaptive processes construct the microstructure of these systems, primarily by adjusting the strengths of connections among the neuron-like processing units involved in a computation.


2.2. Key Properties of Connectionist Models

Connectionism starts with the following inspiration from neural systems: Computations will be carried out by a set of simple processing units operating in parallel and affecting each other's activation states via a network of weighted connections. Rumelhart, Hinton, and McClelland (1986) identified seven key features that would define a general framework for connectionist processing.

The first feature is the set of processing units u_i. In a cognitive model, these may be intended to represent individual concepts (such as letters or words), or they may simply be abstract elements over which meaningful patterns can be defined. Processing units are often distinguished into input, output, and hidden units. In associative networks, input and output units have states that are defined by the task being modeled (at least during training), whereas hidden units are free parameters whose states may be determined as necessary by the learning algorithm.

The second feature is a state of activation (a) at a given time (t). The state of a set of units is usually represented by a vector of real numbers a(t). These may be binary or continuous numbers, bounded or unbounded. A frequent assumption is that the activation level of simple processing units will vary continuously between the values 0 and 1.

The third feature is a pattern of connectivity. The strength of the connection between any two units will determine the extent to which the activation state of one unit can affect the activation state of another unit at a subsequent time point. The strength of the connections between unit i and unit j can be represented by a matrix W of weight values w_ij. Multiple matrices may be specified for a given network if there are connections of different types. For example, one matrix may specify excitatory connections between units and a second may specify inhibitory connections. Potentially, the weight matrix allows every unit to be connected to every other unit in the network. Typically, units are arranged into layers (e.g., input, hidden, output), and layers of units are fully connected to each other. For example, in a three-layer feedforward architecture where activation passes in a single direction from input to output, the input layer would be fully connected to the hidden layer and the hidden layer would be fully connected to the output layer.

The fourth feature is a rule for propagating activation states throughout the network. This rule takes the vector a(t) of output values for the processing units sending activation and combines it with the connectivity matrix W to produce a summed or net input into each receiving unit. The net input to a receiving unit is produced by multiplying the vector and matrix together, so that

$$\mathrm{net}_i = [\mathbf{W}\,\mathbf{a}(t)]_i = \sum_j w_{ij}\, a_j \qquad (2.1)$$

The fifth feature is an activation rule to specify how the net inputs to a given unit are combined to produce its new activation state. The function F derives the new activation state

$$a_i(t+1) = F(\mathrm{net}_i(t)) \qquad (2.2)$$

For example, F might be a threshold so that the unit becomes active only if the net input exceeds a given value. Other possibilities include linear, Gaussian, and sigmoid functions, depending on the network type.
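To make Equations 2.1 and 2.2 concrete, here is a minimal sketch in Python (with NumPy) of one step of activation propagation through a layer of sigmoid units. The code is illustrative rather than drawn from the chapter; the layer sizes and weight values are arbitrary assumptions.

```python
import numpy as np

def sigmoid(net):
    """A common choice for the activation function F (see Equation 2.2)."""
    return 1.0 / (1.0 + np.exp(-net))

def propagate(W, a):
    """One step of activation propagation.

    W : weight matrix, with W[i, j] = w_ij, the connection from sending
        unit j to receiving unit i.
    a : vector of sending-unit activations a(t).
    """
    net = W @ a           # net_i = sum_j w_ij * a_j      (Equation 2.1)
    return sigmoid(net)   # a_i(t + 1) = F(net_i(t))      (Equation 2.2)

# Example: three sending units fully connected to two receiving units.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(2, 3))   # arbitrary illustrative weights
print(propagate(W, np.array([1.0, 0.0, 1.0])))
```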


Sigmoid is perhaps the most common, operating as a smoothed threshold function that is also differentiable. It is often important that the activation function be differentiable because learning seeks to improve a performance metric that is assessed via the activation state, whereas learning itself can only operate on the connection weights. The effect of weight changes on the performance metric therefore depends to some extent on the activation function, and the learning algorithm encodes this fact by including the derivative of that function (see the following discussion).

The sixth key feature of connectionist models is the algorithm for modifying the patterns of connectivity as a function of experience. Virtually all learning rules for PDP models can be considered a variant of the Hebbian learning rule (Hebb, 1949). The essential idea is that a weight between two units should be altered in proportion to the units' correlated activity. For example, if a unit u_i receives input from another unit u_j, then if both are highly active, the weight w_ij from u_j to u_i should be strengthened. In its simplest version, the rule is

$$\Delta w_{ij} = \eta\, a_i\, a_j \qquad (2.3)$$

where η is the constant of proportionality known as the learning rate. Where an external target activation t_i(t) is available for a unit i at time t, this algorithm is modified by replacing a_i with a term depicting the disparity of unit u_i's current activation state a_i(t) from its desired activation state t_i(t) at time t, so forming the delta rule:

$$\Delta w_{ij} = \eta\, \big(t_i(t) - a_i(t)\big)\, a_j \qquad (2.4)$$
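The two learning rules just defined can be sketched in a few lines of illustrative Python (the learning rate of 0.1 is an arbitrary assumption, not a value from the chapter):

```python
import numpy as np

def hebbian_update(W, a_out, a_in, lr=0.1):
    """Hebbian rule (Equation 2.3): delta w_ij = eta * a_i * a_j."""
    return W + lr * np.outer(a_out, a_in)

def delta_rule_update(W, a_out, a_in, target, lr=0.1):
    """Delta rule (Equation 2.4): delta w_ij = eta * (t_i - a_i) * a_j."""
    return W + lr * np.outer(target - a_out, a_in)
```

Note that the delta rule is the Hebbian rule with the receiving unit's activation replaced by its error, which is why the chapter describes it as a modification of Equation 2.3.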
However, when hidden units are included in networks, no target activation is available for these internal parameters. The weights to such units may be modified by variants of the Hebbian learning algorithm (e.g., Contrastive Hebbian; Hinton, 1989; see Xie & Seung, 2003) or by the backpropagation of error signals from the output layer.

Backpropagation makes it possible to determine, for each connection weight in the network, what effect a change in its value would have on the overall network error. The policy for changing the strengths of connections is simply to adjust each weight in the direction (up or down) that would tend to reduce the error, and to change it by an amount proportional to the size of the effect the adjustment will have. If there are multiple layers of hidden units remote from the output layer, this process can be followed iteratively: First, error derivatives are computed for the hidden layer nearest the output layer; from these, derivatives are computed for the next deepest layer into the network, and so forth. On this basis, the backpropagation algorithm serves to modify the pattern of weights in powerful multilayer networks. It alters the weights to each deeper layer of units in such a way as to reduce the error on the output units (see Rumelhart, Hinton, et al., 1986, for the derivation). We can formulate the weight change algorithm by analogy to the delta rule shown in Equation 2.4. For each deeper layer in the network, we modify the central term that represents the disparity between the actual and target activation of the units. Assuming u_i, u_h, and u_o are input, hidden, and output units in a three-layer feedforward network, the algorithm for changing the weight from hidden to output unit is:

$$\Delta w_{oh} = \eta\, (t_o - a_o)\, F'(\mathrm{net}_o)\, a_h \qquad (2.5)$$

where F′(net) is the derivative of the activation function of the units (e.g., for the sigmoid activation function, F′(net_o) = a_o(1 − a_o)). The term (t_o − a_o) is proportional to the negative of the partial derivative of the network's overall error with respect to the activation of the output unit, where the error E is given by $E = \sum_o (t_o - a_o)^2$.

The derived error term for a unit at the hidden layer is a product of three components: the derivative of the hidden unit's activation function, times the sum across all the connections from that hidden unit to the output layer of the error term on each output unit weighted by the derivative of the output unit's activation function (t_o − a_o) F′(net_o), times the weight connecting the hidden unit to the output unit:

$$F'(\mathrm{net}_h) \sum_o (t_o - a_o)\, F'(\mathrm{net}_o)\, w_{oh} \qquad (2.6)$$

The algorithm for changing the weights from the input to the hidden layer is therefore:

$$\Delta w_{hi} = \eta\, F'(\mathrm{net}_h) \Big[ \sum_o (t_o - a_o)\, F'(\mathrm{net}_o)\, w_{oh} \Big]\, a_i \qquad (2.7)$$
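Equations 2.5 through 2.7 can be collected into a single weight-update step. The following Python sketch assumes sigmoid units, so that F′(net) = a(1 − a); it is an illustrative rendering of the algorithm, not code from the chapter, and the learning rate is an arbitrary choice.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(W_hi, W_oh, a_i, target, lr=0.5):
    """One backpropagation update for a three-layer feedforward network.

    W_hi : input-to-hidden weight matrix; W_oh : hidden-to-output matrix.
    a_i  : input activation vector; target : desired output vector.
    """
    # Forward pass (Equations 2.1 and 2.2).
    a_h = sigmoid(W_hi @ a_i)
    a_o = sigmoid(W_oh @ a_h)

    # Output error term: (t_o - a_o) * F'(net_o), with F'(net) = a(1 - a).
    delta_o = (target - a_o) * a_o * (1.0 - a_o)

    # Hidden error term (Equation 2.6): F'(net_h) * sum_o delta_o * w_oh.
    delta_h = a_h * (1.0 - a_h) * (W_oh.T @ delta_o)

    # Weight changes (Equations 2.5 and 2.7).
    W_oh += lr * np.outer(delta_o, a_h)
    W_hi += lr * np.outer(delta_h, a_i)
    return W_hi, W_oh
```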


It is interesting that the previous computation can be construed as a backward pass through the network, similar in spirit to the forward pass that computes activations in that it involves propagation of signals across weighted connections, this time from the output layer back toward the input. The backward pass, however, involves the propagation of error derivatives rather than activations.

It should be emphasized that a very wide range of variants and extensions of Hebbian and error-correcting algorithms have been introduced in the connectionist learning literature. Most importantly, several variants of backpropagation have been developed for training recurrent networks (Williams & Zipser, 1995); and several algorithms (including the Contrastive Hebbian Learning algorithm and O'Reilly's 1998 LEABRA algorithm) have addressed some of the concerns that have been raised regarding the biological plausibility of backpropagation construed in its most literal form (O'Reilly & Munakata, 2000).

The last general feature of connectionist networks is a representation of the environment with respect to the system. This is assumed to consist of a set of externally provided events or a function for generating such events. An event may be a single pattern, such as a visual input; an ensemble of related patterns, such as the spelling of a word and its corresponding sound or meaning; or a sequence of inputs, such as the words in a sentence. A range of policies have been used for specifying the order of presentation of the patterns, from sweeping through the full set to random sampling with replacement. The selection of patterns to present may vary over the course of training but is often fixed. Where a target output is linked to each input, this is usually assumed to be simultaneously available. Two points are of note in the translation between PDP network and cognitive model. First, a representational scheme must be defined to map between the cognitive domain of interest and a set of vectors depicting the relevant informational states or mappings for that domain. Second, in many cases, connectionist models are addressed to aspects of higher-level cognition, where it is assumed that the information of relevance is more abstract than sensory or motor codes. This has meant that the models often leave out details of the transduction of sensory and motor signals, using input and output representations that are already somewhat abstract. We hold the view that the same principles at work in higher-level cognition are also at work in perceptual and motor systems, and indeed there is considerable connectionist work addressing issues of perception and action, although these will not be the focus of the present chapter.

2.3. Neural Plausibility

It is a historical fact that most connectionist modelers have drawn their inspiration from the computational properties of neural systems. However, it has become a point of controversy whether these "brain-like" systems are indeed neurally plausible. If they are not, should they instead be viewed as a class of statistical function approximators? And if so, shouldn't the ability of these models to simulate patterns of human behavior be assessed in the context of the large number of free parameters they contain (e.g., in the weight matrix; Green, 1998)?


Neural plausibility should not be the primary focus for a consideration of connectionism. The advantage of connectionism, according to its proponents, is that it provides better theories of cognition. Nevertheless, we will deal briefly with this issue because it pertains to the origins of connectionist cognitive theory. In this area, two sorts of criticism have been leveled at connectionist models. The first is to maintain that many connectionist models either include properties that are not neurally plausible or omit other properties that neural systems appear to have. Some connectionist researchers have responded to this first criticism by endeavoring to show how features of connectionist systems might in fact be realized in the neural machinery of the brain. For example, the backward propagation of error across the same connections that carry activation signals is generally viewed as biologically implausible. However, a number of authors have shown that the difference between activations computed using standard feedforward connections and those computed using standard return connections can be used to derive the crucial error derivatives required by backpropagation (Hinton & McClelland, 1988; O'Reilly, 1996). It is widely held that connections run bidirectionally in the brain, as required for this scheme to work. Under this view, backpropagation may be shorthand for a Hebbian-based algorithm that uses bidirectional connections to spread error signals throughout a network (Xie & Seung, 2003).

Other connectionist researchers have responded to the first criticism by stressing the cognitive nature of current connectionist models. Most of the work in developmental neuroscience addresses behavior at levels no higher than cellular and local networks, whereas cognitive models must make contact with the human behavior studied in psychology. Some simplification is therefore warranted, with neural plausibility compromised under the working assumption that the simplified models share the same flavor of computation as actual neural systems. Connectionist models have succeeded in stimulating a great deal of progress in cognitive theory – and have sometimes generated radically different proposals to the previously prevailing symbolic theory – just given the set of basic computational features outlined in the preceding section.

The second type of criticism leveled at connectionism questions why, as Davies (2005) puts it, connectionist models should be reckoned any more plausible as putative descriptions of cognitive processes just because they are "brain-like." Under this view, there is independence between levels of description because a given cognitive level theory might be implemented in multiple ways in different hardware. Therefore, the details of the hardware (in this case, the brain) need not concern the cognitive theory. This functionalist approach, most clearly stated in Marr's three levels of description (computational, algorithmic, and implementational; see Marr, 1982), has been repeatedly challenged (see, e.g., Rumelhart & McClelland, 1985; Mareschal et al., 2007). The challenge to Marr goes as follows. Although, according to computational theory, there may be a principled independence between a computer program and the particular substrate on which it is implemented, in practical terms, different sorts of computation are easier or harder to implement on a given substrate. Because computations have to be delivered in real time as the individual interacts with his or her environment, in the first instance, cognitive level theories should be constrained by the computational primitives that are most easily implemented on the available hardware; human cognition should be shaped by the processes that work best in the brain.

The relation of connectionist models to symbolic models has also proved controversial. A full consideration of this issue is beyond the scope of the current chapter. Suffice to say that because the connectionist approach now includes a diverse family of models, there is no single answer to this question. Smolensky (1988) argued that connectionist models exist at a lower (but still cognitive) level of description than symbolic cognitive theories, a level that he called the subsymbolic.


Connectionist models have sometimes been put forward as a way to implement symbolic production systems on neural architectures (e.g., Touretzky & Hinton, 1988). At other times, connectionist researchers have argued that their models represent a qualitatively different form of computation: Whereas under certain circumstances, connectionist models might produce behavior approximating symbolic processes, it is held that human behavior, too, only approximates the characteristics of symbolic systems rather than directly implementing them. Furthermore, connectionist systems incorporate additional properties characteristic of human cognition, such as content-addressable memory, context-sensitive processing, and graceful degradation under damage or noise. Under this view, symbolic theories are approximate descriptions rather than actual characterizations of human cognition. Connectionist theories should replace them both because they capture subtle differences between human behavior and symbolic characterizations, and because they provide a specification of the underlying causal mechanisms (van Gelder, 1991).

This strong position has prompted criticisms that in their current form, connectionist models are insufficiently powerful to account for certain aspects of human cognition – in particular, those areas best characterized by symbolic, syntactically driven computations (Fodor & Pylyshyn, 1988; Marcus, 2001). Again, however, the characterization of human cognition in such terms is highly controversial; close scrutiny of relevant aspects of language – the ground on which the dispute has largely been focused – lends support to the view that the systematicity assumed by proponents of symbolic approaches is overstated and that the actual characteristics of language are well matched to the characteristics of connectionist systems (Bybee & McClelland, 2005; McClelland et al., 2003).

In the end, it may be difficult to make principled distinctions between symbolic and connectionist models. At a fine scale, one might argue that two units in a network represent variables, and the connection between them specifies a symbolic rule linking these variables. One might also argue that a production system in which rules are allowed to fire probabilistically and in parallel begins to approximate a connectionist system.

2.4. The Relationship between Connectionist Models and Bayesian Inference

Since the early 1980s, it has been apparent that there are strong links between the calculations carried out in connectionist models and key elements of Bayesian calculations (see chapter 3 in this volume on Bayesian models of cognition). The state of the early literature on this point was reviewed in McClelland (1998). There it was noted, first of all, that units can be viewed as playing the role of probabilistic hypotheses; that weights and biases play the role of conditional probability relations between hypotheses and prior probabilities, respectively; and that if connection weights and biases have the correct values, the logistic activation function sets the activation of a unit to its posterior probability given the evidence represented on its inputs.

A second and more important observation is that, in stochastic neural networks (Boltzmann machines and Continuous Diffusion Networks; Hinton & Sejnowski, 1986; Movellan & McClelland, 1993), a network's state over all of its units can represent a constellation of hypotheses about an input, and (if the weights and the biases are set correctly) that the probability of finding the network in a particular state is monotonically related to the probability that the state is the correct interpretation of the input. The exact nature of the relation depends on a parameter called temperature; if set to one, the probability that the network will be found in a particular state exactly matches its posterior probability. When temperature is gradually reduced to zero, the network will end up in the most probable state, thus performing optimal perceptual inference (Hinton & Sejnowski, 1983).
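The first of these correspondences can be made explicit with a short derivation (a standard reconstruction consistent with the observations reviewed in McClelland, 1998, not a passage from the chapter). For a binary hypothesis h with conditionally independent evidence on the inputs, Bayes' rule in log-odds form gives

$$
\log \frac{P(h \mid e)}{P(\neg h \mid e)}
  \;=\; \underbrace{\log \frac{P(h)}{P(\neg h)}}_{\text{bias}}
  \;+\; \sum_j \underbrace{\log \frac{P(e_j \mid h)}{P(e_j \mid \neg h)}}_{\text{weighted input}}
$$

so that the posterior is the logistic function of a net input:

$$
P(h \mid e) \;=\; \frac{1}{1 + e^{-\mathrm{net}}},
\qquad \mathrm{net} = \mathrm{bias} + \sum_j w_j a_j .
$$

If the unit's bias encodes the log prior odds and each weight encodes a log likelihood ratio, the logistic activation function returns exactly the posterior probability of the hypothesis given the evidence on the unit's inputs.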


It is also known that backpropagation can learn weights that allow Bayes-optimal estimation of outputs given inputs (MacKay, 1992) and that the Boltzmann machine learning algorithm (Ackley, Hinton, & Sejnowski, 1985; Movellan & McClelland, 1993) can learn to produce correct conditional distributions of outputs given inputs. The algorithm is slow, but there has been recent progress producing substantial speedups that achieve outstanding performance on benchmark data sets (Hinton & Salakhutdinov, 2006).

3. Three Illustrative Models

In this section, we outline three of the landmark models in the emergence of connectionist theories of cognition. The models serve to illustrate the key principles of connectionism and demonstrate how these principles are relevant to explaining behavior in ways that were different from other prior approaches. The contribution of these models was twofold: they were better suited than alternative approaches to capturing the actual characteristics of human cognition, usually on the basis of their context-sensitive processing properties; and compared to existing accounts, they offered a sharper set of tools to drive theoretical progress and to stimulate empirical data collection. Each of these models significantly advanced its field.

3.1. An Interactive Activation Model of Context Effects in Letter Perception (McClelland & Rumelhart, 1981; Rumelhart & McClelland, 1982)

The interactive activation model of letter perception illustrates two interrelated ideas. The first is that connectionist models naturally capture a graded constraint satisfaction process in which the influences of many different types of information are simultaneously integrated in determining, for example, the identity of a letter in a word. The second idea is that the computation of a perceptual representation of the current input (in this case, a word) involves the simultaneous and mutual influence of representations at multiple levels of abstraction – this is a core idea of parallel distributed processing.

The interactive activation model addressed a puzzle in word recognition. By the late 1970s, it had long been known that people were better at recognizing letters presented in words than letters presented in random letter sequences. Reicher (1969) demonstrated that this was not the result of tending to guess letters that would make letter strings into words. He presented target letters in words, in unpronounceable nonwords, or on their own. The stimuli were then followed by a pattern mask, after which participants were presented with a forced choice between two letters in a given position. Importantly, both alternatives were equally plausible. Thus, the participant might be presented with WOOD and asked whether the third letter was O or R. As expected, forced-choice performance was more accurate for letters in words than for letters in nonwords or presented on their own. Moreover, the benefit of surrounding context was also conferred by pronounceable pseudowords (e.g., recognizing the P in SPET) compared with random letter strings, suggesting that subjects were able to bring to bear rules regarding the orthographic legality of letter strings during recognition.

Rumelhart and McClelland (McClelland & Rumelhart, 1981; Rumelhart & McClelland, 1982) took the contextual advantage of words and pseudowords on letter recognition to indicate the operation of top-down processing. Previous theories had put forward the idea that letter and word recognition might be construed in terms of detectors that collect evidence consistent with the presence of their assigned letter or word in the input (Morton, 1969; Selfridge, 1959). Influenced by these theories, Rumelhart and McClelland built a computational simulation in which the perception of letters resulted from excitatory and inhibitory interactions of detectors for visual features. Importantly, the detectors were organized into different layers for letter features, letters, and words, and detectors could influence each other both in a bottom-up and a top-down manner.


Figure 2.2. Interactive Activation model of context effects in letter recognition (McClelland & Rumelhart, 1981, 1982). Pointed arrows are excitatory connections; circular-headed arrows are inhibitory connections. Left: macro view (connections in gray were set to zero in the implemented model). Right: micro view for the connections from the feature level to the first letter position for the letters S, W, and F (only excitatory connections shown) and from the first letter position to the word units SEED, WEED, and FEED (all connections shown).

Figure 2.2 illustrates the structure of the Interactive Activation (IA) model, both at the macro level (left) and for a small section of the model at a finer level (right). The explicit motivation for the structure of the IA model was neural: "[We] have adopted the approach of formulating the model in terms similar to the way in which such a process might actually be carried out in a neural or neural-like system" (McClelland & Rumelhart, 1981, p. 387). There were three main assumptions of the IA model: (1) Perceptual processing takes place in a system in which there are several levels of processing, each of which forms a representation of the input at a different level of abstraction; (2) visual perception involves parallel processing, both of the four letters in each word and of all levels of abstraction simultaneously; and (3) perception is an interactive process in which conceptually driven and data driven processing provide multiple, simultaneously acting constraints that combine to determine what is perceived.

The activation states of the system were simulated by a sequence of discrete time steps. Each unit combined its activation on the previous time step, its excitatory influences, its inhibitory influences, and a decay factor to determine its activation on the next time step. Connectivity was set at unitary values and along the following principles. In each layer, mutually exclusive alternatives should inhibit each other. Each unit in a layer excited all units with which it was consistent and inhibited all those with which it was inconsistent in the layer immediately above. Thus, in Figure 2.2, the first-position W letter unit has an excitatory connection to the WEED word unit but an inhibitory connection to the SEED and FEED word units. Similarly, a unit excited all units with which it was consistent and inhibited all those with which it was inconsistent in the layer immediately below. However, in the final implementation, top-down word-to-letter inhibition and within-layer letter-to-letter inhibition were set to zero (gray arrows, Figure 2.2).

The model was constructed to recognize letters in four-letter strings. The full set of possible letters was duplicated for each letter position, and a set of 1,179 word units was created to represent the corpus of four-letter words. Word units were given base rate activation states at the beginning of processing to reflect their different frequencies.
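The unit dynamics just described can be sketched as follows. This is an illustrative Python rendering of the general form of the interactive activation update rule; the parameter values (resting level, decay rate, activation bounds) are arbitrary stand-ins, not those of the published model.

```python
def ia_update(a, net, rest=-0.1, decay=0.1, a_min=-0.2, a_max=1.0):
    """One discrete-time update of an interactive activation unit.

    a   : the unit's current activation.
    net : summed excitatory minus inhibitory input from connected units.
    A positive net input drives the unit toward its ceiling, a negative
    one toward its floor, while decay pulls it back to its resting level.
    """
    if net > 0:
        effect = net * (a_max - a)   # scaled by the distance to ceiling
    else:
        effect = net * (a - a_min)   # scaled by the distance to floor
    return a + effect - decay * (a - rest)
```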


A trial began by clamping the feature units to the appropriate states to represent a letter string and then observing the dynamic change in activation through the network. Conditions were included to allow the simulation of stimulus masking and degraded stimulus quality. Finally, a probabilistic response mechanism was added to generate responses from the letter level, based on the relative activation states of the letter pool in each position.
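The response mechanism can be illustrated with a Luce choice ratio over the letter units in one string position. The exponential form below follows the general approach used in the IA model, but the scaling constant and the use of instantaneous rather than time-averaged activations are simplifying assumptions made here.

```python
import numpy as np

def response_probabilities(letter_activations, mu=10.0):
    """Probability of reporting each candidate letter in one position.

    letter_activations : activations of the letter units in the pool.
    mu : scaling constant; larger values make the choice favor the most
         active unit more strongly (the value here is illustrative).
    """
    strengths = np.exp(mu * np.asarray(letter_activations))
    return strengths / strengths.sum()
```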
The model successfully captured the greater accuracy of letter detection for letters appearing in words and pseudowords compared with random strings or in isolation. Moreover, it simulated a variety of empirical findings on the effect of masking and stimulus quality, and of changing the timing of the availability of context. The results on the contextual effects of pseudowords are particularly interesting, because the model only contains word units and letter units and has no explicit representation of orthographic rules. Let us say on a given trial, the subject is required to recognize the second letter in the string SPET. In this case, the string will produce bottom-up excitation of the word units for SPAT, SPIT, and SPOT, which each share three letters. In turn, the word units will propagate top-down activation, reinforcing activation of the letter P and so facilitating its recognition. Were this letter to be presented in the string XPQJ, no word units could offer similar top-down activation, hence the relative facilitation of the pseudoword. Interestingly, although these top-down "gang" effects produced facilitation of letters contained in orthographically legal nonword strings, the model demonstrated that they also produced facilitation in orthographically illegal, unpronounceable letter strings, such as SPCT. Here, the same gang of SPAT, SPIT, and SPOT produces top-down support. Rumelhart and McClelland (1982) reported empirical support for this novel prediction. Therefore, although the model behaved as if it contained orthographic rules influencing recognition, it did not in fact do so, because continued contextual facilitation could be demonstrated for strings that had gang support but violated the orthographic rules.

There are two specific points to note regarding the IA model. First, this early connectionist model was not adaptive – connectivity was set by hand. Although the model's behavior was shaped by the statistical properties of the language it processed, these properties were built into the structure of the system in terms of the frequency of occurrence of letters and letter combinations in the words. Second, the idea of bottom-up excitation followed by competition among mutually exclusive possibilities is a strategy familiar in Bayesian approaches to cognition. In that sense, the IA model bears similarity to more recent probability-theory-based approaches to perception (see chapter 3 in this volume).

3.1.1. What Happened Next?

Subsequent work saw the principles of the IA model extended to the recognition of spoken words (the TRACE model: McClelland & Elman, 1986) and more recently to bilingual speakers, where two languages must be incorporated in a single representational system (see Thomas & van Heuven, 2005, for review). The architecture was applied to other domains where multiple constraints were thought to operate during perception, for example, in face recognition (Burton, Bruce, & Johnston, 1990). Within language, more complex architectures have tried to recast the principles of the IA model in developmental settings, such as Plaut and Kello's (1999) model of the emergence of phonology from the interplay of speech comprehension and production.

The more general lesson to draw from the interactive activation model is the demonstration of multiple influences (feature, letter, and word-level knowledge) working simultaneously and in parallel to shape the response of the system, as well as the somewhat surprising finding that a massively parallel constraint satisfaction process of this form can appear to behave as if it contains rules (in this case, orthographic) when no such rules are included in the processing structure. At the time, the model brought into question whether it was necessary to postulate rules as processing structures to explain regularities in human behavior. This skepticism was brought into sharper focus by our next example.


3.2. On Learning the Past Tense of English Verbs (Rumelhart & McClelland, 1986)

Rumelhart and McClelland's (1986) model of English past tense formation marked the real emergence of the PDP framework. Where the IA model used localist coding, the past tense model employed distributed coding. Where the IA model had handwired connection weights, the past tense model learned its weights via repeated exposure to a problem domain. However, the models share two common themes. Once more, the behavior of the past tense model will be driven by the statistics of the problem domain, albeit these will be carved into the model by training rather than sculpted by the modelers. Perhaps more importantly, we see a return to the idea that a connectionist system can exhibit rule-following behavior without containing rules as causal processing structures; but in this case, the rule-following behavior will be the product of learning and will accommodate a proportion of exception patterns that do not follow the general rule. The key point that the past tense model illustrates is how (approximate) conformity to the regularities of language – and even a tendency to produce new regular forms (e.g., regularizations like "thinked" or past tenses for novel verbs like "wugged") – can arise in a connectionist network without an explicit representation of a linguistic rule.

The English past tense is characterized by a predominant regularity in which the majority of verbs form their past tenses by the addition of one of three allomorphs of the "-ed" suffix to the base stem (walk/walked, end/ended, chase/chased). However, there is a small but significant group of verbs that form their past tense in different ways, including changing internal vowels (swim/swam), changing word-final consonants (build/built), changing both internal vowels and final consonants (think/thought), and an arbitrary relation of stem to past tense (go/went), as well as verbs that have a past tense form identical to the stem (hit/hit). These so-called irregular verbs often come in small groups sharing a family resemblance (sleep/slept, creep/crept, leap/leapt) and usually have high token frequencies (see Pinker, 1999, for further details).

During the acquisition of the English past tense, children show a characteristic U-shaped developmental profile at different times for individual irregular verbs. Initially, they use the correct past tense of a small number of high-frequency regular and irregular verbs. Later, they sometimes produce "overregularized" past tense forms for a small fraction of their irregular verbs (e.g., thinked; Marcus et al., 1992), along with other, less frequent errors (Xu & Pinker, 1995). They are also able to extend the past tense "rule" to novel verbs (e.g., wug/wugged). Finally, in older children, performance approaches ceiling on both regular and irregular verbs (Berko, 1958; Ervin, 1964; Kuczaj, 1977).

In the early 1980s, it was held that this pattern of behavior represented the operation of two developmental mechanisms (Pinker, 1984). One of these was symbolic and served to learn the regular past tense rule, whereas the other was associative and served to learn the exceptions to the rule. The extended phase of overregularization errors corresponded to difficulties in integrating the two mechanisms, specifically, a failure of the associative mechanism to block the function of the symbolic mechanism. That the child comes to the language acquisition situation armed with these two mechanisms (one of them full of blank rules) was an a priori commitment of the developmental theory.

By contrast, Rumelhart and McClelland (1986) proposed that a single network that does not distinguish between regular and irregular past tenses is sufficient to learn past tense formation. The architecture of their model is shown in Figure 2.3.


Figure 2.3. Two-layer network for learning the mapping between the verb roots and past tense forms of English verbs (Rumelhart & McClelland, 1986). Phonological representations of verbs are initially encoded into a coarse, distributed "Wickelfeature" representation. Past tenses are decoded from the Wickelfeature representation back to the phonological form. Later connectionist models replaced the dotted area with a three-layer feedforward backpropagation network (e.g., Plunkett & Marchman, 1991, 1993).

A phoneme-based representation of the verb root was recoded into a more distributed, coarser (more blurred) format, which they called "Wickelfeatures." The stated aim of this recoding was to produce a representation that (a) permitted differentiation of all of the root forms of English and their past tenses, and (b) provided a natural basis for generalizations to emerge about what aspects of a present tense correspond to what aspects of a past tense. This format involved representing verbs over 460 processing units. A two-layer network was used to associate the Wickelfeature representations of the verb root and past tense form. A final decoding network was then used to derive the closest phoneme-based rendition of the past tense form and reveal the model's response (the decoding part of the model was somewhat restricted by the computer processing limitations of the machines available at the time).

The connection weights in the two-layer network were initially randomized. The model was then trained in three phases, in each case using the delta rule to update the connection weights after each verb root/past tense pair was presented (see Section 2.2, Equation 2.4). In Phase 1, the network was trained on ten high-frequency verbs, two regular and eight irregular, in line with the greater proportion of irregular verbs among the most frequent verbs in English. Phase 1 lasted for ten presentations of the full training set (or "epochs"). In Phase 2, the network was trained on 410 medium-frequency verbs, 334 regular and 76 irregular, for a further 190 epochs. In Phase 3, no further training took place, but 86 lower-frequency verbs were presented to the network to test its ability to generalize its knowledge of the past tense domain to novel verbs.
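The three-phase regime can be written as a short training loop. In the sketch below, random binary vectors stand in for the 460-unit Wickelfeature encodings, and the update is the delta rule of Equation 2.4 applied to binary threshold units; the learning rate, the simple threshold, and the retention of the ten original verbs in the Phase 2 training set are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 460  # number of Wickelfeature units per verb form

def make_pairs(n):
    """Random binary stand-ins for (verb root, past tense) encodings."""
    return [(rng.integers(0, 2, N).astype(float),
             rng.integers(0, 2, N).astype(float)) for _ in range(n)]

def train_epoch(W, pairs, lr=0.05):
    """One sweep through the pairs, updating W with the delta rule."""
    for root, past in pairs:
        output = (W @ root > 0).astype(float)     # binary threshold units
        W += lr * np.outer(past - output, root)   # Equation 2.4
    return W

W = np.zeros((N, N))
high_freq, medium_freq = make_pairs(10), make_pairs(410)

for _ in range(10):    # Phase 1: ten epochs on ten high-frequency verbs
    W = train_epoch(W, high_freq)
for _ in range(190):   # Phase 2: 190 further epochs on the expanded set
    W = train_epoch(W, high_freq + medium_freq)
# Phase 3: no training; present 86 novel verbs to test generalization.
```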


There were four key results for this model. First, it succeeded in learning both regular and irregular past tense mappings in a single network that made no reference to the distinction between regular and irregular verbs. Second, it captured the overall pattern of faster acquisition for regular verbs than irregular verbs, a predominant feature of children's past tense acquisition. Third, the model captured the U-shaped profile of development: an early phase of accurate performance on a small set of regular and irregular verbs, followed by a phase of overregularization of the irregular forms, and finally recovery for the irregular verbs and performance approaching ceiling on both verb types. Fourth, when the model was presented with the low-frequency verbs on which it had not been trained, it was able to generalize the past tense rule to a substantial proportion of them, as if it had indeed learned a rule. Additionally, the model captured more fine-grained developmental patterns for subsets of regular and irregular verbs, and generated several novel predictions.

Rumelhart and McClelland explained the generalization abilities of the network in terms of the superpositional memory of the two-layer network. All the associations between the distributed encodings of verb root and past tense forms must be stored across the single matrix of connection weights. As a result, similar patterns blend into one another and reinforce each other. Generalization is contingent on the similarity of verbs at input. Were the verbs to be presented using an orthogonal, localist scheme (e.g., 420 units, 1 per verb), then there would be no similarity between the verbs, no blending of mappings, no generalization, and therefore no regularization of novel verbs. As the authors state, "It is the statistical relationships among the base forms themselves that determine the pattern of responding. The network merely reflects the statistics of the featural representations of the verb forms" (Rumelhart & McClelland, 1986, p. 267). Based on the model's successful simulation of the profile of language development in this domain and, compared with the dual-mechanism model, its more parsimonious a priori commitments, Rumelhart and McClelland viewed their work on past tense morphology as a step toward a revised understanding of language knowledge, language acquisition, and linguistic information processing in general.

The past tense model stimulated a great deal of subsequent debate, not least because of its profound implications for theories of language development (no rules!). The model was initially subjected to concentrated criticism. Some of this was overstated – for instance, the use of domain-general learning principles (such as distributed representation, parallel processing, and the delta rule) to acquire the past tense in a single network was interpreted as a claim that all of language acquisition could be captured by the operation of a single domain-general learning mechanism. Such an absurd claim could be summarily dismissed. As it stood, the model made no such claim: Its generality was in the processing principles. The model itself represented a domain-specific system dedicated to learning a small part of language. Nevertheless, a number of the criticisms were more telling: The Wickelfeature representational format was not psycholinguistically realistic; the generalization performance of the model was relatively poor; the U-shaped developmental profile appeared to be a result of abrupt changes in the composition of the training set; and the actual response of the model was hard to discern because of problems in decoding the Wickelfeature output into a phoneme string (Pinker & Prince, 1988).

The criticisms and following rejoinders were interesting in a number of ways. First, there was a stark contrast between the precise, computationally implemented connectionist model of past tense formation and the verbally specified dual-mechanism theory (e.g., Marcus et al., 1992). The implementation made simplifications but was readily evaluated against quantitative behavioral evidence; it made predictions and it could be falsified. The verbal theory by contrast was vague – it was hard to know how or whether it would work or exactly what behaviors it predicted (see Thomas, Forrester, & Richardson, 2006, for discussion). Therefore, it could only be evaluated on loose qualitative grounds. Second, the model stimulated a great deal of new multidisciplinary research in the area. Today, inflectional morphology (of which past tense is a part) is one of the most studied aspects of language processing in children, in adults, in second language learners, in adults with acquired brain damage, in children and adults with neurogenetic disorders, and in children with language impairments, using psycholinguistic methods, event-related potential measures of brain activity, functional magnetic resonance imaging, and behavioral genetics.


with acquired brain damage, in children and also successfully pursued via connectionist adults with neurogenetic disorders, and in modeling in the domain of reading (e.g., children with language impairments, using Plaut, et al., 1996). This led to work that psycholinguistic methods, event-related po- also considered various forms of acquired tential measures of brain activity, functional and developmental dyslexia. magnetic resonance imaging, and behavioral For the past tense itself, there re- genetics. This rush of science illustrates the mains much interest in the topic as a essential role of computational modeling in crucible to test theories of language de- driving forward theories of human cogni- velopment. However, in some senses the tion. Third, further modifications and im- debate between connectionist and dual- provements to the past tense model have mechanism accounts has ground to a halt. highlighted how researchers go about the There is much evidence from child devel- difficult task of understanding which parts opment, adult cognitive neuropsychology, of their model represent the key theoreti- developmental neuropsychology, and func- cal claims and which are implementational tional brain imaging to suggest partial dis- details. Simplification is inherent in model- sociations between performance on regu- ing, but successful modeling relies on mak- lar and irregular inflection under various ing the right simplifications to focus on the conditions. Both connectionist and dual- process of interest. For example, in sub- mechanism models have been modified: the sequent models, the Wickelfeature repre- connectionist model to include the influ- sentation was replaced by more plausible ence of lexical-semantics as well as verb phonemic representations based on articu- root phonology in driving the production latory features; the recoding/two-layer net- of the past tense form (Joanisse & Sei- work/decoding component of the network denberg, 1999; Thomas & Karmiloff-Smith, (the dotted rectangle in Figure 2.3) that 2003b); the dual-mechanism model to sup- was trained with the delta rule was re- pose that regular verbs might also be stored placed by a three-layer feedforward net- in the associative mechanism, thereby in- work trained with the backpropagation al- troducing partial redundancy of function gorithm; and the U-shaped developmental (Pinker, 1999). Both approaches now ac- profile was demonstrated in connectionist cept that performance on regular and ir- networks trained with a smoothly growing regular past tenses partly indexes different training set of verbs or even with a fixed set things – in the connectionist account, dif- of verbs (see, e.g., Plunkett & Marchman, ferent underlying knowledge, in the dual- 1991, 1993, 1996). mechanism account, different underlying processes. In the connectionist theory, per- 3.2.1. what happened next? formance on regular verbs indexes reliance The English past tense model prompted on knowledge about phonological regu- further work within inflectional morphol- larities, whereas performance on irregular ogy in other languages (e.g., pluralization in verbs indexes reliance on lexical-semantic German: Goebel & Indefrey, 2000; plural- knowledge. 
3.2.1. what happened next?

The English past tense model prompted further work within inflectional morphology in other languages (e.g., pluralization in German: Goebel & Indefrey, 2000; pluralization in Arabic: Plunkett & Nakisa, 1997), as well as models that explored the possible causes of deficits in acquired and developmental disorders, such as aphasia, Specific Language Impairment and Williams syndrome (e.g., Hoeffner & McClelland, 1993; Joanisse & Seidenberg, 1999; Thomas & Karmiloff-Smith, 2003b; Thomas, 2005). The idea that rule-following behavior could emerge in a developing system that also had to accommodate exceptions to the rules was also successfully pursued via connectionist modeling in the domain of reading (e.g., Plaut et al., 1996). This led to work that also considered various forms of acquired and developmental dyslexia.

For the past tense itself, there remains much interest in the topic as a crucible to test theories of language development. However, in some senses the debate between connectionist and dual-mechanism accounts has ground to a halt. There is much evidence from child development, adult cognitive neuropsychology, developmental neuropsychology, and functional brain imaging to suggest partial dissociations between performance on regular and irregular inflection under various conditions. Both connectionist and dual-mechanism models have been modified: the connectionist model to include the influence of lexical-semantics as well as verb root phonology in driving the production of the past tense form (Joanisse & Seidenberg, 1999; Thomas & Karmiloff-Smith, 2003b); the dual-mechanism model to suppose that regular verbs might also be stored in the associative mechanism, thereby introducing partial redundancy of function (Pinker, 1999). Both approaches now accept that performance on regular and irregular past tenses partly indexes different things – in the connectionist account, different underlying knowledge, in the dual-mechanism account, different underlying processes. In the connectionist theory, performance on regular verbs indexes reliance on knowledge about phonological regularities, whereas performance on irregular verbs indexes reliance on lexical-semantic knowledge. In the dual-mechanism theory, performance on regular verbs indexes a dedicated symbolic processing mechanism implementing the regular rule, whereas performance on irregular verbs indexes an associative memory device storing information about the past tense forms of specific verbs. Both approaches claim to account for the available empirical evidence. However, to date, the dual-mechanism account remains unimplemented, so its claim is weaker.


How does one distinguish between two theories that (a) both claim to explain the data but (b) contain different representational assumptions? Putting aside the different level of detail of the two theories, the answer is that it depends on one's preference for consistency with other disciplines. The dual-mechanism theory declares consistency with linguistics – if rules are required to characterize other aspects of language performance (such as syntax), then one might as well include them in a model of past tense formation. The connectionist theory declares consistency with neuroscience – if the language system is going to be implemented in the brain, then one might as well employ a computational formalism based on how neural networks function.

Finally, we return to the more general connectionist principle illustrated by the past tense model. So long as there are regularities in the statistical structure of a problem domain, a massively parallel constraint satisfaction system can learn these regularities and extend them to novel situations. Moreover, as with humans, the behavior of the system is flexible and context sensitive – it can accommodate regularities and exceptions within a single processing structure.

3.3. Finding Structure in Time (Elman, 1990)

In this section, the notion of the simple recurrent network and its application to language are introduced. As with past tense, the key point of the model will be to show how conformity to regularities of language can arise without an explicit representation of a linguistic rule. Moreover, the following simulations will demonstrate how learning can lead to the discovery of useful internal representations that capture conceptual and linguistic structure on the basis of the co-occurrences of words in sentences.

The IA model exemplified connectionism's commitment to parallelism: All of the letters of the word presented to the network were recognized in parallel, and processing occurred simultaneously at different levels of abstraction. But not all processing can be carried out in this way. Some human behaviors intrinsically revolve around temporal sequences. Language, action planning, goal-directed behavior, and reasoning about causality are examples of domains that rely on events occurring in sequences. How has connectionism addressed the processing of temporally unfolding events? One solution was offered in the TRACE model of spoken word recognition (McClelland & Elman, 1986), where a word was specified as a sequence of phonemes. In that case, the architecture of the system was duplicated for each time-slice and the duplicates wired together. This allowed constraints to operate over items in the sequence to influence recognition. In other models, a related approach was used to convert a temporally extended representation into a spatially extended one. For example, in the past tense model, all the phonemes of a verb were presented across the input layer. This could be viewed as a sequence if one assumed that the representation of the first phoneme represents time slice t, the representation of the second phoneme represents time slice t + 1, and so on. As part of a comprehension system, this approach assumes a buffer that can take sequences and convert them to a spatial vector. However, this solution is fairly limited, as it necessarily pre-commits to the size of the sequences that can be processed at once (i.e., the size of the input layer).

Elman (1990, 1991) offered an alternative and more flexible approach to processing sequences, proposing an architecture that has been extremely influential and much used since. Elman drew on the work of Jordan (1986), who had proposed a model that could learn to associate a "plan" (i.e., a single input vector) with a series of "actions" (i.e., a sequence of output vectors). Jordan's model contained recurrent connections permitting the hidden units to "see" the network's previous output (via a set of "state" input units that are given a copy of the previous output). The facility for the network to shape its next output according to its previous response constitutes a kind of memory.
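A minimal sketch of this arrangement follows. The layer sizes and the way the plan is encoded are hypothetical illustrations, not Jordan's (1986) original specification; the point is simply that the previous output is copied back in as extra input.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(1)

n_plan, n_state, n_hidden, n_out = 8, 5, 12, 5
W_in = rng.normal(0, 0.5, (n_plan + n_state, n_hidden))
W_out = rng.normal(0, 0.5, (n_hidden, n_out))

def jordan_sequence(plan, n_steps):
    """Unroll a Jordan-style network: the 'state' input units receive a
    copy of the previous output, so each action depends on the plan and
    on what the network just did."""
    state = np.zeros(n_state)
    actions = []
    for _ in range(n_steps):
        x = np.concatenate([plan, state])
        hidden = sigmoid(x @ W_in)
        output = sigmoid(hidden @ W_out)
        actions.append(output)
        state = output        # feedback: previous output becomes next input
    return actions

actions = jordan_sequence(rng.integers(0, 2, n_plan).astype(float), n_steps=4)
```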


Figure 2.4. Elman's simple recurrent network architecture for finding structure in time (Elman, 1991, 1993). Connections between input and hidden, context and hidden, and hidden and output layers are trainable. Sequences are applied to the network element by element in discrete time steps; the context layer contains a copy of the hidden unit (HU) activations on the previous time step transmitted by fixed, 1-to-1 connections.
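A minimal sketch of the architecture in Figure 2.4 is given below. Layer sizes and weight scales are illustrative assumptions; the essential points are that the context layer receives a verbatim copy of the previous hidden state over fixed 1-to-1 connections, and that only the input-to-hidden, context-to-hidden, and hidden-to-output weights would be trained (by backpropagation, treating the context activations as ordinary inputs).

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(2)

n_in, n_hidden, n_out = 5, 20, 5
W_ih = rng.normal(0, 0.5, (n_in, n_hidden))      # trainable
W_ch = rng.normal(0, 0.5, (n_hidden, n_hidden))  # trainable (context -> hidden)
W_ho = rng.normal(0, 0.5, (n_hidden, n_out))     # trainable

def srn_forward(sequence):
    """Process a sequence element by element; the context units hold a
    copy of the hidden activations from the previous time step."""
    context = np.zeros(n_hidden)
    outputs = []
    for x in sequence:
        hidden = sigmoid(x @ W_ih + context @ W_ch)
        outputs.append(sigmoid(hidden @ W_ho))
        context = hidden.copy()  # fixed 1-to-1 copy, not a learned connection
    return outputs

outputs = srn_forward(rng.integers(0, 2, (10, n_in)).astype(float))
```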

Elman's innovation was to build a recurrent facility into the internal units of the network, allowing it to compute statistical relationships across sequences of inputs and outputs. To achieve this, first time is discretized into a number of slices. On time step t, an input is presented to the network and causes a pattern of activation on hidden and output layers. On time step t + 1, the next input in the sequence of events is presented to the network. However, crucially, a copy of the activation of the hidden units on time step t is transmitted to a set of internal "context" units. This activation vector is also fed to the hidden units on time step t + 1. Figure 2.4 shows the architecture, known as the simple recurrent network (SRN). It is usually trained with the backpropagation algorithm (see Section 2.3) as a multilayer feedforward network, ignoring the origin of the information on the context layer.

Each input to the SRN is therefore processed in the context of what came before, but in a way subtly more powerful than the Jordan network. The input at t + 1 is processed in the context of the activity produced on the hidden units by the input at time t. Now consider the next time step. The input at time t + 2 will be processed along with activity from the context layer that is shaped by two influences:

(the input at t + 1 (shaped by the input at t)).

The input at time t + 3 will be processed along with activity from the context layer that is shaped by three influences:

(the input at t + 2 (shaped by the input at t + 1 (shaped by the input at t))).

The recursive flavor of the information contained in the context layer means that each new input is processed in the context of the full history of previous inputs. This permits the network to learn statistical relationships across sequences of inputs or, in other words, to find structure in time.

In his original 1990 article, Elman demonstrated the powerful properties of the SRN with two examples. In the first, the network was presented with a sequence of letters made up of concatenated words, for example:

MANYYEARSAGOABOYANDGIRLLIVEDBYTHESEATHEYPLAYEDHAPPIL

Each letter was represented by a distributed binary code over five input units. The network was trained to predict the next letter in the sentence for 200 sentences constructed from a lexicon of fifteen words. There were 1,270 words and 4,963 letters. Because each word appeared in many sentences, the network was not particularly successful at predicting the next letter when it got to the end of each word, but within a word it was able to predict the sequences of letters. Using the accuracy of prediction as a measure, one could therefore identify which sequences in the letter string were words: They were the sequences of good prediction bounded by high prediction errors.
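The logic of this segmentation strategy can be illustrated without a trained network. The sketch below substitutes simple letter-bigram statistics for the SRN's learned predictions, an assumption made purely for brevity, and marks a candidate word boundary wherever the surprisal of the next letter peaks.

```python
from collections import Counter
from math import log

corpus = "manyyearsagoaboyandgirllivedbythesea" * 20  # toy stand-in corpus
pairs = Counter(zip(corpus, corpus[1:]))
firsts = Counter(corpus[:-1])

def surprisal(a, b):
    """-log P(next letter | current letter), estimated from bigram counts."""
    return -log((pairs[(a, b)] + 1) / (firsts[a] + 26))

text = "aboyandgirllivedbythesea"
errors = [surprisal(a, b) for a, b in zip(text, text[1:])]

# Candidate boundaries: positions where prediction error peaks locally,
# i.e., where the next letter was hardest to predict.
boundaries = [i + 1 for i in range(1, len(errors) - 1)
              if errors[i] > errors[i - 1] and errors[i] > errors[i + 1]]
print(boundaries)
```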


The ability to extract words was of course subject to the ambiguities inherent in the training set (e.g., for the and they, there is ambiguity after the third letter). Elman suggested that if the letter strings are taken to be analogous to the speech sounds available to the infant, the SRN demonstrates a possible mechanism to extract words from the continuous stream of sound that is present in infant-directed speech. Elman's work has contributed to the increasing interest in the statistical learning abilities of young children in language and cognitive development (see, e.g., Saffran, Newport, & Aslin, 1996).

In the second example, Elman (1990) created a set of 10,000 sentences by combining a lexicon of 29 words and a set of short sentence frames (noun + [transitive] verb + noun; noun + [intransitive] verb). There was a separate input and output unit for each word, and the SRN was trained to predict the next word in the sentence. During training, the network's output came to approximate the transitional probabilities between the words in the sentences, that is, it could predict the next word in the sentences as much as this was possible. Following the first noun, the verb units would be more active as the possible next word, and verbs that tended to be associated with this particular noun would be more active than those that did not.

At this point, Elman examined the similarity structure of the internal representations to discover how the network was achieving its prediction ability. He found that the internal representations were sensitive to the difference between nouns and verbs, and within verbs, to the difference between transitive and intransitive verbs. Moreover, the network was also sensitive to a range of semantic distinctions: Not only were the internal states induced by nouns split into animate and inanimate, but the pattern for "woman" was most similar to "girl," and that for "man" was most similar to "boy." The network had learned to structure its internal representations according to a mix of syntactic and semantic information because these information states were the best way to predict how sentences would unfold. Elman concluded that the representations induced by connectionist networks need not be flat but could include hierarchical encodings of category structure.
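The kind of analysis Elman performed can be sketched as follows. Here the per-word hidden-state vectors are random stand-ins (in the real simulation they would be averages of the SRN's hidden activations recorded as each word is processed), and the word list is abbreviated; the code simply reports each word's nearest neighbor in the hidden-state space.

```python
import numpy as np

rng = np.random.default_rng(3)
words = ["boy", "girl", "man", "woman", "chase", "see", "sleep"]

# Stand-in for the average hidden-unit vector evoked by each word;
# a trained SRN would supply these.
H = {w: rng.normal(size=150) for w in words}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for w in words:
    # Nearest neighbor in representation space reveals the induced
    # category structure (e.g., "woman" closest to "girl").
    neighbor = max((v for v in words if v != w),
                   key=lambda v: cosine(H[w], H[v]))
    print(w, "->", neighbor)
```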


Figure 2.5. Trajectory of internal activation states as the simple recurrent network (SRN) processes sentences (Elman, 1993). The data show positions according to the dimensions of a principal components analysis (PCA) carried out on hidden unit activations for the whole training set. Words are indexed by their position in the sequence but represent activation of the same input unit for each word. (a) PCA values for the second principal component as the SRN processes two sentences, "Boy who boys chase chases boy" or "Boys who boys chase chase boy"; (b) PCA values for the first and eleventh principal components as the SRN processes "Boy chases boy who chases boy who chases boy."

Based on his finding, Elman also argued that the SRN was able to induce representations of entities that varied according to their context of use. This contrasts with classical symbolic representations that retain their identity regardless of the combinations into which they are put, a property called "compositionality." This claim is perhaps better illustrated by a second article Elman (1993) published two years later, called "Learning and Development in Neural Networks: The Importance of Starting Small." In this later article, Elman explored whether rule-based mechanisms are required to explain certain aspects of language performance, such as syntax. He focused on "long-range dependencies," which are links between words that depend only on their syntactic relationship in the sentence and, importantly, not on their separation in a sequence of words. For example, in English, the subject and main verb of a sentence must agree in number. If the noun is singular, so must be the verb; if the noun is plural, so must be the verb. Thus, in the sentence "The boy chases the cat," boy and chases must both be singular. But this is also true in the sentence "The boy whom the boys chase chases the cat." In the second sentence, the subject and verb are further apart in the sequence of words but their relationship is the same; moreover, the words are now separated by plural tokens of the same lexical items. Rule-based representations of syntax were thought to be necessary to encode these long-distance relationships because, through the recursive nature of syntax, the words that have to agree in a sentence can be arbitrarily far apart.

Using an SRN trained on the same prediction task as that previously outlined but now with more complex sentences, Elman (1993) demonstrated that the network was able to learn these long-range dependencies even across the separation of multiple phrases. If boy was the subject of the sentence, when the network came to predict the main verb chase as the next word, it predicted that it should be in the singular. The method by which the network achieved this ability is of particular interest. Once more, Elman explored the similarity structure in the hidden unit representations, using principal component analyses to identify the salient dimensions of similarity across which activation states were varying. This enabled him to reduce the high dimensionality of the internal states (150 hidden units were used) to a manageable number to visualize processing. Elman was then able to plot the trajectories of activation as the network altered its internal state in response to each subsequent input. Figure 2.5 depicts these trajectories as the network processes different multi-phrase sentences, plotted with reference to particular dimensions of principal component space. This figure demonstrates that the network adopted similar states in response to particular lexical items (e.g., tokens of boy, who, chases), but that it modified the pattern slightly according to the grammatical status of the word. In Figure 2.5a, the second principal component appears to encode singularity/plurality. Figure 2.5b traces the network's state as it processes two embedded relative clauses containing iterations of the same words. Each clause exhibits a related but slightly shifted triangular trajectory to encode its role in the syntactic structure.
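The visualization technique itself can be sketched in a few lines. Below, the matrix of hidden states recorded over the training set is random stand-in data (a trained SRN would supply it); principal components are obtained from the singular value decomposition, and one sentence's hidden states are projected onto a chosen component to trace its trajectory.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in: hidden-unit activations collected over the whole training set
# (rows = time steps, columns = 150 hidden units, as in Elman, 1993).
H_all = rng.normal(size=(5000, 150))

centered = H_all - H_all.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
pc2 = Vt[1]                              # second principal component (Fig. 2.5a)

# Project the hidden states evoked by one sentence onto that component.
H_sentence = rng.normal(size=(7, 150))   # e.g., "boy who boys chase chases boy"
trajectory = (H_sentence - H_all.mean(axis=0)) @ pc2
print(trajectory)                        # one coordinate per word, over time
```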


The importance of this model is that it prompts a different way to understand the processing of sentences. Previously, one would view symbols as possessing fixed identities and as being bound into particular grammatical roles via a syntactic construction. In the connectionist system, sentences are represented by trajectories through activation space in which the activation pattern for each word is subtly shifted according to the context of its usage. The implication is that the property of compositionality at the heart of the classical symbolic computational approach may not be necessary to process language.

Elman (1993) also used this model to investigate a possible advantage to learning that could be gained by initially restricting the complexity of the training set. At the start of training, the network had its memory reset (its context layer wiped) after every third or fourth word. This window was then increased in stages up to six to seven words across training. The manipulation was intended to capture maturational changes in working memory in children. Elman (1993) reported that starting small enhanced learning by allowing the network to build simpler internal representations that were later useful for unpacking the structure of more complex sentences (see Rohde & Plaut, 1999, for discussion and further simulations). This idea resonated with developmental psychologists in its demonstration of the way in which learning and maturation might interact in constructing cognition. It is an idea that could turn out to be a key principle in the organization of cognitive development (Elman et al., 1996).
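The "starting small" manipulation amounts to a training schedule on the memory window. The sketch below shows only that schedule; `train_step` is a hypothetical stand-in for one prediction-task update of an SRN, and the staged window sizes follow the description above.

```python
import numpy as np

def train_step(network, window):
    """Hypothetical placeholder: one backpropagation update on `window`,
    a short run of words after which the context layer is wiped."""
    pass

def starting_small(network, corpus, schedule=((3, 4), (4, 5), (5, 6), (6, 7))):
    # Each stage lets the network hold context over longer stretches,
    # mimicking maturational growth in working memory.
    for low, high in schedule:
        for _ in range(1000):                     # passes per stage
            size = np.random.randint(low, high + 1)
            start = np.random.randint(0, len(corpus) - size)
            train_step(network, corpus[start:start + size])
            # The context layer is reset between windows (memory wiped).
```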
3.3.1. what happened next?

Elman's simulations with the SRN and the prediction task produced striking results. The ability of the network to induce structured representations containing grammatical and semantic information from word sequences prompted the view that associative statistical learning mechanisms might play a much more central role in language acquisition. This innovation was especially welcome given that symbolic theories of sentence processing do not offer a ready account of language development. Indeed, they are largely identified with the nativist view that little in syntax develops. However, one limitation of the prior simulations is that the prediction task does not learn any categorizations over the input set. Although the simulations demonstrate that information important for language comprehension and production can be induced from word sequences, neither task is actually performed. The learned distinction between nouns and verbs apparent in the hidden unit representations is tied up with carrying out the prediction task. But to perform comprehension, for example, the SRN would need to learn categorizations from the word sequences, such as deciding which noun was the agent and which noun was the patient in a sentence, regardless of whether the sentence was presented in the active ("the dog chases the cat") or passive voice ("the cat is chased by the dog"). These types of computations are more complex and the network's solutions typically more impenetrable. Although SRNs have borne the promise of an inherently developmental connectionist theory of parsing, progress on a full model has been slow (see Christiansen & Chater, 2001; and chapter 17 of this volume). Parsing is a complex problem – it is not even clear what the output should be for a model of sentence comprehension. Should it be some intermediate depiction of agent–patient role assignments, some compound representation of roles and semantics, or a constantly updating mental model that processes each sentence in the context of the emerging discourse? Connectionist models of parsing await greater constraints from psycholinguistic evidence.

Nevertheless, some interesting preliminary findings have emerged. For example, some of the grammatical sentences that the SRN finds the hardest to predict are also the sentences that humans find the hardest to understand (e.g., center embedded structures like "the mouse the cat the dog bit chased ate the cheese") (Weckerly & Elman, 1992). These are sequences that place maximal load on encoding information in the network's internal recurrent loop, suggesting that recurrence may be a key computational primitive in language processing. Moreover, when the prediction task is replaced by a comprehension task (such as predicting the agent/patient status of the nouns in the sentence), the results are again suggestive.
Rather than building a syntactic structure for the whole sentence as a symbolic parser might, the network focuses on the predictability of lexical cues for identifying various syntactic structures (consistent with Bates and MacWhinney's [1989] Competition model of language development). The salience of lexical cues that each syntactic structure exploits and the processing load that each structure places on the recurrent loop makes them differentially vulnerable under damage. Here, neuropsychological findings from language breakdown and developmental language disorders have tended to follow the predictions of the connectionist account in the relative impairments that each syntactic construction should show (Dick et al., 2001, 2004; Thomas & Redington, 2004).

For more recent work and discussion of the use of SRNs in syntax processing, see Mayberry, Crocker, and Knoeferle (2005), Miikkulainen and Mayberry (1999), Morris, Cottrell, and Elman (2000), Rohde (2002), and Sharkey, Sharkey, and Jackson (2000).

Lastly, the impact of SRNs has not been restricted to language. These models have been usefully applied to other areas of cognition where sequential information is important. For example, Botvinick and Plaut (2004) have shown how this architecture can capture the control of routine sequences of actions without the need for schema hierarchies, and Cleeremans and Dienes (chapter 14 in this volume) show how SRNs have been applied to implicit learning.

In sum, then, Elman's work demonstrates how simple connectionist architectures can learn statistical regularities over temporal sequences. These systems may indeed be sufficient to produce many of the behaviors that linguists have described with grammatical rules. However, in the connectionist system, the underlying primitives are context-sensitive representations of words and trajectories of activation through recurrent circuits.

4. Related Models

Before considering the wider impact of connectionism on theories of cognition, we should note a number of other related approaches.

4.1. Cascade-Correlation and Incremental Neural Network Algorithms

Backpropagation networks specify input and output representations, whereas in self-organizing networks, only the inputs are specified. These networks therefore include some number of internal processing units whose activation states are determined by the learning algorithm. The number of internal units and their organization (e.g., into layers) plays an important role in determining the complexity of the problems or categories that the network can learn. In pattern associator networks, too few units and the network will fail to learn; in self-organizing networks, too few output units and the network will fail to provide good discrimination between the categories in the training set. How does the modeler select in advance the appropriate number of internal units? Indeed, for a cognitive model, should this be a decision that the modeler gets to make?

For pattern associator networks, the cascade correlation algorithm (Fahlman & Lebiere, 1990) addresses this problem by starting with a network that has no hidden units and then adding in these resources during learning as it becomes necessary to carry on improving on the task. New hidden units are added with weights from the input layer tailored so that the unit's activation correlates with network error, that is, the new unit responds to parts of the problem on which the network is currently doing poorly. New hidden units can also take input from existing hidden units, thereby creating detectors for higher order features in the problem space.
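A greatly simplified sketch of the recruitment step follows. Real cascade-correlation trains a pool of candidate units to maximize the correlation between their activations and the residual error before installing the best one; here a single candidate's input weights are nudged up the gradient of that covariance, which is an assumption-laden abbreviation of Fahlman and Lebiere's (1990) procedure.

```python
import numpy as np

rng = np.random.default_rng(5)

X = rng.normal(size=(100, 10))    # input patterns
residual = rng.normal(size=100)   # stand-in for per-pattern network error

# Train one candidate hidden unit to track the residual error.
w = rng.normal(0, 0.1, 10)
for _ in range(500):
    v = np.tanh(X @ w)
    cov = np.mean((v - v.mean()) * (residual - residual.mean()))
    # Gradient ascent on |covariance(unit activation, error)|.
    dv = np.sign(cov) * (residual - residual.mean()) * (1 - v ** 2)
    w += 0.01 * X.T @ dv / len(X)

# Once trained, the unit would be installed with frozen input weights and
# the output weights retrained; later candidates may also receive input
# from existing hidden units, building higher-order feature detectors.
```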


The cascade correlation algorithm has been widely used for studying cognitive development (Mareschal & Shultz, 1996; Shultz, 2003; Westermann, 1998), for example, in simulating children's performance in Piagetian reasoning tasks (see Section 5.2). The algorithm makes links with the constructivist approach to development (Quartz, 1993; Quartz & Sejnowski, 1997), which argues that increases in the complexity of children's cognitive abilities are best explained by the recruitment of additional neurocomputational resources with age and experience. Related models that also use this "incremental" approach to building network architectures can be found in the work of Carpenter and Grossberg (Adaptive Resonance Theory; e.g., Carpenter & Grossberg, 1987a, 1987b) and in the work of Love and colleagues (e.g., Love, Medin, & Gureckis, 2004).

4.2. Mixture-of-Experts Models

The preceding sections assume that only a single architecture is available to learn each problem. However, it may be that multiple architectures are available to learn a given problem, each with different computational properties. Which architecture will end up learning the problem? Moreover, what if a cognitive domain can be broken down into different parts, for example, in the way that the English past tense problem comprises regular and irregular verbs – could different computational components end up learning the different parts of the problem? The mixture-of-experts approach considers ways in which learning could take place in just such a system with multiple components available (Jacobs et al., 1991). In these models, functionally specialized structures can emerge as a result of learning, in the circumstance where the computational properties of the different components happen to line up with the demands presented by different parts of the problem domain (so-called structure-function correspondences).

During learning, mixture-of-experts algorithms typically permit the multiple components to compete with each other to deliver the correct output for each input pattern. The best performer is then assigned the pattern and allowed to learn it. The involvement of each component during functioning is controlled by a gating mechanism.
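The competitive scheme can be sketched as follows. Two linear experts and a softmax gate are illustrative simplifications of Jacobs et al.'s (1991) architecture: for each pattern, the expert with the smaller error receives the larger gating weight and therefore the larger share of the learning.

```python
import numpy as np

rng = np.random.default_rng(6)
n_in, n_out = 10, 5
experts = [rng.normal(0, 0.1, (n_in, n_out)) for _ in range(2)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_step(x, target, lr=0.05):
    outputs = [x @ W for W in experts]
    errors = np.array([np.sum((target - o) ** 2) for o in outputs])
    gate = softmax(-errors)          # better experts get higher gating weight
    for W, o, g in zip(experts, outputs, gate):
        # Each expert's update is scaled by its gating weight, so the
        # current best performer learns the pattern most strongly.
        W += lr * g * np.outer(x, target - o)
    return gate

for _ in range(1000):
    x = rng.normal(size=n_in)
    moe_step(x, rng.normal(size=n_out))
```

Over training, this rich-get-richer dynamic is what lets experts carve up a domain, such as regular versus irregular verbs, when their computational properties suit different parts of the problem.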


Mixture-of-experts models are one of several approaches that seek to explain the origin of functionally specialized processing components in the cognitive system (see Elman et al., 1996; Jacobs, 1999; Thomas & Richardson, 2006, for discussion). An example of the application of mixture of experts can be found in a developmental model of face and object recognition, where different "expert" mechanisms come to specialize in processing visual inputs that correspond to faces and those that correspond to objects (Dailey & Cottrell, 1999). The emergence of this functional specialization can be demonstrated by damaging each expert in turn and showing a double dissociation between face and object recognition in the two components of the model (see Section 5.3). Similarly, Thomas and Karmiloff-Smith (2002a) showed how a mixture-of-experts model of English past tense could produce emergent specialization of separate mechanisms to regular and irregular verbs, respectively (see also Westermann, 1998, for related work with a constructivist network).

4.3. Hybrid Models

The success of mixture-of-experts models suggests that when two or more components are combined within a model, it can be advantageous for the computational properties of the components to differ. Where the properties of the components are radically different, for example, involving the combination of symbolic (rule-based) and connectionist (associative, similarity-based) architectures, the models are sometimes referred to as "hybrid." The use of hybrid models is inspired by the observation that some aspects of human cognition seem better described by rules (e.g., syntax, reasoning), whereas some seem better described by similarity (e.g., perception, memory). We have previously encountered the debate between symbolic and connectionist approaches (see Section 2.3) and the proposal that connectionist architectures may serve to implement symbolic processes (e.g., Touretzky & Hinton, 1988). The hybrid systems approach takes the alternative view that connectionist and symbolic processing principles should be combined within the same model, taking advantage of the strengths of each computational formalism. A discussion of this approach can be found in Sun (2002a, 2002b). Example models include CONSYDERR (Sun, 1995), CLARION (Sun & Peterson, 1998), and ACT-R (Anderson & Lebiere, 1998).

An alternative to a truly hybrid approach is to develop a multi-part connectionist architecture that has components that employ different representational formats. Such a system may be described as having "heterogeneous" computational components. For example, in a purely connectionist system, one component might employ distributed representations that permit different degrees of similarity between activation patterns, whereas a second component employs localist representations in which there is no similarity between different representations. Behavior is then driven by the interplay between two associative components that employ different similarity structures. One example of a heterogeneous, multiple-component architecture is the complementary learning systems model of McClelland, McNaughton, and O'Reilly (1995), which employs localist representations to encode individual episodic memories but distributed representations to encode general semantic memories. Heterogeneous developmental models may offer new ways to conceive of the acquisition of concepts. For example, the cognitive domain of number may be viewed as heterogeneous in the sense that it combines three systems: the similarity-based representations of quantity, the localist representations of number facts (such as the order of number labels in counting), and a system for object individuation. Carey and Sarnecka (2006) argue that a heterogeneous multiple component system of this nature could acquire the concept of positive integers even though such a concept could not be acquired by any single component of the system on its own.

4.4. Bayesian Graphical Models

The use of Bayesian methods of inference in graphical models, including causal graphical models, has recently been embraced by a number of cognitive scientists (Chater, Tenenbaum, & Yuille, 2006; Gopnik et al., 2004; see chapter 3 in this volume). This approach stresses how it may be possible to combine prior knowledge in the form of a set of explicit alternative graph structures and constraints on the complexity of such structures with Bayesian methods of inference to select the best type of representation of a particular data set (e.g., lists of facts about many different animals); and within that, to select the best specific instantiation of a representation of that type (Tenenbaum, Griffiths, & Kemp, 2006). These models are useful contributions to our understanding, particularly because they allow explicit exploration of the role of prior knowledge in the selection of a representation of the structure present in each data set. It should be recognized, however, that such models are offered as characterizations of learning at Marr's (1982) "Computational Level" and as such they do not specify the representations and processes that are actually employed when people learn. These models do raise questions for connectionist research that does address such questions, however. Specifically, the work provides a benchmark against which connectionist approaches might be tested for their success in learning to represent the structure from a data set and in using such a structure to make inferences consistent with optimal performance according to a Bayesian approach within a graphical model framework. More substantively, the work raises questions about whether or not optimization depends on the explicit representation of alternative structured representations or whether an approximation to such structured representations can arise without their prespecification. For an initial examination of these issues as they arise in the context of causal inference, see McClelland and Thompson (2007).
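The flavor of this Bayesian selection can be conveyed with a deliberately tiny example, not drawn from the papers cited: two candidate "structures" for a set of binary observations, a single category versus two categories split by a known feature, are compared by their marginal likelihoods under uniform Beta priors.

```python
from math import lgamma

def log_marginal(k, n):
    """Log marginal likelihood of k successes in n trials under a
    uniform Beta(1, 1) prior on the success probability."""
    return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

# Toy data: counts of a property in two groups of animals (assumed values).
k1, n1 = 9, 10   # e.g., "has fur" among one group
k2, n2 = 1, 10   # e.g., "has fur" among another group

one_category = log_marginal(k1 + k2, n1 + n2)
two_categories = log_marginal(k1, n1) + log_marginal(k2, n2)

# The Bayes factor scores which structure better explains the data,
# automatically penalizing the extra flexibility of the split model.
print("log Bayes factor (two vs. one):", two_categories - one_category)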

5. Connectionist Influences on Cognitive Theory

Connectionism offers an explanation of human cognition because instances of behavior in particular cognitive domains can be explained with respect to a set of general principles (parallel distributed processing) and
the conditions of the specific domains. However, from the accumulation of successful models, it is also possible to discern a wider influence of connectionism on the nature of theorizing about cognition, and this is perhaps a truer reflection of its impact. How has connectionism made us think differently about cognition?

5.1. Knowledge Versus Processing

One area where connectionism has changed the basic nature of theorizing is memory. According to the old model of memory based on the classical computational metaphor, the information in long-term memory (e.g., on the hard disk) has to be moved into working memory (the central processing unit, or CPU) for it to be operated on, and the long-term memories are laid down via a domain-general buffer of short-term memory (random access memory, or RAM). In this type of system, it is relatively easy to shift informational content between different systems, back and forth between central processing and short and long-term stores. Computation is predicated on variables: the same binary string can readily be instantiated in different memory registers or encoded onto a permanent medium.

By contrast, knowledge is hard to move about in connectionist networks because it is encoded in the weights. For example, in the past-tense model, knowledge of the past tense rule "add -ed" is distributed across the weight matrix of the connections between input and output layers. The difficulty in portability of knowledge is inherent in the principles of connectionism – Hebbian learning alters connection strengths to reinforce desirable activation states in connected units, tying knowledge to structure. If we start from the premise that knowledge will be very difficult to move about in our information processing system, what kind of cognitive architecture do we end up with? There are four main themes.

First, we need to distinguish between two different ways in which knowledge can be encoded: active and latent representations (Munakata & McClelland, 2003). Latent knowledge corresponds to the information stored in the connection weights from accumulated experience. By contrast, active knowledge is information contained in the current activation states of the system. Clearly, the two are related because the activation states are constrained by the connection weights. But, particularly in recurrent networks, there can be subtle differences. Active states contain a trace of recent events (how things are at the moment), whereas latent knowledge represents a history of experience (how things tend to be). Differences in the ability to maintain the active states (e.g., in the strength of recurrent circuits) can produce errors in behavior where the system lapses into more typical ways of behaving (Munakata, 1998; Morton & Munakata, 2002).

Second, if information does need to be moved around the system, for example, from a more instance-based (episodic) system to a more general (semantic) system, this will require special structures and special (potentially time-consuming) processes. Thus McClelland, McNaughton, and O'Reilly (1995) proposed a dialogue between separate stores in the hippocampus and neocortex to gradually transfer knowledge from episodic to semantic memory. French, Ans, and Rousset (2001) proposed a special method to transfer knowledge between two memory systems: internally generated noise produces "pseudopatterns" from one system that contain the central tendencies of its knowledge; the second memory system is then trained with this extracted knowledge to effect the transfer. Chapters 7 and 8 in the current volume offer a wider consideration of models of episodic and semantic memory, respectively.
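The pseudopattern idea can be sketched directly. The network shapes and sizes here are arbitrary assumptions: random probe inputs are passed through a trained "source" network, and the resulting input-output pairs, the pseudopatterns, are used to train a second network, transferring knowledge without access to the original training items.

```python
import numpy as np

rng = np.random.default_rng(7)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

W_source = rng.normal(size=(10, 5))   # stands in for a trained source network

# Generate pseudopatterns: random probes plus the source's responses to them.
probes = rng.integers(0, 2, size=(200, 10)).astype(float)
targets = sigmoid(probes @ W_source)

# Train the second memory system on the pseudopatterns (delta rule).
W_target = np.zeros((10, 5))
for _ in range(500):
    out = sigmoid(probes @ W_target)
    W_target += 0.1 * probes.T @ ((targets - out) * out * (1 - out)) / len(probes)
```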


Third, information will be processed in the same substrate where it is stored. Therefore, long-term memories will be active structures and will perform computations on content. An external strategic control system plays the role of differentially activating the knowledge in this long-term system that is relevant to the current context. In anatomical terms, this distinction broadly corresponds to frontal/anterior (strategic control) and posterior (long-term) cortex. The design means, somewhat counterintuitively, that the control system has no content. Rather, the control system contains placeholders that serve to activate different regions of the long-term system. The control system may contain plans (sequences of placeholders) and it may be involved in learning abstract concepts (using a placeholder to temporarily coactivate previously unrelated portions of long-term knowledge while Hebbian learning builds an association between them) but it does not contain content in the sense of a domain-general working memory. The study of frontal systems then becomes an exploration of the activation dynamics of these placeholders and their involvement in learning (see, e.g., work by Davelaar & Usher, 2003; Haarmann & Usher, 2001; O'Reilly, Braver, & Cohen, 1999; Usher & McClelland, 2001).

Similarly, connectionist research has explored how activity in the control system can be used to modulate the efficiency of processing elsewhere in the system, for instance, to implement selective attention. For example, Cohen et al. (1990) demonstrated how task units could be used to differentially modulate word-naming and color-naming processing channels in a model of the color-word Stroop task. In this model, latent knowledge interacted with the operation of task control, so that it was harder to selectively attend to color-naming and ignore information from the more practiced word-naming channel than vice versa. This work was later extended to demonstrate how deficits in the strategic control system (prefrontal cortex) could lead to problems in selective attention in disorders such as schizophrenia (Cohen & Servan-Schreiber, 1992). Chapter 15 in this volume contains a wider consideration of computational models of attention and cognitive control.
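A toy version of the task-unit mechanism is sketched below. The pathway strengths and the response measure are invented for illustration (the published model used cascaded activation and response thresholds); the property retained is that the word pathway is stronger through practice, so task control must work harder to favor color naming.

```python
# Pathway strengths: word reading is more practiced than color naming.
w_word, w_color = 2.0, 1.0    # assumed values, not the published ones
task_bias = 1.5               # top-down input from task demand units

def color_response_strength(attend_color: bool) -> float:
    """Net evidence for the color response on an incongruent trial."""
    color_in = w_color + (task_bias if attend_color else 0.0)
    word_in = w_word + (0.0 if attend_color else task_bias)
    return color_in - word_in  # positive -> color response wins

# With task bias, color naming wins only by a narrow margin, whereas word
# reading wins easily, reproducing the asymmetry of Stroop interference.
print(color_response_strength(attend_color=True))    # 0.5
print(color_response_strength(attend_color=False))   # -2.5
```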


Lastly, the connectionist perspective on memory alters how we conceive of domain generality in processing systems. It is unlikely that there are any domain-general processing systems that serve as a "Jack-of-all-trades," that is, that can move between representing the content of multiple domains. However, there may be domain-general systems that are involved in modulating many disparate processes without taking on the content of those systems, what we might call a system with "a finger in every pie." Meanwhile, short-term or working memory (as exemplified by the active representations contained in the recurrent loop of a network) is likely to exist as a devolved panoply of discrete systems, each with its own content-specific loop. For example, research in the neuropsychology of language now tends to support the existence of separate working memories for phonological, semantic, and syntactic information (see MacDonald & Christiansen, 2002, for a discussion of these arguments).

5.2. Cognitive Development

A key feature of PDP models is the use of a learning algorithm for modifying the patterns of connectivity as a function of experience. Compared with symbolic, rule-based computational models, this has made them a more sympathetic formalism for studying cognitive development (Elman et al., 1996). The combination of domain-general processing principles, domain-specific architectural constraints, and structured training environments has enabled connectionist models to give accounts of a range of developmental phenomena. These include infant category development, language acquisition, and reasoning in children (see Mareschal & Thomas, 2007, for a recent review, and chapter 16 in this volume).

Connectionism has become aligned with a resurgence of interest in statistical learning and a more careful consideration of the information available in the child's environment that may feed his or her cognitive development. One central debate revolves around how children can become "cleverer" as they get older, appearing to progress through qualitatively different stages of reasoning. Connectionist modeling of the development of children's reasoning was able to demonstrate that continuous incremental changes in the weight matrix driven by algorithms such as backpropagation can result in nonlinear changes in surface behavior, suggesting that the stages apparent in behavior may not necessarily be reflected in changes in the underlying mechanism (e.g., McClelland, 1989). Other connectionists have argued that algorithms able to supplement the computational resources of the network as part of learning may also provide an explanation for the emergence of more complex forms of behavior with age (e.g., cascade correlation; see Shultz, 2003).

The key contribution of connectionist models in the area of developmental psychology has been to specify detailed, implemented models of transition mechanisms that demonstrate how the child can move between producing different patterns of behavior. This was a crucial addition to a field that has accumulated vast amounts of empirical data cataloging what children are able to do at different ages. The specification of mechanism is also important to counter some strongly empiricist views that simply identifying statistical information in the environment suffices as an explanation of development; instead, it is necessary to show how a mechanism could use this statistical information to acquire some cognitive capacity. Moreover, when connectionist models are applied to development, it often becomes apparent that passive statistical structure is not the key factor; rather, the relevant statistics are in the transformation of the statistical structure of the environment to the output or the behavior that is relevant to the child, thereby appealing to notions like the regularity, consistency, and frequency of input-output mappings.

Recent connectionist approaches to development have begun to explore how the computational formalisms may change our understanding of the nature of the knowledge that children acquire. For example, Mareschal et al. (2007) argue that many mental representations of knowledge are partial (i.e., capture only some task-relevant dimensions); the existence of explicit language may blind us to the fact that there could be a limited role for truly abstract knowledge in the normal operation of the cognitive system (see Westermann et al., 2007). Current work also explores the computational basis of critical or sensitive periods in development, uncovering the mechanisms by which the ability to learn appears to reduce with age (e.g., McClelland et al., 1999; Thomas & Johnson, 2006).

5.3. The Study of Acquired Disorders in Cognitive Neuropsychology

Traditional cognitive neuropsychology of the 1980s was predicated on the assumption of underlying modular structure, that is, that the cognitive system comprises a set of independently functioning components. Patterns of selective cognitive impairment after acquired brain damage could then be used to construct models of normal cognitive function. The traditional models comprised box-and-arrow diagrams that sketched out rough versions of cognitive architecture, informed both by the patterns of possible selective deficit (which bits can fail independently) and by a task analysis of what the cognitive system probably has to do.

In the initial formulation of cognitive neuropsychology, caution was advised in attempting to infer cognitive architecture from behavioral deficits, since a given pattern of deficits might be consistent with a number of underlying architectures (Shallice, 1988). It is in this capacity that connectionist models have been extremely useful. They have both forced more detailed specification of proposed cognitive models via implementation and also permitted assessment of the range of deficits that can be generated by damaging these models in various ways. For example, models of reading have demonstrated that the ability to decode written words into spoken words and recover their meanings can be learned in a connectionist network; and when this network is damaged by, say, lesioning connection weights or removing hidden units, various patterns of acquired dyslexia can be simulated (e.g., Plaut et al., 1996; Plaut & Shallice, 1993).
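Damage manipulations of this sort are easy to express computationally. In the sketch below, the trained network is a random stand-in, and "lesioning" is implemented both as deleting a proportion of connection weights and as removing whole hidden units; performance on different item types would then be compared across damage types and severities.

```python
import numpy as np

rng = np.random.default_rng(8)
W1 = rng.normal(size=(30, 40))   # stand-ins for trained weights
W2 = rng.normal(size=(40, 20))   # (input -> hidden, hidden -> output)

def lesion_weights(W, proportion):
    """Zero a random proportion of connection weights."""
    mask = rng.random(W.shape) >= proportion
    return W * mask

def remove_hidden_units(W1, W2, n_removed):
    """Silence whole hidden units by zeroing their incoming and outgoing weights."""
    units = rng.choice(W1.shape[1], size=n_removed, replace=False)
    W1_les, W2_les = W1.copy(), W2.copy()
    W1_les[:, units] = 0.0
    W2_les[units, :] = 0.0
    return W1_les, W2_les

# Severity is graded: e.g., compare mild versus severe weight lesions.
mild = lesion_weights(W1, 0.1)
severe = lesion_weights(W1, 0.5)
```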


Connectionist models of acquired deficits have grown to be an influential aspect of cognitive neuropsychology and have been applied to domains such as language, memory, semantics, and vision (see Cohen, Johnstone, & Plunkett, 2000, for examples).

Several ideas have gained their first or clearest grounding via connectionist modeling. One of these ideas is that patterns of breakdown can arise from the statistics of the problem space (i.e., the mapping between input and output) rather than from structural distinctions in the processing system. In particular, connectionist models have shed light on a principal inferential tool of cognitive neuropsychology, the double dissociation. The line of reasoning argues that if in one patient, ability A can be lost while ability B is intact, and in a second patient, ability B can be lost while ability A is intact, then the two abilities might be generated by independent underlying mechanisms. In a connectionist model of category-specific impairments of semantic memory, Devlin et al. (1997) demonstrated that a single undifferentiated network trained to produce two behaviors could show a double dissociation between them simply as a consequence of different levels of damage. This can arise because the mappings associated with the two behaviors lead them to have different sensitivity to damage. For a small level of damage, performance on A may fall off quickly, whereas performance on B declines more slowly; for a high level of damage, A may be more robust than B. The reverse pattern of relative deficits implies nothing about structure.

Connectionist researchers have often set out to demonstrate that, more generally, double dissociation methodology is a flawed form of inference, on the grounds that such dissociations arise relatively easily from parallel distributed architectures where function is spread across the whole mechanism (e.g., Plunkett & Bandelow, 2006; Juola & Plunkett, 2000). However, on the whole, when connectionist models show robust double dissociations between two behaviors (for equivalent levels of damage applied to various parts of the network and over many replications), it does tend to be because different internal processing structures (units or layers or weights) or different parts of the input layer or different parts of the output layer are differentially important for driving the two behaviors – that is, there is specialization of function. Connectionist models of breakdown have, therefore, tended to support the traditional inferences. Crucially, however, connectionist models have greatly improved our understanding of what modularity might look like in a neurocomputational system: a partial rather than an absolute property; a property that is the consequence of a developmental process where emergent specialization is driven by structure-function correspondences (the ability of certain parts of a computational structure to learn certain kinds of computation better than other kinds; see Section 4.2); and a property that must now be complemented by concepts such as division of labor, degeneracy, interactivity, and redundancy (see Thomas & Karmiloff-Smith, 2002a; Thomas et al., 2006, for discussion).

5.4. The Origins of Individual Variability and Developmental Disorders

In addition to their role in studying acquired disorders, the fact that many connectionist models learn their cognitive abilities makes them an ideal framework within which to study developmental disorders, such as autism, dyslexia, and specific language impairment (Joanisse & Seidenberg, 2003; Mareschal et al., 2007; Thomas & Karmiloff-Smith, 2002b, 2003a, 2005). Where models of normal cognitive development seek to study the "average" child, models of atypical development explore how developmental profiles may be disrupted. Connectionist models contain a number of constraints (architecture, activation dynamics, input and output representations, learning algorithm, training regimen) that determine the efficiency and outcome of learning. Manipulations to these constraints produce candidate explanations for impairments found in developmental disorders or for the impairments caused by exposure to atypical environments, such as in cases of deprivation.
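This style of investigation reduces to sweeping the model's constraint set. The sketch below is schematic: `build_and_train` is a hypothetical function standing in for any of the developmental models above (here it returns a dummy trajectory), and the parameter grid illustrates the kinds of manipulations, such as resources, plasticity, and processing noise, used to simulate atypical trajectories.

```python
from itertools import product

def build_and_train(n_hidden, learning_rate, noise):
    """Hypothetical stand-in for training a developmental model under one
    constraint setting; returns accuracy at successive points in training."""
    return [min(1.0, learning_rate * n_hidden * (1 - noise) * t / 100)
            for t in range(1, 11)]

# Candidate computational causes of atypical development.
grid = product([10, 25, 50],       # processing resources (hidden units)
               [0.01, 0.1, 0.5],   # plasticity (learning rate)
               [0.0, 0.2, 0.5])    # internal processing noise

trajectories = {}
for n_hidden, lr, noise in grid:
    trajectories[(n_hidden, lr, noise)] = build_and_train(n_hidden, lr, noise)
# Each trajectory is then compared against typical and disordered profiles.
```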


In the 1980s and 1990s, many theories of developmental deficits employed the same explanatory framework as adult cognitive neuropsychology. There was a search for specific developmental deficits or dissociations, which were then explained in terms of the failure of individual modules to develop. However, as Karmiloff-Smith (1998) and Bishop (1997) pointed out, most of the developmental deficits were actually being explained with reference to nondevelopmental, static, and sometimes adult models of normal cognitive structure. Karmiloff-Smith (1998) argued that the causes of developmental deficits of a genetic origin will lie in changes to low-level neurocomputational properties that only exert their influence on cognition via an extended atypical developmental process (see also Elman et al., 1996). Connectionist models provide the ideal forum to explore the thesis that an understanding of the constraints on the developmental process is essential for generating accounts of developmental deficits.

The study of atypical variability also prompts a consideration of what causes variability within the normal range, otherwise known as individual differences or intelligence. Are differences in intelligence caused by variation in the same computational parameters that can cause disorders? Are some developmental disorders just the extreme lower end of the normal distribution or are they qualitatively different conditions? What computational parameter settings are able to produce above-average performance or giftedness? Connectionism has begun to take advantage of the accumulated body of models of normal development to consider the wider question of cognitive variation in parameterized computational models (Thomas & Karmiloff-Smith, 2003a).

5.5. Future Directions

The preceding sections indicate the range and depth of influence of connectionism on contemporary theories of cognition. Where will connectionism go next? Necessarily, connectionism began with simple models of individual cognitive processes, focusing on those domains of particular theoretical interest. This piecemeal approach generated explanations of individual cognitive abilities using bespoke networks, each containing its own predetermined representations and architecture. In the future, one avenue to pursue is how these models fit together in the larger cognitive system – for example, to explain how the past tense network described in Section 3.2 might link up with the sentence processing model described in Section 3.3 to process past tenses as they arise in sentences. A further issue is to address the developmental origin of the architectures that are postulated. What processes specify the parts of the cognitive system to perform the various functions and how do these subsystems talk to each other, both across development and in the adult state? Improvements in computational power will aid more complex modeling endeavors. Nevertheless, it is worth bearing in mind that increasing complexity creates a tension with the primary goals of modeling – simplification and understanding. It is essential that we understand why more complicated models function as they do or they will merely become interesting artifacts (see Elman, 2005; Thomas, 2004, for further discussion).

In terms of its relation with other disciplines, a number of future influences on connectionism are discernible. Connectionism will be affected by the increasing appeal to Bayesian probability theory in human reasoning. In Bayesian theory, new data are used to update existing estimates of the most likely model of the world. Work has already begun to relate connectionist and Bayesian accounts, for example, in the domain of causal reasoning in children (McClelland & Thompson, 2007). In some cases, connectionism may offer alternative explanations of the same behavior; in others it may be viewed as an implementation of a Bayesian account (see Section 3.1).
Connectionism will continue to have a close relation to neuroscience, perhaps seeking to build more neural constraints into its computational assumptions (O'Reilly & Munakata, 2000). Many of the new findings in cognitive neuroscience are influenced by functional brain imaging techniques. It will be important, therefore, for connectionism to make contact with these data, either via systems-level modeling of the interaction between subnetworks in task performance or in exploring the implications of the subtraction methodology as a tool for assessing the behavior of distributed interactive systems. The increasing influence of brain imaging foregrounds the relation of cognition to the neural substrate; it depends on how seriously one takes the neural plausibility of connectionist models as to whether an increased focus on the substrate will have particular implications for connectionism over and above any other theory of cognition.

Connectionist approaches to individual differences and developmental disorders suggest that this modeling approach has more to offer in considering the computational causes of variability. Research in behavioral genetics argues that a significant proportion of behavioral variability is genetic in origin (Bishop, 2006; Plomin, Owen, & McGuffin, 1994). However, the neurodevelopmental mechanisms by which genes produce such variation are largely unknown. Although connectionist cognitive models are not neural, the fact that they incorporate neurally inspired properties may allow them to build links between behavior (where variability is measured) and the substrate on which genetic effects act. In the future, connectionism may therefore help to rectify a major shortcoming in our attempts to understand the relation of the human genome to human behavior – the omission of cognition from current explanations.

6. Conclusions

In this chapter, we have considered the contribution of connectionist modeling to our understanding of cognition. Connectionism was placed in the historical context of nineteenth-century associative theories of mental processes and twentieth-century attempts to understand the computations carried out by networks of neurons. The key properties of connectionist networks were then reviewed, and particular emphasis was placed on the use of learning to build the microstructure of these models. The core connectionist themes include the following: (1) that processing is simultaneously influenced by multiple sources of information at different levels of abstraction, operating via soft constraint satisfaction; (2) that representations are spread across multiple simple processing units operating in parallel; (3) that representations are graded, context-sensitive, and the emergent product of adaptive processes; and (4) that computation is similarity-based and driven by the statistical structure of problem domains, but it can nevertheless produce rule-following behavior. We illustrated the connectionist approach via three landmark models, the Interactive Activation model of letter perception (McClelland & Rumelhart, 1981), the past tense model (Rumelhart & McClelland, 1986), and simple recurrent networks for finding structure in time (Elman, 1990). Apart from its body of successful individual models, connectionist theory has had a widespread influence on cognitive theorizing, and this influence was illustrated by considering connectionist contributions to our understanding of memory, cognitive development, acquired cognitive impairments, and developmental deficits. Finally, we peeked into the future of connectionism, arguing that its relationships with other fields in the cognitive sciences are likely to guide its future contribution to understanding the mechanistic basis of thought.

We illustrated the connectionist approach via three landmark models: the Interactive Activation model of letter perception (McClelland & Rumelhart, 1981), the past tense model (Rumelhart & McClelland, 1986), and simple recurrent networks for finding structure in time (Elman, 1990). Apart from its body of successful individual models, connectionist theory has had a widespread influence on cognitive theorizing, and this influence was illustrated by considering connectionist contributions to our understanding of memory, cognitive development, acquired cognitive impairments, and developmental deficits. Finally, we peeked into the future of connectionism, arguing that its relationships with other fields in the cognitive sciences are likely to guide its future contribution to understanding the mechanistic basis of thought.

7. Acknowledgements

This work was supported by British Academy Grant SG–40400 and UK Medical Research Council Grant G0300188 to Michael Thomas, and National Institute of Mental Health Centre Grant P50 MH64445, James L. McClelland, Director.

References

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.
Aizawa, K. (2004). History of connectionism. In C. Eliasmith (Ed.), Dictionary of philosophy of mind. Retrieved October 4, 2006, from http://philosophy.uwaterloo.ca/MindDict/connectionismhistory.html.
Anderson, J. A. (1977). Neural models with cognitive implications. In D. LaBerge & S. J. Samuels (Eds.), Basic processes in reading perception and comprehension (pp. 27–90). Hillsdale, NJ: Lawrence Erlbaum.
Anderson, J., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Lawrence Erlbaum Associates.
Bates, E., & MacWhinney, B. (1989). Functionalism and the competition model. In B. MacWhinney & E. Bates (Eds.), The crosslinguistic study of language processing (pp. 3–37). New York: Cambridge University Press.
Bechtel, W., & Abrahamsen, A. (1991). Connectionism and the mind. Oxford, UK: Blackwell.
Berko, J. (1958). The child's learning of English morphology. Word, 14, 150–177.
Bishop, D. V. M. (1997). Cognitive neuropsychology and developmental disorders: Uncomfortable bedfellows. Quarterly Journal of Experimental Psychology, 50A, 899–923.
Bishop, D. V. M. (2006). Developmental cognitive genetics: How psychology can inform genetics and vice versa. Quarterly Journal of Experimental Psychology, 59(7), 1153–1168.
Botvinick, M., & Plaut, D. C. (2004). Doing without schema hierarchies: A recurrent connectionist approach to normal and impaired routine sequential action. Psychological Review, 111, 395–429.
Burton, A. M., Bruce, V., & Johnston, R. A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81, 361–380.
Bybee, J., & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review, 22(2–4), 381–410.
Carey, S., & Sarnecka, B. W. (2006). The development of human conceptual representations: A case study. In Y. Munakata & M. H. Johnson (Eds.), Processes of change in brain and cognitive development: Attention and performance XXI (pp. 473–496). Oxford, UK: Oxford University Press.
Carpenter, G. A., & Grossberg, S. (1987a). ART2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26, 4919–4930.
Carpenter, G. A., & Grossberg, S. (1987b). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics and Image Processing, 37, 54–115.
Chater, N., Tenenbaum, J. B., & Yuille, A. (2006). Probabilistic models of cognition: Conceptual foundations. Trends in Cognitive Sciences, 10(7), 287–291.
Christiansen, M. H., & Chater, N. (2001). Connectionist psycholinguistics. Westport, CT: Ablex.
Cohen, G., Johnstone, R. A., & Plunkett, K. (2000). Exploring cognition: Damaged brains and neural networks. Hove, Sussex, UK: Psychology Press.
Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332–361.
Cohen, J. D., & Servan-Schreiber, D. (1992). Context, cortex, and dopamine: A connectionist approach to behavior and biology in schizophrenia. Psychological Review, 99, 45–77.
Dailey, M. N., & Cottrell, G. W. (1999). Organization of face and object recognition in modular neural networks. Neural Networks, 12, 1053–1074.
Davelaar, E. J., & Usher, M. (2003). An activation-based theory of immediate item memory. In J. A. Bullinaria & W. Lowe (Eds.), Proceedings of the Seventh Neural Computation and Psychology Workshop: Connectionist models of cognition and perception (pp. 118–130). Singapore: World Scientific.
Davies, M. (2005). Cognitive science. In F. Jackson & M. Smith (Eds.), The Oxford handbook of contemporary philosophy (pp. 358–394). Oxford: Oxford University Press.
Devlin, J., Gonnerman, L., Andersen, E., & Seidenberg, M. S. (1997). Category specific semantic deficits in focal and widespread brain damage: A computational account. Journal of Cognitive Neuroscience, 10, 77–94.
Dick, F., Bates, E., Wulfeck, B., Aydelott, J., Dronkers, N., & Gernsbacher, M. A. (2001). Language deficits, localization, and grammar: Evidence for a distributive model of language breakdown in aphasic patients and neurologically intact individuals. Psychological Review, 108(3), 759–788.
Dick, F., Wulfeck, B., Krupa-Kwiatkowski, M., & Bates, E. (2004). The development of complex sentence interpretation in typically developing children compared with children with specific language impairments or early unilateral focal lesions. Developmental Science, 7(3), 360–377.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195–224.
Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71–99.
Elman, J. L. (2005). Connectionist models of cognitive development: Where next? Trends in Cognitive Sciences, 9, 111–117.
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.
Ervin, S. M. (1964). Imitation and structural change in children's language. In E. H. Lenneberg (Ed.), New directions in the study of language (pp. 163–189). Cambridge, MA: MIT Press.
Fahlman, S., & Lebiere, C. (1990). The cascade correlation learning architecture. In D. Touretzky (Ed.), Advances in neural information processing 2 (pp. 524–532). Los Altos, CA: Morgan Kaufmann.
Feldman, J. A. (1981). A connectionist model of visual memory. In G. E. Hinton & J. A. Anderson (Eds.), Parallel models of associative memory (pp. 49–81). Hillsdale, NJ: Lawrence Erlbaum.
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3–71.
French, R. M., Ans, B., & Rousset, S. (2001). Pseudopatterns and dual-network memory models: Advantages and shortcomings. In R. French & J. Sougné (Eds.), Connectionist models of learning, development and evolution (pp. 13–22). London: Springer.
Freud, S. (1895). Project for a scientific psychology. In J. Strachey (Ed.), The standard edition of the complete psychological works of Sigmund Freud (pp. 283–360). London: The Hogarth Press and the Institute of Psycho-Analysis.
Goebel, R., & Indefrey, P. (2000). A recurrent network with short-term memory capacity learning the German –s plural. In P. Broeder & J. Murre (Eds.), Models of language acquisition: Inductive and deductive approaches (pp. 177–200). Oxford, UK: Oxford University Press.
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111(1), 3–32.
Green, D. C. (1998). Are connectionist models theories of cognition? Psycoloquy, 9(4).
Grossberg, S. (1976). Adaptive pattern classification and universal recoding: Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121–134.
Haarmann, H., & Usher, M. (2001). Maintenance of semantic information in capacity limited item short-term memory. Psychonomic Bulletin & Review, 8, 568–578.
Hebb, D. O. (1949). The organization of behavior: A neuropsychological approach. New York: John Wiley & Sons.
Hinton, G. E. (1989). Deterministic Boltzmann learning performs steepest descent in weight-space. Neural Computation, 1, 143–150.
Hinton, G. E., & Anderson, J. A. (1981). Parallel models of associative memory. Hillsdale, NJ: Lawrence Erlbaum.
Hinton, G. E., & McClelland, J. L. (1988). Learning representations by recirculation. In D. Z. Anderson (Ed.), Neural Information Processing Systems, 1987 (pp. 358–366). New York: American Institute of Physics.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Hinton, G. E., & Sejnowski, T. J. (1983). Optimal perceptual inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 448–453). Washington, DC.
Hinton, G. E., & Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In D. Rumelhart & J. McClelland (Eds.), Parallel distributed processing, Vol. 1 (pp. 282–317). Cambridge, MA: MIT Press.
Hoeffner, J. H., & McClelland, J. L. (1993). Can a perceptual processing deficit explain the impairment of inflectional morphology in developmental dysphasia? A computational investigation. In E. V. Clark (Ed.), Proceedings of the 25th Child language research forum (pp. 38–49). Stanford, CA: Center for the Study of Language and Information.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Science USA, 79, 2554–2558.
Houghton, G. (2005). Connectionist models in cognitive psychology. Hove: Psychology Press.
Jacobs, R. A. (1999). Computational studies of the development of functionally specialized neural modules. Trends in Cognitive Sciences, 3, 31–38.
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3, 79–87.
James, W. (1890). Principles of psychology. New York, NY: Holt.
Joanisse, M. F., & Seidenberg, M. S. (1999). Impairments in verb morphology following brain injury: A connectionist model. Proceedings of the National Academy of Science USA, 96, 7592–7597.
Joanisse, M. F., & Seidenberg, M. S. (2003). Phonology and syntax in specific language impairment: Evidence from a connectionist model. Brain and Language, 86, 40–56.
Jordan, M. I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society (pp. 531–546). Hillsdale, NJ: Lawrence Erlbaum.
Juola, P., & Plunkett, K. (2000). Why double dissociations don't mean much. In G. Cohen, R. A. Johnston, & K. Plunkett (Eds.), Exploring cognition: Damaged brains and neural networks: Readings in cognitive neuropsychology and connectionist modelling (pp. 319–327). Hove, UK: Psychology Press.
Karmiloff-Smith, A. (1998). Development itself is the key to understanding developmental disorders. Trends in Cognitive Sciences, 2, 389–398.
Kohonen, T. (1984). Self-organization and associative memory. Berlin: Springer-Verlag.
Kuczaj, S. A. (1977). The acquisition of regular and irregular past tense forms. Journal of Verbal Learning and Verbal Behavior, 16, 589–600.
Lashley, K. S. (1929). Brain mechanisms and intelligence: A quantitative study of injuries to the brain. New York: Dover.
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111, 309–332.
MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working memory: A comment on Just & Carpenter (1992) and Waters & Caplan (1996). Psychological Review, 109, 35–54.
MacKay, D. J. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448–472.
Marcus, G. F. (2001). The algebraic mind: Integrating connectionism and cognitive science. Cambridge, MA: MIT Press.
Marcus, G., Pinker, S., Ullman, M., Hollander, J., Rosen, T., & Xu, F. (1992). Overregularisation in language acquisition. Monographs of the Society for Research in Child Development, 57 (Serial No. 228).
Mareschal, D., Johnson, M., Sirois, S., Spratling, M., Thomas, M. S. C., & Westermann, G. (2007). Neuroconstructivism: How the brain constructs cognition. Oxford, UK: Oxford University Press.
Mareschal, D., & Shultz, T. R. (1996). Generative connectionist architectures and constructivist cognitive development. Cognitive Development, 11, 571–605.
Mareschal, D., & Thomas, M. S. C. (2007). Computational modeling in developmental psychology. IEEE Transactions on Evolutionary Computation, 11(2), 137–150.
Marr, D. (1982). Vision. San Francisco: W. H. Freeman.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194, 283–287.
Mayberry, M. R., Crocker, M., & Knoeferle, P. (2005). A connectionist model of sentence comprehension in visual worlds. In Proceedings of the 27th Annual Conference of the Cognitive Science Society (COGSCI-05, Stresa, Italy). Mahwah, NJ: Erlbaum.
McClelland, J. L. (1981). Retrieving general and specific information from stored knowledge of specifics. In Proceedings of the Third Annual Meeting of the Cognitive Science Society (pp. 170–172). Hillsdale, NJ: Lawrence Erlbaum Associates.
McClelland, J. L. (1989). Parallel distributed processing: Implications for cognition and development. In R. G. M. Morris (Ed.), Parallel distributed processing, implications for psychology and neurobiology (pp. 8–45). Oxford, UK: Clarendon Press.
McClelland, J. L. (1998). Connectionist models and Bayesian inference. In M. Oaksford & N. Chater (Eds.), Rational models of cognition (pp. 21–53). Oxford, UK: Oxford University Press.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.
McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419–457.
McClelland, J. L., Plaut, D. C., Gotts, S. J., & Maia, T. V. (2003). Developing a domain-general framework for cognition: What is the best approach? Commentary on a target article by Anderson and Lebiere. Behavioral and Brain Sciences, 22, 611–614.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88(5), 375–405.
McClelland, J. L., Rumelhart, D. E., & the PDP Research Group (1986). Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 2: Psychological and biological models. Cambridge, MA: MIT Press.
McClelland, J. L., Thomas, A. G., McCandliss, B. D., & Fiez, J. A. (1999). Understanding failures of learning: Hebbian learning, competition for representation space, and some preliminary data. In J. A. Reggia, E. Ruppin, & D. Glanzman (Eds.), Disorders of brain, behavior, and cognition: The neurocomputational perspective (pp. 75–80). Oxford, UK: Elsevier.
McClelland, J. L., & Thompson, R. M. (2007). Using domain-general principles to explain children's causal reasoning abilities. Developmental Science, 10, 333–356.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133. (Reprinted in Anderson, J., & Rosenfield, E. (1988). Neurocomputing: Foundations of research. Cambridge, MA: MIT Press.)
McLeod, P., Plunkett, K., & Rolls, E. T. (1998). Introduction to connectionist modelling of cognitive processes. Oxford, UK: Oxford University Press.
Meynert, T. (1884). Psychiatry: A clinical treatise on diseases of the forebrain. Part I. The anatomy, physiology and chemistry of the brain (B. Sachs, Trans.). New York: G. P. Putnam's Sons.
Miikkulainen, R., & Mayberry, M. R. (1999). Disambiguation and grammar as emergent soft constraints. In B. MacWhinney (Ed.), Emergence of language (pp. 153–176). Hillsdale, NJ: Lawrence Erlbaum.
Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. Cambridge, MA: MIT Press.
Minsky, M. L., & Papert, S. (1988). Perceptrons: An introduction to computational geometry. Cambridge, MA: MIT Press.
Morris, W., Cottrell, G., & Elman, J. (2000). A connectionist simulation of the empirical acquisition of grammatical relations. In S. Wermter & R. Sun (Eds.), Hybrid neural systems (pp. 175–193). Heidelberg: Springer Verlag.
Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165–178.
Morton, J. B., & Munakata, Y. (2002). Active versus latent representations: A neural network model of perseveration, dissociation, and decalage in childhood. Developmental Psychobiology, 40, 255–265.
Movellan, J. R., & McClelland, J. L. (1993). Learning continuous probability distributions with symmetric diffusion networks. Cognitive Science, 17, 463–496.
Munakata, Y. (1998). Infant perseveration and implications for object permanence theories: A PDP model of the AB task. Developmental Science, 1, 161–184.
Munakata, Y., & McClelland, J. L. (2003). Connectionist models of development. Developmental Science, 6, 413–429.
O'Reilly, R. C. (1996). Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation, 8(5), 895–938.
O'Reilly, R. C. (1998). Six principles for biologically-based computational models of cortical cognition. Trends in Cognitive Sciences, 2, 455–462.
O'Reilly, R. C., Braver, T. S., & Cohen, J. D. (1999). A biologically based computational model of working memory. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 375–411). New York: Cambridge University Press.
O'Reilly, R. C., & Munakata, Y. (2000). Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. Cambridge, MA: MIT Press.
Pinker, S. (1984). Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S. (1999). Words and rules. London: Weidenfeld & Nicolson.
Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73–193.
Plaut, D. C., & Kello, C. T. (1999). The emergence of phonology from the interplay of speech comprehension and production: A distributed connectionist approach. In B. MacWhinney (Ed.), The emergence of language (pp. 381–415). Mahwah, NJ: Lawrence Erlbaum.
Plaut, D., & McClelland, J. L. (1993). Generalization with componential attractors: Word and nonword reading in an attractor network. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society (pp. 824–829). Hillsdale, NJ: Lawrence Erlbaum.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.
Plaut, D. C., & Shallice, T. (1993). Deep dyslexia: A case study of connectionist neuropsychology. Cognitive Neuropsychology, 10, 377–500.
Plomin, R., Owen, M. J., & McGuffin, P. (1994). The genetic basis of complex human behaviors. Science, 264, 1733–1739.
Plunkett, K., & Bandelow, S. (2006). Stochastic approaches to understanding dissociations in inflectional morphology. Brain and Language, 98, 194–209.
Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 1–60.
Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.
Plunkett, K., & Marchman, V. (1996). Learning from a connectionist model of the English past tense. Cognition, 61, 299–308.
Plunkett, K., & Nakisa, R. (1997). A connectionist model of the Arabic plural system. Language and Cognitive Processes, 12, 807–836.
Quartz, S. R. (1993). Neural networks, nativism, and the plausibility of constructivism. Cognition, 48, 223–242.
Quartz, S. R., & Sejnowski, T. J. (1997). The neural basis of cognitive development: A constructivist manifesto. Behavioral and Brain Sciences, 20, 537–596.
Rashevsky, N. (1935). Outline of a physico-mathematical theory of the brain. Journal of General Psychology, 13, 82–112.
Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology, 81, 274–280.
Rohde, D. L. T. (2002). A connectionist model of sentence comprehension and production. Unpublished doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA.
Rohde, D. L. T., & Plaut, D. C. (1999). Language acquisition in the absence of explicit negative evidence: How important is starting small? Cognition, 72, 67–109.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.
Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Washington, DC: Spartan Books.
Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60–94.
Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986). A general framework for parallel distributed processing. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations (pp. 45–76). Cambridge, MA: MIT Press.
Rumelhart, D. E., & McClelland, J. L. (1985). Levels indeed! Journal of Experimental Psychology: General, 114(2), 193–197.
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 2: Psychological and biological models (pp. 216–271). Cambridge, MA: MIT Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations (pp. 318–362). Cambridge, MA: MIT Press.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (1986). Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press.
Rumelhart, D. E., Smolensky, P., McClelland, J. L., & Hinton, G. E. (1986). Schemata and sequential thought processes in PDP models. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing, Vol. 2 (pp. 7–57). Cambridge, MA: MIT Press.
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language, 35, 606–621.
Selfridge, O. G. (1959). Pandemonium: A paradigm for learning. In D. V. Blane & A. M. Uttley (Eds.), Proceedings of the Symposium on Mechanisation of Thought Processes (pp. 511–529). London: HMSO.
Shallice, T. (1988). From neuropsychology to mental structure. Cambridge, UK: Cambridge University Press.
Sharkey, N., Sharkey, A., & Jackson, S. (2000). Are SRNs sufficient for modelling language acquisition? In P. Broeder & J. Murre (Eds.), Models of language acquisition: Inductive and deductive approaches (pp. 33–54). Oxford, UK: Oxford University Press.
Shultz, T. R. (2003). Computational developmental psychology. Cambridge, MA: MIT Press.
Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1–74.
Spencer, H. (1872). Principles of psychology (3rd ed.). London: Longman, Brown, Green, & Longmans.
Sun, R. (1995). Robust reasoning: Integrating rule-based and similarity-based reasoning. Artificial Intelligence, 75, 241–295.
Sun, R. (2002a). Hybrid connectionist symbolic systems. In M. Arbib (Ed.), Handbook of brain theories and neural networks (2nd ed., pp. 543–547). Cambridge, MA: MIT Press.
Sun, R. (2002b). Hybrid systems and connectionist implementationalism. In L. Nadel, D. Chalmers, P. Culicover, R. Goldstone, & B. French (Eds.), Encyclopedia of Cognitive Science (pp. 697–703). London: Macmillan.
Sun, R., & Peterson, T. (1998). Autonomous learning of sequential tasks: Experiments and analyses. IEEE Transactions on Neural Networks, 9(6), 1217–1234.
Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7), 309–318.
Thomas, M. S. C. (2004). The state of connectionism in 2004. Parallaxis, 8, 43–61.
Thomas, M. S. C. (2005). Characterising compensation. Cortex, 41(3), 434–442.
Thomas, M. S. C., Forrester, N. A., & Richardson, F. M. (2006). What is modularity good for? In R. Sun & N. Miyake (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 2240–2245). Vancouver, Canada: Cognitive Science Society.
Thomas, M. S. C., & Johnson, M. H. (2006). The computational modelling of sensitive periods. Developmental Psychobiology, 48(4), 337–344.
Thomas, M. S. C., & Karmiloff-Smith, A. (2002a). Are developmental disorders like cases of adult brain damage? Implications from connectionist modelling. Behavioral and Brain Sciences, 25(6), 727–788.
Thomas, M. S. C., & Karmiloff-Smith, A. (2002b). Modelling typical and atypical cognitive development. In U. Goswami (Ed.), Handbook of childhood development (pp. 575–599). Oxford: Blackwells.
Thomas, M. S. C., & Karmiloff-Smith, A. (2003a). Connectionist models of development, developmental disorders and individual differences. In R. J. Sternberg, J. Lautrey, & T. Lubart (Eds.), Models of intelligence: International perspectives (pp. 133–150). Washington, DC: American Psychological Association.
Thomas, M. S. C., & Karmiloff-Smith, A. (2003b). Modeling language acquisition in atypical phenotypes. Psychological Review, 110(4), 647–682.
Thomas, M. S. C., & Karmiloff-Smith, A. (2005). Can developmental disorders reveal the component parts of the human language faculty? Language Learning and Development, 1(1), 65–92.
Thomas, M. S. C., & Redington, M. (2004). Modelling atypical syntax processing. In W. Sakas (Ed.), Proceedings of the first workshop on psycho-computational models of human language acquisition at the 20th International Conference on Computational Linguistics (pp. 85–92).
Thomas, M. S. C., & Richardson, F. (2006). Atypical representational change: Conditions for the emergence of atypical modularity. In Y. Munakata & M. H. Johnson (Eds.), Processes of change in brain and cognitive development: Attention and Performance XXI (pp. 315–347). Oxford, UK: Oxford University Press.
Thomas, M. S. C., & van Heuven, W. (2005). Computational models of bilingual comprehension. In J. F. Kroll & A. M. B. De Groot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 202–225). Oxford, UK: Oxford University Press.
Touretzky, D. S., & Hinton, G. E. (1988). A distributed connectionist production system. Cognitive Science, 12, 423–466.
Usher, M., & McClelland, J. L. (2001). On the time course of perceptual choice: The leaky competing accumulator model. Psychological Review, 108, 550–592.
van Gelder, T. (1991). Classical questions, radical answers: Connectionism and the structure of mental representations. In T. Horgan & J. Tienson (Eds.), Connectionism and the philosophy of mind (pp. 355–381). Dordrecht: Kluwer Academic.
Weckerly, J., & Elman, J. L. (1992). A PDP approach to processing center-embedded sentences. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Lawrence Erlbaum.
Westermann, G. (1998). Emergent modularity and U-shaped learning in a constructivist neural network learning the English past tense. In Proceedings of the 20th Annual Conference of the Cognitive Science Society (pp. 1130–1135). Hillsdale, NJ: Lawrence Erlbaum.
Westermann, G., Mareschal, D., Johnson, M. H., Sirois, S., Spratling, M. W., & Thomas, M. S. C. (2007). Neuroconstructivism. Developmental Science, 10, 75–83.
Williams, R. J., & Zipser, D. (1995). Gradient-based learning algorithms for recurrent networks and their computational complexity. In Y. Chauvin & D. E. Rumelhart (Eds.), Backpropagation: Theory, architectures and applications. Hillsdale, NJ: Lawrence Erlbaum.
Xie, X., & Seung, H. S. (2003). Equivalence of backpropagation and contrastive Hebbian learning in a layered network. Neural Computation, 15, 441–454.
Xu, F., & Pinker, S. (1995). Weird past tense forms. Journal of Child Language, 22, 531–556.