Connectionism, Confusion, and Cognitive Science

MICHAEL R.W. DAWSON and KEVIN S. SHAMANSKI

Biological Computation Project, University of Alberta

CONTENTS

Synopsis
Introduction
PDP Networks and the Tri-Level Hypothesis
Computational Descriptions of PDP Networks
    PDP Networks Are Powerful Systems
    How Are Claims About PDP Competence Related to Cognitive Science?
    Computational Connectionism and Cognitive Science
Algorithmic Descriptions of PDP Networks
    PDP Networks Are Themselves Algorithms
    Problems With PDP Algorithms
    Algorithmic Connectionism and Cognitive Science
Implementational Descriptions of PDP Networks
    PDP Networks, Biological Plausibility, and Computational Relevance
    Problems Arise Because of Biologically Undefended Design Decisions
    Implementational Connectionism and Cognitive Science
From Connectionism to Cognitive Science
Acknowledgements
References


SYNOPSIS

This paper argues that while connectionist technology may be flourishing, connectionist cognitive science is languishing. Parallel distributed processing (PDP) networks can be proven to be computationally powerful, but these proofs offer few useful constraints for developing models in cognitive science. Connectionist algorithms — that is, the PDP networks themselves — can exhibit interesting behaviours, but are difficult to interpret and are based upon an incomplete functional architecture. While PDP networks are often touted as being more "biologically plausible" than classical AI models, they do not appear to have been widely endorsed by neurophysiologists because they incorporate many implausible assumptions, and they may not model appropriate physiological processes. In our view, connectionism faces such problems because the design decisions governing current connectionist theory are determined by engineering needs — generating the appropriate output — and not by cognitive or neurophysiological considerations. As a result, the true nature of connectionist theories, and their potential contribution to cognitive science, is unclear. We propose that the current confusion surrounding connectionism's role in cognitive science could be greatly alleviated by adopting a research programme in which connectionists paid much more attention to validating the PDP architecture.

Key Words

Parallel distributed processing, connectionism, cognitive science

INTRODUCTION

Connectionists appear to be making great advances in the technology of knowledge engineering, and now feel poised to answer the difficult questions about machine intelligence that seem to have passed classical AI by. Parallel distributed processing (PDP) models have been developed for a diverse range of phenomena, as a survey of almost any journal related to cognitive science will show. For example, in recent years


Psychological Review has published connectionist models concerned with aspects of reading (Hinton & Shallice, 1991; Seidenberg & McClelland, 1989), classical learning theory (Kehoe, 1988), automatic processing (Cohen, Dunbar, & McClelland, 1991), sentence production (Dell, 1986), apparent motion (Dawson, 1991), and dreaming (Antrobus, 1991). In addition, many basic connectionist ideas are being directly implemented in hardware (e.g., Jabri & Flower, 1991) under the assumption that increases in computer power and speed require radical new parallel architectures (e.g., Hillis, 1985; Müller & Reinhardt, 1990, p. 17). "The neural network revolution has happened. We are living in the aftermath" (Hanson & Olson, 1991, p. 332).

While connectionist technology may indeed be flourishing, connectionist cognitive science is languishing. PDP networks may generate interesting behaviour, but it is not clear that they do so by emulating the fundamental nature of human cognitive processes. In our view, the current design decisions governing connectionist theory are determined by engineering needs — generating the appropriate output — and not by cognitive or neurophysiological considerations.

The goal of this paper is to illustrate the gap between connectionist technology and connectionist cognitive science. Connectionist networks can be proven to be computationally powerful, but these proofs offer no meaningful constraints for designing cognitive models. Connectionist algorithms — that is, the PDP networks themselves — can exhibit interesting behaviours, but are difficult to interpret and are based upon an insufficient functional architecture. While PDP networks are often touted as being more "biologically plausible" than classical or symbolic artificial intelligence (AI) models, they do not appear to have been widely endorsed by neurophysiologists because they incorporate many implausible and unjustified assumptions, and they may not model appropriate physiological processes.


PDP NETWORKS AND THE TRI-LEVEL HYPOTHESIS

The explosion of interest in connectionist systems over the past decade has been accompanied by the development of diverse architectures (for overviews, see Cowan & Sharp, 1988; Hecht-Nielsen, 1990; Müller & Reinhardt, 1990). These have ranged from simulations designed to mimic (with varying detail) specific neural circuits (e.g., Granger, Ambros-Ingerson & Lynch, 1989; Grossberg, 1991; Grossberg & Rudd, 1989, 1992; Lynch, Granger, Larson & Baudry, 1989) to new computer designs that have little to do with human cognition, but which use parallel processing to solve problems that are ill-posed or that require simultaneous satisfaction of multiple constraints (e.g., Hillis, 1985; Abu-Mostafa & Psaltis, 1987). Given this diversity, it is important at the outset to identify the branch of connectionism with which we are particularly concerned. This paper examines the characteristics of what has been called generic connectionism (e.g., Anderson & Rosenfeld, 1988, p. xv), because it appears to have had the most impact on cognitive science.

A detailed description of the generic connectionist architecture is provided by Rumelhart, Hinton and McClelland (1986). PDP models are defined as networks of simple, interconnected processing units (see Figure 1a). Each processing unit in such a network operates in parallel, and is characterized by three components: a net input function which defines the total signal to the unit, an activation function which specifies the unit's current "numerical state", and an output function which defines the signal sent by the unit to others. Such signals are sent through connections between processing units, which serve as communication channels that transfer weighted numeric signals from one unit to another. Connection strengths can be modified by applying a learning rule, which serves to teach a network how to perform some desired task. For instance, the generalized delta rule (Rumelhart, Hinton & Williams, 1986a, 1986b) computes an error signal using the difference between the observed and desired responses of the network. This error signal is then "propagated backwards" through the network, and used to change connection weights, so that the network's performance will improve.
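To make these three components concrete, the following Python sketch implements a single generic processing unit. It is a minimal illustration under common assumptions — a summed net input, a logistic activation function, and an identity output function — and the function names are our own, not part of any standard connectionist architecture.

    import math

    def net_input(weights, signals, bias=0.0):
        # Net input function: the weighted sum of incoming signals.
        return sum(w * s for w, s in zip(weights, signals)) + bias

    def activation(net):
        # Activation function: the logistic "squashing" function,
        # mapping any net input into the interval (0, 1).
        return 1.0 / (1.0 + math.exp(-net))

    def output(act):
        # Output function: here simply the identity, so the unit
        # transmits its activation unchanged through its connections.
        return act

    # A unit with two weighted connections responding to one input pattern.
    print(output(activation(net_input([0.5, -1.2], [1.0, 0.0]))))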



Fig. 1: Components of the generic connectionist architecture. (A) A typical multilayer network of processing units. It is trained to generate a particular response to patterns that are presented to the input units. Hidden units can be viewed as feature detectors. (B) The three basic operations of a single processing unit. See the text for details. (C) Processing units communicate through a connection that has the numerical weight w_ij; learning rules are used to modify weight values.


Classical artificial intelligence (AI) systems embody the notion that information processing is the serial manipulation of physical symbols that represent semantic content. Defined in this fashion, classical systems need to be described at three different levels of analysis — computational, algorithmic, and implementational — if they are to be completely understood (e.g., Marr, 1982, Chap. 1; Pylyshyn, 1984). Models created from the generic connectionist architecture are usually described (and sometimes criticized) as being radically different from classical models (e.g., Broadbent, 1985; Churchland & Sejnowski, 1989; Clark, 1989; Fodor & Pylyshyn, 1988; Hawthorne, 1989; Hecht-Nielsen, 1990; McClelland, Rumelhart & Hinton, 1986; Rumelhart, Smolensky, McClelland & Hinton, 1986; Schneider, 1987; Smolensky, 1988). One issue that arises from the view that connectionism offers a "paradigm shift" to AI researchers is whether the so-called tri-level hypothesis also applies to PDP models.

For instance, some arguments indicate that PDP systems are implementational models: "In our view, people are smarter than today's computers because the brain employs a basic architecture that is more suited to deal with a central aspect of the natural information processing tasks that people are so good at" (McClelland, Rumelhart & Hinton, 1986, p. 1). However, when connectionism is described (and, in particular, criticized) as being merely implementational (e.g., Broadbent, 1985; Fodor & Pylyshyn, 1988), connectionists beg to differ. "Our primary concern is with the computations themselves, rather than the detailed neural implementation of these computations" (Rumelhart & McClelland, 1986, p. 138). Should connectionism thus be viewed as being primarily concerned with issues of competence? Not necessarily — it has been claimed that PDP models are essentially procedural or algorithmic (e.g., Rumelhart & McClelland, 1985). At first glance, this diversity of positions gives connectionism an uncertain — if not downright mysterious — position in mainstream cognitive science.

However, on closer consideration, this diversity is exactly what one should expect. This is because while connectionist researchers have proposed a type of information processing that is quite different from that

proposed by classicists, they have not abandoned the general cognitivist notion that intelligence is information processing (see Fodor & Pylyshyn, 1988, pp. 7-11). As a result, connectionist models must also be considered at each of the computational, algorithmic, and implementational levels. The citations in the preceding paragraph merely illustrate this possibility. In the following sections, we elaborate this perspective on connectionism by examining PDP networks from the tri-level perspective in order to ascertain the relationship between connectionist theory and cognitive science. First, we consider the computational power of these networks. Second, we consider the types of algorithms or procedures that these networks define. Third, we consider the relationship between the PDP architecture and neurophysiology. We show, at each level of description, that while current PDP models have some intriguing properties, their potential contribution to cognitive science is uncertain.

COMPUTATIONAL DESCRIPTIONS OF PDP NETWORKS

PDP Networks Are Powerful Information Processing Systems

A computational description of an information processor accounts for the system's competence — it defines the kinds of functions that a system can compute. In cognitive science, descriptions of this sort are generally used to fulfil two different purposes. First, these accounts can be used to rigorously define information processing problems as the first step in a top-down research programme that has as its ultimate goal the creation of a working computer model (e.g., Marr, 1982). Second, computational analyses can be used to assess the potential adequacy of a general class of model, by determining whether the kinds of functions it can compute are sufficiently rich to capture interesting cognitive regularities. Computational analyses of connectionist systems have focused on this second purpose.

Many researchers have argued that, in principle, PDP networks are extremely powerful information processing systems. Below we briefly

review the evidence for three different claims of this sort: that PDP networks are functionally equivalent to Universal Turing machines, that PDP networks are arbitrary pattern classifiers, and that PDP networks are universal function approximators.

Connectionist networks are equivalent to Universal Turing Machines. Even a cursory look at connectionist theory indicates that it is very similar to classical associationism (e.g., Bechtel, 1985; Bechtel & Abrahamsen, 1991, pp. 101-103). However, this resemblance is also disconcerting. It can be strongly argued that associationist models are formally equivalent to finite state automata, and as a result are not powerful enough in principle to instantiate human cognition (e.g., Bever, Fodor & Garrett, 1968). This is why classical AI attempts to design models that are equivalent to Universal Turing Machines (UTMs). If connectionist systems were equivalent to classical associationist models, then their limited computational power would make them extremely unattractive to cognitive science (see also Fodor & Pylyshyn, 1988; Lachter & Bever, 1988).

However, strong arguments have been made that PDP models have the same competence as classical AI systems. In some of the earliest work on neural networks, McCulloch and Pitts (1943/1988) examined finite networks whose components could perform simple logical operations like AND, OR, and NOT. They were able to prove that such systems could compute any function that required a finite number of these operations. From this perspective, the network was only a finite state automaton (see also Hopcroft & Ullman, 1979, p. 47; Minsky, 1972, Chap. 3). However, McCulloch and Pitts went on to show that a UTM could be constructed from such a network, by providing the network a means to move along, sense, and rewrite an external "tape" or memory. "To psychology, however defined, specification of the net would contribute all that could be achieved in that field" (McCulloch & Pitts, 1943/1988, p. 25).
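The flavour of McCulloch and Pitts' construction can be conveyed in a few lines of Python. The sketch below implements simple threshold units computing AND, OR, and NOT; the particular weights and thresholds are our illustrative choices, not values from the original paper.

    def mcp_unit(inputs, weights, threshold):
        # A McCulloch-Pitts unit fires (outputs 1) when its weighted
        # input sum reaches threshold, and stays silent (0) otherwise.
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0

    def AND(a, b): return mcp_unit([a, b], [1, 1], threshold=2)
    def OR(a, b):  return mcp_unit([a, b], [1, 1], threshold=1)
    def NOT(a):    return mcp_unit([a], [-1], threshold=0)

    # Any function requiring a finite number of such operations can be
    # computed by composing units like these into a finite network.
    assert AND(1, 1) == 1 and AND(1, 0) == 0
    assert OR(0, 1) == 1 and NOT(1) == 0 and NOT(0) == 1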

Connectionist networks are arbitrary pattern classifiers. Connectionist networks are also commonly used to classify patterns (for reviews, see Carpenter, 1989; Lippmann, 1987, 1989). Essentially, the set

of input activities for a particular stimulus defines the location of a point in a multidimensional pattern space. The network "carves" this pattern space into different decision regions, which potentially can have complex shapes. The network classifies the input pattern by generating the "name" (i.e., a unique pattern of output unit activity) of the decision region in which the stimulus point is located.

When a network is described as a pattern classifier, claims about computational power focus on the decision regions that it can "carve", because this defines the complexity of the classifications that can be performed. For example, a network with a monotonic activation function in its output unit, and no intermediate processors, has only the limited ability to decide whether an input belongs to one of two distinct categories: this network can only "carve" a single hyperplane through the multidimensional pattern space. Stimuli located to one side of the hyperplane are assigned one category label, and stimuli located to the other side are assigned the second category label (see Figure 2). Such systems cannot learn the simple XOR relationship, because it requires a more sophisticated partitioning of the pattern space — specifically, two parallel hyperplanes. This more complicated partitioning can be accomplished either by adding a single layer of hidden processors (e.g., Rumelhart, Hinton & Williams, 1986b, pp. 319-320) or by using a nonmonotonic activation function (see Figure 2c).
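The XOR limitation, and its solution, can be demonstrated directly. In the Python sketch below (our own hand-set weights, chosen for illustration), a single layer of hidden threshold units carves the two parallel hyperplanes that XOR requires:

    def threshold_unit(inputs, weights, bias):
        # Fires when the input falls on the positive side of the
        # hyperplane defined by the weights and bias.
        net = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1 if net > 0 else 0

    def xor_net(x1, x2):
        h1 = threshold_unit([x1, x2], [1, 1], bias=-0.5)  # "at least one on"
        h2 = threshold_unit([x1, x2], [1, 1], bias=-1.5)  # "both on"
        # The output unit responds only to points between the two planes.
        return threshold_unit([h1, h2], [1, -1], bias=-0.5)

    for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(pattern, xor_net(*pattern))  # prints 0, 1, 1, 0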

How many additional layers of processors are required to partition the pattern space into arbitrary decision regions, and thus make a network capable of any desired classification? Lippmann (1987, p. 16), by considering the shape of decision regions created by each additional layer of monotonic processors, has shown that a network with only two layers of hidden units (i.e., a three-layer network) is capable of "carving" a pattern space into arbitrary decision regions. "No more than three layers are required in perceptron-like feed-forward nets".



Fig. 2: (A) A simple pattern classification network with two input units and one output unit. Its "carving" of a two-dimensional pattern space depends upon the output unit's activation function. (B) With a monotonic "squashing" function like the logistic, the network carves the pattern space into two regions with a single plane, as illustrated by the "shadow" cast by the network's receptive field. (C) With the Gaussian function used in Dawson and Schopflocher's (1992b) value units, the pattern space is carved into three regions by two parallel planes. (D) With the Gaussian function used in RBF networks, highly localized regions of the pattern space can be selected.


Connectionist networks are universal approximators. Historically, PDP networks have been most frequently described as pattern classifiers. Recently, however, with the advent of so-called radial basis function (RBF) networks (e.g., Girosi & Poggio, 1990; Hartman, Keeler & Kowalski, 1989; Moody & Darken, 1989; Poggio & Girosi, 1989, 1990; Renals, 1989), connectionist systems are now often described as function approximators. Imagine, for example, a mathematical "surface" defined in N-dimensional space. At each location in this space this surface has a definite height. A function approximating network with N input units and one output unit would take as input the coordinates of a location in this space, and would output the height of the surface at this location.

How powerful a function approximator can a PDP network be? Rumelhart, Hinton and Williams (1986b, p. 319) have claimed that "if we have the right connections from the input units to a large enough set of hidden units, we can always find a representation that will perform any mapping from input to output". More recently, researchers have attempted to analytically justify this bold claim, and have in fact proven that many different kinds of networks are universal approximators. That is, if there are no restrictions on the number of hidden units or the size of the connection weights in the network, then in principle a network can be created to approximate — over a finite interval — any continuous mathematical function to an arbitrary degree of precision. This is true for networks with a single layer of hidden units whose activation function is a sigmoid-shaped "squashing" function (e.g., Cotter, 1990; Cybenko, 1989; Funahashi, 1989), for networks with multiple layers of such hidden units (Hornik, Stinchcombe & White, 1989), and for RBF networks (Hartman, Keeler & Kowalski, 1989).
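The flavour of these results can be conveyed numerically. The sketch below — our illustration, not a construction taken from the cited proofs — fixes random input-to-hidden weights for a single layer of logistic hidden units and fits only the hidden-to-output weights by least squares; over a finite interval, the approximation of sin(x) typically improves as hidden units are added.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 200)[:, None]
    target = np.sin(x).ravel()

    def max_error(n_hidden):
        # Logistic hidden layer with random incoming weights and biases.
        w = rng.normal(size=(1, n_hidden))
        b = rng.normal(size=n_hidden)
        hidden = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        # Fit the hidden-to-output weights by least squares.
        out_w, *_ = np.linalg.lstsq(hidden, target, rcond=None)
        return np.max(np.abs(hidden @ out_w - target))

    for n in (2, 10, 50):
        print(n, "hidden units -> worst-case error:", max_error(n))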

How Are Claims About PDP Competence Related to Cognitive Science?

It is obvious from the discussion above that PDP networks can be described as very powerful computational systems. Nevertheless, there are a number of indications that such competence is not by itself sufficient to

claim that PDP models are appropriate for cognitive science:

Competence is irrelevant in the absence of performance. Proofs about computational ability can be extremely powerful, insofar as they may rule out a proposal for an architecture of cognition (e.g., Bever, Fodor & Garrett, 1968). Nevertheless, the demonstration that a connectionist architecture has the competence of a UTM only suggests its plausibility for psychological modelling; it does not justify adopting such networks as the most plausible candidate. This is because one can have powerful computational competence in machines whose performance characteristics clearly rule them out for modelling purposes. For example, the tape head of a Turing machine is limited to relative memory access, and thus performs painstakingly slow serial manipulations of input data. As a result, researchers do not seriously propose this particular type of machine for cognitive models, even though it has enormous competence. Instead, they explore architectures like production systems that have the same computational power, but have far more powerful performance characteristics (for an introduction, see Haugeland, 1985, Chap. 4). Thus, while computational claims are important, they do little to constrain or motivate the use of connectionist systems in cognitive science (see also Massaro, 1988).

The structure/process distinction and finite state automata. Dawson and Schopflocher (1992a) have argued that a fundamental difference between classical and connectionist models emerges when one considers how the two approaches distinguish between data structures and the processes that manipulate them. In a classical architecture a set of symbols does not itself comprise an autonomous representational system, because full-fledged computation requires that these symbols be manipulated by additional external processes that are sensitive to symbolic structure. A prototypical example is the Turing machine, whose processes are instantiated in the structure of the tape head, which in turn manipulates tokens written on an external tape. In contrast, PDP networks are designed to "exhibit intelligent behaviour without storing, retrieving, or otherwise operating on structured symbolic expressions" (Fodor & Pylyshyn, 1988,

p. 5). This rejection of the structure/process distinction is the characteristic that differentiates a PDP network from a classically defined symbol processor. "Knowledge is not directly accessible to interpretation by some separate processor, but it is built into the processor itself and directly determines the course of processing" (Rumelhart, Hinton & McClelland, 1986, pp. 75-76).

While the absence of a structure/process distinction may define how a PDP network differs from a classical system, it also imposes limitations on the network's computational power. Recall that in order to achieve the computational power of a UTM, an external memory structure had to be provided to the network (McCulloch & Pitts, 1943/1988). This implies that if one does not distinguish between structure and data in a connectionist system — if this external data store is not provided — then one may not be able to produce networks of sufficient computational power to be of interest to cognitive science. Furthermore, this problem is not avoided by proofs that networks are universal approximators: Levelt (1990) has argued from such proofs that PDP networks are merely finite state automata, because they can only approximate functions over a finite interval. This limitation is also at the heart of Fodor and Pylyshyn's (1988) criticism that PDP systems are insensitive to the constituent structure of complex tokens.

Do brains and PDP networks have the same competence? That PDP networks are universal approximators is quite interesting in principle, but less compelling in practice. Proofs of universal approximation place no limits on the number of processing units in the network, or on the strengths of its connections. Thus, to approximate some functions, one might require numbers of processors, or connection weight values, that are impossible in finite biological systems (see also Ballard's [1986] discussion of the packing problem). As well, it is not clear that the brain itself is designed to be a universal approximator, because such devices are not without interesting limitations. For example, after being trained to approximate some function, an RBF network can generalize its performance and respond correctly to new instances. However, this ability to generalize requires that the approximated function be smooth and piecewise continuous (e.g., Poggio & Girosi, 1990). This property is not true of


Boolean functions (i.e., functions of the form {0,1}^N → {0,1}^M) typically used to study pattern classification in neural networks. Indeed, universal approximators like RBF networks have difficulties in learning such functions (e.g., Moody & Darken, 1989). Neuroscientists often describe the brain in a fashion suggesting that its primary function is pattern categorization (e.g., Kuffler, Nicholls & Martin, 1984). If this is true — if it is not a function approximator — then proofs that PDP networks are universal approximators may not be pertinent to cognitive science.
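The localized response that underlies both the power and the smoothness requirement of RBF units is easy to exhibit (a minimal sketch, with a Gaussian of illustrative width):

    import math

    def rbf_unit(pattern, centre, width=1.0):
        # A radial basis function unit responds maximally when the input
        # lies at its centre and falls off smoothly with squared distance,
        # selecting a localized region of the pattern space (Figure 2d).
        dist_sq = sum((x - c) ** 2 for x, c in zip(pattern, centre))
        return math.exp(-dist_sq / (2.0 * width ** 2))

    print(rbf_unit([0.0, 0.0], centre=[0.0, 0.0]))  # 1.0 at the centre
    print(rbf_unit([3.0, 0.0], centre=[0.0, 0.0]))  # near 0.0 far away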

However, the nature of proofs about the pattern classification abilities of PDP networks also raises serious questions about the relationship between PDP models and brain-mediated psychological processes. Specifically, Lippmann (1987) shows that only two layers of hidden units are required to mediate arbitrary pattern classification. However, it is also clear that the brain can be described as being composed of a very large number of processing layers (e.g., Kuffler, Nicholls & Martin, 1984, Chap. 2). If adequate computational power can be achieved with a small number of processing layers, then why does the brain not have a simpler structure?

Furthermore, the demonstrated competence of relatively simple PDP architectures has discouraged researchers from considering more complicated many-layered architectures, which could bear a stronger relationship to actual brain function. For example, after noting that multilayer RBF networks are possible in principle, Poggio and Girosi (1989, p. 58) point out that "there is no reason to believe, however, that such 'multilayer' functions represent a large and interesting class". From the view of computational competence, this claim may indeed be true. However, if one of the intents of connectionism is to provide a bridge between psychology and neuroscience, then this view is disturbing.

Computational Connectionism and Cognitive Science

While neuroscientists have made tremendous strides in describing neural structure, they often have very little to say about why this structure

exists, in particular when other structures are logically plausible (e.g., Braitenberg, 1984, pp. 96-99). Unfortunately, most computational analyses of connectionist systems ignore such questions, and are concerned instead with general competence. We feel this is unfortunate because some researchers have shown that PDP networks provide a rich computational medium in which one can generate formal answers to "Why?" questions about brain structure.

For example, why are functions localized in the brain? Ballard (1986) provides a formal argument that this is a natural solution to the so-called packing problem. By localizing functions, a network with a finite number of processors can incorporate a greater diversity of computational abilities than an identically sized network that does not localize functions. Why does the brain have so many processing layers? Servan-Schreiber, Printz and Cohen (1990) have proven that if one manipulates the gain of activation functions in PDP networks (i.e., the "slope" of sigmoidal activation functions), then the ability to identify signals in noisy environments increases — but only if multiple layers of processing units are employed.
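The gain manipulation itself is simple to state: the net input is scaled by a gain parameter before the logistic is applied, so that high gain sharpens the unit's response. A minimal sketch with illustrative values:

    import math

    def logistic(net, gain=1.0):
        # Gain scales the slope of the sigmoid; as gain grows, the
        # unit approaches all-or-none behaviour around zero net input.
        return 1.0 / (1.0 + math.exp(-gain * net))

    weak_signal = 0.2   # a small net input, as if embedded in noise
    for gain in (0.5, 1.0, 5.0):
        print(gain, logistic(weak_signal, gain))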

In our opinion, if computational analyses of PDP systems are to contribute to cognitive science in general, then these analyses should move away from issues of general competence. Computational researchers should instead focus upon more specific issues concerning why the brain might have particular structural characteristics.

ALGORITHMIC DESCRIPTIONS OF PDP NETWORKS

PDP Networks Are Themselves Algorithms

Informally, an algorithm is a completely mechanical procedure for performing some computation — "an infallible, step-by-step recipe for obtaining a prespecified result" (Haugeland, 1985, p. 65). In cognitive science, an algorithm is viewed by many researchers not only as a fundamentally important theoretical notion, but also as a practical goal for

models-as-explanations. "If the long promised Newtonian revolution in the study of cognition is to occur, then qualitative explanations will have to be abandoned in place of effective procedures" (Johnson-Laird, 1983, p. 6).

In PDP connectionism, a network can itself be described as an effective procedure for some function, or for categorizing some patterns. Indeed, a tremendous amount of enthusiasm for connectionism has been fueled by specific demonstrations that PDP networks offer practical algorithms for a diverse range of problems. A wide range of connectionist systems have been proposed to model aspects of memory (e.g., Anderson, 1972; Anderson, Silverstein, Ritz & Jones, 1977; Eich, 1982; Grossberg, 1980; Knapp & Anderson, 1984; Murdock, 1982). Connectionist networks have a long history (e.g., Selfridge, 1956), have become benchmarks to which other methods are compared (e.g., Barnard & Casasent, 1989), and can outperform standard methods for such tasks as speech recognition (e.g., Bengio & de Mori, 1989). Connectionists have successfully used networks to solve problems related to locomotion (e.g., Brooks, 1989; Pomerleau, 1991), and have designed systems to mediate behaviours once thought to be exclusive to classical systems, such as performing logical inferences (Bechtel & Abrahamsen, 1991, pp. 163-174) and performing linguistic transformations or sentence parsing (e.g., Jain, 1991; Lucas & Damper, 1990; Rager & Berg, 1990).

Problems With PDP Algorithms

In spite of these successes, the contributions of connectionist algorithms to cognitive science are somewhat suspect. First, researchers are beginning to challenge the ability of such models to capture the right empirical regularities. Second, such models are often exceedingly difficult to interpret, which mitigates their explanatory usefulness as effective procedures. Third, in many cases the functional architecture of these networks is not completely specified. We consider each of these problems below.


PDP networks may fail to capture interesting empirical regularities. Pylyshyn (1980, 1984) has argued strongly that a fundamental goal of cognitive science's theories is to capture rich sets of empirical regularities. Indeed, the success of computer simulations of psychological phenomena is often measured by the program's ability not only to make the same correct judgements as humans, but also to make similar mistakes. Merely generating "intelligent" behaviour does not guarantee a successful niche in cognitive science for an implemented theory, which is why neither computerized chess boards nor pocket calculators are viewed as proposals for how humans play chess or perform mental arithmetic.

One important characteristic of connectionism has been the claim that PDP networks capture the right kinds of regularities for an empirical cognitive science. For example, one reason for the recognized importance of Rumelhart and McClelland's (1986) network that transforms verbs into the past tense was that during training it produced overgeneralization errors similar to those observed in children. Similarly, much of the interest in distributed connectionist memories is due to the kinds of errors these systems produce (see, for example, Eich, 1982).

The claim that connectionist systems are capable of capturing sufficiently rich empirical regularities has far-reaching consequences, because the behaviour of PDP networks is putatively mediated by mechanisms that bear little relationship to those proposed in classical models (for an example of strong claims of this sort, see Seidenberg & McClelland, 1989). Connectionists are now challenging the "realistic" status of classical theories — the view that such accounts reflect actual "theories in the head". PDP researchers are proposing that classical theories are not valid explanations, but are merely instrumentalist descriptions. The proper account of mentality, they argue, is reflected in explanations of the dynamic properties of connectionist models. "Subsymbolic models accurately describe the microstructure of cognition, whereas symbolic models provide an approximate description of the macrostructure" (Smolensky, 1988, p. 12).


This challenge to classical cognitive science required PDP models to generate the same behaviour as that observed in human subjects. Recently, however, this ability has been strongly contested. Several prominent connectionist models, which have spearheaded the assault on classical models of data, have been carefully examined, and have been found wanting (e.g., Pinker and Prince's [1988] critique of Rumelhart and McClelland's [1986] verb transformation network; Besner, Twilley, McCann & Seergobin's [1990] examination of the Seidenberg and McClelland [1989] grapheme-to-phoneme network). The general theme of these critiques is that PDP networks capture some, but not all, of the empirical regularities thought to be critical to understanding the psychological phenomena being modelled.

The connectionist response to such criticisms is to moderate their claims about the models. They argue that because of practical limitations, the networks that they create should not be expected to capture all of the relevant results (e.g., Seidenberg & McClelland, 1990). However, because these simple systems can account for some interesting data, it is argued that they warrant serious consideration. The suggestion is that as networks become larger and more sophisticated, they will be able to account for a broader range of empirical phenomena. There is certainly merit in this position, but it should be recognized for what it is: a promissory note. The enthusiastic predictions of connectionists about the future performance of larger networks should be tempered by the knowledge that the properties of small PDP networks often disappear when their size is scaled up (e.g., Minsky & Papert, 1969/1988, pp. 261-266).

PDP algorithms are extremely difficult to interpret. In many cases it is extremely difficult to determine how connectionist networks accomplish the tasks that they have been taught. "One thing that connectionist networks have in common with brains is that if you open them up and peer inside, all you can see is a big pile of goo" (Mozer & Smolensky, 1989, p. 3). There are a number of reasons that PDP networks are difficult to understand as algorithms.


First, they are rarely developed a priori — instead, a generic learning rule is used to develop useful (algorithmic) structures in a network that is initially random. Thus, one does not need a theoretical account of a to-be-learned task before a network is created to do it. Second, general learning procedures can train networks that are extremely large; their sheer size and complexity makes them difficult to interpret. For example, Seidenberg and McClelland's (1989) network for computing a mapping between graphemic and phonemic word representations uses 400 input units, up to 400 hidden units, and 460 output units. Determining how such a large network maps a particular function is an intimidating task. Third, most interesting PDP networks incorporate nonlinear activation functions. This nonlinearity makes these models more powerful than those that only incorporate linear activation functions (e.g., Jordan, 1986), but it also requires that descriptions of their behaviour be particularly complex. Fourth, connectionist architectures offer too many degrees of freedom for the generation of working systems. One learning rule can create many different networks — for instance, containing different numbers of hidden units — that each compute the same function. Each of these systems can therefore be described as a different algorithm for computing that function. One does not have any a priori knowledge of which of these possible algorithms might be the most plausible as a psychological theory of the phenomenon being studied.

Johnson-Laird (1983, p. 4) has noted that "to understand a phenomenon is to have a working model of it". Interestingly, PDP models appear to prove this statement false, because connectionists can easily replace one unknown (e.g., how the brain mediates some psychological phenomenon) with another — a functioning but unexplained network (see also Lewandowsky, 1993; McCloskey, 1991).

The generic connectionist architecture is incomplete. One of the common arguments for using computer simulation methodology in cognitive science is that such models force researchers to be extraordinarily explicit about their assumptions and their theoretical statements. Vague theories do not result in working computer programs.


Connectionist proponents imply that a PDP network — a program written in the functional architecture of generic connectionism — defines a particularly explicit theory. The components of generic connectionism are quite simple. As a result, one could imagine using a diagram of a trained network as a circuit diagram; each pictured processor and connection could be emulated by a simple electronic (or biological) component. The result would be a physical device capable of carrying out all of the computations attributed to the original network. Thus, connectionists claim that PDP networks comprise an autonomous representational system — one need not appeal to external rules or processes to explain how these networks function or learn. "Much of the allure of the connectionist approach is that many connectionist networks program themselves, that is, they have autonomous procedures for tuning their weights to eventually perform some specific computation" (Smolensky, 1988, p. 1, his italics).

However, Dawson and Schopflocher (1992a) have shown that in actuality PDP networks cannot be easily implemented in this sense. In short, if one were to build a diagrammed network by replacing its generic connectionist components with functionally equivalent electronic parts, then the electronic network would not be capable of all the behaviours attributed to the network — it would not be an autonomous system. This is because the components of the generic connectionist architecture are not by themselves sufficient for the intended task.

Dawson and Schopflocher (1992a) make their case by analysing in detail an extremely simple associative memory model. Figure 3a illustrates the PDP version of the model; diagrams of this system have a long history in the connectionist literature (e.g., Kohonen, 1977, Fig. 1.9; McClelland & Rumelhart, 1988, Chap. 4, Fig. 3; Rumelhart, McClelland & the PDP Group, 1986, Chap. 1, Fig. 12, Chap. 9, Fig. 18, Chap. 12, Fig. 1, Chap. 18, Fig. 3; Schneider, 1987, Fig. 1; Steinbuch, 1961, Fig. 2; Taylor, 1956, Figs. 9 & 10). The purpose of this model is to learn the association between pairs of activity patterns presented simultaneously to the two banks of processing units. Under the restriction that all activity patterns are mutually orthogonal, a model with N units in each input bank is capable of storing

information about N different pattern pairs in its connections. Because this network is a distributed memory system, and because its mathematical properties are quite easily described, it is often used to introduce the basic ideas of connectionism (e.g., Jordan, 1986).
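A minimal sketch of such a memory, using the standard Hebbian outer-product rule on mutually orthogonal, normalized patterns (our implementation, with illustrative patterns):

    import numpy as np

    # Two orthogonal, unit-length patterns for bank 1, and their partners.
    a1 = np.array([1, 1, 1, 1]) / 2.0
    a2 = np.array([1, -1, 1, -1]) / 2.0
    b1 = np.array([0.0, 1.0, 0.0, 1.0])
    b2 = np.array([1.0, 0.0, 1.0, 0.0])

    # Learning: each association is stored as an outer product, and the
    # connection matrix is simply the sum of the stored associations.
    W = np.outer(b1, a1) + np.outer(b2, a2)

    # Recall: presenting a stored pattern to bank 1 retrieves its partner.
    print(W @ a1)   # recovers b1
    print(W @ a2)   # recovers b2

Note that in this sketch the decision to store rather than recall is made by the surrounding program, not by the network itself.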

Nevertheless, Dawson and Schopflocher (1992a) argue that the network in Figure 3a is not capable of learning associations between activity patterns without the help of a controller that is external to the network. They point out that the Figure 3a network requires an external signal to tell whether it should be learning a new association, or whether it should be recalling an old one. Furthermore, it requires additional processing and controlling abilities to modify connection weights — connections cannot merely be single, numerical values. Thus Dawson and Schopflocher conclude that PDP networks by themselves are not sufficiently powerful to simultaneously represent and manipulate contents. As a result, it would be a mistake to assume that diagrams created from the generic connectionist architecture provide better, more explicit, or more easily realizable programs than would be available from classical AI.

Algorithmic Connectionism and Cognitive Science

We believe that if connectionism is to provide algorithmic contributions to cognitive science, then researchers should attempt to develop networks that are truly autonomous. However, if this is to be achieved, then the kinds of networks typically proposed require substantial elaborations of the PDP architecture. This elaboration must be guided by an explicit statement of a functional architecture capable of solving the various control problems faced by an autonomous system. For example, Dawson and Schopflocher (1992a) propose a slightly elaborated connectionist architecture, and demonstrate how an autonomous pattern associator could be created from it (see Figure 3b). Without such a functional architecture, it is doubtful that connectionism can serve as a viable bridge between computational and physiological descriptions.



Fig. 3: (A) The standard pattern association network proposed by connectionists. Dawson and Schopflocher (1992a) have argued that this network cannot function autonomously. In its place, they propose a network constructed from an elaborated functional architecture. (B) A small version of the Dawson and Schopflocher network. Units marked with "I" are input units, with "O" are output units, and with "M" are memory units. Operators marked with "+" compute the sum of their inputs; operators marked with "X" compute the product of their inputs. Connections in this architecture have fixed weights equal to 1.


We also believe that important and lasting contributions of connectionism to cognitive science will require that connectionist algorithms be interpreted. Unfortunately, this is likely to be an extremely difficult task to accomplish: "There is a growing suspicion that discovering [how a network does its job] may require an intellectual revolution in information processing as profound as that in physics brought about by the Copenhagen interpretation of quantum mechanics" (Hecht-Nielsen, 1990, p. 10). Nevertheless, many promising approaches to network interpretation have already been identified.

One strategy is to develop networks that are (hopefully) maximally interpretable by reducing the number of their processing units to a minimum (e.g., Hagiwara, 1990; Mozer & Smolensky, 1989; Sietsma & Dow, 1988). For example, Mozer and Smolensky propose a measure of the relevance of each processor to a network's overall performance. They advocate a research strategy in which one starts by training a large network to accomplish some task. Then, the relevance of each processor is computed. Processors with sufficiently small relevance values are removed from the network. This procedure is repeated until each network processor has a high relevance value. A second strategy is to perform statistical analyses of the connection weights from a trained network. For example, Hanson and Burr (1990) illustrate a number of techniques for probing network structure, including compiling frequency distributions of connection strengths, quantifying global patterns of connectivity with descriptive statistics, illustrating local patterns of connectivity with "star diagrams", and performing cluster analyses of hidden unit activations. A third strategy is to map out the response characteristics of each processor in the network. For instance, Moorhead, Haig and Clement (1989) used the generalized delta rule to train a PDP network to identify the orientation of line segments presented to an array of input units. Their primary research goal was to determine whether the hidden units in this system developed centre-surround receptive fields analogous to those found in the primate lateral geniculate nucleus. They chose to answer this question by

stimulating each input element individually, and plotting the resulting activation in each hidden unit. In related work, Dawson, Kremer and Gannon (1993) have developed a simple rule for identifying the input pattern that produces the maximum activity in each hidden unit of a network. They were able to use this rule to demonstrate that in a network whose output units were trained to behave like complex cells, hidden units developed receptive fields analogous to simple cells.
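The receptive-field mapping strategy is straightforward to implement. The sketch below probes a small network by stimulating each input element individually and recording each hidden unit's response; the random weight matrix stands in for a trained network and is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    n_inputs, n_hidden = 9, 4
    W = rng.normal(size=(n_hidden, n_inputs))   # trained weights would go here

    def hidden_activity(pattern):
        # Logistic hidden units driven by an input pattern.
        return 1.0 / (1.0 + np.exp(-(W @ pattern)))

    # Stimulate one input element at a time; each column of the result
    # is a crude "receptive field" profile for the probed input.
    probes = np.eye(n_inputs)
    fields = np.stack([hidden_activity(p) for p in probes], axis=1)
    print(fields.round(2))   # rows: hidden units; columns: probed inputs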

IMPLEMENTATIONAL DESCRIPTIONS OF PDP NETWORKS

An implementational description of an information processing system attempts to relate its representational and formal properties to the causal laws governing its mechanical structure. Classical cognitive science, because of its functionalist nature, has typically placed little emphasis on this type of description. Connectionism, however, is motivated by quite different considerations.

PDP Networks, Biological Plausibility, and Computational Relevance

Why has connectionism been so enthusiastically adopted by some cognitive scientists? One reason is that PDP models are claimed to be biologically plausible algorithms. In other words, when examining a diagram of a connectionist system, one could imagine that it illustrates a sufficient neural circuitry for accomplishing some task. This, it is argued, is not true of classical models. "No serious study of mind (including philosophical ones) can, I believe, be conducted in the kind of biological vacuum to which cognitive scientists have become accustomed" (Clark, 1989, p. 61).

In what way are PDP models intended to fill this "biological vacuum"? Generally speaking, these systems are "neuronally inspired" — processing units are roughly equivalent to neurons, and connections between processors are roughly equivalent to synapses (see, for example, the visual analogy rendered in Rumelhart, Hinton & McClelland, 1986, Fig. 1).


Neuronal inspiration also colours general assumptions about processing in PDP networks. As in the brain, all processors are assumed to work in parallel and to send signals that are a nonlinear function of their net input. The "knowledge" of the system is encoded in patterns of connectivity, because synaptic modification appears to be a general description of how the brain remembers information (e.g., Dudai, 1989).

In spite of connectionism's implementational intentions, neuroscientists are quite skeptical about the biological plausibility of the PDP architecture. A number of reasons are often cited for this skepticism. First, one can generate long lists of PDP architectural properties that are clearly not true of the brain (e.g., Crick & Asanuma, 1986; Smolensky, 1988, Table 1). As a result, PDP models are often vilified as oversimplifications by neuroscientists; Douglas and Martin (1991, p. 292) refer to them as "stick and ball models". Second, researchers find it extremely unlikely that rules like error backpropagation could be physiologically instantiated. This is because it is highly unlikely that the environment could specify a "training pattern" as accurately as is required by such rules (e.g., Barto, Sutton & Anderson, 1983), and because there is no evidence at all for neural connections capable of feeding an error signal backwards to modify existing connections (e.g., Bechtel & Abrahamsen, 1991, p. 57; Kruschke, 1990). In short, while biological networks are capable of autonomous learning, artificial networks are not (see also Dawson & Schopflocher, 1992a). Reeke and Edelman (1988, p. 144) offer this blunt assessment of the neurophysiological relevance of PDP connectionism: "These new approaches, the misleading label 'neural network computing' notwithstanding, draw their inspiration from statistical physics and engineering, not from biology".

However, these criticisms miss the mark. PDP networks are designed to be extreme simplifications that ignore many of the complex details true of neural systems (for an example, see Braham & Hamblen, 1990). This is because the PDP architecture is itself functionalist in nature. It attempts to capture just those properties of biological networks that are

computationally relevant. The intent of this enterprise is to describe neural networks in a vocabulary that permits one to make rigorous claims about what they can do, or about why the brain might have the particular structure that it does. For example, claims about the competence of neural networks only arise when one abstracts over neurophysiological details, and describes important aspects of neuronal function either mathematically or logically (e.g., McCulloch & Pitts, 1943/1988).

The functionalist philosophy that guides cognitivism has always argued that cognitive phenomena cannot be reduced to a single level of neurophysiological explanation because such an account cannot capture all of the important empirical generalizations (e.g., Pylyshyn, 1980, 1984). The message to neuroscience has been that in cognitivism, accounts at the three levels of computation, algorithm, and implementation are all equally necessary. Largely in response to theories of the "New Connectionism", the egalitarian emphasis of functionalism is apparently ebbing away. Some critics of connectionism have argued that if connectionism is primarily concerned with implementational issues, then it bears no relationship to cognitive science at all, because the fundamental properties of information processing systems must be captured at more abstract or functional levels of description (e.g., Broadbent, 1985; Fodor & Pylyshyn, 1988, pp. 64-69).

This type of argument is too strong, because it ignores the possibility that PDP research has the potential to build a strong bridge between neuroscience and functionalist theories. Classical theories in cognitive science require this bridge: First, cognitive science's commitment to the physical symbol system hypothesis (e.g., Newell, 1980) necessitates a physical account of information processing as well as a formal account. Second, current views in the philosophy of science (e.g., Cummins, 1983) note that functionalist explanations require that the dispositional properties of the functional architecture — the primitive "building blocks" of an information processing system — be subsumed under natural laws. Thus, if one proffers a functionalist explanation, it is not enough to merely state that a function has been subsumed; one must also provide an account of the mechanisms that instantiate the function.


However, there are many factors working against realizing connectionism's potential to subsume cognitive theory. This is because connectionists often make design decisions about their architecture without justifying them as computationally relevant properties of neural circuits. It is perfectly reasonable to propose an architecture that ignores complex properties of neural substrate with the goal of making computationally relevant properties explicit. It is quite another to create an architecture that incorporates properties that make it work, independent of whether these properties bear any relation to neural substrates whatsoever. Below, we review several instances of this latter practice.

Problems Arise Because of Biologically Undefended Design Decisions

Connectionists adopt monotonic activation functions. To a large extent, changes in conceptualizations of activation functions for processing units have been responsible for the evolution from less powerful, single layer networks of the "Old Connectionism" to the more powerful multiple layer networks of the "New Connectionism". For example, in a perceptron (e.g., Rosenblatt, 1962), the activation function for the output unit is a linear threshold function: If the net input to the unit exceeds a threshold, then it assumes an activation of 1, otherwise it assumes an activation of 0. This kind of activation function is roughly analogous to the "all-or-none law" governing the generation of action potentials in neurons (e.g., Levitan & Kaczmarek, 1991, pp. 37-44). However, linear threshold functions are characterized by mathematical properties that make them difficult to work with, and limit their power.

The learning procedures developed within the New Connectionism were made possible by reconceptualizing the linear threshold function with more tractable mathematical equations. For example, it has been quite common to adopt sigmoid-shaped "squashing" functions, like the logistic depicted in Figure 2b. The mathematical limits of this nonlinear equation are functionally equivalent to the two discrete states of the linear threshold function. However, the function itself is continuous, and therefore has a derivative. Because of this property, one can use calculus to determine rules

that will manipulate weights in such a way as to perform a gradient descent in an error space (e.g., Rumelhart et al., 1986b, pp. 322-327). In short, continuous activation functions permit one to derive powerful learning rules.
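The role of the derivative can be shown in a few lines. The sketch below implements a textbook gradient-descent update for a single logistic output unit under a squared-error measure; the training pattern and learning rate are our illustrative choices.

    import math

    def logistic(net):
        return 1.0 / (1.0 + math.exp(-net))

    def delta_step(w, bias, pattern, target, lr=0.5):
        # Because the logistic is continuous and differentiable, the
        # error gradient has a closed form: (a - target) * a * (1 - a).
        net = sum(wi * xi for wi, xi in zip(w, pattern)) + bias
        a = logistic(net)
        delta = (a - target) * a * (1.0 - a)
        new_w = [wi - lr * delta * xi for wi, xi in zip(w, pattern)]
        return new_w, bias - lr * delta

    w, b = [0.1, -0.3], 0.0
    for _ in range(1000):
        w, b = delta_step(w, b, [1.0, 1.0], target=1.0)
    print(logistic(w[0] + w[1] + b))   # has descended toward the target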

However, "squashing" functions have another mathematical property that makes these learning rules practical to apply. Such activation functions are monotonic — they are nondecreasing in relation to increases in net input. The derivation of the generalized delta rule (Rumelhart et al., 1986b, p. 325), and the derivation of learning rules for stochastic autoassociative networks like Boltzman machines (e.g., Müller & Reinhardt, 1990, p. 37) stipulate that activation functions be monotonic. If this condition is violated, then in practice the learning rule will almost always fail to work. For example, Dawson and Schopflocher (1992b) found that if processing units that had a particular nonmonotonic activation function (the Gaussian illustrated in Figure 2c) were inserted into a network trained with the standard version of the generalized delta rule, then quite frequently the network settled into a local minimum in which it did not respond correctly to all inputs.

The assumption that activation functions are monotonic appears to be a practical requirement for learning procedures. However, adopting this assumption for this reason alone is dangerous practice, because monotonicity does not appear to be universally true of neural mechanisms. For instance, Ballard (1986) uses a relatively coarse behavioural criterion (i.e., the response of a cell as a function of a range of net inputs) to distinguish between integration devices — neurons whose behaviour is well described with activation functions like the logistic in Figure 2b — and value units — neurons whose behaviour is well described with activation functions like the Gaussian in Figure 2c.
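The behavioural contrast between the two unit types is easy to see numerically. A minimal sketch, using a logistic for the integration device and a Gaussian of the general form used for value units (with illustrative parameters):

    import math

    def integration_device(net):
        # Monotonic: activation never decreases as net input grows.
        return 1.0 / (1.0 + math.exp(-net))

    def value_unit(net, mu=0.0):
        # Nonmonotonic: activation peaks at a preferred net input (mu)
        # and falls off on either side of it.
        return math.exp(-((net - mu) ** 2))

    for net in (-2.0, 0.0, 2.0):
        print(net, integration_device(net), value_unit(net))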

Nonmonotonicity becomes even more apparent at a more detailed level of analysis. It should be apparent that the activation function for a connectionist processor is strongly related to the mechanisms that produce action potentials in neurons. Major components of these mechanisms are


voltage-gated ion channels in nerve membranes (for an introduction, see Levitan & Kaczmarek, 1991, pp. 51-124). These channels allow ionic currents to pass through them, and as a result affect a neuron's resting membrane potential. In turn, changes in membrane voltage affect the likelihood that these channels are open or closed. In some cases, such as the potassium channel considered by Levitan and Kaczmarek (Figure 3-9), the relationship between this likelihood and membrane voltage is monotonic. In other important cases, it is decidedly nonmonotonic. For instance, as membrane voltage becomes positive, voltage-gated sodium channels begin to open. As the voltage continues to increase, the channel becomes inactive. The nonmonotonicity of the sodium channel played an important role in Hodgkin and Huxley's (e.g., 1952) quantitative modelling of the action potential.

PDP learning rules are not limited in principle to monotonic activation functions. For instance, Dawson and Schopflocher (1992b) derived a modified version of the generalized delta rule that is capable of training networks of value units (i.e., processors with Gaussian activation functions). Radial basis function (RBF) networks (e.g., Moody & Darken, 1989) also have units with nonmonotonic activation functions, although they use a net input function that in effect makes them behave monotonically (see Dawson & Schopflocher, 1992b, Figure 2c). Nevertheless, nonmonotonic activation functions appear to be more the exception than the rule in PDP modelling. Unfortunately, this appears to be because monotonic activation functions lead more easily to practical training methods, and not because such properties are characteristic of neural substrates.
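The distinction between value units and RBF units comes down to the net input function. In the sketch below (ours; the weights, centre, and the Gaussian's parameterization are illustrative assumptions), the same Gaussian is applied once to an inner-product net input, where activation rises and then falls, and once to a distance net input, where activation can only fall as distance from the centre grows:

```python
import numpy as np

def gaussian(z):
    """Value-unit style Gaussian, peaked at z = 0."""
    return np.exp(-np.pi * z ** 2)

x = np.array([0.5, 0.25])

# Value unit: Gaussian over an inner-product net input. As net input sweeps
# past mu, activation rises and then falls, so the unit is nonmonotonic.
w, mu = np.array([1.0, -1.0]), 0.0
value_unit = gaussian(np.dot(w, x) - mu)

# RBF unit: Gaussian over a *distance* net input. Distance is nonnegative, so
# activation only decreases with distance; in effect the unit behaves monotonically.
center = np.array([1.0, -1.0])
rbf_unit = gaussian(np.linalg.norm(x - center))

print(value_unit, rbf_unit)
```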

Connectionists assume modifiable biases. A tenet of the PDP approach is that connection "weights are usually regarded as encoding the system's knowledge. In this sense, the connection strengths play the role of the program in a conventional computer" (Smolensky, 1988, p. 1). Thus the basic goal of a connectionist learning rule is to manipulate the pattern of connectivity in a network. This is in accordance with current understanding of actual neural circuits that are capable of learning. For example, experimental studies of the gill withdrawal reflex in Aplysia californica have indicated that learning alters the efficacy of synapses between neurons (for a review, see Dudai, 1989, Chap. 4).

However, when networks are trained by supervised learning procedures like the generalized delta rule (e.g., Rumelhart, Hinton & Williams, 1986a, 1986b), the pattern of connectivity in the network is not all that is changed. It is quite typical to modify the bias values of the activation function as well. For a sigmoid "squashing" function like the logistic, bias is a parameter that positions the activation function in net input space; changing bias is equivalent to translating the activation function along an axis representing net input. Biases can be modified by construing them as connection weights emanating from a "bias processor" that is always on (e.g., Rumelhart et al., 1986b, footnote 1). A processing unit's bias is manipulated by modifying the strength of the connection between the unit and its respective "bias processor". However, it is important to note that this is merely a description of how biases can be learned. "Bias processors" are not presumed to exist in a network. Instead, modifying the bias of a unit's activation function is analogous to directly modifying a neuron's threshold for generating an action potential.
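The "bias processor" description is easy to make concrete. In the sketch below (ours, for illustration), the bias is rewritten as a connection weight from a unit whose activation is always 1, so that any rule for learning weights learns the bias too:

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

x = np.array([0.3, 0.7])              # ordinary inputs
w = np.array([0.5, -0.2])             # ordinary connection weights
bias = 0.1

# Explicit bias term: translates the logistic along the net-input axis.
a1 = logistic(np.dot(w, x) + bias)

# Equivalent formulation: a "bias processor" that is always on (activation 1),
# whose connection weight *is* the bias. A weight-learning rule applied to the
# augmented weight vector therefore learns the bias as a side effect.
x_aug = np.append(x, 1.0)
w_aug = np.append(w, bias)
a2 = logistic(np.dot(w_aug, x_aug))

assert np.isclose(a1, a2)             # the two descriptions are identical
```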

The assumption that bias can be modified violates the connectionist tenet that all that matters are patterns of connectivity. This in itself is not problematic. The problem arises in justifying this design decision — in defending the existence of modifiable biases in the connectionist architecture. In point of fact, there is little evidence that threshold membrane potentials in real neural networks are modifiable. For example, Kupfermann, Castellucci, Pinsker and Kandel (1970) demonstrated that in the neural circuits that mediate the gill withdrawal reflex in Aplysia, the thresholds of motor neurons to constant external current did not change as a function of learning. It was concluded that learning only modified synaptic properties in this circuit. Similarly, neuroscientists concerned with learning in the mammalian brain have focused on a particular memory mechanism, the long term potentiation of synapses (for reviews, see Cotman, Monaghan & Ganong, 1988; Massicotte & Baudry, 1991). To our knowledge, neuroscientists do not believe that neuron thresholds are themselves plastic.

Why does PDP connectionism use modifiable bias terms, when this manoeuvre does not appear to be supported by extant neurophysiological evidence? The answer appears to be that without modifiable biases, some PDP networks are extremely difficult to train. For example, Dawson, Schopflocher, Kidd and Shamanski (1992) trained standard networks on the encoder problem. In a control condition, typical learning procedures were used, and processor biases were modified. In an experimental condition, the networks were identical, with the exception that after being assigned initial random values, all biases were fixed during the training session. While the control networks had little difficulty in learning solutions to the encoder problem, none of the experimental networks succeeded — even under a variety of training conditions (i.e., different learning rates, momentum values, starting states), and even with an extremely relaxed definition of learning to criterion.
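The logic of that experiment can be sketched in a few lines. The following toy program (ours; the 4-2-4 network size, learning rate, epoch count, and error measure are illustrative assumptions, not Dawson et al.'s settings) trains an encoder network with standard backpropagation, with a flag that freezes the biases at their initial random values. It merely reproduces the manipulation; any particular run's outcome will depend on the seed and the training parameters.

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_encoder(freeze_biases, epochs=20000, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X = np.eye(4)                                  # 4-2-4 encoder: targets equal inputs
    W1, b1 = rng.uniform(-1, 1, (4, 2)), rng.uniform(-1, 1, 2)
    W2, b2 = rng.uniform(-1, 1, (2, 4)), rng.uniform(-1, 1, 4)
    for _ in range(epochs):
        H = logistic(X @ W1 + b1)                  # hidden activations
        O = logistic(H @ W2 + b2)                  # output activations
        dO = (O - X) * O * (1 - O)                 # backpropagated error terms
        dH = (dO @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dO                        # connection weights always change...
        W1 -= lr * X.T @ dH
        if not freeze_biases:                      # ...biases only in the control condition
            b2 -= lr * dO.sum(axis=0)
            b1 -= lr * dH.sum(axis=0)
    return np.max(np.abs(O - X))                   # worst output error after training

print("modifiable biases, worst error:", train_encoder(freeze_biases=False))
print("frozen biases,     worst error:", train_encoder(freeze_biases=True))
```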

This is not to say that connectionist research in general is fundamentally flawed because it requires modifiable biases. Many architectures can use fixed thresholds in their processing units, including Hopfield nets (e.g., Hopfield, 1982), Boltzmann machines (e.g., Ackley, Hinton & Sejnowski, 1985), and value unit networks (e.g., Dawson et al., 1992). The critical issue is that modifiable biases are adopted in some architectures without being neurophysiologically justified. If it is indeed the case that plastic neural circuits do not have directly modifiable thresholds, then such justification is important, because without it certain architectures may be deemed uninteresting to cognitive science.

Connectionists adopt massively parallel patterns of connectivity. The history of connectionism can be presented in capsule form as follows: In the beginning, connectionist networks had no hidden units. Minsky and Papert (1969/1988) then proved that such networks had limited competence, and were thus not worthy of further study. The New Connectionism was born when learning rules for networks with hidden units were discovered (e.g., Ackley, Hinton & Sejnowski, 1985; Rumelhart et al., 1986a). These rules gave researchers the ability to teach networks that were powerful enough to overcome the Minsky/Papert limitations. (Detailed versions of this history are provided in Hecht-Nielsen, 1990, pp. 14-19; Papert, 1988.)

What is interesting about this history is that it pins the blame for the limitations of the networks created by Old Connectionism on the number of processing layers. It neglects the fact that Minsky and Papert (1969/1988) were extremely concerned with a different type of limitation, the limited-order constraint. Under this constraint, the neural network is restricted to being local — there is no single processing unit that can directly examine every input unit's activity. For example, Minsky and Papert (pp. 56-59) prove that to compute the parity predicate (i.e., to assert "true" if an odd number of input units has been activated), a network requires at least one processor to be directly connected to every input unit. Such proofs still hold for modern multilayer perceptrons (see Minsky & Papert, 1969/1988, pp. 251-252).
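The intuition behind this proof can be demonstrated in a few lines: flipping any single input unit reverses the parity predicate, so a processor that ignores even one input unit cannot decide it. A minimal demonstration (ours):

```python
from itertools import product

def parity(bits):
    """The parity predicate: true iff an odd number of input units is active."""
    return sum(bits) % 2 == 1

# Flipping any single bit of any input pattern reverses the parity predicate.
# A deciding unit restricted to a proper subset of the inputs (a limited-order
# unit) therefore cannot compute it: the inputs it ignores still matter.
for bits in product([0, 1], repeat=4):
    for i in range(4):
        flipped = list(bits)
        flipped[i] ^= 1
        assert parity(flipped) != parity(bits)
print("parity depends on every one of the inputs")
```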

There is no doubt that the New Connectionist models are more powerful than their antecedents. However, this increased power is not only due to the addition of layers of hidden units, but is also due to the violation of the limited-order constraint: in these new models, hidden units typically have direct connections to every input unit. The fact that PDP models permit such massively parallel patterns of connectivity between input units and hidden units is unfortunate. While it is true that this design decision will increase the network's competence, it does this at the expense of both biological and empirical plausibility. With respect to the former issue, there is no evidence to indicate that, in human sensory systems, massively parallel connections exist between receptor cells and the next layer of neurons. Indeed, computational modellers of visual processing attempt to increase the biological plausibility of their models by enforcing spatially local connections among processing units (e.g., Ullman, 1979). With respect to the latter issue, humans may indeed be subject to computational limits due to the limited-order constraint. For example, Minsky and Papert (1969/1988, p. 13) used a small set of very simple figures to prove that a perceptron of limited order cannot determine whether all the parts of any geometric figure are connected to one another. Psychophysical experiments have shown that preattentive visual processes involved in texture perception (e.g., Julesz, 1981) and motion perception (Dawson, 1990) are insensitive to this property as well. It is quite plausible to suppose that this insensitivity is related to the fact that the neural circuitry responsible for registering these figures is not massively parallel.

Connectionists assume homogeneous processing units. One of the interesting properties of PDP models is their homogeneity (for an exception, see the hybrid networks described by Dawson & Schopflocher, 1992b). It is typically the case that all of the units in PDP networks are of the same type, and that all of the changes that occur during learning in the network are governed by a single procedure. "The study of Connectionist machines has led to a number of striking and unanticipated findings; it's surprising how much computing can be done with a uniform network of simple interconnected elements" (Fodor & Pylyshyn, 1988, p. 6).

In some sense, the homogeneous structure of PDP networks can be construed as "neuronally inspired". At a macroscopic level, neurons themselves appear to be relatively homogeneous; Kuffler, Nicholls and Martin (1984, Chap. 1) note that the nervous system uses only two basic types of signals, which are virtually identical in all neurons. Furthermore, these signals appear to be common to an enormous range of animal species; much of our molecular understanding of neuronal mechanisms comes from the study of invertebrate systems. "The brain, then, is an instrument, made of 10¹⁰ to 10¹² components of rather uniform materials, that uses a few stereotyped signals. What seems so puzzling is how the proper assembly of the parts endows the instrument with the extraordinary properties that reside in the brain" (Kuffler, Nicholls & Martin, p. 7). The answer to this puzzle, according to both neurophysiologists and connectionists, lies in understanding the complex and specific patterns of connectivity between these homogeneous components. Getting (1989, p. 186) has noted that in the neuroscience of the late 1960's "the challenge of uncovering the secrets to brain function lay in the unravelling of neural connectivity".

However, a more microscopic analysis of (apparently) relatively simple plastic neural circuits has revealed that neural networks have properties that are far more diverse and complicated than was anticipated. "No longer can neural networks be viewed as the interconnection of many like elements by simple excitatory or inhibitory synapses" (Getting, 1989, p. 187). For example, Getting notes that there is an enormous variety of properties of neurons, synapses, and patterns of connectivity. These serve as the building blocks of neural circuits, and importantly can change as a function of both intracellular and extracellular contexts. As a result, a detailed mapping of the connectivity pattern in a neural network is not sufficient to understand its function. The functional connectivity in the network — the actual effects of one cell on another — can change as the properties of the network's "building blocks" are modulated, even though the anatomical connections in the network are fixed (see Getting, 1989, Figure 2 for a striking example).

Getting (1989, p. 199) has painted quite a different picture of neural networks than would appear to be reflected in the PDP architecture: "The comparative study of neural networks has led to a picture of neural networks as dynamic entities, constrained by their anatomical connectivity but, within these limits, able to be organized and configured into several operational modes". The dynamic changes in biological networks would appear to be computationally relevant. Thus, if connectionism is to make good its promise to provide a more biologically feasible architecture than is found in classical systems, it would appear that the generic architecture must be elaborated extensively. The assumption of homogeneous processors must be abandoned, and in its place should be processing units and connections that have diverse and dynamic properties.


Implementational Connectionism and Cognitive Science

McCloskey (1991) has argued that connectionist networks cannot be construed as either theories of cognitive functions or as simulations of such theories. McCloskey adopts two general strategies to support his view. First, he points out that unlike classical models in cognitive science, one does not require an a priori theory of a phenomenon to model it with a connectionist network. Second, he notes that the general inability to interpret the structure of trained networks prevents them from being useful as theories or explanations. "Attempts to tie theoretical claims to network implementations face a serious obstacle in the form of limited understanding of complex connectionist networks" (p. 389).

Recently, Seidenberg (1993) has responded to McCloskey's (1991) critique. Seidenberg suggests that within cognitive science there is a fundamental disagreement about what cognitive theories should do. He argues that McCloskey's criticisms are true of descriptive connectionism, which would have as its goal the clarification of existing theoretical constructs and empirical observations. Seidenberg claims, however, that connectionism has far more to contribute than this descriptive goal. He champions explanatory connectionism, which begins by appealing "to a small set of concepts that are independently motivated rather than task- or phenomenon-specific" (p. 230). The goal of explanatory connectionism is to demonstrate how a small set of general principles (i.e., the design decisions underlying the generic connectionist architecture) can account for a diverse range of cognitive phenomena. If this range is large enough, and interesting enough, then this indicates that the founding principles are candidates for necessary (and perhaps sufficient) properties of cognition.

In general, we are extremely sympathetic to Seidenberg's (1993) view of connectionist models, because he is proposing that researchers should be more concerned with understanding the basic properties of the model's architecture than with simulating specific behaviours in a piecemeal fashion (see also Dawson, 1991, p. 581). However, there does appear to be one major flaw in Seidenberg's position — the independent justification of connectionism's general principles. He points out that "given the present state of our understanding, these principles are largely concerned with properties of artificial neural networks" (p. 230). Unfortunately, this seems to put the cart before the horse. One would presume that the properties of artificial neural networks are precisely the general principles that connectionists should begin with, and that these principles are independently motivated in some other way. Furthermore, one would also presume that the independent motivation of these principles rests with their biological relevance or plausibility. Unfortunately, the preceding sections have shown that connectionists have largely ignored computationally relevant properties of neural systems when proposing the generic connectionist architecture. Seidenberg (1993, p. 231) argues that "a much bigger win would result, of course, if the general principles were themselves grounded in facts about neurobiology". In our view, this grounding is a prerequisite for explanatory connectionism, and is very far from being achieved. Without it, explanatory connectionism is not possible, and cognitive science does not win at all.

FROM CONNECTIONISM TO COGNITIVE SCIENCE

In the radical behaviourism proposed by Skinner, psychological theory amounted to accounts of environmental stimuli and of the observable behaviours they produced. Internal processes were not referred to; they could play no role in a psychological science. In many respects, the current approach to neural networks represents the rebirth of this endeavour. In our view, the primary concern of connectionists is generating appropriate stimulus/response relationships in PDP networks. Little attention is paid to the relationship between the structure of these networks and the nature of the processes underlying human cognition. We echo Hillis' (1988, p. 176) concern that connectionist networks allow "for the possibility of constructing intelligence without first understanding it". This is perfectly legitimate if a primary goal is merely to build artifacts that generate useful behaviour. Unfortunately, the theories of cognitive science must meet additional criteria.


Consider Pylyshyn's (e.g., 1980, 1984) position on comparing two information processing systems. If the two systems are weakly equivalent, they compute the same mapping from input to output, but do so using quite different procedures. If the two systems are strongly equivalent, not only do they compute the same input/output mapping, but they do so because they use identical procedures — specifically, the same program running on functionally equivalent architectures. Within this framework, cognitive science focuses on internal states and regularities as it strives for strongly equivalent models of psychological processes to account for human mentality. We are concerned that connectionists do not strive in this same direction, to the extent that their computational, algorithmic and implementational descriptions fail to go beyond input/output mappings. Connectionists appear to be more interested in developing systems that are at best weakly equivalent to human processes.

What is required of researchers if connectionist models are to be developed that at least have the potential to be strongly equivalent to human information processing? The same as is required of any modeller in cognitive science: they must provide evidence that the functional architecture of their networks is functionally equivalent to that of the systems they model. The preceding sections of this paper show that current connectionist research does not provide this type of evidence. We envision two directions in which a future research programme could develop to remedy this.

In one direction, connectionists would continue to distance themselves from Classical theorists by focusing on the biological plausibility of their networks. However, as we have shown above, such a focus requires substantial elaboration of proposals for PDP architectures. Many more computationally relevant properties must be considered, including nonmonotonic activation functions, fixed biases, and limited patterns of connectivity. These considerations have motivated our own research on the value unit architecture (Dawson & Schopflocher, 1992b; Dawson, Schopflocher, Kidd & Shamanski, 1992; Dawson, Shamanski & Medler, 1993). Specific proposals are required for incorporating learning rules directly into the architecture (see also Dawson & Schopflocher, 1992a). Different levels of processing may also need to be explored. For instance, the implicit assumption underlying much of connectionism is that processing units are analogous to neurons, and connections are analogous to synapses. However, the recent development of the silicon neuron resulted from describing processing at the level of ion gates in nerve membranes (Mahowald & Douglas, 1991). We are currently exploring connectionist systems in which processing unit activation indicates whether ion gates are open or closed, and in which connections represent ion currents.

In the other direction, connectionists could move towards an integration with the classical approach, by abandoning the notion that they offer a paradigm shift (cf. Schneider, 1987), and by treating their networks as active data structures or dynamic symbols capable of being manipulated serially by rules (cf. Bechtel, 1988; Hawthorne, 1989). Such an integration requires substantial development of principles governing interactions between networks, and a willingness to reject the uniformity hypothesis that all cognition is explicable in terms of "generic" connectionism's architecture (see Clark, 1989, p. 128). From this perspective, PDP networks would fulfil the same theoretical role in cognitive science as have other proposals for representational primitives, such as schemas and images. There is emerging consensus among vision researchers that hybrid models are required to account for existing data on human perception (e.g., Hurlbert & Poggio, 1985; Pylyshyn, 1989; Treisman, 1986). Perhaps the dynamic properties of networks-as-symbols could be used to provide a rigorous framework for such models.

ACKNOWLEDGEMENTS

This paper was supported by Natural Sciences and Engineering Research Council of Canada operating grant 2038 and equipment grant 46584, both awarded to the first author. We would like to thank the following members of the Biological Computation Project for their helpful comments: Istvan Berkeley, Matthew Duncan, Tim Gannon, David Hall, James Kidd, and Don Schopflocher. Thanks as well to Nancy Digdon and Dallas Treit. Address reprint requests to Dr. Michael Dawson, Biological Computation Project, Department of Psychology, University of Alberta, Edmonton, AB, CANADA T6G 2E9. Electronic mail: [email protected].

REFERENCES

Abu-Mostafa, Y.S. and Psaltis, D., 1987, Optical neural computers, Scientific American, 256(3), 88-95.
Ackley, D.H., Hinton, G.E. and Sejnowski, T.J., 1985, A learning algorithm for Boltzmann machines, Cognitive Science, 9, 147-169.
Anderson, J.A., 1972, A simple neural network generating an interactive memory, Mathematical Biosciences, 14, 197-220.
Anderson, J.A. and Rosenfeld, E., 1988, Neurocomputing: Foundations of research, Cambridge, MA, MIT Press.
Anderson, J.A., Silverstein, J.W., Ritz, S.R. and Jones, R.S., 1977, Distinctive features, categorical perception, and probability learning: Some applications of a neural model, Psychological Review, 84, 413-451.
Antrobus, J., 1991, Dreaming: cognitive processes during cortical activation and high afferent thresholds, Psychological Review, 98, 96-121.
Ballard, D.H., 1986, Cortical connections and parallel processing: Structure and function, Behavioural and Brain Sciences, 9, 67-120.
Barnard, E. and Casasent, D., 1989, A comparison between criterion functions for linear classifiers, with an application to neural nets, IEEE Transactions on Systems, Man, and Cybernetics, 19, 834-846.
Barto, A.G., Sutton, R.S. and Anderson, C.W., 1983, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, 13, 835-846.
Bechtel, W., 1985, Contemporary connectionism: Are the new parallel distributed processing models cognitive or associationist? Behaviourism, 13, 53-61.
Bechtel, W., 1988, Connectionism and rules and representation systems: are they compatible? Philosophical Psychology, 1, 5-16.


Bechtel, W. and Abrahamsen, A., 1991, Connectionism and the mind, Cambridge, MA, Basil Blackwell.
Bengio, Y. and de Mori, R., 1989, Use of multilayer networks for the recognition of phonetic features and phonemes, Computational Intelligence, 5, 134-141.
Besner, D., Twilley, L., McCann, R.S. and Seergobin, K., 1990, On the association between connectionism and data: Are a few words necessary? Psychological Review, 97, 432-446.
Bever, T.G., Fodor, J.A. and Garrett, M., 1968, A formal limitation of associationism, in: T.R. Dixon and D.L. Horton, eds., Verbal behaviour and general behaviour theory, Englewood Cliffs, NJ, Prentice-Hall.
Braham, R. and Hamblen, J.O., 1990, The design of a neural network with a biologically motivated architecture, IEEE Transactions on Neural Networks, 1, 251-262.
Braitenberg, V., 1984, Vehicles, Cambridge, MA, MIT Press.
Broadbent, D., 1985, A question of levels: Comment on McClelland and Rumelhart, Journal of Experimental Psychology: General, 114, 189-192.
Brooks, R.A., 1989, A robot that walks; emergent behaviours from a carefully evolved network, Neural Computation, 1, 253-262.
Carpenter, G.A., 1989, Neural network models for pattern recognition and associative memory, Neural Networks, 2, 243-257.
Churchland, P.S. and Sejnowski, T., 1989, Neural representation and neural computation, in: L. Nadel, L.A. Cooper, P. Culicover and R.M. Harnish, eds, Neural connections, mental computation, Cambridge, MA, MIT Press.
Clark, A., 1989, Microcognition, Cambridge, MA, MIT Press.
Cohen, J.D., Dunbar, K. and McClelland, J.L., 1991, On the control of automatic processes: A parallel distributed processing account of the Stroop effect, Psychological Review, 97, 332-361.
Cotman, C.W., Monaghan, D.T. and Ganong, A.H., 1988, Excitatory amino acid neurotransmission: NMDA receptors and Hebb-type synaptic plasticity, Annual Review of Neuroscience, 11, 61-80.


Cotter, N.E., 1990, The Stone-Weierstrass theorem and its application to neural networks, IEEE Transactions on Neural Networks, 1, 290-295.
Cowan, J.D. and Sharp, D.H., 1988, Neural nets and artificial intelligence, in: S. Graubard, ed, The artificial intelligence debate, Cambridge, MA, MIT Press.
Crick, F. and Asanuma, C., 1986, Certain aspects of the anatomy and physiology of the cerebral cortex, in: J. McClelland, D. Rumelhart and the PDP Group, eds, Parallel Distributed Processing, V.2, Cambridge, MA, MIT Press.
Cummins, R., 1983, The nature of psychological explanation, Cambridge, MA, MIT Press.
Cybenko, G., 1989, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, 2, 303-314.
Dawson, M.R.W., 1990, Apparent motion and element connectedness, Spatial Vision, 4, 241-251.
Dawson, M.R.W., 1991, The how and why of what went where in apparent motion: Modelling solutions to the motion correspondence problem, Psychological Review, 98, 569-603.
Dawson, M.R.W., Kremer, S. and Gannon, T., 1993, Identifying the trigger features for hidden units in a PDP model of the early visual pathway, manuscript in preparation.
Dawson, M.R.W. and Schopflocher, D.P., 1992a, Autonomous processing in PDP networks, Philosophical Psychology, 5, 199-219.
Dawson, M.R.W. and Schopflocher, D.P., 1992b, Modifying the generalized delta rule to train networks of nonmonotonic processors for pattern classification, Connection Science, 4, 19-31.
Dawson, M.R.W., Schopflocher, D.P., Kidd, J. and Shamanski, K.S., 1992, Training networks of value units, Proceedings of the Ninth Canadian Artificial Intelligence Conference, 244-250.
Dawson, M.R.W., Shamanski, K.S. and Medler, D.A., 1993, From connectionism to cognitive science, Proceedings of the Fifth University of New Brunswick Artificial Intelligence Symposium, in press.


Dell, G.S., 1986, A spreading-activation theory of retrieval in sentence production, Psychological Review, 93, 283-321.
Douglas, R.J. and Martin, K.A.C., 1991, Opening the grey box, Trends in Neuroscience, 14, 286-293.
Dudai, Y., 1989, The neurobiology of memory, New York, Oxford University Press.
Eich, J.M., 1982, A composite holographic associative recall model, Psychological Review, 89, 627-661.
Fodor, J.A. and Pylyshyn, Z.W., 1988, Connectionism and cognitive architecture: A critical analysis, Cognition, 28, 3-71.
Funahashi, K., 1989, On the approximate realization of continuous mappings by neural networks, Neural Networks, 2, 183-192.
Getting, P.A., 1989, Emerging principles governing the operation of neural networks, Annual Review of Neuroscience, 12, 185-204.
Girosi, F. and Poggio, T., 1990, Networks and the best approximation property, Biological Cybernetics, 63, 169-176.
Granger, R., Ambros-Ingerson, J. and Lynch, G., 1989, Derivation of encoding characteristics of layer II cerebral cortex, Journal of Cognitive Neuroscience, 1, 61-87.
Grossberg, S., 1980, How does the brain build a cognitive code? Psychological Review, 87, 1-51.
Grossberg, S., 1991, Why do parallel cortical systems exist for the perception of static form and moving form? Perception & Psychophysics, 49, 117-141.
Grossberg, S. and Rudd, M., 1989, A neural architecture for visual motion perception: Group and element apparent motion, Neural Networks, 2, 421-450.
Grossberg, S. and Rudd, M., 1992, Cortical dynamics of visual motion perception: Short-range and long-range apparent motion, Psychological Review, 99, 78-121.
Hagiwara, M., 1990, Novel backpropagation algorithm for reduction of hidden units and acceleration of convergence using artificial search, Proceedings of the IEEE Joint Conference on Neural Networks, Vol. I, 625-630.


Hanson, S.J. and Burr, D.J., 1990, What connectionist models learn: Learning and representation in connectionist networks, Behavioural and Brain Sciences, 13, 471-518.
Hanson, S.J. and Olson, C.R., 1991, Neural networks and natural intelligence: Notes from Mudville, Connection Science, 3, 332-335.
Hartman, E., Keeler, J.D. and Kowalski, J.M., 1989, Layered neural networks with Gaussian hidden units as universal approximators, Neural Computation, 2, 210-215.
Haugeland, J., 1985, Artificial intelligence: The very idea, Cambridge, MA, MIT Press.
Hawthorne, J., 1989, On the compatibility of connectionist and classical models, Philosophical Psychology, 2, 5-15.
Hecht-Nielsen, R., 1990, Neurocomputing, Reading, MA, Addison-Wesley.
Hillis, W.D., 1985, The connection machine, Cambridge, MA, MIT Press.
Hillis, W.D., 1988, Intelligence as emergent behavior, or, the songs of Eden, in: S.R. Graubard, ed, The artificial intelligence debate, Cambridge, MA, MIT Press.
Hinton, G.E. and Shallice, T., 1991, Lesioning an attractor network: Investigations of acquired dyslexia, Psychological Review, 98, 74-95.
Hodgkin, A.L. and Huxley, A.F., 1952, A quantitative description of membrane current and its application to conduction and excitation in nerve, Journal of Physiology, 117, 500-544.
Hopcroft, J.E. and Ullman, J.D., 1979, Introduction to automata theory, languages, and computation, Reading, MA, Addison-Wesley.
Hopfield, J.J., 1982, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, USA, 79, 2554-2558.
Hornik, K., Stinchcombe, M. and White, H., 1989, Multilayer feedforward networks are universal approximators, Neural Networks, 2, 359-366.
Hurlbert, A. and Poggio, T., 1985, Spotlight on attention, Trends in Neurosciences, 8, 309-311.
Jabri, M. and Flower, B., 1991, Weight perturbation: An optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks, Neural Computation, 3, 546-565.
Jain, A.N., 1991, Parsing complex sentences with structured connectionist networks, Neural Computation, 3, 110-120.
Johnson-Laird, P.N., 1983, Mental models, Cambridge, MA, Harvard University Press.
Jordan, M.I., 1986, An introduction to linear algebra in parallel distributed processing, in: D. Rumelhart, J. McClelland and the PDP Group, eds, Parallel Distributed Processing, V.1, Cambridge, MA, MIT Press.
Julesz, B., 1981, Textons, the elements of texture perception, and their interactions, Nature, 290, 91-97.
Kehoe, E.J., 1988, A layered network model of associative learning: Learning to learn and configuration, Psychological Review, 95, 411-433.
Knapp, A. and Anderson, J.A., 1984, A signal averaging model for concept formation, Journal of Experimental Psychology: Learning, Memory and Cognition, 10, 616-637.
Kohonen, T., 1977, Associative memory: A system-theoretical approach, New York, Springer-Verlag.
Kruschke, J.K., 1990, How connectionist models learn: The course of learning in connectionist networks, Behavioural and Brain Sciences, 13, 498-499.
Kuffler, S.W., Nicholls, J.G. and Martin, A.R., 1984, From neuron to brain, 2nd edition, Sunderland, MA, Sinauer Associates.
Kupfermann, I., Castellucci, V., Pinsker, H. and Kandel, E.R., 1970, Neuronal correlates of habituation and dishabituation of the gill-withdrawal reflex in Aplysia, Science, 167, 1743-1745.
Lachter, J. and Bever, T.G., 1988, The relation between linguistic structure and associative theories of language learning — A constructive critique of some connectionist learning models, Cognition, 28, 195-247.
Levelt, W.J.M., 1990, Are multilayer feedforward networks effectively Turing machines? Psychological Research, 52, 153-157.
Levitan, I.B. and Kaczmarek, L.K., 1991, The neuron: Cell and molecular biology, New York, Oxford University Press.


Lewandowsky, S., 1993, The rewards and hazards of computer simulations, Psychological Science, 4, 236-243.
Lippmann, R.P., 1987, An introduction to computing with neural nets, IEEE ASSP Magazine, April, 4-22.
Lippmann, R.P., 1989, Pattern classification using neural networks, IEEE Communications Magazine, November, 47-64.
Lucas, S.M. and Damper, R.I., 1990, Syntactic neural networks, Connection Science, 2, 195-221.
Lynch, G., Granger, R., Larson, J. and Baudry, M., 1989, Cortical encoding of memory: Hypotheses derived from analysis and simulation of physiological learning rules in anatomical structures, in: L. Nadel, L.A. Cooper, P. Culicover and R.M. Harnish, eds, Neural connections, mental computation, Cambridge, MA, MIT Press.
Mahowald, M. and Douglas, R., 1991, A silicon neuron, Nature, 354, 515-518.
Marr, D., 1982, Vision, San Francisco, W.H. Freeman.
Massaro, D.W., 1988, Some criticisms of connectionist models of human performance, Journal of Memory and Language, 27, 213-234.
Massicotte, G. and Baudry, M., 1991, Triggers and substrates of hippocampal synaptic plasticity, Neuroscience and Biobehavioural Reviews, 15, 415-423.
McClelland, J.L. and Rumelhart, D.E., 1988, Explorations in parallel distributed processing, Cambridge, MA, MIT Press.
McClelland, J.L., Rumelhart, D.E. and Hinton, G.E., 1986, The appeal of parallel distributed processing, in: D. Rumelhart, J. McClelland and the PDP Group, eds, Parallel Distributed Processing, V.1, Cambridge, MA, MIT Press.
McCloskey, M., 1991, Networks and theories: The place of connectionism in cognitive science, Psychological Science, 2, 387-395.
McCulloch, W.S. and Pitts, W., 1988, A logical calculus of the ideas immanent in nervous activity, in: J. Anderson and E. Rosenfeld, eds, Neurocomputing: Foundations of research, Cambridge, MA, MIT Press. (Originally published in 1943.)


Minsky, M., 1972, Computation: Finite and infinite machines, London, Prentice-Hall.
Minsky, M. and Papert, S., 1988, Perceptrons, Cambridge, MA, MIT Press. (Originally published in 1969.)
Moody, J. and Darken, C.J., 1989, Fast learning in networks of locally-tuned processing units, Neural Computation, 1, 281-294.
Moorhead, I.R., Haig, N.D. and Clement, R.A., 1989, An investigation of trained neural networks from a neurophysiological perspective, Perception, 18, 793-803.
Mozer, M.C. and Smolensky, P., 1989, Using relevance to reduce network size automatically, Connection Science, 1, 3-16.
Müller, B. and Reinhardt, J., 1990, Neural networks, Berlin, Springer-Verlag.
Murdock, B.B., 1982, A theory for the storage and retrieval of item and associative information, Psychological Review, 89, 609-626.
Newell, A., 1980, Physical symbol systems, Cognitive Science, 4, 135-183.
Papert, S., 1988, One AI or many? in: S.R. Graubard, ed, The artificial intelligence debate, Cambridge, MA, MIT Press.
Pinker, S. and Prince, A., 1988, On language and connectionism: Analysis of a parallel distributed processing model of language acquisition, Cognition, 28, 73-193.
Poggio, T. and Girosi, F., 1989, A theory of networks for approximation and learning, MIT AI Lab Memo No. 1140.
Poggio, T. and Girosi, F., 1990, Regularization algorithms for learning that are equivalent to multilayer networks, Science, 247, 978-982.
Pomerleau, D.A., 1991, Efficient training of artificial neural networks for autonomous navigation, Neural Computation, 3, 88-97.
Pylyshyn, Z.W., 1980, Computation and cognition: Issues in the foundations of cognitive science, Behavioural and Brain Sciences, 3, 111-169.
Pylyshyn, Z.W., 1984, Computation and cognition, Cambridge, MA, MIT Press.
Pylyshyn, Z.W., 1989, The role of location indexes in spatial perception: A sketch of the FINST spatial-index model, Cognition, 32, 65-97.


Rager, J. and Berg, G., 1990, A connectionist model of motion and government in Chomsky's government-binding theory, Connection Science, 2, 35-52.
Reeke, G.N. and Edelman, G.M., 1988, Real brains and artificial intelligence, in: S.R. Graubard, ed, The artificial intelligence debate, Cambridge, MA, MIT Press.
Renals, S., 1989, Radial basis function network for speech pattern classification, Electronics Letters, 25, 437-439.
Rosenblatt, F., 1962, Principles of neurodynamics, Washington, Spartan Books.
Rumelhart, D.E., McClelland, J.L. and the PDP Group, 1986, Parallel Distributed Processing, V.1, Cambridge, MA, MIT Press.
Rumelhart, D.E., Hinton, G.E. and McClelland, J.L., 1986, A general framework for parallel distributed processing, in: D. Rumelhart, J. McClelland and the PDP Group, eds, Parallel Distributed Processing, V.1, Cambridge, MA, MIT Press.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J., 1986a, Learning representations by back-propagating errors, Nature, 323, 533-536.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J., 1986b, Learning internal representations by error backpropagation, in: D. Rumelhart, J. McClelland and the PDP Group, eds, Parallel Distributed Processing, V.1, Cambridge, MA, MIT Press.
Rumelhart, D.E. and McClelland, J.L., 1986, On learning the past tenses of English verbs, in: J. McClelland, D. Rumelhart and the PDP Group, eds, Parallel Distributed Processing, V.2, Cambridge, MA, MIT Press.
Rumelhart, D.E. and McClelland, J.L., 1985, Levels indeed! A response to Broadbent, Journal of Experimental Psychology: General, 114, 193-197.
Rumelhart, D.E., Smolensky, P., McClelland, J.L. and Hinton, G.E., 1986, Schemata and sequential thought processes in PDP models, in: J. McClelland, D. Rumelhart and the PDP Group, eds, Parallel Distributed Processing, V.2, Cambridge, MA, MIT Press.
Schneider, W., 1987, Connectionism: Is it a paradigm shift for psychology? Behaviour Research Methods, Instruments and Computers, 19, 73-83.
Seidenberg, M., 1993, Connectionist models and cognitive theory, Psychological Science, 4, 228-235.
Seidenberg, M.S. and McClelland, J.L., 1989, A distributed, developmental model of word recognition and naming, Psychological Review, 96, 523-568.
Seidenberg, M.S. and McClelland, J.L., 1990, More words but still no lexicon: Reply to Besner et al., 1990, Psychological Review, 97, 447-452.
Selfridge, O.G., 1956, Pattern recognition and learning, in: C. Cherry, ed, Information theory, London, Butterworths Scientific Publications.
Servan-Schreiber, D., Printz, H. and Cohen, J.D., 1990, A network model of catecholamine effects: Gain, signal-to-noise ratio and behavior, Science, 249, 892-895.
Sietsma, J. and Dow, R.J.F., 1988, Neural net pruning — why and how, Proceedings of the IEEE Joint International Conference on Neural Networks, Vol. I, 325-333.
Smolensky, P., 1988, On the proper treatment of connectionism, Behavioural and Brain Sciences, 11, 1-74.
Steinbuch, K., 1961, Die Lernmatrix, Kybernetik, 1, 36-45.
Taylor, W.K., 1956, Electrical simulation of some nervous system functional activities, in: C. Cherry, ed, Information theory, London, Butterworths Scientific Publications.
Treisman, A., 1986, Features and objects in visual processing, Scientific American, 255(5), 114B-125.
Ullman, S., 1979, The interpretation of visual motion, Cambridge, MA, MIT Press.
