

Connectionism

THEMIS N. KARAMINIS, MICHAEL S.C. THOMAS
Department of Psychological Sciences, Birkbeck College, University of London, London, UK

Synonyms
(Artificial) Neural network modeling; Connectionist modeling; Neural nets; Parallel Distributed Processing (PDP)

Definition
Connectionism is an interdisciplinary approach to the study of cognition that integrates elements from the fields of artificial intelligence, neuroscience, cognitive psychology, and philosophy of mind. As a theoretical movement in cognitive science, connectionism suggests that cognitive phenomena can be explained with respect to a set of general information-processing principles, known as parallel distributed processing (Rumelhart et al. 1986a). From a methodological point of view, connectionism is a framework for studying cognitive phenomena using architectures of simple processing units interconnected via weighted connections.

These architectures present analogies to biological neural systems and are referred to as (Artificial) Neural Networks. Connectionist studies typically propose and implement neural network models to explain various aspects of cognition. The term connectionism stems from the proposal that cognition emerges in neural network models as the product of a learning process which shapes the values of the weighted connections.

Connectionism supports the idea that knowledge is represented in the weights of the connections between the processing units in a distributed fashion. This means that knowledge is encoded in the structure of the processing system, in contrast to the symbolic approach, where knowledge is readily shifted between different memory registers.

Theoretical Background
Artificial Neural Networks are abstract models of biological neural systems. They consist of a set of identical units, referred to as artificial neurons or processing units, which are interconnected via weighted connections.

A great deal of biological complexity is omitted in artificial neural network models. For example, artificial neurons perform the simple function of discriminating between different levels of input activation. The detector model of the neuron (Fig. 1) is a crude approximation of the role of dendrites and synaptic channels in biological neurons. According to this model, each neuron receives a number of inputs from other neurons and integrates them by computing a weighted sum of the activations of the sending neurons. Based on the value of this total input activation, an activation function (e.g., a threshold function) determines the level of the neuron's output activation, which is then propagated to succeeding neurons.
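The operation of such a unit can be written in a few lines of code. The sketch below is purely illustrative and not from the original chapter; the function names and the choice of a simple step-threshold activation are assumptions:

# Detector model of the artificial neuron: a weighted sum of the
# input activations is passed through a threshold activation function.

def threshold_activation(net_input, threshold=0.0):
    # The unit 'fires' (outputs 1.0) only if its total input
    # activation exceeds the threshold.
    return 1.0 if net_input > threshold else 0.0

def artificial_neuron(inputs, weights, threshold=0.0):
    # Integrate the activations of the sending neurons as a weighted sum,
    # then let the activation function determine the output activation.
    net_input = sum(x * w for x, w in zip(inputs, weights))
    return threshold_activation(net_input, threshold)

# Three sending neurons with activities x1..x3 and weights w1..w3, as in Fig. 1.
print(artificial_neuron([1.0, 0.0, 1.0], [0.5, -0.3, 0.8]))  # 1.0, since 0.5 + 0.8 > 0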
The pattern of connectivity between the processing units defines the architecture of the neural network and the input–output functions it can perform. The processing units are usually arranged in layers; notably, a layered structure has also been observed in neural tissue. Many different neural network architectures have been implemented in the connectionist literature. One that has been particularly common is the three-layer feed-forward neural network (Fig. 2). In this network, the units are arranged in three layers: input, hidden, and output. The connectivity is feed-forward, which means that the connections are unidirectional, connecting the input to the hidden layer and the hidden to the output layer. The connectivity is also full: every neuron of a given layer is connected to every neuron of the next layer.
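A forward pass through such an architecture can be sketched as follows. This is a hypothetical fragment rather than code from the chapter; the sigmoid activation function and the particular weight values are assumptions:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(activations, weights):
    # Full, feed-forward connectivity: every receiving unit j computes a
    # weighted sum over all sending units i (weights[j][i]) and applies
    # the activation function.
    return [sigmoid(sum(w * a for w, a in zip(row, activations)))
            for row in weights]

# A 3-4-2 network as in Fig. 2, with arbitrary illustrative weights.
w_input_hidden = [[0.2, -0.1, 0.4],
                  [0.7, 0.3, -0.5],
                  [-0.2, 0.8, 0.1],
                  [0.5, -0.6, 0.9]]
w_hidden_output = [[0.3, -0.4, 0.8, 0.1],
                   [-0.7, 0.2, 0.5, -0.3]]

hidden = layer_forward([1.0, 0.5, 0.0], w_input_hidden)   # four hidden activations
output = layer_forward(hidden, w_hidden_output)           # two output activations
print(output)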

A key property of neural networks is their ability to learn. Learning in neural networks is based on altering the extent to which a given neuron's activity alters the activity of the neurons to which it is connected. Learning is performed by a learning algorithm, which determines the changes in the weight values required to perform a set of input–output mappings. For example, the Back-propagation of Error algorithm (Rumelhart et al. 1986b) can be used to train a feed-forward multilayered network (Fig. 2) using supervised learning. For this type of learning, the learning algorithm presents the network with pairs of input patterns and desired output patterns (or targets). The algorithm computes the output error, i.e., the difference between the actual output of the network and the targets. Next, the algorithm propagates appropriate error signals back down through each layer of the network. These error signals are used to determine the weight changes necessary to minimize the output error. For a more detailed discussion of learning in neural networks, see ▶Connectionist Theories of Learning.
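A compact, self-contained sketch of this training procedure is given below. It is an illustration under assumptions (sigmoid activations, added bias units, and the XOR mapping as a toy training set), not the implementation used in the models discussed here:

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_ih, w_ho):
    xb = x + [1.0]        # constant 'bias' unit appended to the input layer
    hidden = [sigmoid(sum(w * a for w, a in zip(row, xb))) for row in w_ih]
    hb = hidden + [1.0]   # bias unit for the output layer
    output = [sigmoid(sum(w * h for w, h in zip(row, hb))) for row in w_ho]
    return hidden, output

def train_pattern(x, target, w_ih, w_ho, lr=0.5):
    # One supervised learning step: compute the output error and propagate
    # error signals back through the network to determine weight changes.
    hidden, output = forward(x, w_ih, w_ho)
    xb, hb = x + [1.0], hidden + [1.0]
    # Output error signals: (target - actual) scaled by the sigmoid slope.
    d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Hidden error signals: each hidden unit's share of the output error.
    d_hid = [h * (1.0 - h) * sum(d * w_ho[k][j] for k, d in enumerate(d_out))
             for j, h in enumerate(hidden)]
    for k, d in enumerate(d_out):
        for j, h in enumerate(hb):
            w_ho[k][j] += lr * d * h
    for j, d in enumerate(d_hid):
        for i, a in enumerate(xb):
            w_ih[j][i] += lr * d * a

# Train a 2-4-1 network on XOR, a mapping that a network without
# a hidden layer cannot learn.
random.seed(1)
w_ih = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 2 inputs + bias
w_ho = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(1)]  # 4 hidden + bias
patterns = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
for _ in range(5000):
    for x, t in patterns:
        train_pattern(x, t, w_ih, w_ho)
for x, t in patterns:
    print(x, round(forward(x, w_ih, w_ho)[1][0], 2))  # should approach 0, 1, 1, 0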

Other issues considered in neural network modeling concern the representation of the learning environment. For example, a localist or a distributed scheme can be used to represent different entities. In the former, a single unit is used to encode an entity, while in the latter an entity is encoded by an activation pattern across multiple units. Furthermore, the different input–output patterns which compose the learning environment can be presented in different ways (e.g., sequentially, randomly with replacement, incrementally, or based on a frequency structure).
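The contrast between the two schemes can be made concrete with a small invented example; the feature set below is hypothetical and purely illustrative:

# Localist scheme: one dedicated unit per entity (one-hot patterns),
# so different entities share no structure.
localist = {
    "cat": [1, 0, 0, 0],
    "dog": [0, 1, 0, 0],
}

# Distributed scheme: each entity is an activation pattern across many
# units, so similar entities overlap (hypothetical features: animate,
# four-legged, barks, meows).
distributed = {
    "cat": [1, 1, 0, 1],
    "dog": [1, 1, 1, 0],
}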

Important Scientific Research and Open Questions
The concept of neural network computation was initially proposed in the 1940s. However, the foundations for the systematic application of neural networks to the exploration of cognition were laid several decades later by the influential volumes of Rumelhart, McClelland, and colleagues. Following this seminal work, a large number of studies proposed neural network models to address various cognitive phenomena.

Although connectionist models are inspired by computation in biological neural systems, they operate at a high level of abstraction and therefore cannot claim biological plausibility. Connectionist models are usually seen as cognitive models, which explain cognition on the basis of general information-processing principles. One of the main strengths of connectionism is that its neural network models are not merely verbally specified but implemented. In this way, they are able to offer elaborate mechanistic explanations for the structure of cognition and cognitive development. They also allow the detailed study of developmental disorders, by considering training under atypical initial computational constraints, and of acquired deficits, by introducing 'damage' to trained models.
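One simple form of such 'damage' is to sever a proportion of a trained network's connections. The fragment below is a hypothetical illustration of this idea, not a method specified in the chapter:

import random

def lesion(weights, proportion=0.3, rng=None):
    # Simulate an acquired deficit by severing (zeroing) a random
    # proportion of the weighted connections of a trained network.
    rng = rng or random.Random(42)
    return [[0.0 if rng.random() < proportion else w for w in row]
            for row in weights]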
One of the most influential connectionist models is that of Rumelhart and McClelland (1986) for the acquisition of the English past tense (Fig. 3). The domain of the English past tense is of theoretical interest to psycholinguists because it presents a predominant regularity, with the great majority of verbs forming their past tenses through a stem-suffixation rule (e.g., walk/walked). However, a significant group of verbs form their past tenses irregularly (e.g., swim/swam, hit/hit, is/was).

Rumelhart and McClelland trained a two-layered feed-forward network (a pattern associator) on mappings between phonological representations of the stems of English verbs and the corresponding past tense forms. They showed that both regular and irregular inflections could be learned by this network. Furthermore, they argued that their model reproduced a series of well-established phenomena in empirical studies of language acquisition. For example, the past tense rule was generalized to novel stems, while the learning of irregular verbs followed a U-shaped pattern (an initial period of error-free performance succeeded by a period of increased occurrence of overgeneralization errors, e.g., think/thinked instead of thought).

The success of this model in simulating the acquisition of the English past tense demonstrated that an explicit representation of rules is not necessary for the acquisition of morphology. Instead, rule-like behavior emerged as the product of the statistical properties of the input–output mappings. The Rumelhart and McClelland (1986) model thus posed a serious challenge to existing 'symbolic' views, which maintained that the acquisition of morphology was supported by two separate mechanisms, an account also referred to as the dual-route model. According to the dual-route model, a rule-based system is involved in the learning of regular mappings, while a rote memory is involved in the learning of irregular mappings. A vigorous debate, also known as the 'past tense debate,' ensued in the field of language acquisition (cf. Pinker and Prince 1988). By the time this debate subsided, connectionist studies had moved on to addressing many aspects of the acquisition of the past tense and of inflectional morphology in greater detail. For example, Thomas and Karmiloff-Smith (2003) incorporated phonological and lexical-semantic information in the input of a three-layered feed-forward network and studied the conditions under which an atypical developmental profile could be reproduced, as a way of investigating the potential causes of developmental language impairments.

Another important connectionist model is the simple recurrent network (SRN) proposed by Elman (1990). The significance of this network lies in its ability to represent time and to address problems which involve the processing of sequences. As shown in Fig. 4, the SRN uses a three-layered feed-forward architecture in which an additional layer of 'context units' is connected to the hidden layer with recurrent connections. Time is separated into discrete slices. On each time slice, the activation of the hidden layer from the previous time slice is given as input to the network via the context layer. In this way, the SRN is able to process a new input in the context of the full history of previous inputs. This allows the network to learn statistical relationships across sequences in the input.
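The core mechanism, copying the hidden activations into a context layer that feeds back into the hidden layer on the next time slice, can be sketched as follows. This is a minimal hypothetical fragment; the sigmoid activation and the random weight values are assumptions:

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def srn_step(x, context, w_in, w_ctx, w_out):
    # One time slice: the hidden layer integrates the current input
    # together with the context layer (the previous hidden activations).
    hidden = [sigmoid(sum(wi * a for wi, a in zip(r_in, x)) +
                      sum(wc * c for wc, c in zip(r_ctx, context)))
              for r_in, r_ctx in zip(w_in, w_ctx)]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, output

def process_sequence(sequence, w_in, w_ctx, w_out, n_hidden):
    context = [0.5] * n_hidden          # neutral initial context activations
    outputs = []
    for x in sequence:
        hidden, out = srn_step(x, context, w_in, w_ctx, w_out)
        context = hidden[:]             # the 'copy' connections of Fig. 4
        outputs.append(out)
    return outputs

# A 2-3-1 SRN with random illustrative weights, processing a short sequence.
random.seed(0)
w_in = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
w_ctx = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
w_out = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1)]
print(process_sequence([[1, 0], [0, 1], [1, 1]], w_in, w_ctx, w_out, 3))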


Acknowledgments
The studies of the first author are funded by the Greek State Scholarship Foundation (IKY). The work of the second author is supported by UK MRC Grant G0300188.

Cross-References
▶ Computational Models of Human Learning
▶ Connectionist Theories of Learning
▶ Developmental Cognitive Neuroscience and Learning
▶ Human Cognition and Learning
▶ Learning in Artificial Neural Networks

References
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73–193.
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs. In J. L. McClelland, D. E. Rumelhart, & The PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models (pp. 216–271). Cambridge, MA: MIT Press.
Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986a). A general framework for parallel distributed processing. In D. E. Rumelhart, J. L. McClelland, & The PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations (pp. 45–76). Cambridge, MA: MIT Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986b). Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, & The PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations (pp. 318–362). Cambridge, MA: MIT Press.
Thomas, M. S. C., & Karmiloff-Smith, A. (2003). Modeling language acquisition in atypical phenotypes. Psychological Review, 110(4), 647–682.


Connectionism. Fig. 1 The detector model of the real neuron. Input activities x1, x2, x3 arrive from other neurons over weighted connections w1, w2, w3; the unit computes the weighted sum of its inputs (Σ xiwi) and passes it through a threshold activation function to produce the output activity y, which is sent on to other neurons

Connectionism. Fig. 2 A three-layered feed-forward neural network with three units in the input layer, four units in the hidden layer, and two units in the output layer


Connectionism. Fig. 3 The Rumelhart and McClelland (1986) model for the learning of the English past tense. The core of the model is a two-layered feed-forward network (pattern associator) which learns mappings between coarse-coded distributed representations ('Wickelfeature' representations) of verb roots and past tense forms; an encoding network maps the phonological representation of the verb root onto its Wickelfeature representation, and a decoding network maps the Wickelfeature representation of the past tense onto its phonological representation

Connectionism. Fig. 4 The Simple Recurrent Network (Elman 1990). The input and context layers both feed into the hidden layer, which feeds the output layer; on each time slice, the hidden activations are copied back into the context layer