
MODELS OF MIND: THE CLASSICAL/CONNECTIONIST DEBATE

Suzanne E. McCalden

Submitted in partial fulfillment of the requirements for the degree of Master of Arts

Dalhousie University
Halifax, Nova Scotia
August, 2000

© Copyright by Suzanne E. McCalden

For David, Richard, Gordon, Douglas, and especially Evelyn.

TABLE OF CONTENTS

Table of Contents v

Acknowledgements vii

Introduction 1

Chapter One: The Classical Theory of Cognition 4

Chapter Two: Connectionism and the Systematicity Challenge 34

Chapter Three: Connectionism v. The Language of Thought Hypothesis 69

Bibliography

ABSTRACT

Fodor's language of thought hypothesis (LOT) and Smolensky's connectionism are examined, along with the systematicity debate between them. Fodor and Pylyshyn are correct in claiming that Smolensky's connectionism does not provide an account of systematicity. However, it is argued that Clark's connectionism does provide an account of systematicity, though it is left open whether such an account is merely an implementation of the LOT. It is then argued that connectionism can trivially handle other cases concerning human cognition whereas the LOT cannot. Due to this, even if at one level of analysis connectionism is an implementation of the LOT, it offers a more viable and robust theory of human cognition.

ACKNOWLEDGEMENTS

To begin with, I would like to thank my various undergraduate professors who sparked my interest in philosophy. In particular, I am indebted to Andrew Irvine, Alan Richardson, and Kate Talmage.

I would also like to thank the members of my thesis committee: Mike Hymers, my third reader, and Duncan MacIntosh, for coming aboard my thesis committee as second reader, and for reading and commenting on earlier drafts. Most of all, I would especially like to thank Chris Viger, my thesis supervisor, who, through his lectures in philosophy of mind and brain, inspired me to write my thesis on this subject. In addition, I would like to thank Chris for reading and commenting on all drafts, and for his enormous help in shaping the ideas presented in my thesis.

INTRODUCTION

In 1975, Jerry Fodor published The Language of Thought, which offered a theory of human cognition and which subsequently became the cornerstone theory in philosophy of mind and cognitive science. However, in the 1980s, a new alternative theory of human cognition emerged: connectionism. As with many competing theories, a debate ensued concerning what is referred to as 'systematicity': our ability to entertain certain thoughts is intrinsically connected to our ability to entertain certain others. The nature of the debate is this. Those in the LOT camp, such as Fodor and Zenon Pylyshyn, claim that the LOT provides an account of systematicity but that connectionism does not. Those in the connectionism camp, such as Paul

Smolensky and Andy Clark, claim that connectionism does provide an account of systematicity. The challenge that Fodor and Pylyshyn presented to the connectionists is this: (1) if mental representations exhibit constituent structure in connectionist models, then connectionism does not provide a novel account of systematicity. Simply, connectionism is merely an implementation of the LOT; and (2) if mental representations do not exhibit constituent structure in connectionist models, then systematicity is simply a mystery.

In chapter one I provide an account of the Classical theory of human cognition, which includes the Computational Theory of Mind, which claims that mental processes are simply computational processes, and the LOT. In chapter two I provide an account of connectionism. I then present the systematicity challenge and examine the central elements of the debate as presented by Fodor and Pylyshyn in the LOT camp and by Smolensky in the connectionist camp. The upshot of the debate is that Smolensky's argument in favour of connectionism does not provide an adequate account of systematicity. I then indicate that, by following Clark's argument, connectionism does offer an account of systematicity. However, it is left open whether connectionism is merely an implementation of the LOT. In chapter three I attempt to weaken Fodor's account of systematicity. I argue that there are certain cases concerning our ability to have certain thoughts which are not entirely systematic. I then indicate that there are certain aspects of human cognition which connectionism can handle trivially but which the LOT cannot. Hence the thesis is that connectionism is a better model of human cognition, in all its variations, even if the way it explains systematicity is by implementing a language of thought.

Chapter One: The Classical Theory of Cognition

1.1. Symbols and the Computational Theory of Mind

'... language is a system of symbols which we know and use' (Stainton 1996, p. 1). But what are symbols? When we talk of symbols, we usually take them to be things with syntax. The syntax of a symbol is its physical property (i.e., its shape or form). However, individual symbols lack semantics (i.e., meaning). It is only when we syntactically combine symbols that we are able to create meaningful words and sentences. For the purpose of this thesis, we are not concerned with the lexical nature of individual words (i.e., how individual words get meaning) - that is for a theory of content to address. Rather, we are concerned with the syntactic properties of a sentence, and hence with the syntactic properties of the symbols which comprise it, and certain of its semantic properties. We are also concerned with the fact that the syntactic properties of a sentence and certain of its semantic properties are related to human cognition (i.e., thinking). In other words, we are concerned with thinking

(i.e., mental processes) and syntax. And since thinking involves mental processes, thinking resides in the mind, which in turn involves the manipulation of the syntax of symbols. This is the view that the mind is a syntactically driven machine. This view is held by, e.g., Daniel Dennett and Jerry Fodor:

'... the brain ... is just a syntactic engine' (Dennett 1987, p. 61, emphasis in original). 'There must be mental symbols because, in a nutshell, only symbols have syntax, and our best available theory of mental processes - indeed, the only available theory of mental processes that isn't known to be false - needs the picture of the mind as a syntax-driven machine' (Fodor 1990, p. 23, emphasis in original).

But how do we go about showing that the mind is a syntactically driven machine? One method is via a theoretical device known as a Turing Machine, which was developed by the mathematician and logician Alan Turing. Turing developed such a machine to formalize the notion of computation. The formalization defined a class of physical mechanisms in terms of their formal properties of symbol manipulation and showed how the physical mechanisms could solve problems that normally require human intelligence. Humans are also physical mechanisms capable of symbol manipulation. Thus, Turing's formalization of the notion of computation implied that machines could mimic the human mind (MacDonald 1995, pp. 4-5). Turing machines work on symbolic computation: they follow a system of rules used to manipulate symbols. One such system of rules a Turing Machine may use is propositional or truth-functional logic. Propositional logic deals with arguments, which comprise a set of sentences with one or more premises and a conclusion, which is supposed to follow from the premises. An argument is said to be valid if and only if it is impossible to have all true premises with a false conclusion -- that is, if the truth of the premises guarantees the truth of the conclusion. To put it in other terms, a valid argument preserves the truth of the premises in the conclusion. Thus, a valid formal argument is said to be truth-preserving. Moreover, an argument is said to be valid if and only if it has a valid form. To put it another way, the validity of an argument is dependent upon its sentential form. The form of an argument has nothing to do with its subject matter; thus, its content is irrelevant to its validity. Propositional logic is a formal language, which includes a vocabulary of primitive elements (i.e., symbols) and a set of formation rules (i.e., grammar) together with a set of axioms and/or rules of inference. The symbols are taken to be uninterpreted marks, which can be manipulated without regard to what they mean.

For example, I can make an inference of the form P&Q, therefore P, within a propositional or truth-functional logistic system. I am able to make such an inference irrespective of what P and Q mean. Such an inference is truth-preserving (i.e., the syntactic properties of the symbols carry its semantic properties). And this is because P and Q are conjoined by the operator 'and'.
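To make the notion of truth-preservation concrete, consider the following sketch (an illustration of mine, not the author's), which checks mechanically, in Python, that no assignment of truth values makes the premise P&Q true while the conclusion P is false:

    # Minimal sketch: the validity of 'P&Q, therefore P' as truth-preservation.
    # An argument is valid iff no row of the truth table makes the premise
    # true and the conclusion false.
    from itertools import product

    valid = all(not (p and q) or p                 # premise true -> conclusion true
                for p, q in product([True, False], repeat=2))
    print(valid)   # True: the inference is truth-preserving

Since validity depends only on the form P&Q, the same check succeeds whatever sentences P and Q abbreviate.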

The computations of a Turing machine take place on a tape, which is marked into squares. The machine is capable of (1) typing a symbol on the tape; (2) removing a symbol from the tape; (3) moving one space to the left;

(4) moving one space to the right; and (5) reading a symbol and going into a new state. In addition, the machine is designed in such a way that at each stage of the computation, it is in one of a finite number of states. Initially, the Turing Machine is halted, and, depending on the set of instructions you give it, it will perform one of the aforementioned five acts. If a square is not blank, then it has exactly one of a finite number of symbols, S1, ..., Sn, written on it. When the machine is halted, the machine is in its present state, and the scanned square (which may or may not have a symbol on it) determines what act is to be carried out and what the next state will be. To put it another way, at each stage of the computation, the machine scans one square on the tape and is capable of identifying what symbol is written on the scanned square, if there is one, and then moves on to the next computation, if there is one. In effect, as the machine goes through its set of instructions, we can get the machine to transform one set of symbols (input) into another (output) (Boolos and

Jeffrey 1974, pp. 20-21). Let us now demonstrate how a Turing machine would handle a particular example. Suppose that written on the tape are the symbols 'P' and 'Q' (input), which are written next to one another, and that when the machine begins its operations, it starts on a square with the symbol 'P' written on it. The set of instructions is the following: (1) if the scanned square has the symbol 'P' written on it, then move one space to the right; (2) if the scanned square has the symbol 'Q' written on it, then move one square to the right; and then (3) if the scanned square is blank, then type the symbol 'P' (output). Once the machine has performed its operations, it has made an inference: whenever we have the form 'P&Q', we can derive

'P'. This can also be shown via a machine table such that {q1, q2, q3} represent the machine states and {P, Q} represent the scanned symbols. By following this method, the instructions for the machine are: (q1) if the scanned square has the symbol 'P' written on it, then move one square to the right; (q2) if the scanned square has the symbol 'Q' written on it, then move one square to the right; (q3) if the scanned square is blank, then type the symbol 'P' on it. But, you may ask, what does this have to do with thinking?
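Before turning to that question, it may help to see the machine table realized concretely. The following sketch (my illustration, not the thesis author's; the state names and tape encoding are assumptions) simulates the three-instruction machine just described:

    # Minimal sketch of the machine table above: states q1-q3 scan 'P',
    # then 'Q', then write 'P' on the first blank square and halt.

    def run(tape):
        """Run the P&Q machine on a tape (a list of symbols; '' is blank)."""
        pos, state = 0, "q1"
        while state is not None:
            symbol = tape[pos] if pos < len(tape) else ""
            if state == "q1" and symbol == "P":    # (q1): scanned 'P', move right
                pos, state = pos + 1, "q2"
            elif state == "q2" and symbol == "Q":  # (q2): scanned 'Q', move right
                pos, state = pos + 1, "q3"
            elif state == "q3" and symbol == "":   # (q3): blank, type 'P' and halt
                tape.extend([""] * (pos + 1 - len(tape)))
                tape[pos] = "P"
                state = None
            else:                                  # no instruction applies: halt
                state = None
        return tape

    print(run(["P", "Q"]))   # ['P', 'Q', 'P'] - a token of 'P' has been produced

The machine has no access to what 'P' and 'Q' mean; it transforms the input into the output purely on the basis of the symbols' shapes, which is the sense in which the inference is syntax-driven.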

A consequence of the development of the Turing Machine was the advent of the Computational Theory of Mind (CTM), which claims that mental processes just are computational processes and that such processes are syntactic. That is, thinking is computation. The notion of Turing Machines, in conjunction with the CTM, gave rise to the development of Classical cognitive science (MacDonald 1995, p. 3). However, computational processes presuppose a medium of computation: mental processes consist of causal sequences of tokenings of symbols or mental representations. So if thinking is accurately modeled by the CTM, it requires symbols. Moreover, the symbols have combinatorial syntax and semantics. That is, structurally complex molecular representations are systematically built up from structurally simple atomic constituents (e.g., we can take the symbol 'P' and the symbol 'Q' and conjoin them with 'and' to yield 'P and Q'). And the operations on mental representations are causally sensitive to the syntactic (i.e., formal) structure of representations as defined by combinatorial syntax.

The idea of structure-sensitive operations has two components. The first component is that, in formal languages, such as propositional or truth-functional logic, the syntactic properties of symbols correspond to certain of their semantic properties.

'Intuitively, the idea is that in such languages the syntax of a formula encodes its meaning; most especially, those aspects of its meaning that determine its role in inference' (Fodor and Pylyshyn 1988, p. 112, emphasis added). For example, consider the relation between the following:

1. Bill went to the store and Mary went to the store.

2. Bill went to the store.

Semantically, (1) entails (2), and thus the inference from (1) to (2) is valid; syntactically, (2) is a constituent of (1). 'These two facts can be brought into phase by exploiting the principle that sentences with the syntactic structure '(S1 and S2)' entail their sentential constituents' (Fodor and Pylyshyn 1988, p. 112). One way of realizing the idea of the structure-sensitivity of mental processes is via Turing Machines, which are capable of transforming symbols with operations sensitive to the syntactic structure of the symbols.

This is the second component. With this in mind, consider again the aforementioned inference:

3. Bill went to the store and Mary went to the store. (P&Q)

4. Bill went to the store. (P)

Let us now analyze this inference as performed by a Turing machine. Expressions are written on the tape, such as 'P', 'Q', 'P&Q', ... Let P represent the constituent expression 'Bill went to the store' and let Q represent the constituent expression 'Mary went to the store'. The interpretation of the Turing machine is that whenever a token of the form P&Q appears on the tape, it is a token of the expression 'Bill went to the store and Mary went to the store'. The Turing machine then writes a token of the form P on the tape (in a manner described earlier), which is a token of the constituent 'Bill went to the store'. Thus, an inference from P&Q to P matches the tokening of the type 'P&Q' on the tape, thus causing a tokening of the type 'P' (Fodor and Pylyshyn 1988, p. 100). The Turing machine has transformed one set of symbols (input) into another (output). These two components fit together. Syntactic relations can parallel semantic relations if we have a machine (e.g., a Turing Machine) whose operations on formulas are sensitive to their syntax (Fodor and Pylyshyn 1988, p. 113).

According to Fodor and Zenon Pylyshyn, the combinatorial syntax and semantics of mental representations and the structure-sensitivity of mental processes define Classical models of cognition. They constrain the physical realizations of symbol structure. In particular, such symbol structures are supposed to correspond to real physical structures in the brain; and the combinatorial structure of a representation is supposed to correspond to real structural relations in the brain. For example, consider a simple symbol system that consists of the atomic symbols 'P' and 'Q'. Suppose we combine these two symbols by placing an 'and' between them. This yields 'P&Q'. So as the symbol 'P' is part of 'P&Q', the brain state that means that P could be part of the brain state that means P&Q (Fodor and Pylyshyn 1988, p. 294). As Fodor and Pylyshyn put it, '... the symbol structures in a Classical model are assumed to correspond to real physical structures in the brain and the combinatorial structure of a representation is supposed to have a counterpart in structural relations among physical properties of the brain' (Fodor and Pylyshyn 1988, p. 294, emphasis in original).

On the Classical treatment of mental processes, Turing machines transform symbols, and the operations are sensitive to the syntactic structure of the symbols that the machine operates on. Since, according to the CTM, mental processes are just computational processes which are formally syntactic, mental thoughts, which comprise mental representations, are formally structured in exactly the same way as the public sentences they represent.¹ This is because the operations (e.g., 'and', 'or', 'if and only if', etc.) are defined via the syntactic properties of the mental representations; and the causal relations that hold between mental processes are mirrored by formal relations among the syntactic structures of the representations involved.

¹ This is true at the propositional formal level being discussed. But it does not follow that all public sentences are captured in the mind. That is, there are different types of surface structures in public sentences, which are not internally structured in exactly the same way.

This quotation is compatible with the interpretation that natural languages are the medium of thought. That is, the medium in which we think is a natural language (e.g., English for English speakers, German for German speakers, and so on). Is such an interpretation plausible? Well, natural languages are symbol systems and are thus representational. So if natural languages are the medium of thought, then cognitive processes are simply performed over natural language symbols. That is, cognitive processes, such as thinking, presuppose at least one natural language. In other words, we cannot think unless we already use at least one natural language. Fodor argues that this is not plausible. The key to Fodor's argument is that natural languages are not the medium of thought because there are infraverbal organisms and preverbal human infants that can think. If this is the case, then at least some cognitive processes are not mediated by natural languages (Fodor 1975, pp. 56-57).

According to Fodor, there are homogeneities between the mental capacities of infraverbal organisms and humans. Take, for example, disjunctive concepts²; humans tend to have difficulty learning disjunctive concepts. Colour, for example, is not a disjunctive concept, but 'red or blue' is (viz., it is disjunctively represented in English). According to Fodor, this applies equally to infraverbal concept learning.³ Infraverbal beings, too, find it difficult to learn disjunctive concepts. Fodor accounts for this by assuming that the representational system that they employ is similar to ours. The point is this. Infraverbal organisms are not equipped with any natural language. Thus, the LOT cannot, on Fodor's account, be a natural language (Fodor 1975, pp. 57-58).

Fodor holds that the LOT cannot be a natural language, and that natural languages presuppose the LOT. According to Fodor, this is correct because linguistic meaning comes

from pairing expressions with something 'in the mind' that bestows them with meaning. To put it another way, public words and sentences are meaningful because they are paired with internal words and sentences, expressions of the LOT, and this implies that natural languages presuppose a mental language: the LOT (Stainton 1996, pp. 112 and 117).

² The notion of concepts is contentious (what are they? Are there any? What function do they play?). But generally speaking, and for the purpose of this thesis, concepts are defined as the constituents of thought (i.e., what thoughts are composed of), which include, inter alia, mental representations.

³ See: Fodor, J., Garrett, M. and Brill, S. (1975). Pi ka pu: The Perception of Speech Sounds by Prelinguistic Infants. MIT Quarterly Progress Report, January.

Now, before delving into the next section, some notations need to be explained. (1) Names of English expressions appear in single quotes; (2) the semantic values of words are expressed as properties in italics; (3) when individual symbols are combined in the LOT to form, say, the word 'dog', the notation used to express a string of symbols in the LOT is #d^o^g#; and (4) concepts are written in capitals.

To resume, what is involved in learning a language? Well, learning a language is thinking and so must occur in the medium of thought. So, how do I learn, for example, the word 'dog'? Suppose that I perceive a dog. I must (1) have a mental representation of a dog and (2) have a string of symbols in my innate language, such as #d^o^g#. In order to learn 'dog' and to discriminate between objects in my environment, I must have a mental symbol for DOG. So, if I perceive a golden retriever (which is in the extension of 'dog'), my mental symbol for dog will be tokened. Due to this, I am in the position to say, 'Look, a dog,' if I speak English. But how does the English word 'dog' get matched to the mental word #d^o^g#? To answer this question, let us review Alfred Tarski's semantic theory of truth, which

Fodor endorses. Truth for Tarski may only be defined relative to a language; for a sentence that is true in one language may turn out to be meaningless or false in another. Tarski does not define 'true', but only 'true in a language'. According to Tarski, 'true' may be defined in a rigorous and precise manner only within a specified formal language. The kind of formal language Tarski had in mind was first-order predicate logic. However, the kind of language that Fodor has in mind is a mental language (i.e., the LOT). Tarski's conception of truth is grounded in Aristotle's dictum: to say of what is that it is not, or of what is not that it is, is false, while to say of what is that it is, or of what is not that it is not, is true. Tarski's reasoning for choosing Aristotle's dictum is that it adheres to our intuitions concerning truth (Tarski 1944, pp. 342-343 and Haack 1978, p. 114). Tarski, then, set himself the following task: to provide a material adequacy condition (MAC) for the definition of truth:

S is true if and only if p.

To explicate Tarski's MAC, consider the sentence

'Snow is white'. Under what conditions is this sentence true? If based on Aristotle's dictum, then the sentence will be true if snow is white and false otherwise. If the definition of truth is to correspond to Aristotle's dictum, then it must imply the following: the sentence 'Snow is white' is true if and only if snow is white. The sentence on the left is in quotation marks; the sentence on the right is not. The left side mentions the sentence whereas the right side uses the sentence (Tarski 1944, p. 343). Now consider any arbitrary, contingent sentence and

replace it by the letter 'p'. We can then form the name of this sentence and replace it by the letter 'S'. One may ask what the logical relation is between the two sentences, 'S is true' and 'p'. The MAC holds that any acceptable definition of truth should have as a consequence all instances of the (T) schema:

(T) S is true if and only if p (Tarski 1944, p. 343).

Hence, any such equivalence, with 'p' replaced by any sentence of the language to which the predicate 'true' applies and 'S' replaced by the name of the sentence, is an instance of the (T) schema (Tarski 1944, p. 344 and Haack 1978, p. 100).

The most important component of Tarski's formal requirements is the language in which truth is to be defined. Two different languages are employed: the object language (OL) and the metalanguage (ML). The OL is the language for which truth is being defined. The ML is the language in which truth-in-the-OL is being defined. The OL and the ML must be formally specifiable; or, to put it another way, the OL and the ML must have a specified structure (Tarski 1944, p. 350). For Fodor, the OL is any public natural language and the ML is the internal mental language (the LOT).

The ML is largely determined by the MAC, which holds that any acceptable definition of truth must have as its consequence all instances of the (T) schema: (T) S is true if and only if p. Thus, the definition and all the equivalences that are implied by it are expressed in the ML. Since 'p' in the (T) schema stands for an arbitrary contingent sentence of the OL, it follows that every sentence that occurs in the OL must also occur in the ML (Tarski 1944, pp. 350-351).

What is of paramount importance for Fodor is that, in the sentence 'S is true if and only if p', S and p are co-extensive. Moreover, S is only mentioned whereas p is used. In essence, Fodor extended Tarski's semantic conception of truth to account for all natural language terms, not just formally specified sentences. The point is this. If, for example, a child is in the process of learning a first natural language, she cannot use it or any other natural language. This is why S (a sentence of a natural language) is only mentioned. Thus, I can learn a natural language only if I already know some other language (i.e., the LOT). So we must have an innate language (i.e., a language that is hard-wired in our brains), and we must already know this language; that is, it is not learned (Fodor 1975, pp. 80-81).

1.3. The Structure of Natural Language

All languages are symbol systems, and all languages have syntactic structure. The syntax of a language is its grammar, or the way its expressions may be combined to produce sentences. Syntax is not concerned with sentence meaning, but rather with the purely formal aspects of word and phrase combinations in a language. So one way to describe a language's syntax is to provide a rule system that, in effect, tells you what the parts are and how to combine them to produce sentences (Stainton 1996, p. 13).

With this in mind, consider the following Over-simplified Rule System:

1. Noun Phrase → Article + Noun (a noun phrase consists of an article followed by a noun)

2. Article → 'the', 'a', 'an' (the definite article 'the' indicates that the following noun is a particular individual; the indefinite articles 'a' and 'an' indicate that the following noun is a member of a class)

3. Noun → 'tree', 'girl' ('tree' and 'girl' are nouns)

By using these three rules, you can build noun phrases. For example, combining the article 'the' and the noun 'tree' yields 'the tree' by using rule (1). Of course there are numerous other nouns (e.g., 'Jeep', 'Australia', etc.), as well as other categories such as verbs (e.g., 'sing' and 'work'), which you can combine with adverbs ('loudly', 'quickly'); and these can be combined together with other phrases, such as Noun Phrases.

Consider a Less Simplified Rule System:

4. NP → Art + N

5. VP → V + Adverb

6. S (Sentence) → NP + VP (Verb Phrase)

7. Art → 'the', 'a', 'an'

8. N → 'tree', 'girl'

9. V → 'sing', 'work'

10. Adverb → 'loudly', 'quickly'

For example, if we take the Noun Phrase 'the girl' and the Verb Phrase 'sings loudly', we can generate the sentence 'The girl sings loudly' by (7), (8), (4), and (9), (10), (5) with (6) (Stainton 1996, pp. 13-14).

Obviously, this does not exhaust the number of rules; many more rules obtain. Thus far, the rule system we have been working with is compositional: the system's "complex expressions are 'composed of' (i.e., 'built up from') minimal parts and the nature of a complex linguistic expression is exhaustively determined by what its parts are, and how they are arranged" (Stainton 1996, p. 15). It should be noted that some of the rules governing grammatical combination can be recursively applied to yield more complex expressions. Consider the following Recursive Rule:

11. S → S1 + 'and' + S2

This rule stipulates that you can conjoin two sentences by placing an 'and' between them.

For example, consider the sentences 'Bill dances' and 'Beth sings'. By employing the recursive rule, we arrive at the sentence 'Bill dances and Beth sings'. So 'Bill dances' served as an input to rule (11), since it was combined with the sentence 'Beth sings'. But 'Beth sings' also served as an input to rule (11), since it was combined with 'Bill dances'. The recursive aspect is that we can continue this process by using the output as a new input to yield more complex sentences. For example, we can take the output sentence 'Bill dances and Beth sings' as the input for the recursive rule and combine it with the sentence 'Sally smokes' to yield the sentence 'Bill dances and Beth sings and Sally smokes'. It is important to stress that natural languages are powerful in the sense that very complex sentences can be produced. That is, rule systems that have a finite number of minimal rules can yield an infinite number of outputs, if the rules are compositional and recursive. So, if you understand the finite number of rules, then you are, in principle, able to form an infinite number of sentences (Stainton 1996, pp. 15-16), as the sketch below illustrates.
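The compositional and recursive character of such rule systems can be made concrete. The following sketch (my illustration, not the thesis author's; the verb inflection is a simplifying assumption) generates sentences from rules (4)-(10) and then applies the recursive rule (11) to its own output:

    # Minimal sketch of the rule systems above: finitely many rules,
    # unboundedly many sentences once rule (11) is applied recursively.
    import itertools

    ARTICLES = ["the", "a"]          # rule (7)
    NOUNS    = ["tree", "girl"]      # rule (8)
    VERBS    = ["sings", "works"]    # rule (9), pre-inflected
    ADVERBS  = ["loudly", "quickly"] # rule (10)

    def simple_sentences():
        """S -> NP + VP, with NP -> Art + N and VP -> V + Adverb."""
        for art, n, v, adv in itertools.product(ARTICLES, NOUNS, VERBS, ADVERBS):
            yield f"{art} {n} {v} {adv}"

    def conjoin(s1, s2):
        """Recursive rule (11): S -> S1 + 'and' + S2."""
        return f"{s1} and {s2}"

    s = list(simple_sentences())
    print(s[0])                                # 'the tree sings loudly'
    print(conjoin(conjoin(s[0], s[1]), s[2]))  # output fed back in as new input

Because conjoin accepts its own output as input, the sixteen simple sentences generate an unbounded set of complex ones, which is the sense in which a finite rule system yields an infinite number of outputs.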

So sentences are built up compositionally; and what they mean depends on what their parts mean and how the parts are arranged. But what is it for a word or a sentence to be meaningful? From a symbol system perspective, a linguistic expression combines a syntactic structure and semantics (i.e., meaning). But what makes certain syntactic forms, or for that matter, any syntactic forms, meaningful? (Stainton 1996, p. 99)

We already have the notion of the syntactic feature of compositionality, but we now need to incorporate the semantic feature, since it determines the interpretation of complex expressions given the interpretation of their parts. If this were not required, then meaning could come from, say, a phrase book. But the phrase book conception of meaning is problematic, for if I were to follow the phrase book, I could learn 'Bill loves the horse', but that by no means implies that I can learn 'the horse loves Bill'. If the phrase book model is correct (i.e., the means by which we learn a language), then our semantic capacities are punctate (i.e., gappy). However, if I have mastered the English language, then my linguistic capacities, at least for the English language, are not punctate but, rather, systematic. That is, if I have mastered the English language, then my ability to understand the sentence 'Bill loves the horse' is intrinsically connected with my ability to understand the sentence 'the horse loves Bill'. This leads to another problem with the phrase book model: phrase books are finite. Since the syntax of natural languages is recursive, this simply will not do. In a sense, you would not have a one-to-one correspondence between syntax and semantics; rather, you would have an infinite syntax but only a finite semantics.

A more plausible account of meaning is that compositional rules supply a meaning for a sentence given (a) the meaning of its parts (i.e., single words) and (b) the way they are combined. For example, the sentences 'The horse loves Bill' and 'Bill loves the horse' are not synonymous, though they share the same meaningful parts. Having now incorporated both syntax and semantics into the notion of compositionality, we can define compositionality as the principle that 'the meaning of an expression is determined by the meaning of its parts, plus the way those parts are ordered' (Stainton 1996, pp. 23-32 and 64). This is how language works. There is a rule system in which specific rules (e.g., NP rules) introduce minimal elements; and there are other specific rules which indicate how to combine the minimal elements. For our purposes, we don't care how the minimal elements get their meaning - that's for a theory of content to address. But given that they do have meaning, the meaning of the whole is determined.

1.4. The Structure of the LOT

As we saw in the previous section, we have reason to believe that natural languages are compositional. And we also have reason to believe that natural languages are systematic (i.e., our ability to understand certain sentences is intrinsically connected to our ability to understand certain others), as seen in our verbal behaviour. Since we have reason to believe that natural languages are systematic, we thereby have reason to believe that our linguistic capacities are not punctate. And it appears that the principle of compositionality explains the principle of systematicity. If natural languages are systematic, then they are probably compositional, given the absence of a rival explanatory hypothesis. But does the LOT exhibit a structure similar to that of natural languages? Is the LOT compositional and hence systematic - that is, is my ability to entertain a certain thought intrinsically connected to my ability to entertain certain others? The answer should be obvious: if natural languages presuppose the LOT, and if natural languages are compositional and hence systematic, then so too must be the LOT.

Recall that compositionality governs the semantic relations between words and the expressions of which they are constituents. Compositionality thus implies that at least some expressions have constituent structure (Fodor and Pylyshyn 1988, p. 126). As Fodor claims, 'Linguistic capacities are systematic, and that's because sentences have constituent structure. But cognitive capacities are systematic too, and that must be because thoughts have constituent structure. But if thoughts have constituent structure, then LOT is true' (Fodor 1987, pp. 150-151).

For example, 'the', 'horse', 'loves', and 'Bill' make the same semantic contribution to 'Bill loves the horse' that they make to 'the horse loves Bill'. So, the fact that I can have some thoughts entails that I can have certain other semantically related thoughts (Fodor and Pylyshyn 1988, p. 124). If the ability to use some sentences is connected to the ability to use other semantically related sentences, then the ability to think some thoughts is connected to

the ability to think other semantically related thoughts. However, you can only think the thoughts that your mental representations express. In particular, the ability to be in some representational states must imply the ability to be in other semantically related representational states (Fodor and Pylyshyn 1988, p. 126). This is possible if mental representations have internal structure, as do sentences. More specifically, the mental representation that corresponds to the thought 'Bill loves the horse' contains, as its parts, the same constituents as the mental representation that corresponds to the thought 'the horse loves Bill'.

Fodor's view is that thought is systematic: the ability to entertain certain thoughts is intrinsically connected to the ability to entertain certain others. Fodor concludes that thoughts have structure (i.e., constituent parts). So, if you can have a thought, you must have all the constituent parts and the ability to combine them in your mental language. For example, if you are capable of thinking that 'Jim is tall', and you are capable of thinking 'Bill is fat', then you are capable of thinking that 'Jim is fat'. Any language with combinatorial semantics that contains a translation of 'Jim is tall' and 'Bill is fat' will also contain the translation of 'Jim is fat' (Stainton 1996, p. 114).

So thoughts are systematically structured. But how are thoughts semantically related? Consider the following: being able to understand 'Bill loves the horse' goes along with being able to understand 'the horse loves Bill'. Having the thought 'Bill loves the horse' involves the use of the mental sentence 'Bill loves the horse'. This mental sentence is a complex representation, itself containing simpler representations (i.e., 'Bill', 'loves', and 'horse'). In order for 'Bill loves the horse' to be true, Bill must bear to the horse the same relation (i.e., loves) that the truth of 'the horse loves Bill' requires the horse to bear to Bill (Fodor and Pylyshyn 1988, p. 124). For our thoughts to be semantically related requires not only that similar constituent structure be discerned in the related thoughts, but that the syntactic constituents make the same semantic contribution in the two thoughts. 'In so far as language is systematic, a lexical item must make approximately the same semantic contribution to each expression in which it occurs' (Fodor and Pylyshyn 1988, p. 124, emphasis added).

Similar constituent structure accounts for semantic relations if semantic constituents are context-independent. The context of an expression may affect both its meaning and its constituent structure. A context-independent semantic constituent structure is one in which combinatorial semantics applies irrespective of the context. To clarify the notion of context-independence, consider the following inference of the form P&Q, therefore P. Irrespective of what P and Q are, we can deduce P. Thus, the semantic constituents are context-independent. That is, we are not concerned with the semantic context of P&Q; we are only concerned with the expression's syntactic form or shape. Since P is conjoined with Q by the operator 'and', we can deduce P irrespective of what the semantic context of P is (Fodor and Pylyshyn 1988, p. 124).

To summarize, we saw that the Classical view of human cognition incorporates the CTM and the LOT. And we also saw that the LOT is an internal language that is compositional, systematic, and context-independent. Since the LOT has these features, and so do natural languages, and since we can learn a natural language only if we have an internal language, natural languages presuppose a LOT.

Chapter Two: Connectionism and the Systematicity Challenge

2.1. Connectionism

A biologically motivated model of cognition, alternative to the Classical view, has been suggested. This model is known as connectionism. The central principles of connectionist models are derived from our understanding of the brain. So the models are said to be neurally inspired; and this is a fundamentally different approach from that of Classicism. Within Classical models, a program can be designed on a Turing Machine, which is governed by a set of rules that manipulate symbols according to specified rules, such as those of propositional logic (Clark 1996, p. 53). Also, Classical models, which implement Turing Machines, describe the performance of the system, not the way the performance is achieved. By contrast, the connectionist approach starts with a model that incorporates brain-like processing and sees whether behaviour emerges that mimics human cognition (McLeod, Plunkett and Rolls 1998, pp. 9-10).

Another difference between the two models is that Classical models work serially (i.e., by creating and modifying strings of symbols in a step-by-step manner), whereas connectionist models work in parallel in that many units can carry out their computations at the same time (MacDonald 1995, p. 9 and Rumelhart 1989, p. 138).

The brain consists of numerous processing units (neurons), which are linked, either in parallel or serially, by a mass of wiring and connections. In a typical connectionist network, we specify a set of processing units, which are idealized neurons. The set of processing units consists of at least three layers: input units, at least one hidden layer, and output units. The input units receive input from external sources, such as sensory inputs, or from another part of the processing system in which the system is based. The hidden units reside in the middle layer, and they neither receive input directly nor produce the system's response; instead they mediate the processing. And the output units send signals out of the system - they specify the system's response to the information, which can be abstractly represented in the form of a vector (a magnitude with a certain direction), represented by numerical activation values (Rumelhart 1989, p. 138; McLeod, Plunkett and Rolls 1998, p. 14; and Clark 1996, pp. 54-55).

The activity level of the units is fundamental to what a system can compute. A stimulus is applied to the system, resulting in the activation of some of its input units. This produces a pattern of activity of the units in the next layer, until some of the units in the output layer become activated. Different systems are designed using different assumptions about the activation values that a unit is permitted to have. The activation values may be either continuous or discrete. If continuous, then the activation value is represented by a real number. If discrete, the activation value is represented by the binary numbers 1 or 0, where '1' means that the unit is active or 'on' and '0' means that the unit is inactive or 'off' (McLeod, Plunkett and Rolls 1998, pp. 14-15 and Rumelhart 1989, pp. 138-139).

A unit is activated depending on the degree of input it receives, and it then passes a signal about its level of activation to its neighbouring units. The signal its neighbouring units receive is determined by the level of activation of the sender unit and the connection between them. Each connection is assigned a weight, which may be either positive (excitatory) or negative (inhibitory) (Clark 1996, p. 55). So given certain input, the units will be activated, and depending on whether the weights which connect the units are positive, negative, or zero, the weights will either excite or inhibit connected units. Thus, the system's pattern of connectivity is represented by specifying the weights for each of the connections within the system (Rumelhart 1989, pp. 139-140).

Connectionists typically assume that each unit contributes to the input of the units to which it is connected. The total input to a unit is the weighted sum of the inputs from the individual units. The processing units interact by transmitting signals to neighbouring units, and the strengths of their signals are determined by their degree of activation. The inputs impinging on a particular processing unit are combined with one another and with the current state of the unit according to a rule to yield a new state of activation. If the activation rule acts as a threshold function, then the net input must exceed some value prior to contributing to a new state of activation (Rumelhart 1989, p. 140).

Also associated with each unit is an output function, which maps the current state of activation to an output signal. In some systems, the output level is equal to the activation level of the unit. But in other systems, the output function acts as a threshold function, in that a unit does not affect another unit unless its activation exceeds a certain value. In yet other systems, the output function acts as a stochastic function, in that the output of a unit depends on probabilities of activation (Rumelhart 1989, p. 139). It is this output signal, weighted by the connection between the two units, that a unit contributes to its neighbour's next level of activation, as the sketch below illustrates.

Connectionist systems can be interpreted semantically by specifying units or sets of units to represent concepts. Consider a network which consists of units and weights. A representation is local in such a network if its content is associated with the activity of a single unit; for example, a single unit is specified to represent dogs. A representation is distributed in such a network if its content is associated with the simultaneous activity of several units; for example, several units together represent dogs. And the units in a distributed representational network represent feature-like entities (referred to as micro-features, e.g., four-legged, horizontal snout, etc.)⁴ that are distributed over many units. In addition, the weights encode and store information; and the encoded information in a distributed connectionist network is stored holistically (i.e., is widely distributed) throughout the network (Clark 1993, p. 18 and Rumelhart 1989, p. 138). What a system knows about dogs is not stored locally in individual units; rather, it is implicit in what the system can do based on a pattern of connectivity.

2.2. Connectionism and Classicism: Levels of Analysis

Connectionist models are similar to Classical models in that both claim to provide an explanation of combinatorial syntactic and semantic structure and, hence, an explanation of compositionality and systematicity. However, Paul Smolensky and Andy Clark argue that connectionist models are not merely implementations of Classical models. This is due to the fact that the Classical and connectionist approaches to cognitive modeling operate at different levels of analysis. The level of analysis on the Classical approach is symbolic, in that cognitive descriptions are built up of entities that are symbols, both in the semantic and syntactic sense of being operated on via symbol manipulation. Smolensky refers to such a level of analysis as the symbolic paradigm. The level of analysis on the connectionist approach, however, is lower than that of the models of Classicism, since cognitive descriptions are built up from entities that correspond to constituents (i.e., pre-meaning components such as letters) in the symbolic paradigm. Smolensky refers to the level of analysis on the connectionist approach as the sub-symbolic paradigm, in that units are not symbols; rather, they are the activities of individual processing units in connectionist models (Smolensky 1988, p. 34).

According to Smolensky, connectionist models have two levels of description. The type of dual-level system that Smolensky has in mind is a distributed connectionist model (one in which content is associated with the simultaneous activity of several units). The basic idea of the dual-level system is this: semantic interpretation can be assigned to large-scale activity patterns distributed over many units. The syntax or processing algorithm resides at the lower level, while the semantics resides at the higher level. Since both syntax and semantics are essential to the cognitive architecture, we have an intrinsically dual-level cognitive architecture (i.e., in this architecture, the same entities do not carry both the syntax and the semantics, as they do in the Classical architecture) (Smolensky 1988, p. 169).

⁴ The use of micro-features, however, is problematic; for the micro-features themselves are semantically evaluable. See below for Smolensky's account of weak compositional structure.

Thus, Smolensky claims that connectionism can provide a viable new account of cognitive architecture, one that is not merely an implementation of Classical architecture. But does connectionism provide an account of mental representations which are compositional, and hence systematic, as it must if it is to explain our natural language behaviour?

2.3. The Systematicity Challenge

Fodor and Pylyshyn claim that the LOT explains systematicity but that connectionism, as a genuinely distinct model of human cognition, cannot. Fodor and Pylyshyn present a challenge, in the form of a dilemma, to the connectionists, concerning the notion of systematicity: (1) if mental representations have constituent structure in connectionist models, then connectionism does not provide a novel solution to the problem of systematicity. Simply, connectionism would merely be an implementation of the LOT; and (2) if mental representations do not have constituent structure in connectionist models, then systematicity is simply a mystery. Smolensky accepts Fodor and Pylyshyn's challenge, albeit indirectly and in an interesting fashion. Smolensky claims that the dichotomy presented by Fodor and Pylyshyn is false. He rejects it and opts for a third alternative: connectionism does indeed provide structured mental representations and structure-sensitive processes that are genuinely non-classical (Smolensky 1991, p. 165). The proper understanding of connectionism is of an architecture that provides an explanation of systematicity by appealing to a non-Classical notion of constituent (constituents in connectionist architecture are sub-symbolic). Since the challenge presented by Fodor and Pylyshyn is directed toward Smolensky, I will be following Smolensky's line of argument throughout.

Connectionist systems, as explained by Smolensky, are not an implementation of the LOT, but a refinement of it. 'These connectionist models hypothesize a truly different cognitive architecture to which the classical architecture is a scientifically important approximation' (Smolensky 1991, pp. 166-167).

It may appear that the difference between implementation and refinement is of no philosophical significance. According to Smolensky, given two accounts of a computational system, one higher-level and the other lower-level, '... the lower one is an implementation of the higher one if and only if the higher description is a complete, precise, algorithmic account of the behaviour of that system' (Smolensky 1991, p. 167, emphasis in original). Smolensky argues that if we follow this definition of implementation, connectionist models are not implementations of Classical models.

On the Classical account, the entities that are semantically evaluable are the same entities that are operated on at the algorithmic level. In order for connectionist models to implement Classical models, both must work with the same notion about how semantic content is represented and processed. However, they do not. The type of connectionism that Smolensky is advancing lacks the paramount feature of Fodor and Pylyshyn's Classical architecture: in connectionist architecture, mental representations and mental processes are not supported by the same entities -- there are no symbols that do both. This is due to the fact that connectionist architecture is intrinsically dual-level: the lower-level formal algorithms of processing are on units (constituents) that are not semantically evaluable, whereas, at the higher level of semantic interpretation, there are constituents, but formal algorithms cannot be defined on them. Thus, connectionist models do not work with Classical constituents - constituents that are semantically evaluable and play a causal role in the behaviour of the models. Thus, Smolensky claims that connectionist models cannot implement Classical models (MacDonald 1995, p. 11).

According to Fodor and Pylyshyn, any adequate explanation of systematicity requires context-independent constituents, for reasons presented in Chapter One. So the question is this. Are the constituents in connectionist models context-independent? Smolensky attempts to answer this question by distinguishing between weak and strong compositional structure.

2.4. Weak Compositional Structure - First Approximation

Smolensky offers an alternative to the Classical account of systematicity, one that corresponds to ways in which complex mental representations can be distributed.

The first kind of connectionist distributed system yields complex mental representations with a weak compositional structure. Consider a distributed representation of a cup with coffee and subtract from it a distributed representation of a cup without coffee: what remains is the connectionist representation of coffee (Smolensky 1991, p. 172). In order to produce these representations, Smolensky uses a set of micro-features, such as 'upright container', 'hot liquid', 'glass contacting wood', 'burnt odour', 'brown liquid contacting porcelain', and 'brown liquid with curved sides and bottom'. When we have a distributed representation of 'cup with coffee', the active units are the ones which correspond to the micro-features (which are semantically evaluable) and are part of the description of a cup with coffee. Given the representation of 'cup with coffee', we then subtract the representation 'cup without coffee'. Having done this, the active units of the network now correspond to the micro-feature 'upright container'. The representation 'cup without coffee', using the same set of micro-features, yields the following: 'upright container'. Now, if we take the representation 'cup with coffee' and subtract from it the representation 'cup without coffee', the connectionist representation of coffee is 'hot liquid', 'burnt odour', and 'brown liquid with curved sides and bottom', but only in the context provided by cup (Smolensky 1991, pp. 172-173). When we combine the representations of 'cup without coffee' and 'coffee', we have, in effect, constructed the representation 'cup with coffee' from a representation of 'cup' and a representation of 'coffee', but this is, to say the least, an odd combination. Smolensky claims that compositional structure is present, but only approximately (Smolensky 1991, pp. 173-174).
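The subtraction can be pictured with vectors over micro-features. The sketch below (my illustration, not Smolensky's or the thesis author's; the reduced feature set and the 0/1 activation values are simplifying assumptions) shows the arithmetic:

    # Minimal sketch of the 'coffee story': distributed representations
    # as activation vectors over micro-features, with 'coffee' obtained
    # by subtracting 'cup without coffee' from 'cup with coffee'.

    FEATURES = ["upright container", "hot liquid", "burnt odour",
                "brown liquid with curved sides and bottom"]

    cup_with_coffee    = [1, 1, 1, 1]
    cup_without_coffee = [1, 0, 0, 0]   # only 'upright container' is active

    coffee = [a - b for a, b in zip(cup_with_coffee, cup_without_coffee)]
    print([f for f, v in zip(FEATURES, coffee) if v])
    # ['hot liquid', 'burnt odour', 'brown liquid with curved sides and bottom']

Note that the remainder describes coffee in a cup ('brown liquid with curved sides and bottom'), which is precisely the context-dependence that Fodor and his collaborators seize on below.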

Connectionist networks using distributed representations give mental representations compositional structure, albeit in a 'weak sense', but they are not an implementation of the LOT. That is because, inter alia, the representations in connectionist networks are context-dependent. And this indicates that the connectionist constituency relation is not an implementation of the constituency relation in the LOT (Smolensky 1991, pp. 175-176).

Fodor and McLaughlin and Fodor and Pylyshyn have their suspicions about Smolensky's coffee story. In particular, they have their suspicions about whether the coffee story exhibits real constituency (i.e., constituency that would provide an explanation of compositionality and hence systematicity). If Smolensky's account does not, then it can neither provide an adequate explanation of compositionality nor of systematicity.

As Fodor and McLaughlin claim, "Smolensky only avoids one horn of the dilemma - his architecture is genuinely non-classical since the representations he postulates are not 'distributed over' constituents in the sense that Classical representations are; and we shall see that for that very reason Smolensky's architecture leaves systematicity unexplained" (Fodor and McLaughlin 1990, p. 200).

Fodor and McLaughlin object to the coffee story, for they claim that it presupposes that 'cup without coffee' is a constituent of 'cup with coffee'. Smolensky claims that the pattern or vector representing cup with coffee is composed of a vector that can be identified as a distributed representation of cup without coffee together with a vector that can be identified as a particular distributed representation of coffee. Fodor and McLaughlin claim that this must be wrong. Combining a representation with the content 'cup without coffee' with a representation with the content 'coffee' does not yield a representation with the content 'cup with coffee'. What it yields is a representation with the self-contradictory content 'cup without coffee with coffee'. Fodor and McLaughlin further claim that Smolensky's subtraction procedure confuses the representation of 'cup without coffee' with the representation of 'cup' without the representation of 'coffee'. CUP WITHOUT COFFEE expresses the content 'cup without coffee', and CUP is what must be subtracted from CUP WITH COFFEE to produce COFFEE, but nothing does both (Fodor and McLaughlin 1990, pp. 204-205). In addition, as Fodor and McLaughlin comment, Smolensky's mental representations are context-dependent and thus are non-compositional. 'We are given no clue at all about what sorts of relations between the semantic properties of complex symbols and the semantic properties of their constituents his theory acknowledges' (Fodor and McLaughlin 1990, p. 206).

Pylyshyn agrees. The micro-features in connectionist models are supposed to show how concepts become encoded.

But what about the commonsense concept CUP? According to

Fodor and Pylyshyn, the micro-feature that corresponds to the concept CUP must be context-dependent. The generalizations that concept theories formulate are only approxirnately true. Fodor and Pylyshyn clah that the notion of commonsense concepts as being represented by

sets of micro-features is confused with the issue about 50 mental representations and combinatorial structure. They

claim that the source of this confusion is that sets of micro-features can overlap. For example, if the micro-

feature that corresponds to 'has-a-handle' is part of the matrix of units to which the concept CUP is distributed,

then we may take this as representing 'has-a-handle' as a

constituent of the concept CUP. If we take this to be the

case, then connectionism has an account of constituency.

However, Fodor and Pylyshyn dismiss this: even if we

accept that concepts are distributed over micro-features,

'has-a-handle' is not a constituent of CUP. At least not

in the sense that Fodor and Pylyshyn demand, viz., not in the sense that 'Bill' is a constituent of 'Bill loves the

horse' (Fodor and Pylyshyn 1988, p. 105). Smolensky

appears to agree with this in that he thinks this is the

wrong kind of constituency to account for systematicity.

As Smolensky claims, 'a true constituent can move around

and fill any number of different roles in different

structures' (Smolensky 1991, p. 174). But the connection between constituency and systematicity depends on this

relation - a relation that Smolensky, with his account of

weak compositional structure, cannot adequately explain, because systematicity demands context-independent constituents (Fodor and McLaughlin 1990, p.

207). But then Smolensky's account of weak compositional structure is rather puzzling, to say the least. For how

is it that mental representations, on Smolensky's account, can have weak compositional structure and be context-dependent, and yet still provide an explanation of systematicity? Smolensky does not provide us with an

answer to this query.

Fodor and Pylyshyn claim that connectionists,

especially Smolensky, misuse the notion of constituency to refer to a semantic relation between predicates. The

idea is this. Predicates, such as CUP, are defined via

sets of micro-features, such as 'has-a-handle'. But it

turns out then that it is a semantic truth that CUP

applies to a subset of what 'has-a-handle' applies to.

However, according to Fodor and Pylyshyn, the predicates

are not in a 'part-to-whole' relation (i.e., there is no syntactic constituency relation between the predicates). That is, the expression 'has-a-handle' is not part of the

expression CUP. Real constituency, according to Fodor and

Pylyshyn, has to do with the 'part-to-whole' relation.

For example, the symbol 'Bill' is part of the symbol 'Bill loves the horse'. And it is because these symbols have real constituency relations that natural languages have atomic and complex symbols. Fodor and Pylyshyn's take on Smolensky is that compositionality is not generally a feature of connectionist representations.

Connectionism cannot provide an account of compositionality because mental representations in connectionist systems lack combinatorial structure (Fodor and Pylyshyn 1988, pp. 105 & 111).

Thus, Smolensky's coffee story fails to provide an adequate account of compositionality, since connectionist constituents are context-dependent. And since it fails to provide an account of compositionality, it thereby fails to provide an adequate account of systematicity.

2.5. Strong Compositional Structure - Second Approximation

As Fodor and McLaughlin rightly point out, Smolensky, in essence, abandons his notion of weak compositional structure in favour of his notion of strong compositional structure; and his account of systematicity hinges on the latter (Fodor and McLaughlin 1990, p.

208). According to Smolensky, although the connectionist distributed representations of 'cup with coffee' have constituent vectors representing 'cup' and 'coffee', it may be argued that such a representation is too weak to capture all the constituent structure pertaining to the representation of 'cup'. It is too weak to maintain a formal inference, since the vector representing cup cannot fill multiple structural roles. It thus appears that

Smolensky concedes that, simply put, weak compositional structure is too weak to provide an adequate explanation of compositionality. Smolensky's explanation of strong compositional structure, then, diverges from his explanation of weak compositional structure in the following respects. (1) Smolensky does not address the notion of micro-features in his explanation of strong compositional structure; (2) Smolensky adds another vector operation, multiplication (referred to as the tensor product), to the two previously mentioned vector operations of addition and subtraction, which were employed in the coffee story; and, most importantly, (3) Smolensky's notion of strong compositional structure does not appeal to the notion that mental representations are context-dependent (Fodor and McLaughlin 1990, p. 210).

The second kind of connectionist distributed system that Smolensky articulates yields complex mental representations with strong compositional structure.

Smolensky's explanation of strong compositional structure

incorporates tensor product representations and superposition representations (vector addition).

Consider, then, how a connectionist machine may represent four-letter English words. English words can be decomposed into roles (i.e., ordinal positions that letters occupy), which is an abstract way of talking about a particular processing unit. And what fills these roles are letters. In addition, the machine may have activity vectors over units that represent the relevant roles (i.e., over the role units) and activity vectors over units that represent the fillers (i.e., that range over the filler units). And the machine may have activity vectors over units that represent filled roles (i.e., letters in letter positions), and these are the binding units. That is, the particular units that are role units, which are ordinal positions that letters can occupy, are occupied by letters (i.e., are bound to the unit). When a particular role unit is occupied by a letter (i.e., filled), the role and filler units are bound. The point

then is this. The activity vectors over the binding units may be tensor products of activity vectors over the role units and the filler units. The representation of a word would then be the superposition of the vectors over the binding units (i.e., a vector that is arrived at by adding the tensor product vectors together) (Fodor and McLaughlin 1990, p. 211). The two operations used to derive complex vectors from component vectors are thus the tensor product (i.e., vector multiplication) and superposition (i.e., vector addition). And these are iterative operations in that the activity vectors that result from the multiplication of role and filler vectors might themselves represent the fillers of roles in more complex structures. Hence, a tensor product that represents the word 'Bill' as 'B' in the first position, 'i' in the second position, 'l' in the third, and 'l' in the fourth may itself be bound to the representation of a syntactical function to indicate, for example, that 'Bill' plays the role subject-of in

'Bill loves the horse' (Fodor and McLaughlin 1990, p.

211). Such tensor product representations could themselves be superimposed over another group of binding units to yield a vector that represents, say, the bracketing tree ((Bill)(loves)(the horse)) (Fodor and McLaughlin 1990, p. 211).
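To make the role-and-filler machinery concrete, here is a minimal sketch in Python (with NumPy). The role vectors, filler vectors, and dimensions are invented for illustration and are not Smolensky's own values; the two operations, however, are exactly the ones just described - an outer (tensor) product binds a filler to a role, and vector addition superimposes the bindings.

import numpy as np

rng = np.random.default_rng(0)

# Invented role and filler vectors: one role vector per letter position,
# one filler vector per letter occurring in 'Bill'.
roles = {i: rng.standard_normal(16) for i in range(4)}
fillers = {c: rng.standard_normal(16) for c in "Bil"}

def bind(role, filler):
    # Binding-unit activity: the tensor (outer) product of role and filler.
    return np.outer(role, filler)

def encode(word):
    # The word's representation: the superposition (sum) of its bindings.
    return sum(bind(roles[i], fillers[c]) for i, c in enumerate(word))

bill = encode("Bill")

# Only the summed vector is tokened; still, a filler can be approximately
# decoded by projecting with the (known) role vector for a position.
decoded = roles[0] @ bill
print(fillers["B"] @ decoded > fillers["i"] @ decoded)  # typically True

The last two lines anticipate the complaint pressed below: an outside analyst can recover the component bindings from the sum, but the components themselves are not tokened as parts of the activity pattern.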

Smolensky claims that an important conclusion concerning constituency is that the Classical and connectionist approaches differ not in whether they accept that mental representations have constituent structure and that mental processes are sensitive to this structure, but in how they formally instantiate them

(Smolensky 1991, pp. 179-180). '... distributed representations provide a description of mental states with semantically interpretable constituents, but there is no complete, precise formal account of the construction of composites or of mental processes in general that can be stated solely in terms of context-independent semantically interpretable constituents. On this account, there is a language of thought - but only approximately; the language of thought, by itself, does not provide a basis for an exact formal account of mental structure or processes - it cannot, by itself, support a precise formal account of the cognitive architecture' (Smolensky 1991, pp. 184-185). So, connectionism offers an alternative account of systematicity by appealing to the systematic constituent structure of the representation vectors and the connectivity patterns that manipulate them. However, the mechanisms responsible for systematicity (unlike in the

LOT), do not operate via rules formally expressible at the level of the constituents. And this is because mental processes, which are causal, occur at the lower level while mental representations, which are semantically interpretable, reside at the higher level.


Consider again the machine that represents the word 'Bill'. When the superposition vector that represents 'Bill' is tokened, the machine does not token the role vector that represents the first letter position. This

is also the case with the tensor product that represents

'B' in the first letter position. So the only pattern of

activity that is actually tokened in the machine is the

superposition vector, which represents the English word

'Bill' (Fodor and McLaughlin 1990, p. 212).

When a tensor product or superposition vector is

tokened, its components (i.e., constituents) are not.

This differs from the Classical account. For when a

complex Classical symbol is tokened, its constituents are

also tokened. The importance of the difference between

these two accounts is this. Concerning mental processes,

the Classical constituents of a complex symbol contribute

to the causal consequences of its tokenings. Concerning

the mental processes on the connectionist account, the

components (i.e., constituents) of tensor product and

superposition vectors play no role with regard to mental processes (Fodor and McLaughlin 1990, p. 212).

As Fodor and McLaughlin conclude,

"The constituents of complex activity vectors typically arentt 'there'; so if the causal consequences of tokening a complex vector are sensitive to its constituents, that's a miraclen (Fodor and McLaughlin 1990, p. 215). The upshot of Fodor's and McLaughlinls reasoning is that while we can imagine components, the system cannot use them, so they cannot explain its operations. For example, the number '8' can be represented as 5+3, but that does not allow for causality if the representation is not explicitly tokened. Thus, there is no causality between the semantic and syntactic levels in connectionist models. And since connectionist models lack this causality, they are not similar enough tt the Classical notion of systematicity (which, the Classicists daim, is the only notion), to explain systematicity.

That is, representations in highly distributed connectionist models make it difficult to interpret individual inputs - we cannot say what they encode. And since the formal algorithms reside at the lower level and semantic interpretation resides at the higher level, connectionists cannot provide an account of causality.

However, connectionism may be lent a helping hand; there may be a way in which we can indicate that connectionism explains systematicity in the manner that the Classicists desire. Does this sound too good to be true?

2.6. Higher-Level Descriptions and Post-hoc Analysis

Andy Clark argues that distributed connectionist models are actually more structured than Fodor, Pylyshyn, and McLaughlin think and, hence, can provide an explanation of systematicity - the type of explanation that they desire. It should be noted that Clark's line of argument was a reply to a paper by William Ramsey, Stephen Stich, and Joseph Garon entitled

'Connectionism, Eliminativism, and the Future of Folk

Psychology'. I claim that Clark's line of argument holds equally well with regard to Fodor, Pylyshyn, and

McLaughlin. The problem with Fodor, Pylyshyn, and

McLaughlin's criticisms of connectionism is that they focus their attention on the units-and-weights description, and take this as the sole causal description of a network. They take it that since representations in distributed connectionist models are widely distributed throughout, it makes no sense to say that a particular component, for example 'B' from the expression 'Bill loves the horse', caused the system to yield the output 'Bill loves the horse'; the causal role of any one expression cannot be determined. In essence, the units and weights in a distributed connectionist model are nothing more than a chaotically disjunctive set.

But, according to Clark, if we focus on a higher-level description of connectionist models, as opposed to a units-and-weights description, then the Classicists' arguments aimed at connectionism - in particular, that connectionism fails to provide an account of systematicity - are undermined. What is of importance is that Clark's higher-level description is a causal higher level. This was not the case with Smolensky's higher-level description.

Clark, in effect, wants us to shift our attention. We should not focus on the activity of the weights in connectionist models, but, rather, on what they are geared to do. They are geared to generate a pattern of hidden unit activity, which then causes an output. What we need to consider is the kind of output we expect a real encoding system to drive. Such a system must drive a large and subtle set of behaviours (e.g., verbal behaviours) (Clark

1989/90, p. 348). (The higher levels are causal in the same way as the predicates in the special sciences are causal; in essence, Clark's higher-level descriptions are instances of the way in which Fodor imagined the causal relations between the special sciences and other sciences to be.)

Clark's method of explaining how it is possible for connectionist models to exhibit systematicity is via a connectionist model, which is referred to as NETtalk.

Such a network was set up to accomplish the task of pronouncing English text by turning input (words) into phonetic speech (output). Unlike Classical computational models (e.g., Turing Machines), NETtalk was not provided with a set of rules for accomplishing the task. The output units were connected to a speech synthesizer so you could hear the system learn how to talk. At the outset, NETtalk understood nothing - it did not know the meaning of words and it could not use a language to achieve any real-world goals (Clark 1996, pp. 54-55).

The network consisted of seven groups of input units in which each group consisted of twenty-nine units; each group of twenty-nine represented one letter, and the input consisted of seven letters - one of which (the fourth) was the target whose phonemic contribution (in the context provided by the other six) was to be determined. The inputs connected to a layer of eighty hidden units, which in turn connected to twenty-six output units, which coded for phonemes. In total, the network consisted of 18,829 weighted connections (Clark

1996, p. 56).
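As a rough sketch of that architecture (the one-hot letter coding and the random weights are my assumptions for illustration; in the real NETtalk the trained weights do all the work), the forward pass can be written as follows:

import numpy as np

rng = np.random.default_rng(0)

# 7 letter groups x 29 units = 203 inputs, 80 hidden units, 26 phoneme
# outputs. The two matrices hold 203*80 + 80*26 = 18,320 inter-layer
# weights; the 18,829 figure Clark reports presumably counts further
# connections (thresholds, etc.) in the actual implementation.
N_IN, N_HID, N_OUT = 7 * 29, 80, 26
W1 = 0.1 * rng.standard_normal((N_HID, N_IN))
W2 = 0.1 * rng.standard_normal((N_OUT, N_HID))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_window(window):
    # Assumed coding: each of the 7 letters activates one of its group's
    # 29 units (26 letters plus 3 space/punctuation units).
    alphabet = "abcdefghijklmnopqrstuvwxyz ,."
    x = np.zeros(N_IN)
    for i, ch in enumerate(window):
        x[i * 29 + alphabet.index(ch)] = 1.0
    return x

def pronounce(window):
    # The fourth letter is the target; the other six provide context.
    hidden = sigmoid(W1 @ encode_window(window))
    return (W2 @ hidden).argmax()   # index of the winning phoneme unit

print(pronounce("dogs ha"))   # target letter: 's'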

But how does the system learn? One method is via a backpropagation algorithm. It learns by adjusting the between-unit weights according to a systematic procedure. The system is set up with a series of random weights - since the weights are random, the network will not be able to accomplish its task at the outset. The network is then trained. It is given a set of inputs, and for each input it will produce some output - almost always incorrect. But, for each input, a supervisory system sees an associated correct output. We can think of the network that is learning as a student and think of the supervisory system as the teacher who knows the answers in advance. The teacher then compares the actual output with the correct output. For example, the network could take as input 'Dogs have fur' and be required to output

'Dogs have fur'. When the system has a random weight assignment, it will perform poorly - it will yield an incorrect output. However, once the output is specified, the teacher will compare the actual and desired outputs for each output unit and calculate the error on each.

The system then focuses on one weighted connection and asks if a slight increase or decrease in the weights would reduce the error. In effect, the system performs a trial-and-error procedure. This procedure is repeated for each weight until a low error level is obtained. Once this has been completed, the network will perform well

(i.e., it will be able to correctly pronounce 'Dogs have fur'). The network has then learned, via backpropagation, to accomplish the task (Clark 1996, p. 56).
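The 'nudge a weight and keep the change if the error drops' procedure just described can be run literally on a toy network (all sizes and data below are invented); true backpropagation arrives at the same kind of per-weight, error-reducing adjustment analytically, via the chain rule, rather than by trial and error.

import numpy as np

rng = np.random.default_rng(1)

X = rng.standard_normal((20, 5))               # 20 training inputs
T = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # teacher's correct outputs
W = 0.1 * rng.standard_normal((5, 1))          # random initial weights

def error(W):
    Y = 1.0 / (1.0 + np.exp(-(X @ W)))         # the network's actual outputs
    return ((Y - T) ** 2).sum()                # total error across output units

for _ in range(200):                           # repeat until the error is low
    for i in range(W.size):                    # focus on one connection at a time
        for delta in (0.01, -0.01):            # slight increase or decrease?
            trial = W.copy()
            trial.flat[i] += delta
            if error(trial) < error(W):        # keep changes that reduce error
                W = trial

print(round(error(W), 3))                      # error is low after training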

What is truly interesting about NETtalk is that learning is not a mere parrot-fashion recall of the training data. The network learns about general features of the relation between text and spoken English. After training, the network could successfully deal with words it had never encountered before - words that were not in its initial training set (Clark 1996, p. 57).

According to Clark, since connectionist models rely on training via examples, we often have an up-and-running system while having no clear idea how the system is doing the job; so connectionists engage in post hoc analysis of their systems. One type of post hoc analysis is cluster analysis. Simply put, we give the network many inputs and record the hidden unit activations and output caused by each. We then gather all the inputs that yielded a given output phoneme, and find an average mediating hidden unit activation vector. We then repeat this process for each phoneme. Then we pair the most similar hidden unit activation vectors. We then find an average for each pair and repeat the process. The result is a tree structure that displays the way in which the network has learnt to structure the space of the weights so as to accomplish its task of pronouncing English text (Clark

1989/90, p. 346).
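A minimal sketch of that bottom-up procedure, with random vectors standing in for the averaged hidden-unit activations recorded for each phoneme (the phoneme labels here are invented):

import numpy as np

rng = np.random.default_rng(2)

# Average hidden-unit activation vector per phoneme (stand-ins here;
# in the real analysis these are recorded from the trained network).
clusters = {(p,): rng.standard_normal(80) for p in ["p", "b", "a", "e"]}

# Repeatedly pair the two most similar clusters and average them;
# the history of merges is the tree that cluster analysis displays.
while len(clusters) > 1:
    names = sorted(clusters)
    a, b = min(((x, y) for x in names for y in names if x < y),
               key=lambda xy: np.linalg.norm(clusters[xy[0]] - clusters[xy[1]]))
    clusters[a + b] = (clusters.pop(a) + clusters.pop(b)) / 2.0
    print("merged", a, "with", b)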

By performing a cluster analysis on NETtalk we can determine which inputs and which groups of inputs are treated most similarly to other inputs and groups of inputs. What we find is that, at the bottom level of the tree, the network grouped together items such as 'p' and

'b' inputs; and at the top of the tree, the network divided into two sectors - one corresponding to vowels and the other to consonants (Clark 1989/90, pp. 346-347).

Now let us consider various different versions of

NETtalk. Since each network is initially given a random set of weights and then trained up via a backpropagation algorithm, the different networks will yield different descriptions at the units-and-weights level. However, and most importantly, performing a post hoc cluster analysis

on the various versions of NETtalk yielded pretty much the same clustering profile. Thus, it was possible to discover a scientifically respectable higher-level description, one that unified what had seemed, at the units-and-weights level, to be a chaotic disjunction of networks. This is the crucial result. If the 'pretty much' amounts to causality, then this would provide connectionism with an account of systematicity, the kind of systematicity that Fodor, Pylyshyn and McLaughlin require. As Clark claims,

'The moral is that there may be higher-level descriptions which are both scientifically well grounded and which capture commonalities between networks which are invisible at the units and weights level of analysis' (Clark 1989/90, p. 347 - emphasis added). That is, a psychological natural kind that would be appropriate in the discourse of psychology. For example, higher-level descriptions are semantically interpretable

(i.e., folk psychology condones generalizations based on the semantic properties of beliefs - e.g., we all believe that dogs have fur). According to Clark, there is a psychological natural kind, since cluster analysis shows that it is possible to reveal that a whole set of networks fall into an equivalence class (e.g., the various versions of NETtalk had pretty much the same clustering profile). Clark claims that we can assign all instances of NETtalk to a psychological kind, even though they look different at the units-and-weights level.

Thus, there is causality, in an appropriate way, between syntactic and mental processes, which would satisfy

Fodor's intuitions concerning the special sciences. To really get a taste of this, consider the following.

Suppose that we have a network, which succumbs to cluster analysis, whose labels involve semantic entities. We can then untangle the widely distributed storage of encoded information via a higher-level description. Further, suppose that the network is given certain input, and then goes into a hidden unit activation state, which falls within the cluster we found reason to label 'Bill loves the horse'. According to Clark, we are warranted in asserting that the network gave a certain output because the network yields semantically structured clusters

(i.e., the psychological kinds are causally efficacious

- mental processes are implementations of the higher psychological level). And this would imply that the network has combinatorial syntax and semantics. And if this is the case, then the network can also easily yield the output 'The horse loves Bill'. If the network can yield the outputs 'The horse loves Bill' and 'Bill loves

the horse', then it behaves systematically, so its architecture provides an account of systematicity. Thus,

if we only look at the units-and-weights level of description, systematicity appears to be a mystery. But at the higher level of description, systematicity could be explained.

But the question that needs to be raised is this. If we assume causal context-independent symbols, are

connectionist models, such as NETtalk, implementations of

the LOT? As Clark rightly points out,

'... the availability of a higher level of description does not directly imply that a network is a mere implementation of a classical symbolic system. For the fact of distributed, sub-symbolic encoding can still bite in so far as a system retrieves data, generalizes, interpolates and produces some error patterns in ways which are explicable only by appeal to its distinctively connectionist style of encoding and retrieval. In short, a network may have a cluster analysis which merits symbolic labels without itself being a processor of symbols in the Classical sense' (Clark 1989/90, p. 347 - emphasis in original).

It thus appears that the question remains open-ended.

Chapter Three: Connectionism v. The Language of Thought Hypothesis

3.1. Problems for Fodor's LOT? Some Thoughts

To briefly recapitulate, the systematicity dilemma for connectionism is that (1) if mental representations have constituent structure in connectionist models, then connectionism does not provide a novel solution to the problem of systematicity - simply, connectionism would merely be an implementation of the LOT; and (2) if mental representations do not have constituent structure in connectionist models, then systematicity is simply a mystery. And we saw that Smolensky's account of connectionism failed to provide an adequate account of systematicity. However, Clark's account of connectionism could provide an account of systematicity. This was achieved by performing a cluster analysis on higher-level descriptions within a connectionist system, such as

NETtalk. Thus, Clark may have a non-Classical solution to the systematicity problem, but this was left open-ended.

But even if Clark's solution to the systematicity problem is Classical and hence an implementation of the LOT, I claim that it is not merely an implementation; for I will argue that connectionism can explain other important

features of human cognition, whereas the LOT cannot. I will thus argue that connectionism is a better model of

human cognition since it is more robust and more biologically plausible.

According to Fodor, two ideas that are of importance

to the LOT are (1) structure-sensitive operations and (2)

context-independence. These two ideas fit together.

Recall the first idea of structure sensitivity, which

is a key feature of the Classical view. In formal

languages, such as propositional or truth-functional

logic, the syntactic properties of symbols correspond to certain of their semantic properties. That is, the

operations on mental representations are causally

sensitive to the syntactic (i.e., formal) structure of

representations as defined by combinatorial syntax. This is because operations are defined via the syntactic

properties of the mental representations; and the causal

relations that hold between mental processes preserve

semantic relations between mental representations.

Recall also the second idea of context-independence,

which is also a key feature of the Classical view. In

order for two thoughts to be semantically related, similar constituent structure must be discerned in the related thoughts and the syntactic constituents must make the same semantic contribution in the two thoughts. Similar constituent structure accounts for semantic relations if the constituents are context-independent. By incorporating the notion of the structure sensitivity of mental processes, the semantic content of an expression is preserved.

As we have seen, these two ideas are central for the Classical/LOT explanation of the principles of compositionality and hence systematicity. For example, being able to understand 'Bill went to the store and Mary went to the store' nomologically necessitates being able to understand 'Mary went to the store', 'Bill went to the store', and 'Mary went to the store and Bill went to the store'. We can see that this is the case because thoughts are both systematic and compositional. That is, if we understand the parts of a sentence and how they are arranged, then we must understand what the sentence means.

3.2. How Systematic is Natural Language?

Now consider the following: being able to understand

'Bill loves the horse' necessitates being able to understand 'the horse loves Bill'. Having the thought that 'Bill loves the horse' involves the use of the mental sentence 'Bill loves the horse'. This mental sentence is a complex representation, itself containing simpler representations (i.e., 'Bill', 'loves', and 'the horse'). In order for 'Bill loves the horse' to be true, Bill must bear to the horse the same relation (i.e., loves) that the truth of 'the horse loves Bill' requires the horse to bear to Bill (Fodor and Pylyshyn 1988, p.

124). To put it another way, it is impossible to have an interpretation of 'loves' for 'Bill loves the horse' without an interpretation of 'loves' for 'the horse loves

Bill'. If we understand the one sentence, we must be able to understand the other. To put it yet another way: if we understand the parts that comprise the sentence, we have to understand all the various arrangements of those parts. That is, if we re-arrange the parts to form another sentence, we have to understand that new sentence. But is this always the case? I claim that it is not.

I argue that if natural languages are not as systematic as we first thought, and if natural languages presuppose the LOT, then the LOT is not as systematic as we first thought. The predicate 'loves', and others, may be more context-dependent, and, hence, mental processes are not just structure sensitive. That is, some thoughts are context sensitive. For example, the relation that Bill bears to the horse (i.e., the same interpretation of

'loves') may not be the same relation that the horse bears to Bill (i.e., the same interpretation of 'loves').

It is not necessarily the case that {horse, Bill} is in that relation. Thus, it may be the case that the relations are not the same. That is to say, it may be the case that the interpretation of 'loves' is not the same with both Bill and the horse as subject. (A first-order logistic system's interpretation of predicates is an extensional set of ordered pairs. With this in mind, the predicate 'loves', which includes the ordered pair {Bill, horse}, does not necessarily include the ordered pair {horse, Bill}.)
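The extensional point just noted can be put as a two-line check (the pairs are illustrative): if the interpretation of 'loves' is just a set of ordered pairs, nothing about containing {Bill, horse} requires it to contain {horse, Bill}.

# The interpretation of 'loves' as an extensional set of ordered pairs.
loves = {("Bill", "horse"), ("Mary", "Bill")}

print(("Bill", "horse") in loves)   # True: Bill loves the horse
print(("horse", "Bill") in loves)   # False: the relation need not be symmetric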

Moreover, it is natural to suppose that we are able to have a thought about one thing, but that re-arranging the constituents of the thought, even within the same structure, is not clearly coherent. For example, we can have the thought that 'John hates butter', but what does 'Butter hates John' mean? To put it another way, since

our minds are systematic, being able to understand the thought 'John hates butter', according to Fodor, necessitates being able to understand the thought 'Butter hates John'. But do we? Well, not really - at least it is not entirely obvious what that means. For example, what is it to say that an inanimate object 'hates' something? In this case it is obvious that the interpretation of

'hates' is not the same for both John and butter. Thus, the notion of context-independence is not as straightforward as Fodor claims it to be. This is something that the Classical view needs to explain.

So the problem is that there are cases in which certain thoughts are context-dependent. But how can this be if thoughts are systematic? According to the Classical view, it shouldn't be the case. I claim that the appropriate explanation is that, perhaps, systematicity is context-dependent, but that there are also cases in which systematicity is context-independent. That is, in some cases, the content of symbols does depend on context, but Fodor's account does not allow for this. It

is not always the case that semantically related thoughts are context-independent and that the structure-sensitivity of mental processes preserves the truth of such thoughts. We do have systematic minds, but our systematic minds deal with both context-independent and context-dependent thoughts. However, Fodor's LOT can only provide an explanation of context-independent thoughts and the structure sensitivity of mental processes. This idea will become clear from considerations below.

By contrast, connectionism has access to both context-dependent and context-independent thoughts. That is, connectionist models, at different levels of analysis, can handle both. At the units-and-weights level, connectionist models can handle context-dependent thoughts, as in the case of Smolensky's coffee story.

(Smolensky's account of weak compositional structure might work much better for the 'loves' story.) And at a higher level of analysis, connectionist models, taken by performing a post hoc cluster analysis on a network such as NETtalk, can handle context-independent thoughts and hence display structure-sensitive mental processes. Both kinds of thought, then, are handled within the one connectionist architecture.


3.3. Content-Addressable Memory

Connectionist systems have content-addressable memory: retrieval in which the part constitutes the retrieval cue, and the filling in is a kind of pattern-reconstruction process

(Rumelhart 1989, p. 148). And this is because connectionist systems are dispositional in that there is no need for the information to be explicitly tokened, but the systems' functions are still based on content. To put it another way, the systems operate on content, and they are disposed to do so.

To illustrate how content addressable memory works, consider the following 'Newfie' joke.

"A man went to visit his friend the Newfie and found him with both ears bandaged. 'What happened?' asked the man, and the Newfie replied, '1 was ironing my shirt when the telephone rang.' - 'That explains one ear, but what about the other?' - 'Well, 1 had to cal1 a doctor! ' " (Dennett 1987, p. 76) .

A connectionist system can model a response to the joke trivially because it is dispositional. Since information is widely distributed in connectionist systems, not every sentence involved in the 'Newfie' joke needs to be explicitly tokened. Simply, in a connectionist system, the relevant nodes get activated by relevant circumstances, and this is because connectionist systems are dispositionally and contextually structured, given that content-addressable memory is context-sensitive. Classicists would presumably take this context-sensitivity to be a flaw, but it is not - the reason connectionist systems can understand jokes quickly is precisely that the content-addressable memory is context-sensitive. As we will see, the fact that

Classical systems cannot account for this threatens any claim that Classical systems are the least bit biologically plausible.
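The thesis does not tie content-addressable memory to any particular network, but a Hopfield-style net is the standard way to make the idea concrete; in the sketch below (sizes and patterns invented), a corrupted fragment of a stored pattern serves as the retrieval cue and the whole is filled back in.

import numpy as np

rng = np.random.default_rng(3)

def train(patterns):
    # Hebbian storage: units that are active together become positively
    # connected; the stored patterns end up as stable states of the net.
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)   # no self-connections
    return W / len(patterns)

def recall(W, cue, steps=10):
    s = cue.copy()
    for _ in range(steps):     # settle toward the nearest stored pattern
        s = np.where(W @ s >= 0, 1, -1)
    return s

stored = rng.choice([-1, 1], size=(3, 64))   # three 64-unit 'memories'
W = train(stored)

cue = stored[0].copy()
cue[:16] = rng.choice([-1, 1], size=16)      # corrupt a quarter of the cue
print(np.array_equal(recall(W, cue), stored[0]))   # usually True

Nothing here is retrieved by address: a part of the content itself does the cueing, which is exactly the feature the joke example trades on.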

It is unclear how the Classical model would handle the 'Newfie' joke, since tokening in the LOT is explicit. How do we understand the 'Newfie' joke in the

LOT? Either we get the joke, or we don't. If we do, we get it almost instantly. But this poses a serious problem for the LOT. For one, we cannot predetermine what the relevant information is. Why is this the case? Well, since the relevant information is open-ended, a vast number of sentences are required to be tokened. That is, we have to first token all the sentences involved in the joke and then draw from them. But the initial set of possible inferences we could make leads to potentially more inferences. In essence, there will be a proliferation of inferences. And this would imply that we would have to draw rather complicated inferences. If this is how the mind works, then, according to the LOT, we simply don't get jokes. It simply takes too long in the LOT architecture. The only recourse for those who espouse the LOT is to claim that we do not get jokes, or that we do not get them immediately but much later. But that is simply false.

What's worse, the Classical architecture is biologically implausible. To explain: connectionists enjoy pointing out that human neurons are fairly slow.

For example, a typical neuron firing takes approximately one millisecond. But human brains, even though they consist of neurons, perform tasks quickly. For example, to understand an utterance takes approximately one hundred milliseconds. Thus, there can only be approximately one hundred serial neuron firings involved in understanding an utterance. Now let's suppose that the execution of one step in a program requires at least one serial neuron firing. If this is correct, then, at most,

100 program steps can be executed in the allotted time

(one hundred milliseconds) to understand an utterance

(Stainton 1996, p. 121). Given all of the above, it is unlikely that a Classical model would be able to understand a single utterance within this limit. Thus, Classicism is not biologically plausible, because we understand utterances and jokes alike almost instantly. Thus, the fact that we cannot get jokes in the LOT architecture but that we can in the connectionist architecture also indicates that connectionism offers a more viable and biologically plausible theory of human cognition. And this is because connectionist systems have at their disposal content-addressable memory, which is an important feature of human cognition that is explained at the units-and-weights level. Classicism lacks such a feature.

To sum up the discussion thus far: even if we do not have a non-Classical account of systematicity, it may be the case that the systematicity problem is not what we thought it was, viz., it may be the case that systematicity is both context-dependent and context-independent. And if this is the case, then it simply does not matter if connectionism does not have a non-Classical account of systematicity; for it can explain both context-dependent and context-independent structure, whereas the LOT can only provide an account of the latter. Thus, on this point alone, connectionism offers a more viable theory of human cognition. But even if connectionism is an implementation of the

LOT, it is only an implementation at one level of analysis (i.e., Clark's higher-level cluster analysis). However, connectionism, at the units-and-weights level, offers an explanation of content-addressable memory, due to the dispositional nature of the systems. It is natural to suppose that different features of human cognition are explained at different levels of analysis. Thus, connectionism really is different from the LOT. And connectionism offers a more viable and biologically plausible theory of human cognition.

3.4. How Logical is Human Cognition?

Recall that it is unlikely that Classical models can understand jokes almost instantly, given the limit constraint, because of the vast number of complicated inferences that would need to be drawn. Thus, Classical models are not biologically plausible.

Now connectionist systems have some apparent faults.

For example, connectionist systems have difficulty handling sequential problem solving, such as the problems involved in logic and planning. Simply put, connectionist systems are not good at handling logical inferences. By contrast, Classical systems demand it - they are excellent at performing highly sequential problem-solving tasks. But is this a merit for Classicism? Prima facie, we would say that it is. However, if we are striving to develop an adequate theory of human cognition, we want a theory that is able to illuminate our strengths and weaknesses. As Clark comments, 'We are generally better at Frisbee than at logic' (Clark 1996, p. 60 - emphasis in original). Clark is on the right track.

The manner in which Fodor sets up Classicism is that it incorporates the CTM and the LOT. This would imply that our minds are syntactic inference engines. But this view is not accurate. We generally find it difficult to understand logic and hence generally find it difficult to perform highly sequential problem-solving tasks. If our minds were structured in such a way that thinking is a process of making complex inferences, then we should have no difficulty in making such inferences. But if this were the case, then why do, for example, so many students who take courses in logic perform so poorly? It is by no means obvious that our minds are inference-generating machines. And this is psychologically suggestive. That we are not, in general, good at logic is a merit for connectionist systems. Connectionist systems are not well suited, just as we are not, to performing complex logical inferences. Moreover, it would be rather surprising if we were excellent at performing logical inferences if systematicity is both context-dependent and context-independent. This is yet another aspect of human cognition that is adequately explained by connectionism.

3.5. Leaky Minds

According to Clark, we have leaky minds in which the mind extends over the brain and the world. The mind is an associative engine that interacts with the world - the environment. In essence, the mind requires external support. Clark argues that artificial neural networks, that is, connectionist networks, complement the idea that we have leaky minds. By contrast, traditional Classical models, which strictly work with 'rule-and-symbol' manipulation, do not allow for the mind to overlap with its environment. But this is odd, for if inferential structures are valid and if they do not involve the use of the external world, how can inferences be truth-preserving? Simply put, Classical models treat human cognition, in particular human reasoning, as something that is disconnected from the world. In effect, Clark has introduced a more abstract level of analysis - a mind-world level. Human cognition does not only involve the manipulation of symbols in the head. Clark argues that connectionist research has broadened our horizons with regard to the ways in which a physical system, such as the brain, might encode and exploit information and knowledge (Clark 1996, p. 58).

For example, connectionist networks similar to NETtalk have been applied to the areas of recognizing handwritten zip codes, face recognition, robotic control, and forward-looking learning.

As Clark rightly points out, the ability of connectionist research '... to illuminate biological cognition depends not just on using a processing style that is at least roughly reminiscent of real neural systems but also on deploying such resources in a biologically realistic manner. Highly artificial choices of input and output representations and poor choices of problem domains have, I believe, robbed the neural network revolution of some of its initial momentum. This worry relates directly to the emerging emphasis on real-world action and thus merits some expansion' (Clark 1996, p. 58). Clark's worry is that much of the research in the field of connectionist networks stemmed from the

Classical treatment of problems. Many connectionist networks dealt only with small portions of human cognition. Even worse, the small portions that were

researched involved highly artificial representations,

such as block balancing programs (a network is trained to perform the task of balancing blocks). Such programs did not involve the use of real motor actions involving

robotic arms - it was simply the activity of two output units interpreted so that the equal activity of both

outputs indicated a state of balance, whereas excessive

activity on either output unit indicated that the beam would lean in one direction, depending on which unit was

overly excited. And if networks are set up artificially,

then it is fair to say that the solutions provided by such networks are both artificial and unrealistic. Clark

suggests that a better, more realistic strategy would be

to set up a network so that the inputs would be taken

from cameras, and the system would yield real action as

outputs (e.g., moving real blocks until they balanced)

(Clark 1996, pp. 58-59). The upshot is that cognitive

science can no longer hide behind the veil of abstraction

- it must incorporate the real world and the acting organism. As Clark claims,

'... abstracting away from the real-world poles of sensing and acting deprives our artificial systems of the opportunity to simplify or otherwise transform their information-processing tasks by the direct exploitation of real-world structure. Such exploitation may be essential if we hope to tackle sophisticated problem solving using the kinds of biologically plausible pattern-completing resources that artificial neural networks provide...' (Clark 1996, p. 59). According to Clark, connectionist minds are well suited to accepting external support. For example, most of us can learn simple multiplications, such as 5x5=25.

That is, when someone asks, 'What's 5x5?' we immediately answer '25'. Our knowledge of such simple multiplications can be supported by a simple 'on-board pattern-recognition device'. However, we tend to run into problems concerning longer multiplications. That is, if someone were to ask, 'What's 8561x9471?' we would not be able to give an immediate answer. We would first have to resort to pen and paper, or to a calculator, work the problem out longhand, and then provide an answer. When we resort to pen and paper, we are, in effect, reducing

the complex problem to a sequence of smaller problems. In

this case, we begin with 1x1. What we are doing, then, is using the external medium (paper) to store the results of these simpler problems. And, by a series of simpler pattern completions, in conjunction with external storage, we arrive at the solution to the problem (Clark

1996, p. 61).
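A small sketch of that division of labour (the framing is mine, not Clark's): each step is a simple pattern completion - a single-digit product recalled from a memorized table - while a list plays the part of the paper that stores the intermediate results.

# Long multiplication as simple recalls plus external storage.
table = {(a, b): a * b for a in range(10) for b in range(10)}  # memorized facts

def long_multiply(x, y):
    paper = []                                   # the external medium
    for i, a in enumerate(reversed([int(d) for d in str(x)])):
        row, carry = 0, 0
        for j, b in enumerate(reversed([int(d) for d in str(y)])):
            carry, digit = divmod(table[(a, b)] + carry, 10)  # one simple recall
            row += digit * 10 ** j
        row += carry * 10 ** len(str(y))
        paper.append(row * 10 ** i)              # write the partial product down
    return sum(paper)                            # gather up what the paper holds

print(long_multiply(8561, 9471))   # 81081231, i.e., 8561 x 9471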

I claim that this point can also be made with regard to logical reasoning. Most of us cannot perform highly sequential problem-solving tasks, such as those involved in logic, in our minds. We can, however, easily solve simple logical problems. For example, if we are asked what the conclusion is to an argument of the form 'P&Q', we immediately answer 'P'. But if we were asked to perform a derivation of A→(B&C), we would resort to pen and paper and reduce the complex problem to a sequence of problems. In this case, we begin with A→B.

However, as Clark rightly points out, some people can learn to perform complex multiplications and complex logical problems in their minds. But how do they accomplish this? In essence, they learn to manipulate a mental model in the same way as they manipulate the external world. However, 'This kind of internal symbol manipulation is importantly distinct from the classical vision of inner symbols, for it claims nothing about the computational substrate of such imaginings. The point is simply that we can mentally simulate the external arena and hence, at times, internalize cognitive competencies that are nonetheless rooted in manipulations of the external world' (Clark 1996, p.

61). 'The neat classical separation of data and process, of symbol structures and CPU, may have reflected nothing so much as the separation between the agent and an external scaffolding of ideas persisting on paper, in filing cabinets, or in electronic media' (Clark 1996, p. 61).

Another problem for Classical systems, but a merit for connectionist systems, is that Classical systems ignore the way we use the real spatial properties of a workspace to simplify on-board computation. In effect,

Classical systems have adopted the 'plan-as-program' idea, in which we specify a complete sequence of actions in order to achieve some goal. For example, a list of instructions for making a cake would count as a specification. By following the Classical idea of planning, complex sequences of actions are determined by an internalized version of a set of instructions.

However, such a planning strategy leaves out the role that the external world plays. When we look at the real-world behaviours and planning of cognitive agents, there is an

'This simple fact argues in favour of a non-classical mode1 of the inner resources. Once again, it looks for al1 the world (pu intended) as if the classical image bundles into the machines a set of operational capacities which in real life emerge only from the interactions between machine (brain) and world' (Clark 1996, p. 64 emphasis in original). ~hus,the external world can function not by just providing the mind with external memory support, as in the case of complex multiplications and complex logical problems; the external world is also an arena in which external operations systematically transform the problems posed to human minds. And this is a move in the right direction. Cognitive science should abandon the independence between physical space and information processing in favour of a unified physio-informational space (Clark, 1966, p.66).

As Clark States, 'Every thought is had by a brain. But the flow of thoughts and the adaptive success of reason are now seen to depend on repeated and crucial interactions with external resources. The true engine of reason ... is bounded by neither skin nor skull' (Clark 1996, pp. 68-69 emphasis in original).

~hus,connectionist minds are well suited to accepting external support; and this indicates that at yet another level of analysis - the mind-world level connectionism is able to offer an explanation of yet another feature of human cognition. Classicism simply

TO conclude, even if we do not have a non-Classical account of systematicity, it may be the case that the

92 biologically plausible theory of human cognition.

Connectionisrn is not merely an implementation of the LOT. Boolos, G.S. and Jeffrey, R.C. (1989) Computability and Logic. Third edition. New York: Cambridge University Press.

Clark, A. (1989/90), Connectionist Minds. In Proceedings of the Aristotelian Society 90.

Clark, A. (1993). Associative Engines: Connectionism, Concepts, and Representational Change. Cambridge, MA: Bradford/The MIT Press.

Clark, A. (1996). Being There: Putting ~rain,Mind, and Body Worl d Together. Cambridge, MA: Bradf ord/The MIT Press.

Dennett, D. (1987). The Intentional Stance. Cambridge, MA:Bradford/The MIT Press.

Fodor, J. (1975). The Language of Thought. Cambridge, MA: Harvard University Press.

Fodor, J. (1987). Why there still has to be a laquage of thought. In Psychosemantics. Cambridge, MA: The MIT Press.

Fodor, J. (1990). A Theory of Content and Other Essays. Cambridge, MA: Bradford/The MIT Press.

Fodor, J. and Pylyshyn, 2. (1988). Connectionism and Cognitive Architecture: A Critical Analysis. In Cognition 88.

Fodor, J. and McLaughlin, B. (1990). Connectionism and the Problem of Systematicity: Why Smolensky's Solution Doesn't Work. In Cognition 35.

Haack, S. (1978). ~hilosophyof Logics. Cambridge: Cambridge University Press. 94 ~rvine,A. (1996). Philosophy of Logic. In Philosophy of Science, Logic and Mathematics in the 20th Century, edited by S.G.Shanker. Vol. IX: The Routledge History of Philosophy. London: Routledge.

MacDonald, C. (1995) Introduction: Classicism v. Connec t ionism. In Connectionisrn: Debates on Psycho1 ogical Explanation, edited by C . MacDonald and G.MacDonald. Oxford: Basil Blackwell.

McLeod. P. and Plunkett, K. and Rolls, E. (1998). Introduction to Comectionist Modeling of Cogni tive Processes. Oxford: Oxford University Press.

Rumelhart, D. (1989). The Architecture of Mind: A Connectionist Approach. In Foundations of Cognitive Science, edited by M. Posner. Cambridge, MA: Bradford/The MIT Press.

Srnolensky, P. (1988). On The Proper Treatmrnt of Connec tionism . In Behavioral and Brain Sciences 11.

Srnolensky, P. (1991). Connectionisrn, Constituency and the Language of Thought. In Meaning in Mind: Fodor and his Critics, edited by B. Loewer and G. Rey. Oxford: Basil Blackwell.

Stainton, R. (1996). Philosophical Perspectives on Language. ~roadviewPress.

Tarski. A. (1944). The Semantic Conception of Truth. In Phi1 osophy and Phenomenological Research 4.