
AI Magazine Volume 9 Number 4 (1988) (© AAAI)

Connectionism and Information-Processing Abstractions

The Message Still Counts More Than the Medium

B. Chandrasekaran, Ashok Goel, and Dean Allemang

Connectionism challenges a basic assumption of much of AI, that mental processes are best viewed as algorithmic symbol manipulations. Connectionism replaces symbol structures with distributed representations in the form of weights between units. For problems close to the architecture of the underlying machines, connectionist and symbolic approaches can make different representational commitments for a task and, thus, can constitute different theories. For complex problems, however, the power of a system comes more from the content of the representations than the medium in which the representations reside. The connectionist hope of using learning to obviate explicit specification of this content is undermined by the problem of programming appropriate initial connectionist architectures so that they can in fact learn. In essence, although connectionism is a useful corrective to the view of the mind as a symbolic processor, for most of the central issues of intelligence, connectionism is only marginally relevant.

This article is an expanded version of part of a larger paper entitled "What Kind of Information Processing Is Intelligence? A Perspective on AI Paradigms and a Proposal" that appears in Foundations of Artificial Intelligence: A Source Book, eds. Derek Partridge and Yorick Wilks, Cambridge University Press, 1988. A much abridged version appears in Behavioral and Brain Sciences 11(1) as a commentary on a paper on connectionism.

Challenge to the Symbolic View

Much of the theoretical and empirical research in AI over the past 30 years has been based on the so-called symbolic paradigm—the thesis that algorithmic processes which interpret discrete symbol systems provide a good basis for modeling human cognition. Stronger versions of the symbolic paradigm were proposed by Newell (1980) and Pylyshyn (1984). Newell's physical symbol system hypothesis is an example of the symbolic view. Pylyshyn argues that symbolicism is not simply a metaphoric language to talk about cognition but that cognition literally is computation over symbol systems. It is important to note that the symbolic view does not imply a restriction to serial computation or a belief in the practical sufficiency of current von Neumann computer architectures for the task of understanding intelligence. Often, disagreements about symbolicism turn out to be arguments for computer architectures that support some form of parallel and distributed processing rather than arguments against computations on discrete symbolic representations.

In spite of what one might regard as significant AI achievements in providing a computational language to talk about cognition, recurring challenges have been made to the symbolic paradigm. A number of alternatives have been proposed whose basic mechanisms are not in the symbol-interpretation mode. Connectionism is one such alternative. It revives the basic intuitions behind the early perceptron theory (Rosenblatt 1962) and offers largely continuous, nonsymbolic processes as a basis for modeling human cognition and perception.

Connectionism and symbolicism both agree on the idea of intelligence as processing of representations but disagree about the medium in which the representations reside and the corresponding processing mechanisms. We believe that symbolicism and connectionism carry a large amount of unanalyzed assumptional baggage. For example, it is not clear if many of the theories cast in the symbolic mode really require this form of computation and what role the connectionist architecture plays in a successful connectionist solution to a problem. We examine the assumptions and the claims of connectionism in this article to better understand the nature of representations and information processing in general.

The Nature of Representations: Roots of the Debate

The symbolic versus connectionist debate in AI today is the latest version of a fairly classic contention between two sets of intuitions, each leading to a weltanschauung about the nature of intelligence. The debate can be traced in modern times at least as far back as Descartes (to Plato if one wants to go further back) and the mind-brain dualism known as Cartesianism. In the Cartesian world view, the phenomena of the mind are exemplified by language and thought. These phenomena can be implemented by the brain but are seen to have a constituent structure in their own terms and can be studied abstractly. Symbolic logic and other symbolic representations are often advanced as the appropriate tools for studying these phenomena.

Functionalism in philosophy, information-processing theories in psychology, and the symbolic paradigm in AI all share these assumptions. Although most of the intuitions that drive this point of view arise from a study of cognitive phenomena, the thesis is often extended to include perception; for example, for Bruner (1957), perception is inference. In its modern version, the Cartesian viewpoint appeals to the Turing-Church hypothesis as a justification for limiting attention to symbolic models. These models ought to suffice, the argument goes, because even continuous functions can be computed to arbitrary precision by a Turing machine.

The opposing view springs from skepticism about the separation of the mental from the brain-level phenomena. The impulse behind anti-Cartesianism appears to be a reluctance to assign any kind of ontological independence to the mind. In this view, the brain is nothing like the symbolic processor of Cartesianism. Instead of what is seen as the sequential and combinational perspective of the symbolic paradigm, some of the theories in this school embrace parallel, holistic (that is, they cannot be explained as compositions of parts), nonsymbolic alternatives; however, others do not even subscribe to any kind of information processing or representational language in talking about mental phenomena. Those who do accept the need for information processing of some type nevertheless reject processing of labeled symbols and look to analog, or continuous, processes as the natural medium for modeling the relevant phenomena. In contrast to Cartesian theories, most of the concrete work deals with perceptual and motor phenomena, but the framework is meant to cover complex cognitive phenomena as well. Anti-Cartesian positions in philosophy, Gibsonian theories in psychology, and connectionism in psychology and AI can all be grouped as more or less sharing this perspective, even though they differ from each other on a number of issues. The Gibsonian direct perception theory (Gibson 1950), for example, is nonrepresentational. Perception, in this view, is neither an inference nor a product of any kind of information processing; rather, it is a one-step mapping from stimuli to categories of perception made possible by the inherent properties of the perceptual architecture. All the needed distinctions are already directly in the architecture, and no processing over representations is needed.

We note that the proponents of the symbolic paradigm can be happy with the proposition that mental phenomena are implemented by the brain, which might or might not have a symbolic account. However, the anti-Cartesian theorists cannot accept this duality. They want to show the mind as epiphenomenal. To put it simply, the brain is all there is, and it isn't a computer.

Few people in either camp subscribe to all the features in our descriptions. Connectionism is a less radical member of the anti-Cartesian camp because many connectionists do not have any commitment to brain-level theory making. Connectionism is also explicitly representational—its main argument is only about the medium of representation. The purpose of the preceding account is to help in understanding the philosophical impulse behind connectionism and the rather diverse collection of bedfellows that it has attracted.

Symbolic and Nonsymbolic Representations

To better understand the difference between the symbolic and nonsymbolic approaches, let us consider the problem of multiplying two positive integers. We are all familiar with algorithms to perform this task. We also know how the traditional slide rule can be used to do this multiplication. The multiplicands are represented by their logarithms on a linear scale, which are then added by being set next to each other; the result is obtained by reading off the sum's antilogarithm. Although both the algorithmic and slide rule solutions are representational, in no sense can either of them be thought of as an implementation of the other. They make different commitments about what is represented. Striking differences also exist between them in computational terms. As the size of the multiplicands increases, the algorithmic solution suffers in the amount of time it takes to complete the solution, and the slide rule solution suffers in the amount of precision it can deliver.

Let us call the algorithmic and slide rule solutions S1 and S2. Consider another solution, S3, which is the simulation of S2 by an algorithm. S3 can simulate S2 to any desired accuracy. However, S3 has radically different properties from S1 in terms of the information that it represents. S3 is closer to S2 representationally. Its symbol-manipulation character is at a lower level of abstraction altogether. Given a black-box multiplier, ascription of S1 or S2 (among others) about what is really going on results in different theories about the process. Each theory makes different representational commitments. Further, although S2 is analog, the existence of S3 implies that the essential characteristic of S2 is not continuity but a radically different sense of representation and processing than S1. The connectionist models relate to the symbolic models in the same way S2 relates to S1.

An adequate discussion of what makes a symbol requires more space and time than we currently have (Pylyshyn [1984] provides a thorough and illuminating discussion), but the following points are useful. A type-token distinction exists: symbols are types about which abstract rules of behavior are known and can be brought into play. This distinction leads to symbols being labels that are interpreted during the process; however, no such interpretations exist in the process of slide rule multiplication (except for input and output). Thus, the symbol system can represent abstract forms, and S2 performs its addition or multiplication not by instantiating an abstract form but by having, in some sense, all the additions and multiplications directly in its architecture.

Although we use the word "process" to describe both S1 and S2, strictly speaking no process exists in the sense of a temporally evolving behavior in S2. The architecture directly produces the solution. This is the intuition present in Gibson's (1950) theory of direct perception as opposed to Bruner's (1957) alternative proposal of perception as inference, because the process of inference implies a temporal sequence. Connectionist systems can have a temporal evolution, but unlike algorithms, the information processing does not have a step-by-step character. Thus, connectionist models are often presented as holistic.
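To make the S1-S3 contrast concrete, here is a minimal Python sketch (ours, not from the article) of S1 and of S3, the algorithmic simulation of the slide rule; keeping the logarithms to three significant digits is an arbitrary stand-in for S2's limited precision:

    # Sketch (not from the article): the algorithmic multiplier S1 and an
    # algorithmic simulation S3 of the slide rule S2. Both compute products,
    # but they make different representational commitments: S1 manipulates
    # digit symbols exactly; S3 represents the operands by finite-precision
    # logarithms, so its answer degrades in precision as the numbers grow,
    # just as a physical slide rule does.
    import math

    def multiply_s1(a: int, b: int) -> int:
        """S1: ordinary symbolic/algorithmic integer multiplication."""
        return a * b

    def multiply_s3(a: int, b: int, digits: int = 3) -> float:
        """S3: simulate the slide rule by adding logarithms kept to a few digits."""
        log_sum = round(math.log10(a), digits) + round(math.log10(b), digits)
        return 10 ** log_sum

    if __name__ == "__main__":
        print(multiply_s1(45, 17))        # 765, exact
        print(multiply_s3(45, 17))        # roughly 765, limited precision
        print(multiply_s1(45678, 17321))  # exact regardless of operand size
        print(multiply_s3(45678, 17321))  # relative error persists as numbers grow

The two functions ascribe different representations to the same black-box behavior, which is exactly the sense in which S1 and S2 (or S3) are different theories of multiplication.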

The main point of this subsection is that functions exist for which the symbolic and connectionist accounts can differ fundamentally in terms of the representational commitments they make. Having granted that connectionism can make a theoretical difference, we now want to argue that the difference connectionism makes is relatively small to the practice of most AI as a research enterprise. Although our arguments refer specifically to connectionist models, they are actually intended to apply to nonsymbolic theories in general.

Sidebar 1. Three Ways to Multiply Numbers

Consider the problem of multiplying 45 x 17 to get 765. A classical algorithmic approach to this problem is to do it the way we were taught in school, showing the work:

         45
       x 17
       ----
        315
        45
       ----
        765

However, we can also use a slide rule. On a slide rule, the number 45 is written log(45) units from the end of the rule. Hence, to multiply 45 x 17, we line up the distances log(45) and log(17) next to each other, which gives us the place on the rule at log(45) + log(17) = log(765), which is labeled 765, the desired answer (see figure 1). Notice that if we use the pencil-and-paper algorithm on larger numbers, we use more pencil lead and spend more time writing, and that if we use the slide rule, the answer is less precise.

In the pencil-and-paper example, we are dealing with integer multiples of powers of ten and using the columns to keep track of symbolic representations of them. In the case of the slide rule, we are dealing with logarithms and letting the architecture of the slide rule keep track of them.

We can also solve the multiplication by simulating the slide rule with a computer. That is, we can compute the logarithms to any desired accuracy and add them up to get the logarithm of the answer:

    log 45 + log 17 = log 765
    1.653  + 1.230  = 2.883

In this solution, the objects are still logarithms, but the addition is done symbolically. Just as with the slide rule, when the numbers get larger, the answer is less precise. The interesting characteristics of each solution come from the representational commitments it makes, not from the symbolic-nonsymbolic nature of its architecture.

Figure 1. Multiplication Using a Slide Rule. (The figure shows 45 x 17 = 765 read off two aligned logarithmic scales.)

Connectionism and Its Main Features

Connectionism as an AI theory comes in many different forms. Exactly what constitutes the essence of connectionism is open to debate. The connectionist architectures in the perceptron/parallel distributed processing style (Rosenblatt 1962; Rumelhart et al. 1986) share the following ideas. The representation of information is in the form of weights of connections between processing units in a network, and information processing consists of the units transforming their input into some output, which is then modulated by the weights of connections as input to other units. Connectionist theories emphasize a form of learning in which the weights are adjusted continuously so that the network's output tends toward the desired output. Although this description is couched in nonalgorithmic terms, in fact, many connectionist theorists describe the units in their systems in terms of algorithms that map their input into discrete states. However, the discrete-state description of the units' output, as well as the algorithmic specification of the units' behavior in a connectionist network, is largely irrelevant. This approach is consistent with Smolensky's (1988) statement that the language of differential equations is appropriate to use when describing the behavior of connectionist networks. Further, although our description is couched in the form of continuous functions, the essential aspect of the connectionist architecture is not the property of continuity; it is that the representation medium has no internal labels that are interpreted and no abstract forms that are instantiated during processing. Sidebar 2, entitled A Connectionist Solution to Word Recognition, describes a specific connectionist proposal for word recognition.

A number of properties of such connectionist networks are worthy of note and explain why connectionism is viewed as an attractive alternative to the symbolic paradigm. The first is parallelism. Although theories in the symbolic paradigm are not restricted to serial algorithms, connectionist models are intrinsically parallel. Second is distribution. In some connectionist schemes (McClelland, Rumelhart, and Hinton 1986), the representation of information is distributed over much of the network in a specialized sense—the state vector of the network weights is the representation. Third is the softness of constraints. Because the network contains a large number of units, each bearing a small responsibility for the task, and because of the continuity of the space over which the weights take values, the output of the network tends to be more or less smooth over the input space. Fourth is learning. Because of a belief that connectionist schemes are particularly good at learning, often an accompanying belief exists that connectionism offers a way to avoid programming an AI system and let learning processes discover all the needed representations.
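As a rough illustration of the ideas just listed (units, weighted connections, and continuous adjustment of weights toward a desired output), the following sketch trains a single sigmoid unit with a delta-rule style update; the training data, learning rate, and number of passes are invented for the example:

    # Sketch: one connectionist unit. It sums its weighted inputs, squashes
    # the sum, and its weights are nudged continuously so the output moves
    # toward a desired target (a delta-rule style update). All numbers here
    # are invented for illustration.
    import math, random

    def unit_output(weights, inputs):
        # A unit transforms its weighted input into an output in (0, 1).
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1.0 / (1.0 + math.exp(-s))

    def train(examples, n_inputs, rate=0.5, epochs=2000):
        weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        for _ in range(epochs):
            for inputs, target in examples:
                out = unit_output(weights, inputs)
                error = target - out
                # Adjust each connection in proportion to its contribution.
                for i, x in enumerate(inputs):
                    weights[i] += rate * error * out * (1 - out) * x
        return weights

    if __name__ == "__main__":
        # Learn a simple mapping: output high exactly when the first input is on.
        data = [([1, 0, 1], 1), ([1, 1, 0], 1), ([0, 1, 1], 0), ([0, 0, 1], 0)]
        w = train(data, 3)
        for inputs, target in data:
            print(inputs, target, round(unit_output(w, inputs), 2))

Nothing in the trained weight vector carries an interpreted label; the "knowledge" the unit acquires lives entirely in the numerical weights, which is the representational point being made in the text.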

The properties of parallelism and distribution have attracted adherents who feel that human memory has a holistic character—much like a hologram—and consequently react negatively to discrete symbol-processing theories because they compute the needed information from constituent parts and their relations. Dreyfus (1979), for example, argues that pattern recognition in humans does not proceed by combining evidence about constituent features of a pattern but, rather, uses a holistic process. Thus, Dreyfus looks to connectionism as vindication of his long-standing criticism of symbolic theories. Connectionism is said to perform direct recognition, and symbolicism performs recognition by sequentially computing intermediate representations. These characteristics are especially attractive to those who believe that AI must be based more on brainlike architectures, even though within the connectionist camp, a wide divergence is present about the degree to which directly modeling the brain is considered appropriate. Although some of the theories explicitly attempt to produce neural-level computational structures, others propose an intermediate subsymbolic level between the symbolic and neural levels (Smolensky 1988); still others offer connectionism as a computational method that operates in the symbolic-level representation itself (Feldman and Ballard 1982). The essential idea uniting these theories is that the totality of connections defines the information content rather than the representation of information as a symbol structure.

Is Connectionism Merely an Implementation Theory?

Several arguments have been made that connectionism can, at best, provide possible implementations for symbolic theories. According to one, continuous functions are thought to be the alternative to discrete symbols; because they can be approximated to an arbitrary degree of precision, it is argued that one need only consider symbolic solutions. Another argument is that connectionist architectures are thought to be the implementation medium for symbolic theories, much as the computer hardware is the implementation medium for software. In the subsection entitled Symbolic and Nonsymbolic Representations, we consider and reject these arguments. We show that symbolic and nonsymbolic solutions can be alternative theories in the sense that they can make different representational commitments.

Yet another argument is based on a consideration of the properties of high-level thought, in particular, language and problem-solving behavior. Connectionism by itself does not have the constructs for capturing these properties, the argument runs, so, at best, it can only be a way to implement the higher-level functions. We discuss this point and related issues in Roles of Symbolic and Connectionist Processes.

Having granted that connectionism can make a theoretical difference, we now argue that the difference connectionism makes is relatively small to the practice of most of AI.

Information-Processing Abstractions

Some proponents of connectionism claim that although solutions in the symbolic paradigm are composed of constituents, connectionist solutions are holistic. Composition, in this argument, is taken to be, intrinsically, a symbolic process. Certainly, for some simple problems, connectionist solutions exist with this holistic character. Some connectionist solutions to character recognition, for example, directly map from pixels to characters and cannot be explained as composing evidence about features, such as closed curves, lines, and their relations. Character recognition by template matching, a nonsymbolic though not a connectionist solution, is another example whose information processing cannot be explained as feature composition. However, as problems get more complex, the advantages of modularization and composition are as important for connectionist approaches as they are for symbolic computation.

Let us consider word recognition, a problem area that has attracted significant attention in the connectionist literature. In particular, consider recognition of the word TAKE as discussed by McClelland and Rumelhart (1981). A featureless connectionist solution similar to the one for individual characters can be imagined, but a more natural solution is one that in some sense composes the evidence about individual characters into a recognition of the word TAKE (see sidebar 2). In fact, the connectionist solution that McClelland and Rumelhart describe has a natural interpretation in these terms. Just because the word recognition is done by composition does not mean that each of the characters is explicitly recognized as part of the procedure or that the evidence is added together in a step-by-step, temporal sequence.

Why is such a compositional solution more natural? Reusability of parts, reduction in learning complexity, and greater robustness as a result of intermediate evidence are the major computational advantages of modularization. If the reader doesn't see the power of modularization for word recognition, consider sentence recognition: If one were to go directly from pixels to sentences, without in some sense going through words, the number of recognizers and their complexity would have to be quite large even for sentences of bounded length. Composition is a powerful aid against complexity, whether the underlying system is connectionist or symbolic (Simon 1969). Of course, connectionism provides one style for composition, and symbolic methods provide another, each with its own signature in terms of the performance details.
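A back-of-envelope count makes the point about going directly from pixels to sentences; the vocabulary size and sentence length below are arbitrary illustrative numbers, not figures from the article:

    # Back-of-envelope count (illustrative numbers): recognizing bounded-length
    # sentences directly would need one recognizer per possible sentence,
    # whereas a compositional scheme needs only recognizers for letters and
    # words plus a way to combine their evidence.
    letters = 26
    words_in_vocabulary = 10_000
    max_words_per_sentence = 10

    direct_sentence_recognizers = words_in_vocabulary ** max_words_per_sentence
    compositional_recognizers = letters + words_in_vocabulary  # shared, reusable parts

    print(f"direct:        {direct_sentence_recognizers:.2e} recognizers")
    print(f"compositional: {compositional_recognizers} recognizers")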

These examples also raise questions about the degree to which connectionist representations can be distributed. For complex tasks, information is, in fact, localized into portions of the network. Again, in the network for recognition of the word TAKE, physically local subnets can be identified, each corresponding to one of the characters. Thus, hopes for almost holographically distributed representations are bound to be unrealistic.

The Information-Processing Level

Marr (1982) originated the method of information-processing analysis as a way to separate the essential elements of a theory from implementation-level commitments. He proposed that the following methodology be adopted for this purpose. First, identify an information-processing function with a clear specification about what kind of information is available for the function as input and what kind of information needs to be made available as output. Then, specify a particular information-processing theory for achieving this function by stating what kinds of information need to be represented at various processing stages. Actual algorithms can then be proposed to carry out the information-processing theory. These algorithms make additional representational commitments. In the case of vision, for example, Marr specified that one of the functions is to take image intensities in a retinal image as input and produce as output a three-dimensional shape description of the objects in the scene. His theory of how this function is achieved in the visual system is that three distinct kinds of information need to be generated: first, from the image intensities, a primal sketch of significant intensity changes—a kind of edge description of the scene—is generated. Second, a description of the objects' surfaces and their orientation—what he called a 2-1/2-dimensional sketch—is produced from the primal sketch. Third, a three-dimensional shape description is generated. Even though Marr talked in the language of algorithms as the way to realize the information-processing theory, in principle, there is no reason why appropriate parts of the realization cannot be done connectionistically.

Sidebar 2. A Connectionist Solution to Word Recognition

For an illustration of how connectionist networks work, let us consider the model proposed by McClelland and Rumelhart (1981) for the perception of letters of visually presented words. Our description of their model closely follows McClelland, Rumelhart, and Hinton (1986). Their model contains four sets of detectors for the four-letter input words, with a set of units assigned to detect visual features in each of the four different letter positions. The feature-detecting units for one of the letter positions are shown in figure 2. There are four sets of detectors for the letters themselves, and one set for the words. Each unit in the network has an activation value that corresponds to the strength of the hypothesis that what the unit stands for is present in the input. The connections between the units in the network are such that if two units are mutually consistent—in the way that the letter T in the first position is consistent with the word TAKE—then the activation of one unit tends to support the activation of the other. Similarly, if two hypotheses are mutually inconsistent, then the corresponding units tend to inhibit each other.

Let us consider what happens when a familiar stimulus is presented to this network under degraded conditions. Let us suppose that the input display consists of the letters T, A, and K fully visible and enough of the fourth letter to rule out all letters but E or F. Initially, the activations of all units are set at or below zero. When the display is presented, the activations of detectors for features present in each letter position grow above zero. In the first three positions, T, A, and K are unambiguously activated. For the fourth position, the activations of the detectors for E and F start growing as the feature detectors below them are activated. As these detectors become active, they and the detectors for T, A, and K start to activate detectors for words that have these letters in them. A number of words might be partially consistent with the active letters, but only TAKE matches the active letters in all positions. As a result, TAKE becomes more active than any other word and inhibits other words, thereby successfully dominating the pattern of activation among the word units. As TAKE grows in strength, it sends feedback to the letter level, reinforcing the activations of T, A, K, and E. This feedback gives E the upper hand on F in the fourth position, and eventually the stronger activation of the E detector dominates the pattern of activation, suppressing the F detector completely.

Figure 2. Connectionist Network for Word Recognition. (The figure shows feature, letter, and word units, including the words ABLE, TRIP, TIME, TRAP, TAKE, and CART.) From "An Interactive Model of Context Effects in Letter Perception: Part 1, An Account of Basic Findings" by J. L. McClelland and D. E. Rumelhart, Psychological Review 88:380.

Photo courtesy of American Psychological Association, copyright 1981. Reprinted by permission.
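The following toy simulation (ours, and far cruder than McClelland and Rumelhart's model) captures the flavor of the account in the sidebar: letter and word hypotheses excite and inhibit one another until TAKE dominates and its feedback settles the degraded fourth letter in favor of E. The vocabulary, coefficients, and update rule are all invented for illustration.

    # Toy re-creation of the interactive activation idea described in sidebar 2.
    # The vocabulary, weights, and update rule are invented and much simpler
    # than the original model.
    WORDS = ["TAKE", "TALE", "TIME", "CAKE", "CART"]

    # Bottom-up evidence: T, A, K are clear; the fourth letter is degraded and
    # partly consistent with both E and F.
    letter_evidence = [{"T": 1.0}, {"A": 1.0}, {"K": 1.0}, {"E": 0.5, "F": 0.5}]

    letter_act = [dict.fromkeys(ev, 0.0) for ev in letter_evidence]
    word_act = dict.fromkeys(WORDS, 0.0)

    def clamp(x):
        return max(0.0, min(1.0, x))

    for step in range(30):
        # Letter units: driven by the input, supported top-down by words that
        # contain them in that position, inhibited by rival letters there.
        new_letter_act = []
        for pos, acts in enumerate(letter_act):
            updated = {}
            for letter, act in acts.items():
                feedback = sum(word_act[w] for w in WORDS if w[pos] == letter)
                rivals = sum(a for l, a in acts.items() if l != letter)
                updated[letter] = clamp(act + 0.2 * letter_evidence[pos][letter]
                                        + 0.2 * feedback - 0.2 * rivals)
            new_letter_act.append(updated)
        letter_act = new_letter_act
        # Word units: excited by consistent letters, inhibited by inconsistent
        # letters and (more strongly) by competing words.
        new_word_act = {}
        for w in WORDS:
            support = sum(letter_act[pos].get(w[pos], 0.0) for pos in range(4))
            mismatch = sum(a for pos in range(4)
                           for l, a in letter_act[pos].items() if l != w[pos])
            competition = sum(a for other, a in word_act.items() if other != w)
            new_word_act[w] = clamp(word_act[w] + 0.05 * support
                                    - 0.05 * mismatch - 0.15 * competition)
        word_act = new_word_act

    print(sorted(word_act.items(), key=lambda kv: -kv[1]))  # TAKE ends up on top
    print(letter_act[3])                                    # E ends up beating F

Even in this crude form, the word-level hypothesis is built by composing evidence from the letter positions, which is the point the main text draws from the example.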

Information-processing abstractions constitute the content of much AI theory formation. In the recognition of the word TAKE, for example, the information-processing abstractions in which the theory of word recognition was couched evidenced the presence of individual characters. The difference between the recognition schemes in the symbolic and connectionist paradigms is in how the evidence is represented. In the symbolic paradigm, it is represented with labeled symbols, which permit abstract rules of composition to be invoked and instantiated. In the connectionist paradigm, evidence is represented more directly and affects the processing without undergoing any interpretive process. Describing a piece of a network as evidence about a character is a design and explanatory stance and is not necessarily part of the actual information processing in connectionist networks.

As connectionist structures are built to handle increasingly complex phenomena, they will have to incorporate their own versions of modularity and composition. Already we saw such modularity in the moderately complex word-recognition example. When—and if—we finally have connectionist implementations solving a variety of high-level cognitive problems (say, natural language understanding or problem solving and planning), the design of such systems will have an enormous amount in common with the corresponding symbolic theories. This commonness will be at the level of information-processing abstractions that both classes of theories would need to embody. In fact, the contributions of many of the nominally symbolic theories in AI are really at the level of the information-processing abstractions to which they make a commitment and do not rely on the fact that they were implemented in a symbolic structure. Symbols have often been used to stand for abstractions that need to be captured one way or another. The hard work of theory making in AI will always remain at the level of proposing the right information-processing-level abstractions because these abstractions provide the content of the representations. The decisions about which of the information-processing transformations are best done using connectionist networks and which using symbolic algorithms can properly follow once the information-processing-level specification is given. Thus, the connectionist and symbolic approaches are realizations of the information-processing-level description, which is more abstract than either realization.

Architecture-Independent and -Dependent Decompositions

We argued earlier that for a given function, the symbolic and nonsymbolic approaches might make rather different representational commitments. We also just argued, seemingly paradoxically, that for complex functions the two theories converge in their representational commitments. To clarify, think of two stages in the decomposition of the function: architecture independent and architecture dependent. The architecture-independent stage is an information-processing theory that can be realized by either symbolic or connectionist architectures. In either case, further architecture-dependent decomposition decisions need to be made. In particular, connectionist architectures offer some elementary functions that are rather different from those assumed in traditional symbolic approaches. Simple functions such as multiplication are so close to the architecture level that we only saw the differences between the representational commitments of the algorithmic and slide rule solutions. However, the word-recognition problem is sufficiently removed from the architectural level that we saw information-processing-level similarities between symbolic and connectionist solutions.

Where the architecture-independent information-processing theory stops and the architecture-dependent realization starts is not clear. It is an empirical issue, partly related to the primitive functions that can be computed in a particular architecture. The further away a problem is from the architectures' primitive functions, the more important the architecture-independent decompositions. The final performance will, of course, have features that are characteristic of the architecture, such as the softness of constraints for connectionist architectures.

Learning to the Rescue?

What if connectionism can provide learning mechanisms such that a network starts without representing any information-processing abstractions and learns to perform the task in a reasonable amount of time; that is, it discovers the needed abstractions by learning? In this case, connectionism can sidestep pretty much all the representational problems and dismiss them as the bane of symbolicism. The fundamental problem of complex learning is the credit-assignment problem, that is, deciding what part of the system is responsible for either the correct or the incorrect performance in a case so that the learner knows how to change the system's structure. Abstractly, the range of variation of the system's structure can be represented as a multidimensional space of parameters, and learning can be viewed as a search in this space for a region that corresponds to the right structure of the system. The more complex the learning task, the more vast the space in which to do the search. Thus, learning the correct set of parameters by search methods that do not have a powerful notion of credit assignment would work in small search spaces but would be computationally prohibitive for realistic problems. Does connectionism have a solution to this problem?

In connectionist schemes, a significant part of the abstractions needed are built into the architecture in the choice of input, feedback directions, allocation of subnetworks, and semantics that underlie the choice of layers for the connectionist schemes. Thus, the input and the initial configuration incorporate a sufficiently large part of the abstractions needed that what is left to be discovered by the learning algorithms, although nontrivial, is proportionately small. The initial configuration decomposes the search space for learning in such a way that the search problem is much smaller in size. In fact, the space is sufficiently small that statistical associations do the trick. As long as the space to be searched is not large, as long as there are no local minima, and as long as there are enough trials, hill climbing can find the appropriate region in the space.

Again, the recognition scheme for TAKE well illustrates this point. In the connectionist scheme cited earlier, the decisions about which subnet is going to be largely responsible for T, which for A, and so on, as well as how the feedback is going to be directed, are all essentially made by the experimenter before any learning starts. The underlying information-processing theory is that evidence about individual characters is going to be formed directly from the pixel level, but recognition of TA is done by combining information about the presence of T and A as well as their joint likelihood. The degree to which the evidence about them is combined is determined by the learning algorithm and the examples. In setting up the initial configuration, the designer is actually programming the architecture to reflect the information-processing theory of recognizing the word. An alternate theory for word recognition, say, one that is more holistic than this theory (that is, one that learns the entire word directly from the pixels), has a different initial configuration. Of course, because of a lack of guidance from the architecture about localizing the search during learning, such a network takes much longer to learn the word. This is precisely the point: The designer recognized this and set up the configuration so that learning can occur in a reasonable time. Thus, although the connectionist schemes for word recognition still make a useful performance point, a significant part of the leverage still comes from the information-processing abstractions with which the designer started.

Additionally, the system that results after learning has a natural interpretation: The learning process can be interpreted as having successfully searched the space for those additional abstractions which are needed to solve the problem. Thus, connectionism is one way to map from one set of abstractions to a more structured set of abstractions. Most of the representational issues remain, whether one adopts connectionism for such mappings. Interesting learning theories in the symbolic framework can also be interpreted as starting with a strong set of abstractions to which the learning process adds sufficient new abstractions to solve the task.

Of course, in human learning, although some of the necessary abstractions are programmed in at various times through explicit instruction, a large amount of learning takes place without any designer intervention in setting up the learning structure. However, there is no reason to believe that humans start with a structure- and abstraction-free initial configuration. In fact, to account for the power of human learning, the initial configurations that a child starts with need to contain complex and intricate representations sufficient to support the learning process in a computationally efficient way. One cannot avoid the specification of appropriate initial structures and still get complex learning at different levels of description to take place in less than evolutionary or geologic time. That is, connectionism does not offer a royal road to learning.
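A small sketch of the learning-as-search picture, under the simplifying assumption that learning is greedy hill climbing over a vector of binary parameters; the target, the scoring, and the split between fixed and free parameters are invented. The point is only that fixing most of the structure in advance ("programming the initial configuration") leaves a much smaller space to search:

    # Sketch (illustrative): learning as search in a parameter space. A greedy
    # hill climber flips one parameter at a time and keeps the change if the
    # score improves; there are no local minima here, so it always succeeds,
    # but the number of steps depends on how much is left free to search.
    import random

    def hill_climb(target, start, free_positions, max_steps=100_000):
        """Greedy bit-flip search over the free positions; returns steps used."""
        current = list(start)
        score = sum(c == t for c, t in zip(current, target))
        for step in range(1, max_steps + 1):
            pos = random.choice(free_positions)
            candidate = list(current)
            candidate[pos] = 1 - candidate[pos]
            cand_score = sum(c == t for c, t in zip(candidate, target))
            if cand_score >= score:
                current, score = candidate, cand_score
            if score == len(target):
                return step
        return max_steps

    if __name__ == "__main__":
        random.seed(0)
        n = 60
        target = [random.randint(0, 1) for _ in range(n)]

        # Unstructured start: every parameter is free and initially random.
        blank = [random.randint(0, 1) for _ in range(n)]
        print("60 free parameters:", hill_climb(target, blank, list(range(n))), "steps")

        # Structured start: the designer has already fixed 50 parameters
        # correctly, so learning only searches over the remaining 10.
        structured = target[:50] + [random.randint(0, 1) for _ in range(10)]
        print("10 free parameters:", hill_climb(target, structured, list(range(50, 60))), "steps")

With credit assignment this easy the search is cheap either way; the text's argument is that for realistic problems the space is vastly larger and less forgiving, which is why the designer-supplied initial structure carries so much of the load.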

Roles of Symbolic and Connectionist Processes

In the study of connectionist and symbolic processes, three distinctions can be identified as sharing close affinity. The first is the distinction between macrophenomena and microphenomena of intelligence. The second is the distinction between processes that leave markings that last over time and intuitive or subconscious phenomena occurring in an instant. The last is the distinction between symbolic and connectionist processes. These three distinctions need to be unpacked a bit to see what can be allocated to what processes.

Rumelhart, McClelland, and the PDP Research Group (1986) use the term "micro-" in the subtitle of their book to indicate that the connectionist theories are concerned with the fine details of intelligent processes. A duration of 50-100 milliseconds has often been suggested as the size of the temporal grain for processes at the micro level. However, certain aspects of human cognitive behavior actually evolve over time on a scale of seconds, if not minutes, and have a clear temporal ordering of the major behavioral states. These processes can be termed macrophenomena of intelligence.

Perceptual processes such as face recognition and cognitive processes such as being reminded are examples of microphenomena. As an example of macrophenomena, consider the goal-directed problem-solving activity that a system such as General Problem Solver (GPS) (Newell and Simon 1972) tries to model. The agent is seen to have a goal at a certain instant, to set up a subgoal at another instant, and so on. Within this problem-solving behavior, the selection of an appropriate operator, which is typically modeled in GPS implementations as a retrieval algorithm from the Table of Connections, could be a micro behavior. Many phenomena of language and reasoning have a large macro component. If any of them can be identified with microphenomena that have a particularly appealing connectionist realization, then one might have an interesting division of responsibility.

Neither traditional symbolic computationalism nor radical connectionism has much use for this distinction because all the phenomena of intelligence, micro and macro, are meant to come under their particular purview. We want to present the case for a division of responsibility between connectionism and symbolic computationalism in accounting for the phenomena of interest.

Let us take the macro, conscious-level phenomena first. It seems inescapable that macrophenomena have a high degree of symbolic and algorithmic content, whatever one's beliefs about the formal nature of microphenomena might be. (See Pylyshyn [1984] for compelling arguments in this regard.) How much of language and other aspects of thought require symbol structures can be a matter of debate, but certainly, logical reasoning and goal-directed problem solving such as with GPS are two examples of such behavior.

What follows is a range of phenomena that seem to have a micro, below-conscious character but whose formal requirements nevertheless place them largely on the symbolic, algorithmic side. For example, natural language sentence comprehension, which generally takes place instantly or as a reflex, nevertheless seems to require such a formal structure. Fodor and Pylyshyn (1988) argue that much of thought has the properties of productivity and "systematicity." Productivity refers to a potentially unbounded recursive combination of thought that is presumed in human intelligence. Systematicity refers to the capability of combining thoughts in ways that require abstract representation of underlying forms. Fodor and Pylyshyn argue that we need symbolic computations, with their capacity for abstract forms and algorithms, to realize these properties.

Thus, macrophenomena and significant parts of microphenomena not only need the appropriate information-processing abstractions available, but at least parts of them need the abstractions encoded and manipulated symbolically. Whether the symbolic view needs to be adopted for implementation of the other parts is the next question. Are there such microphenomena? The symbolic paradigm has traditionally assumed that the symbolic, algorithmic character of the macrophenomena also characterizes the inner workings of the cognitive processor that generates the macrophenomena. Connectionism clearly challenges this assumption. Radical connectionism, similar to radical symbolicism, seems to demand all of cognition as its domain, and we argue that this demand cannot be conceded. Nevertheless, the architectures in the connectionist mold offer some elementary functions which are rather different from those assumed in the traditional symbolic paradigm. In particular, certain kinds of retrieval and matching operations and low-level parameter learning are especially appropriate elementary functions for which connectionism offers methods with attractive properties. Thus, a number of investigators in macro AI correctly feel the attraction of connectionist approaches for some parts of their theory formation, the parts where one or more of such elementary functions seem necessary. In a theory such as GPS, for example, the retrieval of the appropriate operators has traditionally been implemented in a symbolic framework, but a connectionist realization of this retrieval seems to have useful properties. (As another example, Anderson and Mozer [1981] propose a model of retrieval using spreading activation [which has a connectionist ring to it], where the objects of retrieval still have significant symbolic content to them. Also, sidebar 3 on connectionism and word pronunciation is an example of connectionism being used within a largely symbolic framework.) Connectionism and symbolicism have different but overlapping domains. A complete theory that integrates these domains along the lines suggested here can be a source for powerful explanations of the total range of the phenomena of intelligence.

The proposed division of responsibility echoes the proposal in Smolensky (1988) that connectionism operates at a lower level than the symbolic, a level he calls subsymbolic. He also posits the existence of a conscious processor and an intuitive processor. The connectionist proposals are meant to apply directly to the intuitive processor. The conscious processor can have algorithmic properties, according to Smolensky, but still a large part of the information-processing activities that were traditionally attributed to symbolic architectures really belong in the intuitive processor.

Nevertheless, the style of integration proposed leaves a number of problems to be solved. The first problem is how to get the symbolic properties of behavior at or near the level of consciousness out of the connectionist architecture. Additionally, the theory cannot relegate conscious thought to the status of an epiphenomenon. We know that the phenomena of consciousness have a causal interaction with the behavior of the intuitive processor. What we consciously learn and think affects our unconscious behavior slowly but surely, and vice versa. What is conscious and willful today becomes unconscious tomorrow. All this raises a complex constraint for connectionism: It now needs to provide some sort of continuity of representation and process so that this interaction can take place smoothly.

Our account does not merely relegate connectionism to an implementation status similar to the relation between computer software and hardware. Because the primitive functions that connectionism delivers are quite different from those assumed in the symbolic framework, their availability changes theory making for the overall symbolic process in a fundamental way. The theory now has to decompose the symbolic process to take special advantage of the power of connectionist primitives. For example, problem-solving theories of expert behavior might radically differ if retrieval were to be a large component of such theories, making problem solving by retrieval of past cases and modification of their solutions an especially dominant component, as in case-based reasoning.

It is important to note that this proposal of the division of responsibility does not mean abandoning the role of information-processing abstractions we have been arguing for. One should be careful about putting too much faith in connectionist mechanisms. As we stated earlier, the power for these operations is going to come from appropriate encodings that get represented connectionistically. Thus, although memory retrieval might have interesting connectionist components, the basic problem is still to find the principles by which episodes are indexed and stored, except that now one might be open to these encodings being represented connectionistically.

Finally, we want to address the comment by Rumelhart et al. (1986) that symbolic theories are really explanatory approximations of theories which are connectionist at a deeper level. As an example, they suggest that a schema or a frame is not really explicitly represented as such but is constructed, as needed, from general connectionist representations. This suggestion seems plausible but does not mean that schema theory is only a macroapproximation. Schema, in the sense of being an information-processing abstraction needed for certain macrophenomena, is a legitimate conceptual construct for which connectionist architectures offer a particularly interesting realization. It is not that connectionist structures are the reality and that symbolic accounts provide an approximate explanation; rather, it is the information-processing abstractions which contain a large portion of the explanatory power.

Sidebar 3. A Connectionist Solution to the Pronunciation Problem

For an example of a connectionist network that is not merely an implementation of a symbolic algorithm but also benefits from using appropriate information-processing abstractions, consider the PRO system of Lehnert (1987).

The task is to use a large case base of words and their pronunciations to learn to pronounce novel words. Cases are presented as pairs of letter strings and phonemes. Thus, the pronunciation of the word showtime results in the sequence of pairs (SH/sh, OW/o, T/t, I/i, ME/m). This sequence is split into triplets, for example, (OW/o, T/t, I/i), for training. The number of occurrences of each triplet during training is counted.

When a query is presented to PRO, it generates all possible hypothesis sequences it can associate with the word. Note that this output does not usually contain all possible segmentations of the input word because most substrings are not associated with hypotheses encountered in training. These hypotheses are linked into a network, with supporting connections between hypotheses that correspond to a particular segmentation of the word and inhibitory connections between hypotheses which represent different uses for the same input letter. A node that Lehnert refers to as a "context node" is added to the network wherever three consecutive hypotheses correspond to one of the triplets encountered during training. The activation levels of the context nodes are computed based on the number of occurrences of this triplet. A standard relaxation algorithm is then applied to the network to decide which pronunciation to prefer.

This solution shows the same fuzziness as connectionist solutions inasmuch as the recognition is based on patterns that emerge from the corpus of cases. However, the learning is not done in a standard connectionist fashion. The power of a learning scheme comes from its capability to successfully solve the credit-assignment problem. PRO makes a statement about the credit-assignment problem by using the frequency of hypothesis sequences as the basis of learning. This approach makes learning much faster because the necessary abstractions are already present in the system, and credit assignment is focused. The power of this method for assigning credit comes from the appropriate information-processing abstractions of phonetic hypotheses.

At the information-processing level, this theory states that the appropriate way to decide how to pronounce a word is to break it into groups of letters which correspond to phonetic hypotheses rather than into the obvious units of individual letters. Furthermore, frequencies of phonetic hypothesis sequences in a case base can distinguish which hypotheses to use. At the architecture level are the specific relaxation algorithm and the context nodes. The success of this method comes from the information-processing abstractions, and the fuzziness of the solution comes from the connectionist architecture.

Conclusion

What impact will connectionism have on AI in general? Much of AI research, except where microphenomena dominate and symbolic AI is simply too hard edged in its performance, will and should remain largely unaffected by connectionism for two reasons. First, most of the work is in discovering the information-processing theory of a phenomenon in the first place. The further the task-level description is from the phenomenon at the raw architecture level, the more common are the representational issues between the connectionist and symbolic approaches. Second, none of the connectionist arguments or empirical results show that the symbolic, algorithmic character of thought is a mistaken hypothesis, purely epiphenomenal, or simply irrelevant.

Our arguments for and against connectionist notions are not really specific to any particular scheme. They are intended to apply to nonsymbolic approaches in general, including the approaches of Hopfield and Tank (1985). The work of Reeke and Edelman (1988) challenges any form of representationalism, which requires a separate answer. Within representationalist theories, however, it seems that we need to find a way to deal with three constraints on architectures for mental phenomena: (1) A large part of theory making in AI has to do with the content of mental representations. We call them the information-processing abstractions. (2) Whatever one's position on the nature of representations below conscious processes, it is clear that processes at or close to this conscious level are intimately connected to language and knowledge and, thus, have a large discrete symbolic content. (3) The connectionist ideas on representation suggest how nonsymbolic representations and processes can provide the medium in which thought resides.

From the viewpoint of computer science, connectionism has done a useful service in refocusing attention on alternative models of computation. However, for much of AI and cognition, the supposed battle between connectionism and symbolicism is mere shadowboxing. Neither of the theories explains or accounts for all intelligence or cognition. The task of building a natural language understanding system is not even remotely complete just because we have a bucket of connectionist units and weights or, equally, a universal Turing machine in front of us. Sure, it is nice to know they both provide a certain kind of universality (if, in fact, connectionist architectures do), but beyond this, it is time to make theories at a different level of description altogether.

As said in Chandrasekaran (1986) in a slightly different context, "There has been an ongoing search for the 'holy grail' of a uniform mechanism that will explain and produce intelligence. This desire has resulted in a number of candidate mechanisms—from perceptrons of the 1960s through first-order predicate calculus to rules and frames—to satisfy this need." Intelligence is not a product of any one mechanism, whether at the connectionist or rule level. Reductionism, either of the connectionist or symbolic style, misstates where the power of intelligence as a phenomenon is coming from. Its power is a result of cooperation between different mechanisms and representations at different levels of description.

Acknowledgments

We gratefully acknowledge the support of the Defense Advanced Research Projects Agency, RADC contract F30602-85-C-0010, the Air Force Office of Scientific Research, grant 87-0090, and the National Science Foundation for the graduate fellowship (Allemang) granted during the preparation of this article.

References

Anderson, J. R., and Mozer, M. C. 1981. Categorization and Selective Neurons. In Parallel Models of Associative Memory, eds. G. E. Hinton and J. R. Anderson, 213–236. Hillsdale, N.J.: Lawrence Erlbaum.

Bruner, J. S. 1957. On Perceptual Readiness. Psychological Review 64:123–152.

Chandrasekaran, B. 1986. Generic Tasks in Knowledge-Based Reasoning: High-Level Building Blocks for Expert System Design. IEEE Expert 1(3): 23–30.

Dreyfus, H. L. 1979. What Computers Can't Do. New York: Harper & Row.

Feldman, J. A., and Ballard, D. H. 1982. Connectionist Models and Their Properties. Cognitive Science 6:205–254.

Fodor, J. A., and Pylyshyn, Z. W. 1988. Connectionism and Cognitive Architecture: A Critical Analysis. Cognition 28:3–71.

Gibson, J. J. 1950. The Perception of the Visual World. Boston: Houghton-Mifflin.

Hopfield, J. J., and Tank, D. W. 1985. Neural Computation of Decisions in Optimization Problems. Biological Cybernetics 52:141–152.

Lehnert, W. G. 1987. Case-Based Problem Solving with a Large Knowledge Base of Learned Cases. In Proceedings of the Sixth National Conference on Artificial Intelligence, 301–306. Menlo Park, Calif.: American Association for Artificial Intelligence.

Marr, D. 1982. Vision. San Francisco: Freeman.

McClelland, J. L., and Rumelhart, D. E. 1981. An Interactive Activation Model of Context Effects in Letter Perception: Part 1. An Account of Basic Findings. Psychological Review 88:375–407.

McClelland, J. L.; Rumelhart, D. E.; and Hinton, G. E. 1986. The Appeal of Parallel Distributed Processing. In Parallel Distributed Processing, vol. 1, eds. D. E. Rumelhart, J. L. McClelland, and the PDP Research Group. Cambridge, Mass.: MIT Press/Bradford.

McClelland, J. L.; Rumelhart, D. E.; and the PDP Research Group, eds. 1986. Parallel Distributed Processing, 2 vols. Cambridge, Mass.: MIT Press/Bradford.

McCulloch, W. S., and Pitts, W. 1943. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5:115–137.

Newell, A. 1980. Physical Symbol Systems. Cognitive Science 4:135–183.

Newell, A., and Simon, H. A. 1972. Human Problem Solving. Englewood Cliffs, N.J.: Prentice-Hall.

Pylyshyn, Z. W. 1984. Computation and Cognition: Towards a Foundation for Cognitive Science. Cambridge, Mass.: MIT Press.

Reeke, G. N., Jr., and Edelman, G. M. 1988. Real Brains and Artificial Intelligence. Daedalus 117(1): 143–173.

Rosenblatt, F. 1962. Principles of Neurodynamics. New York: Spartan.

Rumelhart, D. E.; Smolensky, P.; McClelland, J. L.; and Hinton, G. E. 1986. Schemata and Sequential Thought Processes in PDP Models. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, eds. J. L. McClelland, D. E. Rumelhart, and the PDP Research Group. Cambridge, Mass.: MIT Press/Bradford.

Simon, H. A. 1969. The Sciences of the Artificial. Cambridge, Mass.: MIT Press.

Smolensky, P. 1988. On the Proper Treatment of Connectionism. Behavioral and Brain Sciences 11(1): 1–23.
