
Deseret Language and Linguistic Society Symposium

Volume 27 Issue 1 Article 3

3-23-2001

Toward a Psychological Analysis of the Sentence from the Work of Lashley, Chomsky, Wundt, Polanyi, and Skousen's AML

Bruce L. Brown

Follow this and additional works at: https://scholarsarchive.byu.edu/dlls

BYU ScholarsArchive Citation

Brown, Bruce L. (2001) "Toward a Psychological Analysis of the Sentence from the Work of Lashley, Chomsky, Wundt, Polanyi, and Skousen's AML," Deseret Language and Linguistic Society Symposium: Vol. 27: Iss. 1, Article 3. Available at: https://scholarsarchive.byu.edu/dlls/vol27/iss1/3

This Article is brought to you for free and open access by the Journals at BYU ScholarsArchive. It has been accepted for inclusion in Deseret Language and Linguistic Society Symposium by an authorized editor of BYU ScholarsArchive. For more information, please contact [email protected], [email protected].

Toward a Psychological Analysis of the Sentence from the Work of Lashley, Chomsky, Wundt, Polanyi, and Skousen's AML

Bruce L. Brown

Lashley (1951) and Chomsky (1957) clearly demonstrated the inadequacy of "left-right" associationistic models in accounting for language and other kinds of holistically patterned behavior. Both argued persuasively that the kind of holistic dependencies among elements that characterize language cannot be explained through behavioristic S-R connections. Chomsky (1957, 18-25) began his attack on behavioristic theories of language by demonstrating the inadequacy of Markov processes (a precise embodiment of S-R chaining theory) in accounting for patterned sequences of behavior. In particular, he showed that the kinds of holistic dependencies among elements that characterize syntactic structures in language could not be accounted for with left-right associationistic models, but rather, would require a top-down hierarchical approach.

In his influential paper on the problem of serial order in behavior, Lashley (1951) made a similar case for the necessity of hierarchical explanation, but from a neurological point of view. The lines of his argument were quite different from Chomsky's. Chomsky's argument was essentially formal and based upon artificial models of logical mechanisms. Lashley's argument was neurological, but also conversational and straightforward. He first reviewed a variety of anecdotal observations concerning language and then asked what kind of neurological organization would be necessary to account for them. He pointed out that a given set of phonemes in spoken words (or of letters in typed words) can occur in a number of combinations, such as the reverse combinations right and tire (p. 115). Lashley then made the very obvious point that "the order must therefore be imposed upon the motor elements by some organization other than direct associative connections between them" (1951, 115). He further argued that words stand in relation to sentences as letters do in relation to words, and that words also have no intrinsic temporal valence as implied by the associative chaining models. Drawing upon an analysis of the language translation process, he argued that this syntactic order is also not to be attributed to the thought process: the same thought can be expressed with quite different temporal structures in different languages. Translators translate holistic thoughts, not word by word. As he summarized: "the mechanism which determines the serial activation of the motor units is relatively independent, both of the motor units and (also) of the thought structure." Lashley (p. 115) argued that language is not the only example of this kind of syntactically structured behavior, that a multitude of skilled behaviors in man and other animals display this kind of implicit hierarchical structure and cannot be explained in terms of associative connections among the elements.

Wundt (1912, Chap. 7, "Die Satzfügung") reasoned from a very different perspective. His primary task was to explain the formation
of sentences. He reasoned that any explanation of the sentence that focuses only upon its surface structure would obviously be inadequate. He characterized the sentence as "both a simultaneous and a sequential structure" (see p. 21 of Blumenthal). It is simultaneous because at any given moment it is present in consciousness as a totality even as the individual words are spoken. We focus upon the whole of what we are saying even as the words flow forth in a habitual way that is not introspectible to us. As he said:

"The sentence, however, is not an image running with precision through consciousness where each single word or single sound appears only momentarily while the preceding and following elements are lost from consciousness. Rather, it stands as a whole at the cognitive level while it is being spoken. If this should ever not be the case, we would irrevocably lose the thread of speech." (quoted in Blumenthal 1970, 21)

Like Chomsky, Wundt held that any explanation of the sentence that focuses only upon its surface structure would be obviously inadequate. But unlike Chomsky's position, both Wundt's account and that of Lashley left open the question of whether the psychology of the sentence requires one to posit the literal existence of syntactical rules in the human psyche.

Clearly a strong case can be made for an explanation of patterned serial behavior that does not attribute it to associative connections among the elements. However, we cannot consider that demonstration to be equivalent to making the case for rule-based explanations. Some have taken it this way. In particular, the Chomskian approach put phrase structure rewrite rules and transformational rules in center stage and imbued them with ontological status, thus opening the way for a new era of mentalism in the behavioral sciences. The new artificial intelligence (AI) brand of cognitive psychology further built upon this unbridled mechanistic mentalism, much to the detriment of a truly cognitive approach to explanation. The excesses of the AI movement were at least as outrageous as those of the behaviorists a decade or two earlier. The behaviorists insisted on mechanistic explanations, but also on the law of parsimony. Neo-cognitivists seem to be willing to sacrifice parsimony as long as a computer metaphor is satisfied, to guarantee mechanistic explanation.

But parsimony still makes sense. There is no reason to create complex, burdensome explanations if simpler ones will suffice. Polanyi's characterization of the nature of skills led the way for us here. He began his discussion of the psychology of skills (1962) with the trenchant statement:

"I shall take as my clue for this investigation the well-known fact that the aim of a skilful performance is achieved by the observance of a set of rules which are not known as such to the person following them." (49)

He then went on to offer explanations of the physical principles underlying swimming and riding a bicycle, but with the caveat that one certainly would not have to understand those explanations to perform either of these skills. Either of these skills is acquired tacitly through trial and error, or through apprenticeship, but without explicit awareness of the principles involved. This approach to explaining the acquisition of skills (including linguistic skill) is consonant with influential theories of perception, such as the "transactionalism" of Ames (1946) and Kilpatrick (1961) and J. J. Gibson's theory (1966) of "direct perception."

A full explanation of the principles involved in any of these skills is probably beyond our present scientific capability. However, Polanyi (1962) offered the
following as a first approximation of the physical principles involved in riding a bicycle:

"Again, from my interrogations of physicists, engineers and bicycle manufacturers, I have come to the conclusion that the principle by which the cyclist keeps his balance is not generally known. The rule observed by the cyclist is this. When he starts falling to the right he turns the handle-bars to the right, so that the course of the bicycle is deflected along a curve towards the right. This results in a centrifugal force pushing the cyclist to the left and offsets the gravitational force dragging him down to the right. This manoeuvre presently throws the cyclist out of balance to the left, which he counteracts by turning the handlebars to the left; and so he continues to keep himself in balance by winding along a series of appropriate curvatures. A simple analysis shows that for a given angle of unbalance the curvature of each winding is inversely proportional to the square of the speed at which the cyclist is proceeding. But does this tell us exactly how to ride a bicycle? No. You obviously cannot adjust the curvature of your bicycle's path in proportion to the ratio of your unbalance over the square of your speed; and if you could you would fall off the machine for there are a number of other factors to be taken into account in practice which are left out in the formulation of this rule. Rules of art can be useful, but they do not determine the practice of an art; they are maxims, which can serve as a guide to an art only if they can be integrated into the practical knowledge of the art. They cannot replace this knowledge." (49-50)

Obviously, being able to explain bicycle riding at this high level of abstraction and physical decomposition is not a prerequisite to performing the skill. There are many six- and seven-year-old children who have mastered the skill of riding a bicycle, but the explanation given above would probably mean very little to any of them. Nor would it make sense to hypothesize the existence of "rules" of this kind in their heads, a kind of inborn, unconscious, unintrospectible BRAD ("bicycle riding acquisition device"). There are many ways to explain what one is doing, some explicitly physical (such as the foregoing), some metaphorical, some complex, and some simple, but probably none of these levels of explanation fully captures or exhausts what is actually going on.

I remember hearing Chomsky say in a talk at McGill University in 1967 that the logical capability implicit in the linguistic performance of a typical three-year old is more complex than the principles of calculus. At the time I found that statement preposterous. With thirty-five more years of experience the statement now seems obvious and correct. The two sentences "They are easy to please" and "They are eager to please" at first seem alike in structure, and their surface structure is similar. However, an impersonal transformation shows that they are very different in deep structure: "It is easy to please them," but not "It is eager to please them." Polanyi would explain this in terms of the contrast between tacit knowledge and explicit knowledge. We have a tacit apprehension of linguistic principles of great depth and subtlety, but we do not have explicit knowledge of the principles involved. Chomsky's subtle and complex linguistic rules could be viewed in this framework as being an explicit spelling out of the logic underlying what every person can do linguistically without taking thought, without being able to introspect. T. G. R. Bower (1977) and his colleagues have shown that Piaget's (1954) developmental stages for children are much too conservative. Infants and young children have a tacit
mastery of various cognitive tasks long before they can give proper explicit accounts, and Piaget made the mistake of basing his stages on what children would say, what they could explain.

One of the major approaches to language and cognition to come forth in the past thirty years is the work of the Parallel Distributed Processing (PDP) Group (Rumelhart, McClelland, and the PDP Research Group 1986), so-called "neural nets" or "connectionist models." The connectionist models capitalize on this "levels of explanation" approach, with the proposal that fairly simple associationistic mechanisms can be modeled on a computer to create close approximations to behavior that appears to be rule-governed. Chandler (1995) summarizes their major achievement: "They have shown that rule-like regularities can emerge from the massed interaction of relatively simple processes operating on homogeneous networks of information even though those networks contain and refer to no explicit representations of those rules" (234-35). The strategy is an ingenious one, and it has won for D. O. Hebb's neurological behavioristic associationism (on which PDP is based) a new hearing within contemporary cognitive psychology.

Skousen's (1989) analogical modeling of language (AML) also accounts for seemingly rule-governed behavior without recourse to explicitly represented rules. The approach is based upon a very simple principle of "natural statistics": to minimize the number of disagreements (Skousen 1992). In the same way that the complexities of hypothesized internalized linguistic rules can be avoided with this approach, the complexities of statistical decision theories can also be avoided. That is, there is no need to posit that the learner acquires some kind of "probabilistic rule" for dealing with linguistic categorization. Rather, his performance can be accounted for by the simple proposition that he samples from his own stored linguistic experiences using this one basic principle. Close approximations to actual performance can be achieved by adjusting the level of "imperfect memory." It is intriguing how such a simple hypothesized process can create complex behavior that could be explained at the highest level in terms of a complex and subtle rule system of the kind Chomsky has described.

Both the connectionist models (PDP) and AML are what Skousen (1995, 227) referred to as "procedural" as contrasted with rule approaches, which are "declarative." As procedural models, both AML and PDP avoid the major conceptual problems encountered in rule-based models. Skousen (1995) identified at least three such problems: rule-governed approaches cannot deal with "leakage" across category boundaries; they are not robust in dealing with missing information or ill-formed context in the way that actual speakers are; and they are pushed to revert to a competence/performance distinction to account in an ad hoc way for failures of the model to deal with real, dynamic aspects of language.

AML has a number of features to recommend it over other available procedural language models. One is its explicit incorporation of episodic memory into the learning process. Another is its potential to account for more general perceptual processes beyond language. Both Skousen (1995) and Chandler (1995) have pointed to a number of failings of the connectionist models that AML seems to overcome. For one, connectionist models, once trained, are deterministic and cannot handle probability matching. Furthermore, connectionist network training can often require an inordinately long time even for simple behaviors, can get stuck in local minima, and even when trained cannot adjust to learn new input but rather collapses into predicting nonsense (the so-called "catastrophe problem"). AML, on the other hand, is particularly good at probability matching in a way that corresponds to actual human language learners. Also, no training is necessary, there are no local minima, and it adjusts well to new input, even contradictory input.

There are particular problems yet to be solved in the application of AML. One of the biggest problems is computational. With commonly used computational methods, each variable that is added essentially doubles the processing time as well as the memory requirements of the computer. Also, the notable successes of AML have been in the more well-defined areas of [...]/[...], [...], and morphology. Application to more abstract and difficult areas of [...] and syntax has yet to be demonstrated. However, initial work with syntax looks promising. Lonsdale (2001), for example, has found some success in translating from French to English using analogical cloning, following the method of Jones (1996).

The probability matching aspect of analogical modeling is particularly interesting to psychologists in that it foreshadows the possibility of higher level theoretical integration with other established principles of human and animal behavior. A case in point is the well known matching law of Richard Herrnstein (1961) whereby probabilities of response are found to match probabilities of reinforcement. There are probably many linguistic examples of probability matching of this kind. Tucker and his colleagues (1968), as one example, have documented a linguistic probability matching in native French speakers with respect to the categorization of grammatical gender of "artificial" French words. They found a close match between the gender selection probabilities for various invented words and the gender probabilities for words with the same endings in Petit Larousse. Skousen (1995) recognized this capacity of AML to deal with the ubiquitous phenomenon of probability matching as one of the many advantages of AML over neural networks. AML can be seen as a sophisticated extension of associationistic principles, one that makes them capable of accounting for seemingly rule-governed behavior.

Given the arguments for the superiority of the AML approach to the modeling of human linguistic behavior, it could be argued that this paper has come full circle back to the associationistic approach criticized by Lashley and Chomsky. However, this is not just a case of "rocks break scissors, scissors cut paper, and paper covers rocks." A better metaphor would be an upward spiral, where the associationism implied in analogical modeling represents a much higher level of sophistication than the simple left-right associationistic chain theory that still falls vulnerable to the Lashley/Chomsky critique. Nor does it mean that with the continued ascension of analogical modeling we would expect to witness the demise of rule-governed approaches. In the concluding paragraphs of his fundamental work on analogical modeling, Skousen discussed the place of rule approaches:

"Despite the many arguments, both empirical and conceptual, in favor of an analogical approach to the description of language, there is a place for rule approaches too. An optimal rule description serves as a kind of metalanguage that efficiently describes past behavior and allows us to talk about that behavior. Whenever we attempt to summarize behavior or to discover relationships in data, our viewpoint is structuralist. But if we wish to predict language behavior rather than just describe it, we must abandon rule approaches. Rule descriptions have great difficulty in explaining actual language usage." (1989, 139)

Skousen went on to compare language rules with Boyle's Law and
Charles's Law as general physical laws that are only approximations to the real behavior of gasses. They are fairly accurate in accounting for gas molecules acting in the aggregate under most conditions, yet they have no real existence except in the minds of scientists. He made this comparison with linguistic rules:

"In no literal sense can it be said that individual gas molecules follow these laws. In a similar way, linguistic rules are meta-descriptive devices that exist only in the minds of linguists. Speakers do not appear to use rules in perceiving and producing language. Moreover, linguistic rules can only explain language behavior for ideal situations. As in physics, an atomistic approach seems to be a more promising method for predicting language behavior." (1989, 140)

This is reminiscent of Polanyi's characterization of a skillful performance as being achieved by the observance of a set of rules which are not known as such to the person following them. Linguistic behavior can be described in a general way by rules, but an analogical modeling approach is probably much closer to the actual psychological processes involved and accounts better for actual linguistic behavior (performance). Skousen's illuminating comments on the place of rules and analogy constitute a fitting conclusion to his first published book on analogical modeling. They are also, perhaps, a promising prelude to the construction of a serious account of the psychology of the sentence, that mysterious process by which our holistic thoughts are automatically converted into a string of words.

REFERENCES

Ames, Adelbert, Jr. 1946. Binocular vision as affected by relations between uniocular stimulus-patterns in commonplace environments. American Journal of Psychology 59: 333-57.

Blumenthal, Arthur L. 1970. Language and psychology: Historical aspects of psycholinguistics. New York: John Wiley and Sons.

Bower, T. G. R. 1977. A primer of infant development. San Francisco: W. H. Freeman.

Chandler, Steve. 1995. Non-declarative linguistics: Some neuropsychological perspectives. Rivista di Linguistica 7, no. 2: 233-47.

Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.

Gibson, James J. 1966. The senses considered as perceptual systems. Boston: Houghton Mifflin.

Herrnstein, Richard J. 1961. Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior 4: 267-72.

Jones, Daniel. 1996. Analogical natural language processing. London: UCL Press.

Kilpatrick, F. P. 1961. Explorations in transactional psychology. New York: New York University Press.

Lashley, Karl S. 1951. The problem of serial order in behavior. In Cerebral mechanisms in behavior, ed. L. A. Jeffress. New York: John Wiley and Sons. 112-36.

Lonsdale, Deryle W. 2001. Recent advances in analogical cloning. Presentation at the Deseret Language and Linguistics Society 2001 Symposium. Brigham Young University.

Piaget, Jean. 1954. Origins of intelligence. New York: Basic Books (original French edition, 1936).

Polanyi, Michael. 1962. Personal knowledge: Towards a post-critical philosophy. New York: Harper and Row.

Rumelhart, David E., James L. McClelland, and the PDP Research Group. 1986. Foundations. Vol. 1 of Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Skousen, Royal. 1989. Analogical modeling of language. Dordrecht: Kluwer Academic Publishers.

Skousen, Royal. 1992. Analogy and structure. Dordrecht: Kluwer Academic Publishers.

Skousen, Royal. 1995. Analogy: A non-rule alternative to neural networks. Rivista di Linguistica 7, no. 2: 213-31.

Tucker, G. Richard, Wallace E. Lambert, André Rigault, and Norman Segalowitz. 1968. A psychological investigation of French speakers' skill with grammatical gender. Journal of Verbal Learning and Verbal Behavior 7: 312-16.

Wundt, Wilhelm M. 1912. Die Sprache. Book 2, vol. 1 of the Völkerpsychologie series. Leipzig: Engelmann.