Connectionist Models of Language Processing

Cognitive Studies Preprint 2003, Vol. 10, No. 1, 10–28

Douglas L. T. Rohde, Massachusetts Institute of Technology
David C. Plaut, Carnegie Mellon University

Traditional approaches to language processing have been based on explicit, discrete representations that are difficult to learn from a reasonable linguistic environment; hence, it has come to be accepted that much of our linguistic representation and knowledge is innate. With its focus on learning based upon graded, malleable, distributed representations, connectionist modeling has reopened the question of what could be learned from the environment in the absence of detailed innate knowledge. This paper provides an overview of connectionist models of language processing, at both the lexical and sentence levels.

This research was supported by an NIH Integrated Behavioral Science Center Grant (MH64445, J. McClelland, PD) and by NIMH NRSA MH65105. Correspondence regarding this article may be sent either to Douglas Rohde, MIT, NE20-437E, 3 Cambridge Center, Cambridge, MA 02139, USA, or to David Plaut, Department of Psychology, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA.

Although connectionist models have been applied to the full range of perceptual, cognitive, and motor domains (see McClelland, Rumelhart, & PDP Research Group, 1986; Quinlan, 1991; McLeod, Plunkett, & Rolls, 1998), it is in their application to language that they have evoked the most interest and controversy (e.g., Pinker & Mehler, 1988). This is perhaps not surprising in light of the special role that language plays in human cognition and culture. It also stems in part from the considerable difference in goals and methods between linguistic and psychological approaches to the study of language. This rift goes deeper than a simple dichotomy of emphasizing competence versus performance (Chomsky, 1957); it cuts to the heart of the question of what it means to know and use a language (Seidenberg, 1997).

Traditional approaches to language processing have been based on explicit, discrete representations that are difficult or impossible to learn from a reasonable linguistic environment (Gold, 1967). Therefore, it has come to be accepted that much of our linguistic representation and knowledge is innate. With its focus on learning based upon graded, malleable, distributed representations, connectionist modeling has reopened the question of what could be learned from the environment in the absence of detailed innate knowledge. Although the need to learn internal representations potentially gives connectionist networks great power and flexibility, it also introduces limitations. These limitations are important and, ideally, will reflect limitations observed in human language processing.

From a connectionist perspective, performance is not an imperfect reflection of some abstract competence, but rather the behavioral manifestation of the internal representations and processes of actual language users: Language is as language does. In this regard, errors in performance (e.g., “slips of the tongue”; Dell, Schwartz, Martin, Saffran, & Gagnon, 1997) are no less valid than skilled language use as a measure of the underlying nature of language processing. The goal is not to abstract away from performance but to articulate computational principles that account for it.

A major attraction of the connectionist approach to language, apart from its natural relation to neural computation, is that the very same processing mechanisms apply across the full range of linguistic structure. This paper provides an overview of connectionist models of language processing, at both the lexical and sentence levels.

Lexical Processing

Phonological development

Although the use of language seems straightforward to adult native speakers, an infant must solve numerous difficult computational problems in learning to understand and produce speech. These problems stem from the fact that speech is extended in time, is highly variable, and, at a morphemic level, has no systematic relation to its underlying meaning. Moreover, infants must learn to produce comprehensible speech without any direct articulatory instruction or feedback.

Plaut and Kello (1999) proposed a framework for phonological development in which phonology mediates among acoustic, articulatory, and semantic representations in the service of both comprehension and production. A critical aspect of the approach is that, given the absence of direct articulatory feedback, learning to produce speech is driven by indirect feedback derived from the comprehension system, that is, from the acoustic, phonological, and semantic consequences of the system’s own articulations (Locke, 1983; Menn & Stoel-Gammon, 1995; Studdert-Kennedy, 1993). This is accomplished by learning an internal forward model of the physical processes that relate articulation to acoustics (Jordan & Rumelhart, 1992). Such a model is learned by executing a variety of articulations, predicting how they will sound, and then adapting the model based on the discrepancy between this prediction and the actual resulting acoustics. In the infant, the forward model is assumed to develop primarily as a result of reduplicated and variegated babbling in the second half of the first year (Vihman, 1996). Once developed, the forward model can be used to convert acoustic feedback (i.e., whether an utterance sounded right) into the articulatory feedback necessary to train speech production (Perkell, Matthies, Svirsky, & Jordan, 1995). An implementation of the framework, in the form of a simple recurrent network (Elman, 1991a), learned to comprehend, imitate, and intentionally name a corpus of 400 monosyllabic words, and its speech errors in development were similar to those of young children.

Morphology

Most linguistic domains are quasi-regular in that there is considerable systematicity between inputs and outputs but also numerous exceptions. A standard assumption is that systematic linguistic knowledge takes the form of explicit rules and that items which violate the rules are handled by a separate associative mechanism (see Pinker, 1999). Connectionist modeling provides an alternative view, in which all items coexist within a single system whose representations and processing reflect the relative degree of consistency in the mappings for different items.

A key battleground in the debate between these two views of the language system has been the relatively constrained domain of English inflectional morphology: specifically, forming the past tense of verbs. Rumelhart and McClelland (1986) attempted to reformulate the issue away from a sharp dichotomy between explicit rules (add -ed; e.g., WALK/WALKED) and exceptions (e.g., SING/SANG, DRINK/DRANK, GO/WENT), and toward a view that emphasizes the graded structure relating verbs and their inflections. They developed a connectionist model that learned a direct association between the phonology of all types of verb stems and the phonology of their past-tense forms. Although this initial model had numerous limitations (Pinker & Prince, 1988), many of these have been addressed in subsequent simulation work (Cottrell & Plunkett, 1995; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, …

… rich languages like Hebrew (e.g., Frost, Deutsch, & Forster, 2000) are typically interpreted as being problematic for the connectionist account. To evaluate whether this interpretation is valid, Plaut and Gonnerman (2000) carried out simulations in which a set of morphologically related words varying in semantic transparency was embedded in either a morphologically rich or a morphologically impoverished artificial language. They found that morphological priming increased with degree of semantic transparency in both languages. Critically, priming extended to semantically opaque items in the morphologically rich language (consistent with findings in Hebrew) but not in the impoverished language (consistent with findings in English). Such priming arises because the processing of all items, including opaque forms, is influenced by the degree of morphological organization of the entire system. These findings suggest that, rather than being challenged by the occurrence of non-semantic morphological effects in morphologically rich languages, the connectionist approach may provide an explanation for the cross-linguistic differences in the occurrence of these effects.

Word reading

Many of the issues concerning quasi-regularity in morphological processing also arise in the context of word reading. As in morphology, the spelling-sound correspondences of English are highly systematic but admit many exceptions (e.g., HAVE, PINT, YACHT) and, as in morphology, researchers have proposed separate mechanisms for processing regular and exception items (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001).

Plaut, Seidenberg, McClelland, and Patterson (1996) developed a series of connectionist simulations in support of an alternative conception of language knowledge and processing in which all items coexist within a single system whose representations and processing reflect the relative degree of consistency in the mappings for different items. Different types of information about a word (orthographic, phonological, and semantic) are represented as distributed patterns over separate groups of units. In performing a task like reading aloud, orthographic information …
