A Theory of Lexical Access in Speech Production
Total Page:16
File Type:pdf, Size:1020Kb
BEHAVIORAL AND BRAIN SCIENCES (1999) 22, 1–75 Printed in the United States of America A theory of lexical access in speech production Willem J. M. Levelt Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands [email protected] www.mpi.nl Ardi Roelofs School of Psychology, University of Exeter, Washington Singer Laboratories, Exeter EX4 4QG, United Kingdom [email protected] Antje S. Meyer Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands [email protected] Abstract: Preparing words in speech production is normally a fast and accurate process. We generate them two or three per second in fluent conversation; and overtly naming a clear picture of an object can easily be initiated within 600 msec after picture onset. The un- derlying process, however, is exceedingly complex. The theory reviewed in this target article analyzes this process as staged and feed- forward. After a first stage of conceptual preparation, word generation proceeds through lexical selection, morphological and phonolog- ical encoding, phonetic encoding, and articulation itself. In addition, the speaker exerts some degree of output control, by monitoring of self-produced internal and overt speech. The core of the theory, ranging from lexical selection to the initiation of phonetic encoding, is captured in a computational model, called weaverϩϩ. Both the theory and the computational model have been developed in interac- tion with reaction time experiments, particularly in picture naming or related word production paradigms, with the aim of accounting for the real-time processing in normal word production. A comprehensive review of theory, model, and experiments is presented. The model can handle some of the main observations in the domain of speech errors (the major empirical domain for most other theories of lexical access), and the theory opens new ways of approaching the cerebral organization of speech production by way of high-temporal- resolution imaging. Keywords: articulation, brain imaging, conceptual preparation, lemma, lexical access, lexical concept, lexical selection, magnetic en- cephalography, morpheme, morphological encoding, phoneme, phonological encoding, readiness potential, self-monitoring, speaking, speech error, syllabification, weaverϩϩ 1. An ontogenetic introduction tuned to the mother tongue (De Boysson-Bardies & Vih- man 1991; Elbers 1982). These exercises provided us with Infants (from Latin infans, speechless) are human beings a protosyllabary, a core repository of speech motor pat- who cannot speak. It took most of us the whole first year of terns, which were, however, completely meaningless. our lives to overcome this infancy and to produce our first Real word production begins when the child starts con- few meaningful words, but we were not idle as infants. We necting some particular babble (or a modification thereof) worked, rather independently, on two basic ingredients of to some particular lexical concept. The privileged babble word production. On the one hand, we established our pri- auditorily resembles the word label that the child has ac- mary notions of agency, interactancy, the temporal and quired perceptually. Hence word production emerges from causal structures of events, object permanence and loca- a coupling of two initially independent systems, a concep- tion. This provided us with a matrix for the creation of our tual system and an articulatory motor system. first lexical concepts, concepts flagged by way of a verbal la- This duality is never lost in the further maturation of our bel. Initially, these word labels were exclusively auditory word production system. Between the ages of 1;6 and 2;6 the patterns, picked up from the environment. On the other explosive growth of the lexicon soon overtaxes the protosyl- hand, we created a repertoire of babbles, a set of syllabic ar- labary. It is increasingly hard to keep all the relevant whole- ticulatory gestures. These motor patterns normally spring word gestures apart. The child conquers this strain on the up around the seventh month. The child carefully attends system by dismantling the word gestures through a process to their acoustic manifestations, leading to elaborate exer- of phonemization; words become generatively represented cises in the repetition and concatenation of these syllabic as concatenations of phonological segments (Elbers & Wij- patterns. In addition, these audiomotor patterns start res- nen 1992; C. Levelt 1994). As a consequence, phonetic en- onating with real speech input, becoming more and more coding of words becomes supported by a system of phono- © 1999 Cambridge University Press 0140-525X/99 $12.50 1 Levelt et al.: Lexical access in speech production logical encoding. Adults produce words by spelling them out child develops a system of lemmas,1 packages of syntactic as a pattern of phonemes and as a metrical pattern. This information, one for each lexical concept. At the same time, more abstract representation in turn guides phonetic en- the child quickly acquires a closed class vocabulary, a rela- coding, the creation of the appropriate articulatory gestures. tively small set of frequently used function words. These The other, conceptual root system becomes overtaxed as words mostly fulfill syntactic functions; they have elaborate well. When the child begins to create multiword sentences, lemmas but lean lexical concepts. This system of lemmas is word order is entirely dictated by semantics, that is, by the largely up and running by the age of 4 years. From then on, prevailing relations between the relevant lexical concepts. producing a word always involves the selection of the ap- One popular choice is “agent first”; another one is “location propriate lemma. last.” However, by the age of 2;6 this simple system starts The original two-pronged system thus develops into a foundering when increasingly complicated semantic struc- four-tiered processing device. In producing a content word, tures present themselves for expression. Clearly driven by we, as adult speakers, first go from a lexical concept to a genetic endowment, children restructure their system of its lemma. After retrieval of the lemma, we turn to the lexical concepts by a process of syntactization. Lexical con- word’s phonological code and use it to compute a phonetic- cepts acquire syntactic category and subcategorization fea- articulatory gesture. The major rift in the adult system still tures; verbs acquire specifications of how their semantic ar- reflects the original duality in ontogenesis. It is between the guments (such as agent or recipient) are to be mapped onto lemma and the word form, that is, between the word’s syn- syntactic relations (such as subject or object); nouns may ac- tax and its phonology, as is apparent from a range of phe- quire properties for the regulation of syntactic agreement, nomena, such as the tip-of-the-tongue state (Levelt 1993). such as gender, and so forth. More technically speaking, the 2. Scope of the theory In the following, we will first outline this word producing system as we conceive it. We will then turn in more detail to the four levels of processing involved in the theory: the activation of lexical concepts, the selection of lemmas, the morphological and phonological encoding of a word in its prosodic context, and, finally, the word’s phonetic encoding. In its present state, the theory does not cover the word’s ar- ticulation. Its domain extends no further than the initiation of articulation. Although we have recently been extending the theory to cover aspects of lexical access in various syn- left to right: Willem J. Levelt, Ardi Roelofs, and Antje S. tactic contexts (Meyer 1996), the present paper is limited Meyer. to the production of isolated prosodic words (see note 4). Willem J. M. Levelt is founding director of the Max Every informed reader will immediately see that the the- Planck Institute for Psycholinguistics, Nijmegen, and ory is heavily indebted to the pioneers of word production Professor of Psycholinguistics at Nijmegen University. research, among them Vicky Fromkin, Merrill Garrett, He has published widely in psychophysics, mathemati- Stephanie Shattuck-Hufnagel, and Gary Dell (see Levelt, cal psychology, linguistics and psycholinguistics. Among 1989, for a comprehensive and therefore more balanced re- his monographs are On Binocular Rivalry (1968), view of modern contributions to the theory of lexical ac- Formal Grammars in Linguistics and Psycholinguistics cess). It is probably in only one major respect that our ap- (3 volumes, 1974) and Speaking: From Intention to proach is different from the classical studies. Rather than Articulation (1989). basing our theory on the evidence from speech errors, Ardi Roelofs, now a Lecturer at the School of Psy- spontaneous or induced, we have developed and tested our chology of the University of Exeter, obtained his Ph.D. at notions almost exclusively by means of reaction time (RT) Nijmegen University on an experimental/computational research. We believed this to be a necessary addition to ex- study of lexical selection in speech production, per- isting methodology for a number of reasons. Models of lex- formed at the Nijmegen Institute for Cognition and ical access have always been conceived as process models Information and the Max Planck Institute. After a of normal speech production. Their ultimate test, we ar- postdoctoral year at MIT, he returned to the Max Planck Institute as a member of the research staff to further gued