
DOCUMENT RESUME

ED 366 020 CS 508 427

AUTHOR: Fowler, Carol A., Ed.
TITLE: Speech Research Status Report, January-March 1993.
INSTITUTION: Haskins Labs., New Haven, Conn.
REPORT NO: SR-113
PUB DATE: 93
NOTE: 214p.; For the previous report, see ED 359 575.
PUB TYPE: Collected Works General (020); Reports Research/Technical (143)

EDRS PRICE: MF01/PC09 Plus Postage.
DESCRIPTORS: Adults; *Articulation (Speech); *Beginning Reading; Chinese; Communication Research; Elementary Secondary Education; French; Higher Education; Infants; Language Acquisition; Language Research; Music; Reading Writing Relationship; *Speech Communication; Spelling; Thai
IDENTIFIERS: Speech Research

ABSTRACT: One of a series of quarterly reports, this publication contains 14 articles which report the status and progress of studies on the nature of speech, instruments for its investigation, and practical applications. Articles in the publication are: "Some Assumptions about Speech and How They Changed" (Alvin M. Liberman); "On the Intonation of Sinusoidal Sentences: Contour and Pitch Height" (Robert E. Remez and Philip E. Rubin); "The Acquisition of Prosody: Evidence from French- and English-Learning Infants" (Andrea G. Levitt); "Dynamics and Articulatory Phonology" (Catherine P. Browman and Louis Goldstein); "Some Organizational Characteristics of Speech Movement Control" (Vincent L. Gracco); "The Quasi-Steady Approximation in Speech Production" (Richard S. McGowan); "Implementing a Genetic Algorithm to Recover Task-Dynamic Parameters of an Articulatory Speech Synthesizer" (Richard S. McGowan); "An MRI-Based Study of Pharyngeal Volume Contrasts in Akan" (Mark K. Tiede); "Thai" (M. R. Kalaya Tingsabadh and Arthur S. Abramson); "On the Relations between Learning to Spell and Learning to Read" (Donald Shankweiler and Eric Lundquist); "Word Superiority in Chinese" (Ignatius G. Mattingly and Yi Xu); "Prelexical and Postlexical Strategies in Reading: Evidence from a Deep and a Shallow Orthography" (Ram Frost); "Relational Invariance of Expressive Microstructure across Global Tempo Changes in Music Performance: An Exploratory Study" (Bruno H. Repp); and "A Review of 'Psycholinguistic Implications for Linguistic Relativity: A Case Study of Chinese' by Rumjahn Hoosain" (Yi Xu). (RS)

Reproductions supplied by EDRS are the best that can be made from the original document.

Status Report on Speech Research

U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization originating it. Minor changes have been made to improve reproduction quality. Points of view or opinions stated in this document do not necessarily represent official OERI position or policy.

SR-113 JANUARY-MARCH 1993

Haskins Laboratories Status Report on Speech Research

SR-113 JANUARY-MARCH 1993
NEW HAVEN, CONNECTICUT

Distribution Statement

Editor: Carol A. Fowler
Production: Yvonne Manning, Fawn Zefang Wang

This publication reports the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Distribution of this document is unlimited. The information in this document is available to the general public. Haskins Laboratories distributes it primarily for library use. Copies are available from the National Technical Information Service or the ERIC Document Reproduction Service. See the Appendix for order numbers of previous Status Reports.

Correspondence concerning this report should be addressed to the Editor at the address below:

Haskins Laboratories
270 Crown Street
New Haven, Connecticut 06511-6695
Phone: (203) 865-6163
FAX: (203) 865-8963
Bitnet: HASKINS@YALEHASK
Internet: HASKINS%[email protected]

This Report was reproduced on recycled paper.

SR-113 January-March 1993 Acknowledgment

The research reported here was made possible in part by support from the following sources:

National Institute of Child Health and Human Development Grant HD-01994 Grant HD-21888

National Institutes of Health Biomedical Research Support Grant RR-05596

National Science Foundation Grant DBS-9112198

National Institute on Deafness and Other Communication Disorders Grant DC 00121 Grant DC 00865 Grant DC 00183 Grant DC 01147 Grant DC 00403 Grant DC 00044 Grant DC 00016 Grant DC 00825 Grant DC 00594 Grant DC 01247


Investigators

Arthur Abramson*
Peter J. Alfonso*
Eric Bateson*
Fredericka Bell-Berti*
Catherine T. Best*
Susan Brady*
Catherine P. Browman
Claudia Carello
Franklin S. Cooper*
Stephen Crain*
Lois G. Dreyer*
Alice Faber
Laurie B. Feldman*
Janet Fodor*
Anne Fowler*
Carol A. Fowler*
Louis Goldstein*
Carol Gracco
Vincent Gracco
Katherine S. Harris*
John Hogden
Leonard Katz*
Rena Arens Krakow*
Andrea G. Levitt*
Alvin M. Liberman*
Diane Lillo-Martin*
Leigh Lisker*
Anders Lofqvist
Ignatius G. Mattingly*
Nancy S. McGarr*
Richard S. McGowan
Patrick W. Nye
Kiyoshi Oshima†
Kenneth Pugh*
Lawrence J. Raphael*
Bruno H. Repp
Hyla Rubin*
Philip E. Rubin
Elliot Saltzman
Donald Shankweiler*
Jeffrey Shaw
Michael Studdert-Kennedy*
Michael T. Turvey*
Douglas Whalen

Technical Staff

Michael D'Angelo
Vincent Gulisano
Donald Halley
Yvonne Manning
William P. Scully
Fawn Zefang Wang
Edward R. Wiley

Administrative Staff

Philip Chagnon
Alice Dadourian
Betty J. DeLise
Lisa Fresa
Joan Martinez

Students*

Melanie Campbell
Sandra Chiang
Margaret Hall Dunn
Terri Erwin
Joseph Kalinowski
Laura Koenig
Betty Kollia
Simon Levy
Salvatore Miranda
Maria Mody
Weijia Ni
Mira Peter
Christine Romano
Joaquin Romero
Dorothy Ross
Arlyne Russo
Michelle Sander
Sonya Sheffert
Caroline Smith
Brenda Stone
Mark Tiede
Qi Wang
Yi Xu
Elizabeth Zsiga

*Part-time
†Visiting from , Japan

Contents

Some Assumptions about Speech and How They Changed
    Alvin M. Liberman ... 1
On the Intonation of Sinusoidal Sentences: Contour and Pitch Height
    Robert E. Remez and Philip E. Rubin ... 33
The Acquisition of Prosody: Evidence from French- and English-Learning Infants
    Andrea G. Levitt ... 41
Dynamics and Articulatory Phonology
    Catherine P. Browman and Louis Goldstein ... 51
Some Organizational Characteristics of Speech Movement Control
    Vincent L. Gracco ... 63
The Quasi-steady Approximation in Speech Production
    Richard S. McGowan ... 91
Implementing a Genetic Algorithm to Recover Task-dynamic Parameters of an Articulatory Speech Synthesizer
    Richard S. McGowan ... 95
An MRI-based Study of Pharyngeal Volume Contrasts in Akan
    Mark K. Tiede ... 107
Thai
    M. R. Kalaya Tingsabadh and Arthur S. Abramson ... 131
On the Relations between Learning to Spell and Learning to Read
    Donald Shankweiler and Eric Lundquist ... 135
Word Superiority in Chinese
    Ignatius G. Mattingly and Yi Xu ... 145
Prelexical and Postlexical Strategies in Reading: Evidence from a Deep and a Shallow Orthography
    Ram Frost ... 153
Relational Invariance of Expressive Microstructure across Global Tempo Changes in Music Performance: An Exploratory Study
    Bruno H. Repp ... 171
A Review of 'Psycholinguistic Implications for Linguistic Relativity: A Case Study of Chinese' by Rumjahn Hoosain
    Yi Xu ... 197

Appendix 209



Haskins Laboratories Status Report on Speech Research 1993, SR-113, 1-32

Some Assumptions about Speech and How They Changed*

Alvin M. Liberman

*The preparation of this paper was aided by the National Institute of Child Health and Human Development under Grant HD 01994. Since the paper is a very personal account of the research that has, since 1965, been supported by that agency, I am moved here to offer special thanks to the agency for its long-continued help, and to Dr. James Kavanagh, a member of its staff, for the kind and wise counsel that he has, for so many years, given me.

My aim is to provide a brief account of the research on speech at Haskins Laboratories as seen from my point of view. In pursuit of that aim, I will scant most experiments and their outcomes, putting the emphasis, rather, on the changing assumptions that, as I understood them, guided the research or were guided by it. I will be particularly concerned to describe the development of those assumptions, hewing as closely as I can to the order in which they were made, and then either abandoned or extended.

My account is necessarily inaccurate, not just because I must rely in part on memory about my state of mind almost 50 years ago when, in the earliest stages of our work, we did not always make our underlying assumptions explicit, but, even more, for reasons that put a proper account beyond the reach of any recall, however true. The chief difficulty is in the relation between the theoretical assumptions and the research they were supposed to rationalize. Thus, it happened only once, as I see it now, that the assumptions changed promptly in response to the results of a particular experiment; in all other cases, they lagged behind, as reinterpretations of data that had accumulated over a considerable period of time. Moreover, theory was influenced, not only by our empirical findings, but equally, if even more belatedly, by general considerations of plausibility that arose when, stimulated by colleagues and by the gradual broadening of my own outlook, I began to give proper consideration to the special requirements of phonological communication and to all that is entailed by the fact that speech is a species-typical product of biological evolution. The consequence for the development of theory was that, with the one exception just noted, I never changed my mind abruptly, but rather segued from each earlier position to each later one, leaving behind no clear sign to mark the time, the extent of the change, or the reason for making it. Faced now with having to describe a theory that was in almost constant flux, I can only offer what I must, in retrospect, make to seem a progress through distinct stages.

My account is also necessarily presumptuous, because I will sometimes say 'we' when I probably should have said 'I', and vice versa. In so doing, I am not trying to avoid blame for bad ideas that were entirely mine, nor to claim credit for good ideas that I had no part in hatching. It is, rather, that everything I have done or thought in research on speech has been profoundly influenced by my colleagues at the Laboratories. In most cases, however, I can't say with certainty just how or by whom. A consequence is that, of all the words I use in this chronicle, the most ambiguous by far are 'I' and 'we'.

'I' began to be confused with 'we' on a day in June, 1944, when I was offered a job at the Laboratories to work with Frank Cooper on the development of a reading machine for the blind, a device that would convert printed letters into intelligible sounds. Of course, we appreciated from the outset that the ideal machine would render the print as speech that the blind user had already mastered. Though that is done quite routinely now, it was, in 1944, far beyond the science and technology that was available to us. There were no optical character readers to identify the letters, and no rules for synthesis that would have enabled us to produce speech from their outputs. But, reassured by our assumptions about the relation of speech to language, of which more

later, we did not think it critical that the machine should speak. Rather, we supposed that it had only to produce distinctive sounds of some kind that the blind would then learn to associate with the consonants and vowels of the language, much as we supposed they had done at an earlier stage of their lives with the sounds of speech. Thus conceived, our enterprise lay in the domains of auditory perception and discrimination learning, two subjects I had presumably mastered as a graduate student at Yale. Indeed, my dissertation had been based on experiments about discrimination learning to acoustic stimuli, and I was thoroughly familiar with the neobehaviorist, stimulus-response theory that was thought by my professors and me to account for the results. In the course of my education in psychology, I had learned nothing about speech, but I didn't think that mattered, because the theory I had absorbed was supposed, like most other theories in psychology, to apply to everything. I was, therefore, enthusiastic about the job, confident that I knew exactly how to make the reading machine serve its intended purpose and so put the theory to practical use. In the event, the theory, and, indeed, virtually everything else I had learned, proved to be, at best, irrelevant and, at worst, misleading. I think it unlikely that I would ever have discovered that had it not been for two fortunate, as they proved to be, circumstances. One was the collaboration, from the outset, with Frank Cooper, whose gentle but nonetheless insistent prodding helped me to accept that the theory might be wrong, and to see what might be more nearly right. The other was that speech lay constantly before me, providing an existence proof that language could be conveyed efficiently by sound, and thus setting the high standard by which I was bound to evaluate the performance of the acoustic substitutes for speech that our early assumptions led us to contrive. But for Frank Cooper, on the one hand, and speech on the other, I might still be massaging those substitutes and modifying the conditions of learning, satisfied to achieve trivial improvements in systems that were, by comparison with speech, hopelessly inadequate. As it was, experience in trying to find an alternative set of sounds brought Frank and, ultimately, me to the conclusion that speech is uniquely effective as an acoustic vehicle for language. It remained only to find out why.

But I get ahead of the story. To set out from the proper beginning, I should say why we initially believed there was nothing special about speech or its underlying processes, and where that belief led us, not only in the several stages of the reading machine work, but also in the development of, and early work with, a research synthesizer we called the Pattern Playback.

The assumptions about speech that have been made by us and others differ in many details. As I see it now, however, there is one question that stands above those details, dividing them neatly into two categories: does speech recruit motor and perceptual processes of a general sort, processes that cut horizontally across a wide variety of domains; or does it belong to a special phonetic mode, a distinct kind of action and perception comprising a vertical arrangement of structures and mechanisms specifically and exclusively adapted for linguistic communication? I will use this issue as the basis for organizing my account of the assumptions we made, calling them 'horizontal' or 'vertical' according to the side of the divide on which they fall.

THE HORIZONTAL VIEW OF SPEECH AND THE DESIGN OF READING MACHINES

As it pertains to the perceptual side of the speech process, the horizontal view, which is how we saw speech at the outset, rests on three assumptions: (1) The constituents of speech are sounds. (2) Perception of these sounds is managed by processes of a general auditory sort, processes that evoke percepts no different in kind from those produced in response to other sounds. (3) The percepts evoked by the sounds of speech, being inherently auditory, must be invested with phonetic significance, and that can be done only by means of a cognitive translation. Accordingly, the horizontal view assumes a second stage, beyond perception, where the purely auditory percepts are given phonetic names, measured against phonetic prototypes, or associated with 'distinctive features'. Seen this way, perceiving speech is no different in principle from reading script; listener and reader alike perceive one thing and learn to call it something else.

These assumptions were, and perhaps still are, so much a part of the received view that we could hardly imagine an alternative; it simply did not occur to us to question them, or even to make them explicit. At all events, it was surely these conventional assumptions that gave us confidence in our assumption that nonspeech sounds could be made to work as well as speech, for what they clearly implied was that the processes by which blind users would learn to 'read' our sounds would differ in no important way from those by which they had presumably learned to perceive speech. Our task, then, was simply to contrive sounds of sufficient distinctiveness, and then design the procedures by which the blind would learn to associate them with language.

Auditory discriminability is all it takes

As for distinctiveness, I was, at the outset, firmly in the grip of the notion that it was just so much psychophysical discriminability, almost as if it were to be had simply by separating the stimuli one from another by a sufficient number of just-noticeable-differences. This notion fit all too nicely with our ability, even then, to engineer a print-to-sound machine that would meet the discriminability requirement, for such a machine had only to control selected parameters of its acoustic output according to what was seen by a photocell positioned behind a narrow scanning slit. Provided, then, that we chose wisely among the many possible ways in which the sound could be made to vary according to what the photocell saw, the auditory results would be discriminable, hence learnable.

In all these machines the sound would be controlled directly by the amount or vertical position of the print under the scanning slit. We therefore called them 'direct translators' to distinguish them from an imaginable kind we called 'recognition machines', having in mind a device that might one day be available to identify each letter, as present-day optical character readers do, and produce in response a preset sound of any conceivable type. Since we were in no position to build a recognition machine, we set our sights initially on a direct translator.

We were aware that a reading machine of the direct-translation kind had been constructed and tested in England just after World War I. This machine, called the Optophone, scanned the print with five beams of light, each modulated at a different audio frequency. When a beam encountered print, a tone was made to sound at a frequency (hence, pitch) equal to the frequency at which that beam was modulated. Thus, as the scanning slit and its beams were moved across the print, the user would hear a continuously changing musical chord, the composition of which depended, instant by instant, on the modulation frequencies of the beams that struck print. The user's task, of course, was to learn to interpret the changing chords as letters and words. I recall not ever knowing what tests, if any, had been carried out to determine just how useful the machine was, only that a blind woman in Scotland had been able to make her way through simple newspaper text. We did know that the machine was not in use anywhere in 1944, so we could only assume that it had been found wanting. At all events, the lesson I took from the Optophone was not that an acoustic alphabet is no substitute for speech, but that the particular alphabet produced by the Optophone was not a good one. Moreover, it seemed likely, given the early date at which the machine had been made, that the signal was less than ideally clear. I, perhaps more than Frank, was sure we could do better. (Indeed, I think that Frank, even at this early stage, had reservations about any machine that read letter by letter, but he shared my belief that we could design a letter-reading machine good enough to be of some use.)

Doing better, as we conceived it then, simply meant producing more learnable sounds. Unfortunately, there was nothing in the literature on auditory perception to tell us how to do that, so we set out to explore various possibilities. For that purpose, Frank built a device with which we could simulate the performance of almost any direct translator, with the one limitation that the subjects of our experiments were not free to choose the texts or to control the rate of scan. (This was because the print had to be scanned by passing a film negative of the text across a fixed slit.) Otherwise, the simulator was quite adequate for producing the outputs we would want to screen before settling on those that deserved further testing.

For much of the initial screening, Frank and I made the evaluations ourselves after determining roughly how well we could discriminate the sounds for a selected set of letters and words. On this basis, we quickly came to the conclusion that there was, perhaps, some promise in a system that used the print to modulate the frequency of the output signal. (Certainly, this was very much better than modulating the amplitude, which had been tried first.) Having made this early decision, we experimented with different orientations and widths of the slit, the shape of the sound wave, and the range of modulation, selecting, finally, a vertical orientation of a slit that was somewhat narrower than, say, the vertical bar of a lower-case letter, and a sine wave with a modulation range of 100 to 4000 Hz.

Further informal tests with this frequency-modulation (FM) system showed that it failed to provide a basis for discriminating certain letters (for example, n and u), which led us to conclude that it would likely prove in more thorough tests

to be not good enough. This conclusion did not discourage me, however, for I reckoned that the difficulty was only that the signal variation, being one-dimensional, was not sufficiently complex. Make the signals more complexly multi-dimensional, I reasoned, so that somewhere in the complexity the listener would find what he needed as the basis for associating that signal with the letter it represented. I recall that here, too, Frank was skeptical, but, open minded as always, he agreed to try more elaborate arrangements. In one, the Dual FM, the slit was divided into an upper and lower half (each with its own photocell), so that two tones, one in the upper part of the 4000-Hz range, the other in the lower, would be independently modulated. In another (Super FM), the slit was divided into thirds (each with its own photocell), and the difference between the output of the middle cell and the sum of the upper and lower cells controlled the (relatively slow) frequency modulations of a base tone, while the sum of all three cells controlled an audio frequency at which the base tone was itself frequency modulated. The effect of this latter modulation was to create side bands on the base frequency at intervals equal to the modulation frequency, and thus cause frequent, sudden, and usually gross changes in timbre and pitch as a consequence of the amount and position of the print under the slit. At all events, the signals of the Super FM seemed to me to vary in wonderfully complex ways, and so to provide a fair test of my assumption about the necessary condition for distinctiveness and learnability. We also simulated the Optophone, together with a variant on it in which, in a crude attempt to create consonant-like effects, we had the risers and descenders of the print produce various hisses and clicks. We even tried an arrangement in which the print controlled the frequency positions of two formant-like bands, giving an impression of constantly changing vowel color.

We tested all of these systems, together with three or four others of similar type, by finding how readily subjects could learn to associate the sounds with eight of the most common four-letter words. To determine how well these systems did in relation to speech, we applied the same test to an artificial language that we spoke simply by transposing phonological segments. Thus, vowels were converted into vowels, stops into stops, fricatives into fricatives, etc., so the syllable structure remained unchanged, hence easy for our human reading machine to pronounce. We called this new language 'Wuhzi', in honor of one of the transposed words.

The results of the learning test were simple enough: of all the nonspeech signals tested, the original, simple FM and the original Optophone were the most nearly learnable; all others trailed far behind. Worst of all, and by a considerable margin, were the extremely complex signals of the Super FM, the system for which I had entertained such high hopes. As for Wuhzi, the transposed speech, it was in a class of its own. It was mastered (for the purposes of the test) in about 15 trials, in contrast to the nonspeech sounds, on which subjects were, after 24 trials, performing at a level of only 60% with the simple FM and Optophone signals, and at 50% or lower for the others, and all seemed at, or close to, their asymptotes. More troubling was the fact that learnability of the nonspeech signals went down appreciably as the rate of scan was increased, even at modest levels of rate. What should have been most troubling, however, were the indications that learning would be rate specific. Though we did not pursue the point, it seemed clear in my own experience with the FM system that what I had learned to 'read' at a rate of, say, 10 words per minute did not transfer to a rate of 50. My impression, as I remember it now, was not that things sounded much the same, only speeded up, but rather that the percept had changed entirely, with the result that I could no longer recognize whatever I had learned to respond to at the slower speed. To the extent that this observation was correct, users would have had to learn a different nonspeech 'language' for every significantly different rate of reading, and that alone would have made the system impractical. It also became clear that at rates above 50 or so words per minute, listeners could identify the words only on the basis of some overall impression (I am reluctant to call it a pattern) in which the individual components were not readily retrievable. The consequence of this, though we did not take proper account of it at the time, was that perception of the nonspeech signals would have been, at best, logophonic, as it were, not phonologic, so users would have had to learn to identify as many sounds as there were words to be read, and would not have been able to fall back on their phonologic resources in order to cope with new words.

At this point it had become evident that, as Frank had been saying for some time, perception of the acoustic alphabet produced by any letter-reading machine would be severely limited by the temporal resolving power of the ear, that is, by its poor ability to segregate and properly order acoustic events that are presented briefly and in rapid succession. Taking account of the number of discrete frequency peaks in each alphabetic sound and the average number of alphabetic characters per word, we estimated the maximum rate to be around 50 words per minute, which I now believe to have been very much on the high side of what might have been achieved. Still, 50 words per minute would be quite useful for many purposes, and we had a considerable investment in the letter-reading direct-translator kind of system, so we felt obliged to see how far a subject could go, given control of the scanning rate, a great variety of printed material, and much practice. Frank therefore built a working model of a direct-translating FM reading machine, and we set several subjects to work trying to master it. After 90 hours of practice, our best subject was able to manage a fifth-grade reader (without correction) at an average speed of about four words per minute. Moreover, the subject seemed to have reached her peak performance during the first 45 hours. Comparing performance at the end of that period with performance after 90 hours, we found no increase in rate and no decrease in errors. Tests of the same system conducted by John Flynn at the Naval Medical Research Laboratory yielded results that gave no more grounds for optimism.

I must not omit from this account a reference to our experience with a sighted subject named (dare I say, appropriately?) Eve, who, having first produced a properly shaped learning curve, attained a reading speed of over 100 words per minute, and convincingly demonstrated her skill before the distinguished scientists who were sponsoring our research. In that demonstration, as in all of her work with the machine, she wore blacked-out welder's goggles so she would not suffer the discomfort of having to keep her eyes shut tight. We all remember the day when one of our number, acting on what seemed an ungentlemanly impulse, put an opaque screen between her and the print she was 'reading', at which point she bounded from her chair and fled the lab, confessing later to the young man who had been monitoring her practice sessions that, by leaning her cheek against her fist, she had, from the very beginning, been raising the bottom of the goggles and so seeing the print. In designing the training and testing procedures, I had controlled for every possibility except that one. Obviously, I was the wrong kind of psychologist.

Having concluded at last that, given the properties of the auditory system, letter-by-letter reading machines were destined to be unsatisfactory, we experimented briefly with a system in which the sound was controlled by information that had been integrated across several letters. This was intended to reduce the rate at which discrete acoustic events were delivered to the ear, and so circumvent the limitation imposed by its temporal resolving power. I cannot now imagine what we thought about the consequences of having to associate holistically different sounds with linguistically arbitrary conjunctions of letters. But whatever it was that we did or did not have in mind, this integrating type of reading machine failed every test and was quickly abandoned.

One other scheme we tested deserves mention if only because it now seems incredible that we should ever have considered it. I suppose we felt obliged to look forward to the time when optical character readers would be available, and so to test some presumably appropriate output of the 'recognition' machine that would then be possible. Prerecorded 'phonemes' seemed an obvious choice. After all, the acoustic manifestations of phonemes were known to be distinctive, and people had, on the horizontal view, already learned to connect them to language. As for difficulty with rate, we must have supposed that these sounds carried within them the seeds of their own integration into larger wholes. At all events, we recorded various 'phonemes' on film ('duh' for /d/, 'luh' for /l/, etc.) and then carefully spliced the pieces of film together into words and sentences. The result was wholly unintelligible, even at moderate rates, and gave every indication of forever remaining so.

Perhaps auditory patterning is the answer

From all this experience with nonspeech we drew two conclusions, one right, one wrong. The right conclusion was that an acoustic alphabet is no way to convey language. Of course, it was humbling for me to realize that I might have reached that conclusion without all the work on the reading machines, simply by measuring the known temporal-resolving characteristics of the ear against the rate at which the discrete sounds would have to be perceived. As I have already suggested, Frank must have thought this through, though he might not have been sufficiently pessimistic about the limits, and that would account for his early opinion that an acoustic alphabet might be at least marginally useful. I, on the other hand, took the point only after seeing our early results and having my nose rubbed in them.
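The direct-translation schemes and the rate argument above lend themselves to a brief sketch. What follows is a modern illustration, not a reconstruction of the Labs' hardware: the particular beam frequencies, the sample rate, and the numbers fed into the rate estimate are all assumptions; only the five-beam chord idea, the 100 to 4000 Hz FM mapping, and the conclusion of roughly 50 words per minute come from the text.

```python
import math

# Illustrative beam frequencies (Hz) for an Optophone-style reader.
# The historical device used five modulated light beams; these
# particular values are assumptions for the sketch.
BEAM_FREQS = [800.0, 1000.0, 1200.0, 1600.0, 2000.0]

def optophone_chord(column, duration=0.01, rate=8000):
    """Render one scanned print column (five booleans, top to bottom)
    as a chord: each beam that strikes ink contributes its tone."""
    n = int(duration * rate)
    active = [f for f, inked in zip(BEAM_FREQS, column) if inked]
    if not active:
        return [0.0] * n  # blank paper is silence
    return [sum(math.sin(2 * math.pi * f * t / rate) for f in active) / len(active)
            for t in range(n)]

def fm_frequency(ink_fraction, lo=100.0, hi=4000.0):
    """The simple FM scheme: the amount of print under the slit sets the
    instantaneous frequency of a single sine wave, mapped (linearly, by
    assumption) onto the 100-4000 Hz range mentioned in the text."""
    return lo + (hi - lo) * ink_fraction

def max_reading_rate(events_per_second, peaks_per_letter, letters_per_word):
    """Back-of-the-envelope version of the rate estimate: words per
    minute if the ear must resolve, in sequence, every discrete
    frequency peak of every alphabetic sound."""
    return 60.0 * (events_per_second / peaks_per_letter) / letters_per_word

chord = optophone_chord([True, False, True, False, False])
print(len(chord), fm_frequency(0.5), round(max_reading_rate(8, 2, 5)))
# prints: 80 2050.0 48
```

With assumed values of eight resolvable acoustic events per second, two frequency peaks per letter sound, and five letters per word, the estimate lands near the 50 words per minute the authors computed; the point of the exercise is the form of the calculation, not those particular inputs.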


The wrong conclusion (wrong because it was still uncompromisingly horizontal in outlook) was that what we needed was not discriminability but proper auditory patterning. Speech succeeds, we decided, because its sounds conform to certain Gestalt-like principles that yield perceptually distinctive patterns, and cause the phonetic elements to be integrated into the larger wholes of syllables and words. Apparently, it did not occur to me that phonological communication would be impossible if the discrete and commutable elements it requires were to be absorbed into indissoluble Gestalten. Nor did I bother to wonder how speakers might have managed their articulators so as to force all of the many covarying but not-independently-controllable aspects of the acoustic signal to collaborate in such a way as to produce just those sounds that would form distinctive wholes. In any case, nobody knew what the principles of auditory pattern perception were. Research in audition, unlike the corresponding effort in vision, had been quite narrowly psychophysical, having been carried out with experiments and stimuli that were not sufficiently complex to reveal such pattern perception as there might be. So if, for the purposes of the reading machine, we were going to make its sounds conform to the principles that underlie good auditory patterns, we had first to find the principles. We therefore decided to turn away from the reading machine work until we should have succeeded in that search. We were the more motivated to undertake it, because the outcome would not only advance the construction of a reading machine, but also count, more generally, as a valuable contribution to the science of auditory perception.

THE PATTERN PLAYBACK: MAKING THE RIGHT MOVE FOR THE WRONG REASON

It is exactly true, and important for understanding how horizontal our attitude remained, to say that our aim at this stage was to study the perception of patterned sounds in general, not speech in particular. As for the reading machine, we did not suppose we would use our results to make it speak, only that we would have it produce sounds that would be as good as speech because they would conform to the same principles of auditory perception. These would be sounds that, except for the accidents of language development, might as well have been speech. Obviously, we would look to speech for examples of demonstrably well-patterned sounds. Indeed, we would be especially attentive to speech, but only because we did not know where else to search for clues. At this stage, however, our interest lay in auditory perception. Since we barely knew where to begin, we expected to rely, especially at the outset, on trial and error. Accordingly, we had to have some easy way to produce and modify a large number and wide variety of acoustic patterns. The sound spectrograph and the spectrograms it produced were no longer a wartime secret, and, like many others, we were impressed with the extent to which spectrograms made sense to the eye. Beyond a doubt, they provided an excellent basis for examining speech; it seemed to us a small step to suppose that they would serve equally well for the purpose of experimenting with it. The experimenter would have immediately available the patterned essence of the sound, and thus easily grasp the parameters that deserved his attention. These would be changed to suit his aims, and he would then have only to observe the effects of the experimental changes on the sound as heard. But that required a complement to the spectrograph: that is, a device that would convert spectrograms, as well as the changes made on them, into sound.

It was in response to these aims and needs that the Pattern Playback was designed and, in the case of its most important components, actually built by Frank Cooper. My role was simply to offer occasional words of encouragement, and, as the device took shape, to appreciate it as a triumph of design, a successful realization of Frank's determination to provide experimental and conceptual convenience to the researcher. The need for that convenience is, I think, hard to appreciate fully, except as one has lived through the early stages of our speech research, when little was known about acoustic phonetics, and progress depended critically, therefore, on the ability to test guesses at the rate of dozens per day.

Seven years elapsed between the start of our enterprise and the publication of the first paper for which we could claim significance. I would therefore here record that the support of the Haskins Laboratories and the Carnegie Corporation of New York was a notable and, to Frank and me, essential exercise of faith and patience. Few universities or granting agencies would then have been, or would now be, similarly supportive of two young investigators who were trying for such a long time to develop an unproved method to investigate a question (what is special about speech that it works so well?) that was not otherwise being asked. I managed to survive in academe by changing universities frequently, thus

confusing the promotion-tenure committees, and also by loudly trumpeting two rat-learning studies that I published as extensions of my thesis research.

There were many reasons it took so long to get from aim to goal. One was the failure, after two years of work, of the first model of the device that was to serve as our primary research tool. It was the second and very different model (hereafter the Pattern Playback or simply the Playback) that was, for our purposes, a success. In the form in which it was used for so many years, this device captured the information on the spectrogram by using it to control the fate of frequency-modulated light beams. Arrayed horizontally so as to correspond to the frequency scale of the spectrogram, there were 50 such modulated beams, comprising the first 50 harmonics of a 120 Hz fundamental. Those beams that were selected by the pattern of the spectrogram were led to a phototube, the resulting variations of which were amplified and transduced to sound. In effect, the device was, as someone once said, an optical player piano.

When the Pattern Playback was conceived, we thought we might not be able to work from hand-painted copies of spectrograms, but only from real ones. Therefore, we needed spectrograms on film so that, given photographic negatives, the phototube would see the beams that passed through the film. (When operating from hand-painted patterns, the phototube would receive the beams that were reflected from the paint.) For convenience in manipulating the patterns, we also wanted the frequency and time scales to be of generous dimensions. Moreover, we thought at this stage that we would need spectrograms with a large dynamic range. None of these needs was fully met by the spectrograph that had been built originally at the Bell Telephone Laboratories, and it was not available for sale in any case, so Frank set about to design and build our own. Unfortunately for the progress of our research, that took time. As for the Playback itself, it was, as our insurance inspector remarked when first he saw it, 'homemade'. And, indeed, it was. Only the raw materials were store-bought. Everything else was designed and fashioned at the Laboratories, including, especially, the tone wheel, the huge circular film with 50 concentric rings of variable density used to produce the light beams that played on the spectrograms.

Once the special-purpose spectrograph and the Playback were, at last, in working order, we had to determine that speech could survive the rigors of the transformations wrought first by the one machine and then by the other. I well remember the relief I felt (Frank probably had more faith and therefore experienced correspondingly less relief) when, operating from a negative of a spectrogram, the Playback produced, on its first try, a highly intelligible sentence.

But that was only to pass the first test. For using these 'real' spectrograms as a basis for experimenting with speech would have been awkward in the extreme, since it would have been very hard to make changes on the film. We therefore began immediately to develop the ability to use hand-painted spectrograms to control the sound. For that purpose, we had first to find a paint that would wet the acetate tape (i.e., not bead up), be easily erased (without leaving scars), and dry quickly. (The last requirement became nearly irrelevant when, quite early in the work, we acquired a hair-dryer.) Unable to find a commercially available paint that met our specifications, we became paint manufacturers, making our way by trial and error to a satisfactory product. That much we had to do. But there was time spent in other preliminaries that proved to be quite unnecessary. Thus, overestimating the degree of fidelity to the original spectrogram we would need, we assumed we would have to control the relative intensities of the several parts of the pattern, and so devoted considerable effort to preparing and calibrating paints of various reflectances. We also supposed that we would have to produce gradations of a kind that could best be done with an airbrush, so we fiddled for a time with that.

But then, having decided that the price of fidelity was too high, we simply began, with an artist's brush and our most highly reflecting paint, to make copies of spectrographic representations of sentences, using for this purpose a set of twenty that had been developed at Harvard for work on speech intelligibility. Taking pains with our first copies to preserve as much detail as possible, we succeeded in producing patterns that yielded what seemed to us a level of intelligibility high enough to make the method useful for research. We were then further encouraged about the prospects of the method when, after much trial and error, we discovered that we could achieve even greater intelligibility (about 85% for the twenty sentences) with still simpler and more highly schematized patterns.

Having got this far, we ran a few preliminary experiments no different in principle from those, very common at the time, in which intelligibility was measured as a function of the filtering that the speech signal had been subjected to.
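Though the Playback itself was optical and analog, the principle it embodied (select, from the first 50 harmonics of a 120 Hz fundamental, those harmonics that the pattern exposes to the phototube, and sum them into sound) amounts to additive synthesis. A minimal digital sketch of that principle follows; the sample rate and the simple on/off masking are assumptions of the sketch, not parameters of the Playback:

```python
import math

F0 = 120.0        # fundamental frequency of the tone wheel, as described above
N_HARMONICS = 50  # one frequency-modulated light beam per harmonic
RATE = 8000       # sample rate: an assumption of this sketch, not of the Playback

def playback_frame(mask, n_samples):
    """Sum the harmonics selected by one time-slice of a painted pattern.

    `mask` is a length-50 sequence of 0/1 values: 1 where paint (or film)
    lets a harmonic's light beam reach the phototube, 0 where it is blocked.
    Returns raw, unscaled samples of the resulting waveform.
    """
    samples = []
    for n in range(n_samples):
        t = n / RATE
        s = sum(math.sin(2 * math.pi * F0 * (k + 1) * t)
                for k in range(N_HARMONICS) if mask[k])
        samples.append(s)
    return samples
```

Sweeping such a mask along the time axis of a hand-painted spectrogram, frame by frame, is the digital analogue of drawing the acetate tape past the tone wheel.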

But instead of filtering, we presented the formants one at a time and in all combinations. The grossly statistical result seemed of no great interest, even then, so we did not pursue it. There was, however, one unintended and unhappy result of our interest in the relative contributions of the several formants. In the course of giving a talk about the Playback at a meeting of the Acoustical Society of America, Frank played several copy-synthetic sentences, first with each formant separately, and then with the formants in various combinations. It was plain from the reaction of the audience that everyone was greatly surprised each time Frank read out the correct sentence. Apparently, people had formed wrong hypotheses as they tried to make sense of the speech produced by the individual formants, and subsequently had trouble correcting those hypotheses when presented with the full pattern. So the sentences got nowhere near the 85% intelligibility they deserved, and Frank got little recognition on that occasion for a promising research method.

AN EXCURSION INTO NONSPEECH AND THE INFINITELY HORIZONTAL HYPOTHESIS

We were, at this stage, strongly attracted to the notion that spectrograms of speech were highly readable because, as transformations of well-patterned acoustic signals, they managed still to conform to basic principles of pattern perception. Implicit in this notion was the assumption that principles of pattern perception were so general as to hold across modalities, provided proper account was taken of the way coordinates were to be transformed in moving from the one modality to the other. As applied to the relation between vision and audition, this assumption could be tested by the extent to which patterns that looked alike could be so transformed as to make them sound alike. To apply this test to the spectrographic transform, we varied the size and orientation of certain easily identifiable geometric forms, such as circles, squares, triangles, and the like, converted them to sound on the Playback, and then asked whether listeners would categorize the sounds as they had the visual patterns. Under certain tightly constrained circumstances, the answer was yes, they would; otherwise, it was no, they would not. At all events, we were so bold as to publish the idea, together with a description of the Playback, in the Proceedings of the National Academy of Sciences. Fortunately for us, that journal is not widely read in our field, so our brief affair with the mother of all horizontal views has remained till now a well-kept secret.

We also experimented briefly with a phenomenon that would, today, be considered an instance of 'streaming', though I thought of it then as the auditory equivalent of the phi phenomenon that had for so long occupied a prominent place in the study of visual patterns. Using painted lines so thin that each one activated only a single harmonic, we observed that when they alternated between a lower and a higher frequency, the listener heard the alternation only if the resulting sinusoids were sufficiently close in frequency and sufficiently long in duration; otherwise, the impression was of two streams of tones that bore no clear temporal relation to each other. In pursuing this effect, we looked for that threshold of frequency separation and duration at which the subject could not distinguish a physically alternating pattern from one in which the two streams of tones came on and went off in perfect synchrony. We did, in fact, succeed in finding such thresholds and in observing that they were sensitive to frequency separation and duration, as our hypothesis predicted, but, after exhibiting a certain threshold for a while, the typical subject would quite suddenly begin to hear the difference and to settle down, if only temporarily again, at a new threshold. Discouraged by this apparent lability, we abandoned the project and began to put our whole effort on speech.

EARLY (AND STILL HORIZONTALLY ORIENTED) RESEARCH ON SPEECH

It was at about this point that Pierre Delattre came to the lab to visit for a day and, fortunately for us, stayed for ten years. Initially, his interest was in discovering the acoustic basis for nasality in French vowels, but once he found what the Playback was capable of, his ambition broadened to include any and all elements of phonetic structure. So, while continuing to work on nasality, Pierre applied himself (and us) to producing two-formant steady-state approximations to the cardinal vowels of Daniel Jones. (Pierre supplied the criterial ear.) The result was published in Le Maitre Phonetique, written in French and in a narrow phonetic transcription. Thus, in our second paper, we continued the habit that had been established in the first of so publishing as to guarantee that few among our colleagues would read what we wrote.

Meanwhile, we had begun to put our attention once again on the copy-synthetic sentences, but now, instead of inquiring into the contribution of each formant to overall intelligibility, we sought, more narrowly, to find the acoustic bases (the cues) for the perception of individual phones. I recall working first on the word 'kill' as it appeared in a simplified copy we had made from a spectrogram of the utterance, 'Never kill a snake with your bare hands.' Looking for the phone [l], we held the pattern stationary at closely spaced temporal intervals, thus creating at each resting point a steady-state signal controlled by the static positions of the formants at that instant. The result, to our ears, was a succession of vowel-like sounds, but nothing we would have called an [l]. Yet, running the tape through at normal speed, and thus capturing the movement of the formant, produced a reasonable approximation to that phone. I don't remember what we made of this (to me) first indication of the importance of the dynamic aspects of the signal, but we did not immediately pursue it, turning instead to the [k] at the beginning of the same word.

Context-conditioned variability and the horizontal version of the motor theory

The advantage of [k] from our point of view as experimenters was that it appeared, in the copy-synthetic word, to be carried by a clearly identifiable and nicely delimited cue: a brief burst of sound that stood apart from the formants. Since we knew that [k] was a voiceless stop, in the same class as [p] and [t], we reckoned that all three stops might depend on such a burst, according to its position on the frequency scale. So, after a little trial and error, we carried out what must be counted our first proper experiment. Bursts (about 360 Hz in height, 15 msec in duration at their widest, and of a shape that caused Pierre to call them 'blimps') were centered at each of 12 frequencies covering the full range of our spectrograms. Each burst was placed in front of seven of Pierre's steady-state cardinal vowels, and the resulting stimuli were randomized for presentation to listeners who would be asked to identify the consonant in each case as [p], [t], or [k].

Of all the synthetic patterns ever used, these burst-plus-steady-state-vowel 'syllables' were undoubtedly the farthest from readily recognizable speech. Indeed, they so grossly offended Pierre's phonetic sensibilities that he agreed only reluctantly to join Frank and me as a subject in the experiment. After discharging my own duty as a subject, I thought, as did Frank, that Pierre's reservations were well taken, and that our responses would reveal little about stop consonants or, indeed, anything else. We were therefore surprised and pleased when, on tabulating the results, we saw a reasonably clear and systematic pattern, and then equally surprised and pleased when a group of undergraduates, known for technical purposes as 'phonetically naive subjects', did much as we had done, if just a little more variably. For us and for them, the modal [k] response was for bursts at or slightly above the second formant of the vowel, wherever that was; [t] was assigned most often to bursts centering at frequencies above the highest of the [k] bursts; and a [p] response was given to most of the bursts not identified as [t] or [k].

This experiment was, for me, an epiphany. It is the one, referred to earlier, that changed my thinking within hours or days after its results were in. What caused the change was the finding that the effect of an acoustic cue depended to such a very large extent on the context in which it appeared, and, more to the point, that perception accorded better with articulation than with sound. The effect of context was especially apparent in the fact that the burst most frequently perceived as [k] was the one lying at or slightly above the second formant of the following vowel, even though that formant sampled a range that extended from 3000 Hz (for [i]) at the one end, to 700 Hz (for [u]) at the other. Moreover, this evidence that the same phonetic percept was cued by stimuli that were acoustically very different was but one side of the coin, for the results also showed that, given the right context, different percepts could be evoked by stimuli that were acoustically identical. This was the case with the burst at 1440 Hz, which was perceived predominantly as [p] before [i], as [k] before [a], and then again as [p], though weakly, before [u].

As an empirical matter, then, this first, very crude experiment demonstrated a kind and degree of context-conditioned variability in the acoustic cues that subsequent research has shown to be pervasive in speech. As for its relevance to our earlier work on reading machines, it helped to rationalize one of the conclusions we had been brought to, which was that speech cannot be an acoustic alphabet; for what the context effects showed was that the commutable acoustic unit is not a phone, but rather something more like a syllable.

From a theoretical point of view, the results revealed the need to find, somewhere in the perceptual process, an invariant to correspond to the invariant phonetic unit, and they strongly suggested

that the invariant is in the articulation of the phone. Thus, in the case of the [k] burst, we noted that it was the articulatory movement (raising and lowering the tongue body at the velum) that remained reasonably constant, regardless of the vowel, and, further, that coarticulation of the constant consonant with the variable vowels accounted for the extreme variability in the acoustic signal. As for the other side of the coin (the very different perception of the burst before different vowels), which resulted, it should be noted, from a fortuitous combination of circumstances probably unique to this experiment, we supposed that, in order to produce something like a burst at 1440 Hz in front of [i] or [u], one had to close and open at the lips, while in front of [a], closing and opening at the velum was required.

Taking all this into account, we adopted a notion (the Early Motor Theory) that I now believe to have been partly right and partly wrong. It was right, I think, in assuming that the object of perception in phonetic communication is to be found in the processes of articulation. It was wrong, or at least short of being thoroughly right, because it implied a continuing adherence to the horizontal view. As earlier indicated, that view assumes a two-stage process: first an auditory representation no different in kind from any other in that modality, followed, then, by linkage to something phonetic, presumably as a result of long experience and associative learning. The phonetic thing the auditory representation becomes associated with is, in the conventional view, a name, prototype, or distinctive feature. As we described the Early Motor Theory in our first papers, we made no such explicit separation into two stages, but only because our concern was rather to emphasize that perception was more closely associated with articulatory processes than with sound, and then to infer that this was because the listener was responding to the sensory feedback from the movements of the articulators. However, we offered no hint that phonetic perception takes place in a distinct modality, thus omitting to make explicit the assumption that is, as will be seen, the heart of the vertical view. Indeed, we implied, to the contrary, that the effects we were concerned with were, potentially at least, perfectly general, and so would presumably occur for any perceptual response to a physical stimulus, given long association between the stimulus and some particular muscular movement, together with its sensory feedback. What was special about speech was only that it provided the par excellence example of the opportunity for precisely that association to be formed. In any case, my own view, as I remember it now, did comprehend two more or less distinct stages: an initial auditory representation that, as a result of associative learning, ultimately gave way to the sensory consequences of the articulatory gesture that had always been coincident with it. I had, I must now suppose, not spent much time wondering exactly what 'gave way' might mean. Had I been challenged on this point, I think I should have said that in the early stages of learning there surely was a proper auditory percept, no different in kind from the response to any other acoustic stimulus and equally available to consciousness, but that later, when the bond with articulation was secure, this representation would simply have ceased to be part of the perceptual process. But however I might have responded, the Early Motor Theory was different from the standard two-stage view in a way that had an important effect on the way we thought about speech, and also on the direction of our research. It mattered greatly that we took the object of perception, and the ultimate constituent of phonetic structure, to be an articulatory gesture, not a sound (or its auditory result), for this began a line of thinking that would in time be seen to eliminate the need for the horizontalists' second stage, and so permit us to exorcise the linguistic ghosts (the phonetic names or other cognitive entities) that haunted it. As for the direction of our research, the Early Motor Theory caused us to turn our attention to speech production, and so to initiate the inquiry into that process that has occupied an ever larger and more important place in our enterprise.

Some mildly interesting assumptions that underlay the methods we used

Given the extreme unnaturalness of the stimuli in many of our experiments, and the difficulty subjects reported in hearing them as speech, we had to assume that there would be no important interaction between the degree of approximation to natural speech quality and the way listeners perceive the phonetic information. I therefore note here that this assumption has proved to be correct: no matter how unconvincing our experimental stimuli as examples of speech, those listeners who were nevertheless able to hear them as speech provided results that have held up remarkably well when the experiments were carried out subsequently with stimuli that were closer approximations to the real thing. This was the first indication we had of the theoretically

important fact that accurate information about the relevant articulator movements, as conveyed, however unnaturally, by the acoustic signal, is sufficient for veridical phonetic perception, provided only that the listener is not too put off by the absence of an overall speech-like quality.

We also had to assume that the validity of our results would be little affected by a practice, followed throughout our search for the cues, of investigating, in any one experiment, only a single phonetic feature in a single phonetic class, and instructing the subjects to limit their responses to the phones in that class. Obviously, this procedure left open the possibility that cues sufficient to distinguish phones along, say, the dimension of place in some particular condition of voicing or manner would not work when voicing or manner was changed, or when the set of stimuli and corresponding response choices was enlarged. In fact, the results obtained in the limited contexts of our experiments were not overturned in later research when, for whatever reason, those limits were relaxed. I take this to be testimony to the independence in perception of the standard feature dimensions.

Finally, given early indications that there were several cues for each phonetic element, and given that, to make our experiments manageable, at least in the early stages, we typically investigated one at a time, we had to assume the absence of strong interactions among the cues. In fact, as we later discovered, there are such interactions (specifically, the trading relations that bespeak a perceptual equivalence among the various cues for the same phone and that are, therefore, of some theoretical interest, as I will say later), but these occur only within fairly narrow limits, and they only change the setting of a cue that is optimal for the phone; they do not otherwise affect it. Therefore, working on only one cue at a time did not cause us to be seriously misled.

More context-conditioned variability and the dynamic aspects of the speech signal

The most cursory examination of a spectrogram of running speech reveals nothing so clearly as the almost continuous movement of the formants; even the band-limited noises that characterize the fricatives seem more often than not to be dynamically shaped. Indeed, Potter, Kopp, and Green, in their book, Visible Speech, had remarked these movements, but they considered the effect to proceed from (constant) consonant to (variable) vowel, at least in the case of stop consonant-vowel syllables; and since their interest was primarily in how these transitional influences might help people to 'read' spectrograms, they simply called attention to the direction of the movement, up or down, and did not speculate about the role of these movements in speech perception. Martin Joos, on the other hand, wrote explicitly about the consonant-vowel transitions, as well as their context-conditioned variability, and showed, by cutting out properly selected parts of magnetic-tape recordings, that these transitions conveyed information about the consonants. He also made the important observation that there was, therefore, no direct correspondence between the segmentation of the acoustic signal and the segmentation of the phonetic structure. But Joos could not vary the transition for experimental purposes, so his conclusions could not be further refined. We therefore thought it a reasonable next step to make those variations, and so learn more about the role of the transitions in perception, choosing, first, to study place of production among stop and nasal consonants.

Our research to that point had prepared us to deal only with two-formant patterns, and since inspection of spectrograms indicated that transitions of the first formant did not vary with place, we chose to experiment with transitions of the second. To that end, we varied the starting point, hence the direction and extent, of these transitions by starting them at each of a number of frequencies above and below the steady state of the following vowel. In one condition, the first formant had a fixed transition that rose to the steady state from the very lowest frequency (120 Hz); the resulting patterns were intended to represent voiced stops, and it was our judgment that they did that reasonably well. In a second condition, the first formant had a zero transition; that is, it was straight. We hoped that these patterns would sound voiceless, but, in fact, they did not. They were used nevertheless. The stimuli in each of these conditions were presented for identification as [b], [d], [g] to one group of listeners and as [p], [t], [k] to another. The results showed clearly that the transitions do provide important information about the place dimension of the voiced and voiceless stops, and, also, that this information is independent of voicing. Indeed, it mattered little whether the listeners were identifying a particular set of synthetic stops as voiced or voiceless; the same transitions were associated with the same place, and there was, at most, only slightly less variability in the condition with the rising first formant, where the stimuli sounded voiced to us and the subjects were asked to judge them so. In a

third condition, we strove, fairly successfully we thought, for nasal consonants by using a straight first formant, together with what we considered at the time to be an appropriate (fixed) nasal resonance. Here, too, the second-formant transitions provided information about place, and that information was in no way affected by the change in manner, even though we had reversed the patterns so as to make the nasals syllable-final, and so to take account of the fact that the velar nasal never appears initially in the syllable in English. As for context effects, they were large and systematic. Thus, the best transition for [d] (or [t] or [n]) fell from a point considerably above the steady state of the vowel with [u], but with [i] it rose from a point below the vowel's steady state, and similar effects were evident with the transitions for other phones. We thought it supportive of a motor theory that the highly variable transitions for the same consonant were produced by a reasonably constant articulatory gesture as it was merged with the gesture appropriate for the following vowel. Equally supportive, in our view, was the fact that mirror-image transitions in syllable-initial and syllable-final positions nevertheless yielded the same consonantal percept, for surely these would sound very different in any well-behaved auditory system. From an articulatory point of view, however, these transitions are seen as the acoustic consequences of the opening and closing phases of the same gesture. As with the bursts, then, perception cued by the transitions accorded better with an articulatory process than with sound.

Despite the demonstration, by us and others, of the extreme context sensitivity of the acoustic cues, some researchers have been concerned for many years to show that there are, nevertheless, invariant acoustic cues, implying, then, that no special theoretical exertions are necessary in order to account for invariant phonetic percepts. My own view of this matter has always been that, whatever the outcome of the seemingly never-ending search for acoustic invariants, the theoretical issue will remain largely untouched; for there is surely no question that the highly context-sensitive transitions do supply important information for phonetic perception (they can, indeed, be shown to be quite sufficient in many circumstances), and that incontrovertible fact must be accounted for.

A brief flirtation with binary decisions

In our first attempt to interpret the significance of the burst and transition results, we took seriously, if only for a short time, the possibility that the two kinds of cues collaborated in such a way that two binary decisions resolved all perceptual ambiguity. Our data had shown that the bursts were identified as [t] if they were high in frequency and as [p] or [k] if low; the second-formant transitions evoked [t] or [k] if they were falling, and [p] if rising. So a low burst and a rising transition would be an unambiguous [p]; a high burst and falling transition would be [t]; and a low burst coupled with a falling transition would be [k]. We were, of course, influenced to this conclusion not just by our data but also, if only indirectly, by the then prevailing fashion for binary arrangements. At all events, we made no attempt to link our notion about binary decisions with the Early Motor Theory, perhaps because that would have been hard to do.

Acoustic loci: Rationalizing the transitions and their role in perception

It required only a little further reflection, combined with an examination of the results of our experiments on the second-formant transitions, to see that perception was sensitive to something more than whether the transition was rising or falling. Since those transitions reflect the cavity changes that occur as the articulators move from the consonant position to the vowel, and since the place of production for each consonant is more or less fixed, we saw that we should expect to find a correspondingly fixed position, or 'locus', as we chose to call it, for its second formant. More careful examination of the results of our experiments suggested that, for each position on the dimension of place, the transition might, indeed, have originated at some particular frequency, and then made its way to the steady state of the vowel, wherever that was. To refine this notion, we carried out a two-step experiment. In the first, we put steady-state second formants at each of a number of frequency levels, from 3600 Hz to 720 Hz, and paired each with one of a number of first formants in the range 720 Hz to 2400 Hz. The first formants had rising transitions designed to evoke the effect of a voiced stop consonant. Careful listening revealed that [d] was heard when the second formant was at 1800 Hz, [b] at 720 Hz, and [g] at 3000 Hz, so we settled on these frequencies as the loci for the places of production of the three stops. The second step was to prepare two-formant patterns in which the second formant started at each of these loci, and then rose or fell to the steady state of the vowel, wherever that was. With

these patterns, the consonant appropriate to the locus was not evoked clearly. For [d], indeed, starting the transitions at the locus produced, for some steady states, [b] and [g]. To get good consonants, we had in all cases to 'erase' the first half of the transition so as to create a silent interval between the locus and the actual start of the transition. We noted, further, that in the case of [g], this maneuver worked only with second-formant steady states from 3000 Hz to about 1200 Hz, which is approximately where the vowel shifts from spread to rounded; below 1200 Hz, no approximation to [g] could be heard.

The concept of the locus, together with the experimental results that defined it, made simple sense of the transitions. It also tempted me to temper the emphasis on context-conditioned variability by assuming that the perceptual machinery (by which I might have meant the auditory machinery) 'extrapolated' backward from the start of the transition to the locus, and so arrived at a 'virtual' acoustic invariant. Fortunately, I yielded to this temptation only briefly, and there is, I think, no written record of my lapse. In any case, we began early to take the opposite tack, using the locus data to strengthen our conclusions about the role of context by emphasizing the untoward consequences of actually starting the transitions at the locus, and by pointing to the sudden shift in the [g] locus when the vowel crosses the boundary from spread to rounded.

Stop vs. semivowel, or once more into the auditory breach

It was apparent on the basis of articulatory considerations, and also by inspection of spectrograms, that an acoustic correlate of the stop-semivowel distinction was the duration or rate of the appropriate transitions. To find out how this variable actually affected perception, we carried out the obvious experiment. We particularly wanted to know whether it was rate or duration, and also where on the rate or duration continuum the phonetic boundary was. By varying the positions of the vowel formants, and hence the extent of the transition, we were able to separate the two variables and find that duration seemed to be doing all the work. We also found that the boundary was at about 50 msec.

I recall thinking that 50 msec might be critical for some kind of auditory integration. As I have already said, it seemed reasonable to me at the time to suppose that phonetic distinctions had accommodated themselves to the properties of the auditory system as revealed at the level of psychophysical relations, since the (implicit) motor activity that was ultimately perceived was itself initially evoked by some kind of first-stage auditory representation. I was therefore naturally attracted to the possibility that transition excursions of less than 50 msec duration were, perhaps, so integrated by the auditory system as to produce a unitary impression like that of a stop consonant, while transitions with excursions longer than that would evoke the impression of gradual change that characterizes the semivowels. I well remember the excitement I felt when, having decided that a 50-msec duration might well be the auditory key, I appreciated how easy it would be to find out if, indeed, it was. The experimental test required only that I draw on the Playback a series of rising and falling isolated transitions in which the duration was varied over a wide range. What I expected and hoped to find was that a duration of 50 msec would provide a boundary between perception of a unitary stop-like burst of sound on the one side, and a semivowel kind of glide on the other. So far as I could tell, however, there appeared to be no such boundary at 50 msec and thus no evidence of an auditory basis for the results of our experiment on the distinction between stop and semivowel. I was disappointed, but not enough to abandon my horizontal attitude about the role of auditory representations in the ontogenetic development of phonetic perception.

Categorical perception: the right prediction from the wrong theories

According to just those aspects of the Early Motor Theory that I now believe to be mistaken, the auditory percept originally evoked by the speech signal was supposed to give way to the sensory consequences of the articulatory gesture, and it was just those consequences that were ultimately perceived. In arriving at this theory, I had, of course, been much influenced by the behaviorist stimulus-response tradition in which I had been reared. It was virtually inevitable, then, that I should take the next step and consider the consequences of two processes, 'acquired distinctiveness' and 'acquired similarity', that were part of the same tradition. The point was simple enough: if two stimuli become connected, through long association, to very different responses, then the feedback from those responses, having become the end-states of the perceptual process, will cause the stimuli to be more discriminable than they had originally been; conversely, if these stimuli become connected to the same response, then, for
the same reason, they will be less discriminable. In fact, there was not then, and is not now, any evidence that such an effect occurs. But that did not trouble me, for I supposed that investigators had not thought to look in the right place, and I could not imagine a better place than speech. Neither was I troubled by what seems to me now the patently implausible assumption, basic to the concepts of acquired distinctiveness and acquired similarity, that the normal auditory representation of a suprathreshold acoustic stimulus could be wholly supplanted, or even significantly affected, by the perceptual consequences of some motor response just because the acoustic stimulus and the motor response had become strongly associated. It was for me compelling that listeners had for many years been making different articulatory responses to stimuli that happened to lie on either side of a phonetic boundary, so, by the terms of the Early Motor Theory and the theory of acquired distinctiveness, that difference should have become more discriminable. On the other hand, those listeners had been making the same articulatory response to equally different stimuli that happened to lie within the phonetic class, so, by the same theories, those stimuli should have been rendered less discriminable.

To test the theories, I thought it necessary only to get appropriate measures of acoustic-cue discriminability. As I know now, one can easily get the effect I sought simply by listening to voiced stops, for example, as the second-formant transition is changed in relatively small and equal steps, for what one hears is, first, several consonants that are almost identical [b]'s, then a rather sudden shift to [d], followed by several almost identical [d]'s, and then, again, a sudden shift, this time to [g]. Though we had the means to make this simple and quite convincing test, we did not think to try it. Instead, I put together two wrong theories and produced what my professors had taught me to strive for as one of the highest forms of scientific achievement: a real prediction.

The test of the prediction was initially undertaken by one of our graduate students, Belver Griffith, who, with my enthusiastic approval, elected to do the critical experiment on steady-state vowels. We know now that the effect we were looking for does not occur to any significant extent with such vowels, so it was fortunate that Griffith, by nature very fussy about the materials of his experiments, was unsuccessful in producing vowels he was willing to use. The happy consequence was that he, together with the rest of us, decided to move ahead, in parallel, with stop consonants.

It was not our purpose to obtain discrimination thresholds, but only to measure discriminability of a constant physical difference at various points on the continuum of second-formant transitions. To that end, we synthesized a series of 14 syllables in which the starting point of the second formant was varied in steps of 120 Hz, from a point 840 Hz below the steady state of the following vowel to a point 720 Hz above it. We then paired stimuli that were one, two, and three steps apart on the continuum, and for each such pair measured discriminability by the ABX method (A and B were members of the pair, X was one or the other, and the subject's task was to match X with A or B). The result was that there were peaks in the discrimination functions at positions on the continuum that corresponded to the phonetic boundaries as earlier determined by the way the subjects had identified the stimuli as [b], [d], or [g] when they were presented singly and in random order. This is to say that, other things equal, discrimination was better between stimuli to which the subjects habitually gave different articulatory responses than it was between stimuli to which the responses were the same. Thus, the prediction was apparently confirmed by this instance of what we chose to call 'categorical perception'.

In the published paper, we included a method, worked out by Katherine Harris, for computing from the absolute identification functions what the discrimination function would have been if, indeed, perception had been perfectly categorical, that is, if listeners had been able to perceive differences only if they had assigned the stimuli to different phonetic categories. Applying this calculation to our results, we found that, in this experiment at least, perception was rather strongly categorical, but not perfectly so.

To this point in our research, psychologists, including even those interested in language, had paid us little attention. Requests for reprints, and such other tokens of interest as we had received, had come mostly from communication engineers and phoneticians, and there were few references to our work in the already considerable literature of psycholinguistics, a field that had been established, seemingly by fiat, by a committee of psychologists and linguists who had met for a summer work session at Cornell. The result of their deliberations was a briefly famous monograph in which they defined the new discipline, constructed its theoretical framework, and posed the questions that remained to be answered. That done, they officially launched the field at a symposium held during the next national convention of the American Psychological Association. At the end of the symposium, I asked a question that provoked one of the founding fathers to inform me, icily, that speech had nothing to do with psycholinguistics. He did not say why, but then, as one of the inventors of the discipline, he was entitled to speak ex cathedra. It is, however, easy to appreciate that in looking at speech horizontally, as he and the other members of the committee surely did, one sees nothing that is linguistically interesting, only a set of unexceptional noises and equally unexceptional auditory percepts that just happen to convey the more invitingly abstract structures where anyone who would think deeply about language ought properly to put his attention.

The categorical perception paper seemed, however, to touch a psycholinguistic nerve. Perhaps this was because, if taken seriously, it showed that the phonetic units were categorical, not only in their communicative role, but also as immediately perceived; and this nice fit of perceptual form to linguistic function must have seemed at odds with the conventional horizontal assumption that the auditory percepts assume linguistic significance only after a cognitive translation, not before. Our results could be taken to imply that no such translation was necessary, and that there might, therefore, be something psycholinguistically interesting and important about the precognitive, that is, purely perceptual, processes by which listeners apprehend phonetic structures.

Most psychologists seemed unwilling to accept that implication, though not all for the same reason. Some argued that categorical perception, as we had found it, was of no consequence because it was merely an artifact of our method. This criticism boiled down to an assertion, perfectly consistent with the standard horizontal view, that the memory load imposed by the ABX procedure made it impossible for the subject to compare the initial auditory representations, forcing him to rely, instead, on the categorical phonetic names he assigned to the rapidly fading auditory echoes as they were removed from the everchanging sensory world and elevated, for safer keeping, into short-term memory. It is, of course, true that reducing the time interval between the stimuli to be compared does raise the troughs that appear in the within-category parts of the discrimination function, and thus reduces the approximation to categorical perception. But the Early Motor Theory did not require that the articulatory responses within a phonetic category be identical, so it did not predict that perception had to be perfectly categorical. (The degree to which the articulatory responses within a category are similar presumably varies according to the category and the speaker; it is, therefore, a matter for empirical determination.) Neither did the theory in any way preclude the possibility that perceptual responses would be easier to discriminate when fresh than when stale. In any case, it surely was relevant that the peaks one finds with various measures of discrimination merely confirm the quantal shifts a listener perceives as the stimuli are moved along the physical continuum so rapidly as to make the memory load negligible.

The other criticism, which seemed almost opposite to the one just considered, was that categorical perception is not relevant to psycholinguistics because it is so common, and, more particularly, because those boundaries that the discrimination peaks mark are simply properties of the general auditory system, hence not to be taken as support for the view that speech perception is interesting, except, perhaps, within the domain of auditory psychophysics. As for the criticism that categorical perception is common, it seemed to have been based on the misapprehension that we had claimed categorical perception to be unique to speech, but in fact we had not, having merely observed (correctly) that, given stimuli that lie on some definable physical continuum, observers commonly discriminate many more than they can identify absolutely. Our claim about phonetic perception was only that there is a significant, if nevertheless incomplete, tendency for that commonly observed disparity to be reduced. On the other hand, the claim by our critics that the boundaries are generally auditory, not specifically phonetic, was important and deserved to be taken seriously. It has led to many experiments on perception of nonspeech control stimuli and on perception of speech by nonhuman animals, leaving us and the other interested parties with an issue that is still vexed. In fact, I think the weight of evidence, taken together with arguments of plausibility, overwhelmingly favors the conclusion that the boundaries are specific to the phonetic system, but I reserve the justification for that conclusion to the last section of the paper.

There was yet another seemingly widespread misapprehension about categorical perception, which was that it had served us as the primary
basis for the Early Motor Theory. In fact, we (or, at least, I) have long believed that the facts about this phenomenon are consistent with the theory, but they do not by any means provide its most important support; indeed, they were not available until at least five years after we had been persuaded to the theory, as I earlier indicated, by the very first results of our search for the acoustic cues.

Finally, in the matter of categorical perception, I will guess that consonant perception is likely, when properly tested, to prove more nearly categorical than experiments have so far shown it to be. The problem with those experiments is that they have used acoustic synthesis, so it has been prohibitively difficult in any single experiment to make proper variations in more than one of the many aspects of the signal that are perceptually relevant. But when only one cue is varied, as in the experiments so far done, then, as it is changed from the form appropriate for one phone to the form appropriate for the next, it leaves all the other relevant information behind, as it were, creating a situation in which the listener is discriminating, not just the phonetic structure, but also increasingly unnatural departures from it. I suspect that, with proper articulatory synthesis, when the acoustic signal will change in all relevant aspects (at least for the cases that are produced by articulations that can be said to vary continuously) the discrimination functions will come much closer to being perfectly categorical.

The concept of 'cue' as a theoretically relevant entity

At the very least, 'cue' is a term of convenience, useful for the purpose of referring to any piece of signal that has been found by experiment to have an effect on perception. We have used the word in that sense, and continue to do so. But there was a time when cue had, at least in my mind, a more exalted status. I supposed that there was, for each phone, some specifiable number of particulate cues that combined according to a scientifically important principle to evoke the correct response. It was this understanding of cues that was implicit in the 'binary' account of their effects that I referred to earlier. The same understanding was more explicit in a dissertation on cue combination that I had urged on Howard Hoffman. Finally, and perhaps most egregiously, it became the centerpiece of our interpretation of an experiment on the effects of third-formant transitions on perception of place among the stops. Having found there that, in enhancing the perception of one stop, any particular transition does not do so equally at the expense of the other two, we concluded that a cue not only tells a listener what a speech sound is, but also which of the remaining possibilities it is not. It was almost as if we were supposing that the third-formant transition had been designed, by nature or by the speaker, just to resolve an ambiguity that the more important second-formant transition had overlooked. At all events, we went on to speculate that the response alternatives exist in a multidimensional phonetic space, and, though we were not perfectly explicit about this in the published paper, that a cue has a magnitude and a direction, just like a vector, with the result that the final position of the percept in the phonetic space is determined by the sum of the vectors. Such a conception is, of course, at odds with all the data now available that indicate how exquisitely sensitive the listener is to all the acoustic consequences of phonetically significant gestures, for what those data mean is that any definition of an acoustic cue is always to some extent arbitrary. Surely, it makes little sense to wonder about the rules by which arbitrarily defined cues combine to produce a perceptual result.

The voicing distinction; an exercise in not seeing that which is most visible

We discovered very early how to produce stops that were convincingly voiced, but we had been frustrated for five years or more while seeking the key to synthesis of their voiceless counterparts. In our quest, we had examined spectrograms, sought advice from colleagues in other laboratories, and, by trial and error on the Playback, tried every trick we could think of. We varied the strength of the burst relative to the vocalic section of the syllable, drew every conceivable kind of first-formant transition, and substituted various intensities of noise for the harmonics through varying lengths of the initial parts of the formant transitions.

In fact, there was no noise source in the Pattern Playback, only harmonics of a 120 Hz fundamental, but we had been able in research on the fricatives to make do by placing small dots of paint where noise was supposed to be. In isolation, patches of such dots sounded like a twittering of birds, but in syllabic context they produced fricatives so acceptable that, when we used them in experiments, we got results virtually identical to those obtained later when, with a new synthesizer called Voback, we were able to deploy
real noise. Before Voback was available, however, we had, in our attempts at voiceless stops, to rely on the trick that had worked for the fricatives. When it did not help, we concluded that, unlike the fricatives, the voiceless stops needed true noise and, accordingly, that our inability to synthesize them was to be attributed to the noise-producing limitations of the Playback.

That we were wrong to blame the Playback became apparent one day as a consequence of a discovery by Andre Malecot, one of Pierre's graduate students. While working to synthesize the syllable-final releases of stops, he omitted the first formant of the short-duration syllable that constituted the release. I believe that he did this inadvertently. But whether by inadvertence or by design, he produced a dramatic effect: we all heard a stop that was quite clearly voiceless. Encouraged by this finding, we adapted it to stops in syllable initial position, and carried out several related experiments. In each, we varied one potential cue for the voicing contrast for all three stops, paired with each of the vowels [i], [ae], and [u]. Our principal finding was that, with all else equal, simply delaying the onset of the first formant relative to the second and third was sufficient to cause naive listeners to shift their responses smartly between voiced and voiceless. Then, recognizing that in so delaying the onset of the first formant we were, at the same time, starting it at a higher frequency, we reconfirmed the observation we had made in our earlier experiments, which was that starting the first formant at a very low frequency was important in creating the impression of a voiced stop, but that starting it higher, at the steady state of that formant, did not, by itself, make much of a contribution to voicelessness. On the other hand, delaying the onset of the first formant without at the same time raising its starting frequency did prove to be a very potent cue. Indeed, it appeared from the responses of our listeners to be about as potent as the original combination of delay and raised starting point. (For the purpose of this experiment, we varied the delay alone by contriving a synthetic approximation to the vowel [o] in which the first formant was placed as low on the pattern as it could go; it was, then, just this straight formant that was delayed.)

Next, we took advantage of the newly available synthesizer, Voback, which, as I earlier said, had a proper noise source, to experiment with the effect of noise in place of harmonics during the transitions. What we found was that substituting noise in all three formants was, by itself, ineffective, but that substituting it for harmonics in the second and third formants for the duration of the delay in first-formant onset did somewhat strengthen the impression of voicelessness. In connection with this last conclusion, we noted that when, in an attempt to produce an initial [h], which is, of course, the essence of aspiration, we replaced the harmonics of all the formants with noise for the first 50 or 100 msec, we did not get [h], but rather the impression of a vowel that changed from whispered to fully voiced; to get [h], we had to omit the first formant. To explain all this, we advanced a suggestion, made to us by Gunnar Fant, that the vocal cords were open during the period of aspiration, and that it was this circumstance that reduced the intensity of the first formant, thus effectively delaying its onset. We emphasized, then, that all the acoustically diverse cues were consequences of the same articulatory event, and therefore led, in accordance with the Motor Theory, to a percept that was perfectly coherent.

This early work on the voicing distinction was subsequently refined and considerably extended by Arthur Abramson and Leigh Lisker. In particular, it was they who established how the acoustic boundaries for the voicing distinction vary with different languages, and thus provided the basis for the great volume of later research by other investigators who exploited these differences in pursuit of their interests in the ontogenesis of speech. And it was Abramson and Lisker who accurately characterized the relevant variable as voice-onset-time (VOT), defined as the duration of the interval between the consonant opening (in the oral part of the tract) and the onset of voicing at the larynx. Unfortunately, some of the researchers who later used the voicing distinction for their own purposes ignored the fact that the VOT variable is articulatory, not acoustic, and therefore failed to take into account in their theoretical interpretations that its acoustic manifestations are complexly heterogeneous.

As for our initial discovery of the acoustic cues for the voicing distinction, I note an irony in the long search that preceded it, for once one knows where to look in the spectrogram, the delay in the first formant onset can be measured more easily, and with greater precision, than almost any of the other consonant cues we had found. Consider, for example, how important to perception is the frequency at which a formant transition starts, and then how hard it is to specify that frequency precisely from an inspection of its appearance on a spectrogram. Yet our tireless examination of spectrograms had, in the case of the voicing distinction, availed us nothing; we simply had not seen what we now know to be so plainly there.

Synthesis by rule and a reading machine that speaks

In this very personal chronicle of our early research, I have chosen to write of just those experiments that best illustrate certain underlying assumptions I now find interesting. I have said nothing about the many other experiments that were carried out during roughly the same period of time. I would now partly repair that omission by recognizing the existence of those others, and by emphasizing that they provided a collection of data sufficient as a basis for synthesizing speech from a phonetic transcription, without the need to copy from a spectrogram. Unfortunately, all the relevant data had been brought together only in Pierre's head. Relying only on the experience he had gained from participation in our published research, and also from the countless unpublished experiments he had carried out in his unflagging effort to refine and extend, Pierre could 'paint' speech to order, as it were. But the knowledge that Pierre had in his head was, by its nature, not so public as science demands.

We therefore recommended to Frances Ingemann, when she came to spend some time at the Laboratories, that she write a set of rules for synthesis, making everything so explicit that someone totally innocent of knowledge about acoustic phonetics could, simply by following the rules, draw a spectrogram that would produce any desired phonetic structure. Accepting this challenge, she decided to rely entirely on the papers we had published in journals or in lab reports; she did not use the synthesizer to test and improve as she went along, nor did she attempt to formalize what Pierre and other members of the staff might know but had never written down. She nevertheless succeeded very well, I think, in producing what must count as the first rules for synthesis. I don't recall that we ever formally assessed the intelligibility of the speech produced by these rules, but I know that we found it reasonably intelligible. At all events, an outline of the rule system was published in 1959 under the title, 'Minimal Rules for Synthesis'. The word 'minimal' was appropriate because the rules were written at the level of features, the presumed 'atoms' of phonetic structure, not at the level of the more numerous segments or 'molecules'. I should note, too, that we took explicit notice in the paper of our belief that the rules for synthesis had better be written in articulatory terms, for then all the relevant acoustic information would be provided to the listener. There was, however, no alternative to the acoustically based rules we offered, because there was not enough known about articulation, but also because there was, in any case, no satisfactory articulatory synthesizer. Articulatory synthesis would therefore have to wait.

Meanwhile, we had, by 1966, a computing facility, and had constructed a computer-controlled, terminal-analog formant synthesizer. It was then that Ignatius Mattingly joined the staff and undertook to program the computer to produce speech by rule. For this purpose, he drew on the work he had done previously with Holmes and Shearme in England, and also on all that had been learned about the cues and rules for synthesis at the Laboratories. By 1968 the job was done. Accepting an input of a phonetic string, the system would speak. The intelligibility of the speech was tested on several occasions and in several different ways. Thus, it was tested informally by having blind veterans listen, for example, to rather long passages from Dickens and Steinbeck. It was evident that they understood the speech, even at rates of 225 words per minute, but we had no measure of exactly how hard they found it to do so, and they did complain, not without reason we thought, of what they called the machine's 'accent'. In more formal tests, the rule-generated synthetic speech came off quite well by comparison with 'real' speech, but, not unexpectedly, there was evidence of a price exacted by the extra cognitive effort that was required to overcome its evident imperfections, a price that had to be paid, presumably, by the processes of comprehension.

At that point, we had in hand a principal component of a reading machine that would convert text to speech, and thus avoid all the problems we had encountered in our earlier work with nonspeech substitutes. What was needed, in addition, was an optical character reader to convert the letters into machine-readable form, and also, of course, some way of translating spelled English into a phonetic transcription appropriate to the synthesizer. Given our history, it was inevitable that we should have been impatient to acquire these other components and see (or, more properly, hear) what a fully automatic system could do. So, Frank, Patrick Nye, and others cobbled together just such a system, using an optical character reader we bought with money given us for the purpose by the Seeing Eye Foundation, a phonetic dictionary made available by the Speech Research Laboratory of Santa Barbara, and, of course, our own computer-controlled synthesizer. Tests revealed that the speech produced in this fully automatic way was almost as good as that for which the phonetic transcription had been hand-edited. But we were concerned about the evidence we had earlier collected concerning the probable consequences for ease of comprehension that arose out of the shortcomings of the speech. We therefore put together a plan to evaluate the machine with blind college students who would use it to read their assignments; having found its weaknesses, we would then try to correct them. I assumed that various federal agencies would compete to see which one could persuade us to accept their support for this undertaking. We could, after all, show that a reading machine for the blind was not pie-in-the-sky, but a do-able thing that stood in need of just the kinds of improvement that further research would surely bring. Yet, though we tried very hard with several agencies, and for several years, we failed utterly to get support, and were forced finally to abandon our plans. Still, we had the satisfaction of having proved to ourselves that a reading machine for the blind was close to being a reality. The basic research was largely complete; what remained was just the need for proper development.

ON BECOMING VERTICAL, OR HOW I RIGHTED MYSELF

To this point, my concern has been to describe the various forms of the horizontal view that my colleagues and I held during our work on nonspeech reading machines and in the early stages of the research on speech to which it led. Now I mean to offer a more detailed account of the important differences between that view and the vertical view I now hold. In so doing, I draw freely, and without specific attribution, on a number of theoretically oriented papers that were written in close collaboration with various of my colleagues. Among the most relevant of these are several reviews by Ignatius Mattingly and me in which we hammered out the vertical view as I (we) see it now.

All these theoretically oriented papers deal, at

reasons, I will organize this section of the paper around them. The issue that unites the questions pertains to the place of speech in the biological scheme of things. That I should have come to regard that issue as central is odd, given the habits of mind I had brought to the research, for, as I earlier implied, my education in psychology had been unremittingly abiological. I had, to be sure, studied a little physiology, narrowly conceived, and it cannot have escaped my notice that in the physiological domain things were not of a piece, having been formed, rather, into distinct systems for correspondingly distinct functions. At the level of behavior, however, I saw only an overarching sameness, a reflection of my attachment to principles so general as to apply equally to a process as natural as learning to speak and as arbitrary as memorizing a list of nonsense syllables.

I think I was moved first, and most generally, to a different approach by scientists who work, not on speech, but on other forms of species-typical behavior. Thus, it was largely under the influence of people like Peter Marler, Nobuo Suga, Mark Konishi, and Fernando Nottebohm that I came to see myself as an ethologist, very much like them, and to appreciate that I would be well advised to begin to think like one. They helped me to understand that speech is to the human being as echolocation is to the bat or song is to the bird; to see, that is, that all these behaviors depend on biologically coherent faculties that were specifically adapted in evolution to function appropriately in connection with events that are of particular ecological significance to the animal. To the horizontalist that I once was, this was heresy; but to the verticalist I was in process of becoming, it was the beginning of wisdom.

Meanwhile, back at the Laboratories there were biological stirrings on the part of Michael Studdert-Kennedy, who is nevertheless not a committed verticalist, and Ignatius Mattingly, who is. Michael has been a constructive critic in regard to virtually every biologically relevant notion I have dared to entertain. As for the biological slant of the vertical view (including the Revised Motor Theory), that is as much Ignatius's contribution as mine.
Indeed, the view itself is the least implicitly, with questions about speech to result of a joint effort, though, of course, he bears which the horizontal and vertical views give no responsibility for what I say about it here. different, sometimes diametrically opposed, Among the influences of a somewhat different answers. Such questions serve well, therefore, to sort, there was the growing realization that my define the two positions, and to explain how I early horizontal view did not sit comfortably even came to abandon the one for the other; for those with the results of the early research it was
designed to explain. That will have been seen in what I have already said about my attempts to account for those results, and, especially, about the patch on the horizontal view that I have here called the Early Motor Theory. Heavily loaded as it was with untested and wholly implausible assumptions (for example, that auditory percepts could, as a result of learning, be replaced by sensations of movement), it had begun to fall of its own weight.

Contributing further to the collapse of the Early Motor Theory was the research, pioneered by Peter Eimas and his associates, in which it was found that prelinguistic infants had a far greater capacity for phonetic perception than a literal reading of the theory would allow.

My faith was further weakened by the work of Katherine Harris and Peter MacNeilage, who, as the first of the Laboratories' staff to work on speech production, were busily finding a great deal of context-conditioned variability in the peripheral articulatory movements (as reflected in electromyographic measures), and thus disproving one of the assumptions of the Early Motor Theory, which was that the articulatory invariant was in the final-common-path commands to the muscles. At the same time, Michael Turvey was pointing the way to an appropriate revision by showing how, given context-conditioned variability at the level of movement, it is nevertheless possible, indeed necessary, to find invariance in the more remote motor entities that Michael called 'coordinative structures'. In any case, Michael and Carol Fowler were strongly encouraging me to persevere in the aspect of the Early Motor Theory that took gestures to be the objects of speech perception, while simultaneously heaping scorn on the idea that perception was a second-order translation of a sensory representation, as the horizontal version of the theory required. I began, therefore, to take more seriously the possibility that there is no mediating auditory percept, only the immediately perceived gestures as provided by a system (the phonetic module) that is specialized for the ecologically important function of representing them.

Not that Michael and Carol or, indeed, any of the other 'ecological' psychologists in the Laboratories, are verticalists. They most certainly are not, because they do not accept (yet) that there is a distinct phonetic mode, preferring, rather, to take speech perception as simply one instance of the way all perception is tuned to perceive the distal objects; in the case of speech, these just happen to be the articulatory gestures of the speaker. Thus, I have been in the happy position of taking advantage of the best of what my ecological friends have had to offer, while freely rejecting the rest, and, as an important bonus, being stimulated by our continuing disagreements to correct weaknesses and repair omissions in my own view.

It was also relevant to the development of my thinking that Isabelle Liberman, Donald Shankweiler, and Ignatius Mattingly (followed later by such younger colleagues as Benita Blachman, Susan Brady, Anne Fowler, Hyla Rubin, and Virginia Mann) had begun to see in our research how to account for the fact that speech is so much more natural (hence easier) than reading and writing, and thus to be explicit about what is required of the would-be reader/writer that mastery of speech will not have taught him. As I will say later, their insights and the results of their empirical work illuminated aspects of the vertical view that I would otherwise not have seen.

I was affected, too, by the results of experiments on duplex perception, trading relations, and integration of cues, experiments that went beyond those, referred to earlier, that merely isolated the cues and looked for discontinuities in the discrimination functions. These later experiments, done (variously) in close collaboration with Virginia Mann, Bruno Repp, Douglas Whalen, Hollis Fitch, Brad Rakerd, Joanne Miller, Michael Dorman, and Lawrence Raphael (few of whom are admitted verticalists) provided data that spoke more clearly than the earlier findings to some of the shortcomings of the horizontal position, and therefore inclined me ever more strongly to the vertical alternative.

Finally, I should acknowledge the profound effect of Fodor's provocative monograph, 'The Modularity of Mind', which, in the early stages of my conversion, enlightened and stimulated me by its arguments in favor of the most general aspects of the vertical view.

That I should finally have asked the following questions, and answered them as I do, reflects the influences I have just described, and fairly represents the theoretical position to which they moved me.

In the development of phonological communication, what evolved?

Defined as the production and perception of consonants and vowels, speech, as well as the phonological communication it underlies, is plainly a species-typical product of biological evolution. All neurologically normal human beings communicate phonologically; no other creatures do. The biologically important function of phonologic communication derives from the way it exploits the combinatorial principle to generate vocabularies that are large and open, in contrast to the vocabularies of nonhuman, nonphonologic systems, which are small and closed. Thus, phonological processes are unique to language and to the human beings who command them. It follows that anyone who would understand how speech works must answer the question: what evolved? Not when, or why, or how, or by what progression from earlier-appearing stages. The first question is simply: what?

The answer given by the horizontal view is clear: at the level of action and perception, nothing evolved; language simply appropriated for its own purposes the plain-vanilla ways of acting and perceiving that had developed independently of any linguistic function. Thus, those horizontalists who put their attention rather narrowly on the perceptual side of the process argue that the categories of phonetic perception simply reflect the way speech articulation has accommodated itself to the production of sounds that conform to the properties of the auditory system, a claim that I will evaluate in some detail later. A recent and broader, but still horizontal, take on the same issue distributes the emphasis more evenly between production and perception, arguing that phonetic gestures were selected by language on the basis of constraints that were generally motor, as well as generally auditory. However, the important point in this, as in the narrower view, is that the constraints are independent of a phonetic function, hence in no way specific to speech. Put forth as an explicit challenge to the vertical assumption, the broader view has it that there is no reason to assume a special mode for the production and perception of speech, if, with the proper horizontal orientation, one can see that the units of speech are optimized with respect to motor and perceptual constraints that are biologically general.

But the question is not whether language somehow developed out of the biology that was already there; surely, it could hardly have done otherwise. The question, to put it yet again, asks, rather, what did that development produce as the basis for a unique mode of communication? When the horizontalists say that the development of this mode was accomplished merely by a process of selection from among the possibilities offered by general faculties that are independent of language, they are giving an account that applies as well to the development of, say, a cursive writing system. Was not the selection of the cursive gestures similarly determined by motor and perceptual constraints that are independent of language? Yet, what that selection produced were not the biologically primary units of speech, but only a set of optical artifacts that had then to be connected to speech in a wholly arbitrary way. Of course, this is merely to say the obvious about the relation between speech and a writing system, which is that the evolution of the one was biological, the other, not. That is surely a critical difference, but one that the horizontal view must have difficulty comprehending.

If pressed further to answer the question about the product of evolution, the horizontalists would presumably have to say that, while nothing evolved at the level of perception and action, there must have been relevant developments at a higher cognitive level. Thus, it would have been evolution that produced the phonetic entities of a cognitive type to which the nonphonetic acts and percepts of speech must, on the horizontal view, be associated. Being neither acts nor percepts, these cognitive entities (or ideas, as they might be) would presumably be acceptable within the horizontal framework as genetically determined for language, hence special in a way that speech is not allowed to be. In itself, this seems an unparsimonious, not to say biologically implausible, assumption. And it can be seen to be the more unparsimonious and implausible once the horizontalist tries to explain how the phonetically neutral acts and percepts got connected to the specialized cognitive entities in the first place. In the case of a script, to wring one more point out of that tired example, the obviously nonphonetic motor and visual representations of the writer and reader were connected to language by agreement among the interested parties. Can we seriously propose a similar account for speech?

If the horizontalists should reject the notion that phonetic ideas were the evolutionary bases for speech, there remains to them the most thoroughly horizontal view of all, which is that what evolved was a large brain. In that case, they might suppose either that phonological communication was an inevitable by-product of the cognitive power that such a brain provides, which seems unlikely, or that phonological communication was an invention, created by large-brained people who were smart enough to have appreciated the immense advantages for communication of the combinatorial principle, which seems absurd.
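The combinatorial point made just above, that a small, closed set of commutable elements yields a large and open vocabulary while a holistic call system does not, can be made concrete with a little arithmetic. The inventory size, maximum word length, and call-system size below are hypothetical round numbers chosen purely for illustration; they are not figures given in the paper.

```python
# Illustrative sketch of the combinatorial principle (hypothetical numbers):
# a small, closed inventory of discrete segments generates an enormous space
# of possible words, whereas a holistic call system offers one signal per
# message. The values 30, 6, and 40 are assumptions, not data from the text.

def possible_strings(inventory_size: int, max_length: int) -> int:
    """Count distinct strings of length 1..max_length over the inventory."""
    return sum(inventory_size ** n for n in range(1, max_length + 1))

closed_call_system = 40                    # e.g., 40 holistic calls: 40 'words'
open_phonology = possible_strings(30, 6)   # 30 segments, words up to 6 segments

print(closed_call_system)   # 40
print(open_phonology)       # 754137930
```

Even with deliberately conservative numbers, the combinatorial scheme outruns the closed call system by seven orders of magnitude, which is the sense in which a phonological vocabulary is 'large and open'.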


The vertical view is different on all counts. What evolved, on this view, was the phonetic module, a distinct system that uses its own kind of signal processing and its own primitives to form a specifically phonetic way of acting and perceiving. It is, then, this module that makes possible the phonological mode of communication.

The primitives of the module are gestures of the articulatory organs. These are the ultimate constituents of language, the units that must be exchanged between speaker and listener if linguistic communication is to occur. Standing apart as a class from the nonphonetic activities of the same organs (for example, chewing, swallowing, moving food around in the mouth, and licking the lips), these gestures serve a phonetic function and no other. Hence they are marked by their very nature as exclusively phonetic in character; there is no need to make them so by connecting them to linguistic entities at the cognitive level. As part of the larger specialization for language, they are, moreover, uniquely appropriate to other linguistic processes. Thus, the syntactic component is adapted to operate on the specifically phonetic representations of the gestures, not on representations of an auditory kind. Indeed, it is precisely this harmony among the several components of the language specialization that makes the epithet 'vertical' particularly apposite for the view I am here promoting.

Of course, the gestures constitute only the phonetic structures that the perceptual process extracts from the speech signal. Such aspects of the percept as, for example, those that contribute to the perceived quality of the speaker's voice are not part of the phonetic system. Indeed, these are presumably auditory in the ordinary sense, except as they may figure in speaker identification, for which there may be a separate specialization.

It is not only the gestures themselves that are specifically phonetic, but also, presumably, their control and coordination. Surely, there is in speech production, as in all kinds of action, the need to cope with the many-to-one relations between means and ends, and also to reduce degrees of freedom to manageable proportions. In these respects, then, the management of speech and nonspeech movements should be subject to the same principles. But there is, in addition, something that seems specific to speech: the grossly overlapped and smoothly merged movements at the periphery are controlled by, and must preserve information about, relatively long strings of the invariant, categorical units that speech cares about but other motor systems do not. And, certainly, it is relevant to the claim about a specialized mode of production that speech, in the very narrowest sense, is species-specific: given every incentive and opportunity to learn, chimpanzees are nevertheless unable to manage the production of simple CVC syllables. (The fact that the dimensions of their vocal tracts presumably do not allow a full repertory of vowels should not, in itself, preclude the articulation of syllables with whatever vowels their anatomy permits.)

As for the evolution of the phonetic gestures, I should think an important selection factor was not so much the ease with which they could be articulated, or the auditory salience of the resulting sound, but rather how well they lent themselves to being coarticulated. For it is coarticulation that, as I will have occasion to say later, makes phonological communication possible. But it is also this very coarticulation that, as we saw earlier, frustrates the attempt to find the phonetic invariant in the acoustic signal or in the peripheral movements of the articulators. Still, such motor invariants must exist, not just for the aspect of the Motor Theory that explains how phonetic segments are perceived, but for just any theory that presumes to explain how they are produced; after all, speech does transmit strings of invariant phonological structures, so these invariants must be represented in some way and at some place in the production process. But how are they to be characterized, and where are they to be found? Having accepted the evidence that they are not in the peripheral movements, as the Early Motor Theory assumed, Mattingly and I proposed in the Revised Motor Theory that attention be paid instead to the configurations of the vocal tract as they change over time and are compared with other configurations produced by the same gesture in different contexts. As for the invariant causes of these configurations, they are presumably to be found in the more remote motor entities, something like Turvey's coordinative structures, that control the various articulator movements so as to accomplish the appropriate vocal-tract configurations. It is, I now think, structures of this kind that represent the phonetic primitives, providing the physiological basis for the phonetic intentions of the speaker and the phonetic percepts of the listener. Unfortunately for the Motor Theory, we do not yet know the exact characteristics of these motor invariants, nor can we adequately describe the processes by which they control the movements of the articulators. My colleagues, including especially Catherine Browman, Louis Goldstein, Elliot Saltzman, and , are currently in search of those invariants and processes, and I am confident that they will, in time, succeed in finding them. Meanwhile, I will, for all the reasons set forth in this paper, remain confident that motor invariants do exist, and that they are the ultimate constituents of speech, as produced and as perceived.

According to the Revised Motor Theory, then, there is a phonetic module, part of the larger specialization for language, that is biologically adapted for two complementary processes: one controls the overlapping and merging of the gestures that constitute the phonetic primitives; the other processes the resulting acoustic signal so as to recover, in perception, those same primitives. On this view, one sees a distinctly linguistic way of doing things down among the nuts and bolts of action and perception, for it is there, not in the remote recesses of the cognitive machinery, that the specifically linguistic constituents make their first appearance. Thus, the Revised Motor Theory is very different from its early ancestor; the two remain as one only in supposing that the object of perception is the invariant gesture, not the context-sensitive sound.

How is the requirement for parity met?

In all communication, whether linguistic or not, sender and receiver must be bound by a common understanding about what counts: what counts for the sender must count for the receiver, else communication does not occur. In the case of speech, speaker and listener must perceive, or otherwise know, that, out of all possible signals, only a particular few have linguistic significance. Moreover, the processes of production and perception must somehow be linked; their representations must, at some point, be the same. Though basic, this requirement tends to pass unnoticed by those who look at speech horizontally, and, especially, by those whose preoccupation with perception leaves production out of account. However, vertical Motor Theorists like Ignatius Mattingly and me are bound to think the requirement important, so we have given it a name, 'parity', and asked how, in the case of speech communication, it was established and how maintained.

Horizontalists must, I think, find the question very hard. For if, as their view would have it, the acts of the speaker are generally motor and the percepts of the listener generally auditory, then act and percept have in common only that neither has anything to do with language. The horizontalist is therefore required to assume that these representations are linked to language and to each other only insofar as speaker and listener have somehow selected them for linguistic use from the indefinitely large set of similarly nonphonetic alternatives, and then connected them at a cognitive level to the same phonetic name or other linguistic entity. Altogether, a roundabout way for a natural mode of communication to work.

For the verticalists, on the other hand, the question is easy. On their view, it was specifically phonetic gestures that evolved, together with the specialized processes for producing and perceiving them, and it is just these gestures that provide the common currency with which speaker and listener conduct their linguistic business. Parity is thus guaranteed, having been built by evolution into the very bones of the system; there is no need to arrive at agreements about which signals are relevant and how they are to be connected to units of the language.

How is speech related to other natural modes of communication?

I noted earlier that human beings communicate phonologically but other creatures do not, and, further, that this difference is important, because it determines whether the inventory of 'words' is open or closed. Now, in the interest of parsimony, I ask whether either view of speech allows that there is, nevertheless, something common to two modes of communication that are equally natural.

On the horizontal view, the two modes must be seen as different in every important respect. Nonhuman animals, the horizontalists would presumably agree, communicate as they do because of their underlying specializations for producing and perceiving the appropriate signals. I doubt that anyone would seriously claim that these require to be translated before they can take on communicative significance. For the human, however, the horizontal position, as we have seen, is that the specialization, if any, is not at the level of the signal, but only at some cognitive remove. I find it hard to imagine what might have been gained for human beings by this evolutionary leap to an exclusively cognitive representation of the communicative elements, except, perhaps, the smug satisfaction they might take in believing that they communicate phonologically, and the nonhuman animals do not, because they have an intellectual power the other creatures lack, and that even in the most basic aspects of communication they can count themselves broad generalists, while the others must be seen as narrow specialists.

On the vertical view, human and nonhuman communication alike depend on a specialization at the level of the signal. Of course, these specializations differ one from another, as do the vehicles (acoustic, optical, chemical, or electrical) that they use. And, surely, the phonetic specialization differs from all the others in a way that is, as we know, critical to the openness or generativity of language. Still, the vertical view permits us to see that phonetic communication is not completely off the biological scale, since it is, like the other natural forms of communication, a specialization all the way down to its roots.

What are the (special) requirements of phonological communication, and how are they met?

If phonology is to use the combinatorial principle, and so serve its critically important function of building a large and open vocabulary out of a small number of elements, then it must meet at least two requirements. The more obvious is that the phonological segments be commutable, which is to say discrete, invariant, and categorical. The other requirement, which is only slightly less obvious, concerns rate. For if all utterances are to be formed by stringing together an exiguous set of commutable elements, then, inevitably, the strings must run to great lengths. There is, therefore, a high premium on rapid communication of the elements, not only in the interest of getting the job done in good time, but also in order to make things reasonably easy, or even feasible, for those other processes that have got to organize the phonetic segments into words and sentences.

Consider how these requirements would be met if, as the horizontal view would have it, the elements were sounds and the auditory percepts they evoke. If it were these that had to be commutable, then surely it would have been possible to make them so, but only at the expense of rate. For sounds and the corresponding auditory percepts to be discrete, invariant, and categorical would require that the segmentation be apparent at the surface of the signal and in the most peripheral aspects of the articulation. How else, on the horizontal view, could commutability be achieved, except as each discrete sound and associated auditory percept were produced by a correspondingly discrete articulatory maneuver? Of course, the sounds and the percepts might be joined, as are the segments of cursive writing, and that might speed things up a bit, but, exactly as in cursive writing, the segmentation would nevertheless have to be patent. The consequence would be that, to say a monosyllabic word like 'bag', the speaker would have to articulate the segments discretely, and that would produce, not the monosyllable 'bag', but the trisyllable [bə] [æ] [gə]. To articulate the syllable that way is not to speak, but to spell, and spelling would be an impossibly slow and tedious way to communicate language.

One might imagine that if production had been the only problem in the matter of rate, nature might have solved it by abandoning the vocal tract, providing her human creatures, instead, with acoustic devices specifically adapted to producing rapid-fire sequences of sound. That would have taken care of the production problem, while, at the same time, defeating the ear. The problem is that, at normal rates, speech produces from eight to ten segments per second, and, for short stretches, at least double that number. But if each of those were a unit sound then rates that high would strain the temporal resolving power of the ear, and, of particular importance to phonetic communication, also exceed its ability to perceive the order in which the segments had been laid down. Indeed, the problem would be exactly the one we encountered when, in the early work on reading machines, we presented acoustic alphabets at high rates.

According to the vertical view, nature solved the rate problem by avoiding the acoustic-auditory (horizontal) strategy that would have caused it. What evolved as the phonetic constituents were the special gestures I spoke of earlier. These serve well as the elements of language, because, if properly chosen and properly controlled, they can be coarticulated, so strings of them can be produced at high rates. In any case, all speakers of all languages do, in fact, coarticulate, and it is only by this means that they are able to communicate phonologically as rapidly as they do.

Coarticulation had happy consequences for perception, too. For coarticulation folds information about several successive segments into the same stretch of sound, thereby achieving a parallel transmission of information that considerably relaxes the constraint imposed by the temporal resolving properties of the ear. But this gain came at the price of a relation between acoustic signal and phonetic message that is complex in a specifically phonetic way. One such complication is the context-conditioned variability in the acoustic signal that I identified as the primary motivation for the Early Motor Theory, presenting it then as if it were an obstacle that the processes postulated by the theory had to overcome. Now, on the Revised
Motor Theory, we can see that same variability as a blessing, a rich source of information about phonetic structure, and, especially, about order. Consider, again, the difficulty the auditory system has in perceiving accurately the ordering of discrete and brief sounds that are presented sequentially. Coarticulation effectively gets around that difficulty by permitting the listener to apprehend order in quite another way. For, given coarticulation, the production of any single segment affects the acoustic realization of neighboring segments, thereby providing, in the context-conditioned variation that results, accurate information about which gesture came first, which second, and so on. Hence, the order of the segments is conveyed largely by the shape of the acoustic signal, not by the way pieces of sound are sequenced in it. For example, in otherwise comparable consonant-vowel and vowel-consonant syllables, the listener is not likely to mistake the order, however brief the syllables, because the transitions for prevocalic and postvocalic consonants are mirror images. But these will have the proper perceptual consequence only if the phonetic system is specialized to represent the strongly contrasting acoustic signals, not as similarly contrasting auditory percepts, but as the opening and closing phases of the same phonetic gesture. Accordingly, order is given for free by processes that are specialized to deal with the acoustic consequences of coarticulated phonetic gestures. Thus, we see that a critical function of the phonetic module is not so much to take advantage of the properties of the general motor and auditory systems (a matter that was briefly examined earlier) as it is to find a way around their limitations.

Could the assignment of the stimulus information to phonetic categories plausibly be auditory?

Many of the empirically based arguments about the two theories of speech, including, especially, most of those that have been advanced against the vertical position, come from experiments, much like those described in the first section of this essay, that were designed very simply to identify the information that leads to perception of phonetic segments. The results of these experiments have proved to be reliable, so there is quite general agreement about the nature of the relevant information. Disagreement arises only, but nonetheless fundamentally, about the nature of the event that the information is informing about. Is it the sound, as a proper auditory (and horizontal) view would have it, or the articulatory gesture, which is the choice of the vertically oriented Motor Theory?

The multiplicity of acoustic-phonetic boundaries and cues. Research of the kind just referred to has succeeded in isolating many acoustic variables important to perception of the various phonetic segments, and in finding for each the location of the boundary that separates the one segment from some alternative, for example, [ba] from [da]. The horizontalists take satisfaction in further experiments on some of these boundaries in which it has been found that they are exhibited by nonhuman animals, or by human observers when presented with nonspeech analogs of speech, for these findings are, of course, consistent with the assumption that the boundaries are auditory in nature. In response, the verticalists point to experiments in which it has been found that the boundaries differ between human and nonhuman subjects, and, in humans, between speech and nonspeech analogs, arguing, in their turn, that these findings support the view that the boundaries are specifically phonetic. Indeed, for some parties to the debate it has been in the interpretation of these boundaries that the difference between the two views has come into sharpest focus. The issue therefore deserves to be further ventilated.

It is now widely accepted that the location of the acoustic-phonetic boundary on every relevant cue dimension varies greatly as a function of phonetic context, position in the syllable, and vocal-tract dimensions. It is now also known, and accepted, that some vary with differences in language type, linguistic stress, and rate of articulation. For at least one of these (rate of articulation) the variation is possibly continuous. From all this it follows that the number of acoustic-phonetic boundaries is indefinitely large, far too large, surely, to make reasonable the assumption that they are properties of the auditory system. How would these uncountably many boundaries have been selected for as that system evolved? Surely, not just against the possibility that language would come along and find them useful. Indeed, as auditory properties, they would presumably be dysfunctional, since they are perceptual discontinuities of a sort, and would, therefore, cause continuously varying acoustic events to be perceived discontinuously, thereby frustrating veridical perception.

The matter is the worse confounded for the auditory theory when proper account is taken of the fact that, for every phonetic segment, there are multiple cues, and that phonetic perception uses all of them. For if, in accounting for the perception of certain consonantal segments, we attribute an auditory basis to all the context-variable boundaries on, say, the second-formant transitions (already a dubious assumption, as we've seen), then what do we do about the third-formant transitions and the bursts (or fricative noises)? These various information-bearing aspects of the signal are not independently controllable in speech production, so one must wonder about the probability that a gesture so managed as to have just the 'right' acoustic consequences for the second-formant transition would happen, also, to have just the right consequences in all cases for the other, acoustically very different cues. On its face, that probability would seem to be vanishingly small. Nor does it help the horizontal position to suggest, as some have, that the acoustic-phonetic boundaries exhibited by nonhuman animals served merely as the auditory starting points (the protoboundaries, as it were) to which all the others were somehow added. For this is to suppose that, out of the many conditions known to affect

them. The auditory system is then free to respond to all other acoustic stimuli in a way that does not inappropriately conform their perception to a phonetic mold.

Integrating cues that are acoustically heterogeneous, widely distributed in time, and shared by disparate segments. Having already noted that there are typically many cues for a phonetic distinction, I take note of the well-known fact that these many cues are, more often than not, acoustically heterogeneous. Yet, in the several cases so far investigated, they can, within limits, be traded, one for the other, without any change in the immediate percept. That is, with other cues neutralized, the phonetic distinction can be produced by any one of the acoustically heterogeneous cues, with perceptual results that are not discriminably different. Since these perceptual equivalences presumably exist among all the cues for each such contrast, the number of equivalences must be very great, indeed. But how are these many equivalences to be explained? From an acoustic or auditory-processing standpoint, what do such acoustically diverse, but perceptually equivalent, cues have in common?
variable boundaries on, say, the second-formant Integrating cues that are acoustically heteroge- transitionsalready a dubious assumption, as neous, widely distributed in time, and shared by we've seenthen what do we do about the third- disparate segments. Having already noted that formant transitions and the bursts (or fricative there are typically many cues for a phonetic dis- noises)? These various information-bearing tinction, I take note of the well-known fact that aspects of the signal are not independently these many cues are, more often than not, acousti- controllable in speech production, so one must cally heterogeneous. Yet, in the several cases so wonder about the probability that a gesture so far investigated, they can, within limits, be managed as to have just the 'right' acoustic traded, one for the other, without any change in consequences for the second-formant transition the immediate percept. That is, with other cues would happen, also, to have just the right neutralized, the phonetic distinction can be pro- consequences inallcases for the other, duced by any one of the acoustically heteroge- acoustically very different cues. On its face, that neous cues, with perceptual results that are not probability would seem to be vanishingly small. discriminably different. Since these perceptual Nor does it help the horizontal position to equivalences presumably exist among all the cues suggest, as some have, that the acoustic-phonetic for each such contrast, the number of equivalences boundaries exhibited by nonhuman animals must be very great, indeed. But how are these served merely as the auditory starting pointsthe many equivalences to be explained? From an protoboundaries, as it wereto which all the acoustic or auditory-processing standpoint, what others were somehow added. For this is to suppose do such acoustically diverse, but perceptually that, out of the many conditions known to affect equivalent, cues have in common? 
Or, in the ab- the acoustic manifestation of each phonetic sence of that commonality, how plausible is it to segment, some one is canonical. But is it plausible suppose that they might nevertheless have to suppose that there really are canonical forms evolved in the auditory system in connection with for vocal-tract size, rate of articulation, condition its nonspeech functions? In that regard, one asks of stress. language type, and all the other the same questions I raised about the claim con- conditions that affect the speech signal? And what cerning the auditory basis of the boundary posi- of the further implications? What, for example, is tions. What general auditory function would have the status of the countless other boundaries that selected for these equivalences? Would they not, in had then to be added in order to accommodate the almost every case, be dysfunctional, since they noncanonical forms? Did they infect the general would make very different acoustic events sound auditory system, or were they set apart in a the same? And, finally, what is the probability distinct phonetic mode? If the former, then why that speakers could so govern their articulatory does everything not sound very much like speech? gestures as to produce for each particular phonetic If the latter, then are we to suppose that the segment exactly the right combination of percep- listener shifts back and forth between auditory tually equivalent cues? and phonetic modes depending on whether or not The Revised Motor Theory has no difficulty it is the canonical form that is to be perceived? dealing with the foregoing facts about stimulus None of this is to say that natural boundaries or equivalence. It notes simply that the acoustically discontinuities do not exist in the auditory heterogeneous cues have in common that they are systemI believe there is evidence that they do products of the same phonetically significant but rather to argue that they are irrelevant to gesture. 
Since it is the gesture that is perceived, phonetic perception. the perceptual equivalence necessarily follows. All of the tbregoing considerations are simply Also relevant to the argument is the fact that grist for the Motor Theory mill. For on that phonetic perception integrates into a coherent theory, the phonetic module uses the speech phonetic segment a numerous variety of cues that stimulus as information about the gestures, which are, because of coarticulation, widely dispersed are the true and immediate objects of phonetic through the signal and used simultaneously to perception, and so finds the acoustic-phonetic provide information for other segments in the boundaries where the articulatory apparatus string, including not only their position, as earlier happened, for its own very good reasons, to put noted, but also their phonetic identity. The sim-

3 4 Some Assumptions about Speech and How They Changed plest examples of such dispersal were among the cal stimulus, or that an au&tory percept and a vi- very earliest findings of our speech research, as sual percept become indistinguishable as a conse- described in the first half of this essay. Since then, quence of frequent association. Indeed, we are re- the examples have been multiplied to include quired to believe, even more implausibly, that the cases in which the spread of the cues for a single seemingly auditory percept elicited by the optical segment is found to be much broader than origi- stimulus is so strong as to prevail over the normal nally supposed, extending, in some utterances, (anddifferent)auditoryresponsetoa from one end of a complex syllable to the other; suprathreshold acoustic stimulus that is presented yet, even in these cases, the phonetic system inte- concurrently. If there were such drastic perceptual grates the information appropriately for each of consequences of association in the general case, the constituent segments. I have great difficulty then the world would sometimes be misrepre- imagining what function such integration would sented to observers as they gained experience with serve in a system that is adapted to the perception percepts in different modalities that happened of- of nonspeech events. Indeed, I should suppose that ten to be contiguous in time. Fortunately for our it would sometimes distort the representation of relation to the world, there is no reason to suppose events that were discrete and closely sequenced. that such modality shifts, and the consequent dis- Again, the Revised Motor Theory has a ready, if tortions of reality, ever occur. 
As for the implica- by now expected, explanation: the widely tions of the horizontal account for the McGurk ef- dispersed cues are brought together, as it were, fect specifically, we should expect that the phe- into a single and perceptually coherent segment nomenon would be obtained between the sounds of because they are, again, the common products of speech and print, given the vast experience that the relevant articulatory gesture. literate adults have had in associating the one Integrating acoustic and optical information. It with the other. Yet the effect does not occur with is by now well known that, as Harry McGurk print. It also weighs against the same account demonstrated some years ago, observers form that prelinguistic infants have been shown to be phonetic percepts under conditions in which some sensitive to the correspondence between speech of the information is acoustic and some optical, sounds and seen articulatory movements, which provided the optical information is about is, of course, the basis of the McGurk effect. articulatory gestures. Thus, when observers are On the vertical view, the McGurk phenomenon presented with acoustic [ba], but see a face saying is exactly what one would expect, since the [de], they will, under many conditions of intensity acoustic and optical stimuli are providing and clarity of the signal, perceive [da], having information about the same phonetic gesture, and taken the consonant from what they saw and the it is, as I have said so relentlessly, precisely the vowel from what they heard. Though the gesture that is perceived. perceptual effect is normally quite compelling, the result istypically experienced as slightly Just how 'special' is speech perception? imperfect by comparison with the normal case in The claim that speech percep-tion is special has which acoustic and optical stimuli are in been criticized most broadly, perhaps, on the agreement. 
But the observers can't tell what the ground that it is manifestly unparsimonious and nature of the imperfection is. That is, they can't lacking in generality. Unparsimonious, because a say that it is to be attributed to the fact that they "special" mechanism is necessarily an additional heard one thing but saw another. Left standing, mechanism; and lacking in generality, because therefore, is the conclusion that the McGurk effect that which is special is, by definition, not general. provides strong evidence for the equivalence in As for parsimony, I have already suggested that phonetic percepi ion of two very different kinds of the shoe is on the other foot. For the price of physical information, acoustic and optical. denying a distinctly phonetic mode at the level of For those who believe that speech perception is perception is having to make the still less auditory, the explanation of the McGurk effect parsimonious assumption that such a mode begins must be that the unitary percept is the result of a at a higher cognitive level, or wherever it is that learned association between hearing a phonetic the auditory percepts of the horizontal view are structure and seeing it produced. As an explana- converted to the postperceptual phonetic shapes tion of the phenomenon, however, such an account they must assume if they are to serve as the seems manifestly improbable, since it requires us vehicles of linguistic communication. to believe, contrary to all experience, that a con- But generality is another matter. Here, the hor- vincing auditory percept can be elicited by an opti- izontal view might appear to have the advantage,

5 28 Liberman since it sees the perception of speech as a wholly on the basis of which the presence of phonetic unexceptional example of the workings of an audi- structure can, under all circumstances, be reliably tory modality that deals with speech just as it apprehended. But is it not so with syntax, too? If a does with all the other sounds to which the ear is perceiving system is to determine whether or not sensitive. In so doing, however, this view sacrifices a string of words is a sentence, it surely cannot what is, I think, a more important kind of general- rely on some list of surface properties; rather it ity, since it makes speech perception a mere ad- must determine if the string can be parsedthat junct to language, having a connection to it no less is, if a grammatical derivation can be found. Thus, arbitrary than that which characterizes the rela- the specializations for phonetic and syntactic per- tion of language to the visually perceived shapes ception have in common that their products are of an alphabet. The vertical view, on the other deeply linguistic, and are arrived at by procedures hand, shows the connection to language to be truly that are similarly synthetic. organic, permitting us to see speech perception as As for specializations that are adapted for special in much the same way that other compo- functions other than communication, Mattingly nents of language perception are special. I have and I have claimed for speech perception that, as I already pointed out in this connection that the earlier hinted, it bears significant resemblances to output of the specialized speech module is a repre- a number of biologically coherent adaptations. sentation that is, by its nature, specifically appro- Being specialized for different functions, each of priate for further processing by the syntactic com- these is necessarily different from every other one, ponent. 
Now I would add that the processes of but they nevertheless have certain properties in phonetic and syntactic perception have in common common. Thus, they all qualify as modules, in that the distinctly linguistic representations they Fodor's sense of that term, and therefore share produce are not given directly by the superficial something like the properties he assigns to such properties of the signal. Consider, in this connec- devices. I choose not to review those here, but tion, how a perceiving system might go about de- rather to identify, though only briefly, several ciding whether or not an acoustic signal contains other common properties that such modules, phonetic information. Though there are, to be including the phonetic, seem to have. sure, certain general acoustic characteristics of To see what some of those properties might be, natural speech, experience with synthetic speech Mattingly and I have found it useful to distinguish has shown that none of them necessarily signals broadly between two classes of modules. One com- the presence of phonetic structure. Having already prises, in the auditory modality, the specializa- noted that this was one of the theoretically inter- tions for pitch, loudness, timbre, and the like. esting conclusions of the earliest work with the These underlie the common auditory dimensions the highly schematized drawings used on the that, in their various combinations, form the in- Pattern Playback, I add now that more convincing definitely numerous percepts by which people evidence of the same kind has come from later re- identify a correspondingly numerous variety of search that carried the schematization of the syn- acoustic events, including, of course, many that thetic patterns to an extreme by reducing them to are produced by artifacts of one sort or another, three sine waves that merely follow the center and that are, therefore, less than perfectly natu- frequencies of the first three form ants. These bare ral. 
We have thought it fitting to call this class bones have nevertheless proved sufficient to evoke 'open'. It is appropriate to the all-purpose charac- phonetic percepts, even though they have no ter of these modules that its representations be common fundamental, no common fate, nor, in- commensurate with the relevant dimensions of deed, any other kind of acoustic commonality that the physical stimulus. Thus, pitch maps onto fre- might provide auditory coherence and mark the quency, loudness onto amplitude, and timbre onto sinusoids acoustically as speech. What the sinu- spectral shape; hence, we have called these repre- soids do offer the listenerindeed, all they offer sentations 'homomorphic'. It is also appropriate to is information about the trajectories of the for- their all-purpose function that these homomorphic mants, which is to say movements of the articula- representations not be permanently changed as a tors. If those movements can be seen by the pho- result of long experience with some acoustic event. netic module as appropriate to linguistically sig- Otherwise, acquired skill in using the sound of au- nificant gestures, then the module, being properly tomobile engines for diagnostic purposes, for ex- engaged, integrates them into a coherent phonetic ample, would render the relevant modules mal- structure; otherwise, not. There are. then, no adapted for every one of the many other events for purely acoustic properties, no acoustic stigmata, which they must be used.

36 Some Assumptions about Speech and How They Changed 29

Members of the other classthe one that in- heteromorphic and homomorphicat the same cludes speechare more narrowly specialized for time? Given binocularly disparate stimuli, why particular acoustic events or stimulus relation- does the viewer not see double images in addition ships that are, as particular events or relation- to depth? Or, given two syllables [dal and [gal that ships, of particular biological importance to the are distinguished only by the direction of the animal. We have, therefore, called this class third-formant transition, why does the listener not 'closed'. It includes specializations like sound lo- hear, in addition to the discrete consonant and calization, stereopsis, and echolocation (in the bat) vowel, the continuously changing timbre that the that I mentioned earlier. Unlike the representa- most nearly equivalent nonspeech pattern would tions of the open class, those produced by the produce, and to which the two transitions would closed modules are incommensurate with the di- presumably make their distinctively different, but mensions of the stimulus; we have therefore called equally nonphonetic, contributions. them 'heteromorphic'. Thus, the sound-localizing Mattingly and I have proposed that the competi- module represents interaural differences of time tion between the modules is normally resolved in and intensity, not homomorphically as time or favor of the members of the closed class by virtue loudness, but heteromorphically as location; the of their ability to preempt the information that is module for stereopsis represents binocular dispar- ecologically appropriate to computing the hetero- ities, not homomorphically as double images, but mophic percept, and thus, in effect, to remove that heteromorphically as depth. The echo-locating information from the flow. 
As for what is ecologi- module of the bat presumably represents echo cally appropriate, the closed modules have an time, not homomorphically as an echoing (bat) cry, elasticity that permits them to take a rather broad but heteromorphically as distance. In a similar view. Thus, the module for stereopsis represents way, the phonetic module respresents the contin- depth for binocular disparities considerably uously changing formants, not homomorphically greater than would ever be produced by even the as smoothly varying timbres, but heteromorphi- most widely separated eyes. The phonetic module cally as a sequence of discrete and categorical will, in its turn, tolerate rather large departures phonetic segments. from what is ecologically plausible. Imagine, for Unlike the open modules, those of the closed example, two synthetic syllables, [da] and [gal, class depend on very particular kinds of environ- distinguished only by the direction of a third-for- mental stimulation, not only for their develop- mant transition that. as I indicated earlier, ment, but for their proper calibration. Moreover, sounds in isolation like a nonspeech chirp. If the they remain plasticthat is, open to calibration syllables are now divided into two partsone, the for some considerable time. Consider, in this con- critical transition cue; the other, the remainder of nection, how the sound-localizing module must be the pattern (the 'base') that, by itself, is ambigu- continuously recalibrated for its response to inter- ously [da] or [galthen, the phonetic system will aural differences as the distance between the ears integrate them into a coherent [da] or [gal syllable increases with the growth of the child's head. The even when the two parts have, by various means, similarly plastic phonetic module is calibrated been made to come from perceptibly different over a period of years by the particular phonetic sources. (Mattingly has a different interpretation environment to which it is exposed. 
Significantly, of this particular phenomenon, so I must take full the calibration of these modules in no way affects responsibility for the one I offer here,) In what is, any of the specializations of the open class, even perhaps, the most dramatic demonstration of this though their representations figure importantly in kind of integration, the isolated transition cue is the final percept, as in the paraphonetic aspects of presented at a location opposite one ear, the am- speech, for example, This is to say that the closed biguous base at a location opposite the other. modules must learn by experience, as the phonetic Under these ecologically implausible circum- module most surely does, but the learning is of an stances, the listener nevertheless perceives a per- entirelyprecognitivesort,requiring only fectly coherent [dal or [gal, and, more to the point, neurological normality and exposure (at the right confidently localizes it to the side where only the time) to the particular kinds of stimuli in which ambiguous base was presented. (This happens, the module is exclusively interested. indeed, even when both the base and the critical As implied above, the two classes have their own transition cue are made of frequency-modulated characteristically different ways of representing sinusoids, which is a most severe test, since the the same dimension of the stimulus. Why, then, differently located sinusoids would, as I pointed does the listener not get both representations out earlier, seem to lack any kind of acoustic co-

3 7 30 Liberman

herence that might cause them to be integrated on How do speaking and listening differ from some auditory basis.) But there are limits tothis writing and reading? elasticity, and seemingly similar effects occur in phonetic perception and in stereopsis when those Among the most obvious, and obviously conse- limits are exceeded. Thus, in the speech case, as quential, facts about language is the immense dif- the stimulus is made to provide progressively ference in biological status, hence naturalness, be- more evidence for separate sourcesforexample, tween speech, on the one hand, and writ- by increasing the relative intensity of the isolated ing/reading, on the other. The phonetic units of transition on the one sidelisteners begin to hear speech are the vehicles of every language on both the integrated syllable and the nonspeech earth, and are commanded by every neurologically chirp. In other words, the heteromorphic and ho- normal human being. On the other hand, many, momorphic percepts are simultaneously repre- perhaps most, languages do not even have a writ- sented. In the speech case, this has been called ten form, and, among those that do, some compe- 'duplex perception'. Appropriate (and necessary) tent speakers find it all but impossible to master. tests for the claim that the duplex percepts are Having been thus reminded once again that representations of truly different types, and not speech is the biologically primary form of the simply a cognitive reinterpretation of a single phonological that typifies our species, we type, are in the demonstration that listeners can- readily appreciate that alphabetic writing is not not accurately identify the chirps as [da] and [gal, really the primary behavior itself, but only a fair and that they hear only the integrated syllable description of it. Since what is being described is and the chirp, not those and also the ambiguous species typical, alphabetic writing is a piece of base. 
It is also relevant to the claim that the dis- ethological science, in which case a writer/reader crimination functions obtained for the percepts on is fairly regarded as an accomplished ethologist. It the two sides of the percept are radically different, weighs heavily against the horizontal view, there- for that shows that the listeners cannot hear the fore, that, as I have already said and will say one in the other, even when, under the conditions again below, it cannot comprehend the difference of the discrimination procedure, they try. between speaking/listening and writing/reading, (Unfortunately, these tests have not been applied for in that respect it is like a theory of bird song to the cases of nonspeech auditory perception that that does not distinguish the behavior of the bird some have claimed to be duplex.) from that of the ethologist who describes it. In the case of stereopsis, the elasticity of the To see the problem created by the horizontal module is strained by progressively increasing the view, we need first to appreciate, once again, that binocular disparity. Beyond a certain point, the writing-reading did not evolve as part of the lan- viewer begins to perceive, not only heteromorpic guage faculty, so the relevant acts and percepts depth, but also homomorphic double images. This cannot be specifically linguistic. The important seems quite analogous to duplex perception, consequence, of course, is that they require to be though it has not been called that. made so, and, as I have said so many times, that In both speech and stereopsis, providing further can be done only by some kind of cognitive trans- evidence of ecological implausibility causes the lation. Now I would emphasize that it is primarily heteromorphic percept (phonetic structure or in respect of this requirement that writing and depth) to weaken as its homomorphic counterpart reading differ biologically from speech. 
Indeed, it (nonphonetic chirp or double images) strengthens, is precisely in the need to meet this requirement until, finally, the closed module fails utterly, and that writing-reading are intellectual achievements only the homomorphic percept of the open in a way that speech is not. But the horizontal modules is represented. Thus:ie information in view of speech does not permit us to see that es- the stimulus can seemingly be variously divided sential difference. Rather, it misleads us into the between the two kinds of representation, and, belief that the primary processes of the two modes since either gains at the expense of the other, it is of linguistic communication are equally general, as if there were a kind of conservation of hence equally nonphonetic. That being so, we information. must suppose that the relevant representations Putting the matter most generally, then, I are equally in need of a cognitive connection to the should say that speech is special, to be sure, but language, and so have the same status from a bio- neither more nor less so than many other logical point of view. We are, of course, permitted biologically coherent adaptations, including, of to see the obvious and superficial differences, but, course, language itself. for each one of those, the horizontal view would Some Assumptions about Speech and How Thew Changed 31 seem, paradoxically, to give the advantage to writ- such children do, indeed, not know how to break a ing-reading, leading us to expect that writing- word into its constituent phonemes. That finding reading, not speaking-listening, would be the eas- has now been replicated many times. Moreover, ier and more natural. 
For, surely, the printed characters offer a much better signal-to-noise ratio than the phonetically relevant parts of the speech sound; the fingers and the hand are vastly more versatile than the tongue; the eye provides far greater possibilities for the transmission of information than the ear; and, for all the vagaries of some spelling systems, the alphabetic characters bear a more nearly transparent relation to the phonological units of the language than the context-variable and elaborately overlapped cues of the acoustic signal.

The vertical view, on the other hand, is appropriately revealing. Given the phonetic module, speakers do not have to know how to spell a word in order to produce it. Indeed, they do not even have to know that it has a spelling. Speakers have only to access the word, however that is done; the module then spells it for them, automatically selecting and coordinating the appropriate gestures. Listeners are in similar case. To perceive the word, they need not puzzle out the complex and peculiarly phonetic relation between signal and the phonological message it conveys; they need only listen, again leaving all the hard work to the phonetic module. Being modular, these processes of production and perception are not available to conscious analysis, so the speakers and listeners cannot be aware of how they do what they do. Though the representations themselves are available to consciousness (indeed, if they were not, use of an alphabetic script would be impossible), they are already phonological in nature, hence appropriate for further linguistic processing, so the reader need not even notice them, as he would have to if, as in the case of alphabetic characters, some arbitrary connection to language had to be formed. Hence, the processes of speech, whether in production or perception, are not calculated to put the speaker's attention on the phonological units that those processes are specialized to manage.

On the basis of considerations very like those, Isabelle Liberman, Donald Shankweiler, and Ignatius Mattingly saw, more than twenty years ago, that, while awareness of phonological structure is obviously necessary for anyone who would make proper use of an alphabetic script, such awareness would not normally be a consequence of having learned to speak. Isabelle and Donald proceeded, then, to test this hypothesis with preliterate children, finding that […] researchers at the Laboratories and elsewhere have found that the degree to which would-be readers are phonologically aware may be the best single predictor of their success in learning to read, and that training in phonological awareness has generally happy consequences for the progress in reading of those children who receive it. There is also reason to believe that, other things equal, an important cause of reading disability may be a weakness in the phonetic module. That weakness would make the phonological representations less clear than they would otherwise be, hence that much harder to bring to awareness. Indeed, there is at least a little evidence that reading-disabled children do have, in addition to their problems with phonological awareness, some of the other characteristics that a weak phonetic module might be expected to produce. Thus, by comparison with normals, they seem to be poorer in several kinds of phonologically related performances: short-term memory for phonological structures, but not for items of a nonlinguistic kind; perception of speech, but not of nonspeech sounds, in noise; naming of objects (that is, retrieving the appropriate phonological structures) even when they know what the objects are and what they do; and, finally, production of tongue-twisters.

The vertical view was not developed to explain the writing-reading process or the ills that so frequently attend it, but rather for all the reasons given in earlier sections of this paper. That it nevertheless offers a plausible account, while the horizontal view does not, is surely to be counted strongly in its favor.

Are there acoustic substitutes for speech?

When Frank Cooper and I set out to build a reading machine for the blind, we accepted that the answer to that question was not just 'yes', but 'yes, of course'. As I see it now, the reason for our blithe confidence was that, being unable to imagine an alternative, we could only think in what I have here described as horizontal terms. We therefore thought it obvious that speech sounds evoked normal auditory percepts, and that these were then named in honor of the various consonants and vowels so they could be used for linguistic purposes.

On the basis of our early experience with the nonspeech sounds of our reading machines, we learned the hard way that things were different from what we had thought. But it was not until we were well into the research on speech that we began to see just how different and why. Now, drawing on all that research, we would say that the answer to the question about acoustic substitutes is, 'no', or, in the more emphatic modern form, 'no way'. The sounds produced by a reading machine for the blind will serve as well as speech only if they are of a kind that might have been made by the organs of articulation as they figure in gestures of a specifically phonetic sort. If the sounds meet that requirement, then they will engage the specialization for speech, and so qualify as proper vehicles for linguistic structures; otherwise, they will encounter all the difficulties that fatally afflicted the nonspeech sounds we worked with so many years ago.

FOOTNOTE

*This essay will appear as the introduction to a collection of papers to be published by MIT Press.

Haskins Laboratories Status Report on Speech Research 1993, SR-113, 33-40

On the Intonation of Sinusoidal Sentences: Contour and Pitch Height*

Robert E. Remez† and Philip E. Rubin

A sinusoidal replica of a sentence evokes a clear impression of intonation despite the absence of the primary acoustic correlate of intonation, the fundamental frequency. Our previous studies employed a test of differential similarity to determine that the tone analog of the first formant is a probable acoustic correlate of sinusoidal sentence intonation. Though the typical acoustic and perceptual effects of the fundamental frequency and the first formant differ greatly, our finding was anticipated by reports that harmonics of the fundamental within the dominance region provide the basis for impressions of pitch more generally. The frequency extent of the dominance region roughly matches the range of variability typical of the first formant. Here, we report two additional tests with sinusoidal replicas to identify the relevant physical attributes of the first formant analog that figure in the perception of intonation. These experiments determined (1) that listeners represent sinusoidal intonation as a pattern of relative pitch changes correlated with the frequency of the tonal replica of the first formant, and (2) that sinusoidal sentence intonation is probably a close match to the pitch height of the first formant tone. These findings show that some aspects of auditory pitch perception apply to the perception of intonation; and that impressions of pitch of a multicomponent nonharmonic signal can be derived from the component within the dominance region.

[Acknowledgment footnote: The authors gladly acknowledge the advice and editorial care of Stefanie Berns and Jennifer Pardo, and the assiduous scholarship and criticism of Eva Blumenthal. This research was supported by grants from NIDCD (00308) to R. E. Remez, and from NICHHD (01994) to Haskins Laboratories.]

When the pattern of formant frequency variation of a natural utterance is imparted to several time-varying sinusoids, there are two obvious perceptual consequences. First, the phonetic content of the original natural utterance is preserved, and listeners understand the sinusoidal voice to be saying the same sentence as the natural one on which the sinusoidal sentence is modeled (Remez, Rubin, Pisoni, & Carrell, 1981). Second, the quality of a sinusoidal voice is unnatural both in its timbre and its intonation (Remez et al., 1981; Remez, Rubin, Nygaard, & Howell, 1987; Remez & Rubin, in press). Our studies have attributed the intelligibility of sinusoidal sentences to the availability of time-varying phonetic information which is independent of the specific acoustic elements composing the signal (Remez & Rubin, 1983, 1990). Correspondingly, we have attributed the anomalous voice qualities to the absence of harmonic structure and broadband resonances from the short-term spectrum, and to the peculiar intonation of sinewave sentences (Remez et al., 1981; Remez & Rubin, 1984; in press). Unlike natural and synthetic speech, sinewave replicas of utterances do not present the listener with a fundamental frequency common to their tonal components. In the absence of this typical correlate of intonation, our studies revealed that the tonal analog of the first formant is a probable source of the impression of the odd sentence melody accompanying the phonetic perception of replicated utterances.

The studies of intonation that are reported here address two issues about the method of our earlier research (Remez & Rubin, 1984), producing the evidence that now permits a clear interpretation. These new tests show that impressions of sinusoidal intonation are approximate to the pattern of frequency change of the tone analog of the first

formant, and that the intonation of the sinusoidal sentence appears to approximate the specific pitch height of this crucial tone component.

Sinusoidal intonation

In our initial studies of the basis for the impression of intonation in sinewave sentences (Remez & Rubin, 1984), we used a matching task, on each trial of which the subject listened to a sinusoidal sentence followed by two single-tone patterns, and chose from the pair of single tones the better match to the intonation of the sinewave sentence. By examining the likeness judgments across a set of single-tone candidates, we were able to identify a best match, good matches, and poor matches to the intonation of a particular sentence. We generally found that the tone imitating the first formant was preferred to all other alternatives.

First, listeners consistently chose the tone replicating the first formant from a set of tonal candidates that also included the second and the third formant analogs of the sentence replica. The set of alternatives also contained a tone reproducing the greatest common divisor of the three concurrent tones of the sentence pattern, and a tone which presented a linguistically and acoustically plausible fundamental frequency contour for the sentence that we used. The expression of any clear preference among these alternatives suggested that subjects understood the instructions and were capable of the task that we set for them. Second, we found that the preference for the first formant analog did not depend on the fact that it had the greatest energy among the constituents of the sentence, for the first formant tone was the choice for matching the intonation regardless of the ordinal amplitude relations that we imposed among the tones reproducing the formant centers. Here, the alternatives in the matching set again were the tonal components of the sentence replica. Third, when listeners were asked to identify the intonation of a sentence pattern that included formant analogs with an additional tonal component below the frequency of Tone 1 that followed the natural fundamental frequency values of the utterance from which the tonal replica was derived, they did not select this F0-analog as the match to intonation, but instead chose the first formant analog again. Here, the alternatives in the matching set were the tonal components of the sentence replica and the tone reproducing F0 variation. Last, listeners expressed no consistent experience of intonation: performance on the intonation matching task did not differ from chance when they were requested to identify the intonation of a sinusoidal sentence replica that lacked the component analogous to the first formant.

Overall, these results suggested that the intonation of a sinusoidal replica of a sentence is correlated with attributes of the analog of the first formant. The selection of the first formant tone (Tone 1) as the match to intonation persisted even when that tone had neither the greatest power among the sinusoidal components of the sentence nor the lowest frequency. In that case, it is not surprising that the intonation of sinewave sentences seems odd. The pattern of first formant frequency variation is better correlated with the opening and closing of the jaw in cycles of syllable production than with the patterns of rise and fall of intonation, which are correlated with the polysyllabic breath group. But, as evidence for a more specific claim, that sinusoidal sentence intonation is based on the frequency variation of the first-formant analog, our prior findings are equivocal in two crucial ways. First, if listeners chose Tone 1 because it reprised the pattern of sentence pitch, this choice may also have occurred if Tone 1 was simply the candidate falling closest in pitch range to the apparent intonation of the sentence. In fact, the report of Grunke and Pisoni (1982) suggests that this alternative hypothesis is plausible. They noted that the apparent similarity of speechlike syllable-length tone patterns was affected by average tone frequency as well as by details within the pattern; none of the tests of Remez and Rubin (1984) evaluated the possibility that similar effects occur in sentence-length patterns. A second point to consider is that the single-tone alternatives used by Remez and Rubin (1984) in the matching test confounded pitch range and pitch pattern. It is necessary to observe matching preferences for Tone 1 in a study controlling pitch range differences among the test alternatives to conclude that listeners hear sinusoidal intonation as the correlate of Tone 1.

The two studies reported here also used a matching task to test the hypothesis that the acoustic correlate of intonation in sinewave sentences is the tone that follows the frequency and amplitude variation of the first formant. In both experiments we employed a set of alternative single-tone frequency patterns for subjects to use in matching their impressions of intonation of a four-tone sentence replica. In the first experiment, we composed the set of candidates to distinguish the perceptual effect of frequency contour from average frequency.
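The logic of the two manipulations can be sketched in a few lines: one comparison set equates average frequency across different contours (Experiment 1), and the other preserves a single contour at different average frequencies (Experiment 2). The frame values below are invented for illustration; they are not the actual synthesis parameters.

```python
def mean(xs):
    return sum(xs) / len(xs)

def scale(contour, factor):
    """Multiply every frame value by one constant: the contour shape
    (frame-to-frame frequency ratios) is preserved while the average
    frequency changes."""
    return [f * factor for f in contour]

# Invented stand-ins for two formant-tone contours (Hz per 10-ms frame).
tone1 = [300.0, 340.0, 320.0, 360.0, 280.0]
tone2 = [1100.0, 1000.0, 1250.0, 1150.0, 900.0]

# Experiment 1's manipulation: a different contour, equated in average frequency.
tone2_equated = scale(tone2, mean(tone1) / mean(tone2))
assert abs(mean(tone2_equated) - mean(tone1)) < 1e-6

# Experiment 2's manipulation: the same contour, shifted in average frequency.
tone1_up40 = scale(tone1, 1.4)
assert abs(mean(tone1_up40) - 1.4 * mean(tone1)) < 1e-6
```

A single scale factor is enough for both manipulations because average frequency scales linearly while ratios between frames are untouched, which is exactly the confound the two experiments are designed to pull apart.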

EXPERIMENT 1

While our prior studies had pointed to the tone analog of the first formant as the closest match to the intonation of a sinusoidal sentence, additional tests were needed to conclude that the pattern of this tone, and not merely its average frequency, was the basis for the likeness judgments. The test performed in Experiment 1 offers alternatives that have identical average frequency but different frequency patterns derived from tonal components of the sinusoidal sentence. If subjects fail here to exhibit a clear preference in matching intonation and single-tone pitch patterns, then we would conclude that the average frequency is well represented perceptually in the perception of intonation of a sinusoidal sentence, and the specific pattern of frequency changes less well.

Method

Subjects. Fifteen audiologically normal adults were recruited from the population of Barnard and Columbia Colleges. All were native speakers of English, and none had been tested in other experiments employing sinusoidal signals. The subjects were paid for participating.

Acoustic Test Materials. The acoustic materials used in this test consisted of four sinusoidal patterns: one four-tone sentence pattern, and three single-tone patterns, all of them generated by a sinewave synthesizer (Rubin, 1980). This synthesizer produces sinusoidal patterns defined by parameters of frequency and amplitude for each tone, updated at the rate of 10 ms per parameter frame. The initial synthesis parameters were obtained by analyzing a natural utterance, the sentence "My t.v. has a twelve inch screen," spoken by one of the authors. This utterance was recorded on audiotape in a sound-attenuating chamber and converted to digital records by a VAX-based pulse-code modulation system using a 4.5 kHz low-pass filter on input and a sampling rate of 10 kHz. At 10-ms intervals, center-frequency and amplitude values were determined for each of the three lowest oral formants and the intermittent fricative formant by the analysis technique of linear prediction (Markel & Gray, 1976). In turn, these values were used as sinewave synthesis parameters after correcting the errors typical of linear prediction estimates. A full description of sinusoidal replication of natural speech is provided by Remez et al. (1987).

The resulting sentence pattern comprised four time-varying sinusoids. Tone 1 corresponded to the first formant, Tone 2 to the second, Tone 3 to the third, and Tone 4 was used to replace the fricative formant. A spectrographic representation of this pattern is shown in Figure 1. The three single-tone patterns that were used to compose the pairs of alternatives in the matching trials were Tone 1 and frequency-transposed versions of Tone 2 and Tone 3, each a component of the sentence pattern that the subject heard at the beginning of each trial. Tone 1 was produced with the frequency values it exhibited within the sentence pattern. These three single-tone alternatives were produced with equal average power, and each had roughly the same average frequency of 320 Hz. This was accomplished by dividing the frequency parameters of Tones 2 or 3 by constant divisors throughout the pattern and then synthesizing the transposed time-varying sinusoids.
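The frame-based control scheme described above can be illustrated with a minimal sketch: one frequency and amplitude value per 10-ms frame drives a phase-accumulating oscillator, so the tone stays continuous as its frequency steps from frame to frame. This is only an illustration of the scheme, not the actual Rubin (1980) implementation, and the frame values are invented.

```python
import math

SAMPLE_RATE = 10_000   # Hz, matching the 10-kHz digitization used for analysis
FRAME_MS = 10          # one parameter frame per 10 ms

def synthesize_tone(freqs_hz, amps):
    """Render one time-varying sinusoid from per-frame frequency and
    amplitude values. Accumulating phase sample by sample keeps the
    waveform continuous across frame boundaries."""
    samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000
    out, phase = [], 0.0
    for f, a in zip(freqs_hz, amps):
        for _ in range(samples_per_frame):
            phase += 2.0 * math.pi * f / SAMPLE_RATE
            out.append(a * math.sin(phase))
    return out

# A tone whose frequency steps 300 -> 330 -> 360 Hz over three frames
# (invented values), at constant amplitude: 3 frames x 100 samples.
wave = synthesize_tone([300.0, 330.0, 360.0], [1.0, 1.0, 1.0])
```

Under this scheme, the transposition used to equate Tones 2 and 3 to a 320-Hz average amounts to dividing every frame frequency by one constant before synthesis, which leaves the frame-to-frame frequency ratios, and so the contour, untouched.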

[Figure 1] Spectrographic representation of the tone replica of the sentence, "My t.v. has a twelve inch screen." Amplitude variation is represented in the height of each hatch mark placed at the tone frequency. (Abscissa: time, with a 300-ms scale bar; the tone tracks for Tones 1-3 are shown.)

The synthesized test materials were converted from digital records to analog signals, recorded on audiotape, and were presented to listeners by playback of the audiotape. Average listening levels were set at 72 dB SPL. Test materials were delivered binaurally in an acoustically shielded room over Telephonics TDH-39 headsets.

Procedure. During an initial instruction portion of the test session, listeners were told that the experiment was examining the identifiability of vocal pitch, the tune-like quality, of synthetic sentences. To illustrate the independence of phonetic structure and sentence melody for the test subjects, the experimenter sang the phrases "My Country 'Tis of Thee" and "I Could Have Danced All Night" with the associated melodies and with the melodies interposed. When subjects acknowledged their ability to determine the melody of a sentence regardless of its words, they were instructed to attend on each test trial to the pitch changes of the sinusoidal sentence, to identify the pattern, and then to select the alternative of the two following patterns that more closely resembled the intonation of the sentence. The subjects recorded their choices in specially prepared response booklets.

The format of each trial was identical, consisting of three sinusoidal patterns. First was the sinusoidal sentence "My t.v. has a twelve inch screen," presented once. Then, one of the three single-tone patterns was presented. Last, a second single-tone pattern was presented. There were six different comparisons among the three different single-tone alternatives. Counterbalanced for order, each subject judged each different comparison twenty times. Each sinusoidal pattern was approximately 3.1 s in duration, the interval between items within a trial was 1 s, and the interval between trials was 3 s.

Results and Discussion

The results are not difficult to interpret. Figure 2 shows the proportion of trials on which each alternative was chosen relative to the number of trials on which it was presented. Subjects preferred Tone 1 to the other two intonation candidates, and the mean differences were large. An analysis of variance was performed on the differences in preference between candidate tones in the three comparisons, and was highly significant [F(2,40) = 71.62, p < .0001]. Because the single-tone alternatives differed in frequency contour but not in average frequency, the data reveal a strong preference for the first formant analog as the match to intonation, suggesting that subjects used pitch patterns rather than average frequency in this test. This finding adds credibility to our claim that the pattern of frequency variation of the first formant tone is supplying the acoustic basis for pitch impressions. Nevertheless, it remained to be shown that the precise range of frequency variation of this tone is responsible for the listener's impression of intonation in a sinusoidal sentence replica.

[Figure 2] Relative preference for single-tone matches to sinusoidal sentence intonation; group performance in Experiment 1 ("Contour Variation"). Subjects preferred Tone 1 to other tones of equal average frequency. (Ordinate: preference proportion, 0.0-1.0; abscissa: the comparisons Tone 1 vs. transposed Tone 2, Tone 1 vs. transposed Tone 3, and transposed Tone 2 vs. transposed Tone 3.)

EXPERIMENT 2

One way to determine the perceptual effect of the precise frequencies of the first formant analog (as opposed to the contour of its frequency variation) is to see how subjects fare in the matching task if all the alternative pitch candidates have the same contour, differing only in average frequency. If subjects persist in selecting Tone 1 as the best match when other candidates exhibit the same pattern of variation (at different pitch heights), we may conclude that Tone 1 is a fairly direct cause of pitch impressions. In this test, we used the same sentence replica that we created for Experiment 1, but the tone alternatives for the matching task were all derived

from the pattern of the first formant analog, Tone 1. They were made by transposing the synthesis parameters for Tone 1 up or down in frequency, resulting in a set of Tone 1 variants, only one of which had the identical frequency values exhibited by Tone 1 in the sentence pattern. If subjects express precision in their preferences, we can conclude at least that the relative pitch pattern is anchored to a specific impression of pitch height, however else the impression of intonation is moderated by converging influences.

Method

Subjects. Twenty volunteer listeners with normal hearing in both ears were drawn from the student population of Barnard and Columbia Colleges. None had previously participated in studies of sinusoidal synthesis. The subjects received pay for participating.

Acoustic Test Materials. The same sinusoidal sentence pattern, "My t.v. has a twelve inch screen," from Experiment 1 was employed here. The single-tone alternatives consisted of Tone 1 and four variants: the frequency pattern of Tone 1 transposed downward by 20% and 40%, and transposed upward by 20% and 40%.

Procedure. Subjects heard the same brief instructions that were used in the first experiment to spotlight the independence of sentence melody and lexical attributes. The test itself comprised 100 trials, each of which began with a single presentation of the sinusoidal sentence, "My t.v. has a twelve inch screen." Next, two of the five single-tone alternatives were presented, and the subject chose the closer match to the pitch pattern of the sinusoidal sentence. There were ten different comparisons of the five alternatives, repeated five times in each order.

Results and Discussion

The outcome of this test was clear, again. An analysis of variance performed on the preference differences across the ten contests was highly significant [F(4,60) = 13.79, p < .0001]. In essence, subjects consistently selected Tone 1 as most like the sentence pitch. These results are shown in Figure 3a.

[Figure 3(a)] Relative preference for single-tone matches to sinusoidal sentence intonation; group performance in Experiment 2 ("Pitch Height Variation"). Conditions in which Tone 1 was compared with four transposed versions (.6, .8, 1.2, and 1.4 times Tone 1) are shown. Subjects preferred Tone 1 to other tones of the same frequency contour.

The intonation matches on trials in which Tone 1 was not a candidate are shown in Figure 3b. Four of these comparisons contained a pair in which one candidate was closer to Tone 1 in frequency than the other was, and two of these comparisons had tones equally similar to Tone 1 in physical frequency. In the two equal-similarity cases, subjects chose the higher tone as the better match.

[Figure 3(b)] Relative preference for single-tone matches to sinusoidal sentence intonation; group performance in Experiment 2 ("Pitch Height Variation"). Conditions in which transpositions of Tone 1 were compared with each other are shown.
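The contest structure just described, ten pairings of the five alternatives with the two equal-similarity ties resolved toward the higher tone, can be made concrete with a short sketch. The win tally below simulates an idealized listener who always picks the member of a pair closer to Tone 1, not the observed group data.

```python
from collections import Counter
from itertools import combinations

# The five alternatives, expressed as multiples of the Tone 1 frequencies.
FACTORS = [0.6, 0.8, 1.0, 1.2, 1.4]

# The ten different comparisons, each presented ten times (100 trials).
contests = [pair for pair in combinations(FACTORS, 2) for _ in range(10)]
assert len(contests) == 100

# The two pairs whose members sit equally far from Tone 1 in physical
# frequency (a tolerance guards against floating-point noise).
equal_pairs = [(a, b) for a, b in combinations(FACTORS, 2)
               if a != 1.0 and b != 1.0
               and abs(abs(a - 1.0) - abs(b - 1.0)) < 1e-9]

def choose(a, b):
    """Pick the member closer to Tone 1; break exact ties toward the
    higher tone, as the subjects did in the equal-similarity cases."""
    da, db = abs(a - 1.0), abs(b - 1.0)
    return a if da < db else b if db < da else max(a, b)

wins = Counter(choose(a, b) for a, b in contests)
# Tone 1 enters 4 of the 10 pairings, 10 times each, so it can win at
# most 40 contests; this idealized listener gives it all 40.
```

The same tally over the real responses is what produces the graded preference pattern reported for Figure 3.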

This is exactly what we would expect on the precedent of classic studies of pitch scaling (cf. Ward, 1970), which roughly confirm the relationship of the 2:1 frequency ratio and the interval of the octave. In the present case, this means that the tone with its frequency increased by 40% relative to Tone 1 should be as different, subjectively, from Tone 1 as the tone with its frequency decreased 20%. This hypothesis is encouraged by the outcome of the condition comparing the 20% downward vs. 40% upward transpositions, in which there was no clear preference, suggesting a subjective equality of these two in degree of similarity to sentence pitch.

To summarize the ten comparisons, we derived a likeness index for each of the five single-tone alternatives, by summing the total number of contests over the whole test in which each was selected. Each tone occurred in forty contests, which yields a limiting score of 40 were that tone selected on every opportunity. The group averages are portrayed in Figure 4, along with the best quadratic fit to the five points. Although Tone 1 is clearly the most frequently chosen alternative across the likeness test, the effect of frequency transposition is not symmetrical. This is exactly what we would expect based on the rough match of frequency to pitch in this range; in other words, a 40% increase in frequency is a subjectively equal change to a 20% decrease in frequency. Overall, this pattern of results suggests that the pitch height of the intonation of a sinusoidal sentence must be quite close to the pitch impression created by Tone 1 presented alone.

[Figure 4] Representation of the integrated performance on the ten contests of Experiment 2 ("Pitch Height Comparisons") is shown in this plot of overall similarity to the apparent intonation of the sinewave sentence. The likeness index for the five single-tone alternatives is graphed with a best quadratic fit to the group averages. (The curve is plotted to emphasize the asymmetry of the effect of transposition; no specific theoretical significance is given to the terms of the best-fitting function.)

GENERAL DISCUSSION

The findings of Experiments 1 and 2 add precision to our account of the odd intonation that accompanies sinusoidal replicas of sentences. Listeners in our tests consistently selected the tonal analog of the first formant as the match of their impressions of intonation, as if that component of the tone complex plays a dual role in the perception of sinewave sentences: First, it acts as if it were a vocal resonance supplying segmental information about the opening and closing of the vocal tract; second, it acts as the periodic stimulation driving the perception of pitch. If the tone analog of the first formant is responsible, the intonation pattern of a sinusoidal sentence ought to sound weird, given that this tonal component (1) lies several octaves above the natural frequency of phonation, and (2) varies in a manner utterly unlike the suprasegmental pattern of F0 variation.

If the perceiver catches on to the trick of treating sinusoids as formant analogs, then it is reasonable to expect the analog of the first formant to contribute to impressions of consonants and vowels. But why should this constituent of the signal also create an impression of intonation? These two studies of tone contour and frequency bolster the case which we have made relying on the dominance region hypothesis (Remez & Rubin, 1984). Research on the causes of pitch impressions with nonspeech sets of harmonic components had shown that the auditory system is roughly keyed to detect pitch preferentially from excitation in the range of 400-1000 Hz (Plomp, 1967; Ritsma, 1967). In essence, these psychoacoustic studies of fundamentals around 100-200 Hz revealed a predominant influence on pitch impressions of the common periodicity of the third through fifth harmonics, and a lesser influence of higher or lower harmonics. An important proof of the effects of the dominance region using speech sounds was reported by Greenberg (1980). This study assessed human auditory evoked potentials in response to synthetic speech spectra, and showed that the representation of the fundamental frequency in the recordings was strongest when the frequency

of the first formant fell within the dominance region. Of course, in ordinary speech the first formant ranges widely, from roughly 270 Hz to 850 Hz in the speech of adults, to as high as 1 kHz in children. By this notion, in ordinary listening the band extending from 400-1000 Hz is analyzed to supply both periodicity information and the center-frequency of the first formant that traverses it.

The periodicity of a speech signal and the frequency of its first formant typically differ greatly in natural speech, both in pattern as well as in frequency range. In sinewave replicas, which lack harmonic structure, often the sole component falling in the dominance region is the tonal analog of the first formant. In that case, we claimed, despite the absence of harmonics, the first formant analog evidently presents an effective if inadvertent stimulus for pitch perception, and acts as well as an implicit resonant frequency. Here, the periodicity within the dominance region converges on the center frequency of the first (albeit functional) formant, and the apparent intonation is therefore unlike familiar speech in its pattern of variation and its displaced range.

The present tests furnish two missing pieces which clarify this situation. It is the frequency contour of Tone 1 which subjects use to match their impressions of sentence melody, rather than average frequency as such. Our test determined this by forcing subjects to differentiate the particular pitch contour of Tone 1 from its pitch height and pitch range. Nonetheless, subjects reported that the precise pitch height of Tone 1 matched the impression of intonation of the sentence. This was revealed in the second experiment, which required subjects to differentiate frequency-transposed versions of Tone 1. Although subjects consistently chose the true Tone 1 as the best match to sentence intonation, they also confirmed, implicitly, that the familiar relation of frequency to apparent pitch obtains in this instance.

In implicating basic auditory analyzing mechanisms to account for the intonation of sinusoidal sentences, our claim suggests that definite impressions of pitch should typify tonal analogs of spoken syllables (a familiar kind of nonspeech control) whether or not phonetic segmental attributes are apparent. To take a recent instance, listeners who heard three-tone analogs of isolated steady-state vowels chose a rough approximation of the first-formant analog in matching their auditory impressions of these signals (Kuhl, Williams, & Meltzoff, 1991); that result departed from the prediction that the vowels of English differed characteristically in their spectral centers of gravity, and that such integrated spectral properties ruled the perception of speech sounds. Because the listeners evidently did not perceive the tone complexes as vowels, Kuhl et al. concluded that speech and nonspeech sounds are accommodated by divergent perceptual analyses. From the perspective of our findings, though, it seems clear that pitch impressions of three-tone steady-state vowels would be derived from the frequency of the first formant analogs in any case, whether or not the spectrum of the tone complex is integrated in establishing a vowel impression. A subject who is instructed to match the vowel quality of a three-tone complex or to match its predominant pitch may attend to very different psychoacoustic attributes in each task. Although we admit that the paths to phonetic perception and auditory form perception diverge early on (Remez, Rubin, Berns, Pardo, & Lang, in press), the present studies show how intricate the interpretation of evidence can be.

While these experiments on sinusoidal sentence pitch have produced a novel instance corroborating the dominance region hypothesis, we have yet to observe a perceptual effect attributable to linguistic or paralinguistic attributes of the sentences. Some accounts of intonation report sentence-level effects based on departures from an underlying downdrift pattern, or from phrase-final fall (Vaissière, 1983). The violation of expectations based on typical properties of sentences may contribute to the apparent oddness of sinusoidal sentence melody, and to difficulties in intelligibility. But, despite the fact that tone analogs of sentences lack the familiar acoustic properties associated with a steady fundamental, we have not observed any effect of these gross departures in the registration of intonation. The pitch patterns of these sentences follow the prediction given by the dominance region hypothesis. Subsequent studies may identify precise points of departure between the impressions of intonation and the periodicity of the tone supporting the percept.

REFERENCES

Greenberg, S. (1980). Temporal neural coding of pitch and vowel quality. Working Papers in Phonetics, 52, 1-183. Los Angeles: U.C.L.A.
Grunke, M. E., & Pisoni, D. B. (1982). Perceptual learning of mirror-image acoustic patterns. Perception & Psychophysics, 31, 210-218.
Kuhl, P. K., Williams, K. A., & Meltzoff, A. N. (1991). Cross-modal speech perception in adults and infants using nonspeech

auditory stimuli. Journal of Experimental Psychology: Human Perception and Performance, 17, 829-840.
Markel, J. D., & Gray, A. H., Jr. (1976). Linear prediction of speech. New York: Springer-Verlag.
Plomp, R. (1967). Pitch of complex tones. Journal of the Acoustical Society of America, 41, 1526-1533.
Remez, R. E., & Rubin, P. E. (1983). The stream of speech. Scandinavian Journal of Psychology, 24, 63-66.
Remez, R. E., & Rubin, P. E. (1984). On the perception of intonation from sinusoidal sentences. Perception & Psychophysics, 35, 429-440.
Remez, R. E., & Rubin, P. E. (1990). On the perception of speech from time-varying attributes: Contributions of amplitude variation. Perception & Psychophysics, 48, 313-325.
Remez, R. E., & Rubin, P. E. (in press). Acoustic shards, perceptual glue. In J. Charles-Luce, P. A. Luce, & J. R. Sawusch (Eds.), Theories in spoken language: Perception, production, and development. Norwood, NJ: Ablex Press.
Remez, R. E., Rubin, P. E., Berns, S. M., Pardo, J. S., & Lang, J. M. (in press). On the perceptual organization of speech. Psychological Review.
Remez, R. E., Rubin, P. E., Nygaard, L. C., & Howell, W. A. (1987). Perceptual normalization of vowels produced by sinusoidal voices. Journal of Experimental Psychology: Human Perception and Performance, 13, 40-61.
Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. (1981). Speech perception without traditional speech cues. Science, 212, 947-950.
Ritsma, R. J. (1967). Frequencies dominant in the perception of the pitch of complex sounds. Journal of the Acoustical Society of America, 42, 191-198.
Rubin, P. E. (1980). Sinewave synthesis. Internal memorandum, Haskins Laboratories, New Haven, Connecticut.
Vaissière, J. (1983). Language-independent prosodic features. In A. Cutler & D. R. Ladd (Eds.), Prosody: Models and measurements (pp. 53-66). New York: Springer.
Ward, W. D. (1970). Musical perception. In J. A. Tobias (Ed.), Foundations of modern auditory theory, Vol. 1 (pp. 407-447). New York: Academic Press.

FOOTNOTES

*Journal of the Acoustical Society of America, 94, 1983-1988 (1993).
†Department of Psychology, Barnard College, New York.

Haskins Laboratories Status Report on Speech Research 1993, SR-113, 41-50

The Acquisition of Prosody: Evidence from French- and English-learning Infants*

Andrea G. Levitt†

The reduplicative babbling of five French- and five English-learning infants, recorded when the infants were between the ages of 7;3 months and 11;1 months on average, was examined for evidence of language-specific prosodic patterns. Certain fundamental frequency and syllable-timing patterns in the infants' utterances clearly reflected the influence of the ambient language. The evidence for language-specific influence on syllable amplitudes was less clear. The results are discussed in terms of a possible order of acquisition for the prosodic features of fundamental frequency, timing, and amplitude.
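The syllable-timing patterns mentioned here reduce, in the analyses reported in this paper, to simple per-utterance statistics over measured syllable durations: a comparison of final and penultimate syllables, and the variability of nonfinal syllables. As a rough illustration only, the following sketch shows how such measures can be computed; the function names and the sample durations are invented for this example and are not the study's actual code or data:

```python
# Illustrative sketch of per-utterance timing measures of the kind
# described in this report. Durations are in milliseconds; all names
# and numbers here are hypothetical, not the study's data.
from statistics import pstdev

def final_lengthening_ratio(syllable_durations):
    """Ratio of final to penultimate syllable duration (needs >= 2 syllables)."""
    return syllable_durations[-1] / syllable_durations[-2]

def nonfinal_variability(syllable_durations):
    """Standard deviation of the nonfinal syllable durations (needs >= 3 syllables)."""
    return pstdev(syllable_durations[:-1])

# A hypothetical four-syllable reduplicative utterance:
utterance = [180, 175, 185, 290]

print(round(final_lengthening_ratio(utterance), 2))  # 1.57 (final syllable lengthened)
print(round(nonfinal_variability(utterance), 2))     # 4.08 (nonfinal syllables quite regular)
```

In a corpus analysis of this kind, each utterance of two or more syllables could contribute one final-to-penultimate comparison, and each utterance of three or more syllables one nonfinal-variability value, which could then be averaged per child and compared across language groups.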

Acknowledgment: This work was supported by NIH grant DC00403 to Catherine Best and Haskins Laboratories. We thank the families of our French and American infants for their participation in this research.

1. INTRODUCTION

Prosody is generally described in terms of three main suprasegmental features that vary in language-specific ways: the fundamental frequency contours, which give a language its characteristic melody; the duration or timing measures, which give a language its characteristic rhythm; and the amplitude patterns, which give a language its characteristic patterns of loud versus soft syllables. When does the prosody of infants' utterances begin to show language-specific effects? To answer this question it is important first to understand the linguistic environment of the child, which is characterized by a special sociolinguistic register called child-directed speech (CDS). CDS has marked grammatical as well as prosodic characteristics, for which a number of possible uses have been suggested. It is also important to understand what is known about infants' sensitivity to the three prosodic features of speech.

Since English and French provide very different prosodic models for young infants, they are excellent choices for investigating the issue of language-specific prosodic influences on infants' utterances. Analyzing the reduplicative babbling of two groups of infants, one learning French and the other English, Doug Whalen, Qi Wang, and I have found evidence for the early acquisition of certain language-specific prosodic features. These results can be discussed in terms of a possible order of acquisition for language-specific prosodic features and in terms of evidence for possible regression in children's apparent sensitivity to prosodic information.

2. CHILD-DIRECTED SPEECH (CDS)

In the last twenty-five years or so, researchers have documented the existence of child-directed speech (CDS), also known as "motherese," a special style of speech or linguistic register used with young first language learners (e.g., Ferguson et al., 1986). Most researchers consider CDS to be universal (e.g., Fernald et al., 1989; but cf. Bernstein Ratner & Pye, 1984; Heath, 1983). Compared to speech between adults, or adult-directed speech (ADS), CDS shows both special grammatical and prosodic features. From a grammatical perspective, CDS consists of shorter, simpler, and more concrete sentences, and uses more repetitions, questions, and imperatives, and more emphatic stress. From a prosodic perspective, CDS includes high pitch, slow rate, exaggerated pitch contours, long pauses, increased final-syllable lengthening, and whispering.

Some researchers have attributed adults' production of the higher pitch and more variable fundamental frequency of CDS to the preference of very young children for higher pitched sounds


(Sachs, 1977), whereas others have focused on these prosodic characteristics as contributing to the expression of affection (Brown, 1977) or to attracting the child's attention (Garnica, 1977). More recently, however, some investigators have argued for a more linguistically significant role for the prosodic characteristics of CDS. Thus, researchers have variously suggested that the prosodic patterns of CDS may help infants in learning how to identify their native language (Mehler et al., 1988); to identify important linguistic information, such as names for unfamiliar objects (Fernald & Mazzie, 1983); and even to parse the syntactic structures of their native language (Hirsh-Pasek et al., 1987). Some of our own current work suggests that the prosodic features of CDS may also serve to enhance speaker-specific properties of the speech signal.

As it turns out, not all of the linguistic features attributed to CDS are present at once. Indeed, certain features are quite unlikely to co-occur. Other sociolinguistic registers remain relatively stable over time, but CDS does not. In fact, it is characterized by notable systematic changes that appear linked to the developmental stage of the child spoken to (Bernstein Ratner, 1984, 1986; Malsheen, 1980; Stern, Spieker, Barnett, & MacKain, 1983). As do the other features of CDS, the prosodic aspects also appear to change over time. For example, pitch height and the use of whispering are reduced as children grow older (Garnica, 1977). There may even be changes in the types of fundamental frequency contours that a child hears over time. A recent study (Papoušek & Hwang, 1991) has shown that Mandarin CDS prosody, as produced for presyllabic infants, may even distort the fundamental lexical tones, which are each marked by specific fundamental frequency contours in the adult language. But Chinese children do go on to learn the appropriate tones, and indeed our preliminary analyses of Mandarin CDS, produced to an infant between 9 and 11 months of age, suggest that for the older infant there is considerably less distortion. Even if very early CDS has more universal than language-specific prosodic patterns (Papoušek, Papoušek, & Symmes, 1991), the CDS addressed to older infants, as well as all other forms of speech which young infants are likely to hear, provide ample exposure to language-specific prosodic patterns as well. What is known about young infants' sensitivity to the prosodic patterns of language?

3. INFANT RESPONSE TO PROSODY

Bull and his colleagues (Bull, Eilers, & Oller, 1984, 1985; Eilers, Bull, Oller, & Lewis, 1984) have shown that infants in the second half year of life can detect changes in each of the three prosodic parameters under discussion. Researchers have found that infants' response to fundamental frequency variation or intonation is particularly strong. Indeed, infants' strong response to CDS (Fernald, 1985) can be interpreted as a preference on their part for its special fundamental frequency contours (Fernald & Kuhl, 1987). In terms of early pitch production, Kessen, Levine, and Wendrich (1979) found that infants between 3 and 6 months of age were able to match with their voices the pitches of certain notes, and there have also been reports of young infants being able to match the fundamental frequency contours of spoken utterances (Lieberman, 1986). Once children have begun to speak, they can make communicative use of pitch, e.g., to contrast a request with a label, even at the one-word stage (Galligan, 1987; Marcos, 1987).

Other research has demonstrated that infants show an early perceptual sensitivity to some specific rhythmic properties of language. For example, it has been shown that very young infants can discriminate two bisyllabic utterances when they differ in syllable stress (Jusczyk & Thompson, 1978; Spring & Dale, 1977). Infants could perform this task both when the syllable stress was cued by all three typical prosodic markers and when the stress was cued by duration alone (Spring & Dale, 1977). Furthermore, Fowler, Smith, and Tassinary (1985) found evidence that the basis for infants' perception of speech timing is stress beats, just as it is for adults. Relatively little attention has been paid to infants' sensitivity to amplitude or loudness differences, independently of their role in stress, except for the work of Bull and his colleagues (1984), mentioned above.

4. EARLY LANGUAGE-SPECIFIC PROSODIC INFLUENCES ON PRODUCTION

Although not all attempts to find support for early language-specific effects on infant utterances have been successful (see Locke, 1983, for a review), Boysson-Bardies and her colleagues were able to find such evidence in their cross-linguistic investigations of infant utterances. For example, using acoustic analysis, they found that

the vowel formants of 10-month-old infants varied in ways consistent with the formant-frequency patterns in the adult languages (Boysson-Bardies, Hallé, Sagart, & Durand, 1989). Some of our own research (Levitt & Utman, 1991), along with results from another study by Boysson-Bardies and her colleagues (Boysson-Bardies, Sagart, & Durand, 1984), suggested that young infants from different linguistic communities might also show early language-specific prosodic differences, which we decided to explore by comparing the utterances of French-learning and English-learning infants.

4.1 Prosodic differences in English and French

French and English differ in a number of ways on each of the three prosodic features. In terms of fundamental frequency contours, there are several differences, including contour shape (Delattre, 1965) and incidence of rising contours (Delattre, 1961). It is the difference in the incidence of rising contours that we investigated. Delattre (1961), who analyzed the speech of Simone de Beauvoir and Margaret Mead, found that the French speaker had many more rising continuation contours (93%) than her American counterpart (11%).

In our investigation of timing differences between French and English we focused on the syllable level, where we found at least three clear differences: the salience of final-syllable lengthening, the timing of nonfinal syllables, and the interval between stressed syllables or, in other terms, the typical length of the prosodic word. In Figure 1, we can see the first two rhythmic properties illustrated for French and English. The graphs (taken from Levitt [1991]) show syllable timing measures based on reiterant productions by adult native speakers of French and English of words of two to five syllables in the two languages. The native speakers replaced the individual syllables of each word with the syllable /ma/, while preserving natural intonation and rhythm. To provide these data, ten native speakers of English and ten native speakers of French were asked to produce reiterant versions of a series of 30 words in their own language. The top graph represents timing measures for French words of two to five syllables, the middle graph represents English words of two and three syllables with all possible stress patterns, and the bottom graph represents English words of four and five syllables with a selection of stress patterns.

Figure 1. Syllable timing measures for words of two to five syllables in French (top panel) and English (bottom two panels).

The first property, final-syllable lengthening, is a more salient feature of French than of English. Although both French and English exhibit final-syllable lengthening (breath-group final lengthening in French), final-syllable lengthening is more salient in French because French nonfinal syllables are not typically lengthened due to word stress, as are nonfinal syllables in English.

The second property, nonfinal syllable timing, is also clearly different for the two languages. French has been classified as syllable-timed, with a rhythmic structure known as "isosyllabicity," which is characterized by syllables generally equal in length. However, this description ignores the obvious, important final-syllable lengthening we see in French. On the other hand, aside from the effects of emphatic stress and inherent segmental length differences, nonfinal syllables in French generally are equal in length. In the top panel of Figure 1, the nonfinal syllables in French words of three to five syllables are quite equal in length, whereas English nonfinal syllables (in the bottom two panels) are not, because of variable word stress.

Finally, the third rhythmic property that we investigated, the length of a prosodic word, here defined as the number of syllables from one stressed syllable to the next, may be expected to differ in English and French, again because of differences in the stress patterns of the two languages. Information about the typical length of the prosodic word in French and English comes from studies by Fletcher (1991) and by Crystal and House (1990). Fletcher analyzed the conversational speech of six native speakers of French. Reanalyzing a portion of her data, we found that 56% of the speakers' polysyllabic "prosodic words," which included all unaccented syllables preceding an accented final syllable, were 4 or more syllables in length, on average. On the other hand, when we examined similar data from Crystal and House, who had analyzed the read speech of six English subjects, we found that prosodic words of 4 or more syllables accounted for only six percent of the total, on average. Thus, there is some evidence that interstress intervals or prosodic words tend to be longer in French.

How do the amplitude patterns of the two languages differ? Figure 2 shows the waveforms of the French word "population" with its reiterant version, spoken by a male native speaker of French, on top, and the waveforms of the English word "population" with its reiterant version, spoken by a male native speaker of English, on the bottom. The patterns in Figure 2 are very representative. Basically, French words tend to start high in amplitude and generally decline, so that final syllables, which are systematically longer than nonfinal syllables, tend to be lowest in amplitude or loudness. The French reiterant version of "population," on the right, which avoids loudness variations due to inherent amplitude differences in different segments, as on the left, looks rather like a Christmas tree on its side. On the other hand, as can be seen from the waveforms for the English words, nonfinal stressed syllables in English tend to have greater amplitude than surrounding syllables (as well as greater duration), although there is also a tendency for the last syllable in an English word to have lower amplitude if it is not stressed.

Figure 2. Waveforms (showing characteristic amplitude patterns) of French "population" and its reiterant version (top panel) and English "population" and its reiterant version (bottom panel).

What sorts of evidence for language-specific prosodic structure might we find in the vocal productions of young infants themselves?

4.2 Reduplicative babbling studies

In order to investigate whether prosodic differences in fundamental frequency contour, rhythm, or amplitude emerged in the vocal productions of French and American infants between the ages of 5 and 12 months, the babbling of five English-learning infants (three male and two female) and five French-learning infants (four male and one female) was recorded weekly by their parents at home. The French-learning infants were recorded in Paris and the English-learning infants were recorded in cities in the northeastern United States. Recordings began when the infants were between 4 and 6 months old and continued until they were between 9 and 17 months old. Each tape was phonetically transcribed, and all infant speechlike vocalizations were digitized for computer analysis. The vocalizations were divided into utterances, or breath groups, which were defined as a sequence of syllables that were separated from adjacent utterances by at least 700 ms of silence and which contained no internal silent periods longer than 450 ms. From the transcribed and digitized utterances, we selected all the reduplicative babbles, that is, those which contained the same consonant-like element as well

as the same vowel-like element, repeated in an utterance of two or more syllables, according to our transcriptions. Using these criteria, we obtained 208 reduplicative utterances, approximately half (102) from the English-learning children and half (106) from the French-learning infants. (See Table 1, taken from Levitt & Wang, 1991.)

Table 1. Description of the source of the 208 stimuli.

                   Ages (in months)        Ages (in months) at which    Number of
                   at which recordings     reduplicative babbles        reduplicative
                   were made               were detected                babbles
French Infants
  MB               5-11                    7-11                         24
  EC               6-12                    6-12                         42
  MS               5-16                    7-12                         23
  IZ               4-9                     5-7                          9
  NB               4-14                    8-13                         8
American Infants
  MA               5-16                    8-12                         24
  MM               5-17                    7-12                         35
  CR               5-17                    9-10                         7
  AB               5-17                    8-11                         18
  VB               4-15                    7-12                         18

4.2.1. Fundamental Frequency Contours. For our analysis of the fundamental frequency contours of the reduplicative babbles of the French and American infants, we decided to obtain contour judgments for the reduplicative babbles and to analyze them acoustically as well (Whalen, Levitt, & Wang, 1991). First, we asked a group of experienced listeners to judge whether each infant babble had a falling, a rising, a fall/rise, a rise/fall, or a flat contour. In order to make the perceptual judgments feasible, we limited our data set to those reduplicative babbles that were two or three syllables in length. We found both acoustic and perceptual evidence for language-specific effects in the F0 contours of the reduplicative babbles of the French- and English-learning infants.

Although about 65% of the perceptual judgments made for both the French and the American reduplicative babbles were either rise or fall, these two categories were about equally divided in the judgments of the French babbles, whereas about 75% were labelled fall for the American subjects. Thus, in agreement with the higher incidence of rising intonations in adult French speech (Delattre, 1961), the reduplicative babbles of our French infants showed a significantly higher incidence of rising F0 contours by comparison to those of our American infants.

The results of our acoustic analysis of the reduplicative babbles also supported our perceptual finding. All of the reduplicative babbles were categorized according to the contour opinion of the majority of the listeners and then acoustically analyzed. The contour patterns were then averaged for each language. The mean patterns for each of the contour types revealed an appropriate fundamental frequency curve, and statistical tests of the fundamental frequency values also support the finding that French infants produced more rising contours.

4.2.2. Timing Measures. What about timing measure differences in the infants' reduplicative babbling? We investigated this aspect of the French-learning and English-learning infants' production in another recent study (Levitt & Wang, 1991). Recall that final-syllable lengthening is more salient in French, which also has more regularly timed nonfinal syllables and longer prosodic words. Using the entire corpus of 208 utterances, we measured each syllable. A conservative criterion for measuring syllable length was adopted: the duration as measured included only the visibly voiced portion of each syllable. In order to test for final-syllable lengthening, we compared the length of the final syllables with the penultimate syllables in each utterance. To test for regularity in the timing of nonfinal syllables, we calculated the mean standard deviation of the nonfinal syllables in each utterance of three or more syllables, and to test for length of prosodic word, we looked at the number of syllables per utterance per child.

In Figure 3, the three graphs represent the results of our investigation of the timing properties. The top graph shows that the French infants had a significantly greater proportion of long final syllables (54%) than did the English-learning American infants (29%). In terms of the regularity of nonfinal syllables, as shown in the middle graph, French infants produced more regular nonfinal syllables overall, although that difference was not significant. However, when we analyzed the nonfinal syllable timing measurements in terms of an early and a late stage of reduplicative babbling production for each of the infants, we found a significant interaction, in that the nonfinal syllables of the French infants tended to become more regular whereas the nonfinal syllables of the English-learning American infants tended to become more irregular. Finally, we also found a significant difference in the length of prosodic words, with the French infants producing considerably more long utterances (of 4 syllables or more) than the Americans, as shown in the bottom graph of Figure 3.

Figure 3. Comparison of French and English infants' syllable timing patterns for final-syllable lengthening (top panel), regularity of nonfinal syllables (middle panel), and number of syllables per utterance (bottom panel).

4.2.3. Amplitude Measures. What then about the last of the prosodic factors, amplitude or loudness? In order to answer this question, we first analyzed adult amplitude patterns in the two languages from the reiterant speech study mentioned earlier (Levitt, 1991). We chose five speakers of each language, 3 men and 2 women, at random. We measured the peak amplitude in each of the reiterant syllables produced by the adults and also of each of the reduplicated syllables produced by our French and American infants. Duration measures for each of the syllables had already been obtained.

Our results are pictured in the two graphs in Figure 4. As indicated in the top graph, we found that, as mentioned earlier, French adults tend to produce long final syllables with lowest amplitude (81%) significantly more often than American adults (45%) in their utterances overall [t(8)=3.2, p=.0061, one-tailed], although Americans did show a similar tendency for long finals with low amplitudes, especially for words without a final-syllable stress. The infants showed a similar pattern of results, with French infants linking long final syllables with lowest amplitude more often (33%) than English-learning American infants (21%), but this difference in the infant populations was not significant [t(8)=1.3, ns].

Figure 4. Comparison of duration-linked amplitude patterns for French- and English-speaking adults and French- and English-learning infants. The top panel shows the typical French pattern and the bottom panel shows the typical English pattern.

As displayed in the bottom graph of Figure 4, when we looked at nonfinal syllables in utterances of three or more syllables, we found that American adults tended to link long nonfinal syllables with highest amplitude or loudness (80%), significantly more often than the French adults (45%) [t(8)=4.1, p=.0016, one-tailed]. Similarly, American infants tended to produce their highest amplitudes on the longest nonfinal syllables (57%), whereas the French infants did so less often (41%). This latter

difference between the two groups of infants approached significance [t(8)=1.6, p=.0711].

5. EVIDENCE FOR PROSODIC INFLUENCES IN CHILDREN'S LATER PRODUCTIONS

By the age of two, children have already largely mastered a number of the syllabic timing properties of their language. Thus, Allen (1983) has shown that French children exhibit final-syllable lengthening in polysyllabic words by two years of age. Although the patterns of final lengthening produced by the children were more variable than those produced by French adults, the children's median ratios of final to nonfinal syllables were very comparable to those of French adults, roughly 1.6:1. Similarly, Smith (1978) has shown that English-speaking children between two and three years of age have mastered final-syllable lengthening as well, with a final to nonfinal ratio of close to 1.4:1 for both the adults and the children. Some research with two-year-old children learning tone languages (Li & Thompson, 1977) suggests that children can reproduce tonal patterns more accurately than speech segments, although certain tone contours appear easier to acquire than others.

6. POSSIBLE ORDER OF ACQUISITION OF PROSODIC FEATURES

Our results, taken together with some of the other research concerning infants' early perception (and occasional production) and young children's production of certain fundamental frequency and rhythmic properties, lend support to the notion that infants begin to imitate some of the prosodic properties of their native languages before they fully master their segmental properties. Specifically, it would appear that the more global properties of fundamental frequency and syllabic timing are acquired before amplitude patterns. Within each prosodic domain, there also appears to be some evidence for a learning sequence. Li and Thompson (1977), as noted above, found that children learning Mandarin acquired some tone contours, which are based on fundamental frequency, earlier than others. Similarly, our results on the acquisition of syllable timing suggest an early beginning for children's development of control of final-syllable lengthening and of utterance length, whereas acquiring the regular timing of nonfinal syllables in French appears more difficult. Children's vocal productions are notably more variable than those of adults and gradually move towards more adult-like stability as they gain increasing motor control (e.g., Kent, 1976). Producing regularly timed syllables would thus be considerably more difficult for children than for adults.

However, before we relegate the child's control of the amplitude patterns of his or her language to the status of prosody's stepchild, we have to keep in mind that relatively little exploration has been done of the infant's sensitivity to language amplitude patterns, and that the present results dealt with two languages, English and French, for which differences in the amplitude patterns of syllables may be less important in perception than are the other prosodic variables. Until more direct tests are undertaken of infants' sensitivities to all of the prosodic features, and comparisons are made between languages such as English or French, on the one hand, and Ik, a language spoken in eastern Sudan, which contrasts voiceless, presumably low amplitude, versus voiced, presumably high amplitude vowels, on the other hand (Maddieson, 1984), our conclusion that amplitude pattern control is acquired later than other prosodic features must be provisional.

7. EVIDENCE FOR REGRESSION IN PROSODIC LEARNING

We would also speculate, based on some of our own findings as well as on suggestions from the literature, that infants show a special sensitivity to prosody beginning perhaps at birth and lasting until about 9 or 10 months of age, when there may be some regression in the child's sensitivity to prosody. It would come, of course, at a time when Werker and her colleagues (e.g., Werker, 1989; Werker & Lalonde, 1988; Werker & Tees, 1984) and Best and her colleagues (Best, in press; Best, McRoberts, & Sithole, 1988) have shown that there is a shift in infants' phonetic perception of some nonnative segmental contrasts as their focus turns to learning the words of their native language.

Our evidence comes from both perception and production studies of prosody. Recently Catherine Best, Gerald McRoberts, and I (1991) investigated the ability of infants who were either 2-4, 6-8, or 10-12 months old to discriminate a prosodic contrast (questions versus statements) in English (their native language) and in Spanish, when there was segmental variation across the tokens representing statement and question types. The finding of interest is the comparison between the


6-8 month olds, who discriminated the prosodic contrast in both languages, and the 10-12 month olds, who failed to discriminate the questions from the statements in both Spanish and English. Another study, by D'Odorico and Franco (1991), which looked at infants' production of specific, prosodically-defined vocalization types in different communicative contexts, also suggested some evidence of decline toward the end of the first year. Apparently, the infants stop using these idiosyncratic, context-determined vocalizations at around 9 months of age. Finally, we also found a tendency, though not a significant one, for some infants in our production study to produce less consistent final-syllable lengthening as they began to produce their first words (Levitt & Wang, 1991). Although the evidence is very preliminary, the period beginning at 10 months and extending until some time before the second birthday may be marked by some "regression" in young infants' perception and production of prosodic information.

8. CONCLUSION

It is important to remember that children learn much more from prosody than its language-specific characteristics. Prosody has a number of paralinguistic functions, so that it teaches, for example, about turn taking (e.g., Schaffer, 1983) and about the expression of emotion as well (e.g., Scherer, Ladd, & Silverman, 1984). In addition to language-specific and paralinguistic functions, prosody also serves some strictly grammatical linguistic functions, such as distinguishing between questions and statements or between words in a tone language. Mapping out the complete path by which children acquire the paralinguistic, language-specific, and grammatically significant prosodic patterns of their native languages, beginning from what appears to be quite an early start, has just begun.

REFERENCES

Allen, G. (1983). Some suprasegmental contours in French two-year-old children's speech. Phonetica, 40, 269-292.
Bernstein Ratner, N. (1984). Patterns of vowel modification in mother-child speech. Journal of Child Language, 11, 557-578.
Bernstein Ratner, N. (1986). Durational cues which mark clause boundaries in mother-child speech. Journal of Phonetics, 14, 303-309.
Bernstein Ratner, N., & Pye, C. (1984). Higher pitch in BT is not universal: Acoustic evidence from Quiche Mayan. Journal of Child Language, 11, 515-522.
Best, C., Levitt, A., & McRoberts, G. (1991). Examination of language-specific influences in infants' discrimination of prosodic categories. Actes du XIIème Congrès International des Sciences Phonétiques (pp. 162-165). Aix-en-Provence, France: Université de Provence Service des Publications.
Best, C., McRoberts, G., & Sithole, N. (1988). The phonological basis of perceptual loss for non-native contrasts: Maintenance of discrimination among Zulu clicks by English-speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance, 14, 345-360.
Boysson-Bardies, B. de, Sagart, L., & Durand, C. (1984). Discernible differences in the babbling of infants according to target language. Journal of Child Language, 11, 1-15.
Boysson-Bardies, B. de, Hallé, P., Sagart, L., & Durand, C. (1989). A crosslinguistic investigation of vowel formants in babbling. Journal of Child Language, 16, 1-17.
Brown, R. (1977). Introduction. In C. Snow & C. Ferguson (Eds.), Talking to children: Language input and acquisition. Cambridge: Cambridge University Press.
Bull, D., Eilers, E., & Oller, D. (1984). Infants' discrimination of intensity variation in multisyllabic contexts. Journal of the Acoustical Society of America, 76, 1-13.
Bull, D., Eilers, E., & Oller, D. (1985). Infants' discrimination of final syllable fundamental frequency in multisyllabic stimuli. Journal of the Acoustical Society of America, 77, 289-295.
Crystal, T., & House, A. (1990). Articulation rate and the duration of syllables and stress groups in connected speech. Journal of the Acoustical Society of America, 88, 101-112.
Delattre, P. (1961). La leçon d'intonation de Simone de Beauvoir, étude d'intonation déclarative comparée. The French Review, 35, 59-67.
Delattre, P. (1965). Comparing the phonetic features of English, French, German and Spanish: An interim report. Philadelphia: Chilton.
D'Odorico, L., & Franco, F. (1991). Selective production of vocalization types in different communicative contexts. Journal of Child Language, 18, 475-499.
Eilers, E., Bull, D., Oller, D., & Lewis, D. (1984). The discrimination of vowel duration by infants. Journal of the Acoustical Society of America, 75, 213-218.
Ferguson, C. (1964). Baby talk in six languages. American Anthropologist, 66, 103-114.
Ferguson, C. (1978). Talking to children: A search for universals. In J. Greenberg, C. Ferguson, & E. Moravcsik (Eds.), Universals of human language (pp. 203-224). Stanford: Stanford University Press.
Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and Development, 8, 181-195.
Fernald, A., & Kuhl, P. (1987). Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development, 10, 279-293.
Fernald, A., & Mazzie, C. (1983, April). Pitch marking of new and old information in mothers' speech. Paper presented at the meeting of the Society for Research in Child Development, Detroit.
Fernald, A., Taeschner, T., Dunn, J., Papoušek, M., Boysson-Bardies, B. de, & Fukui, I. (1989). A cross-language study of prosodic modifications in mothers' and fathers' speech to preverbal infants. Journal of Child Language, 16, 477-501.
Fletcher, J. (1991). Rhythm and final lengthening in French. Journal of Phonetics, 19, 193-212.
Fowler, C., Smith, M., & Tassinary, L. (1985). Perception of syllable timing by prebabbling infants. Journal of the Acoustical Society of America, 79, 814-825.
Galligan, R. (1987). Intonation with single words: Purposive and grammatical use. Journal of Child Language, 14, 1-21.
Garnica, O. (1977). On some prosodic and paralinguistic features of speech to young children. In C. Snow & C. Ferguson (Eds.), Talking to children: Language input and acquisition. Cambridge: Cambridge University Press.
Haynes, L., & Cooper, R. (1986). A note on Ferguson's proposed baby-talk universals. The Fergusonian impact: Papers in honor

fl,:r 1/477 5 6 0 LHII The Acquisition of Prosody: Evidence_from French- and English-learning Infants 49

of Charles A. Ferguson on the Occasion of his 65th Birthday, registers of adult conversation and foreign language Berlin: Mouton de Gruyter. instruction. Applied Psycholinguistics, 12, 481-504. Heath, S. B. (1983). Ways with words. Cambridge: Cambridge Papoutek, M., Papoutek, H., & Symmes, D., (1991). The meaning University Press. of melodies in motherese in tone and stress languages. Infant Hirsh-Pasek, K. Kemler-Nelson, D. G Jusczyk, P. W., Wright, K., Behavior and Development, 14, 415.1440. Druss, B., & Kennedy, L. (1987). Clauses are perceptual units Sachs, J. (1977). The adaptive significance of linguistic input to for young infants. Cognition, 26, 269-286. prelinguistic infants. In C. Snow & C. Ferguson, (Eds.), Talking Jusczyk, P., & Thompson, E. (1978). Perception of a phonetic to children: Language input and acquisition. Cambridge: contrast in multisyllabic utterances by 2-month-oldinfants. Cambridge University Press. Schaffer, D. (1983). The role of intonation as a cue to turn taking in Perception and Psychophysics, 23, 105-109. Kent, R. (1976). Anatomical and neuromuscular maturationof the conversation. Journal of Phonetics, 11, 243-257. speech mechanism: Evidence from acoustic studies. Journal of Scherer, K., Ladd, D., & Silverman, K. (1984). Vocal cues to Speech and Hearing Research. :9, 421-445. speaker affect: Testing two models. Journal of the Acoustical Kessen, W., Levine, J., & Wendrick, K. (1979). The imitation of Society of America, 76,1346-1356. pitch in infants. Infant Behavior and Development, 2, 93-100. Smith, B. (1978). Temporal aspects of English speech production: Levitt, A. (1991). Reiterant speech as a test of nonnative speakers' A developmental perspective. Journal of Phonetics, 6, 37-67. mastery of the timing of French. Journal of the Acoustical Society Spring, D., & Dale, P. (1977). The discrimination of linguistic stress in early infancy. Journal of Speech and Hearing Research,20, of America, 90, 3008-3018. Levitt, A., & Utman, J. 
(1991). From babbling towards the sound 224-231. systems of English and French: A longitudinal two-case study. Stern, D. N., Spieker, S., Barnett, R. K., & MacKain, K. (1983). The Journal of Child Language, 19, 19-49. prosody of maternal speech: Infant age and context-related Levitt, A., & Wang, Q. (1991). Evidence for language-specific changes. Journal of Child Language, 10, 1-15. rhythmic influences in the reduplicative babbling of French- Werker, J. F. (1989). Becoming a native listener. American Scientist, and English-learning infants. Language and Speech, 34, 235-249. 77, 54-59. Li, C., & Thompson, S. (1977). 'The acquisition of tone in Werker, J. F., & Lalonde, C. E. (1988). Cross-language speech Mandarin-speaking children. Journal of Child Language, 4, 185- perception: Initial capabilities and developmental change. 199. Developmental Psychology, 24, 672-683. Locke, J. L. (1983). Phonological acquisition and change. New York: Werker, J., & Tees, R. C. (1984). Cross-language speech Academic Press. perception: Evidence for perceptual reorganization during the Maddieson, 1. (1984). Patterns of sounds. New York: Cambridge first year of life. Infant Behavior and Development, 7, 49-63. University Press. Whalen, D. H., Levitt, A., & Wang, Q. (1991). Intonational Malsheen, B. (1980). Two hypotheses for phonetic clarification in differences between the reduplicative babbling of French- and the speech of mothers to children. In G. Yeni-Komshian, J. F. English-learning infants. Journal of Child Language, 18, 501-516. Kavanagh, & C.A. Ferguson (Eds.), Child phonology: Vol. 1. Perception. New York: Academic Press. FOOTNOTES Marcos, H. (1987). Communicative functions of pitch range and pitch direction in infants. Journal of Child Language, 14, 255-268. *Appears in Proceedings of the NATO Advanced Research Mehler, J., Jusczyk. P., Larnbertz, G., Halsted, N., Bertoncini, J., & Workshop on Changes in Speech and Face Processing in Infancy: Amiel-Tison, C. (1988). 
A precursor of language acquisition in A Glimpse at Developmental Mechanisms of Cognition. Carry- young infants. Cognition, 29, 143-178. le-Rouet, France, June 29-July3,1992. Dordrecht, The Papousek, M., & Hwang, S.-F. (1991). Tone and intonation in Netherlands: Kluwer Academic Publishers. Mandarin babytalk to presyllabic infants: Comparison with t Also Wellesley College, Wellesley. Haskins Laboratories Status Report on Speech Research 1993, SR-173, 51-62

Dynamics and Articulatory Phonology*

Catherine P. Browman and Louis Goldstein†

1. INTRODUCTION

Traditionally, the study of human speech and its patterning has been approached in two different ways. One way has been to consider it as mechanical or bio-mechanical activity (e.g., of articulators or air molecules or cochlear hair cells) that changes continuously in time. The other way has been to consider it as a linguistic (or cognitive) structure consisting of a sequence of elements chosen from a closed inventory. Development of the tools required to describe speech in one or the other of these approaches has proceeded largely in parallel, with one hardly informing the other at all (some notable exceptions are discussed below). As a result, speech has been seen as having two structures, one considered physical and the other cognitive, where the relation between the two structures is generally not an intrinsic part of either description. From this perspective, a complete picture requires 'translating' between the intrinsically incommensurate domains (as argued by Fowler, Rubin, Remez, & Turvey, 1980).

The research we have been pursuing (Browman & Goldstein, 1986, 1989, 1990a, b, 1992), 'articulatory phonology,' begins with the very different assumption that these apparently different domains are, in fact, the low and high dimensional descriptions of a single (complex) system. Crucial to this approach is the identification of phonological units with dynamically specified units of articulatory action, called gestures. Thus, an utterance is described as an act that can be decomposed into a small number of primitive units (a low dimensional description), in a particular spatio-temporal configuration. The same description also provides an intrinsic specification of the high dimensional properties of the act (its various mechanical and bio-mechanical consequences).

In this chapter, we will briefly examine the nature of the low and high dimensional descriptions of speech, and contrast the dynamical perspective that unifies these with other approaches in which they are separated as properties of mind and body. We will then review some of the basic assumptions and results of developing a specific model incorporating dynamical units, and illustrate how it provides both low and high dimensional descriptions.

2. DIMENSIONALITY OF DESCRIPTION

Human speech events can be seen as quite complex, in the sense that an individual utterance follows a continuous trajectory through a space defined by a large number of potential degrees of freedom, or dimensions. This is true whether the dimensions are neural, articulatory, acoustic, aerodynamic, auditory, or other ways of describing the event. The fundamental insight of phonology, however, is that the pronunciations of the words in a given language may differ from (that is, contrast with) each other in only a restricted number of ways: the number of degrees of freedom actually employed in this contrastive behavior is far fewer than the number that is mechanically available. This insight has taken the form of the hypothesis that words can be decomposed into a small number of primitive units (usually far fewer than one hundred in a given language) which can combine in different ways to form the large number of words required in human lexicons. Thus, as argued by Kelso, Saltzman, and Tuller (1986), human speech is characterized not only by a high number of potential (microscopic) degrees of freedom, but also by a low dimensional (macroscopic) form. This macroscopic form is

*This work was supported by NSF grant DBS-9112198 and NIH grants HD-01994 and DC-00121 to Haskins Laboratories. Thanks to Alice Faber and Jeff Shaw for comments on an earlier version.


usually called the 'phonological' form. As will be suggested below, this collapse of degrees of freedom can possibly be understood as an instance of the kind of self-organization found in other complex systems in nature (Haken, 1977; Kauffman, 1991; Kugler & Turvey, 1987; Madore & Freedman, 1987; Schöner & Kelso, 1988).

Historically, however, the gross differences between the macroscopic and microscopic scales of description have led researchers to ignore one or the other description, or to assert its irrelevance, and hence to generally separate the cognitive and the physical. Anderson (1974) describes how the development of tools in the nineteenth and early twentieth centuries led to the quantification of more and more details of the speech signal, but "with such increasingly precise description, however, came the realization that much of it was irrelevant to the central tasks of linguistic science" (p. 4). Indeed, the development of many early phonological theories (e.g., those of Saussure, Trubetzkoy, Sapir, Bloomfield) proceeded largely without any substantive investigation of the measurable properties of the speech event at all (although Anderson notes Bloomfield's insistence that the smallest phonological units must ultimately be defined in terms of some measurable properties of the speech signal). In general, what was seen as important about phonological units was their function, their ability to distinguish utterances.

A particularly telling insight into this view of the lack of relation between the phonological and physical descriptions can be seen in Hockett's (1955) familiar Easter egg analogy. The structure serving to distinguish utterances (for Hockett, a sequence of letter-sized phonological units called phonemes) was viewed as a row of colored, but unboiled, Easter eggs on a moving belt. The physical structure (for Hockett, the acoustic signal) was imagined to be the result of running the belt through a wringer, effectively smashing the eggs and intermixing them. It is quite striking that, in this analogy, the cognitive structure of the speech event cannot be seen in the gooey mess itself. For Hockett, the only way the hearer can respond to the event is to infer (on the basis of obscured evidence, and knowledge of possible egg sequences) what sequence of eggs might have been responsible for the mess. It is clear that in this view, the relation between cognitive and physical descriptions is neither systematic nor particularly interesting. The descriptions share color as an important attribute, but beyond that there is little relation.

A major approach that did take seriously the goal of unifying the cognitive and physical aspects of speech description was that in the Sound Pattern of English (Chomsky & Halle, 1968), including the associated work on the development of the theory of distinctive features (Jakobson, Fant, & Halle, 1951) and the quantal relations that underlie them (Stevens, 1972, 1989). In this approach, an utterance is assigned two representations: a 'phonological' one, whose goal is to describe how the utterance functions with respect to contrast and patterns of alternation, and a 'phonetic' one, whose goal is to account for the grammatically determined physical properties of the utterance. Crucially, however, the relation between the representations is quite constrained: both descriptions employ exactly the same set of dimensions (the features). The phonological representation is coarser in that features may take on only binary values, while the phonetic representation is more fine-grained, with the features having scalar values. However, a principled relation between the binary values and the scales is also provided: Stevens' quantal theory attempts to show how the potential continuum of scalar feature values can be intrinsically partitioned into categorical regions, when the mapping from articulatory dimensions to auditory properties is considered. Further, the existence of such quantal relations is used to explain why languages employ these particular features in the first place.

Problems raised with this approach to speech description soon led to its abandonment, however. One problem is that its phonetic representations were shown to be inadequate to capture certain systematic physical differences between utterances in different languages (Keating, 1985; Ladefoged, 1980; Port, 1981). The scales used in the phonetic representations are themselves of reduced dimensionality, when compared to a complete physical description of utterances. Chomsky and Halle hypothesized that such further details could be supplied by universal rules. However, the above authors (also Browman & Goldstein, 1986) argued that this would not work: the same phonetic representation (in the Chomsky and Halle sense) can have different physical properties in different languages. Thus, more of the physical detail (and particularly details having to do with timing) would have to be specified as part of the description of a particular language. Ladefoged's (1980) argument cut even deeper. He argued that there is a system of scales that is useful for characterizing the measurable articulatory and acoustic properties of utterances, but that these scales are very different from the features proposed by Chomsky and Halle.

One response to these failings has been to hypothesize that descriptions of speech should include, in addition to phonological rules of the usual sort, rules that take (cognitive) phonological representations as input and convert them to physical parameterizations of various sorts. These rules have been described as rules of 'phonetic implementation' (e.g., Keating, 1985, 1990; Klatt, 1976; Liberman & Pierrehumbert, 1984; Pierrehumbert, 1990; Port, 1981). Note that in this view, the description of speech is divided into two separate domains, involving distinct types of representations: the phonological or cognitive structure and the phonetic or physical structure. This explicit partitioning of the speech side of linguistic structure into separate phonetic and phonological components, which employ distinct data types related to one another only through rules of phonetic implementation (or 'interpretation'), has stimulated a good deal of research (e.g., Cohn, 1990; Coleman, 1992; Fourakis & Port, 1986; Keating, 1988; Liberman & Pierrehumbert, 1984). However, there is a major price to be paid for drawing such a strict separation: it becomes very easy to view phonetic and phonological (physical and cognitive) structures as essentially independent of one another, with no interaction or mutual constraint. As Clements (1992) describes the problem: "The result is that the relation between the phonological and phonetic components is quite unconstrained. Since there is little resemblance between them, it does not matter very much for the purposes of phonetic interpretation what the form of the phonological input is; virtually any phonological description can serve its purposes equally well" (p. 192). Yet, there is a constrained relation between the cognitive and physical structures of speech, which is what drove the development of feature theory in the first place.

In our view, the relation between the physical and cognitive, i.e., phonetic and phonological, aspects of speech is inherently constrained by their being simply two levels of description, the microscopic and macroscopic, of the same system. Moreover, we have argued that the relation between microscopic and macroscopic properties of speech is one of mutual or reciprocal constraint (Browman & Goldstein, 1990b). As we elaborated there, the existence of such reciprocity is supported by two different lines of research. One line has attempted to show how the macroscopic properties of contrast and combination of phonological units arise from, or are constrained by, the microscopic, i.e., the detailed properties of speech articulation and the relations among speech articulation, aerodynamics, acoustics, and audition (e.g., Lindblom, MacNeilage, & Studdert-Kennedy, 1983; Ohala, 1983; Stevens, 1972, 1989). A second line has shown that there are constraints running in the opposite direction, such that the (microscopic) detailed articulatory or acoustic properties of particular phonological units are determined, in part, by the macroscopic system of contrast and combination found in a particular language (e.g., Keating, 1990; Ladefoged, 1982; Manuel & Krakow, 1984; Wood, 1982). The apparent existence of this bi-directionality is of considerable interest, because recent studies of the generic properties of complex physical systems have demonstrated that reciprocal constraint between macroscopic and microscopic scales is a hallmark of systems displaying 'self-organization' (Kugler & Turvey, 1987; see also discussions by Langton in Lewin, 1992 [pp. 12-14; 188-191], and work on the emergent properties of "co-evolving" complex systems: Hogeweg, 1989; Kauffman, 1989; Kauffman & Johnsen, 1991; Packard, 1989). Such self-organizing systems (hypothesized as underlying such diverse phenomena as the construction of insect nests and evolutionary and ecological dynamics) display the property that the 'local' interactions among a large number of microscopic system components can lead to emergent patterns of 'global' organization and order. The emergent global organization also places constraints on the components and their local interactions. Thus, self-organization provides a principled linkage between descriptions of different dimensionality of the same system: the high-dimensional description (with many degrees of freedom) of the local interactions, and the low-dimensional description (with few degrees of freedom) of the emergent global patterns. From this point of view, then, speech can be viewed as a single complex system (with low-dimensional macroscopic and high-dimensional microscopic properties) rather than as two distinct components.

A different recent attempt to articulate the nature of the constraints holding between the cognitive and physical structures can be found in Pierrehumbert (1990), in which the relation between the structures is argued to be a 'semantic' one, parallel to the relation that obtains between concepts and their real world denotations. In this view, macroscopic structure is constrained by the microscopic properties of speech and by the principles guiding human cognitive category-formation. However, the view fails to account for the apparent bi-directionality of the constraints. That is, there is no possibility of constraining the microscopic properties of speech by its macroscopic properties in this view. (For a discussion of possible limitations to a dynamic approach to phonology, see Pierrehumbert & Pierrehumbert, 1990.)

The 'articulatory phonology' that we have been developing (e.g., Browman & Goldstein, 1986, 1989, 1992) attempts to understand phonology (the cognitive) as the low-dimensional macroscopic description of a physical system. In this work, rather than rejecting Chomsky and Halle's constrained relation between the physical and cognitive, as the phonetic implementation approaches have done, we have, if anything, increased the hypothesized tightness of that relation by using the concept of different dimensionality. We have surmised that the problem with the program proposed by Chomsky and Halle was instead in their choice of the elementary units of the system. In particular, we have argued that it is wrong to assume that the elementary units are (1) static, (2) neutral between articulation and acoustics, and (3) arranged in non-overlapping chunks. Assumptions (1) and (3) have been argued against by Fowler et al. (1980), and (3) has also been rejected by most of the work in 'nonlinear' phonology over the past 15 years. Assumption (2) has been, at least partially, rejected in the 'active articulator' version of 'feature geometry': Halle (1982), Sagey (1986), McCarthy (1988).

3. GESTURES

Articulatory phonology takes seriously the view that the units of speech production are actions, and therefore that (1) they are dynamic, not static. Further, since articulatory phonology considers phonological functions such as contrast to be low-dimensional, macroscopic descriptions of such actions, the basic units are (2) not neutral between articulation and acoustics, but rather are articulatory in nature. Thus, in articulatory phonology, the basic phonological unit is the articulatory gesture, which is defined as a dynamical system specified with a characteristic set of parameter values (see Saltzman, in press). Finally, because the tasks are distributed across the various articulator sets of the vocal tract (the lips, tongue, glottis, velum, etc.), an utterance is modeled as an ensemble, or constellation, of a small number of (3) potentially overlapping gestural units.

As will be elaborated below, contrast among utterances can be defined in terms of these gestural constellations. Thus, these structures can capture the low-dimensional properties of utterances. In addition, because each gesture is defined as a dynamical system, no rules of implementation are required to characterize the high-dimensional properties of the utterance. A time-varying pattern of articulator motion (and its resulting acoustic consequences) is lawfully entailed by the dynamical systems themselves: they are self-implementing. Moreover, these time-varying patterns automatically display the property of context dependence (which is ubiquitous in the high dimensional description of speech) even though the gestures are defined in a context-independent fashion. The nature of the articulatory dimensions along which the individual dynamical units are defined allows this context dependence to emerge lawfully.

The articulatory phonology approach has been incorporated into a computational system being developed at Haskins Laboratories (Browman & Goldstein, 1990a, c; Browman, Goldstein, Kelso, Rubin, & Saltzman, 1984; Saltzman, 1986; Saltzman & Munhall, 1989). In this system, illustrated in Figure 1, utterances are organized ensembles (or constellations) of units of articulatory action called gestures. Each gesture is modeled as a dynamical system that characterizes the formation (and release) of a local constriction within the vocal tract (the gesture's functional goal or 'task'). For example, the word "ban" begins with a gesture whose task is lip closure.

[Figure 1 shows the intended utterance feeding a linguistic gestural model, whose output drives a task dynamic model; the resulting articulatory trajectories drive a vocal tract model, which produces the output speech.]

Figure 1. Computational system for generating speech using dynamically-defined articulatory gestures.

The formation of this constriction entails a change in the distance between upper and lower lips (or Lip Aperture) over time. This change is modeled using a second order system (a 'point attractor,' Abraham & Shaw, 1982), specified with particular values for the equilibrium position and stiffness parameters. (Damping is, for the most part, assumed to be critical, so that the system approaches its equilibrium position and doesn't overshoot it.) During the activation interval for this gesture, the equilibrium position for Lip Aperture is set to the goal value for lip closure; the stiffness setting, combined with the damping, determines the amount of time it will take for the system to get close to the goal of lip closure.

The set of task or tract variables currently implemented in the computational model is listed at the top left of Figure 2, and the sagittal vocal tract shape below illustrates their geometric definitions. This set of tract variables is hypothesized to be sufficient for characterizing most of the gestures of English (exceptions involve the details of characteristic shaping of constrictions; see Browman & Goldstein, 1989). For oral gestures, two paired tract variable regimes are specified, one controlling the constriction degree of a particular structure, the other its constriction location (a tract variable regime consists of a set of values for the dynamic parameters of stiffness, equilibrium position, and damping ratio). Thus, the specification for an oral gesture includes an equilibrium position, or goal, for each of two tract variables, as well as a stiffness (which is currently yoked across the two tract variables). Each functional goal for a gesture is achieved by the coordinated action of a set of articulators, that is, a coordinative structure (Fowler et al., 1980; Kelso, Saltzman, & Tuller, 1986; Saltzman, 1986; Turvey, 1977); the sets of articulators used for each of the tract variables are shown on the top right of Figure 2, with the articulators indicated on the outline of the vocal tract model below. Note that the same articulators are shared by both of the paired oral tract variables, so that altogether there are five distinct articulator sets, or coordinative structure types, in the system.
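To make the point-attractor description concrete, the critically damped second order regime can be sketched in a few lines of code. This is only an illustrative toy integration, not the Haskins task-dynamic model: the integration scheme (semi-implicit Euler), the function name, and the stiffness and aperture values are all our own choices for the example.

```python
import math

def gesture_trajectory(x0, goal, k, dt=0.001, steps=300):
    """Integrate a critically damped second order 'point attractor':
    x'' = -k * (x - goal) - b * x', with b = 2 * sqrt(k) (critical damping).
    Returns the sampled tract-variable trajectory (e.g., Lip Aperture)."""
    b = 2.0 * math.sqrt(k)               # critical damping: no overshoot
    x, v = x0, 0.0                       # start at rest at the initial aperture
    traj = [x]
    for _ in range(steps):
        a = -k * (x - goal) - b * v      # acceleration from the dynamical regime
        v += a * dt                      # semi-implicit Euler step
        x += v * dt
        traj.append(x)
    return traj

# Lip Aperture moving from an open position (10, arbitrary units) toward
# closure (0).  The stiffness k controls how quickly the goal is approached.
traj = gesture_trajectory(x0=10.0, goal=0.0, k=400.0)
assert traj[0] == 10.0
assert min(traj) >= -1e-6   # critically damped: never overshoots the goal
assert traj[-1] < 0.5       # settles near the equilibrium position
```

Raising the stiffness while keeping the damping critical shortens the settling time without introducing oscillation, which is exactly the role the text assigns to the stiffness parameter.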

tract variable                            articulators involved
LP    lip protrusion                      upper & lower lips, jaw
LA    lip aperture                        upper & lower lips, jaw
TTCL  tongue tip constrict location       tongue tip, tongue body, jaw
TTCD  tongue tip constrict degree         tongue tip, tongue body, jaw
TBCL  tongue body constrict location      tongue body, jaw
TBCD  tongue body constrict degree        tongue body, jaw
VEL   velic aperture                      velum
GLO   glottal aperture                    glottis

[Below the table, Figure 2 shows a sagittal outline of the vocal tract model with the tract variables marked at the lips, tongue tip, tongue body center, velum, and glottis.]

Figure 2. Tract variables and their associated articulators.
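The pairing of tract variables with articulator sets in Figure 2 can be expressed as a simple lookup structure. The sketch below is purely illustrative (the names and encoding are ours, not the computational model's); it merely verifies the observation in the text that the eight tract variables collapse onto five distinct articulator sets, or coordinative structure types.

```python
# Tract variables from Figure 2, each paired with its articulator set.
TRACT_VARIABLES = {
    "LP":   ("lip protrusion",                 {"upper lip", "lower lip", "jaw"}),
    "LA":   ("lip aperture",                   {"upper lip", "lower lip", "jaw"}),
    "TTCL": ("tongue tip constrict location",  {"tongue tip", "tongue body", "jaw"}),
    "TTCD": ("tongue tip constrict degree",    {"tongue tip", "tongue body", "jaw"}),
    "TBCL": ("tongue body constrict location", {"tongue body", "jaw"}),
    "TBCD": ("tongue body constrict degree",   {"tongue body", "jaw"}),
    "VEL":  ("velic aperture",                 {"velum"}),
    "GLO":  ("glottal aperture",               {"glottis"}),
}

def articulator_sets():
    """Distinct articulator sets (coordinative structure types)."""
    return {frozenset(arts) for _, arts in TRACT_VARIABLES.values()}

# Paired oral tract variables share their articulators, so eight tract
# variables yield only five coordinative structure types.
assert len(articulator_sets()) == 5
```

Because the paired oral variables (e.g., TTCL/TTCD) deduplicate to a single articulator set, the count comes out to five, matching the text.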

In the computational system, the articulators are those of a vocal tract model (Rubin, Baer, & Mermelstein, 1981) that can generate speech waveforms from a specification of the positions of individual articulators. When a dynamical system (or pair of them) corresponding to a particular gesture is imposed on the vocal tract, the task-dynamic model (Saltzman, in press; Saltzman, 1986; Saltzman & Kelso, 1987; Saltzman & Munhall, 1989) calculates the time-varying trajectories of the individual articulators comprising that coordinative structure, based on the information about values of the dynamic parameters, etc., contained in its input. These articulator trajectories are input to the vocal tract model, which then calculates the resulting global vocal tract shape, area function, transfer function, and speech waveform (see Figure 1).

Defining gestures dynamically can provide a principled link between macroscopic and microscopic properties of speech. To illustrate some of the ways in which this is true, consider the example of lip closure. The values of the dynamic parameters associated with a lip closure gesture are macroscopic properties that define it as a phonological unit and allow it to contrast with other gestures such as the narrowing gesture for [w]. These values are definitional, and remain invariant as long as the gesture is active. At the same time, however, the gesture intrinsically specifies the (microscopic) patterns of continuous change that the lips can exhibit over time. These changes emerge as the lawful consequences of the dynamical system, its parameters, and the initial conditions. Thus, dynamically defined gestures provide a lawful link between macroscopic and microscopic properties.

While tract variable goals are specified numerically, and in principle could take on any real value, the actual values used to specify the gestures of English in the model cluster in narrow ranges that correspond to contrastive categories: for example, in the case of constriction degree, different ranges are found for gestures that correspond to what are usually referred to as stops, fricatives, and approximants. Thus, paradigmatic comparison (or a density distribution) of the numerical specifications of all English gestures would reveal a macroscopic structure of contrastive categories. The existence of such narrow ranges is predicted by approaches such as the quantal theory (e.g., Stevens, 1989) and the theory of adaptive dispersion (e.g., Lindblom, MacNeilage, & Studdert-Kennedy, 1983), although the dimensions investigated in those approaches are not identical to the tract variable dimensions. These approaches can be seen as accounting for how microscopic continua are partitioned into a small number of macroscopic categories.

The physical properties of a given phonological unit vary considerably depending on its context (e.g., Kent & Minifie, 1977; Öhman, 1966; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). Much of this context dependence emerges lawfully from the use of task dynamics. An example of this kind of context dependence in lip closure gestures can be seen in the fact that the three independent articulators that can contribute to closing the lips (upper lip, lower lip, and jaw) do so to different extents as a function of the vowel environment in which the lip closure is produced (Macchi, 1988; Sussman, MacNeilage, & Hanson, 1973). The value of lip aperture achieved, however, remains relatively invariant no matter what the vowel context. In the task-dynamic model, the articulator variation results automatically from the fact that the lip closure gesture is modeled as a coordinative structure that links the movements of the three articulators in achieving the lip closure task. The gesture is specified invariantly in terms of the tract variable of lip aperture, but the closing action is distributed across component articulators in a context-dependent way. For example, in an utterance like [ibi], the lip closure is produced concurrently with the tongue gesture for a high front vowel. This vowel gesture will tend to raise the jaw, and thus less activity of the upper and lower lips will be required to effect the lip closure goal than in an utterance like [aba]. These microscopic variations emerge lawfully from the task dynamic specification of the gestures, combined with the fact of overlap (Kelso, Saltzman, & Tuller, 1986; Saltzman & Munhall, 1989).

4. GESTURAL STRUCTURES

During the act of talking, more than one gesture is activated, sometimes sequentially and sometimes in an overlapping fashion. Recurrent patterns of gestures are considered to be organized into gestural constellations. In the computational model (see Figure 1), the linguistic gestural model determines the relevant constellations for any arbitrary input utterance, including the phasing of the gestures. That is, a constellation of gestures is a set of gestures that are coordinated with one another by means of phasing, where for this purpose (and this purpose only), the dynamical regime for each gesture is treated as if it were a cycle of an undamped system with the same stiffness as the actual regime. In this way, any characteristic point in the motion of the system can be identified with a phase of this virtual cycle. For example, the movement onset of a gesture is at phase 0 degrees, while the achievement of the constriction goal (the point at which the critically damped system gets sufficiently close to the equilibrium position) occurs at phase 240 degrees. Pairs of gestures are coordinated by specifying the phases of the two gestures that are synchronous. For example, two gestures could be phased so that their movement onsets are synchronous (0 degrees phased to 0 degrees), or so that the movement onset of one is phased to the goal achievement of another (0 degrees phased to 240 degrees), etc. Generalizations that characterize some phase relations in the gestural constellations of English words are proposed in Browman and Goldstein (1990c). As is the case for the values of the dynamic parameters, values of the synchronized phases also appear to cluster in narrow ranges, with onset of movement (0 degrees) and achievement of goal (240 degrees) being the most common (Browman & Goldstein, 1990a).

The linguistic gestural model uses these phase relations among the gestures to calculate a gestural score that specifies the temporal activation intervals for each gesture in an utterance. One form of this gestural score for "pawn" is shown in Figure 3b, with the horizontal extent of each box indicating its activation interval, and the lines between boxes indicating which gesture is phased with respect to which other gesture(s), as before. Note that there is substantial overlap among the gestures. This kind of overlap can result in certain types of context dependence in the articulatory trajectories of the invariantly specified gestures. In addition, overlap can cause the kinds of acoustic variation that have been traditionally described as allophonic variation. For example, in this case, note the substantial overlap between the velic lowering gesture (velum {wide}) and the gesture for the vowel (tongue body {narrow pharyngeal}). This will result in an interval of time during which the velo-pharyngeal port is open and the vocal tract is in position for the vowel, that is, a nasalized vowel. Traditionally, the fact of nasalization has been represented by a
rule that changes an oral vowel into a nasalized An example of a gestural constellation (for the one before a (final) nasal consonant. But viewed in word "pawn" as pronounced with the back un- terms of gestural constellations, this nasalization rounded vowel characteristic of much of the U.S.) is just the lawful consequence of how the individ- is shown in Figure 3a, which gives an idea of the ual gestures are coordinated. The vowel gesture kind of information contained in the gestural dic- itself hasn't changed in any way: it has the same tionary. Each row, or tier, shows the gestures that specification in this word and in the word "pawed" control the distinct articulator sets: velum, tongue (which is not nasalized). tip, tongue body, lips, and glottis. The gestures are The parameter valuespecifications and represented here by descriptors, each of which activation intervals from the gestural score are stands for a numerical equilibrium position value input to the task dynamic model (Figure 1), which assigned to a tract variable. In the case or the oral calculates the time-varying response of the tract gestures, there are two descriptors, one for each of variables and component articulators to the the paired tract variables. For example, for the imposition of the dynamical regimes defined tongue tip gesture labelled Iclo alv), Idol stands by the gestural score. Some of the time-varying for -3.5 mm (negative value indicates compression responses are shown in Figure 3c, along with of the surfaces), and lalv) stands for 56 degrees the same boxes indicating the activation intervals (where 90 degrees is vertical and would corre- for the gestures. Note that the movement curves spond to a midpalatal constriction). The associa- change over time even when a tract variable is tion lines connect gestures that are phased with not under the active control of some gesture. respect to one another. 
For example, the tongue Such motion can be seen, for example, in the LIPS tip Icic alv) gesture and the velum (wide) gesture panel, after the end of the box for the lip closure (for nasalization) are phased such that the point gesture. This motion results from one or indicating 0 degreesonset of movementof the both of two sources. (1) When an articulator is not tongue tip closure gesture is synchronized with part of any active gesture, the articulator returns the point indicating 240 degreesachievement of to a neutral position. In the example, the upper lip goalof the velic gesture. and the lower lip articulators both are returning Each gesture is assumed to be active for a fixed to a neutral position after the end of the lip proportion of its virtual cycle (the proportion is closure gesture. (2) One of the articulators linked different for consonant and vowel gestures). The to the inactive tract variable may also be linked linguistic gestural model uses this proportion, to some active tract variable, and thus cause along with the stiffness of each gesture and the passive changes in the inactive tract variable. 58 Brotuman and Goldstein
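The dynamical regime described above can be sketched numerically. The following fragment is illustrative only: the stiffness value, the targets, and the integration scheme are invented for this sketch and are not taken from the Haskins model. It simulates a single lip closure gesture as a critically damped second-order system and locates the 240-degree point of the corresponding undamped virtual cycle, the point identified in the text with goal achievement.

```python
import math

def gesture_trajectory(x0, target, k, dt=0.001, t_end=0.3):
    """Critically damped second-order regime: x'' = -k (x - target) - b x', with b = 2*sqrt(k)."""
    b = 2.0 * math.sqrt(k)          # critical damping: no overshoot past the goal
    x, v = x0, 0.0
    xs = []
    for _ in range(int(t_end / dt)):
        a = -k * (x - target) - b * v
        v += a * dt                 # semi-implicit Euler step
        x += v * dt
        xs.append(x)
    return xs

def virtual_phase_deg(t, k):
    """Phase along an undamped virtual cycle with the same stiffness (omega = sqrt(k))."""
    return math.degrees(math.sqrt(k) * t) % 360.0

# Lip aperture closing gesture: start open (10 mm), goal slightly compressed (-3.5 mm).
k = 400.0                            # stiffness in 1/s^2, an invented illustrative value
traj = gesture_trajectory(10.0, -3.5, k)

# Time at which the virtual (undamped) cycle reaches 240 degrees: "goal achievement."
t240 = math.radians(240.0) / math.sqrt(k)
x240 = traj[int(t240 / 0.001)]
print(round(t240, 3), round(x240, 2))  # by phase 240 the damped system is near its -3.5 mm goal
```

The phasing conventions in the text then reduce to picking time points on this virtual cycle: synchronizing 0 degrees of one gesture with 240 degrees of another is just equating the corresponding times.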

Figure 3. Various displays from the computational model for "pawn." (a) Gestural descriptors and association lines. (b) Gestural descriptors and association lines plus activation boxes. (c) Gestural descriptors and activation boxes plus the generated movements of (from top to bottom): velic aperture, vertical position of the tongue tip (with respect to the fixed palate/teeth), vertical position of the tongue body (with respect to the fixed palate/teeth), lip aperture, and glottal aperture.
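The activation intervals that Figure 3b encodes as boxes can be represented as a simple table of onset/offset times per gesture. A minimal sketch follows; the interval values are invented for illustration, loosely patterned on the "pawn" score, and are not the model's actual numbers:

```python
# A toy gestural score: (tier, descriptor) -> activation interval in ms.
# All numbers below are invented placeholders, not values from the Haskins model.
score = {
    ("velum", "wide"):              (180, 330),
    ("tongue body", "narrow phar"): (60, 320),
    ("tongue tip", "clo alv"):      (280, 380),
    ("lips", "clo lab"):            (30, 130),
    ("glottis", "wide"):            (20, 120),
}

def overlap(a, b):
    """Return the interval during which two gestures are both active, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

# Interval of vowel nasalization: velic lowering co-active with the vowel gesture.
nasal = overlap(score[("velum", "wide")], score[("tongue body", "narrow phar")])
print(nasal)  # → (180, 320)
```

On this representation, the nasalized-vowel interval discussed in the text is not a rule's output but simply the intersection of two activation intervals.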

In the example, the jaw is part of the coordinative structure for the tongue body vowel gesture, as well as part of the coordinative structure for the lip closure gesture. Therefore, even after the lip closure gesture becomes inactive, the jaw is affected by the vowel gesture, and its lowering for the vowel causes the lower lip to also passively lower.

The gestural constellations not only characterize the microscopic properties of the utterances, as discussed above, but systematic differences among the constellations also define the macroscopic property of phonological contrast in a language. Given the nature of gestural constellations, the possible ways in which they may differ from one another are, in fact, quite constrained. In other papers (e.g., Browman & Goldstein, 1986; 1989; 1992) we have begun to show that gestural structures are suitable for characterizing phonological functions such as contrast, and what the relation is between the view of phonological structure implicit in gestural constellations and that found in other contemporary views of phonology (see also Clements, 1992 for a discussion of these relations). Here we will simply give some examples of how the notion of contrast is defined in a system based on gestures, using the schematic gestural scores in Figure 4.

One way in which constellations may differ is in the presence vs. absence of a gesture. This kind of difference is illustrated by two pairs of subfigures in Figure 4: (4a) vs. (4b) and (4b) vs. (4d). (4a) "pan" differs from (4b) "ban" in having a glottis {wide} gesture (for voicelessness), while (4b) "ban" differs from (4d) "Ann" in having a labial closure gesture (for the initial consonant). Constellations may also differ in the particular tract variable/articulator set controlled by a gesture within the constellation, as illustrated by (4a) "pan" vs. (4c) "tan," which differ in terms of whether it is the lips or the tongue tip that performs the initial closure. A further way in which constellations may differ is illustrated by comparing (4e) "sad" to (4f) "shad," in which the value of the constriction location tract variable for the initial tongue tip constriction is the only difference between the two utterances. Finally, two constellations may contain the same gestures and differ simply in how they are coordinated, as can be seen in (4g) "dab" vs. (4h) "bad."
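Ignoring phasing, the presence-vs-absence contrasts just illustrated can be expressed as set differences over gestures. A toy sketch, in which the gesture inventories are schematic readings of the figure's tier labels rather than the model's full specifications:

```python
# Constellations as sets of (tier, descriptor) gestures; coordination/phasing omitted.
# These inventories are schematic illustrations, not the model's actual specifications.
pan = {("glottis", "wide"), ("lips", "clo lab"), ("tongue body", "wide phar"),
       ("velum", "wide"), ("tongue tip", "clo alv")}
ban = {("lips", "clo lab"), ("tongue body", "wide phar"),
       ("velum", "wide"), ("tongue tip", "clo alv")}
ann = {("tongue body", "wide phar"), ("velum", "wide"), ("tongue tip", "clo alv")}

print(pan - ban)  # → {('glottis', 'wide')}: "pan" vs. "ban" differ in one gesture
print(ban - ann)  # → {('lips', 'clo lab')}: "ban" vs. "Ann" differ in the labial closure
```

Note that this set representation deliberately omits coordination, so it cannot distinguish pairs like (4g) "dab" vs. (4h) "bad," which contain the same gestures and differ only in how they are phased.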

Figure 4. Schematic gestural scores exemplifying contrast. (a) "pan" (b) "ban" (c) "tan" (d) "Ann" (e) "sad" (f) "shad" (g) "dab" (h) "bad."


This chapter described an approach to the description of speech in which both the cognitive and physical aspects of speech are captured by viewing speech as a set of actions, or dynamic tasks, that can be described using different dimensionalities: low dimensional or macroscopic for the cognitive, and high dimensional or microscopic for the physical. A computational model that instantiates this approach to speech was briefly outlined. It was argued that this approach to speech, which is based on dynamical description, has several advantages. First, it captures both the phonological (cognitive) and physical regularities that minimally must be captured in any description of speech. Second, it does so in a way that unifies the two descriptions as descriptions with different dimensionality of a single complex system. The latter attribute means that this approach provides a principled view of the reciprocal constraints that the physical and phonological aspects of speech exhibit.

REFERENCES

Abraham, R. H., & Shaw, C. D. (1982). Dynamics: The geometry of behavior. Santa Cruz, CA: Aerial Press.
Anderson, S. R. (1974). The organization of phonology. New York, NY: Academic Press.
Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219-252.
Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6, 201-251.
Browman, C. P., & Goldstein, L. (1990a). Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics, 18, 299-320.
Browman, C. P., & Goldstein, L. (1990b). Representation and reality: Physical systems and phonological structure. Journal of Phonetics, 18, 411-424.
Browman, C. P., & Goldstein, L. (1990c). Tiers in articulatory phonology, with some implications for casual speech. In J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I (pp. 341-376). Cambridge: Cambridge University Press.
Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49, 155-180.
Browman, C. P., Goldstein, L., Kelso, J. A. S., Rubin, P., & Saltzman, E. (1984). Articulatory synthesis from underlying dynamics. Journal of the Acoustical Society of America, 75, S22-S23(A).
Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York: Harper & Row.
Clements, G. N. (1992). Phonological primes: Features or gestures? Phonetica, 49, 181-193.
Cohn, A. C. (1990). Phonetic and phonological rules of nasalization. UCLA WPP, 76.
Coleman, J. (1992). The phonetic interpretation of headed phonological structures containing overlapping constituents. Phonology, 9, 1-44.
Fourakis, M., & Port, R. (1986). Stop epenthesis in English. Journal of Phonetics, 14, 197-221.
Fowler, C. A., Rubin, P., Remez, R. E., & Turvey, M. T. (1980). Implications for speech production of a general theory of action. In B. Butterworth (Ed.), Language production (pp. 373-420). New York, NY: Academic Press.
Haken, H. (1977). Synergetics: An introduction. Heidelberg: Springer-Verlag.
Halle, M. (1982). On distinctive features and their articulatory implementation. Natural Language and Linguistic Theory, 1, 91-105.
Hockett, C. (1955). A manual of phonology. Chicago: University of Chicago.
Hogeweg, P. (1989). MIRROR beyond MIRROR: Puddles of LIFE. In C. Langton (Ed.), Artificial life (pp. 297-316). New York: Addison-Wesley.
Jakobson, R., Fant, C. G. M., & Halle, M. (1951). Preliminaries to speech analysis: The distinctive features and their correlates. Cambridge, MA: MIT.
Kauffmann, S. (1989). Principles of adaptation in complex systems. In D. Stein (Ed.), Sciences of complexity (pp. 619-711). New York: Addison-Wesley.
Kauffmann, S. (1991). Antichaos and adaptation. Scientific American, 265, 78-84.
Kauffmann, S., & Johnsen, S. (1991). Co-evolution to the edge of chaos: Coupled fitness landscapes, poised states, and co-evolutionary avalanches. In C. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial life II (pp. 325-369). New York: Addison-Wesley.
Keating, P. A. (1985). CV phonology, experimental phonetics, and coarticulation. UCLA WPP, 62, 1-13.
Keating, P. A. (1988). Underspecification in phonetics. Phonology, 5, 275-292.
Keating, P. A. (1990). Phonetic representations in a generative grammar. Journal of Phonetics, 18, 321-334.
Kelso, J. A. S., Saltzman, E. L., & Tuller, B. (1986). The dynamical perspective on speech production: Data and theory. Journal of Phonetics, 14, 29-59.
Kent, R. D., & Minifie, F. D. (1977). Coarticulation in recent speech production models. Journal of Phonetics, 5, 115-133.
Klatt, D. H. (1976). Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. Journal of the Acoustical Society of America, 59, 1208-1221.
Kugler, P. N., & Turvey, M. T. (1987). Information, natural law, and the self-assembly of rhythmic movement. Hillsdale, NJ: Lawrence Erlbaum Associates.
Ladefoged, P. (1980). What are linguistic sounds made of? Language, 56, 485-502.
Ladefoged, P. (1982). A course in phonetics (2nd ed.). New York, NY: Harcourt Brace Jovanovich.
Lewin, R. (1992). Complexity. New York, NY: Macmillan.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.
Liberman, M., & Pierrehumbert, J. (1984). Intonational invariance under changes in pitch range and length. In M. Aronoff, R. T. Oehrle, F. Kelley, & B. Wilker Stephens (Eds.), Language sound structure (pp. 157-233). Cambridge, MA: MIT Press.
Lindblom, B., MacNeilage, P., & Studdert-Kennedy, M. (1983). Self-organizing processes and the explanation of phonological universals. In B. Butterworth, B. Comrie, & Ö. Dahl (Eds.), Explanations of linguistic universals (pp. 181-203). The Hague: Mouton.
Macchi, M. (1988). Labial articulation patterns associated with segmental features and syllable structure in English. Phonetica, 45, 109-121.
Madore, B. F., & Freedman, W. L. (1987). Self-organizing structures. American Scientist, 75, 252-259.
Manuel, S. Y., & Krakow, R. A. (1984). Universal and language particular aspects of vowel-to-vowel coarticulation. Haskins Laboratories Status Report on Speech Research, SR-77/78, 69-78.
McCarthy, J. J. (1988). Feature geometry and dependency: A review. Phonetica, 45, 84-108.
Ohala, J. (1983). The origin of sound patterns in vocal tract constraints. In P. F. MacNeilage (Ed.), The production of speech (pp. 189-216). New York, NY: Springer-Verlag.
Öhman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.
Packard, N. (1989). Intrinsic adaptation in a simple model for evolution. In C. Langton (Ed.), Artificial life (pp. 141-155). New York: Addison-Wesley.
Pierrehumbert, J. (1990). Phonological and phonetic representation. Journal of Phonetics, 18, 375-394.
Pierrehumbert, J. B., & Pierrehumbert, R. T. (1990). On attributing grammars to dynamical systems. Journal of Phonetics, 18.
Port, R. F. (1981). Linguistic timing factors in combination. Journal of the Acoustical Society of America, 69, 262-274.
Rubin, P. E., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 70, 321-328.
Sagey, E. C. (1986). The representation of features and relations in non-linear phonology. Doctoral dissertation, MIT.
Saltzman, E. (1986). Task dynamic coordination of the speech articulators: A preliminary model. In H. Heuer & C. Fromm (Eds.), Generation and modulation of action patterns (pp. 129-144). Berlin/Heidelberg: Springer-Verlag.
Saltzman, E. (in press). Dynamics and coordinate systems in skilled sensorimotor activity. In T. van Gelder & B. Port (Eds.), Mind as motion. Cambridge, MA: MIT Press.
Saltzman, E., & Kelso, J. A. S. (1987). Skilled actions: A task dynamic approach. Psychological Review, 94, 84-106.
Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333-382.
Schöner, G., & Kelso, J. A. S. (1988). Dynamic pattern generation in behavioral and neural systems. Science, 239, 1513-1520.
Stevens, K. N. (1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In E. E. David & P. B. Denes (Eds.), Human communication: A unified view (pp. 51-66). New York, NY: McGraw-Hill.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-45.
Sussman, H. M., MacNeilage, P. F., & Hanson, R. J. (1973). Labial and mandibular dynamics during the production of bilabial consonants: Preliminary observations. Journal of Speech and Hearing Research, 16, 397-420.
Turvey, M. T. (1977). Preliminaries to a theory of action with reference to vision. In R. Shaw & J. Bransford (Eds.), Perceiving, acting and knowing: Toward an ecological psychology. Hillsdale, NJ: Lawrence Erlbaum Associates.
Wood, S. (1982). X-ray and model studies of vowel articulation. Working Papers, Lund University, 23.

FOOTNOTES

*Also Department of , .

Haskins Laboratories Status Report on Speech Research 1993, SR-113, 63-90

Some Organizational Characteristics of Speech Movement Control*

Vincent L. Gracco

The neuromotor organization for a class of speech sounds (bilabials) was examined to evaluate the control principles underlying speech as a sensorimotor process. Oral opening and closing actions for the consonants /p/, /b/, and /m/ (C1) in /s V1 C1 V2 C2/ context, where V1 was either /æ/ or /i/, V2 was /æ/, and C2 was /p/, were analyzed from four subjects. The timing of oral opening and closing action was found to be a significant variable differentiating bilabial consonants. Additionally, opening and closing actions were found to covary along a number of dimensions, implicating the movement cycle as the minimal unit of speech motor programming. The sequential adjustments of the lips and jaw varied systematically with phonetic context, reflecting the different functional roles of these articulators in the production of consonants and vowels. The implication of these findings for speech production is discussed.

INTRODUCTION

As a motor process, speaking is an intricate orchestration of multiple effectors (articulators) coordinated in time and space to produce sound sequences. From a functional perspective, the speech production mechanism is a special purpose device modulating the aerodynamic and resonance properties of the vocal tract by rapidly creating constrictions, occlusions, or overall shape changes. These events provide the foundation for the categorically distinct sounds of the language. The functional task, driven by cognitive considerations, involves a number of sensorimotor processes that generate segmental units, specify the coordination among contributing articulators, scale articulator actions to phonetic and pragmatic contexts, and subsequently sequence these actions into meaningful units for communication.

Assuming that the individual phonetic segments of a language are stored in some manner in the nervous system, a number of important issues can be identified. What are the production units, how are they differentiated for sounds, and how are they modified according to context? The present investigation is an attempt to answer some of these questions and identify some principles of speech motor organization used to scale, coordinate, and sequence multiple articulatory actions.

Movement characteristics: Stop consonants

One focus of the present investigation is to examine, in detail, the kinematic adjustments of lip and jaw motion associated with a class of English speech sounds, the bilabial stop consonants /p/, /b/, /m/, in different vowel contexts. These consonants are the same in their place of articulation (the lips), employing the same set of articulators to create an obstruction at the oral end of the vocal tract. One property distinguishing these sounds is the presence or absence of voicing; /p/ is a voiceless consonant because laryngeal vibration is briefly (100-200 msec) arrested during the oral closing. As the lips release the oral closure, the vocal folds move back together and voicing for the next vowel is initiated (Lisker & Abramson, 1964).

This research was supported by grants DC-00121 and DC-00594 from the National Institute on Deafness and Other Communication Disorders. The author thanks Carol Fowler, Anders Lofqvist, Ignatius Mattingly, and Rudolph Sock for comments on earlier versions of this manuscript. The author also thanks John Folkins for a thorough and constructively critical review of this manuscript.


There are two other classes of stop consonants in English which create temporary obstructions in different regions of the vocal tract and use different articulators: alveolar and velar stop consonants. In addition, each of the alveolar and velar voiced consonants can be produced with the velum lowered, creating the nasal resonances for the /n/ and /ŋ/ sounds, respectively. Thus, English contains three characteristic sets of stop consonants that differ in the primary articulators used to create the obstruction and in the location of the obstruction in the vocal tract.

It is not clear from previous investigations how articulatory movements within a class are modified by concomitant articulatory actions. That is, are the lip and jaw movements for all bilabial consonants similar, with the acoustic distinctions between sounds in the class attributed solely to laryngeal and velar actions? Examination of electromyographic (EMG), kinematic, and acoustic data reveals conflicting findings. Electromyographic activity from lip muscles for bilabial production has generally failed to reveal significant differences in measured variables such as peak EMG amplitude and EMG burst duration associated with the oral closing action (Fromkin, 1966; Harris, Lysaught, & Schvey, 1965; Lubker & Parris, 1970; Tatham & Morton, 1968). The one exception is the study by Sussman, MacNeilage, and Hanson (1973), in which the activity from one upper lip depressor muscle (depressor anguli oris) from one subject was found to be greater for /p/ than for /b/ and /m/. Kinematic studies have revealed a few movement differences for /p/ and /b/, most consistently a tendency for the oral closing movement velocity to be higher for /p/ than /b/ or /m/ (Chen, 1970; Summers, 1987; Sussman et al., 1973). Two of the above mentioned investigations (Sussman et al., 1973; Summers, 1987) have also reported kinematic and EMG differences in movements surrounding the oral closing action. Jaw opening for the vowel /a/ or /æ/ before /p/ was found to be higher than before /b/ (Summers, 1987), and the lower lip opening velocity and associated EMG activity (following the oral occlusion) were greater for /m/ than /p/ or /b/ (Sussman et al., 1973).

Acoustic studies have provided the most reliable findings associated with voicing differences. Closure durations for voiceless consonants are generally longer than closure durations for their voiced cognates, and the preceding vowel is shorter in the voiceless context (Denes, 1955; House & Fairbanks, 1953; Luce & Charles-Luce, 1985). Integrating results from different empirical observations allows for some potential speculation on the voiced/voiceless differences. EMG results suggest that the form of the oral closing motor commands is not different in magnitude or duration due to the presence or absence of voicing. In contrast, the movement and acoustic studies suggest that differences may be focused on movement timing, as evidenced by changes in movement velocity and vowel duration. While the available data suggest that one difference between sounds within an equivalence class such as the bilabials may be reflected in aspects of their timing, no detailed analysis has been conducted.

Another possibility is that articulatory motion may be governed by control principles that operate on combinations of kinematic variables reflecting important system parameters. For example, consonant-related movement differences may be specified by articulator stiffness, a construct with some potential as a motor control variable (Cooke, 1980; Ostry & Munhall, 1985; Kelso et al., 1985). If an effector such as the lower lip is modeled as a linear second-order system, changing the static stiffness of the system results in predictable changes in the velocity/displacement relationship; the velocity/displacement ratio varies directly with stiffness. Moreover, if resting stiffness is a variable being controlled by the nervous system, then changes in an effector's kinematics can be brought about by a single system parameter. It has been suggested that mass-normalized stiffness, estimated from the ratio of a movement's peak velocity and displacement, may be an important control parameter for skilled actions including speech (Kelso et al., 1985; Kelso, 1986). While a number of studies have demonstrated consistent and systematic velocity/displacement relations for a range of speech movements (Kuehn & Moll, 1976; Munhall, Ostry, & Parush, 1985; Ostry, Keller, & Parush, 1983), to date it has not been demonstrated unequivocally that stiffness is an important control parameter, except as one of many possible alternatives (see Nelson, 1983, for example). One possibility is that /p/, with its higher closing velocity, might be differentiated from /b/ and /m/ by its stiffness specification, suggesting that speech sounds may be coded neurophysiologically by such a construct.
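The stiffness construct implies a simple measurement: for a discrete opening or closing movement, mass-normalized stiffness can be estimated as the ratio of peak velocity to total displacement. A sketch of that computation over a sampled position trace follows; the trace, sampling rate, and movement shape are synthetic and invented for illustration, not data from this study:

```python
import math

def peak_velocity_displacement_ratio(position, dt):
    """Estimate mass-normalized stiffness for one movement as |peak velocity| / |displacement|."""
    velocity = [(b - a) / dt for a, b in zip(position, position[1:])]  # finite differences
    peak_vel = max(abs(v) for v in velocity)
    displacement = abs(position[-1] - position[0])
    return peak_vel / displacement

# Synthetic closing movement: smooth 12 mm -> 0 mm over 150 ms (invented trace).
dt = 0.002  # 500 Hz sampling
samples = [12.0 * 0.5 * (1 + math.cos(math.pi * t / 0.15))
           for t in (i * dt for i in range(76))]
ratio = peak_velocity_displacement_ratio(samples, dt)
print(ratio)  # units: 1/s; a stiffer (faster-closing) movement yields a larger ratio
```

On the hypothesis sketched in the text, /p/ movements, with their higher closing velocities, would show systematically larger ratios than /b/ or /m/ movements of comparable displacement.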
Articulator interactions and sequencing

It is becoming increasingly clear that individual articulators and their respective actions are not independent but functionally related by task. The empirical support for this view comes from two main sources. When the motions of the lips and jaw for bilabial closure are examined in detail, it appears that the individual articulators are not adjusted independently either in magnitude or relative timing (Gracco, 1988; Gracco & Abbs, 1986; Hughes & Abbs, 1976). Spatiotemporal adjustments to suprasegmental manipulations also suggest that stress and rate mechanisms act on all active regions of the vocal tract (Fowler, Gracco, V.-Bateson, & Romero, submitted). Changes in articulator movement patterns following mechanical perturbation are also consistent with the suggestion that the actions of individual articulators are not adjusted independently. Rather, disruptions to articulator movement during speech result in adjustments in the perturbed articulator as well as in those (unperturbed) articulators that are actively involved in the production of the specific sound (Folkins & Abbs, 1975; Kelso et al., 1984; Kollia, Gracco, & Harris, 1992; Munhall, Lofqvist, & Kelso, in press; Shaiman, 1989; Abbs & Gracco, 1984; Gracco & Abbs, 1985; 1988). While many previous studies can be criticized on statistical grounds (see Folkins & Brown, 1987), it is not unreasonable to suggest that interactions among articulators are not random but systematic and integral to the overall speech production process. Together these observations suggest that speech movements are organized according to higher level principles that involve articulatory aggregates (similar to the principle of synergy defined by Bernstein, 1967).

Speaking is also a continuous process involving articulatory motion generating acoustic and aerodynamic events. Lip and jaw motion can be classified as either opening or closing depending on the direction of the motion relative to an occlusion/constriction or characteristic vocal tract shape. Oral opening is most often associated with vowel production, while oral closing is most often associated with consonant production. Different relative timing patterns for oral articulatory motions depending on direction have been interpreted as manifestations of separate and distinct synergistic actions fundamental to the speech production process (Gracco, 1988). Moreover, the opening phase of an articulatory action shows greater variation in spatiotemporal adjustment due to stress (Kelso, V.-Bateson, Saltzman, & Kay, 1985; Ostry, Keller, & Parush, 1983) than the corresponding closing phase. Similarly, oral opening and closing movement durations show differential effects to changes in speaking rate (Adams, Weismer, & Kent, 1993). These differential patterns associated with movement direction are also influenced by phonetic context. The context in which a sound is produced is known to affect the acoustic and kinematic properties associated with its production (Daniloff & Moll, 1968; Sussman, MacNeilage, & Hanson, 1973; Parush, Ostry, & Munhall, 1983; Perkell, 1969). It seems that articulatory adjustments for context may have differential effects on the different movement phases. In order to understand the organizational principles and control processes that govern speech production, evaluation of speech movement differences must include the articulatory adjustments within as well as across articulators and movement phases.

Examining kinematic changes across movement phases has additional implications for understanding speech as a serial process. As pointed out by Lashley (1951), serial actions, such as those found in speech, locomotion, typing, and the playing of musical instruments, cannot be explained in terms of successions of external stimuli or reflex chaining. Rather, the apparent rhythmicity found in all but the simplest motor activities suggests that some sort of temporal patterning or temporal integration may form the foundation for motor as well as perceptual activities (Lashley, 1951). Lashley further suggested that skilled action involves the advanced planning of entire sequences of action. More recently, Sternberg and colleagues have developed a model of speech timing that incorporates the concept of an advanced plan of action, an utterance program, which is used to control the execution of the elements in sequence (Sternberg, Knoll, Monsell, & Wright, 1988). The elements of the utterance program are action units, defined abstractly on the basis of requiring a single selection process but possibly containing multiple distinguishable actions (Sternberg et al., 1988). That is, speech production is considered a hierarchical process, with advanced planning of what is to be said (utterance program) followed by the execution of the program using smaller sequenced elements (action units) on the order of stress groups (Fowler, 1983; Sternberg et al., 1988). While this model provides a framework for the speech motor process, a number of unresolved issues remain. Previous work has focused on evaluating the latency and duration of words or syllables rapidly produced by subjects. Examination of articulatory movement has not been undertaken, and thus identification of the articulatory correlates of the action unit is lacking. If an action unit or unit of speech production is to have any theoretical significance, identification of the physiological (articulatory) instantiation of the construct is crucial. Further, in order to fully understand the speech production process, the mechanism by which elemental units are sequenced and modified is also of central import. In the present investigation, serial speech movements within and across movement sequences were examined for evidence of the articulatory cohesiveness that may reflect the production unit, as well as the manner in which phonetic context may modulate the underlying temporal patterning.

Articulatory coordination

A final focus of the present investigation is the coordination patterns of the lips and jaw as they cooperate to occlude and open the oral end of the vocal tract for the different bilabial consonants. A number of previous investigations suggest that task-related articulatory movements, such as the lip and jaw motion for bilabial closing, are interdependent in their timing (Gracco, 1988; Gracco & Abbs, 1986). Results presented by Lofqvist and Yoshioka (1981; 1984) similarly suggest a number of consistent timing patterns for different laryngeal-oral interactions following stress, rate, and consonantal manipulations. Actions for voiced and voiceless sounds differ in their relative timing, reflecting different coordinative relations. Further, one previous study has examined the relative timing of the lips and jaw for oral opening, and reported fundamentally different timing patterns among the articulators for oral opening than for oral closing (Gracco, 1988). As such, extended examination of interarticulatory timing across movement phases may be informative regarding the extent and generality of any coordinative principles governing speech motor actions.

Methods

Subjects and movement task. Four females between the ages of 21 and 28 years were subjects in the present study. None had a history of neurological disorder and all were native speakers of American English. Subjects were asked to repeat one of six utterances following the onset of an experimenter-controlled tone. The utterances were of the form /s V1 C V2/, where V1 was either /æ/ or /i/, C was /p/, /b/, or /m/, and V2 was /æ/. The specific utterances were:

1) sapapple
Observations of tim- 2) seepapple ing coherence among articulatory actions have 3) sabapple been extended to include the lip, jaw, and larynx 4) st:ebapple associated with the initiation of voicing and de- 5) samapple voicing in a variety of phonetic contexts (Gracco, 6) seemapple 1990; Gracco & Lofqvist, 1989; Gracco & Lofqvist, in preparation). Such observations reflect a poten- Each word was repeated, in isolation, a total of tial principle of speech movement organization. 40 times, 10 or 20 consecutive times for each word For speech movement coordination, as for motor on the list in the presented order, followed by coordination in general (see Bernstein, 1967), the subsequent repetition(s) of the entire list. A total timing or patterning of the potential degrees of of 240 tokens were obtained for each subject with freedom for task-related actions, are constrained the exception of subject four whose total was 238 thereby reducing the overall motor control com- (one sapapple and one seemapple missing). The plexity (Gracco, 1988). stimulus to respond (auditory tone) was presented While the available evidence suggests that con- at approximately three second intervals. Subjects straining the degrees of freedom may be a general were instructed to produce each word with equal principle underlying the coordination of task-re- stress at a comfortable speaking rate following the lated speech articulators, it has recently been tone and to use an effort level appropriate for suggested that lip and jaw coordination is not in- speaking to someone 10-15 feet away. variant (De Nil & Abbs, 1991). Rather, variations Instrumentation and data acquisition. 
Single in the temporal sequencing of lip and jaw closing dimensional midsagittal movements of the upper movements for /b/ at fast and slow speaking r -tes, lip (UL), lower lip (LL), and jaw (J) were have been interpreted to reflect fundamentally transduced using a head-mounted strain gage different patterns requiring different coordinative system previously described (Barlow, Cole, & patterns. With the exception of the De Nil and Abbs, 1983). Briefly, movements of the upper lip, Abbs (1991) investigation, previous studies exam- lower lip, and jaw were transduced using ultra- ining the relative timing of the lips and jaw dur- light weight cantilever beams instrumented with ing oral closing have focused on voiceless /p/ strain gages attached to a lightweight head- (Caruso, Abbs, & Gracco, 1988; Gracco & Abbs, mounted frame. Transducers were attached 1986; 1988; Gracco, 1988; McClean, Kroll, & midsagittally at the vermilion border of the upper Loftus, 1990). It is possible that lip and jaw and lower lips and on a region of the chin which Some Organizational Characteristics of Speech Movement Control yielded minimal skin movement artifact (see fined as the distance and time, respectively, be- Barlow et al., 1983; Folkins, 1981; Kuehn, Reich, tween the onset and offset of each movement; & Jordan,1980; Muller & Abbs,1979). peak instantaneous velocity was also obtained for Transducers were visually aligned to capture the each movement. As with previous studies (Gracco, major movement axis for oral opening/closing for 1988; Gracco & Abbs, 1986) a measure of upper each subject with the resultant orientation rotated lip, lower lip, and jaw coordination during oral approximately 100 to 110 degrees posterior to the closing was obtained from the timing of each ar- vertical. 
Movement signals were digitized at a ticulator's peak velocity relative to the same pre- sampling rate of 500 Hz with 12 bit resolution ceding articulatory event (jaw opening for the except for subject four in which movement signals vowel). From the identified events of interest, the were sampled at 400 Hz. The acousticsignal was following measures were made (see Figure 1): sampled at 500 Hz to indicate the onset and offset of the stimuli only. Data analysis. Prior to data analysis, all movement signals were digitally low pass filtered (Butterworth implementation, two-pole zero phase lag, 20 Hz cutoff). As transduced, the lower lip signal reflects the combined movement of the lower lip and jaw. In order to obtain lower lip movement only, the jaw signal was software subtracted from the lower lip signal, yielding net lower lip movement. First derivatives were obtainedusingathree-point numerical differentiation reutine (central difference) and stored as part of the data file. From these S A PA PP LE instantaneous velocity signals, movement onsets and offsets were determined for the individual Ulm closing movements. Computer software was used to identify zero crossings in the velocity traces of the respective upper lip, lower lip, and jaw signals and movement onset and offsets were marked. In LLv addition, the movements of the upper lip, lower lip, and jaw were combined to yield a measure of Jv (le overall lip aperture (LA). For the present experiment, subjects had been instructed to start each utterance from a position LAv of rest with the lips in contact, and the upper and lower teeth lightly touching. Using this relatively 250 msoc consistent starting point it was possible to evaluate the relative position of the different FIgure 1 A representative example of upper lip (UL), lower lip(LL), jaw(J),and lip aperture (LA) articulators for the /s/ and the following vowel. 
displacement (above) and velocity (below) signals for Each acquired token consisted of eight signals one of the stimulus words, "sapapple." All trials were illustrated in Figure 1. Two complete movement initiated from a rest position with the lips together and cycles or sequences were identified from the lip teeth lightly touching. Each production included two aperture signal (LA) and indicated as sequence 1 opening/closing sequences identified from the LA signal as S_1 and S_2. Within each sequence or cycle, opening and 2 (S_1, S_2) in the figure. Each sequence con- and closing phases were also identified (Op_1, Cl_l, sists of two phases, an opening and closing phase. Op_2, Cl_2). Measurements are indicated by the Sequences were defined as the time interval be- numbers in the figure (see text for details). tween the onset of the opening movement to the offset of the closing movement in the LA signal us- 1) extent of oral opening and associated jaw ing a zero velocity criteria. Onsets of the opening position for /s/, relative to the subjects' rest and closing movements for the lips and jaw were position, also determined from zero crossings in the first 2 extent of oral opening for the first vowel derivatives of the respective articulator motions. (Op_1) and the associated jaw opening movement Maximum displacement and duration were de- characteristics,


3) lip and jaw closing movement characteristics for the first closing,
4) oral opening (Op_2) and closing (Cl_2) movement characteristics for the second cycle (S_2) for the /aep/ in "apple,"
5) time of upper lip, lower lip, and jaw peak closing velocity relative to the peak jaw opening velocity for the first vowel opening,
6) time of upper lip, lower lip, and jaw peak opening velocity for the second opening relative to the peak upper lip closing velocity for the first oral closing.

Statistical analysis. The data were analyzed using two-factor repeated measures ANOVA with vowels and consonants as factors. When there were significant interactions, post hoc analyses were conducted using Scheffe's F-test with the alpha level set to .01 for all comparisons.

Results

Context-dependent changes in the opening and closing movements will be examined for the first movement sequence. Next, movement parameters of the second sequence will be examined for carryover effects from the phonetic context of the first sequence. Finally, the coordination of lip and jaw motion will be examined for closing and opening effects and phonetic context. Individual articulator adjustments will be concurrently examined to evaluate their functional role in the speech production process.

Movement characteristics

Sequence One

Cycle duration. One of the most robust findings in the present study was a change in the duration of the first movement cycle. For the group, the cycle duration obtained from the lip aperture signal was, on average, shorter for the /i C/ than the /ae C/ context (191.8 msec vs. 227.3 msec). In addition, the cycle duration was shorter when the consonant was /p/ than /b/ or /m/ (203.3 vs. 210.2 vs. 215.1 msec, respectively). Shown in Figure 2 are three examples of the lower lip and jaw movements for "sapapple," "sabapple," and "samapple" from a single subject. All movements in the figure were aligned to the same articulatory event: jaw opening peak velocity for the first vowel /ae/ (dotted line). As illustrated by the arrows in Figure 2, the interval between the J opening peak velocity for /ae/ and the time of maximum lip and jaw closing velocity and displacement is shortest when the consonant is /p/. Because the jaw opening peak velocities are aligned in these examples, it can also be seen that the adjustment within the cycle was localized to the interval spanning the terminal phase of the opening action and the time of the peak velocity for the closing action. This interval for all phonetic contexts and subjects for the first movement cycle is presented in Figure 3. Since there were no articulator-specific differences, only the LL will be considered.
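The signal-processing chain described under Data analysis (jaw subtraction, three-point central differencing, zero-crossing segmentation) can be sketched in modern terms as follows. This is an illustrative reconstruction, not the original analysis software; the signal names, sampling rate, and synthetic traces are hypothetical, and the 20 Hz zero-phase Butterworth stage is omitted here.

```python
import numpy as np

def central_difference(x, fs):
    # Three-point central-difference derivative at interior samples;
    # one-sided differences at the two edge samples.
    v = np.empty_like(x, dtype=float)
    v[1:-1] = (x[2:] - x[:-2]) * (fs / 2.0)
    v[0] = (x[1] - x[0]) * fs
    v[-1] = (x[-1] - x[-2]) * fs
    return v

def movement_landmarks(velocity):
    # Indices where the velocity changes sign: candidate movement
    # onsets/offsets under a zero-velocity criterion.
    s = np.sign(velocity)
    return np.where(np.diff(s) != 0)[0] + 1

fs = 500.0                                    # sampling rate (Hz)
t = np.arange(0.0, 1.0, 1.0 / fs)
jaw = 3.0 * np.sin(2.0 * np.pi * 2.0 * t)     # hypothetical jaw trace (mm)
lower_lip_raw = jaw + 1.5 * np.sin(2.0 * np.pi * 4.0 * t)
lower_lip = lower_lip_raw - jaw               # net lower-lip motion (jaw subtracted)
vel = central_difference(lower_lip, fs)       # instantaneous velocity (mm/sec)
marks = movement_landmarks(vel)               # movement onsets/offsets
```

On these synthetic traces the net lower-lip velocity is approximately a 4 Hz cosine, so the zero-crossing routine recovers its eight sign changes over the one-second record.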


Figure 2. Three representative lower lip (LL) and jaw movements (displacement and velocity) from S2 for "sapapple," "sabapple," and "samapple." All signals were aligned to the jaw opening peak velocity (dotted line). Opening is toward the bottom; closing is toward the top. Arrows in the jaw opening displacement panel (lower left) indicate the shortening of the jaw opening movement duration for the different consonants. Similarly, the LL closing movement achieves both peak velocity (arrows, upper right panel) and peak displacement (upper left panel) earlier for /p/ than /b/ and /m/. Vertical calibration is 6 mm (displacement) and 150 mm/sec (velocity).


[Figure 3: four panels (S1-S4); y-axis: interval in milliseconds (0-90); x-axis: /p/, /b/, /m/ in each vowel context.]

Figure 3. Average interval (in milliseconds) between the jaw opening peak velocity for the first vowel (Op_1) and the time of the LL peak velocity for the first oral closing (Cl_1) for the different phonetic contexts for all subjects. Single asterisk (*) reflects a significant /p/ - /b/ or /p/ - /m/ difference; a double asterisk (**) indicates that both comparisons were reliable (p < .01).
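The interval plotted in Figure 3 is defined by two velocity landmarks: the jaw opening peak velocity for the first vowel and the lower lip closing peak velocity for the first closure. A minimal sketch of that measurement on hypothetical half-sine velocity traces follows; the window boundaries and sign conventions are assumptions, not the paper's code.

```python
import numpy as np

def peak_velocity_time(velocity, t, start, end, closing=True):
    # Time of peak velocity within the sample window [start, end).
    # Closing velocity is taken as positive, opening as negative (assumption).
    seg = velocity[start:end]
    idx = int(np.argmax(seg)) if closing else int(np.argmin(seg))
    return t[start + idx]

fs = 500.0
t = np.arange(0.0, 0.4, 1.0 / fs)
# hypothetical traces: jaw opening peak at 0.1 s, LL closing peak at 0.3 s
jaw_v = -np.sin(np.pi * t / 0.2) * (t < 0.2)
ll_v = np.where(t >= 0.2, np.sin(np.pi * (t - 0.2) / 0.2), 0.0)

t_jaw_open = peak_velocity_time(jaw_v, t, 0, 100, closing=False)
t_ll_close = peak_velocity_time(ll_v, t, 100, 200, closing=True)
interval_ms = (t_ll_close - t_jaw_open) * 1000.0   # the Op_1 to Cl_1 interval
```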

Table 1. Mean difference (xd) and associated F-value (Scheffe's F-test) for within-vowel (/aep-aeb/, /aep-aem/, /ip-ib/, /ip-im/) consonant comparisons and across-vowel (/aep-ip/, /aeb-ib/, /aem-im/) consonant comparisons for the Op_1-Cl_1 interval and lower lip closing movement duration for all subjects. Degrees of freedom for S1-3 (5, 234), S4 (5, 232); asterisk indicates a significant difference at p < .01.

Op_1-Cl_1 interval

         /aep-aeb/  /aep-aem/  /ip-ib/  /ip-im/  /aep-ip/  /aeb-ib/  /aem-im/
S1 (xd)    -6.75      3.25      -5.4     -5.75     35.55     36.9      26.55
     F      1.15       .27       .73       .83     31.8*     34.26*    17.73*
S2 (xd)   -43.5     -53.55      -4.4    -12.1      13.5      52.6      54.95
     F     75.99*    115.16*     .78      5.88*     7.32*   111.11*   121.26*
S3 (xd)   -22.95    -17.5      -10.05    -9.75     14.6      27.5      22.35
     F     18.9*      10.99*     3.62*    3.41*     7.65*    27.14*    17.93*
S4 (xd)   -22.1     -31.29     -39.69   -38.99     44.71     27.12     37.01
     F     12.88*     25.81*    42.06*   40.09*    52.71*    19.65*    36.11*

Movement duration

         /aep-aeb/  /aep-aem/  /ip-ib/  /ip-im/  /aep-ip/  /aeb-ib/  /aem-im/
S1 (xd)    -5.05     -3.35       .3      -5.1      -2.8      2.55     -4.55
     F      1.93       .85       .01      1.97       .59       .49      1.57
S2 (xd)   -19.2     -11.8       -.15    -13.2     -18.25      .8     -19.65
     F     11.74*      4.44*     .00      5.55*    10.61*      .02     12.3*
S3 (xd)   -20.2     -22.45     -3.65     -6.35    -13.1       3.45      3
     F      9.98*     12.32*     .33       .99      4.2*       .29       .22
S4 (xd)    -4.66     -5.16      -.94    -15.38     -3.6        .12    -13.81
     F       .51       .61       .02      5.59*      .31      .0003     4.51*
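The Scheffe F-values reported in Table 1 compare pairs of condition means against the omnibus ANOVA error term. A hedged sketch of one such pairwise statistic is given below; the exact error terms and degrees of freedom used in the paper are not reproduced, and in a full Scheffe test the observed value would be compared against (k-1) times the critical F at the chosen alpha level.

```python
import numpy as np

def scheffe_f(x1, x2, ms_error):
    # F-like statistic for a pairwise contrast of two condition means,
    # using a mean-square error supplied from the omnibus ANOVA.
    n1, n2 = len(x1), len(x2)
    contrast = float(np.mean(x1) - np.mean(x2))
    return contrast ** 2 / (ms_error * (1.0 / n1 + 1.0 / n2))

# toy data: two conditions with means 10 and 8, hypothetical MSe = 2.0
f_obs = scheffe_f(np.array([10.0, 10.0, 10.0, 10.0]),
                  np.array([8.0, 8.0, 8.0, 8.0]),
                  ms_error=2.0)
```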

Within-vowel and across-vowel comparisons revealed a significantly shorter interval when the consonant was /p/ than /b/ and /m/ in both vowel contexts, and a longer interval when the vowel was /ae/ compared to /i/. These effects were reliable for all subjects except S1, whose consonant effect did not reach significance. Inspection of the data from S1 indicated that for the first block of ten repetitions the same trend as seen in the other subjects was present. However, for the subsequent blocks the overall speech rate was increased, resulting in a reduction in the /p/ - /b/ - /m/ interval differences. In contrast to the interval results, the LL closing movement durations were not consistently affected by consonant or vowel identity. Table 1 presents the average Op_1-Cl_1 interval and movement duration differences for the LL for each subject and the associated F statistics.

Oral opening: Position/Displacement/Velocity

Since the subjects began each experimental trial with their lips together and teeth lightly touching, the opening position for the first sound (/s/) in the movement sequence could be examined for anticipatory adjustments associated with the different phonetic contexts (upcoming vowel and/or consonant). The vertical oral opening for /s/, obtained from the lip aperture signal, averaged 6.78 mm (S1) to 9.38 mm (S2) across subjects and conditions and was not affected by subsequent phonetic context (p > .2). In contrast, the maximum oral opening for the vowel was highly dependent on vowel identity. Figure 4 presents the average oral opening for the group and the individual lip and jaw contributions to the opening for the different vowel-consonant contexts. The oral opening was significantly larger for /ae/ compared to /i/ for all subjects (S1 [F(1,238) = 230.81, p = .0001]; S2 [F(1,238) = 2188.6, p = .0001]; S3 [F(1,238) = 796.51, p = .0001]; S4 [F(1,236) = 343.96, p = .0001]). Oral opening for /ae/ averaged 15.2, 17.6, 17.9, and 13.3 mm, while oral opening for /i/ averaged 12.2, 9.8, 11.5, and 10.0 mm for S1-4, respectively. As can be seen from the consistent UL plus LL contribution to the oral opening in Figure 4, the jaw is the articulator solely responsible for the oral opening changes. Jaw position was consistently higher for /i/ than /ae/ for all subjects (S1 [F(1,238) = 535.41, p = .0001]; S2 [F(1,238) = 2067.8, p = .0001]; S3 [F(1,238) = 610.89, p = .0001]; S4 [F(1,236) = 505.12, p = .0001]). Collapsed across consonants, jaw opening position averaged 3.4, 7.0, 6.1, and 3.5 mm higher for /i/ than /ae/ (S1-4, respectively). The voicing character of the consonant had no consistent effect on the jaw opening position. No significant vowel or consonant effects were noted for the UL+LL.

[Figure 4: group LA opening for V1 (top) and the UL+LL and jaw contributions (below), in mm, for /p/, /b/, /m/ in each vowel context.]

Figure 4. Group lip aperture (LA) opening, in millimeters (mm) relative to the rest position, for the first vowel (V1) and consonant contexts. Below the LA data are the contributions of the UL plus LL (UL+LL) and jaw to the oral opening. Vertical bars indicate one standard deviation.

In the present phonetic context, it was often not possible to unambiguously identify the UL or LL opening movement associated with the vowel from movements associated with /s/. Thus, the opening velocity analysis for the first movement sequence focused exclusively on the jaw. Because of the large vowel-related jaw opening displacement effect, jaw opening characteristics were examined for consonant-related effects only. Figure 5 presents the jaw opening velocity results for the four subjects. Post hoc testing revealed significant consonant effects in the /ae/ context only, with jaw opening velocity higher before /p/ than /m/ for all subjects (S1 [F(2,117) = 8.11, p < .01]; S2 [F = 30.97, p < .01]; S3 [F = 29.88, p < .01]; S4 [F(2,116) = 9.0, p < .01]); /p/ - /b/ differences were significant only for S2 (F = 9.0, p < .01).

Oral closing: Displacement

As a consequence of changes in oral opening displacement for the different vowels, oral closing movements were modified in their extent. As shown in Figure 6 for the group, movement extent for each articulator is greater for the more open vowel /ae/. Presented in Table 2 are the individual subject comparisons for the upper lip, lower lip, and jaw closing displacements. There was a highly significant vowel effect for the J for all subjects, with larger closing movement displacements in the /ae/ context compared to /i/. Results for the UL were in general agreement with those of the J but to a lesser degree; S2, S3, and S4 demonstrated significantly larger closing displacements in the /ae/ context compared to /i/. The LL results were less consistent, with fewer comparisons reaching significance; no significant UL or LL differences related to vowel identity were noted for S1. Consonant-related movement adjustments were much less consistent for all articulators. The J displacement for /p/ was larger than /m/ for three subjects (S1, S3, S4) in the /ae/ context.

[Figure 5: jaw opening velocity (mm/sec, plotted as negative values) for /p/, /b/, /m/ in each vowel context, subjects S1-S4.]

Figure 5. Average jaw opening velocity, in millimeters per second (mm/sec), for the different phonetic contexts for each of the four subjects. Vertical bars indicate one standard error.


[Figure 6: UL, LL, and jaw closing displacement (mm) for /p/, /b/, /m/ in each vowel context, group data.]

Figure 6. Average upper lip (UL), lower lip (LL), and jaw (J) closing displacement (in millimeters) for the phonetic contexts for the group. Vertical bars indicate one standard deviation.

Oral closing: Velocity/Stiffness

Lower lip closing velocity has previously been shown to be higher for /p/ than /b/ (Sussman et al., 1973; Summers, 1987). A similar result for the LL was found in the present investigation, although the differences, with few exceptions, were small. Shown in Figure 7 is a summary of the LL closing movement velocities for all subjects for the different phonetic contexts. The closing velocity for /p/ was higher than for /b/ for three subjects (S2, S3, S4) and higher than for /m/ for two subjects (S3, S4) in the /ae/ context. For S1 the LL closing velocity was significantly reduced for /p/ compared to /m/ in the /ae/ context (p < .01). Table 3 is a summary of the post hoc comparisons for each articulator and subject. As shown, the UL results are generally similar to the LL, with S2, S3, and S4 showing small increases in closing velocity for certain comparisons in the /ae/ context. Only two subjects (S3, S4) showed higher J closing velocities for /p/ compared to /b/ or /m/ in the /ae/ context. Jaw closing velocity for all subjects was significantly higher in the /ae/ context; a similar tendency was noted for the UL and LL, although the results were not as robust as those for the J.

It appears that the magnitude of the LL closing velocity provides a fairly robust metric differentiating voiceless from voiced consonants. However, it was of interest to determine whether a combination of kinematic variables, especially those for the UL and J, might provide a more reliable measure. One construct that has been used to describe speech movements and suggested as an important control variable for speech is mass-normalized stiffness, expressed as the ratio of peak velocity to peak displacement (Kelso et al., 1985; Ostry & Munhall, 1985). Changes in derived stiffness have been shown to co-occur with changes in movement duration, speaking rate, and emphasis. The correlation of velocity to displacement was generally high for all subjects and articulators, with the correlation magnitudes ordering J > LL > UL. Velocity/displacement correlations ranged from r = .49 to r = .79 for the LL and from r = .91 to r = .96 for the jaw; the upper lip was less robust, with correlations ranging from r = .16 to r = .86. Presented in Figure 8 are the UL, LL, and J stiffness values for the different consonants for the group. The LL values are generally higher for /p/, a finding consistent with the peak velocity measures presented in Figure 7. For the UL and J, the ratios of peak velocity to peak displacement suggest that in the /ae/ context /p/ is less stiff than /m/, a result opposite to that for the LL. Vowel-related differences were also inconsistent across subjects, with the exception that J stiffness was always higher for /i/ than /ae/. The data from the individual subjects reflected the group trend and are presented in Table 4.
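The derived stiffness measure used in Figure 8 and Table 4 is simply peak velocity over peak displacement. For a half-cosine closing trajectory this ratio depends only on movement duration, not amplitude, which is one reason it has been treated as a control variable. A small sketch (the amplitude, duration, and sampling rate are hypothetical):

```python
import numpy as np

def derived_stiffness(displacement_mm, velocity_mm_s):
    # Mass-normalized stiffness: peak |velocity| / peak |displacement|, in 1/sec.
    return float(np.max(np.abs(velocity_mm_s)) / np.max(np.abs(displacement_mm)))

fs = 500.0
dur = 0.15                                   # closing duration (s)
t = np.arange(0.0, dur, 1.0 / fs)
A = 6.0                                      # closing amplitude (mm)
omega = np.pi / dur                          # half-cosine trajectory
x = (A / 2.0) * (1.0 - np.cos(omega * t))    # displacement
v = (A * omega / 2.0) * np.sin(omega * t)    # analytic velocity
k = derived_stiffness(x, v)                  # approx. omega / 2, independent of A
```

Note that k is approximately omega/2 here regardless of the amplitude A, so changes in derived stiffness track changes in movement duration.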


Table 2. Mean difference (xd) and associated F-value (Scheffe's F-test) for within-vowel (/aep-aeb/, /aep-aem/, /ip-ib/, /ip-im/) consonant comparisons and across-vowel (/aep-ip/, /aeb-ib/, /aem-im/) consonant comparisons for the upper lip, lower lip, and jaw closing displacement (mm) for all subjects. Degrees of freedom for S1-3 (5, 234), S4 (5, 232); asterisk indicates a significant difference at p < .01.

Upper Lip

         /aep-aeb/  /aep-aem/  /ip-ib/  /ip-im/  /aep-ip/  /aeb-ib/  /aem-im/
S1 (xd)     .75        .56       .2       .26       .4       -.14       .1
     F     8.32*      4.61*      .61       .99      2.4        .31       .16
S2 (xd)     .58        .18       .32       .4        .77       .51       .99
     F     6.86*       .69      2.06      3.26*    12.2*      5.33*    19.95*
S3 (xd)    -.55       -.07      -.01      -.07       .51      1.04       .5
     F     7.26*       .11       .003      .14      6.24*    26.41*     6.09*
S4 (xd)     .58       -.03       .34      -.06      1.06       .82      1.04
     F     1.95        .01       .67       .02      6.59*     4.0*      6.3*

Lower Lip

S1 (xd)    -.24       -.36      -.03      -.34       .26       .47       .27
     F      .66       1.42       .01      1.33       .77      2.49       .84
S2 (xd)     .06       -.28     -1.17     -1.45      2.0        .77       .82
     F      .02        .59     10.05*    15.49*    29.17*     4.33*     4.98*
S3 (xd)     .33        .97       .08      -.21      1.76      1.51       .58
     F      .7        6.0*       .04       .28     19.81*    14.59*     2.17
S4 (xd)    1.1         .48       .08      -.75      1.04       .03      -.18
     F     7.09*      1.34       .04      3.26*     6.4*       .01       .19

Jaw

S1 (xd)     .48        .87       .46       .2       2.47      2.44      1.79
     F     1.06       3.49*      .96       .18     27.81*    27.31*    14.68*
S2 (xd)    -.46        .46       .99       .28      3.93      5.38      3.75
     F     1.25       1.26      5.82*      .47     91.51*   171.59*    83.32*
S3 (xd)    -.21       1.97       .09       .18      4.58      4.88      2.79
     F      .38      31.0*       .06       .26    167.02*   190.02*    61.92*
S4 (xd)     .77       1.84       .47       .57      2.25      1.95       .98
     F    11.93*     68.34*     4.61*     6.61*   102.3*     78.32*    19.53*

[Figure 7: LL peak closing velocity (0-160 mm/sec) for /p/, /b/, /m/ in each vowel context, subjects S1-S4.]

Figure 7. Mean lower lip peak closing velocity (in millimeters per second) for the different phonetic contexts for the four subjects. Vertical bars indicate one standard error; asterisks indicate significantly higher closing velocity (p < .01) for /p/ compared to /b/ and /m/.

Table 3. Mean difference (xd) and associated F-value (Scheffe's F-test) for within-vowel (/aep-aeb/, /aep-aem/, /ip-ib/, /ip-im/) consonant comparisons and across-vowel (/aep-ip/, /aeb-ib/, /aem-im/) consonant comparisons for the upper lip, lower lip, and jaw closing velocity (mm/sec) for all subjects. Degrees of freedom for S1-3 (5, 234), S4 (5, 232); asterisk indicates a significant difference at p < .01.

Upper Lip

         /aep-aeb/  /aep-aem/  /ip-ib/  /ip-im/  /aep-ip/  /aeb-ib/  /aem-im/
S1 (xd)    4.52      -1.16      3.11      3.43     -2.18     -3.59      2.42
     F     1.36        .09       .64       .78       .32       .86       .39
S2 (xd)    2.26       8.6       4.34     -5.64     10.46      8.38     24.7
     F      .4        5.87*     1.49      2.53      8.68*     5.57*    48.43*
S3 (xd)    5.96       5.92      -.27     -2.24       .24      6.47      8.4
     F     4.26*      4.2*       .01       .6        .01      5.02*     8.45*
S4 (xd)   13.76       2.86      8.47     10.66      9.49     -4.27     17.29
     F     4.64*       .2       1.78      2.79      2.21       .45      7.34*

Lower Lip

S1 (xd)   -4.9      -10.14      -.92     -4.34      2.99      6.97      8.79
     F      .79       3.38*      .03       .62       .29      1.6       2.54
S2 (xd)   12.45       8.62    -13.88    -11.08     47.79     21.46     28.09
     F     3.43*      1.64      4.26*     2.71     50.49*    10.18*    17.45*
S3 (xd)   22.22      35.05     17.33     15.06     20.26     15.37       .27
     F     6.26*     15.57*     3.81*     2.87      5.2*      2.99      .0001
S4 (xd)   23.87      13.6      15.46      4.93     14.5       1.59      5.84
     F    14.8*       3.41*     4.47*      .45      3.88*      .05       .63

Jaw

S1 (xd)    2.53       4.3       6.32      4.52     34.16     37.95     34.37
     F      .11        .32       .69       .35     19.99*    24.68*    20.24*
S2 (xd)   -1.33      10.76     14.1       1.82     68.36     83.81     59.44
     F      .03       2.26      3.87*      .06     91.11*   136.87*    68.85*
S3 (xd)   -8.13      32.21      2.11      4.13     80.43     90.67     52.35
     F     1.43      22.53*      .1        .37    140.5*    178.56*    59.53*
S4 (xd)   14.62      27.33      6.23      8.93     35.67     27.27     17.26
     F    18.67*     65.2*      3.43*     6.96*   111.03*    65.75*    26.02*

Table 4. Derived stiffness values, defined as the ratio of peak velocity to peak displacement, for the first oral closing movement for the upper lip (UL), lower lip (LL), and jaw (J) for the four subjects. Results are presented according to vowel and consonant context for the first movement sequence. A single asterisk indicates a significant /p/ - /b/ or /p/ - /m/ difference at p < .01; double asterisks indicate that /p/ was significantly different from both /b/ and /m/. All comparisons were within vowel.

          /aep/         /aeb/        /aem/        /ip/          /ib/         /im/
S1  UL  13.4 (.24)**  14.8 (.25)   15.5 (.25)   15.1 (.16)    15.1 (.20)   15.2 (.18)
    LL  18.2 (.28)    18.3 (.22)   18.7 (.19)   18.5 (.20)    18.6 (.22)   18.1 (.22)
    J   17.7 (.20)*   19.0 (.28)   20.2 (.26)   20.8 (.28)    22.2 (.34)   20.7 (.36)
S2  UL  12.7 (.20)**  15.1 (.32)   15.0 (.17)   12.6 (.27)*   14.7 (.21)   12.4 (.18)
    LL  21.0 (.27)*   19.4 (.34)   19.0 (.24)   19.8 (.22)**  18.3 (.21)   17.1 (.25)
    J   18.9 (.19)    17.7 (.15)   18.4 (.17)   21.8 (.42)**  29.4 (.79)   24.2 (.46)
S3  UL  17.0 (.32)    16.6 (.63)   18.7 (.37)   20.4 (.31)    20.3 (.36)   19.0 (.45)
    LL  20.1 (.21)**  17.9 (.28)   17.6 (.42)   23.2 (.49)**  20.3 (.42)   19.5 (.42)
    J   19.6 (.14)    20.3 (.20)   21.3 (.24)   28.0 (.85)    27.8 (.82)   27.1 (.63)
S4  UL  15.6 (.22)    14.6 (.24)   14.9 (.21)   16.9 (.24)*   16.3 (.27)   14.7 (.25)
    LL  20.0 (.28)    18.7 (.48)   19.2 (.35)   21.5 (.34)*   18.4 (.34)   17.5 (.30)
    J   16.8 (.20)    16.2 (.27)   18.3 (.26)   17.8 (.23)*   19.9 (.43)   19.2 (.40)

[Figure 8: UL, LL, and jaw mass-normalized stiffness (1/sec) for /p/, /b/, /m/ in each vowel context, group data.]

Figure 8. Average mass-normalized stiffness for the upper lip (UL), lower lip (LL), and jaw (J) oral closing for the different vowel-consonant combinations for the group. Mass-normalized stiffness was derived by dividing the peak velocity, in mm/sec, by the peak closing displacement, in mm, resulting in units of frequency (1/sec). Vertical bars indicate one standard deviation for the group.

Opening/closing interactions

Inspection of the oral opening and closing movement relations collapsed across consonants revealed some interactions that were not phonetically related but apparently reflect more general characteristics of speech movement organization. As indicated above, the positions of the lips and jaw for the first vowel were not reliably related to phonetic context. However, vowel-related positions of the different articulators did vary, and those variations influenced the extent of articulator displacement for the closing movement. Shown in Figure 9 are results from the LL (left side) and J (right side) opening position-closing displacement relations for the individual subjects. As shown, the relative articulatory positions for the vowel produced systematic variations in the magnitude of the oral closing movement. The LL data (left panel) reflect all consonant and vowel combinations; due to the large movement difference for the J, only the data for the /ae/ context are included (right panel). Including /i/ in the /ae/ data created a range effect that artificially inflated the correlation.

As shown in Figure 9, correlation coefficients for the LL range from r = .37 to r = .66. The J closing displacement was even more strongly related to the preceding J opening position, with correlation coefficients ranging from r = .50 to r = .73; the UL (not shown) was less consistent (correlation coefficients ranging from r = .05 to r = .55). As indicated, the lower the position of the lips and jaw from the subject's defined rest position prior to closure, the greater the resulting displacement.

In addition to the apparent dependence of articulator displacement on articulator position, other characteristics of oral opening and oral closing were found to covary. For example, the jaw opening velocity for /ae/ and the subsequent closing velocity for the consonant demonstrated a covariation, with correlations ranging from r = .47 to r = .67 for the four subjects (Figure 10). It should also be noted that there was a trend for jaw opening and closing velocity for /p/ and /b/ to be faster than for /m/. It appears that there are systematic spatiotemporal variations in oral opening and closing, and such interactions are not necessarily specific.

Articulator interactions. In previous studies it has been suggested that the movements of individual articulators are subordinate to the combined movement of the contributing parts (Hughes & Abbs, 1976; Gracco & Abbs, 1986; Saltzman, 1986). That is, there appears to be a higher-level motor plan in which the action of individual articulators is partially dependent on the action of the other articulators contributing to a multiarticulator goal. In the present study, separate stepwise regressions were done on the individual UL, LL, and J displacements for oral closing, using the positions of the three articulators for the preceding vowel as independent variables, to examine for evidence of articulatory interactions.
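The stepwise-regression logic just described (predict one articulator's closing displacement from its own vowel position, then ask whether adding another articulator's position explains additional variance) can be sketched as follows. The data and coefficients below are synthetic, invented purely to illustrate the comparison; they are not the paper's measurements.

```python
import numpy as np

def r_squared(X, y):
    # R^2 of an ordinary least-squares fit with an intercept term.
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
n = 200
jaw_pos = rng.normal(0.0, 2.0, n)                  # jaw position for the vowel (mm)
ll_pos = 0.4 * jaw_pos + rng.normal(0.0, 1.0, n)   # correlated LL position
# LL closing displacement depends on both positions (assumed model)
ll_close = -0.8 * ll_pos - 0.3 * jaw_pos + rng.normal(0.0, 0.3, n)

r2_own = r_squared(ll_pos[:, None], ll_close)                      # own position only
r2_both = r_squared(np.column_stack([ll_pos, jaw_pos]), ll_close)  # add jaw position
```

In this synthetic case the second model explains reliably more variance, mirroring the reported finding that including at least one other articulator's position improved the fit.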

76 Gracco


Figure 9. Scatter plots of the lower lip (LL; left side) and jaw (Jaw; right side) closing displacements as a function of the relative positions of the respective articulators for the preceding vowel. Product-moment correlation coefficients (r) are presented at the bottom right-hand corner of each plot. Only the jaw opening positions/jaw closing displacements are presented for /ae/ (see text for explanation).
Some Organizational Characteristics of Speech Movement Control
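The coefficients in Figure 9 are ordinary product-moment correlations between per-token opening position and closing displacement; a minimal sketch of that computation (the arrays are synthetic stand-ins, and the sign convention, position in mm below rest, is an assumption):

```python
import numpy as np

# Pearson product-moment correlation between opening position and
# closing displacement, as in Figure 9. Synthetic stand-in values (mm);
# with position coded negative below rest, a lower position pairing with
# a larger displacement yields a negative r here, so the sign need not
# match the positive coefficients reported in the figure.
opening_position = np.array([-10.2, -8.1, -6.4, -5.0, -3.3, -2.1])
closing_displacement = np.array([7.9, 6.5, 5.8, 4.2, 3.1, 2.4])

r = np.corrcoef(opening_position, closing_displacement)[0, 1]
print(round(r, 2))
```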


Figure 10. Jaw opening peak velocity for /ae/ and the corresponding jaw closing peak velocity for the three consonants. Product-moment correlation coefficients (r) are presented at the bottom right-hand corner of each plot. As shown, as the jaw opens faster there is a tendency for the jaw to close faster as well.

In all cases, the position of a particular articulator for the preceding vowel was found to have the strongest influence on the movement displacement for this articulator. However, in all cases significant increases in the amount of explained variance were noted when the position of at least one other articulator was included in the regression model. A summary of the regression results is presented in Table 5.

Second sequence

Cycle duration. In the present investigation, the stimuli required two complete movement sequences: two opening and two closing. Following the first sequence, in which the vowel and consonant varied, the remaining sequence for "apple" was similar in phonetic context: an opening for the vowel /ae/ and a closing for /p/. Thus the data could be examined for carryover effects of the first syllabic context on the second movement sequence.

Examination of the duration of the second movement cycle presented in Figure 11 revealed significant differences as a function of the preceding consonant. For subjects S1, S3 and S4, the duration of the second movement sequence, obtained from the lip aperture signal, was longer when the preceding oral closing consonant was /p/ than when it was /b/ or /m/ (S1 [F = 52.06, p = .0001]; S3 [F = 42.23, p = .0001]; S4 [F = 49.13, p = .0001]). There were no consistent vowel-related effects for these subjects (p > .05). For S2 the duration of the second movement sequence was only longer for /p/ when the preceding vowel was /i/ ([F(5,234) = 6.92, p < .01]); a significant vowel effect was noted in the /p/ context (F = 14.06, p < .01).

To determine whether the longer cycle duration was localized to a single phase of the movement sequence and a single articulator, or distributed across the opening and closing phases and contributing articulators, the movement characteristics of the respective phases for all three articulators were examined. Shown in Figure 12 are the group-averaged upper lip, lower lip, and jaw opening and closing movement durations for the second movement sequence. As can be seen, the opening movements account for the changes in cycle duration associated with the preceding phonetic context noted in Figure 11. For the group, the opening movement duration was longer following /p/ than /b/ or /m/ for the UL ([F(2,956) = 47.07, p = .0001]), the LL ([F(2,956) = 95.9, p = .0001]) and the J ([F(2,956) = 31.53, p = .0001]). There were no significant closing movement duration changes for any articulator or context (p > .05). In addition, vowel-related differences were noted that were obscured by examining the sequence duration from the lip aperture signal. For the UL and LL the opening movement was shorter for /i/ compared to /ae/ ([F(1,957) = 212.02, 125.01, p = .0001] for the UL and LL respectively). For the J just the opposite was found; jaw opening duration was longer for /i/ compared to /ae/ ([F(1,957) = 160.45, p = .0001]). Opposing vowel-related effects for the lips and jaw apparently offset one another in the combined lip aperture signal, resulting in no net change to the second movement cycle.

Table 5. Stepwise regression of the upper lip (ul), lower lip (ll), and jaw (j) oral closing displacement (disp) for the first opening/closing sequence. The displacement of each individual articulator was regressed on the relative position of each of the three articulators for the preceding vowel.

Upper Lip                                            R     adj. R-squared
S1   UL disp = 3.45 + .56(ul) + .15(j) + .27(ll)    .66        .42
S2   UL disp = 3.06 + .38(ul) + .08(j)              .37        .12
S3   UL disp = 1.45 + .64(ul) + .13(j) + .09(ll)    .67        .43
S4   UL disp = 2.02 + .84(ul) + .21(j) + .2(ll)     .67        .44

Lower Lip                                            R     adj. R-squared
S1   LL disp = 2.63 + .37(ll) + .26(j) + .26(ul)    .67        .44
S2   LL disp = 2.66 + .71(ll) + .18(j)              .72        .50
S3   LL disp = 3.99 + .32(ll) + .14(j)              .36        .11
S4   LL disp = .1 + .6(ll) + .28(j) + .46(ul)       .78        .60

Jaw                                                  R     adj. R-squared
S1   J disp = .85 + .58(j) + .66(ul)                .74        .53
S2   J disp = 3.1 + .61(j) + .32(ul) + .22(ll)      .76        .57
S3   J disp = .07 + .37(j) - .49(ul)                .64        .40
S4   J disp = .01 + .48(j)                          .50        .24
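A forward-selection regression in the spirit of Table 5 can be sketched as follows; the greedy adjusted-R-squared selection rule and all data here are illustrative assumptions, not the study's actual procedure or measurements:

```python
import numpy as np

def adj_r2(y, X):
    """Adjusted R-squared of an ordinary least squares fit of y on X."""
    Xi = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    n, p = Xi.shape
    return 1.0 - (ss_res / (n - p)) * (n - 1) / ss_tot

def forward_stepwise(y, predictors):
    """Greedily add the predictor that most improves adjusted R-squared."""
    chosen, remaining, best = [], dict(predictors), -np.inf
    while remaining:
        scores = {}
        for name, col in remaining.items():
            X = np.column_stack([predictors[c] for c in chosen] + [col])
            scores[name] = adj_r2(y, X)
        top = max(scores, key=scores.get)
        if scores[top] <= best:      # stop when no predictor helps
            break
        best = scores[top]
        chosen.append(top)
        del remaining[top]
    return chosen, best

# Synthetic example: jaw position dominates, lower lip adds a little,
# upper lip is pure noise (coefficients chosen arbitrarily).
rng = np.random.default_rng(0)
ul, ll, j = rng.normal(size=(3, 200))
disp = 3.4 + 0.6 * j + 0.3 * ll + rng.normal(scale=0.5, size=200)
order, score = forward_stepwise(disp, {"ul": ul, "ll": ll, "j": j})
print(order)  # jaw position enters the model first
```

With each articulator's own position entering first and at least one other articulator improving the fit, the output mirrors the qualitative pattern the text describes for Table 5.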

Figure 11. Duration (in milliseconds) of the second movement cycle (S-2) for the oral opening for /ae/ and subsequent closing for /p/ following the different vowel-consonant combinations in the first cycle (S-1) for each of the four subjects. Vertical bars indicate one standard error; asterisks indicate significant /p/ - /b/ or /m/ difference (p < .01).



Figure 12. Group means of the duration of opening (Op_2) and closing (Cl_2) movements for the second sequence for the upper lip (UL), lower lip (LL) and jaw. The opening movement durations for the UL and LL were reliably longer for /p/ compared to /b/ and /m/. Vertical bars indicate one standard error.

Velocity/Displacement

Other characteristics of the opening phase of the second sequence revealed articulator-specific contextual differences in movement extent and movement velocity consistent with these results. In the opening phase, the most significant effects were vowel related for the J and consonant related for the LL. As summarized in Figure 13 for the four subjects, the J opening movement for the /ae/ went farther (S1-S3 [F(1,238) = 179.72, 65.1, 709.07 respectively, p = .0001]; S4 [F(1,237) = 298.2, p = .0001]) and faster when following /i/ (S1-S3 [F(1,238) = 78.06, 12.01, 419.24 respectively, p = .0001]; S4 [F(1,237) = 306.88, p = .0001]). The effect for the LL was less robust (displacement: S3 F = 290.46, p = .0001; S4 F = 52.05, p = .0001; velocity: S2 F = 6.93, p = .009; S3 F = 21.7, p = .0001) and, when significant, went in the opposite direction to that observed for the J.

The greater J opening displacement, duration and velocity for /ae/ following /i/ appears to be due to the higher jaw position prior to the second opening movement following /i/. The jaw position for /i/ is higher than when the first vowel is /ae/ and, as a consequence, the jaw opening displacement starts from a significantly higher position. The higher J position results in larger opening movement displacements, longer opening movements and higher opening velocities for the subsequent vowel opening.

In contrast to the vowel-related opening differences, consonant-related differences were noted but only for the LL. The magnitude of the opening velocity following closure was found to order according to consonant identity, with /p/ < /b/ < /m/ for all subjects. As can be seen in Figure 14, this result was reliable for all subjects for the LL (S1-S4 F = 52.95, 237.76, 54.44, 160.85 respectively, p = .0001). For the J, post hoc testing revealed only two significant differences: /b/ - /m/ for S1 (F = 6.01, p < .01) and /p/ - /m/ for S2 (F = 31.52, p < .01).

Lip-jaw Coordination

Oral closing. Starting from a relatively steady state for the /s/ in the six stimulus words, the upper lip, lower lip, and jaw attained some open posture for the vowel and subsequently closed the oral opening cooperatively. Using the jaw opening peak velocity for the vowel as a reference point (see Figure 2), the consistency of the relative timing of the three articulators was examined. Consistent with previous studies, the timing of the UL, LL, and J covaries during oral closing (Gracco, 1988; Gracco & Abbs, 1986). The UL-LL relative timing for the four subjects is presented in Figure 15, with the corresponding correlations shown at the bottom. Similar results were obtained for the LL-J, with all within-vowel correlations ranging from r = .86 to r = .99.


Figure 13. Peak opening velocity and displacement for jaw (Jaw) and lower lip (LL) for the second movement sequence for the four subjects. Vertical bars indicate one standard error; asterisks indicate significant vowel differences (p < .01). As shown, the jaw opening is faster and farther when the preceding vowel is /i/ compared to /ae/. The same trend is not seen in the lower lip.


Figure 14. Jaw and lower lip opening velocity for the second movement sequence as a function of preceding consonant identity for the four subjects. For all subjects, the LL opening velocity was highest for /m/ compared to /p/ or /b/; the same trend was not found for the Jaw. Vertical bars indicate one standard error.


Oral opening. In contrast to the UL, LL, and J timing relations for the oral closing, the timing of the three articulators for oral opening was less consistent. Figure 16 presents the UL-LL relative timing for oral opening for the four subjects. For these data the UL-LL opening velocity timing is referenced to the time of the preceding UL peak closing velocity. While it appears that the two lips are certainly related in their timing, they do not display the same consistency seen for the oral closing (Figure 15). Similar results were obtained for the LL-J, with correlations ranging from r = .57 to r = .80.

Context effects. A final issue related to the lip-jaw coordination for oral closing focused on whether the coordinative relations among the articulators are fundamentally the same or different for different contexts. To examine the articulatory relations in detail, the intervals between the UL-LL and LL-J peak velocities were examined.


Figure 15. Scatter plots of the relative timing of the upper lip (UL) and lower lip (LL) peak closing velocity (Cl_1) for the different vowel-consonant contexts for the four subjects. Time of peak velocity was referenced to the time of the jaw opening peak velocity for the first vowel (Op_1). Product-moment correlation coefficients (r) are presented at the bottom right-hand corner of each plot.


The interpeak interval is a measure of the absolute time difference between articulator pairs, and the sign of the difference (positive or negative) indicates whether one articulator leads or lags another. Figure 17 presents the averages of the UL-LL (left side) and LL-J (right side) interpeak intervals for oral closing for the four subjects. The most significant effects were vowel-related. For the UL-LL, the tendency was for the interpeak interval to decrease in the /i/ context, with four of the twelve possible within-vowel comparisons (3 comparisons X 4 subjects) reaching significance (S2 F = 6.88 for /b/; S3 F = 4.25 for /p/; S4 F = 5.32 and F = 9.91 for /p/ and /b/ respectively, p < .01). The LL-J interval also displayed vowel-related effects for all subjects; however, the trend was opposite of that seen for the UL-LL. Of the twelve possible differences, five were found to be reliable (S2 F = 3.82 and 3.82 for /b/ and /m/ respectively; S3 F = 6.12 for /m/; S4 F = 20.66 and 4.52 for /p/ and /m/ respectively, p < .01).

For the /i/ context, the LL-J interval tended to increase, indicating that the J timing was not being adjusted to the vowel context. Consonant-related effects for the UL-LL were essentially absent, with only two /p/ comparisons reaching significance (S3 F = 8.51 and S4 F = 5.5, p < .01). For the LL-J, there was a trend for the interval for /p/ closure to be longer than /b/ and/or /m/, with reliable differences found for three subjects (S2 F = 18.99 for /p/ - /b/ in the /ae/ context and F = 45.27 in the /i/ context; S3 F = 7.48 for /p/ - /m/; S4 F = 14.36 for /p/ - /b/ in the /ae/ context and F = 48.29 for /p/ - /b/ in the /i/ context; p < .01). Again it appears that the J timing is less related to the phonetic context than are the lips.
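The interpeak interval described here is just a signed difference of peak-velocity times; a minimal sketch (the millisecond values are hypothetical, not measured data):

```python
# Signed interpeak interval between an articulator pair, as described
# in the text: a positive value means the first articulator's velocity
# peak leads (occurs earlier than) the second's. Times are hypothetical.

def interpeak_interval_ms(peak_time_first_ms: float,
                          peak_time_second_ms: float) -> float:
    """Signed interval (ms); positive => first articulator leads."""
    return peak_time_second_ms - peak_time_first_ms

ul_peak, ll_peak, jaw_peak = 120.0, 128.0, 125.0
print(interpeak_interval_ms(ul_peak, ll_peak))   # 8.0: UL leads LL
print(interpeak_interval_ms(ll_peak, jaw_peak))  # -3.0: J peak precedes LL
```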


Figure 16. Scatter plots of the relative timing of the upper lip (UL) and lower lip (LL) peak velocity for the second oral opening (Op_2) for the four subjects. Time of peak velocity was referenced to the time of the upper lip peak velocity for the oral closing (Cl_1). Product-moment correlation coefficients (r) are presented at the bottom right-hand corner of each plot.



Figure 17. Upper lip-lower lip (UL-LL; left side) and lower lip-jaw (LL-J; right side) closing velocity interpeak intervals for the four subjects. The interpeak interval was defined as the time, in milliseconds, between the occurrence of the peak velocities for the different articulator pairs. A single asterisk centered over either /p/ bar indicates a single significant within-vowel consonant comparison; a double asterisk indicates that both comparisons (/p/ - /b/ and /p/ - /m/) reached significance (p < .01). An asterisk centered over any consonant pair indicates a significant within-consonant vowel comparison (p < .01).

It can also be seen that the sequence of velocity events varied across subjects and articulator pairs. Positive values for the two interpeak intervals indicate that the average sequence of articulator velocity peaks was UL-LL-J. For S2 and S4, negative values were obtained for /b/ and /m/ for the LL-J and UL-LL respectively, indicating that the sequence of events reversed from UL-LL-J to either UL-J-LL or LL-UL-J. In general, the UL-LL timing for the different phonetic contexts suggests that the coordinative relations among these articulators remain similar for different consonants. When the movement characteristics are substantially modified, as occurred for the different vowels, there is a tendency for the relative timing of the articulators to change, with differences noted for the lips.

DISCUSSION

The present study investigated characteristics of the lip and jaw movements associated with a class of consonants (bilabials). A number of observations were made that reflect on certain speech movement principles and the manner in which such principles are modulated for phonetic context. The most consistent finding of the present investigation is that the timing or phasing of contiguous movement phases is an important mechanism for modifying speech movements for certain phonetic differences (see also Browman & Goldstein, 1989; 1990; Edwards, Beckman, & Fletcher, 1991; Saltzman & Munhall, 1989). Speech movements appear to be organized and hence controlled at a level in which multiple articulatory actions and contiguous movement phases are the functional units of control and coordination. These conclusions will be discussed below.

Movement adjustments. The two vowels used in the present study resulted in articulatory changes in position, extent and speed due to greater movement requirements for the jaw (Lindblom, 1967; Lindblom, Lubker, & Gay, 1979). As such, the size of the oral opening, determined predominantly by jaw position, is a significant factor in determining movement adjustments for different vowel contexts. In contrast, the upper and lower lips were not affected by the identity of the two vowels. Similar findings have been reported by Macchi (1988), in which lip position for /p/ was not affected by the vowel environment while the jaw was, due to tongue height. It should be noted that transducing lip motion in only one dimension (inferior-superior) limited any observations of anterior-posterior lip spreading, often assumed to accompany /i/. However, while there may have been some differences in the anterior-posterior dimension, it seems clear that lip motion is not the major factor in differentiating the vowels /ae/ and /i/.

The vowel-related jaw opening movements resulted in systematic and distributed differences in oral closing movement characteristics. As jaw opening was modified for the high or low vowel, the extent and speed of lip and jaw oral closing characteristics were similarly adjusted; vowel-related oral closing movement adjustments were distributed to all the contributing articulators. These vowel-related adjustments suggest that oral opening and oral closing are not independent events. Rather, oral closing actions are dependent on characteristics of the preceding oral opening action. Other evidence for systematic movement relations between oral opening and oral closing can be found in results reported by Folkins and Canty (1986), Folkins and Linville (1983), Hughes and Abbs (1976) and Kozhevnikov and Chistovich (1965).

Some of the opening/closing interactions observed were apparently consonant-related, such as higher opening and closing velocities for the voiceless consonant /p/. Jaw opening velocity for /ae/ was faster when the subsequent closing movement was the voiceless /p/ compared to /b/ or /m/ (Summers, 1987). In addition, LL closing velocity was generally higher for /p/ than /b/ or /m/ independent of the preceding vowel, and LL opening velocity following closure was always lowest for /p/ and highest for /m/. Previous investigations have demonstrated faster lower lip or jaw closing velocity for voiceless compared to voiced consonants (Chen, 1970; Fujimura & Miller, 1979; Sussman et al., 1973; Summers, 1987), and slower lip opening velocity for the voiceless consonant (Sussman et al., 1973). It has been suggested that the higher LL closing velocity is to accommodate the higher intraoral pressures for /p/ compared to /b/ (Chen, 1970; Sussman et al., 1973). That is, higher intraoral air pressure may require greater lip force, manifest as higher lip closing velocity, to maintain lip contact during the closure interval. While it is generally observed that oral pressure for a voiceless sound is higher than for its voiced counterpart, which is higher than its voiced nasal counterpart (Arkebauer, Hixon, & Hardy, 1967; Black, 1950), empirical evidence suggests that higher lip closing velocity does not directly translate into greater lip contact force. Using a miniature pressure sensor, Lubker and Parris (1970) were unable to find reliable differences in lip contact pressure for /p/ and /b/. An additional consideration is the higher lower lip opening velocity for /m/. If oral pressure and lip velocity covary, then the lowest oral pressure sounds, like /m/, should consistently have the lower closing and opening velocity. However, in the same vowel context, lower lip opening velocity following /m/ closure was higher than /p/. It seems premature at best to conclude that lip closing velocity is modulated according to oral pressure characteristics, especially when a direct measure of lip contact does not support such an interpretation.

Similarly, mass-normalized stiffness, defined as the ratio of the peak closing velocity and peak displacement, was not found to consistently differentiate the different vowel and consonant combinations. While the LL stiffness estimates for the closing movements were found to be consistently higher for /p/ than /b/ or /m/, the upper lip and jaw estimates were not. For the upper lip, estimates of mass-normalized stiffness often varied reciprocally with the LL values, while the jaw values showed no consistent patterns. It is hard to reconcile a differential modulation of stiffness for three articulators contributing to the same movement or phonetic goal. While stiffness may be a convenient means of describing the motion or rate of an articulator, it is not clear that it qualifies as a controlled variable. Rather than a single physical control variable, it appears that speech movements are more likely organized according to principles that take into account the function of the action related to the overall goal of communication (Gracco, 1990; 1991).

A potential explanation for voiced/voiceless differences can be presented from examining the relative timing of the opening and closing actions. The relative timing or phasing of articulatory motion was found to be a consistent variable differentiating the voiced and voiceless consonants. For the voiceless /p/, the closing action was initiated earlier than for /b/ or /m/, while for oral opening, /p/ was initiated later than either /b/ or /m/. These two timing results are consistent with previous acoustic results related to the durational differences for voiced and voiceless consonants. In many languages, two related durational phenomena have been reported for voiced and voiceless sounds. Vowel length is shorter and the duration of the consonant closure, defined acoustically, is longer for voiceless than for voiced consonants. In the present phonetic context, the interval between jaw opening velocity for /ae/ and oral closing velocity is essentially identical to the duration of the vowel (McClean, Kroll, & Loftus, 1990). As such, the earlier onset of the closing action for the /p/ is a kinematic manifestation of the differential vowel length effect (Denes, 1955; House & Fairbanks, 1953). Similarly, the longer opening movement duration following /p/ closure is consistent with the longer acoustic closure duration reported for voiceless sounds. It seems, from the acoustic durational effects reported previously and the kinematic effects reported here and elsewhere, that voiced and voiceless consonants differ primarily in their timing, with other kinematic variations, such as velocity differences, secondary. The movement velocity changes may be considered a natural consequence of the timing requirements for the different consonants.

However, an explanation of articulatory timing as a means to differentiate sounds within a class leaves open the question as to why articulatory timing differs. Recently it has been suggested that /p/ - /b/ differences in vowel duration reflect a perceptually salient feature of speech production which enhances the acoustic differences between voiced and voiceless consonant sounds (Kluender, Diehl, & Wright, 1988). That is, voiced and voiceless consonants differ in their timing because of the perceptual enhancement the durational differences provide to the listener. A recent investigation by Fowler (1992), however, suggests that there is no auditory-perceptual enhancement to be obtained from the shorter vowel durations and longer consonant closure durations for /p/ versus /b/. A simpler, and less speculative, explanation is that timing differences among the bilabial consonants reflect modifications to accommodate concomitant articulatory actions. In the case of /p/, the longer closure duration compared to /b/ is to accommodate the laryngeal adjustment associated with devoicing. As such, from a functional perspective /p/ is an inherently longer sound than /b/. The shortening of the vowel reflects an intrusion of the phonetic gesture for /p/, including the laryngeal devoicing, on the vowel. The closing must be faster to integrate the lip and larynx actions, while the opening is slower to provide the appropriate voice onset time. It is suggested that all voiced and voiceless sound pairs (cognates) are fundamentally differentiated by their relative timing or phasing, which is a direct consequence of the supralaryngeal-laryngeal interaction and consistent with aerodynamic requirements (Harris et al., 1965; Lisker & Abramson, 1964; Lubker & Parris, 1970).

Speech movement coordination. Another robust finding of the present investigation was the consistency of the upper lip-lower lip timing relations associated with the different bilabial consonants. For all subjects, the temporal relations of the velocity peaks for the upper and lower lips were essentially constant across phonetic contexts, indicating a constraint on their coordination. While the UL-LL timing was relatively consistent, the sequence of events (UL-LL-J) did not remain constant, as had been observed in more limited phonetic contexts (Caruso et al., 1988; Gracco, 1988; Gracco & Abbs, 1986; McClean et al., 1990). However, based on a number of investigations that have reported different sequences, it is not clear that the actual pattern or sequence of articulatory events is of importance (De Nil & Abbs, 1991). Rather, it is the general trend for covariation that is indicative of an underlying invariant control parameter (Gracco, 1990; 1991). That is, while the sequence of events may change with rate, stress or phonetic context, such variation does not indicate a lack of invariance, as suggested recently by De Nil and Abbs (1991). As pointed out by von Neumann (1958), the nervous system is not a digital device in which extremely precise manipulations are possible but an analog device in which precision is subordinate to reliability. Establishing the presence of invariance on fixed stereotypic patterns, similar to those observed in a mechanical system, is inconsistent with principles of biological systems. Rather, constraints on coordination are most likely specified stochastically, with specific limits on variation contingent on the overall goal of the action (see also Lindblom, 1987 for arguments for the adaptive nature of variability). As suggested by the quantal nature of speech (Stevens, 1972; 1989), there are vocal tract actions that may require relatively more precise control than others because of the potential acoustic consequences. However, it is doubtful that the timing of bilabial closure is a context in which absolute precision is required (see also Weismer, 1991).

More importantly, in the present study as in previous investigations of speech movement timing, the consistent relative timing relations among the UL-LL-J indicate that the timing of these articulators is not controlled independently. Rather, timing adjustments covary across functionally related actions, and contributing articulators are effectively constrained to minimize the degrees of freedom to control (Gracco, 1988). While performance variables such as phonetic context, speaking rate and the production of emphasis will change the surface kinematics, the underlying principles governing their coordination remain unchanged. Moreover, the articulator interactions, such as the lip adjustments to changes in jaw position and the opening and closing velocity covariation, are additional manifestations of the interdependency of speech movement coordination.

Opening/Closing differences

The present results also suggest that opening and closing movements are fundamentally different actions operating under different constraints (Gracco, 1988). Oral opening was found to be the locus of the most significant consonant- and vowel-related articulatory adjustments. Oral closing adjustments, on the other hand, varied primarily as a function of oral opening. Only the LL peak closing velocity for the different consonants demonstrated any consistent change across subjects. In contrast to the tightly coupled lip and jaw timing during closing, oral opening coordination was found to be more variable. The different spatiotemporal patterns observed suggest that opening and closing phases of sequential speech motor actions are differentially modifiable for context. Oral closing is generally faster, involving relatively abrupt constrictions or occlusions in various portions of the vocal tract. Oral opening is generally slower, involving resonance-producing events associated with vowel sounds. Hence these two classes of movements reflect fundamentally different speech actions with distinct functional (aerodynamic and acoustic) consequences (see also Fowler, 1983; Perkell, 1969).

Why this might be so may, in part, reflect the different biomechanical influences on the closing and opening actions. Oral opening involves temporarily moving the lips and jaw from some rest or neutral position to a position that requires stretching the associated tissues (skin, muscle, ligament). For closing, the lip and jaw motion is assisted by the elastic recoil from the opening stretch. A consequence of the different biomechanical influences on opening and closing would be that opening movements could be controlled directly by agonistic muscle actions. That is, changes in lip and jaw opening may result from direct modification of jaw and lip opening muscle activity. In contrast, to counteract the elastic recoil of the lip and jaw tissue, oral closing adjustments would require some combination of reduction in agonistic activation and/or agonist-antagonist co-contraction. It is suggested that the opening action would be an easier task to control than the closing action and that the biomechanical influences would differentially affect the two actions. The closing action would be more rapid, due to the contribution of tissue elasticity, and less variable. Such considerations are possible factors influencing the patterns observed in the present investigation, reflecting biomechanical optimizations, and may account for certain aspects of the ontogeny of the phonological system.

Articulator differences: Consonants and vowels

In addition to the different functions of the opening and closing actions, it can also be suggested that the lips and jaw contribute to sound production in different ways. Examination of the lip and jaw movement characteristics suggests that while all three articulators contribute to the general opening and closing, their roles are not identical. The lower lip closing and opening velocity varied as a function of the different consonant sounds. The upper lip, in contrast, did not. In addition, the lower lip and jaw closing displacements were significantly and systematically related to the opening position for the preceding vowel; the upper lip closing movement displacement was only weakly related. It is plausible that the upper lip contributes in a more stereotypic manner to consonant actions while the lower lip and jaw provide more of the details related to phonetic manipulations. Each lip contributes in an interdependent but not redundant manner to the achievement of oral closing.

From the lip and jaw differences noted in the present investigation it can be suggested that consonant and vowel adjustments are articulator specific. The jaw, while assisting in the oral closing, was more significantly involved in the production of the vowels than was either of the lips. This was noted not only for the first syllable, in which the vowel varied between a phonetically high /i/ and low /ae/, but in the second syllable, in which jaw motion for the same syllable varied dependent on the identity of the preceding vowel. The same pattern was not observed for the LL motion. In contrast, consonant-dependent opening movement differences were noted for the LL but were not found for the J. These results suggest that boundaries between phonetic segments and/or phonemic classes, such as consonants and vowels, may be identified by differential articulatory actions that overlap in time. The speech mechanism can be thought of as a special purpose device in which individual components contribute in unique and complementary ways to the overall aerodynamic/acoustic events.

Speech motor programming/Serial timing

Based on the opening/closing interactions outlined above, it is not unreasonable to suggest that speech movements are organized minimally across movement phases, with movement extent and speed specified for units larger than individual opening and closing actions. The size of the organizational unit is at least on the order of a syllable and perhaps larger, encompassing something on the order of a stress group (Fowler, 1983; Sternberg et al., 1988). It can further be suggested that modulation of the relative timing of movement phases is an organizational principle used to differentiate many sounds of the language. As such, serial timing and the associated mechanism are an important concept in speech production but one that has not received much empirical attention. It has been suggested previously that the serial order for speech is a consequence of an underlying rhythm generating mechanism (Kozhevnikov & Chistovich, 1965; Lashley, 1951; Kelso & Tuller, 1984; Gracco, 1988; 1990). For Lashley (1951), the simplest of timing mechanisms are those that control rhythmic activity, and he drew parallels between the rhythmic movements of respiration, fish swimming, musical rhythms, and speech (see also Stetson, 1951). For speech, however, the mechanism cannot be simple or stereotypic since, as shown in this investigation, the different sounds in the language have different temporal requirements. Moreover, such a rhythmic mechanism most likely reflects a network property (Martin, 1972) as opposed to a property of a localized group of neurons, since dysprosody results from damage to many different regions of the nervous system (e.g., Kent & Rosenbek, 1982). More likely, any centrally generated rhythmic mechanism for speech and any other flexible behavior must be modifiable by internal and external inputs (cf. Getting, 1989; Glass & Mackey, 1988; Harris-Warrick, 1988; Rossignol, Lund, & Drew, 1988).

Recent investigations on the discrete effects of mechanical perturbation on sequential speech movements are beginning to identify aspects of the underlying serial timing mechanism. Mechanical perturbations to the lower lip during sequential movement result in an increase or decrease in the movement cycle frequency depending on the phase of the movement in which the load is applied (Gracco & Abbs, 1988; 1989; Saltzman, Kay, Rubin, & Kinsella-Shaw, 1991). One interpretation of these results is that a rhythmic mechanism is the foundation for the serial timing of speech movements and that this mechanism is modifiable, not stereotypic (Gracco, 1990; 1991). The present results, demonstrating changes in opening and closing phasing with phonetic context, reflect another aspect of the serial timing for speech. The consonants acted to differentially modify the ongoing speech rhythm.

(Gracco, 1990; 1991; Kent, Carney, & Severeid, 1974). These functional groupings are stored in memory and map categorically onto the sounds of the language. The apparent constraint on coordinating functionally-related articulators observed
Getting, 1989; high /i/ and low /ae/, but in the second syllable in Glass & Mackey, 1988; Harris-Warrick, 1988; which jaw motion for the same syllable varied Rossignol, Lund, & Drew, 1988). dependent on the identity of the preceding vowel. Recent investigations on the discrete effects of The same pattern was not observed for the LL mechanical perturbation on sequential speech motion. In contrast, consonant-dependent opening movements are beginning to identify aspects of movement differences were noted for the LL but the underlying serial timing mechanism . were not found for the J. These results suggest Mechanical perturbations to the lower lip during that boundaries between phonetic segments sequential movement result in an increase or de- and/or phonemic classes such as consonants and crease in the movement cycle frequency depending vowels, may be identified by differential on the phase of the movement the load is applied. articulatory actions that overlap in time. The (Gracco & Abbs, 1988; 1989; Saltzman, Kay, speech mechanism can be thought of as a special Rubin, & K.-Shaw, 1991). One interpretation of purpose device in which individual components these results is that a rhythmic mechanism is the contribute in unique and complementary ways to foundation for the serial timing of speech move- the overall aerodynamic/acoustic events. ments and that this mechanism is modifiable not 88 Gracco stereotypic (Gracco, 1990; 1991). The present re- (Gracco, 1990; 1991; Kent, Carney, & Severeid, sults demonstrating changes in opening and clos- 1974). These functional groupings are stored in ing phasing with phonetic context reflect another memory and map categorically onto the sounds of aspect of the serial timing for speech. The conso- the language. The apparent constraint on coordi- nants acted to differentially modify the ongoing nating functionally-related articulators observed speech rhythm. 
As such, consonants can be con- in this and other investigations (Gracco, 1988; sidered to act as perturbations on an underlying 1990; Gracco & Laqvist, 1989; in preparation) vowel-dependent rhythm. Consistent with this suggests that the temporal patterning among ar- speculation is evidence from two apparently con- ticulators is a component of the neuromotor repre- tradictory results obtained by Ohala (1975) and sentation. Observable speech movements (or vocal Kelso et al. (1985). Searching for the underlying tract area functions) reflect a combination of preferred speech movement frequency Ohala stored representations with flexible sensorimotor (1975) was unable to find an isochronous fre- processes interacting to form complex vocal tract quency associated with jaw movements during patterns from relative simple operations (Gracco oral reading. In contrast, Kelso et al. (1985) pro- & Abbs, 1987; Gracco, 1990; 1991). Stored neuro- vided evidence suggesting that jaw movements motor representations and sensorimotor interac- during reiterant speaking were produced with tions simplify the overall motor control process by very little variation around a characteristic fre- minimizing much of the computational load. quency. The difference in these two investigations Important areas of future research are to deter- is in the phonetic content used by the different in- mine the precise role and contribution of biome- vestigators. The reiterant speech contained the chanical properties to the observable patterns, to syllable "ba" produced with different stress levels evaluate the relative strength of articulator inter- whereas Ohala used unconstrained oral reading actions, and to identify how articulator move- passages. Sounds of the language may have an in- ments are modified by linguistic and cognitive herent frequency (intrinsic timing; Fowler, 1980) considerations. 
The major contributions to under- which interacts with a central rhythm resulting in standing speech motor control and underlying a modal frequency with significant dispersion. nervous system organization will come from a bet- Similarly, the degree to which a characteristic ter understanding of the neural, physical, and rhythm or frequency is modulated during speak- cognitive factors that govern this uniquely human ing is no doubt a result of an interaction of a num- behavior. ber of functional considerations such as the articu- latory adjustments for the specific phoneme, and REFERENCES other suprasegmental considerations such as Abbs, J. H., & Gracco, V. L. (1984). Control of complex motor stress and rate (Gracco, 1990; 1991). While specu- gestures: Orofacial muscle responses to load perturbations of lative, further investigations of the serial timing the lip dunngspeech. Journal of Neurophysiology, 51(4),705-723. characteristics or rhythm generation for speech Adams, S. G., Weismer, G., & Kent, R. D. (1993). Speaking rate (Gracco & Abbs, 1989; Saltzman et al., 1991) com- and speech movement velocity profiles.Journal of Speech and bined with computational models to evaluate the Hearing Research,36, 41-54. potential oscillatory network or serial dynamics Arkebauer, H. J., Hixon, T. J., &, Hardy, J. C. (1967). Peak intraoral air pressures during speech.Journal ot Speech and Hearing (Laboissiere et al., 1991; Saltzman & Munhall, Research.10, 196-208. 1989) associated with the temporal patterning for Barlow, S. M., Cole, K. J., &, Abbs, J. H. (1983). A new head- speech are important areas of future research. mounted lip-jaw movement transduction system for the study Speech production: A functional neuromotor of motor speech disorders.Journal of Speech and Hearing Research,26, 283-288. perspective Bernstein, N. (1967).The co-ordination and regulation of movements. The observations above suggest that the neuro- New York: Pergamon Press. 
motor representation for speech is more compli- Black, J. W. (1950). The pressure component in the production of consonants.Journal of Speech and Hearing Disorders,15, 207-210. cated than can be captured by a single control Browman, C. P., & Goldstein, L. M. (1989). Tiers in articulatory variable or sensorimotor mechanism. Rather, phonology, with some implications for casual speech. In J. speech production is organized according to prin- Kingston & M. Beckman (Eds.), Papersin laboratory phonology 1. ciples that incorporate the sequential nature of Between the grammar and of speech(pp. 341-376). the process, the functional character of the pro- Cambridge, England: Cambridge University Press. Browman. C. P., & Goldstein, L. M. (1990). Articulatory gestures duction units, and the articulatory details that as phonological units.Phonology, 6:2. shape the acoustic output. It appears that speech Caruso, A. J., Abbs, J. H., & Gracco, V. L. (1988). Kinematic motor control is organized at a functional level analysis of multiple movement coordination during speech in according to sound-producing vocal tract actions stutterersPram.177, 439-455. Some Organizational Characteristics of Speech Movement Control 89

Chen, M. (1970). Vowel length variation as a function of the voicing of the consonant environment. Phonetica, 22, 129-159.
Cooke, J. D. (1980). The organization of simple, skilled movements. In G. E. Stelmach & J. Requin (Eds.), Tutorials in motor behavior (pp. 199-212). Amsterdam: North-Holland.
Daniloff, R., & Moll, K. (1968). Coarticulation of lip rounding. Journal of Speech and Hearing Research, 11, 707-721.
Denes, P. (1955). Effect of duration on the perception of voicing. Journal of the Acoustical Society of America, 27, 761-764.
DeNil, L., & Abbs, J. (1991). Influence of speaking rate on the upper lip, lower lip, and jaw peak velocity sequencing during bilabial closing movements. Journal of the Acoustical Society of America, 89, 845-849.
Edwards, J., Beckman, M. E., & Fletcher, J. (1991). The articulatory kinematics of final lengthening. Journal of the Acoustical Society of America, 89, 369-382.
Folkins, J. W. (1981). Muscle activity for jaw closing during speech. Journal of Speech and Hearing Research, 24, 601-615.
Folkins, J. W., & Abbs, J. H. (1975). Lip and jaw motor control during speech: Responses to resistive loading of the jaw. Journal of Speech and Hearing Research, 18, 207-220.
Folkins, J. W., & Linville, R. N. (1983). The effects of varying lower-lip displacement on upper-lip movements: Implications for the coordination of speech movements. Journal of Speech and Hearing Research, 26, 209-217.
Folkins, J. W., & Canty, J. (1986). Movements of the upper and lower lips during speech: Interactions between lips with the jaw fixed at different positions. Journal of Speech and Hearing Research, 29, 348-356.
Folkins, J. W., & Brown, C. K. (1987). Upper lip, lower lip, and jaw interactions during speech: Comments on evidence from repetition-to-repetition variability. Journal of the Acoustical Society of America, 82, 1919-1924.
Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of Phonetics, 8, 113-133.
Fowler, C. A. (1983). Converging sources of evidence on spoken and perceived rhythms of speech: Cyclic production of vowels in monosyllabic stress feet. Journal of Experimental Psychology: General, 112, 386-412.
Fowler, C. A. (1992). Vowel duration and closure duration in voiced and unvoiced stops: There are no contrast effects here. Journal of Phonetics, 20, 143-165.
Fowler, C. A., Gracco, V. L., V.-Bateson, E. A., & Romero, J. (submitted). Global correlates of stress accent in spoken sentences. Journal of the Acoustical Society of America.
Fromkin, V. A. (1966). Neuro-muscular specification of linguistic units. Language and Speech, 9, 170-199.
Fujimura, O., & Miller, J. (1979). Mandible height and syllable-final tenseness. Phonetica, 36, 263-272.
Getting, P. A. (1989). Emerging principles governing the operation of neural networks. Annual Review of Neuroscience, 12, 185-204.
Glass, L., & Mackey, M. C. (1988). From clocks to chaos. Princeton: Princeton University Press.
Gracco, V. L. (1987). A multilevel control model for speech motor activity. In H. Peters & W. Hulstijn (Eds.), Speech motor dynamics in stuttering (pp. 57-76). Wien: Springer-Verlag.
Gracco, V. L. (1988). Timing factors in the coordination of speech movements. Journal of Neuroscience, 8, 4628-4634.
Gracco, V. L. (1990). Characteristics of speech as a motor control system. In G. Hammond (Ed.), Cerebral control of speech and limb movements (pp. 3-28). North Holland: Elsevier.
Gracco, V. L. (1991). Sensorimotor mechanisms in speech motor control. In H. Peters, W. Hulstijn, & C. W. Starkweather (Eds.), Speech motor control and stuttering (pp. 53-78). North Holland: Elsevier.
Gracco, V. L., & Abbs, J. H. (1985). Dynamic control of the perioral system during speech: Kinematic analyses of autogenic and nonautogenic sensorimotor processes. Journal of Neurophysiology, 54, 418-432.
Gracco, V. L., & Abbs, J. H. (1986). Variant and invariant characteristics of speech movements. Experimental Brain Research, 65, 156-166.
Gracco, V. L., & Abbs, J. H. (1987). Programming and execution processes of speech motor control: Potential neural correlates. In E. Keller & M. Gopnik (Eds.), Motor and sensory processes of language (pp. 163-202). Hillsdale, NJ: Lawrence Erlbaum.
Gracco, V. L., & Abbs, J. H. (1988). Central patterning of speech movements. Experimental Brain Research, 71, 515-526.
Gracco, V. L., & Abbs, J. H. (1989). Sensorimotor characteristics of speech motor sequences. Experimental Brain Research, 75, 586-598.
Gracco, V. L., & Löfqvist, A. (1989). Speech movement coordination: Oral-laryngeal interactions. Journal of the Acoustical Society of America, 86, S114.
Harris, K. S., Lysaught, G. F., & Schvey, M. M. (1965). Some aspects of the production of oral and nasal labial stops. Language and Speech, 8, 135-147.
Harris-Warrick, R. M. (1988). Chemical modulation of central pattern generators. In A. Cohen, S. Rossignol, & S. Grillner (Eds.), Neural control of rhythmic movements in vertebrates (pp. 285-332). New York: John Wiley & Sons.
House, A. S., & Fairbanks, G. (1953). The influence of consonant environment upon the secondary acoustic characteristics of vowels. Journal of the Acoustical Society of America, 25, 105-113.
Hughes, O., & Abbs, J. (1976). Labial-mandibular coordination in the production of speech: Implications for the operation of motor equivalence. Phonetica, 33, 199-221.
Kelso, J. A. S. (1986). Pattern formation in speech and limb movements involving many degrees of freedom. In H. Heuer & C. Fromm (Eds.), Generation and modulation of action patterns (pp. 105-128). Berlin: Springer-Verlag.
Kelso, J. A. S., & Tuller, B. (1984). Converging evidence in support of common dynamical principles for speech and movement coordination. American Journal of Physiology, 246, R928-R935.
Kelso, J. A. S., Tuller, B., V.-Bateson, E., & Fowler, C. (1984). Functionally specific articulatory cooperation following jaw perturbations during speech: Evidence for coordinative structures. Journal of Experimental Psychology: Human Perception and Performance, 10, 812-832.
Kelso, J. A. S., V.-Bateson, E., Saltzman, E. L., & Kay, B. (1985). A qualitative dynamic analysis of reiterant speech production: Phase portraits, kinematics, and dynamic modeling. Journal of the Acoustical Society of America, 77, 266-280.
Kent, R. D. (1983). The segmental organization of speech. In P. MacNeilage (Ed.), The production of speech. New York: Springer-Verlag.
Kent, R. D., Carney, P. J., & Severeid, L. R. (1974). Velar movement and timing: Evaluation of a model for binary control. Journal of Speech and Hearing Research, 17, 470-488.
Kent, R. D., & Rosenbek, J. C. (1982). Prosodic disturbance and neurologic lesion. Brain and Language, 15, 259-291.
Kluender, K. R., Diehl, R. L., & Wright, B. A. (1988). Vowel-length differences before voiced and voiceless consonants: An auditory explanation. Journal of Phonetics, 16, 153-169.
Kollia, H. B., Gracco, V. L., & Harris, K. S. (1992). Functional organization of velar movements following jaw perturbation. Journal of the Acoustical Society of America, 91(2), 2474.


Kozhevnikov, V., & Chistovich, L. (1965). Speech: Articulation and perception. Joint Publications Research Service, 30, 543; U.S. Department of Commerce.
Kuehn, D. P., & Moll, K. (1976). A cineradiographic study of VC and CV articulatory velocities. Journal of Phonetics, 4, 303-320.
Kuehn, D. P., Reich, A. R., & Jordan, J. E. (1980). A cineradiographic study of chin marker positioning: Implications for the strain gauge transduction of jaw movement. Journal of the Acoustical Society of America, 67, 1825-1827.
Laboissiere, R., Schwartz, J.-L., & Bailly, G. (1991). Motor control for speech skills: A connectionist approach. In D. S. Touretzky & G. E. Hinton (Eds.), Proceedings of the 1990 Connectionist Models Summer School (pp. 319-327). San Mateo, CA: Morgan Kaufmann.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior: The Hixon symposium. New York: Wiley.
Lindblom, B. E. F. (1987). Adaptive variability and absolute constancy in speech signals: Two themes in the quest for phonetic invariance. Proceedings of the Eleventh International Congress of Phonetic Sciences, 3, 9-18.
Lindblom, B. E. F., Lubker, J., & Gay, T. (1979). Formant frequencies of some fixed-mandible vowels and a model of speech motor programming by predictive simulation. Journal of Phonetics, 7, 147-161.
Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20, 384-422.
Löfqvist, A., & Yoshioka, H. (1981). Interarticulator programming in obstruent production. Phonetica, 38, 21-34.
Löfqvist, A., & Yoshioka, H. (1984). Intrasegmental timing: Laryngeal-oral coordination in voiceless consonant production. Speech Communication, 3, 279-289.
Lubker, J. F., & Parris, P. J. (1970). Simultaneous measurements of intraoral pressure, force of labial contact, and labial electromyographic activity during production of the stop consonant cognates /p/ and /b/. Journal of the Acoustical Society of America, 47, 625-633.
Luce, P. A., & Charles-Luce, J. (1985). Contextual effects on vowel duration, closure duration, and the consonant/vowel ratio in speech production. Journal of the Acoustical Society of America, 78, 1949-1957.
Macchi, M. (1988). Labial articulation patterns associated with segmental features and syllable structure in English. Phonetica, 45, 109-121.
Martin, J. G. (1972). Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological Review, 79, 487-509.
McClean, M. D., Kroll, R. M., & Loftus, N. S. (1990). Kinematic analysis of lip closure in stutterers' fluent speech. Journal of Speech and Hearing Research, 33, 755-760.
Moore, C. A., Smith, A., & Ringel, R. L. (1988). Task-specific organization of activity in human jaw muscles. Journal of Speech and Hearing Research, 31, 670-680.
Muller, E. M., & Abbs, J. H. (1979). Strain gage transduction of lip and jaw motion in the midsagittal plane: Refinement of a prototype system. Journal of the Acoustical Society of America, 65, 481-486.
Munhall, K., Ostry, D. J., & Parush, A. (1985). Characteristics of velocity profiles of speech movements. Journal of Experimental Psychology: Human Perception and Performance, 11, 457-474.
Munhall, K., Löfqvist, A., & Kelso, J. A. S. (in press). Lip-larynx coordination in speech: Effects of mechanical perturbations to the lower lip. Journal of the Acoustical Society of America.
Nelson, W. L. (1983). Physical principles for economies of skilled movements. Biological Cybernetics, 46, 135-147.
Ohala, J. J. (1975). The temporal regulation of speech. In G. Fant & M. A. A. Tatham (Eds.), Auditory analysis and perception of speech (pp. 431-454). London: Academic Press.
Ostry, D. J., Keller, E., & Parush, A. (1983). Similarities in the control of the speech articulators and the limbs: Kinematics of tongue dorsum movement in speech. Journal of Experimental Psychology: Human Perception and Performance, 9, 622-636.
Ostry, D. J., & Munhall, K. G. (1985). Control of rate and duration of speech movements. Journal of the Acoustical Society of America, 77, 640-648.
Parush, A., Ostry, D. J., & Munhall, K. G. (1983). A kinematic study of lingual coarticulation in VCV sequences. Journal of the Acoustical Society of America, 74, 1115-1125.
Perkell, J. S. (1969). Physiology of speech production. Research Monograph 53. Cambridge, MA: MIT Press.
Rossignol, S., Lund, J. P., & Drew, T. (1988). The role of sensory inputs in regulating patterns of rhythmical movements in higher vertebrates: A comparison between locomotion, respiration, and mastication. In A. Cohen, S. Rossignol, & S. Grillner (Eds.), Neural control of rhythmic movements in vertebrates (pp. 201-284). New York: John Wiley & Sons.
Saltzman, E. L. (1986). Task dynamic coordination of the speech articulators: A preliminary model. In H. Heuer & C. Fromm (Eds.), Generation and modulation of action patterns (pp. 129-144). Berlin: Springer-Verlag.
Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1(4), 333-382.
Saltzman, E. L., Kay, B., Rubin, P., & Kinsella-Shaw, J. (1991). Dynamics of intergestural timing. PERILUS, 16, 47-55.
Sternberg, S., Knoll, R. L., Monsell, S., & Wright, C. E. (1988). Motor programs and hierarchical organization in the control of rapid speech. Phonetica, 45, 175-197.
Stetson, R. H. (1951). Motor phonetics: A study of speech movements in action. North Holland Publishing Co.
Stevens, K. N. (1972). On the quantal nature of speech: Evidence from articulatory-acoustic data. In E. E. David & P. B. Denes (Eds.), Human communication: A unified view (pp. 51-66). New York: McGraw-Hill.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-45.
Summers, W. V. (1987). Effects of stress and final-consonant voicing on vowel production: Articulatory and acoustic analyses. Journal of the Acoustical Society of America, 82, 847-863.
Sussman, H. M., MacNeilage, P. F., & Hanson, R. J. (1973). Labial and mandibular dynamics during the production of bilabial consonants: Preliminary observations. Journal of Speech and Hearing Research, 16, 397-420.
Tatham, M. A. A., & Morton, K. (1968). Some electromyography data towards a model of speech production. University of Essex, Language Centre, Occasional Papers 1.
von Neumann, J. (1958). The computer and the brain. New Haven: Yale University Press.
Weismer, G. (1991). Assessment of articulatory timing. In J. Cooper (Ed.), Assessment of speech and voice production: Research and clinical applications (pp. 84-95). NIDCD Monograph.

FOOTNOTE
*Journal of Speech and Hearing Research, in press.

Haskins Laboratories Status Report on Speech Research 1993, SR-113, 91-94

The Quasi-steady Approximation in Speech Production*

Richard S. McGowan

Because boundary-layer separation is important in determining the force between flowing air and solid bodies, and separation can be sensitive to unsteadiness, the quasi-steady approximation needs to be examined for the flow-induced oscillations of speech (e.g., phonation and trills). A review of the literature shows that vibratory phenomena, such as phonation and tongue-tip trills, may need to be modeled without the quasi-steady approximation.

This work was supported by NIH grants DC-00121 and DC-00865 to Haskins Laboratories. Thanks to Anders Löfqvist and Philip Rubin for comments.

The quasi-steady approximation is commonly made in modeling air flow and vibration in speech production. Experimentally, the quasi-steady approximation means that time-varying situations, such as vocal fold oscillation or tongue-tip trills, can be studied with a series of static configurations simulating the air flow in a vocal tract. For mathematical modeling purposes, the quasi-steady approximation means that acceleration terms involving partial derivatives with respect to time in the equations of motion of air can be neglected. This allows the modeling of air flow as a sequence of static flow configurations. Although an inductive term representing the effect of acceleration of air in the constriction is often included in mathematical modeling, other unsteady effects are ignored. When the quasi-steady approximation is made, vorticity and turbulence distributions, important in force and energy balance considerations, are assumed to be unaffected by unsteady air acceleration. In particular, the quasi-steady approximation is applied to boundary-layer separation, which is an important determinant of vorticity distribution and, hence, of the energy exchange between the air and solid in flow-induced oscillations, such as phonation and tongue-tip trills. However, a review of recent literature (e.g., Bertram & Pedley, 1983; Cancelli & Pedley, 1985; Pedley & Stephanoff, 1985; Sobey, 1983; Sobey, 1985) shows that the quasi-steady approximation can only be used with great care in the fluid mechanics regimes involving boundary-layer separation. Because boundary-layer separation occurs in the vocal tract, these recent works bear consideration for understanding speech production. Some of the recent literature that brings the quasi-steady approximation into question will be reviewed here, after some of its relevance to speech production modeling has been discussed.

The fact that characteristic Strouhal numbers are often small in speech has been used to justify the quasi-steady approximation. For periodic motion the Strouhal number is the product of a characteristic frequency and characteristic length scale divided by a characteristic velocity, and it is the coefficient of the time derivative terms in nondimensional versions of the mass and momentum conservation equations. If the Strouhal number is small compared to one, it is presumed that these terms can be neglected, that is, it is possible to make the quasi-steady approximation. (This assumes that other terms are of order one, which is generally the case.)
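The Strouhal number defined above is simple to evaluate for the characteristic scales quoted in this note for phonation and tongue-tip trills (air speeds near 2000 cm/sec, frequencies on the order of 100 Hz and 30 Hz, length scales on the order of 1 cm). A minimal sketch, using only those figures:

```python
def strouhal(frequency_hz, length_cm, velocity_cm_per_s):
    """Strouhal number: characteristic frequency times characteristic
    length scale, divided by characteristic velocity (dimensionless)."""
    return frequency_hz * length_cm / velocity_cm_per_s

# Characteristic scales for phonation cited in this note:
# ~100 Hz oscillation, ~1 cm length scale, ~2000 cm/sec air speed.
st_phonation = strouhal(100.0, 1.0, 2000.0)  # 0.05

# Tongue-tip trill: ~30 Hz, same length and velocity scales, so its
# Strouhal number is about 1/3 of the phonation value.
st_trill = strouhal(30.0, 1.0, 2000.0)       # 0.015
```

Both values are small compared to one, which is the usual justification for neglecting the time-derivative terms; the argument of this note is that this is not sufficient near separation points.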
The quasi-steady approximation is often used in aerodynamic considerations for the modeling of phonation, stop release, and fricatives. For phonation, with maximum air speeds in excess of 2000 cm/sec, frequencies on the order of 100 Hz, and length scales at most on the order of 1 cm, producing a Strouhal number less than .05, this may appear to be a valid approximation to use for simplifying a model (Catford, 1977, p. 98). Before reviewing the reasons that this may be faulty, it is necessary to consider boundary-layer separation and its importance for vibratory phenomena in the vocal tract.

When air flows over a solid surface, a layer of air with a high concentration of vorticity is formed next to the solid surface, and this layer is a boundary layer. Boundary-layer separation occurs at places on the surface of a solid where the vorticity of the boundary layer abruptly leaves the region close to the solid boundary and is subsequently convected by the flow. The places where separation occurs are called separation points. (A separation point in two-dimensional modeling represents a line of separation in the third dimension.) Separation occurs where there is a sufficient adverse pressure gradient, as occurs when flow is decelerated (Lighthill, 1963). An adverse pressure gradient occurs when there is flow along solids from regions of low pressure to regions of high pressure, and if the spatial rate of change of pressure is sufficiently large, the boundary layer will separate. For example, the flow from the constriction for an /s/ separates because of the abrupt, adverse pressure change caused by the sudden area expansion after the constriction. During phonation, the flow separates from the folds, thus providing the time-varying flow resistances that are essential for the production of modal voice (Ishizaka & Flanagan, 1972).

The locations of separation points are important for the study of flow-induced oscillations in the vocal tract because they help to determine the forces between the air and the tissue. Because vorticity is transported from the boundary layer into the main portion of the flow field at a separation point, there can be a change in pressure head in traversing the region near a separation point. A net force on a blunt object in the direction of flow can result from such a pressure head difference. There is likely to be a separation point near the boundary between the windward and the leeward sides of such an object because of the adverse pressure gradient. (There is an adverse pressure gradient because the flow slows down on the leeward side.) The windward side has a higher pressure head than the leeward side, so there is a higher static pressure on the windward side than on the leeward side. These considerations are important for the energy exchange between moving objects and the air, because energy is the integral of force against distance moved. Thus, the change in boundary-layer separation behavior with the removal of the quasi-steady approximation may have important consequences in modeling flow-induced oscillations of speech, such as phonation and tongue-tip trills.

In the last decade the quasi-steady approximation for internal flows (e.g., flows inside the vocal tract), even at very small Strouhal number, has been questioned. The Reynolds number, equal to the product of a characteristic velocity and length scale divided by the kinematic viscosity, has been shown to be an important parameter for questions of unsteadiness. Pedley (1983), in a review article on physiological fluid flows, quotes results on channel flows driven sinusoidally from a side wall with amplitudes of between .28 and .57 of the channel width. For Reynolds numbers of between 300 and 700, based on cross-sectionally averaged, steady fluid particle velocity and channel width, quasi-steady behavior disappears for a Strouhal number of about .008. A vortex wave is observed to travel downstream from the oscillating portion of the wall (Pedley & Stephanoff, 1985; Sobey, 1985), which is not predicted by quasi-steady theory. There is also other behavior that marks the inadequacy of the quasi-steady approximation in unsteady flow. Bertram and Pedley (1983) have shown experimentally that impulsively started flow over an indentation of the channel wall can create separated flow on the lee side of the indentation, with the separation point moving upstream as the steady state is approached. Sobey (1983) has used numerical simulations of unsteady flow in channels with wavy walls to show the moving separation point on the lee slopes of the wavy walls.

The Quasi-Steady Approximation in Speech Production

walls. For a Reynolds number based on peak velocity and minimum channel half-width of only 75 and a Strouhal number of .01, there are qualitative differences in the separation behavior from that expected from the quasi-steady approximation. For one thing, the flow, once separated during acceleration, does not reattach to the walls after deceleration to zero flow. Requiring that separation vorticity disappear when the flow reverses in oscillatory flow, Sobey derived a very restrictive relation between Strouhal number and Reynolds number for the quasi-steady approximation to be valid. The Strouhal number must be less than .2 times the inverse square of the Reynolds number. For a Reynolds number of 100 the Strouhal number would need to be less than .00002 to meet Sobey's criterion for quasi-steady behavior. Thus, a small Strouhal number is not sufficient to ensure quasi-steady behavior.

Sobey (1983) furthermore gives an argument as to why unsteady separation phenomena are different from steady separation phenomena, even at very small Strouhal numbers. In unsteady flow, the fluid particle velocity at a point in space can be considered both a function of time and a function of a time-variable Reynolds number. To obtain the total time derivative of the fluid particle velocity, one needs to include a term that is the derivative of the fluid particle velocity with respect to Reynolds number times the time derivative of the Reynolds number. Because flow conditions are singular near a separation point, the derivative of flow velocity with respect to the Reynolds number in a region of a separation point can be quite large. Thus, the total time derivative does not scale exclusively with Strouhal number near a separation point, but also depends on the Reynolds number.

It is easily seen, using Sobey's criterion, that flow-induced oscillations in speech may not be quasi-steady if flow separation is concerned. Based on a maximum velocity of 2000 cm/sec and a characteristic dimension of 1 cm, the Reynolds number during phonation is about 13000. The Strouhal number for phonation is .05 based on a 100 Hz oscillation frequency. Based on an oscillation frequency of 30 Hz, the tongue-tip trill Strouhal number is 1/3 of that for phonation, and the Reynolds number is in the same range as that for phonation (McGowan, 1992). Thus, both these vibratory phenomena should be considered to be truly unsteady and the quasi-steady assumption seriously questioned.

One possible consequence of unsteadiness may be a mechanism for energy exchange from air to solid during vibration. For instance, for essentially one-degree-of-freedom solid motion during falsetto voice there could be a hysteresis in the separation point position between opening and closing phases of the motion. If the separation point tends to be further forward during the opening phase than during the closing phase, the pressure on the upstream portion of the folds could be greater during the opening than during the closing phase. This mechanism could supplement others, including a glottal air induction that depends on vocal fold position (Wegel, 1930) and an inductive loading of the supraglottal vocal tract (Flanagan & Landgraf, 1968; Ishizaka & Flanagan, 1972), in accounting for energy exchange.

In this note, it has been shown that the quasi-steady approximation may not be a valid approximation in speech, particularly when flow separation is involved. Unsteady effects may have to be included to account for some aspects of phonation and tongue-tip trills. For instance, mechanisms proposed for the energy exchange from air to the vocal folds when each vocal fold has essentially one degree of freedom may be supplemented with a moving separation point.

REFERENCES

Bertram, C. D., & Pedley, T. J. (1983). Steady and unsteady separation in an approximately two-dimensional indented channel. Journal of Fluid Mechanics, 130, 315-345.
Cancelli, C., & Pedley, T. J. (1985). A separated-flow model for collapsible tube oscillations. Journal of Fluid Mechanics, 157, 375-404.
Catford, J. C. (1977). Fundamental problems in phonetics. Bloomington: Indiana University Press.
Flanagan, J. L., & Landgraf, L. L. (1968). Self-oscillating source for vocal-tract synthesizers. IEEE Transactions on Audio and Electroacoustics, AU-16, 57-64.
Ishizaka, K., & Flanagan, J. L. (1972). Synthesis of voiced sounds from a two-mass model of the vocal cords. Bell System Technical Journal, 51, 1233-1269.
Lighthill, M. J. (1963). Introduction. Boundary layer theory. In L. Rosenhead (Ed.), Laminar boundary layers. Oxford: Clarendon Press.
McGowan, R. S. (1992). Tongue-tip trills and vocal-tract wall compliance. Journal of the Acoustical Society of America, 91, 2903-2910.
Pedley, T. J. (1983). Wave phenomena in physiological flows. Journal of Applied Mathematics, 32, 267-287.

94 McGowan

Pedley, T. J., & Stephanoff, K. D. (1985). Flow along a channel with a time-dependent indentation in one wall: The generation of vorticity waves. Journal of Fluid Mechanics, 160, 337-367.
Sobey, I. J. (1983). The occurrence of separation in oscillatory flow. Journal of Fluid Mechanics, 134, 247-257.
Sobey, I. J. (1985). Observation of waves during oscillatory channel flow. Journal of Fluid Mechanics, 151, 395-426.
Wegel, R. L. (1930). Theory of vibration of the larynx. Journal of the Acoustical Society of America, 1, No. 3, Part 2, 1-21.

FOOTNOTE

*Journal of the Acoustical Society of America, 94, 3011-3013 (1993).
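The dimensionless-number estimates quoted in this note are easy to reproduce. The sketch below assumes a kinematic viscosity of air of about 0.15 cm²/s, a value the text does not state; the velocities, lengths, and frequencies are the ones given above.

```python
# Sketch: reproduce the Strouhal and Reynolds estimates quoted in the
# text, and Sobey's (1983) quasi-steady criterion St < 0.2 / Re**2.
# Assumed (not from the text): kinematic viscosity of air nu ~ 0.15 cm^2/s.

def reynolds(u_cm_s, l_cm, nu=0.15):
    """Reynolds number Re = U * L / nu."""
    return u_cm_s * l_cm / nu

def strouhal(f_hz, l_cm, u_cm_s):
    """Strouhal number St = f * L / U."""
    return f_hz * l_cm / u_cm_s

def sobey_limit(re):
    """Quasi-steady only if the Strouhal number is below 0.2 / Re**2."""
    return 0.2 / re**2

re_phonation = reynolds(2000.0, 1.0)         # about 13000, as quoted
st_phonation = strouhal(100.0, 1.0, 2000.0)  # 0.05, as quoted
st_trill = strouhal(30.0, 1.0, 2000.0)       # about 1/3 of the phonation value
# For Re = 100, the criterion gives St < 0.2 / 100**2 = .00002, as quoted;
# at phonation's Re ~ 13000 the criterion is far more restrictive still.
```

Both phonation and the trill sit many orders of magnitude above Sobey's limit, which is the point of the passage above.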

Haskins Laboratories Status Report on Speech Research, 1993, SR-113, 93-106

Implementing a Genetic Algorithm to Recover Task-dynamic Parameters of an Articulatory Speech Synthesizer

Richard S. McGowan

A genetic algorithm was used to recover the dynamics of a model vocal tract from the speech pressure wave that it produced. The algorithm was generally successful in performing the optimization necessary to do this, a problem with significance in the psychology, biology and technology of speech. A natural extension of this work is to study speech learning using a classifier system that employs a genetic algorithm.

INTRODUCTION

This paper reports work on the recovery of the dynamic parameters used to drive the articulators in a model vocal tract of an articulatory synthesizer from the spectral properties of the model's speech output. A genetic algorithm was used to study this inverse problem. Descriptions of the articulatory synthesizer and its dynamics will be provided as needed in this introduction.

Articulatory synthesizer

The Haskins Laboratories articulatory synthesizer, ASY (Rubin, Baer, & Mermelstein, 1981) (Figure 1), uses a model vocal tract developed by Mermelstein (1973). The shape of the vocal tract is controlled by the positions of various articulators, examples of which are the jaw, tongue body, tongue tip and the lips. The coordinates of the jaw are specified by the angle of a fixed-length vector, JA, with origin at the condyle and end at the point marked jaw in Figure 1 (Mermelstein, 1973). The position of the tongue body center is specified by a vector relative to the jaw vector, with its origin at the condyle and end at the tongue-body center articulator. This vector has a length, CL, and an angle, CA, relative to the jaw. The tongue tip is specified by a vector with origin on the outline of the tongue body, whose angle, TA, is relative to the jaw and tongue body, and whose length is TL. The lips are specified in Cartesian coordinates, with the vertical dimension specifying the dimension of lip closure/opening, and the horizontal specifying protrusion. The upper lip's vertical position, ULV, is in relation to the fixed skull, and the lower lip vertical position, LLV, is in relation to the jaw. The lips are yoked in the horizontal dimension, so this coordinate is specified for both lips as lip horizontal, LH.

The vocal tract formed by the placement of the articulators is assumed to be a variable-area tube supporting acoustic wave propagation. It can be modeled as a linear filter with a rational transfer function, with the poles corresponding to resonances, or formants. The first three formant frequencies provided the data of the speech acoustics used to map back into the dynamics of the vocal tract articulators in the experiments reported here.

Task dynamics and the one-to-many problem

What does it mean to map into a dynamics of the articulators? Why not use an optimization procedure to find the positions of the ASY articulators at any time given the first three formant frequencies? Previous methods for performing the mapping from acoustics to vocal-tract shape have relied on mapping from static frames of acoustic data to static shapes of the vocal tract (e.g., Atal, Chang, Mathews, & Tukey, 1978).

This work was supported by NIH grant DC-01247 to Haskins Laboratories. Thanks go to Carol Fowler, Philip Rubin and Elliot Saltzman for making very helpful comments on this work.



Figure 1. ASY model vocal tract.

One problem encountered in these static maps is that of mapping one acoustic signal onto many vocal tract shapes when there is limited acoustic data. In practical terms this indicates that an optimum solution will be nonrobust, so that any perturbation in the data may lead to wildly different results. The constraint that vocal tract shape be continuous in its movement helps to alleviate the one-to-many problem when an impoverished set of acoustic data, such as the first three formant frequencies, is used to map onto a vocal tract shape (e.g., Shirai & Kobayashi, 1986). Alternatively, it has been proposed to map entire trajectories of acoustic data, corresponding to consonant-vowel or vowel-consonant intervals, onto vocal-tract articulator trajectories described as parameterized functions of time (McGowan, 1991). This procedure should be more efficient than mapping several frames of acoustic data onto several vocal tract shapes, optimizing the relation each time, while enforcing continuity.

In experiments on recovering the articulatory movement in the vocal tract, it became apparent that the optimization should be done on a mapping from the acoustic data to task dynamics (Saltzman & Kelso, 1987; Saltzman & Munhall, 1989) instead of the articulator trajectories. The reasons for using task dynamics will be given after its description. Task dynamics applied to the vocal tract models the coupled motion of the vocal tract articulators in performing speech tasks. For instance, instead of recovering the individual contributions of the jaw and lips in a bilabial closure, the dynamics of the lip aperture reduction would be recovered without particular regard to the individual contributions from the jaw and lip articulators. Figure 2 illustrates the tract variables and lists the articulators associated with each. The focus is the tract variables tongue body constriction location and degree, TBCL and TBCD, tongue tip constriction location and degree, TTCL and TTCD, lip aperture, LA, and lip protrusion, LP. TBCL is the location along the vocal tract of the minimum cross-sectional area formed with the tongue body, and TBCD is a measure of the size of that area. Similar interpretations apply to TTCL and TTCD. LA is a measure of the lip opening area, and LP is a measure of the extent to which the vocal tract is lengthened by the lips. The task dynamics of the tract variables is described by a set of uncoupled, second-order, mass-spring systems. Not only are the parameters of natural frequencies and dampings specified, but target positions, and activation intervals (the time during which the task dynamics is active) for target positions other than the default, are also specified. Given this set of parameters, the current implementation of task dynamics of the vocal tract describes the formation and releasing of constrictions by the coordinated activity of the component articulators.
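One such uncoupled, critically damped, second-order tract-variable equation can be sketched numerically. The paper integrates the coupled articulator-space equations with a fourth-order Runge-Kutta routine; the sketch below applies an RK4 step to a single tract variable, and the natural frequency, target, and initial state are illustrative values, not the paper's.

```python
import math

# Sketch: one critically damped, second-order tract-variable equation,
#   z'' = -omega**2 * (z - target) - 2 * omega * z',
# integrated with a fourth-order Runge-Kutta step. The values of omega,
# the target, and the initial state are illustrative only.

def accel(z, v, target, omega):
    return -omega**2 * (z - target) - 2.0 * omega * v

def rk4_step(z, v, target, omega, dt):
    def f(z, v):
        return v, accel(z, v, target, omega)
    k1z, k1v = f(z, v)
    k2z, k2v = f(z + 0.5 * dt * k1z, v + 0.5 * dt * k1v)
    k3z, k3v = f(z + 0.5 * dt * k2z, v + 0.5 * dt * k2v)
    k4z, k4v = f(z + dt * k3z, v + dt * k3v)
    z += dt * (k1z + 2 * k2z + 2 * k3z + k4z) / 6.0
    v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return z, v

# Drive a lip-aperture-like variable from 1.0 cm toward a closure target of
# 0.0 cm, with the movement period equal to a 150 ms activation interval:
omega = 2.0 * math.pi / 0.150      # natural frequency, rad/s
z, v = 1.0, 0.0
for _ in range(150):               # 150 steps of 1 ms
    z, v = rk4_step(z, v, 0.0, omega, 0.001)
# z has closed most of the way to the target, without overshoot
```

Critical damping is what makes the closing movement approach its target monotonically, which is why the text can treat an activation interval as one movement period.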


Figure 2. Tract variables. The figure relates the eight ASY articulator coordinates (LH, JA, ULV, LLV, CL, CA, TL, TA) to the six tract variables (LP, LA, TBCL, TBCD, TTCL, TTCD) that recruit them.

A reason for optimizing for the transformation from acoustics to tract variables rather than for the transformation from acoustics to articulators is that the tract variables specify the acoustically salient features of vocal tract shape, constriction location and degree, more directly than do the articulators. Boe, Perrier, and Bailly (1992) have noted that acoustic output, in terms of formant frequencies, seems to be most sensitive to place and degree of constriction and emphasized that this fact should be used in acoustic-to-articulatory mapping. While it may be true that there is strictly only one articulatory specification for a given vocal tract shape, there may be many and disparate sets of articulatory coordinates that are near by, in the sense of producing similar acoustic output. For instance, an alveolar constriction (the tongue tip close to the ridge behind the upper teeth) can be specified with varying amounts of jaw, tongue body and tongue tip displacement, because these articulators may compensate for each other to attain the prescribed constriction degree and location. While there is not complete compensation throughout the vocal tract, the most

acoustically salient parts of the vocal tract shape, constriction degree and location, can be preserved using compensation. Thus, by mapping to task dynamics, ambiguity caused by articulatory compensation may be removed.

As indicated in Figure 2, tract variables recruit various articulators as their dynamics are instantiated in the vocal tract. The articulators are recruited with various weightings to achieve the target as specified by task dynamics at the tract variable level. Therefore, in the implementation of task dynamics a mapping between the tract variables and articulators, in this case ASY articulators, is specified. While each tract variable appears to act as an independent dynamical system, some may control the same articulator coordinate simultaneously, as in the way tongue body constriction degree, TBCD, and lip aperture, LA, both use the jaw angle, JA. Thus, the set of task dynamics equations transformed into the articulatory space becomes a coupled set of equations. Once the task-dynamic parameters are specified, the articulator position trajectories are specified by this set of coupled differential equations. A fourth-order Runge-Kutta routine is used to obtain the articulators' movements, and the resulting acoustic output is obtained from ASY.

In the experiments here, and in previous experiments (McGowan, in press), the task dynamics of test utterances resembling /abæ/ and /adæ/ were specified in modified gestural scores generated from the linguistic gestural model of Browman and Goldstein (1990). A gestural score is a file containing the task-dynamic specifications for tract variable activation times, stiffnesses, and other information necessary to run the task-dynamic simulation. For each utterance the first three formant frequencies were extracted at 10 ms intervals, thus creating the formant frequency trajectories for the acoustic data. The cost function, or inverse fitness function, was the sum over time of squares of the differences of each of the three lowest formant frequencies produced by a proposed dynamics and those of the data.

The genetic algorithm

Since there was no analytic expression between the task-dynamic parameters and the cost function (a numerical extrapolation of nonlinear differential equations was necessary to evaluate the cost function), it was difficult to find the necessary derivatives between the cost function and the task-dynamic parameters. Attempting to calculate derivatives numerically would have been very time consuming, because the function evaluations were so complicated. A genetic algorithm was chosen as a nonderivative-based algorithm for optimization of the task-dynamic parameters in gestural scores. A genetic algorithm was not the only possibility for a non-derivative-based algorithm, but it had several advantages. These advantages will be explained after the algorithm is described. Genetic optimization procedures are stochastic procedures, and there has been some precedent for stochastic procedures in research on the inverse problem. Schroeter, Meyer, and Parthasarathy (1990) have used random access of codebooks in their approach to the inverse problem in speech.

The specific genetic algorithm employed for this study was a modified version of an algorithm described by Goldberg (1989a). In a genetic algorithm, the individuals of a population are assigned randomly chosen parameter sets that are coded into binary strings called "chromosomes," and each is assigned a fitness. The fitness used in this study was the inverse of the sum, taken over 10 ms intervals, of the squares of the differences between the formant frequencies of a proposed solution and those of the speech data, for the first three formant frequencies. Based on fitness, individuals are chosen to breed with others to form a new population of chromosomes. When two individuals mate, their chromosomes split at a randomly chosen location, with each of two progeny obtaining one part of their chromosome from each parent. The children's fitnesses are evaluated based on their parameter sets as coded in their chromosomes. A small probability of mutation is allowed.

It should be noted that the use of genetic algorithms requires that the parameters for optimization be coded into finite-length strings (chromosomes), which limits the range of any parameter and essentially discretizes the parameter space. The degree of discretization of each parameter can be varied to tune the optimization. This was controlled by varying the range of allowed parameters and the number of bits given to code a specific parameter. Because the ranges of starting and ending activation times were both limited to discrete steps and finite range, the potentially infinite set of parameters was made finite.

The coding that was used here was a simple binary code for the real parameters, such as starting activation time. The coded parameters were concatenated to form a complete chromosome. A better way of forming the chromosomes might be to split the binary representations of the parameters so that the most significant bits of all the parameters are grouped together, then the next significant bits are grouped together, and so on. This may be better because it is more likely that shorter lengths of chromosomes stay together through the mating process, and the fitness of any individual depends on how the parameters interact.

There were some particular advantages in employing a genetic algorithm for the optimization procedure in this work. The basic genetic algorithm was relatively easy to implement, and this was an important consideration because of the exploratory nature of this research: not much time would have been spent if things had not worked out. Also, it was easy to watch the optimization procedure evolve, and, thus, to tune the parameters of the genetic algorithm. A very important consideration was that genetic algorithms provide a natural way to bound the ranges of the parameters that are to be optimized, because the parameters are coded in finite-length strings. This meant that the task-dynamic model would not be driven beyond certain bounds, and numerical overflow during optimization could be avoided. Another important factor was a genetic algorithm's ability to incorporate an on-off gene. For example, the optimal solution may or may not involve the activation of the tongue tip (TTCL and TTCD). It was easy to include a bit in each individual's chromosome that determines whether the tongue tip was to be activated or not. If the tongue tip was to be activated, then the part of the chromosome containing tongue-tip task parameters could be used, and if not, then that part of the chromosome would be ignored. Finally, the genetic algorithm can be used for purposes other than straightforward optimization. Speech learning is an obvious extension of the work presented here, so the fact that genetic algorithms can be used in classifier systems is an important consideration.

The power of the genetic algorithm comes from what John Holland, the originator of genetic algorithms, called implicit parallelism. Although the population of chromosomes is finite, say number N, the number of patterns of chromosomes, called schemata, being processed is on the order of N³ (Goldberg, 1989a, pp. 40-41). This implicit parallelism, as well as the probability of mutation, makes the algorithm much less likely to become stuck in local maxima. Also, the implicit parallel processing property makes this algorithm more efficient than exhaustive search of the parameter space. Parallel processing in the usual sense of using many processors is also possible. Given that children chromosomes have been produced as the result of mating a generation of individuals, the children's fitnesses can be evaluated on physically distinct processors. This capability was not used in this work.

Significance

Recovery of articulation by optimization is reminiscent of the analysis-by-synthesis model of speech perception (Stevens, 1960; Stevens & Halle, 1967; Stevens & House, 1970). In analysis by synthesis, as originally conceived, the content of a speech signal (the data) is recovered when hypothesized phonetic segments and features are used to provide instructions to an articulatory mechanism which, in turn, produces speech to be compared with the data. When the comparison is good enough, the hypothesized phonetic segments and features are the perceived phonetic segments and features. Rival theories of speech perception also take recovery of the vocal tract articulatory movement or the intended gesture over time as central to perceiving the spoken message (e.g., Liberman & Mattingly, 1985). Whatever the psychological importance of the articulatory movement or the motor system for speech perception in adults, children develop their speech by listening and by trying to be understood, and thus this learning process may be considered an analysis by synthesis over an extended time scale. Also, because task dynamics and articulatory movement are slow relative to the variations in the acoustic pressure wave, there are potential technological advantages in recovering vocal-tract movement from the speech signal to achieve bit-rate reduction for transmission (Schroeter & Sondhi, 1992; Sondhi, 1990). In the work presented here, selected task-dynamic parameters were recovered, without regard to phonetic categories: a model of speech perception is not offered. However, there are future directions that this work might take for speech recognition and for modeling speech development.

Extensions to previous work

In previous work using the methods described above, the focus has been on recovering articulatory movement rather than task dynamics (McGowan, in press). Based on limited testing, the method seems to be successful at doing this. While this method may help solve the one-to-many problem in the transformation from acoustics to articulation, it was not known whether the acoustics maps onto the task dynamics of the tract variables uniquely. In the previous work, the task-dynamic parameters were either assumed to be on or they were assumed to be off before the optimization was run. The only task parameters allowed to be activated were the ones known to have synthesized the speech data. It may have been possible to obtain nearly identical articulatory movement from very different task dynamics. That is, there may be a one-to-many problem in mapping from articulation to task dynamics, or from acoustics to task dynamics. Thus, for the present study, on-off genes were added so that selected tract variables could either be activated or inactivated. This allowed the optimization itself to select which tract variables to use.

PROCEDURE

Constraining relations between the parameters were used to keep the number of unknown parameters to a minimum. It was assumed that all movements were critically damped, and that the activation intervals were equal to the period based on the natural frequency parameter. Also, the movement periods, and hence the activation intervals, were assumed to be at least 100 ms long to avoid movements that were too stiff for the task-dynamic simulation. With these constraints, the unknowns for a given tract-variable activation were the beginning and ending activation times and the target position. There could have been more than one activation of any tract variable, and these could have overlapped in time. Also, the actions of certain tract variables were grouped according to common constriction goals, so that they would have identical activation intervals. The tongue body tract variables TBCL and TBCD were in one such group, the tongue-tip variables TTCL and TTCD were in another group, and the lip variables LP and LA were in a group. These groups are known as constriction or task spaces.

Gestural scores were designed to produce articulatory movement for utterances that resembled /abæ/ and /adæ/, within the constraints mentioned above. The trajectories of the tract variables resulting from the scores for /abæ/ and /adæ/ are illustrated in Figures 3 and 4. The first utterance involves moving from neutral position into a bilabial closure. Subsequently, as the lips open, the tongue lowers for the final vowel. The second utterance involves the tongue making an alveolar closure, and then the release of that closure with a lowering of the tongue body for the final vowel. The heights of the hatched boxes for activation denote the target positions.

In recovering the task-dynamic parameters, some parts of the gestural score were taken as known and fixed. In the case of /abæ/, for the initial movement to lip closure, involving the tract variables LA and LP, the activation interval and targets were taken as known. However, the subsequent activation interval for LA and LP was taken as unknown, as was the target position for LA. Only the target position for LP was assumed known in its second activation. Tongue body movement was taken as unknown, so that the activation interval and the target positions for TBCL and TBCD were varied for the optimum fit, although it was assumed that TBCL and TBCD were activated for at least 100 ms during the utterance. The fact that TTCL and TTCD were not activated for /abæ/ was assumed to be unknown. The test was to see whether the genetic algorithm would leave these tract variables deactivated in its optimal solution. This can be compared with /adæ/, where there was activation both of the tongue-tip tract variables, TTCL and TTCD, and the tongue body tract variables, TBCL and TBCD. The lip movement for the final vowel, which occurs after the tongue-tip closure, was taken as completely known. The parameters for the tongue tip were taken as unknown, and the algorithm was allowed to turn them on or off. The parameters of the tongue-body tract variables, TBCL and TBCD, in the transition to the final vowel were taken as unknown, but they were presumed activated for at least 100 ms some time during the utterance, as for /abæ/.

Given the synthetic acoustic data, a genetic algorithm (Goldberg, 1989a) was used to recover task-dynamic parameters when random noise with a flat distribution between -10 and +10 Hz was added to each formant frequency data point at each time frame. The noise was used to test robustness of the procedure. For each test an initial population of 60 individuals was generated using a random number generator. Their fitnesses were evaluated by decoding the chromosomes of each individual into the task-dynamic parameters of a gestural score, which enabled the task-dynamic simulation to drive the articulatory synthesizer, ASY, which, in turn, produced formant trajectories. The coding was done so that task-dynamic parameters were coded contiguously into binary strings with all parameters in each task space grouped together. These parameters included the beginning of activation, the activation interval, and the targets of the tract variables within a given task space. Table 1 indicates the range of the target values for each tract variable, the number of bits used in the coding of that parameter into the chromosomes, and the resulting resolution of that target.
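The simple binary parameter coding described above can be sketched as follows. The range and bit count for LA and TBCL are taken from Table 1; the linear quantization rule (resolution = range / (2^bits - 1)) is our reading of how the quoted resolutions arise, since the paper does not spell it out.

```python
# Sketch of the simple binary coding of real parameters described in the
# text: each parameter is quantized over a fixed range with a fixed number
# of bits, and the bit strings are concatenated into a chromosome. The
# ranges and bit counts for LA and TBCL come from Table 1; the linear
# quantization rule itself is an assumption.

def encode(value, lo, hi, bits):
    """Real value -> zero-padded binary string (nearest code)."""
    levels = 2**bits - 1
    code = round((value - lo) / (hi - lo) * levels)
    return format(code, f"0{bits}b")

def decode(bitstring, lo, hi):
    """Binary string -> real value."""
    levels = 2**len(bitstring) - 1
    return lo + int(bitstring, 2) / levels * (hi - lo)

def resolution(lo, hi, bits):
    """Smallest representable step for a parameter."""
    return (hi - lo) / (2**bits - 1)

# LA: range -0.30 to 1.80 cm, 6 bits -> resolution ~0.033 cm (Table 1);
# TBCL: range 0.51 to 3.16 rad, 6 bits -> resolution ~0.042 rad (Table 1).
la_bits = encode(0.5, -0.30, 1.80, 6)    # a hypothetical LA target of 0.5 cm
la_back = decode(la_bits, -0.30, 1.80)   # within one resolution step of 0.5
```

A chromosome is then just the concatenation of such strings, one per parameter, which is what bounds every parameter and makes the search space finite.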


Figure 3. Original gestural score for /abæ/.

Figure 4. Original gestural score for /adæ/.


Table 1. Target value specifications.

Tract Variable   Maximum/Minimum Target Value   Number of Bits in Chromosome   Resolution
LA               1.80/-.30 cm                   6                              .033 cm
TBCL             3.16/.51 rad                   6                              .042 rad
TBCD (/abæ/)     1.80/-.30 cm                   6                              .033 cm
TBCD (/adæ/)     1.63/-.13 cm                   6                              .028 cm
TTCL             1.16/.40 rad                   6                              .012 rad
TTCD             2.15/-.65 cm                   6                              .044 cm

Beginning and ending times for the activation intervals were resolved within the data frame rate of 10 ms. Recall that the fitness of an individual was the inverse of the sum of squares of the differences of each of the lowest three formant frequencies produced by the individual and those of the data in 10 ms steps. Thirty pairs of individuals were chosen with probability proportional to each of their fitnesses. There was a .6 chance for each of these pairs to mate, where mating consisted of randomly selecting a cut point for the pair, and then exchanging substrings on each side of the cut point to form two children strings. There was also a .001 chance of mutation at a single position in any of the children. If mating did occur, the fitness of the children strings would then be evaluated. In a variation of the standard genetic algorithm, the best individual of a given generation was always retained into the succeeding generation.

The choice of pairs and possible mating was allowed to go on for 60 generations. At the end of 60 generations the individual with the greatest fitness had its string decoded into the task-dynamic parameters of a gestural score, which were saved for later comparison. This procedure was repeated 8 times, each with a new initial random population of 60 individuals. The saved gestural scores were used to drive ASY, and the gestural score and articulator trajectories of the best individual of each run were compared to those that generated the original acoustic data.

RESULTS

The fittest individual of the 8 runs for each utterance is termed the optimal solution, and the individuals having the highest fitness in each of the other runs are termed suboptimal solutions. Figures 5 and 6 show the optimal recovered gestural scores for the utterances /abæ/ and /adæ/, respectively. A comparison between Figures 3 and 5 shows that the gestural score for /abæ/ was well recovered. There were some discrepancies, because the activation interval for the tongue body (TBCL and TBCD) was delayed, while the activation interval for the lips (LA and LP) was slightly delayed, as well as shortened. The recovered paths of the tract variables matched closely those of the original. It is significant here that the tongue tip variables (TTCL and TTCD) were not activated in the optimal solution even though they showed movement because of their dependence on the tongue body. Thus, the procedure was able to correctly decide whether the tongue tip was under active control or not in this instance. The optimal recovery of /adæ/ seems to be very good (Figures 4 and 6), except that the activation interval for the tongue body (TBCL and TBCD) started late and was too short. In this utterance, not only did the genetic algorithm have the opportunity to shut off the tongue tip (TTCL and TTCD), but there was a sequence of gestures involving the tongue tip and tongue body for which dynamic parameters were unknown. In this instance, the procedure found that the tongue tip was activated.

Quantitative measures of the goodness of fit are indicated by the mean square error in the formant frequency fits. In the optimal fit for /abæ/ the mean square error was 1099, which corresponds to an average error of 19 Hz per formant per frame of data. For /adæ/ the mean square error for the optimal fit was 4444, which corresponds to an average error of 38 Hz per formant per frame of data. Using the recovered gestural scores it was possible to generate the articulatory trajectories themselves, such as that for jaw angle, JA (see Figure 1). Tables 2 and 3 show that both the RMS error and maximum error in the optimal recovery of important articulatory trajectories are small in absolute terms.

Perhaps as important as evaluating the performance of the best, or optimal, individual is to evaluate the performances of other, suboptimal individuals. For /abæ/, the second fittest suboptimal individual was a gestural score that activated the tongue tip (Figure 7). As can be seen, the activation interval of the tongue tip was just over the minimum 100 ms, and the targets were close to neutral position, which was the direction of movement for TTCL and TTCD during that interval anyway. Therefore, this activation had little effect on the tongue tip. The mean square error of this fit was 2004, which corresponds to an average error of 26 Hz per formant per time frame of data. The resulting error in the articulation is shown in Table 2, where the absolute errors were almost as small as those for the optimal solution.
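The selection, mating, mutation, and elitism scheme described above can be sketched end to end. The population size (60), number of pairs (30), mating chance (.6), mutation chance (.001), generation count (60), and elitism all come from the text; everything else is an assumption, including the toy decode3 "synthesizer" (which stands in for the task-dynamic simulation plus ASY), the invented target formants, the +1.0 in the fitness denominator guarding against division by zero, and the copying-through of pairs that do not mate.

```python
import random

# Sketch of the genetic-algorithm loop described in the text. The toy
# decode3 function stands in for decoding a chromosome into a gestural
# score and synthesizing formants with ASY; the targets, the +1.0 in the
# fitness denominator, and pass-through of non-mating pairs are assumptions.

BITS, POP, GENS = 18, 60, 60
TARGET = (700.0, 1200.0, 2600.0)   # illustrative "formant" targets, Hz

def decode3(chrom):
    """Three 6-bit genes -> three frequencies in 200-3200 Hz."""
    return tuple(200 + int(chrom[i:i + 6], 2) / 63 * 3000 for i in (0, 6, 12))

def fitness(chrom):
    """Inverse of the summed squared formant differences (plus 1.0)."""
    sse = sum((f - t)**2 for f, t in zip(decode3(chrom), TARGET))
    return 1.0 / (1.0 + sse)

def pick(pop, fits):
    """Choose one individual with probability proportional to fitness."""
    return random.choices(pop, weights=fits, k=1)[0]

def mate(a, b):
    """Single random cut point; rare single-bit mutation in each child."""
    cut = random.randrange(1, BITS)
    kids = [a[:cut] + b[cut:], b[:cut] + a[cut:]]
    for i, k in enumerate(kids):
        if random.random() < 0.001:
            j = random.randrange(BITS)
            kids[i] = k[:j] + ("1" if k[j] == "0" else "0") + k[j + 1:]
    return kids

random.seed(1)
pop = ["".join(random.choice("01") for _ in range(BITS)) for _ in range(POP)]
for _ in range(GENS):
    fits = [fitness(c) for c in pop]
    elite = pop[fits.index(max(fits))]
    nxt = [elite]                          # best individual always retained
    while len(nxt) < POP:
        a, b = pick(pop, fits), pick(pop, fits)
        nxt.extend(mate(a, b) if random.random() < 0.6 else [a, b])
    pop = nxt[:POP]

best = max(pop, key=fitness)
```

Because of the elitism step, the best fitness seen can never decrease across generations, which is the "variation of the standard genetic algorithm" the text notes.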

Figure 5. Optimal gestural score for /abæ/.

Figure 6. Optimal gestural score for /adæ/.

Table 2. Comparison of original and recovered trajectories for utterance /abæ/.

Articulator   RMS difference,    RMS difference,       Maximum absolute      Maximum absolute
coordinate    optimal solution   suboptimal solution   difference, optimal   difference, suboptimal
                                 (example)             solution              solution (example)

CL            0.0083 cm          0.0067 cm             0.079 cm              0.18 cm
CA            0.00023 rad        0.0019 rad            0.0050 rad            0.019 rad
JA            0.00043 rad        0.0032 rad            0.0070 rad            0.029 rad

Table 3. Comparison of original and recovered trajectories for utterance /adæ/.

Articulator   RMS difference,    RMS difference,       Maximum absolute      Maximum absolute
coordinate    optimal solution   suboptimal solution   difference, optimal   difference, suboptimal
                                 (example)             solution              solution (example)

CL            0.023 cm           0.10 cm               0.26 cm               1.26 cm
CA            0.0037 rad         0.0040 rad            0.035 rad             0.051 rad
TL            0.012 cm           0.098 cm              0.23 cm               1.12 cm
TA            0.0034 rad         0.0035 rad            0.067 rad             0.42 rad
JA            0.0033 rad         0.0049 rad            0.043 rad             0.059 rad

For the utterance /adæ/, a relatively common feature of the suboptimal solutions was resequencing the tongue tip and tongue body activation intervals. Because these tract variables are coupled through the anatomy of the model articulators, they could compensate for one another, with the tongue body activated to make the initial alveolar constriction for the /d/ and the tongue tip activated to make the vowel /æ/. There were 4 suboptimal solutions that resequenced the tongue tip and tongue body activations. The fittest of these individuals (fourth fittest overall) is shown in Figure 8. This fit had a mean square error in the formant fit equal to 21584, corresponding to an average error of 85 Hz per formant per time frame of data. It is apparent from Figure 8 that the tongue body was activated to make the initial tongue-tip closure, and that the tongue tip was activated to make the body move down and back for the vowel /æ/. Table 3 shows that the articulation with this suboptimal solution is not nearly as close to the original utterance as the optimal solution: there are large maximum errors in the vector lengths CL and TL.

CONCLUSION

These results indicate that the genetic algorithm is useful for recovering task dynamics. However, the suboptimal solutions show that the algorithm can be driven into locally optimal solutions with incorrect tract variable activations. In the case of /abæ/ a suboptimal solution produced articulatory paths close to the original utterance despite activating the tongue tip. This indicated that there is a one-to-many situation in the mapping from articulation or acoustics into the task dynamics of the tongue tip. In the case of /adæ/, the example of a suboptimal solution did not produce an utterance all that close to the original, but the 60 individuals after 60 generations in this particular run were nearly identical, suggesting that the algorithm had found a locally optimal solution far from the optimal solution. In general, it is important to run the genetic algorithm several times to ensure that the proposed solution is not trapped in a locally optimal region far from the optimal region.

There are some improvements to the algorithm that can be tried so that local optima might be avoided. One is to code the parameters in the chromosomes so that the most arithmetically significant bits of all the parameters are close by, the next most significant bits are all close by, and so on. In this way the likelihood of breaking apart chromosomes with roughly good combinations of target positions and activation intervals might be reduced. Another improvement would be to run several populations in parallel, and then take the best from each to run again (Goldberg, 1989b).

From the experience gained with optimization, a classifier system based on a genetic algorithm could be used to successfully learn dynamic parameters. Extending the research brings a host of related questions. What happens if there is a mismatch between the synthesizer used for the classifier and the vocal tract that produced the data, as would be the case if the speech data were natural speech? How would the fitness function be affected? For a machine to repeat or learn human speech with an articulatory synthesizer, it does not matter whether the correct dynamics is recovered, as long as the articulatory synthesizer has enough degrees of freedom to produce a very good optimal match with the human acoustic data.
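The bit-reordering improvement suggested above, placing the most significant bits of all parameters together so that a crossover cut is less likely to split a good coarse combination of values, can be illustrated with a hypothetical pair of pack/unpack helpers (the function names and fixed-width integer encoding are illustrative, not the paper's):

```python
def interleave(params, nbits):
    # Pack several nbits-wide integer parameters so that all of their
    # most significant bits sit together at the front of the chromosome,
    # then all of their next most significant bits, and so on.
    bits = []
    for i in reversed(range(nbits)):          # MSB first
        for p in params:
            bits.append((p >> i) & 1)
    return bits

def deinterleave(bits, nparams, nbits):
    # Inverse mapping: redistribute the bits back into the parameters.
    assert len(bits) == nparams * nbits
    params = [0] * nparams
    for i, b in enumerate(bits):
        params[i % nparams] = (params[i % nparams] << 1) | b
    return params
```

With this layout a single-point crossover cut near the front of the string exchanges only the fine (low-order) detail of all parameters at once, leaving each parent's coarse parameter combination intact.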

Figure 7. Suboptimal gestural score for /abæ/.

Figure 8. Suboptimal gestural score for /adæ/.


A fixed fitness function evaluated in the acoustic parameter space, analogous to the simple one used here, will suffice. If the goal is to model human speech development and the resulting speech perception and production, then it is important to ask whether children learn speech by matching acoustic output. Do children recover the movement of adults' articulators, or their dynamics? It is plausible that children neither recover adult articulation nor match acoustic output. What does this mean for the simulation of human learning? What is the fitness function to use? Children, somehow, recover adults' meanings and understand adults according to their own needs and desires. They adapt to adults' speaking with the added advantage of having a speech apparatus that they can use to be understood (e.g., to get adults to do what they want). A child could construct a dynamics based on his own vocal apparatus and associate others' speech sounds with his own dynamics, even when it is impossible to match another's speech sounds exactly, because he can be understood using these movements. Thus, it is not necessary that the child recover adults' vocal movements by analysis by synthesis with a comparator. He knows which of his own movements would produce like meaning. The fitness function would not be based exactly on acoustic match with adults' utterances, but would be one that involves both meaning and associations with the child's vocal apparatus. The fitness function, in the case of development, would be a "moving target" depending on the state of the adult listeners and speakers, as well as the child's state.

REFERENCES

Atal, B. S., Chang, J. J., Mathews, M. V., & Tukey, J. W. (1978). Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer sorting technique. Journal of the Acoustical Society of America, 63, 1535-1555.
Boë, L.-J., Perrier, P., & Bailly, G. (1992). The geometric vocal tract variables controlled for vowel production: Proposals for constraining acoustic-to-articulatory inversion. Journal of Phonetics, 20, 27-38.
Browman, C. P., & Goldstein, L. (1990). Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics, 18, 299-320.
Goldberg, D. E. (1989a). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.
Goldberg, D. E. (1989b). Sizing populations for serial and parallel genetic algorithms. In J. D. Schaffer (Ed.), Proceedings of the Third International Conference on Genetic Algorithms (pp. 70-79). San Mateo, CA: Morgan Kaufmann.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.
McGowan, R. S. (1991). Recovering tube kinematics using time-varying acoustic information. In Proceedings of the 12th International Congress of Phonetic Sciences, August 19-24, Aix-en-Provence (Vol. 4, pp. 486-489). Aix-en-Provence: Université de Provence, Service des Publications.
McGowan, R. S. (in press). Recovering articulatory movement from formant frequency trajectories using task dynamics and a genetic algorithm: Preliminary model tests. Speech Communication.
Mermelstein, P. (1973). Articulatory model for the study of speech production. Journal of the Acoustical Society of America, 53, 1070-1082.
Rubin, P., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 70, 321-328.
Saltzman, E. L., & Kelso, J. A. S. (1987). Skilled actions: A task-dynamic approach. Psychological Review, 94, 84-106.
Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333-382.
Schroeter, J., Meyer, P., & Parthasarathy, S. (1990). Evaluation of improved articulatory codebooks and codebook access distance measures. IEEE Proceedings, ICASSP 90, Albuquerque, New Mexico.
Schroeter, J., & Sondhi, M. M. (1992). Speech coding based on physiological models of speech production. In S. Furui & M. M. Sondhi (Eds.), Advances in speech signal processing (pp. 231-268). New York: Marcel Dekker.
Shirai, K., & Kobayashi, T. (1986). Estimating articulatory motion from speech wave. Speech Communication, 5, 159-170.
Sondhi, M. M. (1990). Models of speech production for speech analysis and synthesis. Journal of the Acoustical Society of America, 87, S14(A).
Stevens, K. N. (1960). Toward a model for speech recognition. Journal of the Acoustical Society of America, 32, 47-55.
Stevens, K. N., & Halle, M. (1967). Remarks on analysis by synthesis and distinctive features. In W. Wathen-Dunn (Ed.), Models for the perception of speech and visual form. Cambridge, MA: MIT Press.
Stevens, K. N., & House, A. S. (1970). Speech perception. In J. Tobias (Ed.), Foundations of modern auditory theory, Volume II (pp. 3-63).

Haskins Laboratories Status Report on Speech Research, 1993, SR-113, 107-130

An MRI-based Study of Pharyngeal Volume Contrasts in Akan

Mark K. Tiede

Characteristic differences in pharyngeal volume between Akan +/-Advanced Tongue Root (ATR) vowel pairs have been investigated using Magnetic Resonance Imaging (MRI) techniques, and compared to the similar Tense/Lax vowel distinction in English. Two subjects were scanned during steady-state phonation of three pairs of contrasting vowels. Analysis of the resulting images shows that it is the overall difference in pharyngeal volume that is relevant to the Akan vowel contrast, not just the tongue root advancement and laryngeal lowering previously reported from X-ray studies. The data also show that the ATR contrast is articulatorily distinct from the English Tense/Lax contrast.

I am grateful to the many people who assisted me in this project, particularly Cathe Browman, Alice Faber, Louis Goldstein, Carol Gracco, Robin Greene, Kathy Harris, Pat Nye, Alfred Opoku, Elliot Saltzman, Doug Whalen, and all those who agreed to be magnetized for Science. Goofs and garbles remain of course my responsibility. This work was supported by NIH grant DC-00121.

INTRODUCTION

Several West African languages have a phonological process of Vowel Harmony that splits the distribution of their vowels into two congruent and mutually exclusive sets. The Akan language, a member of the Niger-Congo Kwa group (Stewart 1971) spoken in Ghana, is typical of this pattern. The vowels of Akan may be grouped as follows.1

(1)  Set 1:  i  e  o  u
     Set 2:  ɪ  ɛ  ɔ  ʊ

In general vowels in an Akan word harmonize; that is, all vowels are drawn exclusively from one set or the other. (The low mid vowel /a/ may occur in a word with vowels from either set.2) Affixes have two forms for compatibility with stems having type 1 or type 2 vowels. For example (from Dolphyne 1988:15):

(2)  mɪ + di  -->  midi  "I eat"
     [1st Sg]  [eat]
     ɔ + di   -->  odi   "he eats"
     [3rd Sg]  [eat]
     mɪ + dɪ  -->  mɪdɪ  "I am called"
     [1st Sg]  [be called]
     ɔ + dɪ   -->  ɔdɪ   "he is called"
     [3rd Sg]  [be called]

Drawing on a cineradiographic study of the related language Igbo by Ladefoged (1964), Stewart (1967) proposed an analysis of Akan in which corresponding vowels from the two harmony groups are distinguished articulatorily by differences in tongue root position. Cineradiographic studies of Akan undertaken by Lindau (1975, 1979) confirmed the primacy of tongue root position in the harmony mechanism, but also showed correlated variation in larynx height: advanced tongue root was combined with lowered larynx, and retracted root with raised larynx. In reporting this work Lindau suggested that the relevant contrast was not just the relative positions of these articulators but rather the overall difference in pharyngeal volume produced by their cooperative positioning, and proposed the feature "Expanded" to describe it. A contrast based on differences in pharyngeal volume

suggests that vowels from the two groups may also differ in the left-to-right or lateral dimension of the pharynx, assuming that this is subject to voluntary control. Studies based on cineradiography are inherently unable to explore this possibility, since the technique collapses all lateral information into a flat (sagittal) image. The purpose of the study discussed here was to investigate the predicted difference in pharyngeal volume using Magnetic Resonance Imaging, which is not subject to the same limitation.

The magnetic resonance technique has only recently become a viable imaging alternative. Developed primarily for medical diagnostic purposes, MRI exploits the behavior of hydrogen nuclei in a magnetic field to construct an image correlated with the concentration of hydrogen in the scanned tissue.3 Because different tissue types have differing hydrogen densities and bondings, MR images provide soft tissue definition over a range inaccessible to X-ray techniques. Two additional advantages make MRI an especially attractive imaging modality: there are currently no known health risks for the subject associated with the technique, and because the imaging plane may be reoriented without moving the subject, three-dimensional data collection is possible.

Despite these advantages the MR technique has substantial drawbacks in its potential for phonetic research, chief of which is the tradeoff between imaging time and resulting image quality. Although image acquisition rates continue to drop as the technology evolves, they are currently still too slow by an order of magnitude for capturing dynamic speech. In addition, three-dimensional scanning requires multiple passes through the same volume, further increasing acquisition time. This limits the current usefulness of MRI in phonetics to studies involving static vocal tract shapes and sustainable patterns of phonation.

The current study was able to proceed under these constraints. Although vowels sustained for many times their normal speaking duration represent an admittedly artificial source of data, all of the vowels examined (except English lax vowels) can occur in open syllables, and are therefore artificial only in duration, and not in syllable structure. Furthermore, sung vowels of constant pitch and quality and extended duration occur in the musical traditions of Ghanaian and American cultures.4 There is also precedent for use of the MRI technique applied to measurement of static vowel shapes in the work of Baer, Gore, Gracco, & Nye (in press), who successfully used it to obtain vocal tract area functions of four point vowels, and in similar work by Lakshminarayanan, Lee, and McCutcheon (1991).

The English Tense/Lax distinction eludes precise articulatory description, but is similar in many ways to the Akan contrast: tense vowels are generally articulated with an advanced tongue root, and sometimes a lowered larynx. English vowels comparable to those used in Akan may be grouped as in (1) above:

(3)  Tense:  i  e  u
     Lax:    ɪ  ɛ  ʊ

Previous attempts to identify the English distinction with the same mechanism used in the Akan ATR contrast include a proposal by Halle and Stevens (1969), and a cineradiographic study by Perkell (1971). However a more extensive cineradiographic study conducted by Ladefoged et al. (1972) showed that tongue root advancement is just one of several complementary strategies used to implement the Tense/Lax contrast, and is not used consistently by all speakers. Similarly, electromyographic data reported by Raphael and Bell-Berti (1975) showed differences between subjects in the patterns of muscle tension used to distinguish between tense and lax vowels. Factor analysis of X-ray-derived tongue shapes by Harshman, Ladefoged, and Goldstein (1977) showed that tongue position for English tense/lax pairs can be predicted very completely by reference to just two parameters along which the tongue root and tongue dorsum positions covary, whereas a similar analysis of Akan by Jackson (1988) found three parameters necessary for tongue shape specification. While it is probably the case that different mechanisms are involved in each language, they are similar enough to make direct comparison feasible and interesting, and so they have been treated in parallel in the current study.

Method

In this experiment the contrast in cross-sectional area was examined at adjacent levels through the pharynx for corresponding expanded and constricted vowels (i.e. +/-ATR; Tense/Lax).5 The experiment involved two subjects, both male, in their early thirties. Subject AO is a native speaker of the Asante dialect of Akan; subject MT is a native speaker of Midwestern American English. The vowels selected for comparison were /i:ɪ/, /e:ɛ/, and /u:ʊ/, chosen because of their reasonable similarity across the two languages.


Prior to the experiment a target stimulus tape was prepared for each subject by twice recording each of the target vowels in a characteristic word, extracting the vowel portion using digital editing techniques, then recording it as a continuous utterance by concatenation. The following target words were used:

(4)  Vowel   Akan                    English
     [i]     pi "many"               hid "heed"
     [ɪ]     fi "to vomit"           hɪd "hid"
     [e]     jqe "empty"             hed "hayed"
     [ɛ]     oitie "he looked at"    hɛd "head"
     [u]     bu "to break"           hud "who'd"
     [ʊ]     bʊ "to be drunk"        hʊd "hood"

These words were also recorded for acoustic analysis:

     [o]     ako "parrot"            hod "hoed"
     [ɔ]     kap "red"               hɔd "hawed"
     [a:æ]   daa "everyday"          hæd "had"

The experiment was performed on a General Electric Signa machine installed at the Yale New Haven Hospital. The Signa system consists of a toroidal superconducting electromagnet developing a 1.5 Tesla flux density, placed in a scanning room designed to minimize interference from external electromagnetic noise. The magnet is controlled from an operator's console outside the scanning room, and an attached computer is used for image reconstruction, collation, and storage.

After divesting themselves of all ferrous material, subjects were fitted with an earphone, microphone, and neck RF transceiver imaging coil, and positioned on their backs inside the bore of the magnet. The earphone was the terminus of a length of plastic tubing connected to a small speaker placed as far away from the magnet as possible (because the strength of the magnetic field tended to overwhelm the speaker coil). The speaker was driven by an amplifier outside the scanning room, and was used to communicate with the subject and to play the prerecorded target stimuli during scanning. The microphone was used to record subject phonation immediately prior to and following scanning; while phonation during scanning was also recorded, it was unusable for analysis because of the intensity of the noise produced by the machine.

Prior to each run the experimenter verified the target vowel by reminding the subject of the characteristic word containing it. Playback of the target vowel was then initiated through the earphone, and scanning begun shortly after the subject began phonating. Subjects were instructed to produce the target vowel with steady pitch and uniform quality, to take shallow breaths while retaining vocal tract configuration, and to refrain from head movement and swallowing as best they could.

Two scans were made for each vowel. The first took approximately two and a half minutes to complete and produced eight adjacent sagittal images (bisecting the face) at 3mm intervals (see Table 1 for the imaging parameters used). The second scan took approximately three minutes; it produced 28 adjacent images at 5mm intervals in the axial orientation perpendicular to the pharynx (see Table 2).

Table 1. Sagittal imaging parameters.

Subject AO (Akan)
256 x 256 pixels x 2 passes (NEX)
GRE/30 Multi (GRASS echo)
28cm field of view
3mm thickness
0mm interscan skip
8 images

Vowel   TR   TE   Time
i       32   13   2:25
ɪ       37   11   2:36
e       39   13   2:44
ɛ       39   12   2:44
u       37   11   2:36
ʊ       37   11   2:36

Subject MT (English)
256 x 256 pixels x 2 passes (NEX)
GRE/30 Multi (GRASS echo)
28cm field of view
3mm thickness
0mm interscan skip
8 images

Vowel   TR   TE   Time
i       36   10   2:32
ɪ       36   10   2:32
e       37   11   2:32
ɛ       37   11   2:32
u       36   10   2:32
ʊ       37   11   2:32


Table 2. Axial imaging parameters.

Subject AO (Akan)
256 x 128 pixels x 1 pass (NEX)
SPGR/45 Volume (spoiled GRASS)
28cm field of view
5mm thickness
0mm interscan skip
28 images
TR 45
TE 5
Time 3:05

Subject MT (English)
256 x 128 pixels x 1 pass (NEX)
SPGR/45 Volume (spoiled GRASS)
30cm field of view (i, ɪ, u, ʊ)
28cm field of view (e, ɛ)
5mm thickness
0mm interscan skip
28 images
TR 45
TE 5
Time 3:05

Analysis

The images obtained were converted from 16 bit Signa format to an 8 bit (255 gray level) format compatible with display and analysis software. The axial images were normalized to a standard image density so that a given pixel magnitude represented the same value across all series; this was done so that air-tissue boundary measurements could be made consistently. Image analysis was performed on a Macintosh II computer using the NIH IMAGE program.6

The sagittal images were used to replicate the cineradiographic results. Measurements of relative height for the tongue dorsum, jaw, and larynx were obtained for each vowel, using the top of the second cervical vertebra as the reference, from the image judged to be closest to the sagittal midline for that scan set. Dorsum height was measured from the point on the tongue that maximized the vertical distance from the reference (see Figure 1); jaw height was measured from a point at the base of the root of the lower incisors; and larynx height was measured as its vertical distance from the reference. Relative tongue root advancement was measured as the length of a line drawn from the bottom of the vallecular sinuses to the rear pharyngeal wall. Figure 1 illustrates how the measurements were obtained, and shows an overlay of the vocal tract outlines obtained for the comparison between Akan /i/ and /ɪ/.

Figure 1. Sagittal measurements, showing Subject AO (Akan) for /i/ overlain by /ɪ/.


The axial images were used to obtain data for the left-to-right lateral dimension inaccessible to the cineradiographic studies. For purposes of comparison, 11 sequential images were chosen from each axial scan set. The reference used for comparison was the lowest image in each series in which the epiglottis was visible as a detached entity; this point corresponds to the bottom of the vallecular sinuses used in the sagittal images to identify the tongue root position. This image was used as the pivot for choosing five sequential images below that level in the pharynx, and five more above, for the total of eleven. The approximate locations of the axial scans measured are shown overlain on a sagittal outline in Figure 2. The reason for using the epiglottis to coordinate inter-vowel comparison was to minimize any effects due to laryngeal height differences, so that comparison would be made at corresponding anatomical points for both vowel conditions.

Three measurements were obtained from each axial image: the distance from the anterior to the posterior boundaries of pharyngeal airspace corresponding to tongue root advancement, which will be referred to as "depth"; the novel left-to-right lateral distance across pharyngeal airspace, which will be referred to as "width"; and a measure of the cross-sectional area of pharyngeal airspace at that level; these are illustrated in Figure 3. The distance measurements were made along a straight line at the point of widest distension for each dimension. The area measurement was computed by determining a perimeter of standard image intensity corresponding to the pharyngeal air-tissue boundary, counting the number of enclosed pixels, and converting to area measure. Except when determining the reference or pivot level, the epiglottis was ignored in all measurements; in particular at levels above the reference (those in which a detached epiglottis was visible), the anterior wall was taken to be the base of the tongue, and area measurements included any visible epiglottal tissue.

Figure 2. Approximate location of axial scans.


Figure 3. Axial measurements.

Axial image measurement is complicated by the trifurcation of the airspace below the epiglottis into three tubes by the aryepiglottic folds: the central laryngeal passageway and two deadend sidechannels, the piriform sinuses. The approach taken for these images was to obtain an overall area measurement by adding the cross-section measured for the central larynx tube to those obtained for each of the piriform sinuses (i.e. aryepiglottic tissue was not included). The width measurement was made from the left side of the left sinus to the right side of the right sinus, but the depth measurement was made across the point of widest distension of the central larynx tube only.

The acoustic recordings of pre- and post-scan phonation were used to verify that subject vowel quality remained on target during scanning. The resulting utterances were digitized at a sampling frequency of 10 kHz. Formant values for each token, as well as for the original vowel target utterances, were obtained by a linear-predictive-coding autocorrelation analysis, using a 20ms Hamming window shifted 10ms to fit a 14th-order polynomial, with averaging of successive frames for which the first three formants were available.

Results

Images. Figure 4 is representative of the resulting images. The left side of the figure shows a midline sagittal view and mid-pharynx view from the expanded (+ATR) variant of the Akan vowel /i/, and the right side shows the corresponding constricted views (/ɪ/). The orientation of the axial images corresponds to a slice through the neck, perpendicular to the pharynx, presented so that the top of the image shows the front of the face or neck. Notice the difference in pharyngeal airspace: the sagittal view on the left shows a wider opening in the lower pharynx than does the corresponding view for the constricted variant on the right. Similarly, the axial view of the expanded vowel on the left shows a larger cross-sectional area than the corresponding constricted vowel on the right.

In subjective terms the images obtained varied from poor to good quality compared to others of this type,7 representing in general a reasonable compromise between image noise and required scanning time. Measurements were based on air/tissue interfaces, which tended to image well even if resolution of tissue-internal anatomical details was nonoptimal. Resolution was approximately one millimeter per image pixel in both orientations.8

To determine whether normalization of measurements for different head sizes was appropriate, subject tract lengths were measured using the midline sagittal image obtained for the vowels /i/ and /ɪ/ in each language. The measurement was made from the level of the vocal folds, around the curve of the tongue, to the midpoint of a line bisecting the narrowest lip approximation, presented here in centimeters:

(5) Vocal Tract Lengths (cm)

Vowel   Subject AO (Akan)   Subject MT (English)
i       17.4                16.9
ɪ       15.7                15.3

These values were considered to be sufficiently close as to make normalization unnecessary.


Figure 4. Representative images: mid-sagittal and mid-pharynx views of Subject AO (Akan) for the vowels /i/ and /ɪ/.

Errors affecting the measurements may have been introduced in two different ways: from distortion of the image due to subject movement during scanning, or from subject head tilt. Both possibilities affected sagittal scanning more than axial. Sagittal scans were assembled individually one after another, whereas the axial scans were obtained using a method that computed all images concurrently. Subject movement in a sagittal scan set resulted in blurring of the single image being acquired at that moment, making it unusable for measurement, while movement in an axial set caused only a loss of definition across all images in the set, leaving them all still viable for measurement.

The effect of any head tilt or rotation was to cause the scanning plane to intersect with the subject at an oblique angle, rather than producing a true bisection of the head. This was not a problem for the axial images, since any tilting would have been too slight to cause measurable distortion in that orientation. For the sagittal images, however, even slight tilting was sufficient to make determination of the head midline difficult; for example, one image might show midline at the level of the larynx, yet be off center at the level of the palate, thus affecting tongue dorsum height measurements. Therefore all sagittal tongue measurements were confirmed on images adjacent to the one chosen as midline.

Sagittal Orientation. The sagittal measurements obtained are given in Table 3, and illustrated graphically in Figure 5. Figure 6 shows the (expanded - constricted) difference between results obtained for corresponding vowel pairs.

Table 3. Sagittal measurements.

Subject AO (Akan) (mm)

Vowel   Root Advancement   Dorsum Height   Laryngeal Depth   Jaw Depth
i       22.97              19.69           68.91             48.13
ɪ       17.50              19.69           57.97             45.94
e       21.88              25.16           54.69             41.56
ɛ       19.69              24.06           51.41             47.03
u       32.81              21.88           74.38             43.75
ʊ       17.50              18.59           67.81             41.56

Subject MT (English) (mm)

Vowel   Root Advancement   Dorsum Height   Laryngeal Depth   Jaw Depth
i       22.97              30.63           55.78             30.63
ɪ       21.88              28.44           47.03             32.81
e       21.88              27.34           51.41             36.09
ɛ       19.69              25.16           49.22             32.81
u       30.63              30.63           55.78             33.91
ʊ       18.59              26.25           54.69             33.91

Figure 5. Sagittal results.


In each case the Akan results confirm the cineradiographic studies mentioned above: tongue root advancement was larger for the expanded (+ATR) variant of each vowel pair, larynx height was lower for each expanded variant as well, and tongue dorsum height showed either no difference (i:ɪ) or slightly higher values for expanded alternates (e:ɛ, u:ʊ). The results for English were very similar: for each tense variant root advancement was larger, larynx height was lower, and tongue dorsum higher. However, the English measurements show smaller differences in magnitude for both root advancement and laryngeal lowering than Akan, and greater differences in tongue dorsum height. Results obtained for jaw height showed small and inconsistent differences in both languages.

Figure 6. Expanded - constricted sagittal results. [panels: Root Advancement, Dorsum Height, Larynx Height; Akan vs. English]

Axial Orientation. The axial measurements obtained are given in Tables 4a and 4b. Figure 7 illustrates the difference in cross-sectional area between Akan vowel pairs. In each graph in this and subsequent figures the height of each bar shows the difference in area between expanded and constricted variants at corresponding levels of the pharynx, increasing in height at 5mm intervals from five measurement levels below the base of the epiglottis to five levels above. Observe that the pharyngeal airspace is larger at all measured levels for the expanded (+ATR) variants of vowels /i/ and /u/, and for all levels of expanded /e/ above the epiglottal pivot. By combining each sequence of measured cross-sections the following approximations to pharyngeal volume were obtained (representing the section delimited by +/- 25mm from the base of the epiglottis):9

(6) Derived Pharyngeal Volumes for Subject AO (Akan) (cm3)

        I      E      U
+ATR   30.58  18.38  42.04
-ATR   18.90  17.27  25.09

The larger volumes observed for the "Expanded" variants of each vowel show that Lindau's term is an apt name for the feature being contrasted. It is interesting that the ratio of expanded to constricted volume is nearly the same for vowels /i/ and /u/ (1.62 vs. 1.68), and the absence of a difference of similar magnitude for /e/ suggests that the data obtained for that vowel should be viewed with caution (although they agree with the sagittal results, obtained during a separate scanning sequence, which also showed the smallest root advancement difference for /e/). The smaller differences observed for /e:ɛ/ may be due to the fact that the inherently lower tongue constriction location characteristic of these vowels leaves less room in the lower pharynx for effecting the contrast.
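The volume approximations in (6) follow the procedure in footnote 9: each 5mm-thick axial slice contributes (cross-sectional area x slice thickness) to the total. A minimal sketch of that summation, in which the slice areas are hypothetical placeholders rather than values from Table 4a:

```python
# Sketch (not the paper's code): approximate pharyngeal volume by stacking
# 5 mm-thick axial slices, per footnote 9 (volume element = area x thickness).
# The slice areas below are made-up placeholders, not data from Table 4a.

def pharyngeal_volume_cm3(areas_mm2, thickness_mm=5.0):
    """Sum cross-sectional areas (mm^2) over equally spaced slices; return cm^3."""
    return sum(a * thickness_mm for a in areas_mm2) / 1000.0  # mm^3 -> cm^3

# Hypothetical areas at the 11 levels from -25mm to +25mm:
areas = [300, 320, 350, 420, 490, 650, 750, 780, 720, 710, 730]
print(round(pharyngeal_volume_cm3(areas), 2))  # -> 31.1
```

The same summation applied to the measured Akan cross-sections yields the volumes tabulated in (6).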

122 Tiede 116

Table 4a. Axial measurements for Subject AO (Akan).

Area (mm2)

Level (mm)

107.67 508.42 245.24
-25 309.84 133.98 69.38 205.76 534.74 290.70
-20 230.88 137.57 137.57 807.50 363.67
-15 296.68 230.88 174.66 293.09 327.78 820.65 504.83
-10 419.90 373.24 294.29 788.35 520.39
-5 492.87 313.43 328.98 311.04 1233.37 878.08
0 656.76 422.29 476.12 297.88 374.44 1165.19 764.43
5 750.07 451.00 488.09 380.42 721.36 480.91
10 787.16 455.79 462.96 418.70 646.00 381.62
15 722.56 437.84 429.47 640.01 349.32
20 714.18 421.09 404.35 374.44 362.48 543.12 238.06
25 734.52 403.15 410.33

Width (mm)

Level (mm)

28.44 33.91 29.53
-25 35.00 28.44 28.44 35.00 30.63
-20 32.81 29.53 29.53 29.53 39.38 31.72
-15 32.81 32.81 31.72 30.63 38.28 31.72
-10 35.00 33.91 33.91 29.53 37.19 36.09
-5 36.09 31.72 35.00 30.63 37.19 38.28
0 32.81 29.53 33.91 25.16 35.00 32.81
5 30.63 24.06 29.53 24.06 35.00 30.63
10 30.63 26.25 22.97 25.16 35.00 27.34
15 28.44 28.44 28.44 24.06 35.00 27.34
20 31.72 25.16 24.06 22.97 33.91 20.78
25 30.63 22.97 25.16 17.50

Depth (mm)

Level (mm)

26.25
-25 19.69 18.59 20.78 12.03 37.19
-20 25.16 15.31 20.78 14.22 31.72 21.88
-15 27.34 18.59 19.69 15.31 31.72 24.06
-10 25.16 17.50 24.06 15.31 29.53 22.97
-5 17.50 14.22 14.22 14.22 30.63 20.78
0 33.91 24.06 27.34 17.50 43.75 30.63
5 33.91 18.59 27.34 20.78 42.66 28.44
10 31.72 18.59 18.59 22.97 32.81 21.88
15 29.53 24.06 20.78 21.88 27.34 18.59
20 32.81 22.97 24.06 24.06 27.34 15.31
25 31.72 24.06 24.06 25.16 25.16 12.03


Table 4b. Axial measurements for Subject MT (English).

Area (mm2)

Level (mm)

-25 438.08 508.12 470.14 486.89 753.94 618.48
-20 517.73 519.10 411.52 514.40 699.01 617.29
-15 527.34 616.61 391.19 614.89 806.12 608.91
-10 661.93 649.57 576.61 590.97 862.43 752.47
-5 606.99 585.02 523.97 644.80 896.76 718.97
0 914.61 659.18 608.91 532.35 1031.34 848.17
5 951.69 630.34 653.17 368.46 958.56 802.71
10 942.08 611.11 575.42 295.48 944.82 724.95
15 948.94 597.38 517.99 324.19 855.56 628.05
20 940.70 649.57 526.37 337.35 811.61 541.92
25 961.30 708.62 525.17 394.78 752.56 485.69

Width (mm)

Level (mm)

-25 32.81 33.98 33.91 33.91 33.98 32.81
-20 37.50 36.33 36.09 36.09 36.33 33.91
-15 39.84 39.84 38.28 38.28 39.84 36.09
-10 41.02 39.84 39.38 41.56 41.02 39.38
-5 41.02 41.02 39.38 40.47 41.02 40.47
0 38.67 38.67 40.47 38.28 39.84 39.38
5 37.50 38.67 39.38 37.19 38.67 39.38
10 42.19 36.33 36.09 36.09 39.84 39.38
15 43.36 41.02 35.00 33.91 41.02 31.72
20 44.53 38.67 36.09 35.00 41.02 36.09
25 45.70 42.19 33.91 33.91 39.84 38.28

Depth (mm)

Level (mm)

-25 26.95 29.30 29.53 26.25 30.47 30.63
-20 21.09 22.27 21.88 22.97 28.13 28.44
-15 19.92 19.92 21.88 24.06 29.30 26.25
-10 21.09 19.92 17.50 19.69 24.61 29.53
-5 17.58 17.58 17.50 17.50 24.61 24.06
0 31.64 23.44 24.06 17.50 31.64 26.25
5 30.47 22.27 22.97 15.31 30.47 27.34
10 30.47 21.09 20.78 14.22 29.30 19.69
15 30.47 18.75 20.78 12.03 25.78 18.59
20 31.64 23.44 18.59 12.03 24.61 16.41
25 29.30 23.44 20.78 17.50 21.09 15.31


The observed difference in pharyngeal volume is consistent with the tongue root advancement observed in the sagittal images and cineradiographic studies, but leaves open the question of whether the difference is due entirely to root position, or whether pharyngeal lateral width contributes as well. The relative contributions of width and depth to the area values obtained for Akan are illustrated in Figure 8. Recall that width is a measure of side-to-side or lateral distance, and depth is a measure of anterior/posterior distance corresponding to root advancement. Notice that while somewhat noisier than the area data, the overall trend for both width and depth is for consistently larger values for the expanded variant of each vowel.10 Notice also that the observed differences in lateral width are almost as large as those observed for anterior/posterior depth, showing that for this speaker at least control of the lateral dimension is an important part of the mechanism used to produce the contrast, thereby confirming Lindau's position that speakers of Akan seek to maximize the difference in overall pharyngeal volume in effecting the contrast.

Figure 7. Delta cross-sectional area for Subject AO (Akan). [panels /i/-/ɪ/, /e/-/ɛ/, /u/-/ʊ/; bar height = expanded minus constricted area at 5mm levels through the pharynx, epiglottis at 0mm]

Figures 9, 10, and 11 provide a comparison of the English area, width, and depth results with those of Akan. As in the previous figures, the height of each bar shows the difference between expanded and constricted variants at corresponding levels through the pharynx. Combining area cross-sections as for Akan the following approximations to pharyngeal volume were obtained:

(7) Derived Pharyngeal Volumes for Subject MT (English) (cm3)

        I      E      U
Tense  37.25  26.28  43.10
Lax    30.13  23.55  34.31

As in Akan, the expanded (tense) variants show consistently larger overall pharyngeal volume than their lax counterparts. Again, it is interesting to note that the ratio of expanded to constricted volume is nearly the same for vowels /i/ and /u/ (1.24 vs. 1.26), although the magnitude of the difference is considerably less than that observed for Akan.

Acoustics. Figure 12 shows a combined vowel space chart using the averaged stimulus target vowel formants from each language. The primary acoustic effect of the ATR contrast on Akan vowel pairs appears to be a raised F1 for [-ATR] variants, with F2 relatively unaffected. Lax vowels in English on the other hand show both a raised F1 and lowered F2 compared to their tense counterparts, for a net centralizing effect. Values measured for F3 were lower for constricted variants in both languages. Two analyses of variance were performed to quantify the significance of these observed effects: one grouped recorded tokens into source groupings of target (the original stimulus words), sagittal run, and axial run; the second analysis grouped them into target, pre-scan, and post-scan source groupings.
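The expanded-to-constricted ratios quoted in the text can be checked directly against the volumes tabulated in (6) and (7); a quick sketch:

```python
# Check of the expanded:constricted volume ratios quoted in the text,
# using the derived volumes (cm^3) from tables (6) and (7).
volumes = {
    "Akan":    {"i": (30.58, 18.90), "e": (18.38, 17.27), "u": (42.04, 25.09)},
    "English": {"i": (37.25, 30.13), "e": (26.28, 23.55), "u": (43.10, 34.31)},
}
for language, vowels in volumes.items():
    for vowel, (expanded, constricted) in vowels.items():
        # Ratio of expanded (+ATR / tense) to constricted (-ATR / lax) volume
        print(language, vowel, round(expanded / constricted, 2))
```

This reproduces the quoted values: 1.62 and 1.68 for Akan /i/ and /u/, and 1.24 and 1.26 for their English counterparts.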

Figure 8. Akan pharyngeal width and depth. [width and depth panels for /i/-/ɪ/, /e/-/ɛ/, /u/-/ʊ/; difference values (mm) by level through the pharynx, epiglottis marked]


Figure 9. Comparison of cross-sectional area. [Akan vs. English difference values by level through the pharynx, epiglottis marked]


Figure 10. Comparison of pharyngeal width. [Akan vs. English difference values by level through the pharynx, epiglottis marked]


Figure 11. Comparison of pharyngeal depth. [Akan vs. English difference values by level through the pharynx, epiglottis marked]

Figure 12. Combined vowel space. [F2-F1 x F1 (Hz); averaged Akan and English target vowel formants]

The results of both analyses confirmed the significance of the ATR and Tense/Lax effects on F1 (p < .0001) and F3 (p < .05), showed strong significance for the effect of the Tense/Lax contrast on F2 (p < .0001), and showed no significance for ATR effects on F2.

Figures 13 and 14 show vowel space plots for subjects AO (Akan) and MT (English) respectively, incorporating both stimulus and run time formant values. Missing values for certain configurations resulted from subjects initiating phonation after scanning had commenced, or respiring as it ended (recall that machine noise precluded analysis of phonation recorded during scanning). Ideally the formant values measured for a given vowel should be independent of its source category; in practice this expectation was unrealistic, given that the target vowels were separately recorded under acoustically ideal conditions, while pre- and post-scan tokens were recorded with subjects prone and encased by several tons of intimidating high technology.11 Effects due to token source on measured formant values were assessed by computing Wilks' Λ (a measure of multivariate correlation) from the separate formant analyses. The values obtained (listed in Table 5) show that token source category was in fact a significant source of variance for both subjects, both as a main effect, and in interactions with vowel and ATR factors, but the results were mixed depending on how the runtime tokens were grouped. When analyzed by type (i.e. target, pre-scan, post-scan), the Akan subject showed a marginally significant (p < .05) interaction of source with ATR (the feature of primary interest), but no significance for this interaction when analyzed by run (i.e. target, sagittal, axial). This suggests that while subject AO drifted slightly from the initial target over time, the way in which he did so was consistent across runs, at least with respect to ATR effects. The English subject MT showed the reverse pattern: the interaction of source with the Tense/Lax parameter was marginally significant (p < .05) for the analysis of tokens grouped by run, but not significant grouped by type, indicating that while this subject produced slightly different vowels across runs, he did not drift significantly during scanning.
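Wilks' Λ, as reported in Table 5, is the determinant ratio det(E) / det(E + H), where E is the error (within-group) SSCP matrix and H the hypothesis (between-group) SSCP matrix; small values indicate a strong effect. A hedged sketch of the statistic with made-up 2x2 matrices, not the paper's data:

```python
# Illustrative computation of Wilks' Lambda = det(E) / det(E + H).
# E and H below are hypothetical SSCP matrices, not values from this study.

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_lambda(E, H):
    """Wilks' Lambda for 2x2 error (E) and hypothesis (H) SSCP matrices."""
    EH = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
    return det2(E) / det2(EH)

E = [[10.0, 2.0], [2.0, 8.0]]   # hypothetical within-group SSCP
H = [[30.0, 5.0], [5.0, 20.0]]  # hypothetical between-group SSCP
print(round(wilks_lambda(E, H), 3))  # -> 0.071 (small Lambda = strong effect)
```

The small Λ values for the Vowel and ATR effects in Table 5, compared with the larger values for the Source terms, are what motivates the conclusion that token source mattered less than the factors of interest.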


Figure 13. Subject AO (Akan) vowel space. [F2-F1 x F1 (Hz); target and run tokens plotted]

Figure 14. Subject MT (English) vowel space. [F2-F1 x F1 (Hz); target and run tokens plotted]


Table 5. Wilks' Lambda values from MANOVA of formant data.

Tokens grouped as Target:Sagittal:Axial

Akan
Effect                 Value   F-Value   Num DF   Den DF   P-Value
Vowel                  0.001   94.142    6.000    18.000   0.0001
ATR                    0.112   23.867    3.000     9.000   0.0001
Source                 0.223    3.348    6.000    18.000   0.0215
Vowel * ATR            0.063    8.951    6.000    18.000   0.0001
Vowel * Source         0.153    2.847    9.000    22.054   0.0206
ATR * Source           0.414    1.665    6.000    18.000   0.1871
Vowel * ATR * Source   0.415    1.656    6.000    18.000   0.1893

English
Effect                 Value   F-Value   Num DF   Den DF   P-Value
Vowel                  0.001   136.776   6.000    28.000   0.0001
ATR                    0.055    80.075   3.000    14.000   0.0001
Source                 0.410     2.619   6.000    28.000   0.0383
Vowel * ATR            0.067    13.420   6.000    28.000   0.0001
Vowel * Source         0.149     3.281   12.000   37.332   0.0025
ATR * Source           0.369     3.020   6.000    28.000   0.0210
Vowel * ATR * Source   0.456     1.074   12.000   37.332   0.4078

Tokens grouped as Target:Pre:Post scan

Akan
Effect                 Value   F-Value   Num DF   Den DF   P-Value
Vowel                  0.001   140.397   6.000    20.000   0.0001
ATR                    0.054    58.871   3.000    10.000   0.0001
Source                 0.086     8.057   6.000    20.000   0.0002
Vowel * ATR            0.080     8.483   6.000    20.000   0.0001
Vowel * Source         0.172     2.889   9.000    24.488   0.0174
ATR * Source           0.268     3.106   6.000    20.000   0.0257
Vowel * ATR * Source   0.684     1.542   3.000    10.000   0.2639

English
Effect                 Value   F-Value   Num DF   Den DF   P-Value
Vowel                  0.000   205.738   6.000    28.000   0.0001
ATR                    0.060    72.659   3.000    14.000   0.0001
Source                 0.167     6.748   6.000    28.000   0.0002
Vowel * ATR            0.016    32.756   6.000    28.000   0.0001
Vowel * Source         0.058     6.016   12.000   37.332   0.0001
ATR * Source           0.520     1.805   6.000    28.000   0.1343
Vowel * ATR * Source   0.304     1.765   12.000   37.332   0.0907

Recall that the purpose of collecting acoustic data was to verify that subjects were in fact articulating the desired vowel configuration throughout the scanning sequence. The above results indicate that, due possibly to nervousness or fatigue, subjects were not in fact entirely consistent in vowel production. However, if the F-values derived using Wilks' Λ are regarded as reflecting the relative importance of the individual factors, observe that vowel type and ATR and their interaction show larger values than any term involving token source category, for both languages and both analyses; in other words, while the source factor is significant, it is less important relative to the other parameters determining vowel configuration, and does not indicate subject phonation drifted so much as to seriously skew the imaged tract configurations.

Discussion

The fact that the English sagittal measurements show smaller differences in magnitude for both tongue root advancement and laryngeal lowering than Akan, and greater differences in tongue dorsum height, suggests a relatively more significant role for tongue height in maintaining the English contrast. It should be noted however that root advancement and dorsum height may be intrinsically linked. An electromyographic study by Baer, Alfonso, and Honda (1988) claimed that the same contraction of the posterior genioglossus (GGP) muscle effecting tongue root advancement also forces the tongue dorsum upwards, while contraction of the separately controlled anterior genioglossus (GGA) pulls the dorsum forward and down. The results obtained here suggest that GGA activity is greater in Akan than in English: active control of the GGA in Akan works against the dorsum-raising effect of the GGP, resulting in smaller net dorsum height differences than those observed for English. This predicts that the correlation of dorsum height with root advancement should be stronger in English than in Akan, which is confirmed by a regression analysis outlined in the table below:

(8) Dorsum Height regressed against Root Advancement (all vowels)

              Subject AO (Akan)   Subject MT (English)
d.f.          4                   4
Pearson's r   0.226               0.782
r2            0.051               0.611
t-ratio       0.465               2.51
p             n.s.                <0.05

Assuming the pattern of muscle activity mentioned above, one apparent difference therefore between the ATR and Tense/Lax mechanisms lies in how each language treats the inherent dorsum-raising effect of root advancement: Akan seeks to neutralize its effect, whereas English (at least in this subject's dialect) appears to exploit it.

Another difference is apparent from the patterning of axial data at measured levels below the epiglottis. With one exception (area measured at the three lowest levels of /e/), the area, width, and depth measurements obtained for Akan show consistently larger values for expanded (+ATR) variants at all measured levels, above and below the epiglottis. But while the English data also show consistently larger values above the epiglottal pivot, at levels below that point differences between tense and lax variants are inconsistent in sign and considerably smaller in magnitude. The change occurs abruptly, and is evident to some degree in all three parameters measured.

An attempt was made to quantify the significance of this divergence by performing an analysis of variance on each measurement parameter, treating the individual levels as repeated measures nested within an upper or lower grouping factor.12 Under this design, inter-language differences in behavior above and below the epiglottis between expanded and constricted configurations are encompassed in the Language x Group x ATR interaction. Results of the analysis for the width parameter showed no significance for this interaction (F=1.12), but moderate significance for depth (F=9.48, p<.05), and strong significance for area (F=15.84, p<.005), reflecting the difference in subepiglottal behavior between the two subjects.

Further evidence of divergent behavior is provided by a regression analysis of width against depth for corresponding measurement levels, summarized in Table 6, and illustrated in Figure 15. The results show a significant (p < .01) positive correlation between Akan width and depth across all measurement levels, but no correlation for the same analysis on English. When the English values are analyzed separately by upper/lower grouping however, the results show a significant (p < .01) positive correlation between width and depth for measurement levels above the epiglottal pivot, and a significant (p < .01) negative correlation at levels below it; in other words, below the level of the epiglottis in English an increase in anterior/posterior depth is accompanied by a decrease in lateral width. Separate analysis of the Akan values by group shows significant (p < .01) positive correlations between width and depth for measurement levels above and below the epiglottal pivot.

Table 6. Regression of Axial Width and Depth.

All Vowels
              Subject AO (Akan)   Subject MT (English)
d.f.          58                  58
Pearson's r   0.443               0.090
r2            0.196               0.008
t-ratio       3.76                -0.687
p             <0.01               n.s.

Akan Vowels (Subject AO)
              below epiglottis    above epiglottis
d.f.          28                  28
Pearson's r   0.545               0.744
r2            0.297               0.553
t-ratio       3.44                5.89
p             <0.01               <0.01

English Vowels (Subject MT)
              below epiglottis    above epiglottis
d.f.          28                  28
Pearson's r   -0.671              0.680
r2            0.450               0.462
t-ratio       -4.79               4.91
p             <0.01               <0.01
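The t-ratios reported for these correlations are consistent with the standard significance test for a Pearson correlation, t = r * sqrt(df) / sqrt(1 - r^2). A quick check against two of the tabulated rows:

```python
# Verify reported t-ratios from Pearson's r and degrees of freedom,
# using t = r * sqrt(df) / sqrt(1 - r^2).
import math

def t_ratio(r, df):
    """t statistic for testing a Pearson correlation r with df degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)

print(round(t_ratio(0.443, 58), 2))  # Table 6, Akan all vowels -> 3.76
print(round(t_ratio(0.782, 4), 2))   # (8), Subject MT (English) -> 2.51
```

Both values match the tabulated t-ratios, confirming the internal consistency of the reported r, d.f., and t columns.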


Figure 15. Regression of axial width x depth. [Panel regression lines: Akan all vowels, y = .751x + .571 (r2 = .196); English all vowels, y = .167x + 17.009 (r2 = .008); English below epiglottis, y = -1.024x + 62.433 (r2 = .45); English above epiglottis, y = 1.475x - 33.627 (r2 = .462); axes: width (mm) x depth (mm)]

The fact that this divergence in behavior appears to occur exactly at the level of the base of the epiglottis suggests an anatomical explanation. It can be seen from the sagittal images that the base of the epiglottis is at approximately the same height as the hyoid, which serves as an anchor for the medial pharyngeal constrictor. The primary function of this muscle is to control pharyngeal aperture during swallowing in the region of the pharynx between the level of the hyoid and the upper larynx. Since this is where the Akan and English patterns diverge, the source of the cross-language difference in subepiglottal behavior (for these two subjects at least) may be the active involvement of this muscle in effecting the ATR contrast, and its lack of involvement in the Tense/Lax distinction: active control of pharyngeal aperture would account for the consistently larger (expanded-constricted) difference values observed for Akan, and its presumed absence would explain the small and inconsistent differences observed for English. The trading relation observed between English width and depth also supports the hypothesized non-involvement of the medial constrictor in the Tense/Lax mechanism: when GGP action on the tongue pulls it up and forward in Tense configurations, the relaxed constrictor does not oppose it, and so the lateral pharyngeal walls collapse slightly below the epiglottis as a consequence of the advanced tongue root. In Akan however active control of the constrictor results in the pharyngeal walls being tensed in +ATR configurations, preventing similar deformation of the lateral sides.

Hardcastle (1976) refers to two modes of pharyngeal muscle contraction: an "isotonic" mode inducing sphincteral narrowing of pharyngeal aperture, and an "isometric" mode serving to tense pharyngeal walls without reducing aperture. Under this account of the differences in subepiglottal behavior between English and Akan, English does not actively control the pharyngeal constrictor, whereas Akan uses isotonic contraction of the pharyngeal constrictor to minimize pharyngeal aperture in [-ATR] configurations, and isometric contraction in [+ATR] configurations to prevent deformation of pharyngeal lateral walls. Pharyngeal isometric tension may be significant in a different context: Hardcastle (1973) has suggested that it may be important in the production of tensed initial stops in Korean.

Assuming that the patterns observed for these two subjects are in fact representative of Akan and English, it is evident that the ATR and Tense/Lax distinctions are only superficially similar. In producing +ATR vowels Akan speakers enlarge the pharyngeal cavity by advancing the root of the tongue, lowering the larynx, and maintaining tension in the pharyngeal walls. In -ATR configurations the root is retracted, pharyngeal aperture is constricted, and the larynx is raised, minimizing pharyngeal volume. Because speakers appear to make adjustments to maintain relatively constant dorsum height across the two configurations, the ATR distinction is essentially a contrast in pharyngeal volume. Tense vowels in English are also articulated with an advanced tongue root, enlarging the pharyngeal cavity as a consequence; but below the level of the epiglottis this enlargement is counteracted by the deformation of the lateral walls from lack of constrictor tension. In lax configurations pharyngeal volume is smaller, but it is unclear whether this is due to actual constriction by the medial constrictor, or simply the relaxed position of the tongue root. The dorsum-raising effect of root advancement is not adjusted for in English, and instead constitutes an integral part of the contrast.

Generalizations of this sort are somewhat presumptuous given the limited scope of this study, and those made for English in particular should be viewed in the context of the Ladefoged et al. (1972) and Raphael & Bell-Berti (1975) findings mentioned above, showing that different English speakers produce the Tense/Lax contrast differently. Separate studies by Lindau in 1975 and 1979 involving four subjects each found consistent articulatory implementation of the ATR mechanism, so the generalizations made here for Akan are perhaps on firmer ground, especially as the sagittal results of this study replicated her findings. In any case, while different speakers of English may approximate more or less closely the Akan ATR articulatory mechanism, the point is that the Tense/Lax contrast is not identical to it, and must be represented by a different feature. Furthermore, because the Akan distinction appears to involve two muscles that the English contrast does not exploit (the medial pharyngeal constrictor and possibly the anterior genioglossus), from an articulatory standpoint it appears to be more complex with respect to the active control and coordination needed to produce it.

Conclusions

Although exploratory in scope, this study did succeed in achieving certain objectives. It demonstrated that despite its inherent drawbacks Magnetic Resonance Imaging can be a useful technique for obtaining information about static vocal tract configurations, one that will undoubtedly increase in importance as the technology is further refined. Measurements from the sagittal images collected here successfully replicated previous results from cineradiographic studies showing the significance of tongue root advancement and larynx height in effecting the Akan ATR contrast, and those obtained from axial images extended these results for the first time into the dimension of lateral width. The axial measurements confirm Lindau's position that it is the overall difference in pharyngeal volume that is relevant to the Akan vowel contrast, not just relative larynx or tongue root positions. Finally the parallel analysis of the English Tense/Lax contrast gave results showing that despite superficial similarities with the Akan ATR distinction, the two contrasts are not the same.

REFERENCES

Baer, T., Alfonso, P., & Honda, K. (1988). Electromyography of the tongue muscles during vowels in /@pVp/ environment. Annual Bulletin of the Research Institute for Logopedics and Phoniatrics (University of Tokyo), 22, 7-19.
Baer, T., Gore, J. C., Gracco, L. C., & Nye, P. W. (1991). Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels. Journal of the Acoustical Society of America, 90(2), 799-828.
Bradley, W. G., Newton, T. H., & Crooks, L. E. (1983). Physical principles of nuclear magnetic resonance. In T. H. Newton & D. G. Potts (Eds.), Modern neuroradiology: Advanced imaging techniques (pp. 15-62). San Francisco: Clavadel Press.
Clements, G. N. (1980). Vowel harmony in a nonlinear generative phonology: An autosegmental model. Indiana University Linguistics Club.
Dolphyne, F. A. (1988). The Akan (Twi-Fante) language. Accra: Ghana Universities Press.
Halle, M., & Stevens, K. N. (1969). On the feature Advanced Tongue Root. Quarterly Progress Report, 94, Research Laboratory of Electronics, Massachusetts Institute of Technology, 209-215.
Hardcastle, W. J. (1973). Some observations on the tense-lax distinction in initial stops in Korean. Journal of Phonetics, 1, 263-272.
Hardcastle, W. J. (1976). Physiology of speech production. New York: Academic Press.
Harshman, R., Ladefoged, P., & Goldstein, L. (1977). Factor analysis of tongue shapes. Journal of the Acoustical Society of America, 62, 693-707.


Jackson, M. T. (1988). Phonetic theory and cross-linguistic variation in vowel articulation. Working Papers in Phonetics, 71. Los Angeles: University of California.
Ladefoged, P. (1964). A phonetic study of West African languages. Cambridge: University Press.
Ladefoged, P., DeClerk, J., Lindau, M., & Papcun, G. (1972). An auditory-motor theory of speech production. Working Papers in Phonetics, 22, 48-75. Los Angeles: University of California.
Lakshminarayanan, A. V., Lee, S., & McCutcheon, M. J. (1991). MR imaging of the vocal tract during vowel production. Journal of Magnetic Resonance Imaging, 1, 71-76.
Lindau, M. (1975). [Features] for vowels. Working Papers in Phonetics, 30. Los Angeles: University of California.
Lindau, M. (1979). The feature expanded. Journal of Phonetics, 7, 163-176.
Perkell, J. (1971). Physiology of speech production: A preliminary study of two suggested revisions of the features specifying vowels. Quarterly Progress Report, 102, 123-139. Research Laboratory of Electronics, Massachusetts Institute of Technology.
Raphael, L. J., & Bell-Berti, F. (1975). Tongue musculature and the feature of tension in English vowels. Phonetica, 32, 61-73.
Stewart, J. M. (1967). Tongue root position in Akan vowel harmony. Phonetica, 16, 185-204.
Stewart, J. M. (1971). Niger-Congo, Kwa. In T. Sebeok (Ed.), Current trends in linguistics (pp. 179-212). The Hague: Mouton.

FOOTNOTES

1. High vowels from both harmony groups (and low /a/) also have nasalized counterparts which were not investigated in this study.
2. In Clements' (1980) analysis of Akan vowel harmony /a/ is treated as an opaque vowel that induces the [-ATR] feature on any subsequent vowels. Some dialects of Akan have an additional ([+ATR]) front vowel /ɜ/ that harmonizes with /a/; however the exact distribution of this vowel is unclear, and it is ignored here.
3. The magnetic resonance technique relies on strong magnetic fields to align the magnetic moments of hydrogen nuclei, and pulsed radio-frequency energy to set them into resonance. The signal intensity in an MR image depends on the density of the hydrogen nuclei in the scanned tissue and its resonance decay characteristics, determined by the atomic environment of the protons within each molecule. See Bradley et al. (1983) for further information.
4. Personal communication from my Akan informant.
5. To facilitate cross-language comparison in the following discussion I freely apply the terms "expanded" and "constricted" to both Akan and English, but do not mean to imply by this that the same feature mechanism is involved in both languages; nor am I referring to Lindau's term for the Akan feature. The terms are simply meant as shorthand physical descriptions of the respective contrasts, so that "expanded" for example should be understood as [+ATR] in the context of Akan, and [+Tense] in the context of English.
6. Image version 1.29q by W. Rasband.
7. Image quality was inferior to that obtained by Baer et al. (1991); however those researchers used custom-made cephalostats and longer scanning times in obtaining their best results.
8. The scanning process resolved a given volume into a flat 256 x 256 pixel image. Sagittal scanning used a 280mm width x 280mm depth x 3mm thickness, giving a resolution of 1.09mm/pix. Half the axial scans were done using a 280mm x 280mm x 5mm volume, resulting in the same 1.09mm/pix resolution, and half were done using a 300mm x 300mm x 5mm volume, giving a resolution of 1.17mm/pix. See Tables 1 and 2 for full scanning specifications used.
9. Each volume element represents (scan area x scan thickness), where scanned image thickness was a constant 5mm (see preceding footnote).
10. The uppermost axial levels measured for Akan /e/~/ɛ/ show (slightly) larger depth values for the constricted variant, reversing the pattern found everywhere else. One possible explanation is that at these levels scanning has moved out of the region manipulated for the ATR contrast, and into the relatively invariant region of the tongue constriction characterizing the vowel.
11. Both subjects reported feelings of nervousness when first placed inside the magnet; as the diameter of the central bore is only approximately two feet, claustrophobia is definitely a potential problem in studies of this type.
12. For purposes of symmetry the uppermost level was dropped from the analysis, so that the five subepiglottal levels constituted the 'lower' group, and the epiglottal pivot and its four succeeding levels the 'upper.' ANOVA design (levels as repeated measures nested in group): level (measurement level, 1-5); group (below/above epiglottis); vowel (I/E/U); ATR (expanded/constricted); language (Akan/English). The level x vowel x ATR x language interaction was used as the error term (16 df).
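The resolution and volume figures quoted in footnotes 8 and 9 follow from simple scan geometry. The sketch below (Python; the function names are ours for illustration, not from the paper) reproduces that arithmetic:

```python
# Pixel resolution of a square MR image: field-of-view width divided by
# pixels per side (footnote 8: all images were 256 x 256 pixels).
def mm_per_pixel(fov_mm, pixels=256):
    return fov_mm / pixels

# The two fields of view reported in footnote 8.
print(round(mm_per_pixel(280), 2))  # 1.09 mm/pix (280mm sagittal/axial scans)
print(round(mm_per_pixel(300), 2))  # 1.17 mm/pix (300mm axial scans)

# Footnote 9: each volume element is (measured scan area x slice thickness),
# with a constant 5mm slice thickness for the axial scans.
def volume_mm3(area_mm2, thickness_mm=5.0):
    return area_mm2 * thickness_mm
```

A 300mm field of view over the same 256-pixel grid necessarily yields coarser spatial resolution than a 280mm one, which is why the two halves of the axial data differ slightly.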

Haskins Laboratories Status Report on Speech Research 1993, SR-113, 131-134

Thai*

M. R. Kalaya Tingsabadh and Arthur S. Abramson

Standard Thai is spoken by educated speakers in every part of Thailand, used in news broadcasts on radio and television, taught in school, and described in grammar books and dictionaries. It has evolved historically from Central Thai, the regional dialect of Bangkok and the surrounding provinces.

The transcription of a Thai translation of The North Wind and the Sun is based on recordings made by three cultivated speakers of the language, who were asked to read the passage in a relaxed way. In fact, we find them all to have used a fairly formal colloquial style, apparently equivalent to Eugenie J. A. Henderson's "combinative style" (Henderson, 1949). In a more deliberate reading of the text many words in the passage would be transcribed differently. The main features subject to such stylistic variation are vowel quantity, tone, and glottal stop. Thus, for example, /tɛ̀ː/ 'but' is likely under weak stress to be /tɛ̀/ with a short vowel; the modal auxiliary /tɕàʔ/ 'about to' becomes /tɕā/, with change of tone from low to mid and loss of final glottal stop. The prosodic and syntactic factors that seem to be at work here remain to be thoroughly explored.

Consonants

             Bilabial  Labiodental  Alveolar  Postalveolar  Palatal  Velar  Glottal
Plosive      p ph b                 t th d                           k kh   ʔ
Nasal        m                      n                                ŋ
Fricative              f            s                                       h
Affricate                                     tɕ tɕh
Trill                               r
Approximant  w                                              j
Lateral
approximant                         l

p   pâː 'elder aunt'      t   tam 'to pound'       k   kâːŋ 'fish bone'
ph  phâː 'cloth'          th  tham 'to do'         kh  khâːŋ 'side'
b   bâː 'insane'          d   dam 'black'          ŋ   ŋaː 'tusk'
m   maː 'to come'         n   naː 'ricefield'      w   wan 'day'
f   fǐː 'pimple'          s   sǎj 'clear'          j   jaːm 'watchman'
r   rák 'to love'         ʔ   ʔûan 'fat'
l   lák 'to steal'        h   hǎj 'earthen jar'
tɕ  tɕaːm 'to sneeze'     tɕh tɕhaːm 'dish'



Vowels

There are nine vowels. Length is distinctive for all the vowels. (In some phonological treatments, /V/ is analyzed as /Vʔ/.) Although small spectral differences between short and long counterparts are psychoacoustically detectable and have some effect on vowel identification (Abramson & Ren, 1990), we find the differences too subtle to place with confidence in the vowel quadrilateral. The vowel /a/ in unstressed position, including the endings of the diphthongs /ia ɯa ua/, is likely to be somewhat raised in quality. The final segments of the other two sets of phonetic diphthongs, (1) [iu, eu, ɛu, ɛːu, au, aːu, iau] and (2) [ai, aːi, ɔi, ɔːi, ui, ɯi, uai, ɯai], are analyzed as /w/ and /j/ respectively.

i  krìt 'dagger'          iː krìːt 'to cut'             ia rian 'to study'
e  ʔen 'ligament'         eː ʔeːn 'to recline'          ɯa rɯan 'house'
ɛ  phɛ́ʔ 'goat'            ɛː phɛ́ː 'to be defeated'      ua ruan 'to be provocative'
a  fǎn 'to dream'         aː fǎːn 'to slice'
ɔ  klɔ̀ŋ 'box'             ɔː klɔːŋ 'drum'
o  khôn 'thick (soup)'    oː khôːn 'to fell (a tree)'
u  sùt 'last, rearmost'   uː sùːt 'to inhale'
ɤ  ŋɤn 'silver'           ɤː dɤːn 'to walk'
ɯ  khɯ̂n 'to go up'        ɯː khlɯ̂ːn 'wave'


Tones

There are five tones in Standard Thai: high /ˊ/, mid /ˉ/, low /ˋ/, rising /ˇ/ and falling /ˆ/.

kháː 'to engage in trade'
khāː 'to get stuck'
khàː 'galangal'
khǎː 'leg'
khâː 'to kill'

Only /p, t, k, ʔ, m, n, ŋ, w, j/ occur in syllable-final position. Final /p, t, k, ʔ/ have no audible release. The final oral plosives are said to be accompanied by simultaneous glottal closure (Henderson, 1964; Harris, 1972). Final /ʔ/ is omitted in unstressed positions. Initial /t/ and /f/ are velarized before close front vowels.

The consonant /r/ is realized most frequently as [ɾ] but also as [r]. Perceptual experiments (Abramson, 1962: 6-9) have shown that the distinction between /r/ and /l/ is not very robust; nevertheless, the normative attitude among speakers of Standard Thai is that they are separate phonemes, as given in Thai script. This distinction is rather well maintained by some cultivated speakers, especially in formal speech; however, many show much vacillation, with a tendency to favor the lateral phone [l] in the position of a single initial consonant. As the second element of initial consonant clusters, both /l/ and /r/ tend to be deleted altogether.

Stress

Primary stress falls on the final syllable of a word. The last primary stress before the end of a major prosodic group commonly takes extra stress.

Conventions

The feature of aspiration is manifested in the expected fashion for the simple prevocalic oral stops /ph, th, kh/. The fairly long lag between the release of the stop and the onset of voicing is filled with turbulence, i.e., noise-excitation of the relatively unimpeded supraglottal vocal tract. In the special case of the "aspirated" affricate /tɕh/, however, the noise during the voicing lag excites a narrow postalveolar constriction, thus giving rise to local turbulence. It is necessarily the case, then, that the constriction of the aspirated affricate lasts longer than that of the inaspirate (Abramson, 1989). Not surprisingly, it follows from these considerations that the aspiration of initial stops as the first element in clusters occurs during the articulation of the second element, which must be a member of the set /l, r, w/.

In plurisyllabic words, the low tone and the high tone on syllables containing the short vowel /a/ followed by the glottal stop in deliberate speech become the mid tone when unstressed, with loss of the glottal stop.

Transcription of recorded passage khi'ni? thl lOm'nuia k ,phrithlt I kimlig 'thiig kin 'wi: I 'khrij tci 'mi:k kwi 'kin I k5 'ml: 'phit'nuinj 'dir:n 'phin 'mi: I Sij 'stiia,kirenilw II jörUndia Iphri?ifit 'twig ,thkliig kin 'Ithrij lhim hij phüni 6:t 'Slia,kinbilV sim'ret 16:n I tci I pen 'phi: thi 'ml: 'kwi: II 0 le:wI liim'ntiia ki kri'phtir 'phit 'ji:0 'sit 'rEg II te jbj phit 'rem 'mi:k 'phiag 'dij k5 'chin) kritchip kip lila 'khuln phiaj nin le? 'nij thi 'sit I ,Iiim'ntia 1(3

'115m 'khwi:m phijijim II 'tci:k 'nin Iphrilith'it tctki 'sse:ij r5:n rE:0 ?M.( 'mi: II 'nik,d5r.n'thig 1(3 tt ?5:k 'than .sitt likurnia tainj icim tnj ji:m 'rip 'wi:I phri?ifit mi: philig 'kwi: on II

The passage in Thai script


REFERENCES

Abramson, A. S. (1962). The vowels and tones of Standard Thai: Acoustical measurements and experiments. Bloomington: Indiana University Research Center in Anthropology, Folklore, and Linguistics, Publication 20.
Abramson, A. S. (1989). Laryngeal control in the plosives of Standard Thai. Pasaa, 19, 85-93.
Abramson, A. S., & Ren, N. (1990). Distinctive vowel length: Duration vs. spectrum in Thai. Journal of Phonetics, 18, 79-92.
Harris, J. G. (1972). Phonetic notes on some Siamese consonants. In J. G. Harris & R. B. Noss (Eds.), Tai phonetics and phonology (pp. 8-22). Bangkok: Central Institute of English Language.
Henderson, E. J. A. (1949). Prosodies in Siamese: A study in synthesis. Asia Major New Series, 1, 189-215.
Henderson, E. J. A. (1964). Marginalia to Siamese phonetic studies. In D. Abercrombie, D. B. Fry, P. A. D. MacCarthy, N. C. Scott, & J. L. M. Trim (Eds.), In honour of Daniel Jones: Papers contributed on the occasion of his eightieth birthday 12 September 1961 (pp. 415-424). London: Longmans.

FOOTNOTES

*Journal of the International Phonetic Association, in press.
Department of Linguistics, Faculty of Arts, Chulalongkorn University, Bangkok.
Also Department of Linguistics, The University of Connecticut, Storrs.
We thank Dr. Chalida Rojanawathanavuthi, Dr. Kingkarn Thepkanjana, and Miss Surangkana Kaewnamdee, whose readings of the passage underlie our transcription. Part of the work of the second author was supported by Grant HD-01994 from the U.S. National Institutes of Health to Haskins Laboratories.

Haskins Laboratories Status Report on Speech Research 1993, SR-113, 135-144

On the Relations between Learning to Spell and Learning to Read*

Donald Shankweiler and Eric Lundquist

We are indebted to Professor Peter Bryant of the University of Oxford for making it possible for one of us (DS) to study a group of children whose data are discussed in this chapter, and for help with the analysis and much valuable discussion. Thanks are due to Stephen Crain, Anne Fowler, Ram Frost, Leonard Katz, Alvin Liberman, and Hyla Rubin for their comments on earlier drafts. This work was supported in part by Grant HD-01994 to Haskins Laboratories from the National Institute of Child Health and Human Development.

The study of spelling is oddly neglected by researchers in the cognitive sciences who devote themselves to reading. Experimentation and theories concerning printed word recognition continue to proliferate. Spelling, by contrast, has received short shrift, at least until fairly recently. It is apparent that in our preoccupation with reading, we have tended to downgrade spelling, passing it by as though it were a low-level skill learned chiefly by rote. However, a look beneath the surface at children's spellings quickly convinces one that the common assumption is false. The ability to spell is an achievement no less deserving of well-directed study than the ability to read. Yet spelling and reading are not quite opposite sides of a coin. Though each is party to a common code, the two skills are not identical. In view of this, it is important to discover how development of the ability to spell words is phased with development of skill in reading them, and to discover how each activity may influence the other. Thus, this chapter is concerned with the relationship between reading and writing.

It is appropriate to begin by asking what information an alphabetic orthography provides for a writer and reader, and to briefly review the possible reasons why beginners often find it difficult to understand the principle of alphabetic writing and to grasp how spellings represent linguistic structure. In this connection, would an orthography best suited for learning to spell differ from an orthography best suited for learning to read?

The second section discusses how spelling and reading are interleaved in a child newly introduced to the orthography of English. Here, one central question is precedence: Does the ability to read words precede the ability to spell them, or, alternatively, might some children be ready to apply the alphabetic principle in writing before they can do so in reading? A related question is strategy. Do children sometimes approach the two tasks in very different ways? Finally, the last section discusses how analysis of children's spellings may illuminate aspects of orthographic learning that are not readily accessible in the study of reading.

HOW WRITERS AND READERS ARE EQUIPPED TO COPE WITH THE INFORMATION PROVIDED BY AN ALPHABETIC SYSTEM

Writing differs from natural and conventional signs in that it represents linguistic units, not meanings directly (DeFrancis, 1989; Mattingly, 1992). The question of how the orthography maps the language is centrally relevant to the course of acquisition of reading and spelling. All forms of writing permit the reader to recover the individual words of a linguistic message. Given that representation of words is the essence of writing, it is important to appreciate that words are phonological structures. To apprehend a word, whether in speech or in print, is thus to apprehend (among other things) its phonology. But in the manner of doing this, A. M. Liberman (1989; 1992) notes that there is a fundamental difference between speech on the one hand and reading and writing on the other. For a speaker or listener who knows a language, the language apparatus produces and retrieves phonological structures by means of processes that function automatically below the conscious level. Thus,


Liberman notes that to utter a word one does not need to know how the word is spelled, or even that it can be spelled. The speech apparatus that forms part of the species-specific biological specialization for language "spells" the word for the speaker (that is, it identifies and orders the segments). In contrast, writing a word, or reading one, brings to the fore the need for some explicit understanding of the word's internal structure. Since in an alphabetic system it is primarily phonemes that are mapped, those who succeed in mastering the system would therefore need to grasp the phonemic principle and be able to analyze words as sequences of phonemes.

The need that alphabetic orthographies present for conscious apprehension of phonemic structure poses special difficulties for a beginner (see Gleitman & Rozin, 1977; I. Y. Liberman, 1973; Liberman, Shankweiler, Fischer, & Carter, 1974). The nub of the problem is this: phonemes are an abstraction from speech; they are not speech sounds as such. Hence, the nature of the relation between alphabetic writing and speech is necessarily indirect and, as we now know, often proves difficult for a child or a beginner of any age to apprehend. In order to understand why this is so it will pay us to dwell for a moment on the ways in which it is misleading to suppose that an alphabetic orthography represents speech sounds (see Liberman, Rubin, Duques, & Carlisle, 1985; Liberman et al., 1974).

First, the letters do not stand for segments that are acoustically isolable in the speech signal. So, for example, one does not find consonants and vowels neatly segmented in a spectrogram in correspondence with the way they are represented in print. Instead phonemes are co-articulated, thus overlappingly produced, in syllable-sized bundles. Accordingly, apprehension of the separate existences of phonemes and their serial order requires that one adopt an analytic stance that differs from the stance we ordinarily adopt in speech communications, in which the attention is directed to the content of an utterance, not to its phonological form. In view of this, it is not surprising to discover that preschool children have difficulty in segmenting spoken words by phoneme (see Liberman et al., 1989; Morais, 1991 for reviews).

Without some awareness of phonemic segmentation, it would be impossible for a beginning reader or writer to make sense of the match between the structure of the printed word and the structure of the spoken word. So, for example, writers and readers can take advantage of the fact that the printed word CLAP has four segments only if they are aware that the spoken word "clap" has four (phonemic) segments. Accordingly, in order to master an alphabetic system it is not enough to know the phonetic values of the letters. That knowledge, necessary though it is, is not sufficient. In order to fully grasp the alphabetic principle, it is necessary, in addition, to have the ability to decompose spoken words phonemically. Indeed, experience shows that there are many children who know letter-phoneme correspondences yet have poor word decoding skills (Liberman, 1971).

Considerable evidence now exists that children's skill in segmenting words phonemically and their progress in reading are, in fact, causally linked (e.g., Adams, 1990; Ball & Blachman, 1988; 1991; Bradley & Bryant, 1983; Byrne & Fielding-Barnsley, 1991; Goswami & Bryant, 1990; Lundberg, Frost, & Petersen, 1988). One would also expect to find that the same kind of relationship prevails between phoneme segmentation abilities and spelling. And, indeed, the data are consistent with that expectation. Studies by Zifcak (1984) and Liberman et al. (1985) have shown substantial correlations between performance on tests of phoneme segmentation of spoken words and the degree to which all the phonemes are represented in children's spellings. The findings of Rohl and Tunmer (1988) confirm this association. They compared matched groups of older poor spellers with younger normal ones and found that the poor spellers did significantly less well on a test of phoneme segmentation. (See also Bruck & Treiman, 1990; Juel, Griffith, & Gough, 1986, and Perin, 1983).

The complex relation between phonemic segments and the physical topography of speech is one sense in which alphabetic writing represents speech sounds only remotely. This, we have supposed, constitutes an obstacle for the beginning reader/writer to the extent that it makes the alphabetic principle difficult to grasp and difficult to apply. Two further sources of the abstractness of the orthography should also be mentioned, which may be especially relevant to the later stages of learning to read and to spell.

First, alphabetic orthographies are selective in regard to those aspects of phonological structure that receive explicit representation in the spellings of words (Klima, 1972; Liberman et al., 1985). No natural writing system incorporates the kind of phonetic detail that is captured in the special-purpose phonetic writing that linguists use. Much context-conditioned phonetic variation is ignored in conventional alphabetic writing,1 in addition to the variation associated with dialect and idiolect. Hence, conventional writing does not aim to capture the phonetic surface of speech, but aims instead to create a more generally useful abstraction. It is enlightening to note, in this connection, that young children's "invented spellings"2 often differ from the standard system in treating English writing as though it were more nearly phonetic than it is (Read, 1971; 1986).

A second source of abstractness stems from the fact that the spelling of English is more nearly morphophonemic than phonemic. English orthography gives greater weight to the morphological structure of words than is the case with some other alphabetic orthographies, for example, Italian (see Cossu, Shankweiler, Liberman, Katz & Tola, 1988) and Serbo-Croatian (see Ognjenović, Lukatela, Feldman, & Turvey, 1983). Examples of morphological penetration in the writing of English words are easy to find. A ubiquitous phenomenon is the consistent use of s to mark the plural morpheme, even in those words, like DOGS, in which the suffix is pronounced not [s], but [z]. The morphemic aspect of English writing appears also in spellings that distinguish words that are homophones, for example, CITE, SITE; RIGHT, WRITE.

The knowledge that spellings of some English words may sacrifice phonological transparency to capture morphological relationships brings into perspective certain seeming irregularities, as several writers have noted (Chomsky & Halle, 1968; Klima, 1972; Venezky, 1970). Homophone spellings are instances in which the two modes of representation, the phonemic and the morphemic, are partially in conflict (DeFrancis, 1989). In these spellings the principle of alphabetic writing is compromised to a degree, but it is not abandoned, since most letters are typically shared between words that have a common pronunciation. A lexical distinction in homophone pairs is ordinarily indicated by the change of only a letter or two. Thus, homophone spellings in English present an irregularity from a narrowly phonological standpoint, while nonetheless keeping the irregularity within circumscribed limits.

Such examples are telling. They led DeFrancis (1989) to make a novel and stimulating suggestion: that the needs of readers and writers may actually conflict to some degree. The convention of distinct spellings for homophones would benefit readers by removing lexical ambiguity in cases in which context does not immediately resolve the matter. Writers, on the other hand, would perhaps be better served by a system that minimizes inconsistencies in mapping the surface phonology. For writers, the presence of homophones which are distinguished by their spellings increases the arbitrariness of the orthography, and hence the burden on memory. Because it has to serve for both purposes, the standard system can be regarded as a compromise, in some instances favoring readers and in other instances favoring writers.

Scrutiny of the words that users of English find difficult to spell confirms that morphologically complex words are among those most often misspelled (Carlisle, 1987; Fischer, Shankweiler, & Liberman, 1985). Carlisle (1988) notes that in derived words the attachment of a suffix to the base may involve a simple addition resulting in no change in either pronunciation or spelling of the base (ENJOY, ENJOYMENT). Alternatively, the addition may result in a pronunciation change in the base (HEAL, HEALTH), a spelling change but not a pronunciation change (GLORY, GLORIOUS) or a case in which both spelling and pronunciation change (DEEP, DEPTH). Difficulties in spelling morphologically complex words appear to stem in part from their phonological complexity and irregular spellings. But they may also stem from failure to recognize and accurately partition derivationally related words. Carlisle (1988) tested school children aged 8 to 13 for morphological awareness. They were asked to respond orally with the appropriate derived form, given the base followed by a cueing sentence designed to prompt a derivative word (e.g., "Magic. The show was performed by a ___."). It was found that awareness of derivative relationships was very limited in the youngest children, especially in cases in which the base undergoes phonological change in the derived form (as in the above example). Moreover, the ability to produce derived forms has proven deficient in children and adults who are poor spellers (Carlisle, 1987; Rubin, 1988). All in all, the evidence supports the expectation that both phonologic and morphologic aspects of linguistic awareness are relevant to success in spelling and reading.

So far we have discussed the common basis of reading and writing, pointing first to the great divide that separates speech processes on the one hand from orthographic processes on the other. Then we proceeded to identify the factors that make learning an alphabetic system difficult. The idea was also introduced that reading and spelling may tax orthographic knowledge in somewhat different ways. It is to these differences that we turn next.

CAN CHILDREN APPLY THE ALPHABETIC PRINCIPLE IN SPELLING BEFORE THEY ARE ABLE TO APPLY IT IN READING?

The possibility that the needs of readers and writers may differ with respect to the kind of orthographic mapping that is easiest to learn raises the broader issue of the relation between learning to write and learning to read. Does one precede the other? Do children adopt different strategies for the one than for the other? To answer these questions we will want to examine what is known about how spelling articulates with reading in new learners.

As to the first question, one may wonder whether precedence is really an issue. Just as in primary language development, where it is often noted that children's perceptual skills run ahead of their skills in production, so in written language, too, it would seem commonsensical to suppose that a new learner's ability to read words would exceed the ability to spell them. Most users of English orthography have probably had the experience of being unsure how to spell some words that they recognize reliably in reading. Contributing to the difficulty is the fact that there is usually more than one way for a word to be spelled that would equivalently represent its phonological structure. (Consider, for example, "clene" and "cleen" as equivalent transcriptions of the word clean.) The reader's task is to recognize the correspondence between a letter string that stands for a word (i.e., its morphophonological structure) and the corresponding word in the lexicon. It is not required that the reader know exactly how to spell a word in order to read it, only that the printed form (together with the context) should provide sufficient cues to prompt recognition of the represented word and not some other word. In contrast, the writer must generate the one (and ordinarily only one) spelling that corresponds to the conventional standard. So it is natural to assume that spelling words requires greater orthographic knowledge than reading them. We therefore might expect that a beginner would have the ability to read many words before necessarily being able to spell them correctly.

Nonetheless, questions about precedence in the development of reading and writing have arisen repeatedly. Some writers have suggested that, contrary to the view that reading is easier, children may indeed be ready to write words, in some fashion, before they are able to use the alphabetic principle productively in reading. Montessori (1964) expressed this view, and it has more recently been articulated by several prominent researchers. In part, these claims are based on experiences with preschool children who were already writing using their own invented spellings. Carol Chomsky (1971; 1979) stressed that many young writers do this at a time when they cannot read, and, indeed, may show little interest in reading what they have written. Others who have proposed a lack of coordination between spelling and reading in children's acquisition of literacy are Bradley and Bryant (1979), Frith (1980), and Goswami and Bryant (1990).

In order to discuss the question of precedence we must first consider how we are going to define spelling and reading. By spelling, do we mean spelling a word according to conventional spelling? To adopt that criterion would ignore the phenomenon of children's invented spelling. That would seem unwise since it is well-established that some children are able to write more or less phonologically before they know standard spellings (Read, 1971; 1986). It would be appropriate for some purposes to credit a child for spelling a word if the spelling the child produces approximates the word closely enough that it can be read as the intended word.

The criterion of reading is in one sense less problematical, but in another sense it is more so. For someone to be said to have read a word, that word, and not some other word (or nonword) must have been produced in response to the printed form. It is also relevant to ask how the response was arrived at. Words written in an alphabetic system can be approached in a phonologically analytic fashion or, alternatively, they can be learned and remembered holistically (i.e., as though they were logographs). As Gough and Hillinger (1980) stress, the difficulty with the logographic strategy is that it is self-limiting because it does not enable a reader to read new words. Moreover, as the vocabulary grows and the number of visually similar words increases, the memory burden becomes severe and the logographic strategy becomes progressively more inaccurate. Should we therefore consider someone a reader if she can identify high frequency words, but cannot read low frequency words or nonwords? There is some consensus that we should not (e.g.,


Adams, 1990; Gleitman & Rozin, 1977; Gough & Hillinger, 1980; Liberman & Shankweiler, 1979).

The possibility of reading new words, not previously encountered in print, is a special advantage conferred by an alphabetic system. It is reasonable to suppose that someone who has mastered the system will possess that ability. However, in the view of some students of reading, most children when they begin to read, and perhaps for a considerable time afterward, read logographically, and only later learn to exploit the alphabetic principle (Bradley & Bryant, 1979; Byrne, 1992; Gough & Hillinger, 1980). Given the absence of agreement as to what is to be taken as sufficient evidence of reading ability, the question of whether spelling or reading comes first is less the issue than whether children initially employ discrepant strategies for reading and writing.

The strategy question is brought into focus by Goswami and Bryant (1990). As noted above, they suppose that the child's initial strategy in reading (the default strategy) is to approach alphabetically written words as though they were logographs. They contend that children tend to do this even when they have had instruction designed to promote phonemic awareness. Reading analytically might require more advanced word analysis skills than are available to most beginning readers. Writing, on the other hand, forces the child to think in terms of segments. The process of alphabetic writing is by its nature segmental and sequential: The writer forms one letter at a time and must order the letters according to some plan. Thus, Goswami and Bryant suppose that children's initial approaches to writing would tend to be phonologically analytic. Goswami and Bryant (1990) find it paradoxical that children's newly found phonological awareness, which most often is introduced in the context of instruction in reading, has an immediate effect on their spelling, but not on their reading. "So at first there is a discrepancy and a separation between children's reading and spelling. It is still not clear why children are so willing to break up words into phonemes when they write, and yet are so reluctant to think in terms of phonemes when they read" (p. 148).

Bryant and his colleagues (see especially Bradley and Bryant, 1979) deserve much credit for grasping the need for a coordinated approach to the study of reading and spelling. They recognized that this undertaking would require testing children on reading and spelling the same words. It is well known that performance on reading and spelling tests are highly correlated, at least in older children and adults (Perfetti, 1985; Shankweiler & Liberman, 1972). Bradley and Bryant stressed that the correlation between reading and spelling scores depends on the words chosen. They proposed that the words that children at the beginning stages find difficult to read are not always the words that are difficult to spell, and vice versa. Words that tended to be read correctly but misspelled were words whose spellings presented some irregularity, like EGG or LIGHT, whereas words spelled and not read tended to be regular words, like MAT and BUN (Bradley & Bryant, 1979).

The finding that the spell-only words and the read-only words did not overlap very much in the beginning would lend support to the hypothesis that children at this stage use different strategies for spelling and reading. The greater difficulty in spelling irregular words is what one would expect if the children were attempting to spell according to regular letter-to-phoneme correspondences. They would tend to regularize the irregular words and thus get them wrong. Moreover, the failure to read regular words suggests that the children were using some nonanalytic strategy for reading, responding perhaps to visual similarity. That would make them prone to miss easy words whenever their appearance is confusable with other words that look similar. If they were reading analytically they would read these words correctly. Thus, Bryant and his colleagues cite findings that seem to underscore the differences between early reading and spelling.

Should we, then, accept Goswami and Bryant's paradox and suppose that reading and writing are cognitively disjunct at the early stages, even in children who have received training in phonological awareness? We think not. First, as the succeeding section shows, some data (to which we turn next) point to concurrent development of reading and spelling skills. Secondly, it is too early to assess fully the impact on children's reading and spelling of the several experimental approaches to instruction in phonological awareness (e.g., Ball & Blachman, 1988; 1991; Blachman, 1991; Byrne & Fielding-Barnsley, 1991; in press). Therefore, we believe that the question must remain open.

A new research study, which coordinated the investigation of spelling and reading in six year olds (the subjects were selected only for age), does not find evidence that incompatible strategies are employed by beginners (Shankweiler, 1992). Unlike the Bradley and Bryant study, the test

145 /40 Shankweiler and Lundquist words in this experiment included no words with approach has the disadvantage of throwing away irregular spellings. The test words did contain much of the potential information in the incorrect phonologicalcomplexities, however. Each responses. It fails to distinguish reading errors contained a consonant cluster at the beginning or that are near misses from errors that are wild the end. guesses, and it does not distinguish misspellings There was a wide range in level of achievement that capture much of a word's phonological within this group of six year olds. Nine of the 26 structure from those that capture little of it. If we children were unable to read and spell more than give partial credit for wrong responses, we must one word correctly. The remaining 17 wereable to create a scheme to evaluate the many possible read a mean of 70 percent of the words correctly ways of misspelling a word and assign relative but were able to correctly spell only 39 percent. weights to each. These findings show that the spelling difficulties As an illustration of how we might proceed, we of beginners are not confined to irregular words.3 turn again to the research study last described Regularly spelled words can cause difficulty if (Shankweiler, 1992). In this study, reading was they are phonologically complex, as when they assessed by the Decoding Skills Test (DST, contain consonant clusters. With the exception of Richardson, & Di Benedetto, 1986). The test one child, all read more words correctly than they consists of 60 real words, chosen to give were able to spell. 
Finally, analytic skill in representation to the major spelling patterns of reading, as indexed by ability to read nonwords, English, and, importantly, it also includes an was almost perfectly correlated (r = .93)with equal number of matching nonwords, the latter spelling performance (on a variety of real words).4 formed by changing one to three letters in each of These data do not sit well with the conclusion that the corresponding words. For the purposes at early reading and spelling are cognitively hand, phonotactically legal nonwords constitute dissociated. On the contrary, the findings lend the best measure of reading for assessing the support to the idea that skill in reading and skills of the beginning readerbecause only these spelling tend to develop concurrently over a wide can provide a true measure of decoding skill. range of individual differences in attainment. Because they are truly unfamiliar entities, It is notable that spelling accuracy consistently nonwords test whether a reader's knowledge of lagged somewhat behind reading. Only 6 percent the orthography is productive. As noted earlier, of the words were spelled correctly and read only that kind of knowledge enables someone to incorrectly, whereas 37 percent were read and not read new words not previously encountered in spelled. Thus the children showed what might be print (see Shankweiler, Crain, Brady, & expected to be true generally: that spelling the Macaruso, 1992). Responses to the Decoding Skills words would prove to be more difficult than Test were recorded on audiotape and transcribed reading them, if by reading we mean correct in IPA phonemic symbols for later comparison identification of individual words, and by spelling with the spelling measures. we mean spelling these words according to To gain a fine-grained measure of spelling for standard conventions. 
comparison with the reading error measures, the children's written spellings were scored phoneme INTERPRETING ERROR PATTERNS IN by phoneme, using the following categories: SPELLING AND READING So far we have been comparing spelling and Correct spelling reading at a coarse level of analysis. To address Phonologically acceptable substitute more rigorously the question of whether new (e.g., k for ck) learners us;:t similar or dissimilar strategies for Phonologically unacceptable substitute spelling and :eading we would wish to make a (e.g., c for ch) detailed compar:son between the error pattern in Phoneme not represented spelling word., and reading them. But, as it happens, thi.1 turns out to be a difficult thing to When we try to compare the error pattern in do. reading and spelling, we encounter a further difficulty: Reading is a covert process that is Problems of comparability assessed only by its effects. One cannot directly Most of the published information on the infer what goes on in the head when someone correlations between reading and spelling scores attempts to read a word. When we ask the child to is based simply on right/wrong scoring. This read aloud unconnected words in list form, we

146 On the Relations between Learning to Spell and Learning to Read 141 encounter an obstacle: children are often consonant that could represent the first phoneme unwilling to make their guesses public. Of course, in the word. A child who does this is apparently a beginning reader who is stuck on a particular aware that letters represent phonological entities word may be entertaining a specific hypothesis even though she is not yet able to analyze the about the word's identity, but in the absence of an internal structure of the syllable. Altogether, first overt response, we cannot discover the hypothesis consonants were represented in 95% of cases. and use it as a basis for inferring the source of the There was a strong tendency to omit the second difficulty. segment of a consonant cluster: that is, the L in Writing, on the other hand, leaves a visible CL, the T in ST, the M in SM, the R in CR, and so record of the writer", hypothesis about how to forth. These were omitted in 56% of occurrences, spell a word. The findings of the study we have yet when these consonants occurred alone in been discussing bear this out. Many of the initial position, they were rarely omitted. Bruck children declined the experimenter's invitation to and Treiman (1990) report the same trends, both guess at the words they were having difficulty in in normal children and dyslexics. Tht. tendency to reading. Yet the same children produced a omit the second segment from an initial cluster spelling for nearly every word they were asked to fits with Treiman's idea (1992) that children may write. The upshot is that we have nearly a initially use letters to represent syllable onsets complete set of responses to the spelling test, but and rimes rather than phonemes.5 many gapsinthe record occur on the The ability to represent the second segment of corresponding items on the reading test. 
This initial consonant clusters was a very good yields an unsatisfactory data base for comparing predictor of overall spelling achievement. It was the error pattern in spelling and reading. Thus, also a good predictor of the accuracy of word the kind of word-by-word comparison we would reading. Regression analysis showed that this like to make may be unattainable. part score accounted for 45 percent of the variance Nonetheless, there is much to be gained by a in either spelling or reading when a different set linguistic analysis of children's spellings. Indeed, of words is tested, after age, vocabulary (Dunn, it is chiefly through their writing, and not through Dunn, & Whetton, 1982) and a measure of their reading, that children reveal their phonemic segmentation skill (Kirtley, 1989) had hypotheses about the infrastructure of words. already been entered. Representation of the interior segment in final clusters does almost as Children's conceptions of the infrastructure of well when entered in the regression. The results of words as revealed in their spellings fine scoring give further support to the view that When encouraged to invent spellings for words, reading and spelling skill are closely linked even young children invent a system that is more in beginners. compatible with their linguistic intuitions than Why are consonant clusters a special source of the standard system. Whether the result difficulty? Two possibilities might be considered, corresponds to standard form is simply not a each related to the phonetic complexity of clusters. question that would occur to the child at this First,itis well known that clusters cause stage. In Carol Chomsky's words, creative spellers pronunciation difficulties for young children. "appear to be more interested in the activity than Perhaps the spelling error signals a general the product (1979, p. 46)." 
There is evidence that tendency to simplify these consonant clusters - a children's invented spellings tend to be closer to failure to perceive and produce them as two the phonetic surface than the spellings of the phonemes. But there was no indication that this standard system (Read, 1986). The standard was the case. All the children could pronounce the system of English, as we noted, maps lexical items cluster words without difficulty. at a level that is highly abstract, both because the An alternative possibility is that the children conventional system is morphophonemic, and had difficulty in conceptually breaking clusters because it tends not to transcribe phonetic detail apart and representing them as two phonemes. In that is predictable from general phonological that case, the difficulty in spelling could be seen rules. as a problem in phonological awareness. So, also, In the comparative study of reading and writing could the problems in reading the cluster words. in six year olds which we have discussed Reading analytically would require the reader to (Shankweiler, 1992), even the least-advanced decompose the word into its constituent segments, beginners, who wrote only a single letter to and the presence of clusters would increase the represent an entire word, usually chose a difficulty of making this analysis.

147 142 Shanktveiler and Lundquist

Research conducted during the past twodecades Whether a child initially adopts a logographic or has shown that phonological awareness is notall an analytic strategy for reading maydepend in of a piece. Full phoneme awareness is a late stage large part on the kind of pre-reading instruction in a process of maturation and learning thattakes the child was provided with. There is evidence years to complete (Bradley& Bryant, 1983; that both phonological awareness and knowledge Liberman et al., 1974; Morais, Cary, Alegria,& of letter-phoneme correspondences are important Bertelson, 1979; Treiman & Zukowski, 1991). to promote grasp of the alphabetic principle, and Although the order of acquisition is notcompletely are thus important to skillin spelling and settled, there is evidence that before they can decoding (Ball & Blachman, 1988; 1991; Bradley segment by phoneme children are able to segment & Bryant, 1983; Byrne & Fielding-Barnsley, 1991; spoken words using larger sublexical unitson- Gough, Juel & Griffith, 1992). Neither is sufficient sets and rimes, and syllables, particularlystressed alone. The phasing of these two necessary syllables that rhyme (Brady, Gipstein, & Fowler, components of instruction may turn out to be 1992; Liberman et al., 1974; Treiman, 1992). critical in determining the child's initial approach The role of literacy instruction in fostering the to the orthography. development of phonological awareness has been much discussed in the research literature (See REFERENCES chapters in Brady & Shankweiler, 1991, and in Adams, M. J. (1990). Beginning to read: Thinking and learning Gough, Ehri, & Treiman, 1992). In this about print. Cambridge, MA: MIT Press. connection, Treiman (1991) urges that an analysis Ball, E. W., & Blachman, B. A. (1988). Phoneme segmentation of spelling is the best route by which to study training: Effect on reading readiness. Annals of , 38, 208-225. those aspects of phonological awareness that Ball, E. W., & Blachman, B. A. 
(1991). Does phoneme awareness depend on experience with reading and writing. training in kindergarten make a difference in early word We would tend to agree. This is not to say, recognition and developmental spelling? Reading Research however, that writing, but not reading would feed Quarterly, 26, 49-66. Blachman, B. A. (1991). Phonological awareness: Implications for this development in young children. It is to be prereading and early reading instruction. In S. A. Brady & D. P. expected that a child's interest and curiosity about Shankweiler (Eds.), Phonological processes in literacy: A the one activity would encourage and nourish an tribute to Isabelle Y. Liberman. Hillsdale, NJ: Lawrence interest in the other.6 Erlbaum Associates. To sum up, because reading and writing are sec- Bradley, L., & Bryant, P. E. (1979). The independence of reading ondary language functions derived from spoken and spelling in backward and normal readers. Developmental Medicine and Child Neurology, 21, 504-514. language, they display a very different course of Bradley, L., & Bryant, P. E. (1983). Categorizing sounds and acquisition than speech itself: unlike speech, mas- learning to reada causal connection. Nature, 30, 419-421. tery of alphabetic writing requires facility in de- Brady, S. A., Gipstein, M., & Fowler, A. E. (1992). The composing words into phonemes and morphemes. development of phonological awareness in preschoolers. Since both reading and writing depend upon grasp Presentation at AERA annual meeting, Apri11992. Brady, S. A., & Shankweiler, D. P. (1991). Phonological processes in of the alphabetic principle, it could be expected literacy: A tribute to Isabelle Y. Liberman. Hillsdale, NJ: Lawrence that both would develop concurrently, though Erlhaum Associates. spelling, being the more difficult, would progress Bruck, M., & Treiman, R. (1990). Phonological awareness and more slowly. 
Several researchers, however, have spelling in normal children and dyslexics: The case of initial raised challenging questions about the order of consonant clusters. Journal of Experimental Child Psychology, 50, 156-178. precedence, suggesting that spelling, due to the Byrne, B. (1992). Studies in the Acquisition Procedure for inherently segmental nature of writing words al- Reading: Rationale, hypotheses, and data. In P. B. Gough, L. C. phabetically, emerges earlier than the ability to Ehri, & R. Treiman (Eds.), Reading acquisition (pp. 1-34). decode in reading. At present, the evidence is Hillsdale, NJ: Lawrence Erlbaum Associates. mixed. It is significant that recent research com- Byrne, B., & Fielding-Barnsley, R. (1990). Acquiring the alphabetic principle: A case for teaching recognition of phoneme identity. paring children's reading and spelling errors indi- Journal of Educational Psychology, 82, 805-812. cates that in both spelling and reading, regularly Byrne, B., & Fielding-Barnsley, R. (1991). Evaluation of a program spelled words present difficulties to beginners to teach to young children. Journal of when the words contain phonologically-complex Educational Psychology, 83, 451-455. consonant clusters. Thus, beginners' difficulties in Carlisle, J. F. (1987). The use of morphological knowledge in spelling derived forms by Learning Disabled and Normal reading and spelling do not necessarily involve Students. Annals of Dyslexia, 37, 90-108 different kinds of words, as had been suggested Carlisle, J. F. (1988). Knowledge of derivational morphology and earlier. This undercuts the claim of incompatible spelling ability in fourth, sixth, and eighth graders. Applied strategies. Psycholinguistics, 9, 247-266.

148 On the Relations between Learning to Spell and Learning to Read 143

Chomsky, C. (1971). Write first, read later. Childhood Education, 47, Biobehavioral measures of dyslexia (pp. 163-176). Parkton, MD: 296-299. York Press Chomsky, C. (1979). Approaching reading through invented Liberman, I. Y., & Shankweiler, D. (1979). Speech, the alphabet, spelling. In L. B. Resnick & P. A. Weaver (Eds.), Theory and and teaching to read. In L. B. Resnick & P. A. Weaver (Eds.). practice of early reading, vol. 2 (pp. 43-65). Hillsdale, NJ: Theory awl practice of early reading, vol. 2 (pp. 109-134). Hillsdale, Lawrence Erlbaum Associates. NJ: Lawrence Erlbaum Associates. Chomsky, N., & Halle, M. (1968). The sound pattern of English. New Liberman, I. Y., Shankweiler, D., Fischer, F. W., & Carter, B. York: Harper and Row. (1974). Explicit syllable and phoneme segmentation in the Cossu, G., Shankweiler, D., Liberman, I. Y., Tola, G., & Katz, L. young child. Journal of Experimental Child Psychology, 18, 201- (1988). Phoneme and syllable awareness in Italian children. 212. Applied Psycholinguistics, 9, 1-16. Liberman, I. Y., Shankweiler, D., & Liberman, A. M. (1989). The DeFrancis, J. (1989). Visible speech: The diverse oneness of writing alphabetic principle and learning to read. In D. Shankweiler & systems. Honolulu: University of Hawaii Press. I. Y. Liberman (Eds.), Phonology and reading disability: Solving the Dunn, L. M., Dunn, L. M., & Whetton, C. (1982). British picture reading puzzle. IARLD Monograph Series, Ann Arbor, MI: vocabulary scale. Berks., England: NFER-Nelson. University of Michigan Press. Ehri, L. C. (1989). The development of spelling knowledge and its Lundberg, I., Frost, J., & Petersen, 0.-P. (1988). Effects of an role in reading acquisition and reading disability. Journal of extensive program for stimulating phonological awareness in Learning Disabilities, 22, 356-365. preschool children. Reading Research Quarterly, 23, 263-284. Ehri, L. C., & Wilce, L. S. (1987). Does learning to spell help Mattingly, I. G. (1992). 
Linguistic awareness and orthographic beginners learn to read words? Reading Research Quarterly, 22, form. In L. Katz & R. Frost (Eds.), Orthi graphy, phonology, 47-64. morphology, and meaning (pp. 11-26). Amsterdam: Elsevier Fischer, F. W., Shankweiler, D., & Liberman, I. Y. (1985). Spelling Science Publishers. proficiency and sensitivity to word structure. Journal of Memory Montessori, M. (1964). The Montessori method. New York: Schocken and Language, 24, 423-441. Books. Frith, U. (1980). Unexpected spelling problems. In U. Frith (Ed.), Morais, J. (19911. Constraints on the development of phonemic Cognitive processes in spelling. London: Academic Press. awareness. In S. A. Brady & D. P. Shankweiler (Eds.), Gleitman, L. R., & Rozin, P. (1977). The structure and acquisition Phonological processes in literacy: A tribute to Isabelle Y. Liberman. of reading I: Relations between orthographies and the structure Hillsdale, NJ: Lawrence Erlbaum Associates. of the language. In A. S. Reber & D. L. Scarborough (Eds.), Morais, J., Cary, L., Alegria, J., & Bertelson, P. (1979). Does Toward a psychology of reading. Hillsdale, NJ: Lawrence Erlbaurn awareness of speech as a sequence of phonemes arise Associates. spontaneously? Cognition, 7, 323-331. Goswami, U., & Bryant, P. (1990). Phonological skills and learning to Ognjenovid, V., Lukatela, G., Feldman, L. B., & Turvey, M. T. read. Hillsdale, NJ: Lawrence Erlbaurn Associates. (1973). Misreadings by beginning readers of Serbo-Croatian. Gough, P. B., Ehri, L. C., & Treiman, R. (1992). Reading acquisition. Quarterly Journal of Experimental Psychology, 35A, 97-109. Hillsdale, NJ: Lawrence Erlbaum Associates. Penn, D. (1983). Phonemic segmentation and spelling. British Gough, P. B., & Hillinger, M. L. (1980). Learning to read: An Journal of Psychology, 74, 129-144. unnatural act. Bulletin of the Orton Dyslexia Society, 30, 179-196. Perfetti, C. A. (1985). Reading ability. New York: Oxford University Gough, P. 
B., Juel, C., & Griffith, P. L. (1992). Reading, Spelling, Press. and the orthographic cipher. In P. B. Gough, L. C. Ehri, & R. Read, C. (1971). Pre-school children's knowledge of English Treiman (Eds.), Reading acquisition (pp. 35-48). Hillsdale, NJ: phonology. Harvard Educational Review, 41, 1-34. Lawrence Erlbaum Associates. Read, C. (1986). Children's creative spelling. London: Routledge and Juel, C., Griffith, P. L., & Gough, P. B. (1986). Acquisition of Kegan Paul. literacy: A longitudinal study of children in first and second Richardson, E., & DiBenedetto, B. (1986). Decoding skills test. grade. Journal of Educational Psychology, 78, 243-255. Parkton, MD: York Press. Kirtley, C. L. NI. (1989). Onset and rime in children's phonological Rohl, M., & Tunmer, W. E. (1988). Phonemic segmentation skill development. Unpublished doctordissertation. University of and spelling acquisition. Applied Psyclwlinguistics, 9, 335-350. Oxford. Rubin, H. (1988). Morphological knowledge and early writing Klima, E. S. (1972). How alro;abets might rellect language. In J. F. ability. Language and Speech, 31, 337-355. Kavanagh & I. G. Mattingly (Eds.), Langu,. cc. by ear and by eye. Shankweiler, D. (1992). Surmounting the Consonant Cluster in Cambridge, MA: MIT Press. Beginning Reading and Writing. Presentation at AERA annual Liberman, A. M. (1989). Reading is hard lust Decause listening is meeting, Apri11992. easy. In C. von Euler, I. Lundberg & G. Lennerstrand (Eds.), Shankweiler, D., Crain, S., Brady, S., & Macaruso, P. (1992). Wenner-Gren Symposium Series 54, Brain and Reading. London: Identifying the causes of reading disability. In P. B. Cough, L. Macmillan. C Ehri, & R. Treiman (Eds.), Reading acquisition (pp. 275-305). Liberman, A. M. (1992). The relation of speech to reading and Hillsdale, NJ: Lawrence Erlbaum Associates. writing. In L. Katz & R. Frost (Eds.), Orthography, phonology, Shankweiler, D., & Liberman, I. Y. (1972). 
Misreading: A search morphology, and meaning (pp. 167-178). Amsterdam: Elsevier for causes. In J. F. Kavanagh & I. G. Mattingly (Eds.), Language Science Publishers. by car and by eye: The relationships between speech and reading (pp. Liberman, I. Y. (1971). Basic research in speech and lateralization 293-n7). Cambridge, MA: MIT Press. of language: Some implications for reading disability. Bulletin of Treirnan, R. (1985). Onsets and rimes as units of spoken syllables: the Orton Society, 21, 71-87. Evidence from children. Journal of Experimental Child Psychology, Liberman, I. Y. (1973). Segmentation of the spoken word. Bulletin 39, 161-181. of the Orton Society, 23, 65-77. Treiman, R., & Zukowski, A. (1991). Levels of phonological Liberman, I. Y., Rubin, H., Duques, S., & Carlisle, J. (1985). awareness. In S. A. Brady & D. P. Shankweiler (Eds.), Linguistic abilities and spelling proficiency in kindergartners Phonological processes in literacy: A Tribute to Isabelle Y. Liberman. and adult poor spellers. In D. B. Gray and J. F. Kavanagh (Eds.), Hillsdale, NT: Lawrence Erlbaurn Associates.

149 BEST 144 Shanktveiler and Lundquist

Treiman, R. (1991). Children's spelling errors on syllable-initial with their their conceptions of the phonetic values of the letters consonant clusters. Journal of Educational Psychology, 83,346-360. and the segmental composition of the words they wish to write. Treiman, R. (1992). The role of intrasyllabic units in learning to This phenomenon has t...!en studied extensively by Read (1986). read and spell. In P. B. Gough, L. C. Ehri, & R Treirnan (Eds.), The question of whether invented spellings can regularly be Reading acquisition (pp. 65-106). Hillsdale, NJ: Lawrence elicited from children with varied educational and family backgrounds was addressed by Zifcak (1981). In a study of 23 Erlbaurn Associates. Treiman, R. (1993). Beginning to spell: A study of first grade children. inner-city six year olds from blue-collar families, it was found that nearly all the children were willing to make up spellings for New York: Oxford University Press. Venezky R. L. (1970). The structure of English orthography. The words though most had little knowledge of the standard Hague: Mouton. orthography. Zifcak, M. (1981). Phonological awareness and reading 3These results are in full agreement in this respect with those of acquisition. Contemporary Educational Psychology, 6, 117-126. Treiman (1993), who carried out a comprehensive study of spelling in six year olds. The findings of both studies support the caveat that one should not be too quick to attribute children's spelling errors to the irregularities of English orthography. FOOTNOTES 4Spelling was correlated with reading real words, .91 and .81, L. Katz & R. Frost (Eds.), OrthographY, phonology, morphology, and respectively, based on two independent measures of reading. meaning (pp. 179-192). Amsterdam: Elsevier Science Publishers 5The onset consists of the string of consonants preceding the vowel nucleus. When the onset consists of a single consonant, as (1992). t Also University of Connecticut. Storrs. 
in the example of CAR, Treiman (1985) showed that children liniversity of Connecticut, Storrs. may treat it as a segment distinct from the remainderof the 'For example, it has often been noted that aspirate and inaspirate syllable, which corresponds to the rime. At the same time, they /p/, /t/ and /k/ are not distinguished in English spelling. In are unable to decompose the rime into separable components. the word COCOA, for example, both the initial and medial An invented spelling, like CR for CAR or BL for BELL is consonant are spelled alike although phonetically and consistent with such partial knowledge of the internal structure acoustically they are different. of the syllable. 20ften children who have had little or no formal instruction 6Adams (1990), Ehri (1989; Ehri & Wilce, 1987) and Treiman (in attempt to write words using the letters that they know, together press) reach a similar conclusion.

I o 0 Haskins Laboratories Status Report on Speech Research 1993, SR-113, 145-152

Word Superiority in Chinese*

Ignatius G. Mattinglyt and Yi Xut

In a line of research that began with Cattell superiority is not found in these systems, we (1886), it has been demonstrated that words play would have to view the word superiority found for a special role in the recognition of text. A letter word-frame systems as merely orthographic, to be string that forms a word is recognized faster and attributed,perhaps,tothe reader's long moze accurately than a nonword string; a letter is experience in dealing with this particular kind of recognized faster and more accurately if it is frame. In the case of Chinese, we would then presented as part of a word than if it is presented expect to find evidence for the superiority of alone or as part of nonword (e.g., Reicher, 1969; morphemic syllables. On the other hand, if word- for a review, see Henderson, 1982). This superiority is found even when the frames of the constellation of findings, often referred to as 'word writing system do not correspond to words, we superiority," suggests that the reader does not would have to say that the superiority of the word simply process the text letter by letter, and that must depend essentially on its linguistic rather words are crucial. The letters in a word are than its orthographic status. processed so automatically that the reader is At least two other investigators have unaware of recognizing them, and when he is investigated word superiority in Chinese. C.-M. required to report a letter presented in a word, he Cheng (1981) compared the accuracy with which a finds it most efficient to infer the identity of the briefly presented target character could be letter from that of the word. identified in real-word and in nonword two- Most of these experiments, however, have been character strings. (A nonword consisted of two carried out for writing systems like that of valid characters that did not constitute a real English, in which the "frame" units (W. S.-Y. word.) 
Wang, 1981) explicitly correspond to linguistic words. However, some writing systems lack this property, notably that of Chinese. The frames of the Chinese system, the characters, correspond to monomorphemic syllables, rather than to words, even though it is true that, because there are many monosyllabic words in Chinese, there are many characters that stand for words as well as for syllables. But most Chinese words are polymorphemic and are therefore written with strings of two or more characters, and these strings are not specially demarcated in text. Moreover, there are many bound morphemes in Chinese, and a character for such a morpheme occurs only as an element of a character string. It is of some importance to establish whether word superiority is observable in writing systems in which the frames are not words. If word

A string containing the target character was presented either preceding or following a "distractor" string, and the subject had to report whether the target character was in the first or the second string. Performance was better for words than for nonwords, and better for high-frequency words than for low-frequency words. Cheng also carried out a second experiment, parallel to the first, in which the targets were radicals presented as components of real characters, or of pseudocharacters, or of noncharacters. Pseudocharacters were created by interchanging radicals in two real characters, keeping their position within the character constant; noncharacters were created by interchanging radicals and locating them in orthographically impossible positions. The character containing the target was presented either preceding or following a distractor character, and subjects had to report whether the target was in the first or the second character. Performance was better for real characters than for pseudocharacters, and better for pseudocharacters than for noncharacters. This result is consistent with a word-superiority effect for monomorphemic words; however, as Hoosain (1991) suggests, it is not conclusive. Because the real characters stood for morphemic syllables as well as for words, the result is equally consistent with morphemic-syllable superiority. It would be desirable to repeat the experiment, comparing characters standing for bound morphemes with characters for free morphemes.

I.-M. Liu (1988) asked subjects to pronounce the character occurring in first, second, or third position in word strings, in pseudoword strings, or in isolation, and measured reaction time. (The position of a character in isolation apparently refers to the position of the same character when presented in a word or pseudoword.) Characters in real words were pronounced faster than characters in pseudowords in some though not in all positions. However, characters in isolation were pronounced as fast as, and in some positions faster than, characters in words. Thus, if an advantage for the real-word context over isolation is considered essential for word superiority, Chinese may not properly be said to exhibit this phenomenon, and Cheng's first experiment may, as Liu argues, merely demonstrate "compound-word superiority."

But perhaps it is not reasonable to expect the real-word context to be superior to isolation if the target characters may themselves be words. We would not be surprised to find the advantage of words over single letters for English breaking down if I or a, which happen to be words as well as letters, were the targets. Analysis of Liu's results for target characters standing for bound morphemes might clarify the issue.

The experiment reported here explores further the phenomenon of word superiority in Chinese. We asked whether a character is recognized faster when part of a two-character word than when part of a two-character pseudoword. Our experiment thus complements Cheng's (1981) first experiment, in which accuracy of identification was the dependent variable. The paradigm we used was adapted from Meyer, Schvaneveldt, and Ruddy (1974) and has already been used for Chinese by C.-M. Cheng and S.-I. Shih (1988). In each trial in our experiment, a Chinese subject saw a sequence of two characters on a monitor screen. This sequence might consist of two genuine Chinese characters, which might form either a real word or a pseudoword. In either case the subject was to respond, "Yes." Alternatively, the sequence might consist of a genuine character preceded or followed by a pseudocharacter, in which case the subject was to respond, "No." Reaction time was measured for all responses.

Methods

Design. To make possible the various critical comparisons in which we were interested, a rather complex design was used; see Table 1. There were three main types of two-character sequences: real bimorphemic Chinese words; pseudowords, each, like Cheng's (1981) nonwords, consisting of two real characters that did not form a real word; and pseudocharacter sequences, each consisting of a real character preceded or followed by a pseudocharacter. The real characters were drawn from the inventory used in the People's Republic of China, and thus included some "simplified" characters not used elsewhere. The pseudocharacters, like Cheng's, each consisted of a genuine, appropriately located semantic radical and a genuine, appropriately located phonetic radical that do not actually occur together in the Chinese character inventory.

This work was supported by NICHD Grant HD-01994 to Haskins Laboratories.

Table 1. Experimental design.

Sequence type:    Real words (128)                      Pseudowords (128)     Pseudocharacter sequences (256)
Word freq.:       High F (64)        Low F (64)
Char. freq.:      HH  HL  LH  LL     HH  HL  LH  LL     HH  HL  LH  LL        HN  LN  NH  NL
Items:            16  16  16  16     16  16  16  16     32  32  32  32        64  64  64  64
Subj. group A:     4   4   4   4      4   4   4   4      8   8   8   8        16  16  16  16
Subj. group B:     4   4   4   4      4   4   4   4      8   8   8   8        16  16  16  16
Subj. group C:     4   4   4   4      4   4   4   4      8   8   8   8        16  16  16  16
Subj. group D:     4   4   4   4      4   4   4   4      8   8   8   8        16  16  16  16

(H = high-frequency character; L = low-frequency character; N = pseudocharacter, in initial or final position.)


There were 128 real words, half of which were of relatively high frequency and half, of relatively low frequency. Each of these two word-frequency sets was divided into four secondary sets with the four possible character-frequency patterns: high-high, high-low, low-high, low-low. We relied on H. Wang et al. (1986) for information about word frequencies and character frequencies. Finally, each of these eight secondary sets was divided arbitrarily into four tertiary four-word sets A, B, C, D. There were in all 32 such tertiary sets. The average number of strokes per character was kept approximately equal across these tertiary sets.

There were 128 pseudowords. For each of the tertiary sets, four pseudowords were formed by swapping initial characters within the set, discarding any resulting real words. Thus, pseudowords were perfectly matched with real words with respect to character frequency.

There were 256 pseudocharacter sequences, each consisting of a real character and a pseudocharacter. Four pseudocharacter sequences were derived by swapping the semantic radicals of the word-initial characters within each of the original tertiary sets, discarding any resulting real characters. Four more pseudocharacter sequences were formed by repeating this operation with word-final characters.

From these materials, four different but equivalent tests were compiled. Each test included 32 real words, 32 pseudowords, and 64 pseudocharacter sequences. In a particular test, the real words, the pseudowords, and the pseudocharacter sequences were composed of unrelated tertiary sets. For example, one test consisted of set A real words, set B pseudowords, pseudocharacter sequences with word-initial pseudocharacters based on set C, and pseudocharacter sequences with word-final pseudocharacters based on set D. None of the 256 characters occurred more than once within a test, and each test was balanced with respect to word frequency, character-frequency pattern within a sequence, ordinal position of pseudocharacters, and number of strokes per character. Across the four tests, each of the original 256 real characters occurred equally often in a real word and in a pseudoword, and equally often in a real-character sequence and in a pseudocharacter sequence.

Subjects. There were 53 subjects. All were speakers of Mandarin and graduate students or spouses of graduate students at the University of Connecticut. All had been born and educated in the People's Republic of China, and were thus familiar with the simplified characters. Subjects were paid for their participation in the experiment.

Procedure. Subjects were divided arbitrarily into four equal groups, and each group received a different test. Each subject was tested separately. A subject was told that on each trial in the test, he would see a sequence of two characters on the Macintosh computer monitor. If he was sure that both were genuine Chinese characters, he was to press the key designated as "Yes." Otherwise, he was to press the "No" key. The next trial began two seconds after the subject's response, or, if he failed to respond, two seconds after the sequence had appeared. Before the experiment began, the subject was given a 24-trial practice session with feedback.

Responses and reaction times were automatically recorded by the computer, using a program written by Leonard Katz and slightly modified by us. Reaction time for a trial was measured from the instant the two characters began to be written on the monitor screen. This measurement was subject to an error of ±8.33 msec, because the write-time could not be known with any greater accuracy.

Results

The data for 13 subjects who responded with less than 90% accuracy were excluded from further analysis. For the remaining 40 subjects, the accuracy rates were 98.5% for real words, 92.2% for pseudowords, and 92.3% for pseudocharacter sequences.

Reaction-time data for the pseudocharacter sequences, for which the correct response was "No," are shown in Figure 1. Reaction times were shorter for pseudocharacter-initial than for pseudocharacter-final sequences. They were also shorter when the real character in the sequence was of high frequency than when it was of low frequency. However, real-character frequency had a much smaller effect for pseudocharacter-initial sequences than for pseudocharacter-final ones.

An analysis of variance was carried out on the pseudocharacter sequence data, for which the factors were: subject group (A/B/C/D), pseudocharacter position (initial/final), and real-character frequency (high/low). There was no effect of subject group: F(3,36) = .729. The effect of pseudocharacter position was highly significant: F(1,36) = 39.58, p < .0001. The effect of real-character frequency was mildly significant: F(1,36) = 6.67, p < .05. There was a highly significant interaction between pseudocharacter position and real-character frequency: F(1,36) = 14.40, p = .0005.
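The compilation of the four tests described above amounts to a Latin-square rotation of the tertiary sets over roles: no set is reused within a test, and across the four tests every set serves every role exactly once. A minimal sketch of that counterbalancing (the particular rotation shown is our assumption for illustration; the article spells out only one example test):

```python
SETS = ["A", "B", "C", "D"]
ROLES = ["real words", "pseudowords",
         "word-initial pseudocharacter sequences",
         "word-final pseudocharacter sequences"]

# One step of rotation per test yields a Latin square: within a test the
# four roles draw on four different sets, and across the four tests each
# set plays each role exactly once.
tests = [dict(zip(ROLES, SETS[shift:] + SETS[:shift])) for shift in range(4)]

for number, assignment in enumerate(tests, start=1):
    print(f"Test {number}:", assignment)
```

The first rotation reproduces the example test in the text (set A real words, set B pseudowords, and so on); the other three rotations complete the counterbalancing.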


[Figure 1 appeared here: reaction time (600-1000 msec) as a function of real-character frequency (high vs. low), with separate functions for pseudocharacter position (initial vs. final).]

Figure 1. Reaction times for pseudocharacter sequences.
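The shape of Figure 1 — faster rejection when the pseudocharacter comes first, with real-character frequency mattering mainly when it comes last — is the signature of the self-terminating serial scan proposed in the Discussion. A toy sketch of that account (all evaluation times are invented; only their ordering matters):

```python
# Hypothetical per-character evaluation times in msec. The specific values
# are made up for illustration; pseudocharacters are assumed to take
# longest, and low-frequency real characters longer than high-frequency.
EVAL_MS = {"real_high": 250, "real_low": 310, "pseudo": 400}

def reject_rt(sequence):
    """Serial left-to-right scan that responds 'No' at the first
    pseudocharacter; later characters are never evaluated."""
    elapsed = 0
    for character in sequence:
        elapsed += EVAL_MS[character]
        if character == "pseudo":
            return elapsed
    raise ValueError("sequence contains no pseudocharacter")

# Pseudocharacter-initial: the real character is never reached,
# so its frequency cannot affect the rejection time.
initial_high = reject_rt(["pseudo", "real_high"])   # 400
initial_low  = reject_rt(["pseudo", "real_low"])    # 400

# Pseudocharacter-final: the real character is evaluated first,
# so its frequency shows up in the rejection time.
final_high = reject_rt(["real_high", "pseudo"])     # 650
final_low  = reject_rt(["real_low", "pseudo"])      # 710
```

Initial sequences come out faster than final ones, and frequency affects only the final ones — qualitatively the interaction shown in Figure 1.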

[Figure 2 appeared here: two panels plotting reaction time (600-1000 msec) against (a) initial-character frequency and (b) final-character frequency (high vs. low), with separate functions for sequence type: pseudoword, low-frequency real word, high-frequency real word.]

Figure 2. Reaction times for real word and pseudoword sequences as a function of initial- and final-character frequency.


The data for the real words and the pseudowords, for which the correct response was "Yes," are plotted in Figures 2a and 2b. Figure 2a shows the effect of varying initial-character frequency; Figure 2b, the effect of varying final-character frequency. From both figures, it is apparent that reaction times are shorter for real words than for pseudowords, and shorter for high-frequency words than for low-frequency words. For both real words and pseudowords, reaction times are shorter for initial high-frequency characters than for initial low-frequency characters (Figure 2a) and, similarly, shorter for final high-frequency characters than for final low-frequency characters (Figure 2b). However, reaction times for pseudowords are more affected by character frequency than are reaction times for real words.

An analysis of variance was carried out on the real and pseudoword data. The factors were: subject group, sequence type (real word/pseudoword), initial-character frequency (high/low), and final-character frequency (high/low). There was no effect of subject group: F(3,36) = .17. The effect of sequence type was highly significant: F(1,36) = 438.61, p < .0001. The effects of both character-frequency factors were highly significant: initial, F(1,36) = 41.47, p < .0001; final, F(1,36) = 66.22, p < .0001. There were significant interactions between sequence type and each of the character-frequency factors: initial, F(1,36) = 7.29, p < .05; final, F(1,36) = 19.31, p < .0001.

An analysis of variance was carried out on the real word data alone to determine the effects of word frequency. The factors were: subject group, word frequency (high/low), initial-character frequency, and final-character frequency. There was no effect of subject group: F(3,36) = .32. The effect of word frequency was highly significant: F(1,36) = 19.79, p < .0001. The effects of both character-frequency factors were highly significant: initial, F(1,36) = 15.03, p < .0005; final, F(1,36) = 18.31, p < .0001. There was no interaction between word frequency and either of the character-frequency factors: initial, F(1,36) = .24; final, F(1,36) = .65. There was a significant interaction among subject group, word frequency, initial-character frequency, and final-character frequency: F(3,36) = 6.56, p < .005. We believe this interaction to be artifactual, reflecting an unfortunate choice of items in one of the tertiary subsets.

The pseudoword function in Figure 2b is steeper than the pseudoword function in Figure 2a, suggesting that character frequency has more of an effect in final position than in initial position. To explore further the effect of character-frequency order, reaction-time data for the high-low and low-high character-frequency patterns are plotted in Figure 3. For pseudowords, reaction times for the low-high pattern are shorter than for the high-low pattern. For real words, there is no comparable effect of character-frequency pattern.
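The F ratios reported here come from standard balanced factorial analyses of variance. As a rough illustration of how such a ratio is computed (this is not the authors' analysis program, and the reaction times below are invented), a two-factor ANOVA can be built from sums of squares:

```python
import numpy as np

def two_way_anova(data):
    """F ratios for a balanced two-factor design.

    data has shape (a, b, n): a levels of factor A, b levels of
    factor B, and n observations per cell."""
    data = np.asarray(data, dtype=float)
    a, b, n = data.shape
    grand = data.mean()
    mean_a = data.mean(axis=(1, 2))            # marginal means, factor A
    mean_b = data.mean(axis=(0, 2))            # marginal means, factor B
    cell = data.mean(axis=2)                   # cell means
    ss_a = n * b * np.sum((mean_a - grand) ** 2)
    ss_b = n * a * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    ss_err = np.sum((data - cell[:, :, None]) ** 2)
    ms_err = ss_err / (a * b * (n - 1))        # error mean square
    return (ss_a / (a - 1) / ms_err,
            ss_b / (b - 1) / ms_err,
            ss_ab / ((a - 1) * (b - 1)) / ms_err)

# Invented RTs (msec): pseudocharacter position (initial/final) crossed
# with real-character frequency (high/low), four observations per cell,
# chosen to mimic the qualitative pattern of Figure 1.
rts = [[[693, 707, 699, 701], [694, 708, 700, 702]],   # pseudochar-initial
       [[793, 807, 799, 801], [854, 868, 860, 862]]]   # pseudochar-final
f_pos, f_freq, f_inter = two_way_anova(rts)
```

With these toy numbers the position effect dwarfs the frequency effect, and the large interaction reflects frequency mattering mainly in pseudocharacter-final sequences, as in Figure 1. (The analyses in the article also include subject group as a factor, omitted here for brevity.)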

[Figure 3 appeared here: reaction time (600-1000 msec) as a function of character-frequency order (high-low vs. low-high), with separate functions for sequence type: pseudoword, low-frequency real word, high-frequency real word.]

Figure 3. Reaction times to real and pseudoword sequences as a function of character-frequency order.


An analysis of variance was carried out on the real and pseudoword data for the high-low and low-high patterns alone. The factors were: subject group, sequence type, and character-frequency pattern (high-low/low-high). There was no effect of subject group: F(3,36) = .23. The effect of sequence type was significant: F(1,36) = 209.35, p < .0001. The effect of character-frequency pattern fell just short of significance: F(1,36) = 3.86, p = .0571. There was a significant interaction between sequence type and character-frequency pattern: F(1,36) = 4.39, p < .05.

An analysis of variance was also carried out for the high-low and low-high patterns in the real word data alone. The factors were: subject group, word frequency, and character-frequency pattern. There was no effect of subject group: F(3,36) = .24. The effect of word frequency was significant: F(1,36) = 21.45, p < .0001. There was no effect of character-frequency pattern: F(1,36) = .15. There was no interaction between word frequency and character-frequency pattern: F(1,36) = .06. There was a significant interaction between word frequency and subject group: F(3,36) = 6.06, p < .005. We believe this interaction has the same source as the artifactual interaction mentioned above.

Discussion

These results complement those of Cheng (1981) and provide further evidence of a word superiority effect for Chinese. The key finding is that a character is evaluated more rapidly when part of a real-word sequence than when part of a pseudoword sequence (Figure 2a,b). The advantage for real words is consistent across differences in character-frequency pattern. The magnitude of the advantage, around 200 msec, is very large.

The results also reveal something about the basis of word superiority. Let us compare the way subjects deal with sequences that are real words and sequences that are not: the pseudocharacter sequences and the pseudowords. Common to all three sequence types is the effect of character frequency (Figures 1, 2a,b). We can conclude from this simply that word superiority is not magical: Word recognition in Chinese, whatever its other properties, is mediated by character identification. The recognition of a word is evidently facilitated by previous encounters with its characters in other words. On the other hand, the fact that we found a word-frequency effect (Figure 2a,b), independent of the character-frequency effect, for a task ostensibly requiring only character recognition, suggests that word recognition cannot be simply a matter of recognizing one character at a time, then deciding that a character string is a word.

This proposal is supported by the fact that evidence of serial processing is found for pseudocharacter sequences and pseudowords, but not for words. In the case of the pseudocharacter sequences, it was found that sequences beginning with a pseudocharacter were rejected faster than those beginning with a real character, and that real-character frequency affected the latter but not the former (Figure 1). The obvious interpretation is that if the first character was genuine, the subject evaluated it, the amount of time required depending on the frequency of the character. Then he had to evaluate the pseudocharacter, which required more time, before he could reject the sequence. On the other hand, if the first character was a pseudocharacter, he could reject the sequence as soon as he could evaluate this character; there was no need to consider the second character at all. What is of interest here is simply that the subject is processing the characters serially.

As for the pseudowords, there was an effect of character-frequency order (Figure 3): The effect of character frequency was greater for the initial character than for the final character. Thus, a low-frequency character in initial position inhibited the response more than a high-frequency character in final position facilitated it; conversely, a high-frequency character in initial position facilitated the response more than a low-frequency character in final position inhibited it. We are not able to offer a conclusive explanation for this phenomenon without further experimentation, but it is plausible that when there was a low-frequency character in final position, the subject was apt to delay his evaluation, whereas he was less apt to do so when there was a low-frequency character in initial position because he had to hurry on to evaluate the final character. What is clear, however, is that the phenomenon is an order effect. It suggests that the characters in pseudowords, like those in pseudocharacter sequences, are being processed serially.

This is not the case for real words. There is no effect of character-frequency order (Figure 3). Given the order effects observed for the other two sequence types, this finding has two implications. It suggests first that the characters in real words are processed in parallel. Assume that the "logogens" (Morton, 1969) for Chinese are strings of characters corresponding to words. When a string matching a particular logogen appears in text, all of its constituent characters will be activated at the same or nearly the same time, just like the letters of an English word (Sperling, 1969). This means that in the case of a real word, a subject in our experiment would have had only one decision to make instead of two. Having recognized the word, he would have known immediately that both characters must be genuine. Much of the advantage of real words over pseudowords may be due to this fact.

The second implication is that if logogens for two or more character strings of unequal length are activated, the logogen for the longest string is preferred. For the present experiment, this means that a subject was not free to choose whether to treat a sequence as two separate monomorphemic characters or as a single bimorphemic word. Rather, bimorphemic logogens automatically took precedence if activated. Without this "Longest String Principle," the subject would have been free to waste time by processing the real words serially, just as if they were pseudowords. This principle also explains why readers of Chinese can read rapidly despite the absence of explicit word boundary markers in the text.

Conclusion

Word superiority has been demonstrated for Chinese, and it has been argued that it depends in great part on the reader's ability to process the characters of a word in parallel. Of course, it may be that Liu (1988) is right to insist that experiments like this one and the first experiment in Cheng (1981) demonstrate merely "compound-word superiority"; the superiority of monomorphemic words remains in question. But even compound-word superiority is of great theoretical importance. Because there are no word boundaries in Chinese writing, our results are evidence that word superiority generally depends not on orthographic experience, but on linguistic experience.

REFERENCES

Cattell, J. M. (1886). The time taken up by cerebral operations. Mind, 11, 220-242.
Cheng, C.-M. (1981). Perception of Chinese characters. Acta Psychologica Taiwanica, 23, 137-153.
Cheng, C.-M., & Shih, S.-I. (1988). The nature of lexical access in Chinese: Evidence from experiments on visual and phonological priming in lexical judgment. In I.-M. Liu, H.-C. Chen, & M. J. Chen (Eds.), Cognitive aspects of the Chinese language, Volume 1 (pp. 1-13). Hong Kong: Asian Research Service.
Henderson, L. (1982). Orthography and word recognition in reading. London: Academic Press.
Hoosain, R. (1991). Psycholinguistic implications for linguistic relativity: A case study of Chinese. Hillsdale, NJ: Lawrence Erlbaum.
Liu, I.-M. (1988). Context effects on word/character naming: Alphabetic versus logographic languages. In I.-M. Liu, H.-C. Chen, & M. J. Chen (Eds.), Cognitive aspects of the Chinese language, Volume 1 (pp. 81-92). Hong Kong: Asian Research Service.
Meyer, D. E., Schvaneveldt, R. W., & Ruddy, M. (1974). Functions of graphemic and phonemic codes in visual word recognition. Memory & Cognition, 2, 309-321.
Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.
Reicher, G. M. (1969). Perceptual recognition as a function of the meaningfulness of the stimulus material. Journal of Experimental Psychology, 81, 275-280.
Sperling, G. (1969). Successive approximations to a model for short-term memory. In R. N. Haber (Ed.), Information processing approaches to (pp. 32-37). New York: Holt, Rinehart & Winston.
Wang, H., Chang, B., Li, Y., Lin, L., Lin, J., Sun, Y., Wang, Z., Yu, Y., Zhang, J., & Li, D. (1986). Frequency dictionary of contemporary Chinese. Beijing: Beijing Language Institute Press.
Wang, W. S.-Y. (1981). Language structure and optimal orthography. In O. L. Tzeng & H. Singer (Eds.), Perception of print: Reading research in experimental psychology (pp. 223-236). Hillsdale, NJ: Lawrence Erlbaum.

FOOTNOTES

*Presented at the Sixth International Symposium on Cognitive Aspects of the Chinese Language, Taipei, Taiwan, September 14, 1993.
†Also Department of Linguistics, University of Connecticut, Storrs.

Haskins Laboratories Status Report on Speech Research 1993, SR-113, 153-170

Prelexical and Postlexical Strategies in Reading: Evidence from a Deep and a Shallow Orthography*

Ram Frost†

The validity of the orthographic depth hypothesis (ODH) was examined in Hebrew by employing pointed (shallow) and unpointed (deep) print. Experiments 1 and 2 revealed larger frequency effects and larger semantic priming effects in naming with unpointed print than with pointed print. In Experiments 3 and 4, subjects were presented with Hebrew consonantal strings that were followed by vowel marks appearing at stimulus onset asynchronies ranging from 0 ms (simultaneous presentation) to 300 ms from the onset of consonant presentation. Subjects were inclined to wait for the vowel marks to appear even though the words could be named unequivocally using lexical phonology. These results suggest that prelexical phonology is the default strategy for readers in shallow orthographies, providing strong support for the ODH.

Most early studies of visual word recognition way (Mattingly, 1992; Scheerer, 1986). The match were carried out in the English language. This between writing system and language insures a state of affairs was partly due to an underlying degree of efficiency for the reading and the writing belief that reading processes (as well as other process (for a discussion, see Katz & Frost, 1992). cognitive processes) are universal, and therefore Theories of visual recognition differ in their studies in English are sufficient to provide a com- account of how the different characteristics of plete account of the processes involved in recogniz- writing systems affect reading performance. One ing printed words. In the last decade, however, such characteristic that has been widely studies in orthographies other than English have investigatedisthe way the orthography become more and more prevalent. These studies represents the language's surface phonology. have in common the view that reading processes The transparency of the relation between cannot be explained without considering the read- spelling and phonology varies widely between or- er's linguistic environment in general, and the thographies. This variance can often be attributed characteristics of his writing system in particular. to morphological factors. In some languages (e.g., Various writing system have evolved over time in English), morphological variations are captured in different cultures. These writing systems, by phonologic variations. The orthography, how- whether logographic, syllabic, or alphabetic, ever, was designed to preserve primarily typically reflect the language's unique phonology morphologic information. 
Consequently, in many and morphology, representing them in an effective cases,similar spellings denote the same morpheme but different phonologic forms: The same letter can represent different phonemes This work was supported in part by a grant awarded to the author by the Basic Research Foundation administered by the when it is in different contexts, and the same Israel Academy of Science and Humanities, and in part by the phoneme can be represented by different letters. National Institute of Child Health and Human Development The words "steal" and "stealth," for example, are Grant HD-01994 to Haskins Laboratories. I am indebted to similarly spelled because they are morphologically Len Katz for many of the ideas that were put forward in this related. Since in this case, however, a morphologic article, to Orna Moshel and Itchak Mendelbaum for their help in conducting the experiments, and to , Bruno Repp, derivation resulted in a phonologic variation, the Charles Perfetti, Albrecht Inhoft and Sandy Pollatsek for their cluster "ea" represents both the sounds [i] and [El. comments on earlier drafts of this paper. Thus, alphabetic orthographies can be classified

153 158 154 Frost according to the transparency of their letter to relativelycompleteconnectionsbetween phonology correspondence. This factor is usually graphemes and sub-word pronunciation, they can referred to as "orthographic depth" (Katz & recover most of a word's phonological structure Feldman,1981;Klima,1972; Liberman, prelexically, by assembling it directly from the Liberman, Mattingly, & Shankweiler, 1980; printed letters. In contrast, the opaque relation of Lukatela, Popadid, Ognjenovié, & Turvey, 1980). subwords segments and phonemes in deep An orthography that represents its phonology orthographies prevents readers from using unequivocally following grapheme-phoneme prelexical conversion rules. For these readers, the simple correspondences is considered shallow, more efficient process of generating the word's while in a deep orthography the relation of phonologic structure is to rely on a fast visual orthography to phonology is more opaque. access of the lexicon and to retrieve the word's The effect of orthographic depth on reading phonology from it. Thus, phonology in this case is strategies has been the focus of recent and current lexically addressed, not prelexically assembled. controversies (e.g., Besner & Smith, 1992; Frost, Thus, according to the ODH, the difference Katz, & Bentin, 1987). In general the argument between deep and shallow orthographies is the revolves around the question of whether amount of lexical involvement in pronunciation. differencesin orthographic depth lead to This does not necessarilyentailspecific differences in processing printed words. What is predictions concerning lexical decisions and lexical called the orthographic depth hypothesis (ODH) access, or how meaning is accessed from print. suggests that it does. 
The ODH suggests that The specific predictions of the ODH must be shallow orthographies can easily support a word discussed with referenceto the toolsof recognition process that involves the printed investigation employed by the experimenter. The word's phonology. This is because the phonologic issue to be clarified here is what serves as a valid structure of the printed word can be easily demonstration that phonology is mainly prelexical recovered from the print by applying a simple (assembled) or postlexical (addressed). In general, process of grapheme-to-phoneme conversion measures of latencies and error rates for lexical (GPC). For example, in the Serbo-Croatian writing decisions or naming are monitored. The idea is system the letter-to-phoneme correspondence is that lexical involvement in pronunciation leaves consistent and the spoken language itself is not characteristic traces. The first question to be phonologically complex. The correspondence examined is, therefore, whether the lexical status between spelling and pronunciation is so simple of a word affects naming latencies. Lexical search and direct that a reader of this orthography can results in frequency effects and in lexicality expect that phonological recoding will always effects: Frequent words are named faster than result in an accurate representation of the word nonfrequent words, and words are named faster intended by the writer. A considerable amount of than nonwords (but see Balota & Chumbley, 1984, evidence now supports the claim that phonological for a discussion of this point). Thus, one trace of recoding is extensively used by readers of Serbo- postlexical phonology is that naming latencies and Croatian (see, Carello, Turvey, & Lukatela, 1992, lexical decision latencies are similarly effected by for a review). In contrast to shallow orthographies, the lexical status of the printed stimulus (e.g., deep orthographies like English or Hebrew Katz & Feldman, 1983). 
If phonology is assembled encourage readers to process printed words by from print prelexically, smaller effects related to referring to their morphology via the printed the word's lexical status should be expected, and word's visual-orthographic structure. In deep consequently lexical status should affect naming orthographies lexical access is based mainly on and lexical decisions differently. A second method the word's orthographic structure, and the word's of investigation involves the monitoring of phonology is retrieved from the mental lexicon. semantic priming effects in naming (see Lupker, This is because the relation between the printed 1984; Neely, 1991, for a review). If pronunciation word and its phonology are more opaque and involves postlexical phonology, strong semantic prelexical phonologic information cannot be easily priming effects will be revealed in naming. In generated (e.g., Frost et al. 1987; Frost & Bentin, contrast, if pronunciation is carried out using 1992b; Katz & Feldman, 1983). mainly prelexical phonology, naming of target The ODH's specific predictions mainly refer to words would be less facilitated by semantically the way a printed word's phonology is generated related primes. in the reading process. Because readers of shallow Keeping the above tools of investigation in mind, orthographies have simple, consistent, and the exact predictions of the ODH can be now fully

Prelexical and Postlexical Strategies in Reading: Evidence from a Deep and a Shallow Orthography

described. Two versions of the hypothesis exist in the current literature. What can be called the strong ODH claims that in shallow orthographies, complete phonological representations can be derived exclusively from the prelexical translation of subword spelling units (letters or letter clusters) into phonological units (phonemes, phoneme clusters, and syllables). According to this view, readers of shallow orthographies perform a phonological analysis of the word based only on a knowledge of these correspondences. Rapid naming, then, is a result of this analytic process only, and does not involve any lexical information (see Katz & Frost, 1992, for a discussion). In contrast to the strong ODH, a weaker version of the ODH can be proposed. According to the weak version of the ODH, the phonology needed for the pronunciation of printed words comes both from prelexical letter-phonology correspondences and from stored lexical phonology. The latter is the result either of an orthographic addressing of the lexicon (i.e., from a whole-word or whole-morpheme spelling pattern to its stored phonology), or of a partial phonologic representation that was assembled from the print and was unequivocal enough to allow lexical access. The degree to which the prelexical process is active is a function of the depth of the orthography; prelexical analytic processes are more functional in shallow orthographies. Whether or not prelexical processes actually dominate orthographic processing for any particular orthography is a question of the demands the two processes make on the reader's processing resources (Katz & Frost, 1992).

It is easy to show that the strong form of the ODH is untenable. It is patently insufficient to account for pronunciation even in shallow orthographies like Spanish, Italian, or Serbo-Croatian. This is so because these orthographies do not represent syllable stress and, even though stress is often predictable, this is not always the case. For example, in Serbo-Croatian, stress for two-syllable words always occurs on the first syllable, but not always for words of more than two syllables. These words can be pronounced correctly only by reference to lexically stored information. The issue of stress assignment is even more problematic in Italian, where stress patterns are much less predictable. In Italian, many pairs of words differ only in stress, which provides the intended semantic meaning (Colombo & Tabossi, 1992; Laudanna & Caramazza, 1992).

Several studies have argued for the obligatory involvement of prelexical phonology in Serbo-Croatian (e.g., Turvey, Feldman, & Lukatela, 1984) or in English (Perfetti, Bell, & Delaney, 1988; Perfetti, Zhang, & Berent, 1992; Van Orden, 1987). Note that the weaker version of the ODH is not inconsistent with these claims as long as it is not claimed that naming is achieved exclusively by prelexical analysis. All alphabetic orthographies may make some use of prelexically derived phonology for word recognition. According to the weak form of the ODH, shallow orthographies should make more use of it than deep orthographies, because prelexical phonology is more readily available in these orthographies. Substantial prelexical phonology may be generated inevitably when reading Serbo-Croatian, for example. This phonological representation may be sufficient for lexical access, but not necessarily for pronunciation. The complete analysis of the word's phonologic and phonetic structure may involve lexical information as well (Carello et al., 1992).

Evidence concerning the validity of the ODH comes from within- and cross-language studies. But note that single-language experiments are adequate only for testing the strongest form of the ODH. The strong ODH requires lexical access and pronunciation to be accomplished entirely on the basis of phonological information derived from correspondences between subword spelling and phonology. That is, the reader's phonological analysis of the printed word, based on his or her knowledge of subword letter-sound relationships, is the only kind of information that is allowed. Thus, merely showing lexical involvement in pronunciation in a shallow orthography would provide a valid falsification of the strong ODH (Seidenberg & Vidanović, 1985; Sebastián-Gallés, 1991). However, the demonstration of lexical effects in shallow orthographies cannot in itself be considered evidence against the weak ODH, and a different methodology is necessary to test it.

Because it refers to the relative use of prelexical phonologic information in word recognition in different orthographies, experimental evidence bearing on the weak ODH should come mainly from cross-language studies. More important, because the weak ODH claims that readers in shallow orthographies do not use the assembled routine exclusively, the evidence against it cannot come from single-language studies that merely show use of the addressed routine in shallow orthographies (e.g., Sebastián-Gallés, 1991).

The experimental evidence supporting the weak ODH is abundant. Katz and Feldman (1983) compared semantic priming effects in naming and lexical decision in English and Serbo-Croatian, and demonstrated that while semantic facilitation

was obtained in English for both lexical decision and naming, in Serbo-Croatian the semantic priming facilitated only lexical decision. Similarly, a comparison of semantic priming effects in naming in English and Italian showed greater effects in the deeper English than in the shallower Italian orthography (Tabossi & Laghi, 1992). A study by Frost et al. (1987) involved a simultaneous comparison of three languages, Hebrew, English, and Serbo-Croatian, and confirmed the hypothesis that the use of prelexical phonology in naming varies as a function of orthographic depth. Frost et al. showed that the lexical status of the stimulus (its being a high- or a low-frequency word or a nonword) affected naming latencies in Hebrew more than in English, and in English more than in Serbo-Croatian. In a second experiment, Frost et al. showed a relatively strong effect of semantic facilitation in Hebrew (21 ms), a smaller but significant effect in English (16 ms), and no facilitation (0 ms) in Serbo-Croatian.

Frost and Katz (1989) examined the effects of visual and auditory degradation on the ability of subjects to match printed to spoken words in English and Serbo-Croatian. They showed that both visual and auditory degradation had a much stronger effect in English than in Serbo-Croatian, regardless of word frequency. These results were explained by an extension of an interactive model which rationalized the relationship between the orthographic and phonologic systems in terms of lateral connections between the systems at all of their levels. The structure of these lateral connections was determined by the relationship between spelling and phonology in the language: simple isomorphic connections between graphemes and phonemes in the shallower Serbo-Croatian, but more complex, many-to-one, connections in the deeper English. Frost and Katz argued that the simple isomorphic connections between the orthographic and the phonologic systems in the shallower orthography enabled subjects to restore with ease both the degraded phonemes from the print and the degraded graphemes from the phonemic information. In contrast, in the deeper orthography, because the degraded information in one system was usually consistent with several alternatives in the other system, the buildup of sufficient information for a unique solution to the matching judgment was delayed, so the matching between print and degraded speech, or between speech and degraded print, was slowed.

The psychological reality of orthographic depth is not unanimously accepted. Although it is generally agreed that the relation between spelling and phonology in different orthographies might affect reading processes (especially reading acquisition) to a certain extent, there is disagreement as to the relative importance of this factor. What I will call here "the alternative view" argues that the primary factor determining whether the word's phonology is assembled pre- or post-lexically is not orthographic depth but word frequency. The alternative view suggests that in any orthography, frequent words are very familiar as visual patterns. Therefore, these words can easily be recognized through a fast visually based lexical access which occurs before a phonologic representation has time to be generated prelexically from the print. For these words, phonologic information is eventually obtained, but only postlexically, from memory storage. According to this view, the relation of spelling to phonology should not affect recognition of frequent words. Since the orthographic structure is not converted into a phonologic structure by use of grapheme-to-phoneme conversion rules, the depth of the orthography does not play a role in the processing of these words. Orthographic depth exerts some influence, but only on the processing of low-frequency words and nonwords. Since such verbal stimuli are less familiar, their visual lexical access is slower, and their phonology has enough time to be generated prelexically (Baluch & Besner, 1991; Tabossi & Laghi, 1992; Seidenberg, 1985).

A few studies involving cross-language research support the alternative view. Seidenberg (1985) demonstrated that in both English and Chinese, naming frequent printed words was not affected by phonologic regularity. This outcome was interpreted to mean that, in logographic as in alphabetic orthographies, the phonology of frequent words was derived postlexically, after the word had been recognized on a visual basis. Moreover, in another study, Seidenberg and Vidanović (1985) found similar semantic priming effects in naming frequent words in English and Serbo-Croatian, suggesting again that the addressed routine plays a major role even in the shallow Serbo-Croatian.

Several studies in Japanese were interpreted as providing support for the alternative view. Although these studies were carried out within one language, they could in principle furnish evidence for or against the weak ODH because they examined reading performance in two different writing systems that are commonly used in Japanese: the deep logographic Kanji and the shallower syllabic Hiragana and Katakana. However, the relevance of these studies to the debate concerning the weak ODH is questionable, as

I will point out below. Besner and Hildebrandt (1987) showed that words that were normally written in Katakana were named faster than words written in Katakana that were transcribed from Kanji. They argued that these results suggest that readers of the shallow Katakana did not name the transcribed words using the assembled routine, as otherwise no difference should have emerged in naming the two types of stimuli. In another experiment, Besner, Patterson, Lee, and Hildebrandt (1992) showed that words that are normally seen in Katakana were named slower if they were written in Hiragana, and vice versa. This outcome suggests that orthographic familiarity plays a role even in reading the shallower syllabic Japanese orthographies. Taken together, the results from Japanese provide strong evidence against the strong ODH, but they do not contradict the weak ODH. Although these studies compare reading performance in two writing systems, they merely show that readers of the shallower Japanese syllabary do not use the assembled routine exclusively, but use the addressed routine as well. These conclusions, however, are actually the basic tenets of the weak ODH.

More damaging to the weak ODH is a recent study by Baluch and Besner (1991), who employed a within-language, between-orthography design in Persian, similar to the one in Japanese. In this study the authors took advantage of the fact that some words in Persian are phonologically transparent whereas other words are phonologically opaque. This is because three of the six vowels of written Persian are represented in print as diacritics and three are represented as letters. Because (as in Hebrew) fluent readers do not use the pointed script, words that contain vowels represented by letters are phonologically transparent, whereas words that contain vowels represented by diacritics are phonologically opaque. Baluch and Besner demonstrated similar semantic priming effects in naming phonologically transparent and phonologically opaque words of Persian, provided that nonwords were omitted from the stimulus list. These results were interpreted to suggest that the addressed routine was used in naming both types of words.

Thus, when the weak ODH and the traditional alternative view are contrasted, it appears that they differ in one major claim concerning the preferred route of the cognitive system for obtaining the printed word's phonology. The alternative view suggests that in general visual access of the lexicon is "direct" and requires fewer cognitive resources. Moreover, the extraction of phonological information from the mental lexicon is more or less effortless relative to the extraction of phonological information prelexically. Hence, the default of the cognitive system in reading is the use of addressed phonology. The weak ODH denies that the extraction of phonological information from the mental lexicon is effortless. On the contrary, based on findings showing extensive phonologic recoding in shallow orthographies, its working hypothesis is that if the reader can successfully employ prelexical phonological information, then it will be used first; the easier it is, the more often it will be used. Thus, the "default" of the cognitive system in word recognition is the use of prelexical rather than addressed phonology. If the reader's orthography is a shallow one with uncomplicated, direct, and consistent correspondences between letters and phonology, then the reader will be able to use such information with minimal resources for lexical access. The logic for the ODH lies in a simple argument. The so-called "fast," "effortless" visual route is available to readers in all orthographies, including the shallow ones. If in spite of that it can be demonstrated that readers of shallow orthographies prefer prelexical phonological mediation over visual access, it is because it is more efficient and faster for these readers.

The present study addresses this controversy by looking at a deeper orthography, namely Hebrew. In Hebrew, letters represent mostly consonants, while vowels can optionally be superimposed on the consonants as diacritical marks. The diacritical marks, however, are omitted from most reading material, and are found only in poetry, children's literature, and religious texts. In addition, like other Semitic languages, Hebrew is based on word families derived from triconsonantal roots. Therefore, many words (whether morphologically related or not) share a similar or identical letter configuration. If the vowel marks are absent, a single printed consonantal string usually represents several different spoken words. Thus, in its unpointed form, the Hebrew orthography does not convey to the reader the full phonemic structure of the printed word, and the reader is often faced with phonological ambiguity. In contrast to the unpointed orthography, the pointed orthography is a very shallow writing system. The vowel marks convey the missing phonemic information, making the printed word phonemically unequivocal. Although the diacritical marks carry mainly vowel information, they also differentiate in some instances between fricative and stop variants of consonants. Thus the presentation of vowels considerably reduces several aspects of phonemic ambiguity (see Frost & Bentin, 1992b, for a discussion).

Several studies have shown that lexical decisions to phonologically ambiguous Hebrew words are given prior to any phonological disambiguation, suggesting that lexical access in unpointed Hebrew is accomplished more often via orthographic representations than via phonological recodings of the orthography (Bentin & Frost, 1987; Bentin, Bargai, & Katz, 1984; Frost & Bentin, 1992a; Frost, 1992; and see Frost & Bentin, 1992b, for a review). The purpose of the present study was to show that even the reader of Hebrew, who is exposed almost exclusively to unpointed print and generally uses the lexical addressed routine, prefers to use the prelexical assembled routine when possible. Note that the probability of extending the cross-language findings (e.g., Frost et al., 1987) to a within-Hebrew experimental design suffers from the fact that readers in general are used to regularly employing strategies of reading that arise from the characteristics of their orthography. Obviously, if it can be shown that even in spite of this factor Hebrew readers are willing to alter their reading strategy and adopt a prelexical routine, it will provide very significant evidence in favor of the ODH.

EXPERIMENT 1

In Experiment 1, lexical decision and naming performance with pointed and unpointed print were compared. The aim of the experiment was to assess the effects of lexical factors on naming performance relative to lexical decision in the deep and shallow forms of the Hebrew orthography. This experimental design is very similar to that employed by Frost et al. (1987) in the first experiment of their multilingual study. Frost et al. found that in unpointed Hebrew the lexical status of the stimuli affected lexical decision latencies in the same way that it affected naming latencies. In contrast, in English and in Serbo-Croatian the pronunciation task was much less affected by the lexical status of the stimuli. The shallower the orthography, the more naming performance deviated from lexical decision performance. This outcome confirmed that, in unpointed Hebrew, phonology was generated mainly using the addressed routine, whereas in the shallower English orthography the assembled routine had a more significant role. Finally, in the shallowest orthography, Serbo-Croatian, the assembled routine dominated the addressed routine, resulting in a nonsignificant frequency effect. The purpose of Experiment 1 was to investigate whether a similar gradual deviation of naming from lexical decision performance would be observed in pointed relative to unpointed Hebrew. If the weak ODH is correct, then the reader of Hebrew should use the assembled routine to a greater extent when naming pointed print than when naming unpointed print.

Method

Subjects. One hundred and sixty undergraduate students from the Hebrew University, all native speakers of Hebrew, participated in the experiment for course credit or for payment.

Stimuli and Design. The stimuli were 40 high-frequency words, 40 low-frequency words, and 80 nonwords. All stimuli were three to five letters long, and contained two syllables with four to six phonemes. The average numbers of letters and phonemes were similar for the three types of stimuli. All words could be pronounced as only one meaningful word. Nonwords were created by randomly altering one or two letters of high- or low-frequency real words that were not employed in the experiment. The nonwords were all pronounceable and did not violate the phonotactic rules of Hebrew. In the absence of a reliable frequency count in Hebrew, the subjective frequency of each word was estimated using the following procedure: a list of 250 words was presented to 50 undergraduate students, who rated the frequency of each word on a 7-point scale from very infrequent (1) to very frequent (7). The rated frequencies were averaged across all 50 judges, and all words in the present study were selected from this pool. The average frequency of the high-frequency words was 4.0, whereas the average frequency of the low-frequency words was 2.0. Examples of pointed and unpointed Hebrew words are presented in Figure 1.

[Figure 1. Example of the unpointed and the pointed forms of a Hebrew printed word; phonologic structure /psanter/, semantic meaning "piano".]

There were four experimental conditions: the stimuli could be presented pointed or unpointed, for naming or for lexical decision. Forty different subjects were tested in each experimental

condition. This blocked design was identical to that of the Frost et al. (1987) study.

Procedure and apparatus. The stimuli were presented on a Macintosh SE computer screen, in a bold Hebrew font, size 24. Subjects were tested individually in a dimly lighted room. They sat 70 cm from the screen so that the stimuli subtended a horizontal visual angle of 4 degrees on the average. The subjects communicated lexical decisions by pressing a "yes" or a "no" key. The dominant hand was always used for the "yes" response. In the naming task, response latencies were monitored by a Mura DX-118 microphone connected to a voice key. Each experiment started with 16 practice trials, which were followed by the 160 experimental trials presented in one block. The trials were presented at a 2.5 sec intertrial interval.

Results

Means and standard deviations of RTs for correct responses were calculated for each subject in each of the four experimental conditions. Within each subject/condition combination, RTs that were outside a range of 2 SDs from the respective mean were excluded, and the mean was recalculated. Outliers accounted for less than 5% of all responses. This procedure was repeated in all the experiments of the present study.

Mean RTs and error rates for high-frequency words, low-frequency words, and nonwords in the different experimental conditions are presented in Table 1.¹ Point presentation had very little effect in the lexical decision task. In contrast, in the naming task the effects of frequency and lexical status of the stimulus were more pronounced in the unpointed presentation than in the pointed presentation. Moreover, naming was more similar to lexical decision performance in unpointed than in pointed print.

The statistical significance of these differences was assessed by an analysis of variance (ANOVA) across subjects (F1) and across stimuli (F2), with the main factors of stimulus type (high-frequency words, low-frequency words, nonwords), point presentation (pointed, unpointed), and task (naming, lexical decision). The main effect of stimulus type was significant (F1(2,312) = 267.0, MSe = 1550, p<0.001; F2(2,157) = 119.0, MSe = 4705, p<0.001). The main effects of point presentation and task were not significant in the subject analysis (F1(1,156) = 1.7, MSe = 21,506, p<0.2; F1(1,156) = 2.1, MSe = 21,506, p<0.2, respectively), but were significant in the stimulus analysis (F2(1,157) = 63.7, MSe = 3258, p<0.001; F2(1,157) = 23.1, MSe = 3258, p<0.001, respectively). Point presentation interacted with task (F1(1,156) = 4.3, MSe = 21,506, p<0.04; F2(1,157) = 97.9, MSe = 1877, p<0.001). Stimulus type interacted both with point presentation and with task (F1(2,312) = 3.0, MSe = 1550, p<0.05; F2(2,157) = 11.9, MSe = 832, p<0.001; F1(2,312) = 30.0, MSe = 1550, p<0.001; F2(2,157) = 17.4, MSe = 3258, p<0.001, respectively). The three-way interaction did not reach significance. This was probably due to a similar slowing of nonword latencies relative to the low-frequency words in the two prints. The difference between naming pointed and unpointed presentations was most conspicuous when examining the effect of word frequency. A Tukey-A post-hoc analysis (p<0.05) revealed that while there was a significant frequency effect in naming unpointed print (28 ms), there was no significant frequency effect in naming pointed print (9 ms). Finally, the correlation of RTs in the naming and the lexical decision tasks was calculated for both types of print presentation. RTs in the two tasks were highly correlated in the unpointed presentation (r=0.56), but much less so in the pointed presentation (r=0.28). This suggests that the lexical status of the stimuli affected lexical decision and naming similarly in unpointed print but not in pointed print.

Table 1. RTs (ms) and percent errors in the lexical decision and naming tasks for high-frequency words, low-frequency words, and nonwords in unpointed and pointed print.

                        UNPOINTED PRINT                     POINTED PRINT
                  High-freq.  Low-freq.  Nonwords    High-freq.  Low-freq.  Nonwords
LEXICAL DECISION     529         617        659         545         627        664
                     4%          7%         5%          5%          5%         5%
NAMING               569         597        664         541         550        604
                     0%          6%         2%          0%          2%         9%
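The subjective-frequency norming described in the Method section (250 words rated by 50 judges on a 1-7 scale, with ratings averaged across judges) can be sketched like this. The random ratings and the selection cutoffs are invented for illustration; only the averaging step follows the procedure reported above.

```python
import random
from statistics import mean

random.seed(0)

# Invented ratings: 50 judges x 250 words, each on the 1 (very infrequent)
# to 7 (very frequent) scale used in the norming study.
n_judges, n_words = 50, 250
ratings = [[random.randint(1, 7) for _ in range(n_words)] for _ in range(n_judges)]

# Average each word's rating across all 50 judges, as in the norming study.
mean_frequency = [mean(judge[w] for judge in ratings) for w in range(n_words)]

# Experimental items were then drawn from this pool; these cutoffs are
# illustrative, not the ones used to build the actual stimulus lists.
high_freq_candidates = [w for w in range(n_words) if mean_frequency[w] >= 4.5]
low_freq_candidates = [w for w in range(n_words) if mean_frequency[w] <= 3.5]
```

Averaging over many judges gives each word a stable subjective-frequency estimate even though individual ratings are coarse, which is why the study could substitute this norm for a missing corpus frequency count.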


Discussion

The results of Experiment 1 replicate to a certain extent the findings of Frost et al. (1987), with a within-language, between-orthography design. The pattern of naming and lexical decision latencies in unpointed Hebrew was almost identical to the pattern obtained in the Frost et al. study. However, the relation of lexical decision to naming latencies in pointed Hebrew was similar to that obtained by Frost et al. in the shallower orthographies of English and Serbo-Croatian. The patterns of response times in naming and lexical decision were fairly similar in unpointed print. This suggests a strong reliance on the addressed routine in naming when the vowel marks were not present. In contrast, naming deviated from the pattern obtained in the lexical decision task when the vowel marks were presented. The frequency effect almost disappeared, and the overall difference between naming high-frequency words and nonwords was considerably reduced. This outcome suggests that the presentation of vowel marks encouraged the readers to adopt a strategy of assembling the printed word's phonology by using prelexical conversion rules. Note, however, that although the overall difference between high-frequency words and nonwords was smaller in pointed print than in unpointed print, the difference between low-frequency words and nonwords was fairly similar in the two conditions (54 ms in pointed print vs. 67 ms in unpointed print). This outcome is not fully consistent with a prelexical computation of printed stimuli in the shallow pointed print. A possible explanation for the unexpectedly slow naming latencies for pointed nonwords is that although phonology could be easily computed prelexically in pointed print, subjects were also sensitive to the familiarity of the printed stimuli. Readers of Hebrew read mostly unpointed print but have large experience in reading pointed print as well. Because they are obviously more familiar with reading words than nonwords, the slower latencies for nonwords could be attributed to a response factor rather than a factor related to the computation of phonology.

EXPERIMENT 2

The aim of Experiment 2 was to examine the effects of semantic facilitation in the naming task when pointed and unpointed words are presented. Semantic priming effects in the naming task have been used in several studies to monitor the extent of lexical involvement in pronunciation (e.g., Baluch & Besner, 1991; Frost et al., 1987; Tabossi & Laghi, 1992). In general, it has been shown that in languages with deep orthographies such as English, semantic facilitation in naming can easily be obtained, although the effects are usually smaller than the effect obtained in the lexical decision task (Lupker, 1984; Neely, 1991). In contrast, in shallow orthographies such as Serbo-Croatian, semantic priming effects were not obtained in some studies (e.g., Katz & Feldman, 1983; Frost et al., 1987), but have been shown in other studies (e.g., Carello, Lukatela, & Turvey, 1988; Lukatela, Feldman, Turvey, Carello, & Katz, 1989).

An examination of the weak version of the ODH entails, however, a comparison of semantic priming effects in a deep and a shallow orthography. Katz and Feldman (1983) showed greater semantic facilitation in English than in Serbo-Croatian. Similarly, Frost et al. (1987) demonstrated a gradual decrease in semantic facilitation from Hebrew (deepest orthography) to English (shallower) to Serbo-Croatian (shallowest). Similar results were reported by Tabossi and Laghi (1992), who compared English to Italian. Recently, Baluch and Besner (1991) challenged these findings, suggesting that all those cross-language differences were obtained because nonwords were included in the stimulus lists. According to their proposal, the inclusion of nonwords encouraged the use of a prelexical naming strategy in the shallower orthographies. Indeed, when nonwords were not included in the stimulus lists, no differences in semantic facilitation were found in naming phonologically opaque and phonologically transparent words in Persian.

Experiment 2 was designed to examine the weak version of the ODH using a semantic priming paradigm, while considering the hypothesis that the effects of orthographic depth are caused by the mere inclusion of nonwords in the stimulus lists. For this purpose, semantic facilitation in naming target words was examined in pointed and unpointed Hebrew orthography, when only words were employed.

Method

Subjects. Ninety-six undergraduate students from the Hebrew University, all native speakers of Hebrew, participated in the experiment for course credit or for payment.

Stimuli and design. The stimuli were 48 target words that were paired with semantically related and semantically unrelated primes. Related targets and primes were two instances of a semantic category. In order to avoid repetition

effects, two lists of stimuli were constructed. Targets presented with semantically related primes in one list were unrelated in the other list, and vice versa. Each subject was tested on one list only. There were two experimental conditions: in one condition all stimuli were pointed, and in the other they were unpointed. Forty-eight subjects were tested in each condition, 24 on each list. The targets were three- to five-letter words and had two or three syllables. Both primes and targets were unambiguous, and each could be read as a meaningful word in only one way.

Procedure and apparatus. An experimental session consisted of 16 practice trials followed by the 48 test trials. Each trial consisted of a presentation of the prime for 750 ms followed by the presentation of the target. Subjects were instructed to read the primes silently and to name the targets aloud. The exposure of the target was terminated by the subject's vocal response, and the intertrial interval was 2 seconds. The apparatus was identical to the one used in the naming task of Experiment 1.

Results

RTs in the different experimental conditions are presented in Table 2. Naming of related targets was faster than naming of unrelated targets in both pointed and unpointed print. However, semantic facilitation was twice as large with unpointed print as with pointed print.

Table 2. Reaction times (ms) and percent errors to related and unrelated targets with pointed and unpointed print.

                  Unpointed Print    Pointed Print
Unrelated            531 (1%)          512 (2%)
Related              509 (4%)          503 (1%)
Priming Effect        22                 9

The statistical significance of these effects was assessed by an ANOVA across subjects (F1) and across stimuli (F2), with the main factors of semantic relatedness (related, unrelated) and print type (pointed, unpointed). The effect of semantic relatedness was significant (F1(1,47) = 20.9, MSe = 656, p<0.001; F2(1,94) = 33.2, MSe = 363, p<0.001). The effect of print type was significant in the subject analysis (F1(1,47) = 30.8, MSe = 232, p<0.001), but not in the stimulus analysis (F2(1,94) = 1.0). More important to our hypothesis, the two-way interaction was significant in both analyses (F1(1,47) = 6.6, MSe = 207, p<0.01; F2(1,94) = 4.7, MSe = 363, p<0.03). A Tukey-A post-hoc analysis of the interaction (p<0.05) revealed that the semantic facilitation was significant only with unpointed print, not with pointed print.

Discussion

Similar to Experiment 1, naming in pointed print was found to be faster than naming in unpointed print, as revealed by the difference in RTs in the unrelated condition. However, whereas semantic relatedness accelerated naming in the related condition in unpointed print, it had a much smaller effect on naming latencies in pointed print. Thus the results of Experiment 2 suggest that semantic facilitation is stronger in the deeper than in the shallower Hebrew orthography. This outcome is in complete agreement with the findings of Frost et al. (1987) and Tabossi and Laghi (1992). Note, however, that the greater effects of semantic facilitation in unpointed print than in pointed print were obtained even though nonwords were not included in the stimulus set. These results conflict with the findings of Baluch and Besner (1991). We will refer to this in the general discussion.

EXPERIMENT 3

The major claim of the weak ODH is that in all orthographies both prelexical and lexical phonology are involved in naming. Thus, the use of lexical or prelexical phonology is not an all-or-none process but a quantitative continuum. The degree to which a prelexical process of assembling phonology predominates over the lexical routine depends on the costs involved in assembling phonology directly from the print. The ODH suggests that unless the costs are too high in terms of processing resources (as is usually the case in deep orthographies), the default strategy of the cognitive system is to assemble a phonologic code for lexical access, not to retrieve it from the lexicon following visual access. The following two experiments examined this aspect of the hypothesis by manipulating the processing costs for generating a prelexical phonological representation.

The manipulation of cost consisted of delaying the presentation of the vowel marks relative to the presentation of the consonant letters. In Experiment 3, subjects were presented with

unambiguous letter strings that could be read as one meaningful word only (i.e., only one vowel combination created a meaningful word). The letters were followed by the vowel marks, which were superimposed on the consonants, but at different time intervals ranging from 0 ms (in fact, a regular pointed presentation) to 300 ms from the onset of consonant presentation. Subjects were instructed to make lexical decisions or to name the words as soon as possible.

The vowel marks allow the easy prelexical assembly of the word's phonology using spelling-to-phonology conversion rules. However, since the letter strings were unambiguous and could be read as a meaningful word in only one way, they could be easily named using the addressed routine, by accessing the lexicon visually. In fact, as we have shown in numerous studies, this naming strategy is characteristic of reading unpointed Hebrew (see Frost & Bentin, 1992b, for a review). The question was, therefore, whether subjects are inclined to delay their response and wait for the vowel marks that are not indispensable for either correct lexical decisions or for unequivocal pronunciation. If they are willing to wait for the vowel marks to appear, then it must be because they prefer the option of prelexical assembly of phonology, just as the ODH would predict. The relative use of the assembled and addressed routines was also verified by using both high-frequency and low-frequency words in the stimulus lists. It was assumed that the more subjects rely on the vowel marks for naming using GPC conversion rules, the smaller would be the frequency effect in this task. Nonwords were introduced as well to allow a baseline assessment of the lagging costs. Note that nonwords cannot be unequivocally pronounced before the vowel marks are presented. In order to pronounce them correctly, subjects have to wait the full lag period. However, since at least some of the articulation program can be launched immediately after the consonants are presented, the absolute difference in RTs between the presentation of nonwords at lag 0 and at lag 300 would reflect the actual cost in response time of lagging the vowel marks by 300 ms. Any lag effect smaller than this difference would suggest that subjects did not wait the full lag period but combined both prelexical and lexical routines to formulate their response.

Method

Subjects. Ninety-six undergraduate students from the Hebrew University, all native speakers of Hebrew, participated in the experiments for course credit or for payment. Forty-eight participated in the lexical decision task and 48 in the naming task.

Stimuli and Design. The stimuli were 40 high-frequency words (mean frequency 5.0 on the previously described 1-7 scale), 40 low-frequency words (mean frequency 3.6), and 80 nonwords. All words were unambiguous; their pronunciation was unequivocal, that is, they could be read as a meaningful word in only one way. Nonwords were created by altering one letter of a real word, and could not be read as a meaningful word with any vowel configuration. All stimuli were three to five letters long, and contained two syllables with four to six phonemes. The average number of letters and phonemes was similar for the three types of stimuli.

The words were presented to the subjects for lexical decision or naming. Forty-eight different subjects were tested in each task with the same stimuli. The stimuli were presented in four lag conditions: 0, 100, 200, and 300 ms. Each lag was defined by the SOA (stimulus onset asynchrony) between the presentation of the consonants and the vowel marks. Thus, at lag 0 the consonants and the vowel marks were presented simultaneously, at lag 100 the consonants were presented first and the vowel marks were superimposed 100 ms later, etc. Four lists of words were formed; each list contained 160 stimuli that were composed of 10 high-frequency words, 10 low-frequency words, and 20 nonwords in each of the four lag conditions. The stimuli were rotated across lists by a Latin Square design so that words that appeared in one lag in one list appeared in another lag in another list, etc. The purpose of this rotation was to test each subject in all lagging conditions while avoiding repetitions within a list. The subjects were randomly assigned to each list and to each experimental condition (lexical decision or naming).

Procedure and apparatus. The procedure and apparatus were identical to those used in the previous experiments. The only difference was that the letters and the vowel marks appeared with the different SOAs. The clock for measuring response times was initiated on each trial with the presentation of the letters, regardless of vowel marks. The subjects were informed of the SOA manipulation but were requested to communicate their lexical decisions or vocal responses as soon as possible, that is, without necessarily waiting for the vowel marks to appear. Each session started with 16 practice trials. The 160 test trials were presented in one block.
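The rotation described in the Stimuli and Design section can be sketched in code. This is only an illustrative reconstruction of the Latin Square counterbalancing, not software from the study; the function and variable names are hypothetical.

```python
# Illustrative reconstruction of the Latin Square rotation: every stimulus
# appears at a different lag (SOA) in each of the four lists, so each list
# tests all lags and, across lists, each item is tested at all lags.

LAGS = [0, 100, 200, 300]  # SOA in ms between consonant letters and vowel marks

def build_lists(stimuli, n_lists=4):
    """Return n_lists trial lists of (item, lag) pairs, rotating the
    lag assigned to each group of stimuli from one list to the next."""
    assert len(stimuli) % n_lists == 0
    group_size = len(stimuli) // n_lists
    groups = [stimuli[i * group_size:(i + 1) * group_size]
              for i in range(n_lists)]
    lists = []
    for l in range(n_lists):
        trial_list = []
        for g, group in enumerate(groups):
            lag = LAGS[(g + l) % n_lists]  # rotated lag for this group
            trial_list.extend((item, lag) for item in group)
        lists.append(trial_list)
    return lists

# e.g., 40 high-frequency words yield 4 lists with 10 items at each lag:
hf_lists = build_lists([f"hf_word{i}" for i in range(40)])
```

Each subject would then be assigned to one list, seeing every item once while still being tested at all four lags.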

Prelexical and Postlexical Strategies in Reading: Evidence from a Deep and a Shallow Orthography

Results

Mean RTs for high-frequency words, low-frequency words, and nonwords in the different lag conditions for both the lexical decision and the naming tasks are presented in Table 3. Although the response pattern at each of the different lags is important, for the purpose of simplicity the "lag effect" specified in Table 3 reflects the difference between the simultaneous presentation (lag 0) and the longest SOA (lag 300).

The lagging of the vowel information had relatively little effect in the lexical decision task. The presentation of vowel marks 300 ms after the letters delayed subjects' responses by only 25 ms for high-frequency words, and 37 ms for low-frequency words. The lagging of vowels had a somewhat greater effect on the nonwords. Across all lags, the frequency effect was stable, about 85 ms on the average. In contrast to lexical decisions, the effect of lagging the vowel information had a much greater influence on naming. However, the frequency effect was very small at lags 0 and 100.

The statistical significance of these differences was assessed in a three-way ANOVA with the factors of task (lexical decision, naming), stimulus type (high-frequency, low-frequency, nonwords), and lag (0, 100, 200, 300), across subjects and across stimuli. The main effect of task was significant (F1(1,94) = 4.2, MSe = 70,811, p<0.04; F2(1,157) = 35, MSe = 8320, p<0.001), as was the main effect of stimulus type (F1(2,188) = 252, MSe = 4914, p<0.001; F2(2,157) = 106, MSe = 13,354, p<0.001), and the main effect of lag (F1(3,282) = 98, MSe = 2495, p<0.001; F2(3,471) = 94, MSe = 2587, p<0.001). Task interacted with stimulus type (F1(2,188) = 23.1, MSe = 4914, p<0.001; F2(2,157) = 13, MSe = 8320, p<0.001) and with lag (F1(3,282) = 13.4, MSe = 2495, p<0.001; F2(3,471) = 22, MSe = 1387, p<0.001). Lag interacted with stimulus type (F1(6,564) = 9.3, MSe = 2076, p<0.001; F2(6,471) = 9, MSe = 2587, p<0.001). The three-way interaction did not reach significance in the subject analysis (F1 = 1.3) but was marginally significant in the stimulus analysis (F2(6,471) = 2.1, MSe = 1387; p<0.05). A Tukey-A post-hoc test revealed that the frequency effect in naming was significant only at the longer lags (200, 300 ms SOA), and not at the shorter lags (0, 100 ms SOA), whereas in lexical decision the frequency effect was significant at all lags (p<0.05).

Discussion

The results of Experiment 3 suggest that the effective cost of delaying vowel marks in lexical decisions in Hebrew is relatively low. A lag of 300 ms in vowel mark presentation resulted in a 25 ms difference in response time for high-frequency words and 37 ms for low-frequency words. This outcome suggests that subjects were not inclined to wait for the vowel marks to formulate their lexical decisions; rather, their responses were based on the recognition of the letter cluster. This interpretation converges with previous studies that showed that lexical decisions in Hebrew are not based on a detailed phonological analysis of the printed word, but rely on a fast judgment of the printed word's orthographic familiarity based on visual access to the lexicon (Bentin & Frost, 1987; Frost & Bentin, 1992b). The slightly higher cost of lagging the vowel marks with low-frequency words, and the even higher one for nonwords, suggests that stimulus familiarity played a role in the decision strategy. This suggests that for words that were less familiar, or for nonwords, subjects were more conservative in their decisions and were inclined to wait longer for the vowel marks, for the possible purpose of obtaining a prelexical phonological code as well. In addition to the lag effect, a strong frequency effect was obtained at all lags. This provides further confirmation of lexical involvement in the task, regardless of the vowel mark presentation.

A very different pattern of results emerged in the naming task. The effective cost of delaying the vowel marks in naming was much higher. The lag effect obtained in naming words was about 70 ms on the average, twice as large as the effect found for lexical decisions. Thus, although the phonologic structure of the unambiguous words could be unequivocally retrieved from the lexicon following visual access (addressed phonology), subjects were more inclined to wait for the vowels to appear in the naming task than in the lexical decision task, presumably in order to generate a prelexical phonologic code. The lag effect for words suggests that subjects combined both prelexical and lexical routines to name words, but the relative use of the prelexical and lexical routines varied as a function of the delay in vowel mark presentation. This lag effect should first be compared to the effect obtained for naming nonwords. Because they cannot be named correctly without the vowel marks, the lag effect for nonwords reflects the overall cost in response time due to a 300 ms delay of vowel mark presentation. This cost is smaller than the lag itself because the articulation program can be initiated as the letters appear, previous to the vowel mark presentation. The results suggest that the cost of lagging the vowel marks by 300 ms is about 150 ms in response time.
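The F1/F2 convention used in these analyses means that the same trial-level data are aggregated two ways before the ANOVAs are run: means per subject per condition (F1) and means per stimulus per condition (F2). A minimal, purely illustrative sketch of that aggregation step (the data and names are hypothetical, not from the study):

```python
# Hypothetical sketch of the F1/F2 aggregation step preceding the ANOVAs:
# average trial RTs per subject x condition (F1) or per item x condition (F2).
from collections import defaultdict

def aggregate(trials, by):
    """trials: dicts with 'subject', 'item', 'condition', and 'rt' keys.
    Returns the mean RT for each (unit, condition) cell, where the unit of
    analysis is 'subject' (F1 cells) or 'item' (F2 cells)."""
    sums = defaultdict(lambda: [0.0, 0])
    for t in trials:
        cell = sums[(t[by], t["condition"])]
        cell[0] += t["rt"]
        cell[1] += 1
    return {k: total / n for k, (total, n) in sums.items()}

trials = [
    {"subject": "s1", "item": "w1", "condition": "lag0", "rt": 550},
    {"subject": "s1", "item": "w2", "condition": "lag0", "rt": 560},
    {"subject": "s2", "item": "w1", "condition": "lag0", "rt": 590},
]
f1_cells = aggregate(trials, "subject")  # {('s1','lag0'): 555.0, ('s2','lag0'): 590.0}
f2_cells = aggregate(trials, "item")     # {('w1','lag0'): 570.0, ('w2','lag0'): 560.0}
```

Treating subjects and items as separate random factors in this way is what licenses generalizing the effects beyond the particular subjects and stimuli tested.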


Table 3. Lexical decision and naming RTs and percent errors for high-frequency words (HF), low-frequency words (LF), and nonwords (NW) when vowel marks appear at different lags after letter presentation. Words are unambiguous.

                  Lexical Decision                      Naming
Lag               0      100    200    300    Effect    0      100    200    300    Effect
HF                546    551    560    571    25        587    608    610    648    62
                  (6%)   (6%)   (7%)   (6%)             (4%)   (3%)   (4%)   (3%)
LF                626    642    644    663    37        598    624    648    678    80
                  (12%)  (10%)  (11%)  (10%)            (5%)   (3%)   (4%)   (5%)
Frequency effect  80     91     84     92               11     16     38     30
NW                641    664    684    710    69        655    700    736    799    144
                  (9%)   (8%)   (8%)   (9%)             (5%)   (7%)   (5%)   (7%)
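The "lag effect" and frequency effect entries in Table 3 are simple differences between cell means; the naming values transcribed from the table make the arithmetic explicit (the code is only an illustrative check, not analysis software from the study):

```python
# Mean naming RTs (ms) by stimulus type and lag, transcribed from Table 3.
naming = {
    "HF": {0: 587, 100: 608, 200: 610, 300: 648},
    "LF": {0: 598, 100: 624, 200: 648, 300: 678},
    "NW": {0: 655, 100: 700, 200: 736, 300: 799},
}

def lag_effect(rts):
    """Cost of delaying the vowel marks: RT at lag 300 minus RT at lag 0."""
    return rts[300] - rts[0]

def frequency_effect(table, lag):
    """Low-frequency minus high-frequency RT at a given lag."""
    return table["LF"][lag] - table["HF"][lag]

# Nonword lag effect approximates the full cost of a 300 ms vowel delay:
print(lag_effect(naming["NW"]))  # 144, i.e., roughly 150 ms
# Word lag effects average about half of that, suggesting a mixed strategy:
print((lag_effect(naming["HF"]) + lag_effect(naming["LF"])) / 2)  # 70.5
# The naming frequency effect grows with SOA:
print(frequency_effect(naming, 0), frequency_effect(naming, 300))  # 11 30
```

The same subtractions on the lexical decision means yield the much smaller 25 and 37 ms word lag effects discussed in the text.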

Thus, the adoption of a pure prelexical strategy of naming words should have resulted in a similar lag effect. The results suggest that this was not the strategy employed in naming words: The lag effect for words was only half the effect obtained for nonwords. However, the lag effect for naming words should also be compared to the lag effect obtained in the lexical decision task. Note that, as in the lexical decision task, subjects did not need the vowel marks (and a prelexical phonologic code) to name the words correctly. Nevertheless, in contrast to lexical decision, they preferred to wait longer for the vowel information. It could be argued that the introduction of nonwords introduced the prelexical strategy in naming. However, since the lag effect for words was much smaller than the lag effect for nonwords, it suggests that a combined use of the assembled and addressed routines was used in pronunciation.

Another possible interpretation could be suggested, however, to account for the lag effect in naming. Because in the naming task subjects are required to produce the correct pronunciation of the printed word, they might have adopted a strategy of waiting for the vowels in order to verify the phonologic structure of the printed word which they generated lexically. By this interpretation, phonology was lexically addressed but verified at a second stage against the delayed vowel information to confirm the lexically retrieved pronunciation. The verification interpretation would regard the difference in naming words and nonwords in this paradigm as the difference between having words verified by the subsequently presented vowels, which is short, and having to construct a pronunciation from the vowel information, which is longer.

Although this interpretation is plausible, it is not entirely supported by the effect of word frequency on response latencies. The suggestion that the combined use of prelexical and lexical phonology varied as a function of the cost of generating a prelexical phonological code is reinforced by examining the frequency effect in naming. In contrast to lexical decision, the frequency effect in naming was overall much smaller, especially at the shorter SOAs. At the 0 ms SOA the frequency effect decreased from 80 ms in lexical decision to 11 in naming. This supports the conclusion that naming with the simultaneous presentation of vowels was mainly prelexical and scarcely involved the mental lexicon. The longer the SOA, the larger the cost of generating a prelexical phonologic code and, consequently, the greater the use of addressed lexical phonology. This is reflected in the increase in the frequency effect at the longer SOAs (38 and 30 ms for 200 and 300 ms SOA, respectively). This pattern stands in sharp contrast to the lexical decision task, which yielded very similar frequency effects at all lags.²

EXPERIMENT 4

The aim of Experiment 4 was to examine the effect of lagging the vowel marks when phonologically ambiguous words (heterophonic homographs) are presented for lexical decision or naming. In contrast to unambiguous words, the correct phonological structure of Hebrew heterophonic homographs can be determined unequivocally only by referring to the vowel marks, which specify the correct phonological alternative and consequently the correct meaning. Thus, if the analysis provided in the previous

experiment concerning the cost of lagging the vowel marks is correct, the effect of presenting ambiguous words on naming should be similar to the effect of presenting nonwords. In both cases, subjects would have to rely on the vowel marks for generating a phonological representation necessary for pronunciation. In contrast, lagging the vowel marks of ambiguous words should affect lexical decisions to a much lesser extent. This is because lexical decisions for ambiguous words have been previously shown to be based on the abstract orthographic structure, and to occur prior to the process of phonological disambiguation (Bentin & Frost, 1987; Frost & Bentin, 1992a).

In Experiment 4 subjects were presented with letter strings that could be read as two meaningful words, depending on the vowel configuration assigned to the letters. Nonwords were presented as well. The vowel marks were superimposed on the letters at different lags, the same as those employed in Experiment 3. The relative use of orthographic and phonologic coding in the two tasks was again assessed by measuring the effect of lagging the vowel information on lexical decision and naming.

Method

Subjects. One hundred and sixty undergraduate students from the Hebrew University, all native speakers of Hebrew, participated in the experiment for course credit or for payment. Eighty participated in the lexical decision task and 80 in the naming task. None of the subjects had participated in the previous experiment.

Stimuli and design. The words were 40 ambiguous consonant strings, each of which represented both a high-frequency and a low-frequency word. The two phonological alternatives were mostly nouns or adjectives that were not semantically or morphologically related. The procedure for assessing the subjective frequencies of the words was similar to the one employed in the previous experiments: Fifty undergraduate students were presented with lists that contained the pointed disambiguated words related to the 40 ambiguous letter strings, and rated the frequency of each word on a 7-point scale. The rated frequencies were averaged across all 50 judges. Each of the 40 homographs that were selected for this study represented two words that differed in their rated frequency by at least 1 point on that scale. As in the previous experiments, the nonwords were constructed by altering one or two letters of meaningful words. The design was similar to that of Experiment 3. However, in order to avoid repetition of the same letter string with different vowel marks, eight lists of words were presented to the subjects instead of four (hence the larger number of subjects). Each list contained only one form (the dominant or the subordinate) of each homograph, at one of the possible four lags, so that each subject was presented with 40 words and 40 nonwords in a list. The stimuli were rotated across lists by a Latin Square design, and consequently each letter string was presented with both vowel mark configurations at all possible lags.

Procedure and apparatus. The procedure and apparatus were identical to those of Experiment 3. Each session started with 16 practice trials, followed by the 80 test trials, which were presented in one block.

Results

Mean RTs for the dominant alternatives, the subordinate alternatives, and the nonwords in the different lag conditions for both the lexical decision and the naming tasks are presented in Table 4. As in Experiment 3, the effect of lagging the vowel information had little influence on lexical decision latencies. In fact, the use of ambiguous words reduced the difference between the simultaneous presentation of vowels and their presentation 300 ms after the letters to 21 ms for words on the average, and to only 6 ms for nonwords. A different pattern emerged in the naming task. The effects of lagging the vowel marks on RTs were twice as large as the effects found for unambiguous words in Experiment 3, where there was only one meaningful pronunciation. In fact, the lag effect on ambiguous words was virtually identical with the lag effect on nonwords.

The statistical significance of these differences was assessed in a three-way ANOVA with the factors of task (lexical decision, naming), stimulus type (high-frequency, low-frequency, nonwords), and lag (0, 100, 200, 300), across subjects and across stimuli. The main effect of task was significant (F1(1,158) = 39, MSe = 114,281, p<0.001; F2(1,117) = 314, MSe = 7379, p<0.001), as was the main effect of stimulus type (F1(2,316) = 42, MSe = 7541, p<0.001; F2(2,117) = 10.5, MSe = 11,737, p<0.001), and the main effect of lag (F1(3,474) = 69, MSe = 8459, p<0.001; F2(3,351) = 113, MSe = 2528, p<0.001). Task interacted with stimulus type (F1(2,316) = 6.6, MSe = 7541, p<0.001; F2(2,117) = 5.8, MSe = 7379, p<0.001) and with lag (F1(3,474) = 46, MSe = 8459, p<0.001; F2(3,351) = 76, MSe = 2616, p<0.001). Lag did not interact with stimulus type (F1, F2 < 1.0). The three-way interaction did not reach significance (F1 = 1.3, F2 = 1.4).


Table 4. Lexical decision and naming RTs and percent errors for high-frequency words (HF), low-frequency words (LF), and nonwords (NW) when vowel marks appear at different lags after letter presentation. Words are ambiguous.

              Lexical Decision                      Naming
Lag           0      100    200    300    Effect    0      100    200    300    Effect
Dominant      585    608    608    611    26        619    650    692    763    142
              (4%)   (5%)   (3%)   (3%)             (3%)   (2%)   (4%)   (3%)
Subordinate   611    620    626    627    16        641    699    739    790    150
              (7%)   (5%)   (4%)   (2%)             (5%)   (4%)   (6%)   (4%)
NW            624    629    635    630    6         677    713    755    828    151
              (9%)   (9%)   (10%)  (10%)            (5%)   (4%)   (8%)   (7%)

Discussion

The results of Experiment 4 suggest that the cost of delaying the vowel marks for phonologically ambiguous letter strings in the lexical decision task was very low. This outcome supports the conclusions put forward by Frost and Bentin (1992a), suggesting that lexical decisions in Hebrew are based on the recognition of the abstract root or orthographic cluster and do not involve access to a specific word in the phonologic lexicon.

The longest delays of response times due to the lagging of vowel information occurred in the naming task. This was indeed expected. Because the correct pronunciation of phonologically ambiguous words was unequivocally determined only after the presentation of the vowel marks, subjects had to wait for the vowels to appear in order to name those words correctly. Thus, these stimuli provide a baseline for assessing the effect of lagging the vowel marks on naming latencies. This baseline confirms the previous assessment, which was based on the responses to nonwords in Experiment 3, and suggests that, overall, a 300 ms delay of the vowel marks cost 150 ms in response time for articulation if the vowels are necessary for correct pronunciation. Note that, in contrast to the unambiguous words of Experiment 3, the difference in RTs between dominant and subordinate alternatives of ambiguous words cannot reveal the extent of lexical involvement. First, this difference cannot be accurately labeled as a frequency effect. The two phonological alternatives of each homograph differed only in their dominance, and thus could have been both frequent or both nonfrequent. Moreover, there is a qualitative difference between frequency effects that arise from lexical search and the dominance effect that results from the conflict between two phonological alternatives of heterophonic homographs. Thus, the slower RTs of subordinate alternatives, at least in the naming task, were probably due not to a longer lexical search for low-frequency words, but to a change in the articulation program. If subjects first considered pronouncing the dominant alternative after the ambiguous consonants were presented, the later appearance of the vowel marks that specified the subordinate alternative would have caused a change in their articulation plan, resulting in slower RTs.

General Discussion

The present study investigated the relative use of assembled and addressed phonology in naming unpointed and pointed printed Hebrew words. Experiment 1 demonstrated that the lexical status of the stimulus had a greater effect in unpointed than in pointed print. Experiment 2 confirmed that semantic priming facilitated naming to a lesser extent in the shallow pointed orthography than in the deeper unpointed orthography, even though nonwords were not included in the stimulus list. Experiments 3 and 4 examined the effect of delaying the vowel mark presentation on lexical decision and naming, in order to assess their contribution and importance in the two tasks. Because the vowel marks allow fast conversion of the graphemic structure into a phonologic representation using prelexical conversion rules, their delay constitutes an experimental manipulation that reflects the cost of assembling phonology from print. The two experiments showed that, although both naming and lexical decision could be performed without considering the vowel marks, subjects were inclined to use them to a greater extent in the naming task when the cost in response delay was low.

These results provide strong support for the weaker version of the ODH. The disagreement between the alternative view and the ODH revolves around the extent of using assembled phonology in shallow orthographies. It is more or less unanimously accepted that in deep orthographies readers prefer to use the addressed routine in naming. This is because the opacity of the relationship between orthographic structure and phonologic form, which is characteristic of deep orthographies, prevents readers from assembling a prelexical phonological representation through the use of simple GPC rules. Thus, the major debate concerning the validity of the ODH often takes place on the territory of shallow orthographies, in order to show their extensive use of addressed phonology in naming. What lies behind the alternative view, therefore, is the assumption (even axiom) that the use of the addressed routine for naming constitutes the least cognitive effort for any reader in any alphabetic orthography.

The advantage of examining the validity of the ODH by investigating reading in pointed and unpointed Hebrew is therefore multiple. First, it is well established that Hebrew readers are used to accessing the lexicon by recognizing the word's orthographic structure. The phonologic information needed for pronunciation is then addressed from the lexicon (Frost & Bentin, 1992b). The question of interest, then, is: what reading strategies are adopted by Hebrew readers when they are exposed to a shallower orthography? Do they adopt a prelexical strategy even though it is not the natural strategy for processing most Hebrew reading material? A demonstration of the extensive use of the assembled routine in pointed Hebrew therefore provides strong support for the ODH. If readers of Hebrew prefer the use of the assembled routine, surely habitual readers of shallower orthographies would have a similar preference.

The second advantage in using the two orthographies of Hebrew is that it allows the manipulation of a within-language between-orthography design. This design has a methodological advantage over studies that compare different languages. The interpretation of differences in reading performance between two languages as reflecting subjects' use of pre- vs. post-lexical phonology can be criticized on methodological grounds. The correspondence between orthography and phonology is only one dimension on which two languages differ. English and Serbo-Croatian, for example, differ in grammatical structure and in the size and organization of the lexicon. These confounding factors, it can be argued, may also affect subjects' performance. The comparison of pointed and unpointed orthography in one language, Hebrew, allows these pitfalls to be circumvented.

Taken together, the four experiments suggest that the presentation of vowel marks in Hebrew encourages the reader to generate a prelexical phonologic representation for naming. The use of the assembled routine was detected by examining both frequency and semantic priming effects in naming. Experiment 3 provides an important insight concerning the preference for prelexical phonology. When the vowels appeared simultaneously with the consonants, the frequency effect in naming was small and nonsignificant, again suggesting minimal lexical involvement. Thus the results of this condition replicate the findings of Experiment 1.

Two methodological factors should be considered in interpreting the results of Experiments 3 and 4. The first relates to the experimental demand characteristics of the lagged pointed presentation. It could be argued that this unnatural presentation of printed information encouraged subjects to wait for the vowels to appear and consequently to use them. In fact, why not adopt a strategy of waiting for all possible information to be provided? Although this could be a reasonable solution for efficient performance in these experiments, the results of the lexical decision task make it very unlikely that this simple strategy was adopted by our subjects. In this task subjects were not inclined to wait for the vowel marks. This result was most conspicuous in Experiment 4, in which the lag effect was minimal. This outcome suggests that the lagged presentation did not induce a uniform strategy of vowel mark processing, but that subjects adopted a flexible strategy characterized by a gradual pattern of relying more and more on the vowel marks (prelexical phonology) as a function of the task and of the ambiguity of the stimuli.

The differential effect of lag on ambiguous and unambiguous words in the two tasks suggests that both the assembled and the addressed routines were used for generating phonology from print. On one hand, subjects used explicit vowel information employing prelexical transformation rules. This is reflected in the greater effect of lag on naming relative to lexical decision latencies, and in the smaller frequency effect of Experiment 3 at the

shorter SOAs. On the other hand, it is clear that the phonologic structure of unambiguous words was generated using the addressed routine as well. This is reflected in the differential effect of lagging the vowel information on naming ambiguous and unambiguous words: There were smaller effects of lag on naming unambiguous than ambiguous words. This confirms that subjects did not wait for the vowel marks in order to pronounce the unambiguous words, as long as they waited in order to pronounce the ambiguous words. The flexibility in using the two routines for naming unambiguous words was affected, however, by the "cost" of the vowel information. When no cost was involved in obtaining the vowel information (simultaneous presentation), the prelexical routine was preferred, as reflected by the small frequency effect. In contrast, when the use of vowel information involved a higher cost, given the delayed presentation of the vowel marks, a gradually greater reliance on the addressed routine was observed. This is well reflected by the larger frequency effect at the longer lags. This pattern stands in sharp contrast to the lexical decision task, in which the frequency effect was stable and very similar across lags.

The second methodological issue to be considered in interpreting the results of Experiments 3 and 4 is the presence of nonwords in the experiments. One factor that has been recently proposed to account for the conflicting results concerning the effect of orthographic depth is the inclusion of nonwords in the stimulus list (Baluch & Besner, 1991). In their study, Baluch and Besner presented native speakers of Persian with opaque and transparent Persian words and showed that differences in semantic facilitation in the naming task appeared only when nonwords were included in the stimulus list. When the nonwords were omitted, no differences in semantic facilitation were found. Baluch and Besner concluded that the inclusion of nonwords in the list encourages subjects to adopt a prelexical strategy of generating phonology from print, because the phonologic structure of nonwords cannot be lexically addressed but only prelexically assembled.

The results of Experiments 3 and 4 do not support the view that the nonwords induced a pure prelexical strategy. If this were so, the effects of lag would have been similar for words and for nonwords. Subjects would have waited for the vowels of every stimulus to appear, and the lag effect for unambiguous words, ambiguous words, and nonwords would have been similar, about 150 ms. This clearly was not the outcome we obtained. Subjects waited the full 150 ms to pronounce the nonwords and the ambiguous words but waited only half as much time to pronounce the unambiguous words. This confirms a mixed strategy in reading.

However, the argument proposed by Baluch and Besner deserves serious consideration. Not only is its logic compelling, but also the effect of nonwords on reading strategies is well documented in several studies. The crux of the debate is, however, whether this argument can account for the observed effects of orthographic depth found in the various studies described above (e.g., Frost et al., 1987; Katz & Feldman, 1983). The evidence for that claim seems to be much less compelling. There is no argument that nonwords induce a prelexical strategy of reading in all orthographies (e.g., Hawkins, Reicher, & Rogers, 1976; and see McCusker, Hillinger, & Bias, 1981, for a review). However, note that the weak ODH proposes that whatever the effect of nonword inclusion is, it would be different in deep and in shallow orthographies. Thus, evidence for or against the weak ODH could only be supported by studies directly examining the differential effect of nonwords in various orthographies. Such a design was indeed employed by Baluch and Besner (1991), but their conclusions hinge on the non-rejection of the null hypothesis. In contrast to their results, several studies have provided clear evidence supporting the weak ODH. For example, Frost et al. (1987) showed that the ratio of nonwords had a dramatically different effect on naming in Hebrew, in English, and in Serbo-Croatian. Different effects of nonword inclusion on semantic facilitation in naming in the deep English and the shallow Italian orthographies were also reported by Tabossi and Laghi (1992).

The results of Experiment 2 also stand in sharp contrast to Baluch and Besner's conclusions. In Experiment 2 different semantic priming effects were found between pointed and unpointed Hebrew, even though nonwords were not included in the stimulus list. How can this difference be accounted for? A possible explanation for this discrepancy could be related to the strength of the experimental manipulation employed in the two studies. Baluch and Besner used opaque and transparent words within the unpointed, deep Persian writing system. The phonological transparency of their words was due to the inclusion of letters that convey vowels. However, because these letters can also convey consonants in a different context, they may have introduced some ambiguity in the print that is characteristic of

deep orthographies, thereby encouraging the use of the addressed routine. In contrast, the experimental manipulation of using pointed and unpointed Hebrew print is much stronger. Hence, differential effects of semantic facilitation in naming emerged in Hebrew, providing strong support for the weak ODH.

The debate concerning the ODH, however, cannot simply revolve around the interpretation of various findings without adopting a theoretical framework for reading in general. Interestingly, while contrasting the ODH with the alternative view, one might find very similar definitions that capture these conflicting approaches. Proponents of the alternative view would argue that, regardless of the characteristics of their orthography, readers in different languages seem to adopt remarkably similar mechanisms in reading (e.g., Besner & Smith, 1992; Seidenberg, 1992). Thus, the alternative view attempts to offer a universal mechanism that portrays a vast communality in the reading process in different languages. Surprisingly, proponents of the ODH take a very similar approach. They suggest a basic mechanism for processing print in all languages, which is finely tuned to the particular structure of every language (Carello et al., 1992). The major discussion is, therefore, what exactly this mechanism is.

The basic assumption of the ODH concerns the role of phonology in reading. It postulates as a basic tenet that all writing systems are phonological in nature and that their primary aim is to convey phonologic structures, i.e., words, regardless of the graphemic structure adopted by each system (see DeFrancis, 1989; Mattingly, 1992, for a discussion). Thus, the extraction of phonologic information from print is the primary goal of the reader, whether skilled or beginner. It is the emphasis upon the role of phonology in the reading process that bears upon the importance of prelexical phonology in print processing. The results of the present study furnish additional support for an increasingly wide corpus of research that provides evidence confirming the role of prelexical phonology in reading. This evidence comes not only from shallow orthographies like Serbo-Croatian (e.g., Feldman & Turvey, 1983; Lukatela & Turvey, 1990), but also from deeper orthographies like English, using a backward masking paradigm (e.g., Perfetti et al., 1988) or a semantic categorization task with pseudohomophonic foils (Van Orden, 1987; Van Orden, Johnston, & Hale, 1988). Recently, Frost and Bentin (1992a) showed that phonologic analysis of printed heterophonic homographs in the even deeper unpointed Hebrew orthography precedes semantic disambiguation. In another study, Frost and Kampf (1993) showed that the two phonologic alternatives of Hebrew heterophonic homographs are automatically activated following the presentation of the ambiguous letter string.

The results of the present study converge with these findings. They provide an opportunity to examine the ODH and the alternative view not in the linguistic environment of shallow orthographies, but of deep orthographies. If prelexical phonology plays a significant role in the reading of pointed Hebrew by readers who are trained to use mainly the addressed routine for phonological analysis, then the plausible conclusion is that, in any orthography, assembled phonology plays a much greater role in reading than the alternative view would assume.

REFERENCES

Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception and Performance, 10, 340-357.
Baluch, B., & Besner, D. (1991). Strategic use of lexical and nonlexical routines in visual word recognition: Evidence from oral reading in Persian. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 644-652.
Bentin, S., Bargai, N., & Katz, L. (1984). Orthographic and phonemic coding for lexical access: Evidence from Hebrew. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 353-368.
Bentin, S., & Frost, R. (1987). Processing lexical ambiguity and visual word recognition in a deep orthography. Memory & Cognition, 15, 13-23.
Besner, D., & Hildebrandt, N. (1987). Orthographic and phonological codes in the oral reading of Japanese Kana. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 335-343.
Besner, D., & Smith, M. C. (1992). Basic processes in reading: Is the orthographic depth hypothesis sinking? In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 45-66). Amsterdam: Elsevier Science Publishers.
Besner, D., Patterson, K., Lee, L., & Hildebrandt, N. (1992). Two forms of Japanese Kana: Phonologically but NOT orthographically interchangeable. Journal of Experimental Psychology: Learning, Memory, and Cognition.
Carello, C., Lukatela, G., & Turvey, M. T. (1988). Rapid naming is affected by association but not syntax. Memory & Cognition, 16, 187-195.
Carello, C., Turvey, M. T., & Lukatela, G. (1992). Can theories of word recognition remain stubbornly nonphonological? In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 211-226). Amsterdam: Elsevier Science Publishers.
Colombo, L., & Tabossi, P. (1992). Strategies and stress assignment: Evidence from a shallow orthography. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 319-340). Amsterdam: Elsevier Science Publishers.
upon the role of phonology in the reading process Katz (Eds.), Orthography, phonology, morphology, and meaning that bears upon the importance of prelexical (pp. 45-66). Amsterdam Advances in Psychology, Elsevier phonology in print processing. The results of the Science Publishers. present study furnish additional support for an Besner, D., Patterson, K., Lee, L., & Hildebrant N. (1992). Two formsofJapaneseKana:Phonologically but NOT increasingly wide corpus of research that provides orthographically interchangeable. Journal of Experimen tal evidence confirming the role of prelexical phonol- Psychology: Learning Memory & Cognition. ogy in reading. This evidence comes not only from Carello, C., Lukatela, G., & Turvey, M. T. (1988). Rapid naming is shallow orthographies like Serbo-Croatian (e.g., affected by association but not syntax. Memory & Cognition, 16, Feldman & Turvey, 1983; Lukatela & Turvey, 187-195. Carello, C., Turvey, M. T., & Lukatela, G. (1992). Can theories of 1990), but also from deeper orthographies like word recognition remain stubbornly nonphonological? In R. English, using a backward masking paradigm Frost & L. Katz (Eds.) Orthography, phonology, morphology, and (e.g.,Perfettietal.,1988) or a semantic meaning (pp. 211-226). Amsterdam: Advances in Psychology, categorization task with pseudohomophonic foils Elsevier Science Publishers. (Van Orden, 1987; Van Orden, Johnston, & Halle, Colombo, L., & Tabossi, P.(1992). Strategies and stress assignment: Evidence from a shallow orthography. In R. Frost 1988). Recently, Frost and Bentin (1992a) showed & L. Katz (Eds.), Orthography, phonology, morphology, and that phonologic analysis of printed heterophonic meaning (pp. 319-340). Amsterdam: Advances in Psychology, homographs in the even deeper unpainted Hebrew Elsevier Science Publishers. 170 Frost

De Francis, J. (1989). Visible speech: The diverse oneness of writing systems. Honolulu: University of Hawaii Press.
Feldman, L. B., & Turvey, M. T. (1983). Word recognition in Serbo-Croatian is phonologically analytic. Journal of Experimental Psychology: Human Perception and Performance, 9, 288-298.
Frost, R. (1992). Orthography and phonology: The psychological reality of orthographic depth. In M. Noonan, P. Downing, & S. Lima (Eds.), The linguistics of literacy (pp. 255-274). Amsterdam/Philadelphia: John Benjamins Publishing Co.
Frost, R., Katz, L., & Bentin, S. (1987). Strategies for visual word recognition and orthographical depth: A multilingual comparison. Journal of Experimental Psychology: Human Perception and Performance, 13, 104-115.
Frost, R., & Katz, L. (1989). Orthographic depth and the interaction of visual and auditory processing in word recognition. Memory & Cognition, 17, 302-311.
Frost, R., & Bentin, S. (1992a). Processing phonological and semantic ambiguity: Evidence from semantic priming at different SOAs. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 58-68.
Frost, R., & Bentin, S. (1992b). Reading consonants and guessing vowels: Visual word recognition in Hebrew orthography. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 27-44). Amsterdam: Advances in Psychology, Elsevier Science Publishers.
Frost, R., & Kampf, M. (1993). Phonetic recoding of phonologically ambiguous printed words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 1-11.
Hawkins, H. L., Reicher, G. M., & Rogers, M. (1976). Flexible coding in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 2, 233-242.
Katz, L., & Feldman, L. B. (1981). Linguistic coding in word recognition. In A. M. Lesgold & C. A. Perfetti (Eds.), Interactive processes in reading. Hillsdale, NJ: Erlbaum.
Katz, L., & Feldman, L. B. (1983). Relation between pronunciation and recognition of printed words in deep and shallow orthographies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 157-166.
Katz, L., & Frost, R. (1992). Reading in different orthographies: The orthographic depth hypothesis. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 67-84). Amsterdam: Advances in Psychology, Elsevier, North-Holland.
Klima, E. S. (1972). How alphabets might reflect language. In J. F. Kavanagh & I. G. Mattingly (Eds.), Language by ear and by eye. Cambridge, MA, and London, England: The MIT Press.
Laudanna, A., & Caramazza, A. (1992). Morpho-lexical representations and reading. Paper presented at the fifth conference of the European Society for Cognitive Psychology, Paris, France.
Liberman, I. Y., Liberman, A. M., Mattingly, I. G., & Shankweiler, D. (1980). Orthography and the beginning reader. In J. F. Kavanagh & R. L. Venezky (Eds.), Orthography, reading, and dyslexia. Austin, TX: Pro-Ed.
Lukatela, G., Popadić, D., Ognjenović, P., & Turvey, M. T. (1980). Lexical decision in a phonologically shallow orthography. Memory & Cognition, 8, 415-423.
Lukatela, G., Feldman, L. B., Turvey, M. T., Carello, C., & Katz, L. (1989). Context effects in bi-alphabetical word perception. Journal of Memory and Language, 28, 214-236.
Lukatela, G., & Turvey, M. T. (1990). Automatic and prelexical computation of phonology in visual word identification. European Journal of Cognitive Psychology, 2, 325-343.
Lupker, S. J. (1984). Semantic priming without association: A second look. Journal of Verbal Learning and Verbal Behavior, 23, 709-733.
Mattingly, I. G. (1992). Linguistic awareness and orthographic form. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 11-26). Amsterdam: Advances in Psychology, Elsevier Science Publishers.
McCusker, L. X., Hillinger, M. L., & Bias, R. G. (1981). Phonological recoding and reading. Psychological Bulletin, 89, 217-245.
Neely, J. H. (1991). Semantic priming effects in visual word recognition. In D. Besner & G. W. Humphreys (Eds.), Basic processes in reading: Visual word recognition. Hillsdale, NJ: Erlbaum.
Perfetti, C. A., Bell, L. C., & Delaney, S. M. (1988). Automatic (prelexical) phonetic activation in silent word reading: Evidence from backward masking. Journal of Memory and Language, 27, 59-70.
Perfetti, C. A., Zhang, S., & Berent, I. (1992). Reading in English and Chinese: Evidence for a universal phonological principle. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 227-248). Amsterdam: Advances in Psychology, Elsevier Science Publishers.
Scheerer, E. (1986). Orthography and lexical access. In G. Augst (Ed.), New trends in graphemics and orthography (pp. 262-286). Berlin: De Gruyter.
Sebastian-Galles, N. (1991). Reading by analogy in a shallow orthography. Journal of Experimental Psychology: Human Perception and Performance, 17, 471-477.
Seidenberg, M. S. (1985). The time course of phonological code activation in two writing systems. Cognition, 19, 1-30.
Seidenberg, M. S. (1992). Beyond orthographic depth in reading: Equitable division of labor. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 85-118). Amsterdam: Advances in Psychology, Elsevier Science Publishers.
Seidenberg, M. S., & Vidanović, S. (1985). Word recognition in Serbo-Croatian and English: Do they differ? Paper presented at the Twenty-fifth Annual Meeting of the Psychonomic Society, Boston.
Tabossi, P., & Laghi, L. (1992). Semantic priming in the pronunciation of words in two writing systems: Italian and English. Memory & Cognition, 20, 303-313.
Turvey, M. T., Feldman, L. B., & Lukatela, G. (1984). The Serbo-Croatian orthography constrains the reader to a phonologically analytic strategy. In L. Henderson (Ed.), Orthographies and reading: Perspectives from cognitive psychology, neuropsychology, and linguistics (pp. 81-89). Hillsdale, NJ: Erlbaum.
Van Orden, G. C. (1987). A ROWS is a ROSE: Spelling, sound and reading. Memory & Cognition, 15, 181-198.
Van Orden, G. C., Johnston, J. C., & Hale, B. L. (1988). Word identification in reading proceeds from spelling to sound to meaning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 371-386.

FOOTNOTES

*Journal of Experimental Psychology: Learning, Memory, and Cognition, in press.
1. The error rates presented in this study should be evaluated with care because the criteria for assessing an error in pointed and unpointed print are different. While in unpointed print any pronunciation consistent with the consonants only is considered correct, in pointed print any pronunciation that is inconsistent with the explicit vowel information is considered an error. This is of special significance while evaluating the nonwords data. In unpointed print subjects have much greater flexibility in naming nonwords than in pointed print, hence the larger error rates in the pointed condition.
2. But note that the effect of delaying vowels on naming latencies does not linearly decrease with lag, as should be predicted by the increase of the frequency effect. In order to obtain a more ordered relationship, more lag conditions should have been used.

Haskins Laboratories Status Report on Speech Research, 1993, SR-113, 171-196

Relational Invariance of Expressive Microstructure across Global Tempo Changes in Music Performance: An Exploratory Study*

Bruno H. Repp

This study addressed the question of whether the expressive microstructure of a music performance remains relationally invariant across moderate (musically acceptable) changes in tempo. Two pianists played Schumann's "Träumerei" three times at each of three tempi on a digital piano, and the performance data were recorded in MIDI format. In a perceptual test, musically trained listeners attempted to distinguish the original performances from performances that had been artificially speeded up or slowed down to the same overall duration. Accuracy in this task was barely above chance, suggesting that relational invariance was largely preserved. Subsequent analysis of the MIDI data confirmed that each pianist's characteristic timing patterns were highly similar across the three tempi, although there were statistically significant deviations from perfect relational invariance. The timing of (relatively slow) grace notes seemed relationally invariant, but selective examination of other detailed temporal features (chord asynchrony, tone overlap, pedal timing) revealed no systematic scaling with tempo. Finally, although the intensity profile seemed unaffected by tempo, a slight overall increase in intensity with tempo was observed. Effects of musical structure on expressive microstructure were large and pervasive at all levels, as were individual differences between the two pianists. For the specific composition and range of tempi considered here, these results suggest that major (cognitively controlled) temporal and dynamic features of a performance change roughly in proportion with tempo, whereas minor features tend to be governed by tempo-independent motoric constraints.
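The null hypothesis of relational invariance examined in this study can be made concrete with a small numerical sketch: if a tempo change simply multiplies every tone inter-onset interval (IOI) by a constant, then dividing each IOI by the total duration yields exactly the same normalized timing profile at every tempo. The following code is an illustration only; the IOI values are invented and are not data from this study.

```python
# Illustrative sketch of relational ("proportional duration") invariance.
# The IOI values below are invented for illustration, not measurements.

def normalized_profile(iois_ms):
    """Divide each inter-onset interval by the total duration, so that
    timing profiles recorded at different tempi become comparable."""
    total = sum(iois_ms)
    return [ioi / total for ioi in iois_ms]

def max_proportional_deviation(iois_a, iois_b):
    """Largest absolute difference between two normalized profiles;
    a value near 0 indicates (near-)perfect relational invariance."""
    pa = normalized_profile(iois_a)
    pb = normalized_profile(iois_b)
    return max(abs(a - b) for a, b in zip(pa, pb))

# A "slow" rendition, and the same passage with every interval shrunk
# by the same factor (a pure global tempo change):
slow_iois = [520.0, 480.0, 610.0, 450.0, 700.0]
fast_iois = [ioi / 1.2 for ioi in slow_iois]

# A pure tempo change leaves the normalized profile untouched:
print(max_proportional_deviation(slow_iois, fast_iois) < 1e-9)  # True
```

In real performance data, deviations of this measure from zero would indicate that the pianist did not simply rescale the performance, which is the kind of departure from strict proportionality the later statistical analyses probe.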

INTRODUCTION

When different artists perform the same musical composition, they often choose very different tempi. Even though each artist may be convinced that his or her tempo is "right," and even though the composer may have prescribed a specific tempo in the score, there is in fact a range of acceptable tempi for any composition played on conventional instruments. The American composer Ned Rorem has expressed this well:

Tempos vary with generations like the rapidity of language. Music's velocity has less organic import than its phraseology and rhythmic qualities; what counts in performance is the artistry of phrase and within a tempo. ... [The composer's] Tempo indication is not creation, but an afterthought related to performance. Naturally an inherently fast piece must be played fast, a slow one slow--but just to what extent is a decision for players. (Rorem, 1983, p. 326.)

*This research was made possible by the generosity of Haskins Laboratories (Carol A. Fowler, president). Additional support came from NIH BRSG Grant RR05596 to the Laboratories. I am grateful to LPH for her patient participation in this study, and to Peter Desain and Henkjan Honing for helpful comments on an earlier version of the manuscript.

Guttmann (1932) measured the durations of a large number of orchestral performances and found substantial tempo variation across different conductors for the same composition. In extreme cases, the fastest observed performance was about 30% shorter than the slowest one. In two recent studies, Repp (1990, 1992) compared performances by famous pianists of solo pieces by


Beethoven and Schumann and in each case found a considerable range of tempi. The ratio of the extreme tempi was approximately 1:1.6 in each case, corresponding to a 37% difference in performance duration. Tempo differences among repeated performances by the same artist tend to be smaller but may also be considerable (Guttmann, 1932; Repp, 1992), notwithstanding the observation of remarkable constancy for some artists or ensembles (Clynes and Walker, 1986).

Tempo varies not only between but also within performances. Even though no tempo changes may have been prescribed by the composer in the score, a sensitive performer will continuously modulate the timing according to the structural and expressive requirements of the music. This temporal modulation forms part of the "expressive microstructure" of a performance. Naturally, it is more pronounced in slow, "expressive" than in fast, "motoric" pieces. The problem of determining the global or baseline tempo of a highly modulated performance is addressed elsewhere (Repp, in press). In this article the question of interest is whether overall tempo interacts with the pattern of expressive modulations.

The null hypothesis to be tested is that expressive microstructure (which on the piano includes not only successive tone onset timing but also simultaneous tone onset asynchronies, successive tone overlap or separation, pedal timing, and successive as well as simultaneous intensity relationships) is relationally invariant across changes in tempo. What this implies is that a change in tempo amounts to multiplying all temporal intervals by a constant, so that all their relationships (ratios) remain intact. Nontemporal properties of performance (i.e., tone intensities) are likewise predicted to remain relationally invariant.

Relational invariance (also called "proportional duration") is a key concept in studies of motor behavior (see Gentner, 1987; Heuer, 1991; Viviani and Laissard, 1991). It has been used as an indicator of the existence of a "generalized motor program" (Schmidt, 1975) having a variable rate parameter. Many activities have been examined from that perspective, and relational invariance has commonly been observed, even though stringent statistical tests may show significant deviations from strict proportionality (see Heuer, 1991). The deviations may result from the combination of more central and more peripheral sources of variability. Heuer (1991) conjectures that relational invariance is most likely to obtain when the motor behavior is "natural" (i.e., conforms to peripheral constraints) and "self-selected" (i.e., not imposed by the experimenter). These conditions certainly apply to artistic music performance, especially when (as in the present study) it is slow and expressive, so that peripheral factors (i.e., technical difficulties) are minimized. Nevertheless, even such a performance has aspects (e.g., the relative asynchrony of chord tones or the relative overlap of legato tones on the piano) that are subject to peripheral constraints imposed by fingering, the spatial distance between keys, and the limits of fine motor control. The question of relational invariance thus may be asked at several levels of detail, and the answers may differ accordingly. In sheer complexity and degree of precision, expert musical performance exceeds almost any other task that may be investigated from a motor control viewpoint (see Shaffer, 1980, 1981, 1984).

Few previous studies have looked into the question of relational invariance in music performance. Deviations from relational invariance seem likely when the tempo changes are large. Handel (1986) has pointed out that changes in tempo may cause rhythmic reorganization: A doubling or halving of tempo often results in a change of the level at which the primary beat (the "tactus") is perceived; this changes the rhythmic character of the piece, with likely consequences for performance microstructure. Very fast tempi may lead to technical problems whereas very slow tempi may lead to a loss of coherence. These kinds of issues may be profitably investigated with isolated rhythmic patterns, scales, and the like. When it comes to the artistic interpretation of serious compositions, however, such dramatic differences in tempo are rare. The tempi chosen cannot stray too far from established norms, or else the performance will be perceived as deviant and stylistically inappropriate. In this study, therefore, we will be concerned only with moderate tempo variations that are within the range of aesthetic acceptability for the composition chosen. Still, as indicated above, this range is wide enough to make the question of relational invariance nontrivial.

Since different artists' performances of the same music differ substantially in their expressive microstructure ("interpretation") as well as in overall tempo, it is virtually impossible to determine the relationship between these two variables by comparing recordings of different artists. Although many studies have demonstrated that individual performers are

remarkably consistent in reproducing their expressive timing patterns across repeated performances of the same music, these performances are generally also very similar in tempo (e.g., Seashore, 1938; Shaffer, 1984; Shaffer, Clarke, and Todd, 1985; Repp, 1990, 1992). Where substantial differences in tempo do occur, underlying changes in "interpretation" (i.e., in the cognitive structure underlying the performance) cannot be ruled out (e.g., Shaffer, 1992; Repp, 1992). In other words, the change in tempo may have been caused by a change in interpretation, rather than the other way around. Clearly, artists can change their interpretation at will, and with it the tempo of a performance. The question posed in the present study was whether a change in tempo necessitates a change in expressive microstructure. Therefore, an experimental approach was taken in which the same artist was asked to play a piece at different tempi without deliberately changing the interpretation. The resulting performance data were analyzed and compared. This approach was supplemented by a perceptual test in which the tempo of recorded performances was artificially modified and the result was judged by musically trained listeners.

One previous study that directly addressed the hypothesis tested here was conducted by Clarke (1982), following preliminary but inconclusive observations by Michon (1974). Clarke examined selected sections of performances of the highly repetitive piano piece, "Vexations," by Erik Satie, played by two pianists who had been instructed to vary the tempo. The timing patterns (tone inter-onset intervals, or IOIs) of the same music at six different tempi were compared, and a significant interaction with tempo was found. The interaction was in part due to a narrowing of the tempo range in the course of the excerpt, but a significant effect remained after the tempo drift was removed statistically. Subsequent inspection of the data suggested to Clarke that the pianists had changed their interpretation of the grouping structure across tempi. At least one pianist produced more groups at the slower tempo, where group boundaries were identified by a temporary slowing down. Clarke also argued that group boundaries that were maintained tended to be emphasized more at the slower tempi.

While Clarke's interpretation is intuitively plausible, the changes in the timing profiles across tempi were quite small and do not suggest different structural interpretations to this reader. Although there was more temporal modulation at the slower tempi, this is consistent with the hypothesis of relational invariance and may have accounted for the interaction with tempo. Such an interaction might not have appeared in log-transformed data.1 Finally, even if it were true that the pianists changed their interpretations along with tempo, this may have reflected the extremely boring and repetitive nature of the music which, as the title says, is a deliberate harassment of the performer. Therefore, the generality of Clarke's conclusions is not clear. A very casual report of a follow-up study with a movement from a Clementi Sonatina (Clarke, 1985) does not change the picture significantly.

In a study of piano performance at a simpler level, MacKenzie and Van Eerd (1990) found highly significant changes in tone IOIs as a function of tempo. Their pianists' task was to play scales as evenly as possible, and the range of tempi was very large. The IOI profiles observed were not due to musical structure or expressive intent but to fingering; they became more pronounced as tempo increased. These kinds of motor constraints, though they become important in fast and technically difficult pieces, are likely to play only a minor role in slow, expressive performance, as studied here.

The immediate stimulus for the present study was provided by some informal observations reported by Desain and Honing (1992a). They recorded a pianist playing a tune by Paisiello (the theme of Beethoven's variations on "Nel cor più non mi sento") at two different tempi, M.M. 60 and 90, and observed that the temporal microstructure differed considerably between the two performances. They also used a MIDI sequencer to speed up the slow performance and to slow down the fast performance, and in each case they found that the altered performance sounded unnatural compared to the original performance having the same tempo. On closer examination, the major differences between the two performances seemed to lie in the timing of grace notes, in the articulation of staccato notes, and in the "spread" (onset asynchrony) of chords, though the overall timing profiles also looked quite different.

In a subsequent paper, Desain and Honing (1992b) developed a computer model for implementing changes in expressive timing. They distinguished four types of "musical objects": sequences of tones, chords, appoggiature, and acciaccature, the last two referring to types of grace notes. Each type of musical object is subject to different temporal transformation rules, which are applied successively within a hierarchical

representation of the musical structure. In the first instance, these transformations represent changes along a continuum of degrees of expressiveness (i.e., deviations from mechanical exactitude) within a fixed temporal frame. However, the temporal changes within a larger unit propagate down to smaller units, effectively altering their tempo. Judging from their Figure 5, Desain and Honing propose that tone onsets within melodic sequences are stretched or shrunk proportionally as tempo changes, whereas the other three types of structures remain temporally constant or get "truncated" (in the case of chord asynchrony). Thus, these authors seem to suggest that relational invariance across changes in tempo does hold in a sequence of melody tones, but not in chords or ornaments. As to articulation (i.e., the overlap or separation of successive tones), they discuss three alternative transformations, without committing themselves. Their model is a system for implementing different types of rules, not a theory of what these rules might be. Nevertheless, their examples reflect some of their earlier informal observations.

The present study investigated not only timing patterns but also tone intensities, another important dimension of expressive performance that has received much less attention in research so far. Todd (1992) has pointed out that an increase of tempo often goes along with an increase in intensity, which leads to the hypothesis that the overall dynamic level of a performance may be affected by a tempo change. MacKenzie and Van Eerd (1990) found such an increase in key press velocity with tempo in their study of pianists playing scales. It is not clear whether any changes in intensity relationships should be expected across tempi, however. MacKenzie and Van Eerd found a subtle interaction, but the differences were not expressively motivated. The hypothesis investigated here was that intensity microstructure would remain relationally invariant across tempo changes, but that pianists might play louder overall at faster tempi. (If this latter effect were absent, the intensity pattern would be absolutely invariant.)

The present study aimed at providing some preliminary data bearing on these issues. The data were limited in so far as they came from a single musical composition played by two pianists, but they included a number of different measurable aspects of performance. The composition, Robert Schumann's "Träumerei," was selected because its expressive timing characteristics and acceptable tempo range were known from Repp's (1992) detailed analysis of 28 expert performances, and also because it is a slow, interpretively demanding, but technically not very challenging piece. It contains no staccato notes or ornaments, apart from a few expressive grace notes. Thus the main question was whether the expressive timing of the melody tones would or would not scale proportionally with changes in tempo. However, the tempo scaling of other performance aspects (grace notes, chord asynchrony, tone overlap, pedaling, dynamics) was of nearly equal interest. Based on the limited observations summarized above, it was expected that the overall timing and intensity profiles would exhibit relational invariance across tempo changes, whereas this might not be true for the more detailed temporal features. In the course of the detailed analyses, there were opportunities to make new (and confirm old) observations about the relation of musical structure and performance microstructure, some of which will be discussed briefly in order to characterize the nature of the patterns whose relational (non)invariance was investigated. A detailed investigation of structural factors is beyond the scope of this paper.

In addition to these performance measurements, a perceptual test was carried out along the lines explored by Desain and Honing (1992a), to determine whether artificially speeded-up or slowed-down performances would sound odd compared to original performances having the same overall tempo. The results of that test will be described first, followed by the performance analyses.

Methods

The music

The score of Schumann's "Träumerei" is shown in Figure 1; its layout highlights the hierarchical structure of the piece. The composition comprises three 8-bar sections, each of which contains two 4-bar phrases which in turn are composed of shorter melodic gestures. The first section (bars 1-8) is repeated in performance. Each of the six phrases starts with an upbeat that (except for the one at the very beginning) overlaps with the end of the preceding phrase. Phrases 1 and 5 are identical and similar to phrase 6, whereas phrases 2, 3, and 4 are more similar to each other than to phrases 1, 5, and 6. Thus there are two phrase types here. Vertically, the composition has a four-voice polyphonic structure. For a more detailed analysis, see Repp (1992).

[Figure 1 appears here: the music notation could not be reproduced in this text version.]

Figure 1. Score of Schumann's "Träumerei," prepared with MusicProse software following the Schumann edition (Breitkopf & Härtel). Minor discrepancies are due to software limitations. The layout of the computer score highlights parallel musical structures.


Performers

The two performers were LPH, a professional pianist in her mid-thirties, and BHR (the author), a serious amateur in his late forties. Both were thoroughly familiar with Schumann's "Träumerei" and had played it many times in the past.

Recording procedure

The instrument was a Roland RD250S digital piano with weighted keys and sustain pedal switch, connected to an IBM-compatible microcomputer running the Forte sequencing program. Synthetic "Piano 1" sound was used. The pianists played from the score and monitored the sound over earphones. The computer recorded each performance in MIDI format, including the onset time, offset time, and velocity of each key depression, as well as pedal on and off times. The temporal resolution was 5 ms.

Each pianist was recorded in a separate session. After some practice on the instrument, (s)he played the full piece (including the repeat of the first 8 bars) at her/his preferred tempo. Afterwards, (s)he listened to the beginning of the recorded performance and set a Franz LM-FB-4 metronome to what (s)he believed to be the basic tempo of the performance. The settings chosen by LPH and BHR were M.M. 63 and 66, respectively. Subsequently, each pianist performed the piece at a slower and at a faster tempo. These tempi were chosen by the experimenter to be M.M. 54 and 72 for LPH and M.M. 56 and 76 for BHR. All these tempi were within the range observed in commercially recorded performances by famous pianists (Repp, 1992; in press). The desired tempo was indicated by the metronome before each performance; the metronome was turned off before playing started. The pianists' intention was to play as naturally and expressively as possible at each tempo; no conscious effort was made either to maintain or to change the interpretation across tempi.

Each pianist then repeated the cycle of three performances until three good recordings had been obtained at each tempo.2 In these repeats, the medium (preferred) tempo was also cued by metronome. LPH's first performance was excluded because it showed signs of her still getting accustomed to the instrument, and an extra medium-tempo performance was added at the end of the session. BHR actually recorded five performances at each tempo, two in one session and three in another. Only the performances from the second session were used. Thus neither pianist's first performance was included in her/his final set of 9 performances.

The MIDI data files were converted into text files and imported into a data analysis and graphics program (DeltaGraph Professional) for further processing. Tone interonset intervals (IOIs) in milliseconds were obtained by computing differences among successive tone onset times. Tone intensities were expressed in terms of raw MIDI velocities ranging from 0 to 127. In the middle range (which accommodated virtually all melody tones), a difference of 4 MIDI velocity units corresponds to a 1 dB difference in peak rms sound level (Repp, 1993).

Perceptual test

Stimuli. The purpose of the perceptual test was to determine whether performances whose tempo has been artificially altered sound noticeably more awkward than unaltered performances. To reduce the length of the test, each performance was truncated after the third beat of bar 8; the cadential chord at that point was extended in duration to convey finality. Moreover, only the first two of each pianist's three performances at each tempo were used.

From each of these 12 truncated original performances, two modified performances were generated by artificially speeding up or slowing down its tempo, such that its total duration matched that of an original performance at a different tempo by the same pianist. Thus, for example, LPH's first slow performance was speeded up to match the duration of the first medium performance, and was speeded up further to match the first fast performance. The tempo modification was achieved by changing the "metronome" setting in the Forte sequencing program, relative to the baseline setting of M.M. 100 during recording. (This setting was unrelated to the tempo of the recorded performance but determined the temporal resolution.)

Each of the 12 original performances was then paired with each of two duration-matched modified performances, resulting in 24 pairs. The original performance was first in half the pairs and second in the other half, in a counterbalanced fashion. The pairs were arranged in a random order and were recorded electronically onto digital cassette tape. Performances within a pair were separated by about 2 s of silence, whereas about 5 s of silence intervened between pairs. The test lasted about half an hour.

Subjects and procedure. Nine professional-level pianists served as paid subjects. All but one were graduate students at the Yale School of Music, seven of them for a master's degree in piano performance and one for a Ph.D. degree in composition. They were tested individually in a quiet room. The test tape was played back at a comfortable intensity over Sennheiser HD420SL earphones. The subjects were fully informed about the nature and manipulation of the stimuli, and they were asked to identify the original performance in each pair by writing down "1" or "2," relying basically on which performance sounded "better" to them.

Results and Discussion

Perceptual test

Average performance in the perceptual test was 55.6% correct. This was not significantly above chance across subjects [F(1,8) = 2.91, p < .13], though it was marginally significant across pairs of items [F(1,11) = 5.08, p < .05].3 There was no significant difference between the scores for the two pianists' performances (though BHR's were slightly easier to discriminate) or for different types of comparisons. These results indicate that the tempo transformations did little damage to the expressive quality of the performances, and hence that there were probably no gross deviations from relational invariance, at least in the first 8 bars.

Timing patterns at the preferred tempo

The performance aspect of greatest interest was the pattern of IOIs (the timing pattern or profile). Before examining the effects of changes in tempo on this aspect of temporal microstructure, however, it seemed wise to look at the timing profiles of the performances at the pianists' preferred (medium) tempi, to confirm that they were meaningful, representative, and reliable. This seemed particularly important in view of the fact that an electronic instrument had been used.

Figure 2 plots the onset timing patterns averaged across the three medium-tempo performances of each pianist. The format of this figure is identical to that of Figure 3 in Repp (1992), which shows the Geometric Average timing pattern of 28 different performances by 24 Famous Pianists (GAFP for short). The timing patterns of three related phrases are superimposed in each panel. (The two renditions of bars 1-8 were first averaged.) Interonset intervals longer than a nominal eighth-note are shown as "plateaus" of equal eighth-note intervals.

The two pianists' average medium-tempo performances resembled both each other and the GAFP timing pattern. The correlations of the complete log-transformed IOI profiles (N=254) were 0.85 between LPH and BHR, 0.88 between LPH and GAFP, and 0.94 between BHR and GAFP. Of course, these overall correlations were dominated by the major excursions in the timing profiles. A measure of similarity at a more detailed level was obtained by computing correlations only over the initial 8 bars, which did not contain any extreme ritardandi. These correlations (N=130) were 0.64 (p < .0001) between LPH and BHR, 0.77 between LPH and GAFP, and 0.83 between BHR and GAFP.4

Clearly, both pianists' performances were reasonably representative in the sense that they resembled the average expert profile, even though they were only moderately similar to each other at a detailed quantitative level.

Each pianist also showed high individual consistency in terms of timing patterns both within and between the three performances at her/his preferred tempo. Thus the average correlation among the three complete medium-tempo performances was 0.91 for LPH and 0.95 for BHR. For the initial 8 bars the corresponding correlations were 0.90 for both LPH and BHR. Needless to say, the timing patterns of the two renditions of bars 1-8 within each performance were also extremely similar.

Furthermore, as can be seen in Figure 2, both LPH and BHR produced very similar timing profiles for the identical phrases in bars 1-4 and 17-20 (left-hand panels); LPH played the whole phrase slower in bars 17-20 whereas BHR slowed down only in bar 17. Both pianists also showed similar timing patterns for the analogous phrases in bars 9-12 and 13-16 (right-hand panels), which diverged substantially only at the end, due to the more pronounced ritardando in bar 16, the end of the middle section. The structurally similar phrase in bars 5-8 (right-hand panels) again showed a similar pattern; apart from slight differences in overall tempo, major deviations from the pattern of the other two phrases occurred only in the second bar, where the music in fact diverges (cf. Figure 1), and also in the fourth bar for BHR, who emphasized the cadence in the soprano voice and treated the phrase-final bass voice as more transitional than did LPH. Finally, bars 21-24 (left-hand panels) retained the qualitative pattern of bars 1-4 but included substantial lengthening due to the fermata and the final grand ritardando.5

Many additional comments could be made about the detailed pattern of temporal variations and their relation to the musical structure, but such a discussion may be found in Repp (1992) and need not be repeated here. Instead, we will focus now on the comparison among the three tempo conditions.
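As a concrete illustration of the measurements described above, the sketch below derives IOIs from tone onset times and correlates two log-transformed timing profiles. All onset times here are invented for illustration; the study's actual data files and software (Forte, DeltaGraph Professional) are not reproduced.

```python
import math

def iois(onset_times_ms):
    """Interonset intervals: differences among successive tone onset times."""
    return [b - a for a, b in zip(onset_times_ms, onset_times_ms[1:])]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented onset times (ms) for two performances of the same short passage.
perf_a = [0, 520, 1000, 1530, 2400]
perf_b = [0, 540, 1080, 1600, 2700]

# Profiles are compared after a log transform, as in the analyses above.
log_a = [math.log(i) for i in iois(perf_a)]
log_b = [math.log(i) for i in iois(perf_b)]
print(round(pearson(log_a, log_b), 3))
```

With these invented profiles the correlation comes out high, since both performances slow down on the same final interval; the published correlations were of course computed over full 254-interval profiles.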


[Figure 2 graphic: panels of eighth-note IOI profiles plotted against bar/eighth-note position for each pianist (top: LPH; bottom: BHR), overlaying bars 0-4, 17-20, and 21-24 in the left panels and bars 5-8, 9-12, and 13-16 in the right panels.]
Figure 2. Eighth-note IOIs as a function of metric distance, averaged across the three medium-tempo performances of each pianist (top: LPH; bottom: BHR). Structurally identical or similar phrases are overlaid in left- and right-hand panels.

Relational Invariance of Expressive Microstructure across Global Tempo Changes in Music Performance 179

Effects of tempo on global timing patterns

A first question was whether the pianists were as reliable in their timing patterns in the slow and fast conditions as at the medium tempo. After all, they had to play at tempi they would not have chosen themselves. The average overall between-performance correlations within tempo categories for LPH were 0.88 (slow) and 0.91 (fast), as compared to 0.91 (medium); and for BHR they were 0.95 (slow) and 0.94 (fast), as compared to 0.95 (medium). Thus the pianists were just about as reliable in their gross timing patterns at unfamiliar tempi as at the familiar tempo. Computed over bars 1-8 only, the correlations were 0.83 (slow) and 0.86 (fast) versus 0.90 (medium) for LPH, and 0.89 (slow) and 0.86 (fast) versus 0.90 (medium) for BHR. At this more detailed level, then, there is an indication that the pianists were slightly more consistent when they played at their preferred tempo. Clearly, however, they produced highly replicable timing patterns even at novel tempi.

We come now to the crucial question: Were the timing profiles at the slow and fast tempi similar to those at the medium tempo? First, the correlational evidence: The average of the 27 between-tempo correlations (3 x 3 for each of 3 pairs of tempo categories) across entire performances for LPH was 0.90, which was the same as her average within-tempo correlation. For BHR, the analogous correlations were 0.94 and 0.95, respectively. At the more detailed level of bars 1-8, the correlations were both 0.86 for LPH, and 0.87 versus 0.88 for BHR. Thus, between-tempo correlations were essentially as high as within-tempo correlations.

If, as predicted by the relational invariance hypothesis, the intervals at different tempi are proportional, then the average timing profiles should be parallel on a logarithmic scale. This is shown for the first rendition of bars 0-8 in Figure 3. There is indeed a high degree of parallelism between these functions; the few local deviations do not seem to follow an interpretable pattern. These data thus seem consistent with the hypothesis of a multiplicative rate parameter (which becomes additive on a logarithmic scale).
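The parallelism prediction can be made concrete in a few lines. Under a strictly multiplicative rate parameter, every interval is rescaled by the same factor, so log-transformed profiles differ by a constant and plot as parallel curves. The IOI profile and scaling factor below are invented for illustration.

```python
import math

# Invented eighth-note IOI profile (ms) at a medium tempo.
medium = [480, 500, 470, 520, 610, 900]

# Under strict relational invariance, a slower tempo rescales every
# interval by one common factor (here, hypothetically, 1.15).
slow = [ioi * 1.15 for ioi in medium]

# On a log scale the multiplicative rate parameter becomes additive:
# the two profiles differ by a constant, i.e., they are parallel.
diffs = [math.log(s) - math.log(m) for s, m in zip(slow, medium)]
assert all(abs(d - math.log(1.15)) < 1e-9 for d in diffs)
```

Any deviation from parallelism on the log scale, such as the IOI by tempo interactions reported below, is then evidence against a strictly multiplicative rate change.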


Figure 3. Eighth-note timing profiles for the initial 8 bars (first renditions) at three different tempi, averaged across the three performances within each tempo category.


Nevertheless, further statistical analysis revealed that there were some reliable changes in temporal profiles across tempi. Analyses of variance were first conducted on each pianist's complete log-transformed IOI data, with IOI (214 levels; see footnote 4) and tempo (3 levels) as fixed factors, and individual performances nested within tempi (3 levels) as the random factor. If the log-transformed timing profiles are strictly parallel, then the IOI by tempo interaction should be nonsignificant. In other words, the between-tempo variation in profile shape should not significantly exceed the within-tempo variation. For LPH, this was indeed the case [F(426,1278) = 1.03]. For BHR, however, there was a small but significant interaction [F(426,1278) = 1.55, p < .0001]. Moreover, when similar analyses were conducted on bars 0-8, corresponding to the excerpts used in the perceptual test (though with all three performances at each tempo included), the IOI by tempo interaction was significant for both LPH [F(96,288) = 1.57, p < .003] and BHR [F(96,288) = 1.77, p < .0003]. Additional analyses confirmed that BHR showed subtle changes in timing pattern with tempo throughout most of the music, whereas LPH showed no statistically reliable changes with tempo except at the very beginning of the piece.6

Yet another way of analyzing the data revealed, however, that LPH did show some systematic deviations from relational invariance, after all. For pairs of tempi, the log-transformed average IOIs at the faster tempo were subtracted from the corresponding IOIs at the slower tempo, and the correlation between these differences and the IOIs at the slower tempo was examined. The relational invariance hypothesis predicts that the log-difference (i.e., the ratio) between corresponding IOIs should be constant, and hence the correlation with IOI magnitude should be zero. However, LPH showed a highly significant positive correlation for medium versus fast tempo [r(252) = 0.50, p < .0001], which indicates that long IOIs changed disproportionately more than short IOIs. BHR showed a similar but smaller tendency, which was nevertheless significant [r(252) = 0.18, p < .01]. Between the slow and medium tempi, LPH showed a weak tendency in the opposite direction, whereas BHR showed no significant correlation. Very similar results were obtained for bars 1-8 only; thus the correlations did not derive solely from the very long IOIs. Relational invariance evidently held to a greater degree between the slow and medium tempi than between the medium and fast tempi.

Thus there is statistical evidence that relational invariance did not hold strictly, as observed also by Heuer (1991) in his discussion of simpler motor tasks. Still, the timing patterns were highly similar across the different tempi, and the differences that did occur do not suggest different structural interpretations. Moreover, they were not sufficiently salient perceptually to enable listeners to discriminate original performances from performances whose tempo had been modified artificially. (Since there was no evidence that changes with tempo were larger later in the piece than at the beginning, the results of the perceptual test for bars 0-8 probably can be generalized to the whole performance.) It seems fair to conclude, then, that relational invariance of timing profiles held approximately in these performances.

Grace note timing patterns

The analysis so far has focused on eighth-note (and longer) IOIs only. However, "Träumerei" also contains grace notes. The question of interest was whether the timing of the corresponding tones also changed proportionally with overall tempo. There are two types of grace notes in the piece. Those of the first type, notated as small eighth-notes, are important melody notes occurring during major ritardandi. The slowdown in tempo lets them "slip in" without disturbing the rhythm. One of these notes occurs in bar 8 (between the fourth and fifth eighth-notes); the other is the upbeat to the recapitulation in bar 17 (following the last eighth-note in bar 16). The other type, notated as pairs of small sixteenth-notes, represents written-out left-hand arpeggi (bars 2, 6, and 18) and is played correspondingly faster. While proportional scaling of the first, slow type was to be expected because they are essentially part of the expressive timing profile, the prediction for the second type was less clear. However, neither type is readily classifiable in terms of Desain and Honing's (1992b) appoggiatura-acciaccatura distinction.

Several ANOVAs were conducted on the log-transformed IOIs defined by these grace notes. Separate analyses of bars 8 and 16 included three IOIs: the preceding eighth-note IOI and the two parts of the eighth-note IOI bisected by the onset of the grace note. In neither case was there a significant IOI by tempo interaction.7 Thus the timing of these grace notes scaled proportionally with changes in tempo.

LPH and BHR differed in their timing patterns: In bar 8, the grace note occupied 68% (LPH) versus 57% (BHR) of the fourth eighth-note IOI; in bar 16, it occupied 55% (LPH) versus 37% (BHR) of the last eighth-note IOI. BHR thus took the grace-note notation somewhat more literally than did LPH.

Bars 2 and 18 were analyzed separately from bar 6, which involved different pitches (cf. Figure 1), though the timing patterns were found to be very similar. In each case there were three IOIs, defined by the onset of the second eighth-note in the right hand, the onsets of the two grace notes in the left hand, and the onset of the highest note of the chord in the right hand. In no case was there a significant IOI by tempo interaction.8 Thus, even for these arpeggio grace notes, relational invariance seemed to hold. There were again individual differences between the two pianists: In bars 2, 6, and 18, the two grace notes together took up 68% (LPH) versus 79% (BHR) of the second eighth-note IOI, while the second grace note occupied 63% (LPH) versus 69% (BHR) of the grace note interval. Both pianists maintained their individual patterns across bars 2, 6, and 18. Both individual timing patterns are within the range of typical values observed in Repp's (1992) analyses of famous pianists' performances.9

Chord asynchronies

One of the factors that might enable listeners to discriminate a slowed-down fast performance from an originally slow one is that the asynchronies among nominally simultaneous tones are enlarged in the former. It was surprising, therefore, that there was no indication in the perceptual data that slowed-down fast performances sounded worse than speeded-up slow ones. There is no obvious reason why unintended vertical asynchronies (i.e., purely technical inaccuracies) should scale with tempo. However, asynchronies that are intended for expressive purposes (cf. Vernon, 1937; Palmer, 1989) might possibly increase as tempo decreases. Rasch (1988) reports such a tendency for asynchronies in ensemble playing.

A complete analysis of asynchronies was beyond the scope of this paper. Instead, the analysis focused on selected instances only. One good candidate was the four-tone chord near the beginning of each phrase (see Figure 1). This chord occurs eight times during the piece, seven times in identical form in F major and once (bar 13) transposed up a fourth to B-flat major. It does not contain any melody tones, which eliminates one major motivation for expressive asynchrony (cf. Palmer, 1989). However, it is technically tricky because it involves both hands in interleaved fashion: The first and third tones (numbering from top to bottom) are taken by the right hand, while the second and fourth tones are taken by the left hand. In addition to unintended asynchronies that may be caused by this physical constellation, planned asynchronies may be involved in the proper "voicing" of the chord, which is a sonority of expressive importance.

For each of the eight instances of the chord in each performance, the IOIs for the three lower tones (Tones 2-4) were calculated relative to the highest tone (Tone 1).10 Lag times were positive whereas lead times were negative. The untransformed IOIs were subjected to separate ANOVAs for each pianist, with the factors rendition (i.e., the 8 occurrences of the chord), tone (3), and tempo (3); performances (3, nested within tempo) were the random variable. The question here was whether the IOIs changed at all with tempo.

The ANOVA results were similar for the two pianists. Both LPH [F(1,6) = 70.94, p < .0003] and BHR [F(1,6) = 84.71, p < .0002] showed a highly significant grand mean effect, indicating that the average IOI was different from zero: The three lower tones lagged behind the highest tone by an average of 16.9 ms (LPH) and 9.6 ms (BHR), respectively. In addition, each pianist showed a rendition main effect [LPH: F(7,42) = 3.58, p < .005; BHR: F(7,42) = 12.40, p < .0001], a tone main effect [LPH: F(2,12) = 12.80, p < .002; BHR: F(2,12) = 49.18, p < .0001], and a rendition by tone interaction [LPH: F(14,84) = 2.56, p < .005; BHR: F(14,84) = 4.34, p < .0001]. However, no effects involving tempo were significant. In particular, there was no indication that lags were larger at the slow tempo. The high significance levels of the other effects show that the data were not simply too variable for systematic effects of tempo to be found.

The systematic differences in patterns of asynchrony across renditions are of theoretical interest and warrant a brief discussion. They are shown in Figure 4, averaged across all performances. The mean lags for individual tones indicated that, for both pianists, the lowest (fourth) tone lagged behind more than the other two. For BHR, moreover, the third tone did not lag behind at all, on the average. This was clearly a reflection of the fact that it was played with the same hand as the highest (first) tone.

Figure 4. Chord asynchronies relative to the highest tone, averaged across all tempi. The bars for each chord show lag times for tones 2-4 (in order of decreasing pitch) relative to tone 1. "R" after a bar number stands for "repeat."

Both pianists showed the largest lag times in bar 1, which is perhaps a start-up effect of "getting the feel" of the instrument. Again for both pianists, the only rendition in which any lower tones preceded the highest tone was the one in bar 13, which is in a different key. The precedence of the third tone here may be due to the fact that it falls on a black key, which is more elevated than the white keys and hence is reached earlier by the thumb of the right hand. Apart from this exceptional instance, BHR maintained his characteristic 1-3-2-4 pattern across all renditions, though the magnitude of the lags varied somewhat. LPH was more variable, showing an average 1-2-3-4 pattern in four renditions, a 1-3-2-4 pattern in two, and a 1-2-4-3 pattern in one.

It could well be that the chord asynchronies just analyzed did not vary with tempo because they were not governed by expressive intent, only by fingering and hand position. The chord examined did not contain any melody tones and thus did not motivate any highlighting of voices through deliberate timing offsets. Therefore, a second set of chords was examined. The tone clusters on the second and third eighth-notes of bars 10 and 14 (see Figure 1) also contain four tones each (three in one instance), but all of them have melodic function. The soprano voice completes the primary melodic gesture; the alto voice "splits off" into a counterpoint; the tenor voice enters with an imitation of the primary melody, accompanied by the bass voice, which has mainly harmonic function. Thus the relative salience of the voices may be estimated as 1-3-2-4, which may well be reflected in a sensitive pianist's relative timing of the tone onsets. Unlike the chords examined previously, tones 1 and 2 are played by the right hand, whereas tones 3 and 4 are played by the left hand. Since the highlighting of the tenor voice entry (left hand) may be the performer's primary goal, the temporal relationship between the two hands may be the primary variable here.

This indeed turned out to be the case, though with striking differences between the two pianists. LPH changed her pattern of asynchronies radically (and somewhat inconsistently) with tempo in bar 10; otherwise, however, the patterns did not vary significantly with tempo. The main effects of the tone factor, on the other hand, were generally significant, indicating consistency in patterning for each chord.

The results are shown in Figure 5. As can be seen, LPH showed enormous asynchronies which were almost certainly intended for expressive purposes. They alternated between two patterns: The left hand (tones 3 and 4) either lagged behind or led the right hand (tones 1 and 2). The right-before-left pattern occurred in positions 10-2 (slow tempo) and 14-2. The left-before-right pattern occurred in positions 10-3 (at both slow and fast tempo) and 14-3, as well as in position 10-2 at the fast tempo. The medium-tempo patterns in positions 10-2 and 10-3 (not shown in the figure) were split among the two alternatives. The two tones in each hand were roughly simultaneous, except in position 14-3, where the lowest note occurred especially early. The net effect in bar 14 and, less consistently, in bar 10 was that the tenor and bass voices entered late but continued early, so that the first IOI in these voices was considerably shorter than the corresponding IOI in the soprano voice. This makes good sense: While the soprano voice slows down towards the end of a gesture, the tenor voice initiates the imitation with a different temporal regime. The discrepancy thus highlights the independent melodic strains.

The statistical analysis was conducted on the left-hand lag times only, because they showed the major variation. Due to LPH's strategic changes with tempo in bar 10, there were highly significant interactions with tempo in the overall analysis of her data. A separate analysis on bar 14, however, revealed no effects involving tempo. The difference between positions 14-2 and 14-3 was obviously significant, as was the effect of tone [F(1,6) = 19.57, p < .005] and the position by tone interaction [F(1,6) = 28.34, p < .002].

BHR, on the other hand, showed a very different pattern. His asynchronies were much smaller and always showed a slight lag of the left hand, more so in bar 10 than in bar 14. His patterns were highly consistent but reveal little expressive intent, as they are comparable in magnitude to the asynchronies in the nonmelodic chord analyzed earlier. The statistical analysis on the left-hand lag times showed the difference between bars 10 and 14 to be highly reliable [F(1,6) = 24.51, p < .003]. The tendency for the bass voice to lag behind the tenor voice in bar 10, but to lead it in bar 14, was reflected in a significant interaction [F(1,6) = 12.66, p < .02]. No effect involving tempo was significant, however. Thus, neither pianist's patterns scaled with tempo. Relational invariance does not seem to hold at this level; instead, there is absolute invariance across tempi, except for the qualitative changes noted in some of LPH's data.

Figure 5. Chord asynchronies in bars 10 and 14, averaged across all tempi except as noted. The second tone is absent in position 14-2.


Tone overlap

Another detailed feature that was examined, again in a selective manner, was tone overlap. MacKenzie and Van Eerd (1990), in their study of piano scale playing, found that the absolute overlap of successive tones in legato scales decreased as tempo increased. At the slowest tempo (nominally similar to the present slow tempi, but really twice as fast because the scales moved along in sixteenth-notes) successive tones overlapped by about 15 ms, which decreased to near zero as tempo increased. Because of the expressive legato required in "Träumerei," and the slower rate of tones (eighth-notes), larger overlaps were to be expected. The question was whether they got smaller as tempo increased. The measure of overlap was the difference between the offset time of one tone and the onset time of the following tone; a negative value indicates a gap between the tones (which may not be audible because of pedalling).

The passage selected for analysis was the ascending eighth-note melody in the soprano voice which occurs first in bars 1/2 and recurs in identical or similar form 7 times during the piece (see Figure 1). Because of different fingering and an additional voice in the right hand, the instances in bars 9/10 and 13/14 were not included. Moreover, because overlap times varied wildly in the last interval of bar 22 (preceding the fermata) for both pianists, the data were restricted to the first four instances only (i.e., bars 1/2 and 5/6, and their repeats). Each of these consists of 5 consecutive eighth-notes, yielding 4 overlap times. The last two notes span the interval of a fourth in two instances (bars 1/2) and a major sixth in the other two (bars 5/6); otherwise, the notes are identical. The fingering is likely to be the same. The statistical analyses (on untransformed data) included bars as a factor, as well as renditions (here, strict repeats), tones, and tempo. Again, the question was whether the overlap times varied with tempo at all.

LPH's data were quite a surprise because her overlap times were an order of magnitude larger than expected. Not only did she play legatissimo throughout, but she often held on to the second and fourth tones through the whole duration of the following tone, resulting in overlap times in excess of 1 s. Since these overlap times thus became linked to the regular eighth-note IOIs, it is not surprising that effects of tempo were found. The effects were not simple, however. The ANOVA showed all main effects and interactions involving bar, tone, and tempo (except for the bar by tone interaction) to be significant. The data corresponding to the triple interaction [F(6,18) = 4.04, p < .01] are shown in Figure 6 below. The data are averaged over renditions, which had only a main effect [F(1,6) = 10.30, p < .02], representing an increase in overlap with repetition.

Perhaps the most interesting fact about LPH's data is that the overlap pattern differed between bars 1/2 and 5/6 from the very beginning, even though the two tone sequences were identical up to the last tone. There was slightly more overlap between the first two tones in bars 1/2 than in bars 5/6. While this may not be a significant difference, there is no question that the overlap between the following two tones was much greater in bars 1/2 than in bars 5/6. The same is true for the next two tones, although there is much less overlap in absolute terms. Finally, the situation is reversed for the last two tones, which show enormous overlap in bars 5/6, but much less in bars 1/2. The tempo effects within this striking interaction are fairly consistent (overlap decreasing with increasing tempo), except for the third and fourth tones in bars 1/2, whose overlap increases with tempo, an unexplained anomaly.

BHR's overlap times, in contrast to LPH's, were of the expected magnitude, and there was no significant effect involving tempo. However, the main effects of bar and tone, and the bar by tone interaction [F(3,18) = 24.90, p < .0001] were all highly significant. As for LPH, the difference in final interval size affected the overlap pattern of the whole melodic gesture. BHR showed negative overlap (i.e., a gap, camouflaged by pedalling) between the first two tones, which was larger in bars 5/6 than in bars 1/2. The overlap between the second and third tones was larger in bars 5/6, but that between the third and fourth tones was larger in bars 1/2. Finally, there was overlap between the last two tones in bars 1/2, but a small gap in bars 5/6, probably reflecting the stretch to the larger interval of a sixth. BHR's playing style thus can be seen to be only imperfectly legato, perhaps due to his less developed technique. Nevertheless, he was highly consistent in his imperfections, and his data suggest independence of tempo.

Pedal timing

As the last temporal facet of these performances, we consider the timing of the sustain pedal. Although variations in pedal timing within tone IOIs probably had little if any effect on the acoustic output, they are of interest from the perspective of motor organization and control.

[Figure 6 (graphs not reproduced): upper panel LPH (overlap times roughly 0-1600 ms), lower panel BHR (roughly -80 to +60 ms); x-axis: tone pairs 1-2, 2-3, 3-4, 4-5 in bars 1/2 and bars 5/6.]

Figure 6. Tone overlap times in bars 1/2 and 5/6 at s(low), m(edium), and f(ast) tempo.

Relational Invariance of Expressive Microstructure across Global Tempo Changes in Music Performance 187

The pianist must coordinate the foot movements with the hand movements, and this is usually done subconsciously, while attention is directed towards the keyboard.

A complete analysis of the pedalling patterns would lead too far here, as the pedal was used continuously throughout each performance.11 The analysis focused on the opening motive of each phrase, specifically on the interval between the downbeat melody tone and the following 4-note chord, whose highest tone was used as the temporal reference (see Figure 1). This quarter-note IOI, which occurred eight times in the course of each performance (in bars 1, 5, 1R, 5R, 9, 13, 17, and 21, where "R" stands for "repeat"), usually contained a pedal change (i.e., a pedal offset followed by a pedal onset) in both pianists' performances, though BHR's pedal offsets generally fell very close to the onset of the downbeat tone, often preceding it slightly. Exceptions were bars 1, 1R, and 9, which usually contained only a pedal onset, the previous offset being far outside the IOI or absent (at the beginning of the piece). Only the pedal onsets are marked in the score. The pedal serves here to sustain the bass note struck on the downbeat, which a small left hand needs to relinquish in order to reach its portion of the following chord. In bar 13, in fact, the bass note can be sustained only by pedalling, as it occurs in a lower octave. The question of interest was whether, across the variations in tempo, the pedal actions occurred at a fixed time after the onset of the quarter-note IOI or whether their timing varied in proportion with tempo, so that they occurred at a fixed percentage of the time interval.

Figure 7 shows the average quarter-note IOI durations (squares) at the three tempi in the different bars. As expected, there was generally a systematic decrease in IOI duration from slow to medium to fast tempo [LPH: F(2,6) = 12.85, p < .007; BHR: F(2,6) = 88.25, p < .0001]. Both pianists also showed a highly significant main effect of bar on IOI duration [LPH: F(7,42) = 21.36, p < .0001; BHR: F(7,42) = 47.38, p < .0001]: The longest IOIs occurred in bar 17 (the beginning of the recapitulation); LPH also had relatively long IOIs in bar 21 and short ones in bar 1, whereas BHR had relatively long IOIs in bar 13 (the passage in a different key). There was a small tempo by bar interaction for BHR only [F(14,42) = 2.14, p < .03].

Figure 7 also shows average pedal offset and onset times, as well as bass tone offset times. The two pianists differed substantially in their pedalling strategies. LPH's pedal offsets (when present) occurred well after IOI onset, and her pedal onsets occurred around the middle of the IOI. BHR's pedal offsets, on the other hand, occurred at the beginning of the IOI, and his pedal onsets were relatively early also. Another striking difference between the two pianists is that LPH held on to the bass tone (as indicated by the absence of dashes in the figure), except in bar 13 where this was physically impossible. BHR, on the other hand, despite his larger hands, always released the bass tone following the pedal depression.

For LPH, pedalling did not seem to vary systematically with tempo.12 This was confirmed in ANOVAs on pedal offset and onset times, as well as on their difference (pedal off time). None of these three variables showed a significant tempo effect or a tempo by bar interaction. However, all three varied significantly across bars [F(4,24) = 8.57, p < .0003; F(7,42) = 15.43, p < .0001; F(4,24) = 10.00, p < .0002]. For BHR, on the other hand, both pedal offset times [F(2,6) = 10.06, p < .02] and pedal onset times [F(2,6) = 6.57, p < .04] decreased as tempo increased, whereas pedal off times did not vary significantly. All three varied significantly across bars [F(4,24) = 6.09, p < .002; F(6,36) = 12.58, p < .0001; F(4,24) = 22.03, p < .0001], even though the differences were much smaller than for LPH. Even more clearly than the pedalling times, BHR's bass tone offset times varied with tempo [F(2,6) = 27.26, p < .001], and also across bars [F(7,42) = 14.14, p < .0001].

To determine whether BHR's timings exhibited relational invariance, they were expressed as percentages of the total IOI and the ANOVAs were repeated. Of course, the pedal offset times, which varied around the IOI onset, could not be effectively relativized in this way, so that the tempo effect persisted [F(2,6) = 9.36, p < .02]. The tempo effect on pedal onset times, on the other hand, disappeared [F(2,6) = 0.37], whereas a tempo effect on pedal off times emerged [F(2,6) = 12.62, p < .008]. Thus BHR's data could be interpreted as either exhibiting relational invariance for pedal onset times, or absolute invariance of pedal off times. There was no question, however, that BHR's bass tone offset times scaled proportionally with tempo, as there was no tempo effect on the percentage scores [F(2,6) = 0.71]. All relativized measures, on the other hand, continued to vary significantly across bars [F(4,24) = 4.62, p < .007; F(6,36) = 17.81, p < .0001; F(4,24) = 23.86, p < .0001; F(7,42) = 18.90, p < .0001]. This was equally true for LPH's data.
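The relativization step described here (re-expressing each pedal or bass tone event as a percentage of the quarter-note IOI before repeating the ANOVAs) distinguishes the two competing hypotheses: a proportionally scaled event keeps a constant percentage across tempi, whereas an event occurring at a fixed absolute time does not. A minimal sketch, with invented timing values rather than the measured data:

```python
# Sketch of relativization as a test of relational invariance.
# All IOI durations and event times below are invented for illustration.

ioi_ms = {"slow": 1400.0, "medium": 1100.0, "fast": 900.0}

# Hypothetical pedal onset times after IOI onset (ms):
proportional_onset = {t: 0.40 * d for t, d in ioi_ms.items()}  # scales with tempo
absolute_onset = {t: 450.0 for t in ioi_ms}                    # fixed duration

def as_percent_of_ioi(onsets, iois):
    """Re-express each onset time as a percentage of its IOI."""
    return {t: 100.0 * onsets[t] / iois[t] for t in iois}

rel_prop = as_percent_of_ioi(proportional_onset, ioi_ms)
rel_abs = as_percent_of_ioi(absolute_onset, ioi_ms)

print(rel_prop)  # relational invariance: 40.0 at every tempo
print(rel_abs)   # absolute invariance: percentage rises as the IOI shrinks
```

After relativization, a surviving tempo effect on the percentages argues against proportional scaling, which is the logic applied to BHR's pedal and bass tone times above.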

[Figure 7 (graphs not reproduced): upper panel LPH, lower panel BHR; symbols: chord on (squares), bass off, pedal on, pedal off; y-axis: time in ms (0-2000); x-axis: bar number (1, 5, 1R, 5R, 9, 13, 17, 21) at s, m, f tempi.]

Figure 7. Absolute pedal offset and onset times and bass tone offset times within the first (quarter-note) IOI in each of eight bars, at s(low), m(edium), and f(ast) tempo. Each IOI starts at 0 and ends with the onset of (the highest tone of) the chord. Missing symbols reflect incomplete or absent data.

In summary, these data do not offer much support for the hypothesis of relational invariance in pedal timing across tempo changes. LPH's data were highly variable. BHR's data are consistent with an alternative hypothesis: that the pedal off-on action (a rapid up-down movement with the foot) was independent of tempo but was initiated earlier at the faster tempo. This earlier start may be a reflection of a general increase in muscular tension.

The results are much clearer with regard to the differences in pedalling times across bars. Clearly, these effects are not relationally invariant and thus must be caused by structural or expressive factors that vary from bar to bar. This may be the first time that such systematic variation has been demonstrated in pedalling behavior, an interesting subject for future studies.

Intensity microstructure

A detailed analysis of the intensity microstructure of the performances would lead too far in the present context. To verify, however, that both pianists showed meaningful patterns of intensity variation, Figure 8 shows the intensity profiles (strictly speaking, MIDI velocity profiles) of the melody notes (soprano voice) averaged across the three medium-tempo performances of each pianist (and averaged across the two renditions of bars 1-8). The format is similar to that of Figure 2, with the corresponding phrases superimposed.
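The profile construction described for Figure 8 (averaging the MIDI velocities of corresponding melody notes across the three medium-tempo performances) amounts to an element-wise mean. A minimal sketch, with invented velocity values rather than the measured data:

```python
# Element-wise averaging of MIDI velocity profiles over repeated
# performances of the same melody. Velocities are invented for
# illustration, not measured data.

performances = [
    [62, 68, 74, 71, 65],  # performance 1: velocities of 5 melody notes
    [60, 70, 76, 69, 63],  # performance 2
    [64, 69, 75, 70, 64],  # performance 3
]

# zip(*performances) groups the velocities of each note across performances.
profile = [sum(vals) / len(vals) for vals in zip(*performances)]
print(profile)
```

Averaging over renditions in the same way then yields one profile per pianist, as plotted in Figure 8.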

[Figure 8 (graphs not reproduced): four panels, LPH above BHR; overlaid phrases include Bars 17-20 and Bars 21-24; y-axis: MIDI velocity (about 20-90); x-axis: bar/eighth-note.]

Figure 8. Average MIDI velocity as a function of metric distance, averaged across three medium-tempo performances. Structurally identical or similar phrases are overlaid. Only adjacent eighth-notes are connected.

The intensity patterns were slightly more variable than the timing patterns. Nevertheless, both pianists showed structurally meaningful patterns that were similar across phrases of similar structure.13 For example, the salient melodic ascent in bars 1-2, 5-6, and so on, always showed a steep rise in intensity which leveled off or fell somewhat as the peak was reached. The motivic chain during the second part of each phrase showed modulations related to the motivic structure, especially in the phrase type shown in the left-hand panels, and a steady decrescendo in the phrase type shown in the right-hand panels. Both pianists played bars 17-20 somewhat more softly than the identical bars 0-4, and ended the final phrase (bars 21-24) even more softly (left-hand panels). LPH, but not BHR, played bars 9-12 more strongly than bars 5-8 (right-hand panels). Both pianists played bars 13-16 more softly than bars 9-12; BHR especially observed the pp marking in bar 12 in the Clara Schumann edition (cf. Figure 1).

Given these fairly orderly patterns, the question of interest was: Did they vary systematically with tempo? Before addressing this issue, however, it may be asked whether the average dynamic level was equal across tempi. According to Todd (1992), a faster tempo may be coupled with a higher intensity. Figure 9 shows that this was indeed the case in the present performances: For both LPH and BHR, the average intensity of the soprano voice increased from slow to fast and from medium to fast tempo; there is no clear difference between slow and medium for LPH. The magnitude of the change is not large: about 2 MIDI units, which translates into about 0.5 dB (Repp, 1993).

However, there is a second systematic trend in the data for both pianists: In the course of the recording session, the average intensity dropped by an amount equal to (BHR) or larger than (LPH) that connected with a tempo change. Interestingly, there was also a tendency for the tempi to slow down across repeated performances (see Repp, in press), but that decrease was much smaller than the difference among the principal tempo categories. Both trends may reflect the pianists' progressive relaxation and adaptation to the instrument; possibly also fatigue. Finally, it is clear from Figure 9 that BHR played more softly, on the average, than LPH.14

Because of the within-tempo variation in average intensity, the between-tempo differences were nonsignificant, though they appeared to be systematic. For the same reason, the statistical assessment of variations in intensity patterns across tempi was somewhat problematic, for if that variation was contingent on overall intensity (level of "tension") there was no reason to expect it to be larger between than within tempi. ANOVAs were nevertheless carried out. Across all melody tones (N=166), the tone by tempo interaction fell short of significance for LPH [F(330,990) = 1.13, p < .08] and was clearly nonsignificant for BHR. Across the melody tones of the excerpt used in the perceptual test (bars 0-8, N=42), the tone by tempo interaction reached significance for LPH [F(82,246) = 1.49, p < .01] but remained clearly nonsignificant for BHR.15

[Figure 9 (graph not reproduced): legend LPH-1, LPH-2, LPH-3, BHR-1, BHR-2, BHR-3; y-axis: average MIDI velocity (about 48-62); x-axis: tempo (slow, medium, fast).]

Figure 9. Grand average MIDI velocity as a function of tempo, separately for each pianist's three performances at each tempo.


Figure 10 shows the average intensity profiles for the melody tones in bars 0-8 at the three tempi for each pianist. The similarities among the three profiles are far more striking than the differences. For LPH, one source of the significant tone by tempo interaction is evident at the very beginning: The dynamic change from the upbeat to the downbeat was far larger at the fast tempo than at the slow tempo. (Curiously, BHR showed the opposite.) On the whole, however, the intensity profiles remained invariant across tempi.

[Figure 10 (graphs not reproduced): four panels, LPH above BHR; legend: slow, medium, fast; y-axis: MIDI velocity (about 20-90); x-axis: bar/eighth-note for bars 0-8.]

Figure 10. Average MIDI velocity profiles for bars 0-8 at three different tempi.

GENERAL DISCUSSION

The present study explored in a preliminary way the question of whether the expressive microstructure of a music performance remains relationally invariant across moderate changes in tempo. The results suggest that the onset timing and intensity profiles essentially retain their shapes (the latter being very nearly constant), whereas other temporal features are largely independent of tempo and hence exhibit local constancy rather than global relational invariance.

After a perceptual demonstration that a uniform tempo transformation does no striking damage to the expressive quality of performances of Schumann's "Träumerei," six specific aspects of the expressive microstructure were examined. The first and most important one was the pattern of tone onset times. That pattern was highly similar across tempi, yet deviated significantly from proportionality. Although relational invariance did not hold perfectly, the deviations seemed small and not readily interpretable in terms of structural reorganization. The second aspect examined was the timing of grace notes. Somewhat surprisingly, in view of the observations by Desain and Honing (1992a, 1992b) that stimulated the present study, grace notes were timed in a relationally invariant manner across tempi. This may have been due to their relatively slow tempo and expressive function; the result may not generalize to ornaments that are executed more rapidly. The third aspect concerned selected instances of onset asynchrony among the tones of chords. These asynchronies did not change significantly with tempo, even when they were rather large and served an expressive function; apparently, they were "anchored" to a reference tone. The fourth aspect was the overlap among successive legato tones. When this overlap was of the expected magnitude (pianist BHR), it was independent of tempo. For one pianist (LPH), the overlaps were unusually large and did vary with tempo. The fifth aspect examined was pedal timing. It did not seem to depend on tempo in a very systematic way and had no audible effect in any case. Finally, the intensity microstructure was studied. Although overall intensity increased somewhat with tempo, the detailed intensity pattern remained nearly invariant.

In no case were there striking deviations from relational invariance. Those features that were found not to scale proportionally with tempo were generally on a smaller time scale, so that artificially introduced proportionality (in the perceptual test) was difficult to detect. Because of the limited contribution of these details to the overall impression of the performances, it may be concluded that relational invariance of expressive microstructure held approximately across the tempi investigated here. This is consistent with the notion that musical performance is governed by a "generalized motor program" (Schmidt, 1975) or "interpretation" that includes a variable rate parameter. Although there are good reasons for expecting the interpretation to change when tempo changes are large (Clarke, 1985; Handel, 1986), the tempo range investigated here apparently could accommodate a single structural interpretation.

All microstructural patterns examined exhibited large and highly systematic variation as a function of musical structure, as well as clear individual differences between the two pianists. While the data on timing microstructure (including grace notes) are consistent with the detailed analyses in Repp (1992), the observations on chord asynchrony, tone overlap, pedalling, and intensity microstructure go well beyond these earlier analyses. However, since the focus of the present study was on effects of tempo, the structural interpretation of microstructural variation was not pursued in great detail here. Suffice it to note once again the pervasive influence of musical structure on performance parameters and the astounding variety of individual implementations of what appear to be qualitatively similar structural representations.

Despite certain methodological parallels, the present study is diametrically opposed to the scale-playing experiment of MacKenzie and Van Eerd (1990), which essentially focused on pianists' inability to achieve a technical goal (viz., mechanical exactness). By contrast, the primary concern here was pianists' success in realizing the artistic goal of expressive performance. The relative invariances observed in timing and intensity patterns were cognitively governed, not due to kinematic constraints. However, both types of studies have their justification and provide complementary information. At some of the detailed levels examined here (chord asynchronies, tone overlap) motoric constraints such as fingering patterns clearly had an influence. Thus, even within a slow, highly expressive performance, there are aspects that are governed to some extent by "raw technique," and ultimately the technical skills that help an artist to achieve subtlety of expression may be the very same that help her/him to play scales smoothly.
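The scaling argument here (proportional timing implies absolute changes that grow with interval size, so proportionality imposed on small intervals produces only small, hard-to-detect absolute changes) can be illustrated with simple arithmetic; the fraction and beat durations below are invented for illustration:

```python
# Under relational invariance an expressive deviation keeps a fixed
# fraction of the beat, so its absolute size scales with tempo.
# The fraction and beat durations are invented for illustration.

fraction = 0.05  # hypothetical expressive deviation: 5% of the beat

for label, beat_ms in [("slow", 1400.0), ("fast", 900.0)]:
    deviation_ms = fraction * beat_ms
    print(label, deviation_ms)  # 70 ms at the slow tempo, 45 ms at the fast

# Applied to a beat-length interval the change is tens of milliseconds,
# but the same proportion applied to a 30-ms chord asynchrony changes
# it by only about 10 ms, which is hard to detect perceptually.
```

This is also the point made in footnote 1: for intervals to remain proportional across tempi, their absolute differences must be larger at slow than at fast tempi.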

However, the technique serves its true purpose only in the context of an expressive performance and therefore is perhaps better studied in that context also.

The present study has limitations, the most obvious of which is that only a single piece of music was examined. Clearly, the character of "Träumerei" is radically different from that of Salieri's "Nel cor più non mi sento," whose timing microstructure Desain and Honing (1992a) found to change considerably with tempo. "Träumerei" is slow, expressive, entirely legato, rhythmically vague, and contains few fast notes; even the grace notes are timed deliberately. The Salieri ditty, on the other hand, is moderately fast, requires a detached articulation and rhythmical precision, and contains a number of very fast grace notes (acciaccature) that may resist further reduction in duration. Clearly, it would be wrong to conclude from the present findings that expressive microstructure always remains invariant across tempo changes, even to a first approximation. Desain and Honing's results, in spite of their informality, already seem to provide a counterexample.16 So do perhaps Clarke's (1982, 1985) findings. The conclusion seems justified, however, that in some types of music relational invariance is maintained to a large extent. This music is probably the kind that carries expressive gestures on an even rhythmic flow. Rhythmically differentiated music containing many short note values, rests, and varied articulation (such as examined by Desain and Honing, 1992a) is probably much more susceptible to changes with tempo.

In fact, it would be both more precise and more cautious to say that relational invariance can be maintained in certain kinds of music. The present pianists, although they did not try very consciously to maintain their "interpretation" across tempi but instead attempted to play as naturally and expressively as possible, nevertheless refrained from "changing their interpretation" in the course of the recording session, whatever that may imply in an objective sense. In real-life music performance, on the other hand, changes in tempo are probably more often a consequence of a change in an artist's conception of the music than a deliberate change in the tempo as such. Tempo differences observed among different artists are often to be understood in this way. What the present findings imply, however, is that if different artists were forced to play at the same overall tempo, without giving them time to reconsider their "interpretation" of the piece, their expressive microstructure would probably be just as different as it was before the tempi were equalized. In other words, tempo as such probably accounts only for a very small proportion of the variance among different performers' interpretations. Or, to put it yet another way, interpretation usually influences tempo, but tempo does not necessarily influence interpretation.

It will of course be desirable to employ a larger number of musicians in future studies, as well as a better instrument. The present two pianists' performances, however, seemed adequate enough for the purpose of the study. The fact that one pianist (BHR) was not professionally trained is not perceived to be a problem in view of the fact that his data were generally more consistent and less variable than those of the professional pianist, LPH. LPH's greater variability may be attributed to three sources: First, she had less time to adapt to the instrument; second, she had not played "Träumerei" recently; and third, as a professional artist she was not obliged to obey the constraints of a quasi-experimental situation. BHR, on the other hand, was more familiar with the Roland keyboard, had practiced Schumann's "Kinderszenen" less than a year ago, and, as an experimental psychologist, was used to exhibiting consistent behavior in the laboratory.17

In summary, the present findings suggest that for certain types of music it is possible to change the tempo within reasonable limits, either naturally in performance or artificially in a computer, without changing the quality and expression of a performance significantly. This suggests that the very complex motor plan that underlies artistic music performance contains something like a variable rate parameter that permits it to be executed at a variety of tempi while serving the same underlying cognitive structure. Various small-scale performance details, however, may be locally controlled and unaffected by the rate parameter. Thus local constancy appears to be nested within global flexibility in music performance.

REFERENCES

Clarke, E. F. (1982). Timing in the performance of Erik Satie's 'Vexations'. Acta Psychologica, 50, 1-19.
Clarke, E. F. (1985). Structure and expression in rhythmic performance. In P. Howell, I. Cross, & R. West (Eds.), Musical structure and cognition (pp. 209-236). London: Academic Press.
Clynes, M., & Walker, J. (1986). Music as time's measure. Music Perception, 4, 85-120.


Desain, P., & Honing, H. (1992a). Tempo curves considered harmful. In P. Desain & H. Honing (Eds.), Music, mind, and machine (pp. 25-40). Amsterdam: Thesis Publishers.
Desain, P., & Honing, H. (1992b). Towards a calculus for expressive timing in music. In P. Desain & H. Honing (Eds.), Music, mind, and machine (pp. 175-214). Amsterdam: Thesis Publishers.
Desain, P., & Honing, H. (in press). Does expressive timing in music performance indeed scale proportionally with tempo? Psychological Research.
Gentner, D. R. (1987). Timing of skilled motor performance: Tests of the proportional duration model. Psychological Review, 94, 255-276.
Guttmann, A. (1932). Das Tempo und seine Variationsbreite. Archiv für die gesamte Psychologie, 85, 331-350.
Handel, S. (1986). Tempo in rhythm: Comments on Sidnell. Psychomusicology, 6, 19-23.
Heuer, H. (1991). Invariant relative timing in motor-program theory. In J. Fagard & P. H. Wolff (Eds.), The development of timing control and temporal organization in coordinated action (pp. 37-68). Amsterdam: Elsevier.
MacKenzie, C. L., & Van Eerd, D. L. (1990). Rhythmic precision in the performance of piano scales: Motor psychophysics and motor programming. In M. Jeannerod (Ed.), Attention and performance XIII (pp. 375-408). Hillsdale, NJ: Erlbaum.
Michon, J. A. (1974). Programs and "programs" for sequential patterns in motor behavior. Brain Research, 71, 413-424.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331-346.
Rasch, R. A. (1988). Timing and synchronization in ensemble performance. In J. A. Sloboda (Ed.), Generative processes in music (pp. 70-90). Oxford, UK: Clarendon Press.
Repp, B. H. (1990). Patterns of expressive timing in performances of a Beethoven minuet by nineteen famous pianists. Journal of the Acoustical Society of America, 88, 622-641.
Repp, B. H. (1992). Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's "Träumerei." Journal of the Acoustical Society of America, 92, 2546-2568.
Repp, B. H. (1993). Some empirical observations on sound level properties of recorded piano tones. Journal of the Acoustical Society of America, 93, 1136-1144.
Repp, B. H. (in press). On determining the global tempo of a temporally modulated music performance. Psychology of Music.
Rorem, N. (1983). Composer and performance. (Originally published in 1967.) In N. Rorem, Setting the tone (pp. 324-333). New York: Coward-McCann, Inc.
Schmidt, R. A. (1975). A schema theory of discrete motor skill learning. Psychological Review, 82, 225-260.
Seashore, C. E. (1938). Psychology of music. New York: McGraw-Hill. (Reprinted by Dover Publications, 1967.)
Shaffer, L. H. (1980). Analysing piano performance: A study of concert pianists. In G. E. Stelmach & J. Requin (Eds.), Tutorials in motor behavior (pp. 443-455). Amsterdam: North-Holland.
Shaffer, L. H. (1981). Performances of Chopin, Bach, and Bartok: Studies in motor programming. Cognitive Psychology, 13, 326-376.
Shaffer, L. H. (1984). Timing in solo and duet piano performances. Quarterly Journal of Experimental Psychology, 36A, 577-595.
Shaffer, L. H. (1992). How to interpret music. In M. R. Jones & S. Holleran (Eds.), Cognitive bases of musical communication (pp. 263-278). Washington, DC: American Psychological Association.
Shaffer, L. H., Clarke, E. F., & Todd, N. P. (1985). Metre and rhythm in piano playing. Cognition, 20, 61-77.
Todd, N. P. McA. (1992). The dynamics of dynamics: A model of musical expression. Journal of the Acoustical Society of America, 91, 3540-3550.
Vernon, L. N. (1937). Synchronization of chords in artistic piano music. In C. E. Seashore (Ed.), Objective analysis of musical performance (pp. 306-345). University of Iowa Studies in the Psychology of Music, Vol. 4. Iowa City: University of Iowa Press.
Viviani, P., & Laissard, G. (1991). Timing control in motor sequences. In J. Fagard & P. H. Wolff (Eds.), The development of timing control and temporal organization in coordinated action (pp. 1-36). Amsterdam: Elsevier.

FOOTNOTES

*Psychological Research, in press.
1For temporal intervals to be proportional at different tempi, their absolute differences must be larger at slow than at fast tempi. Clarke mentions that an initial analysis yielded "identical" results for raw and log-transformed data, but it is not clear whether that was also true in the later analysis just referred to.
2The pianists' accuracy in implementing the desired tempi will not concern us here; this issue is the subject of a separate paper (Repp, in press). Naturally, the three performances within each tempo category were not exactly equal in tempo.
3These values represent the grand mean effects in one-way ANOVAs on the average scores minus 50%, equivalent to one-sample t-tests. The second test was conducted on the average scores of pairs of pairs containing the same type of comparison in opposite order (e.g., LPH's first medium-tempo performance followed by the speeded-up version of her first slow performance, and the speeded-up version of her second slow performance followed by her second medium-tempo performance).
4All correlations reported include multiple data points for IOIs longer than one eighth-note (cf. Figure 2), which is equivalent to weighting longer IOIs in proportion to their duration. This seems a defensible procedure in comparing profiles, and omission of these extra points had a negligible effect on the magnitude of the correlations. However, in the ANOVAs reported further below, only a single data point was used for each long IOI, so as not to inflate the degrees of freedom for significance tests.
5Two differences between LPH and BHR are worth noting here: LPH prolonged the interval preceding the fermata chord much more than did BHR, and she did not implement the gestural boundary (i.e., the "comma" in the score) in bar 24, executing a continuous ritardando instead. Contrary to the score, LPH generally arpeggiated the chord under the fermata; hence the long IOI, which was measured to the last (highest) note of the chord.
6One possible reason for why LPH showed no significant interactions is that her tempo variation within tempo categories was larger than that of BHR (see Repp, in press), so that the separation of within- and between-tempo variation was less clear in her data.
7In bar 8, BHR showed a triple interaction of rendition (first vs. second playing), IOI, and tempo [F(4,12) = 7.72, p < .003], due to more even timing of the grace note in the first than in the second rendition, at the slow tempo only.
8BHR showed a marginally significant (but not easily characterizable) triple interaction with rendition in bar 6 [F(4,12) = 3.42, p < .05].
9In terms of the IOI ratios plotted in Repp's (1992) Figure 11, the respective coordinate values are approximately 0.5 and 0.6 for LPH, and 0.25 and 0.45 for BHR.


10The measurements were coarse because the 5-ms temporal resolution was poor relative to the small size of these intervals. However, the availability of repeated measurements within and across performances attenuated this problem considerably.
11It is common in piano performance for the pedal to be used more frequently than indicated in the score. It also should be duly noted that the pedal in this instance was a simple foot switch that did not permit the subtlety of pedalling possible on a real piano.
12LPH's pedalling times were much more variable than BHR's; this was also true within each tempo. The only substantial deviation for BHR occurred in bar 1, where pedal onset occurred at the beginning of the IOI in the absence of a preceding pedal offset. (In two instances at the medium tempo, however, not shown in the figure, there was a preceding pedal depression, and offset and onset times were similar to those in other bars.) Because of this deviation, BHR's bar 1 was omitted from further analyses of pedalling times. BHR never had a pedal offset near or within the IOI in bars 1R or 9, whereas LPH did some of the time; however, her following pedal onset times did not seem to depend on whether there was a pedal offset nearby.
13The intensity profiles of the expert performances studied by Repp (1992) are not available for comparison, as they are very difficult to determine accurately from acoustic recordings.
14This may be true only for the melody notes analyzed here, which indeed seemed to "stand out" more from among the four voices in LPH's performance.
15Surprisingly, however, there was a significant tempo main effect for BHR [F(2,6) = 12.78, p < .007], suggesting that the within-tempo changes in overall intensity evident in Figure 9 developed only later during his performances. For LPH, on the contrary, the between-tempo variation was conspicuously smaller than the within-tempo variation [F(2,6) = 0.05, p > .95] during the initial 8 bars.
16Their data have been augmented in the meantime and are presented in Desain and Honing (in press).
17For those who are suspicious of authors serving as their own subjects, it might be added that BHR played with music, not hypotheses, in his mind.

Haskins Laboratories Status Report on Speech Research, 1993, SR-113, 197-208

A Review of Psycholinguistic Implications for Linguistic Relativity: A Case Study of Chinese by Rumjahn Hoosain*

Yi Xu

*I am grateful to Ignatius G. Mattingly, Alvin M. Liberman, and Michael T. Turvey for their comments on an earlier version of this review.

The content, and indeed the title, of this book brings back to us an old and well-known theory, the Sapir-Whorf hypothesis: that language influences thought and determines the world view of the speaker (Whorf, 1956). Rumjahn Hoosain proposes that "unique linguistic properties of the Chinese language, particularly its orthography, have implications for cognitive processing." Although it does not seem to be his intention to revive the Sapir-Whorf hypothesis in its original form, Hoosain clearly displays his desire to salvage the theory through extension or revision. This is demonstrated by his remark in Chapter 1 that "the influence of orthography on cognition, particularly the influence of the manner in which the script represents sound and meaning, was not originally recognized in the work of Sapir and Whorf." Thus, Hoosain seems to be trying to keep the spirit of the Sapir-Whorf hypothesis alive while applying it more specifically to the effect of the orthography and particularly to the case of Chinese. This effect is, as he describes it, "more in terms of the facility with which language users process information, and the manner of information processing." Therefore, "such language effects matter more in data-driven or bottom-up processes rather than in conceptually driven or top-down processes."

Though not a large volume (198 pages), the book covers an impressively large number of psycholinguistic studies related to the Chinese language1 and orthography, especially those presented at five conferences held in Taiwan and Hong Kong in the last dozen years. The reviews of those studies are spread over six chapters in the book, including an introduction in Chapter 1 and a conclusion in Chapter 6. The second chapter provides an introduction to general aspects of the Chinese language; and the third, fourth, and fifth chapters focus on, respectively, perceptual, memory-related, and neurolinguistic aspects.

Chapter 2 provides background information about the Chinese language, especially about its orthography. This chapter is necessary because there have been many myths and misconceptions about Chinese and its orthography that need to be clarified. Hoosain's description of Chinese, in general, is reasonable. However, there are certain remarks that are misleading and are still characteristic of common misunderstandings.

In 2.3, for example, Hoosain states that "a small percentage of the (Chinese) characters convey meaning by pictographic representation, either iconic or abstract," though he qualifies this by noting that "whereas these pictographic characters have an etymology related to pictures, this relation is unlikely to have psychological reality in present day usage." The critical issue here is the use of the phrase "convey meaning by pictographic representation." What a character in Chinese represents is a linguistic unit, in the case of the character for niao3 2 ("bird," used by Hoosain as an illustration of pictographic characters), a monosyllabic word. The word is a unit in the language, an entry in the lexicon. A lexical entry carries meaning, whereas a character is nothing but a visual symbol created and used to refer to a lexical item. The character by itself carries no meaning whatsoever. It conveys meaning only through its reference to the linguistic unit it represents. Thus, the character for niao3 conveys the concept of "bird," not because it has the shape of a bird (otherwise, any picture of a bird would do), or because it is associated with the idea "bird," but because that shape is agreed by convention of the Chinese literate community to refer to a spoken word with the pronunciation /niao3/ (in present day Mandarin), which in turn refers to the concept of a feathered animal usually with flying ability.3 On the other hand, real pictures of birds, whether vivid or sketchy, although they may well convey the meaning of bird directly, are not part of the orthography, and thus are completely different entities from the bird-shaped character for the word niao3. The character for niao3 is (or has been) "pictographic" only in the sense that it was created by depicting the shape of a bird, and the pictographic structure of the character may have served as a mnemonic for the word. So, what can never be overemphasized is the fact that the bird-shaped character represents a LINGUISTIC unit, the spoken word for "bird" in Chinese. Therefore, a more accurate definition for so-called pictographic characters would be: They are characters created to represent spoken names of objects by depicting the visual pattern of the object the spoken names refer to. The shapes of the characters, when they are still reminiscent of the object the corresponding spoken words refer to, can be used as mnemonics for the spoken words.

In 2.5, while rejecting as a misconception the claim that characters in Chinese are logograms in the sense that each of them represents a whole word, Hoosain insists that "the meaning of a character is represented by the character directly, not mediated by sound symbols, and a syllable is indicated as a whole rather than spelled out, as in alphabetic languages." There are two confusions in this statement. First, as discussed above, Hoosain takes orthographic symbols as the meaning-carrying units by themselves. Second, by stating that "a syllable is indicated as a whole rather than spelled out," Hoosain sounds as if the phonemes were the only possible meaningless linguistic units that can be represented by orthographic symbols. Apparently, Hoosain does not realize (or often forgets) that syllables are also meaningless units that can be used to construct (i.e., to spell) meaningful morphemes and words, and it is the syllable sign that is the basic graphemic unit in the Chinese orthography (DeFrancis, 1984, 1989; Mattingly, 1992). Instead, Hoosain takes the characters as the smallest graphemic unit in the Chinese writing system. As pointed out by DeFrancis (1989), the basic graphemes in Chinese are characters that singly represent whole syllables. They may themselves constitute frames, i.e., graphic units that are separated by white spaces in writing, or combine with other nonphonetic elements to form more complex characters constituting frames. Those graphemes, i.e., characters that provide phonological values when combining with other graphic elements to form other characters, are usually called phonetics, or phonetic radicals.

In his discussion in 2.4, Hoosain reports that there are about 800 phonetics in Chinese, and about 90% of Chinese characters are phonetic compounds. But in the discussion that follows, he belittles the function of the phonetics in Chinese orthography by emphasizing the irregularities and unreliabilities of their phonetic values, and by citing studies that show the semantic cueing function of the radical to be greater than the phonemic cueing function of the phonetic. It is not made clear by Hoosain what exactly is meant by "cueing function" in the unpublished study he cites. The other two studies (Yau, 1982; Peng, 1982) Hoosain cites show that semantic radicals tend to be located at the left or the top of the characters. "Because," Hoosain argues, "character strokes tend to be written left to right and top down, the beginning strokes therefore are more discriminative." Being more discriminative, however, is different from being more crucial. As a matter of fact, the common use of the word 'determinative' to refer to the semantic radicals in Chinese characters implies that their function is to help distinguish between characters that are otherwise homophones because they share the same phonetic radical. The premise here is that the phonological values of the phonetic radicals are already being used to the fullest possible extent. Of course, there do exist many irregularities and unpredictabilities in the phonological values of the phonetic radicals in Chinese. What is wrong in Hoosain's account of Chinese orthography is that he is using those irregularities to deny the basic organization principles in the construction of the characters. This is like denying the phonemic principle in the construction of English words merely because there are tremendous irregularities in the letter-phoneme correspondence.

Psychological studies of languages and reading can not and should not be totally independent of the knowledge gained over the years about the basic principles of language and speech and of writing systems. Hoosain's failure to fully appreciate those principles is reflected in his view that Chinese characters represent both meaning

and sound directly, which is best illustrated in the diagram on page 12 of his book:

    Chinese:  sign (script) -> sound
                            -> meaning

    English:  signs (script) -> sound -> meaning

Notice here his ambiguous use of the words 'sign' and 'sound.' Apparently, by 'sign,' he means letters for English but characters for Chinese. By 'sound,' he seems to mean phonemes for English but syllables for Chinese. As discussed above, while letters are the smallest graphemic units in English, characters are not the smallest graphemic units in Chinese. The smallest graphemic units in Chinese are the phonetic radicals in characters.4 So, it is the phonetic radicals that map onto syllables directly (although no more accurately than letters map onto phonemes in English), not the characters. Likewise, it is the morphemes that characters map onto, not sound.

As we will see, the confusion in the understanding of the linguistic and orthographic aspects of the Chinese language is bound to affect Hoosain's interpretation of the studies he reviews.

Chapter 3 reviews studies concerned with perceptual aspects of the Chinese language. The conclusion reached by Hoosain in this chapter is that "the distinctive configuration of the Chinese script as well as its script-sound and script-meaning relation can differentially affect perceptual processes, compared with similar processes in English." The effects include: "(a) lack of acuity difference for characters arranged in different orientations; (b) preference of direction of scan according to language experience; (c) the individuality of constituent characters of disyllabic words in some situations; (d) the more direct access to meaning of individual words, although access to sound could require a different effort due to the lack of grapheme-phoneme conversion rules; and (e) the variation in eye movements in reading connected with different manners of reading." Due to lack of space, and because they are not essential concerns in the study of reading, the first two effects and the last effect will not be discussed in this review. The only thing that needs to be pointed out is that letters in English and characters in Chinese are at different levels of linguistic representation in the two orthographies, and, therefore, cannot be equated. In other words, differences found between the two may not always unambiguously point to critical differences between the reading mechanisms of the two languages.

When discussing the size of perceptual units in reading, it is essential to make sure the units being discussed are unambiguous. That is to say, when characters are referred to, we should not sometimes mean words and sometimes morphemes, and we should not equate them with other units, say, letters. This is not always easy, because in Chinese morphemes and words often coincide. This difficulty, however, makes it all the more important for a reviewer of reading studies to watch out for dubious hidden assumptions ignoring this distinction, and for flawed designs as well as improper interpretations based on those assumptions. This is, however, exactly what Hoosain often fails to do as a reviewer.

For example, in his discussion of the study by C. M. Cheng (1981) on word superiority, Hoosain concludes that characters in disyllabic Chinese words have more perceptual salience than letters in English, and he calls this phenomenon a "contrast between the two languages." As discussed above, characters in Chinese correspond to morphemes, and morphemes are the smallest meaning-carrying units in languages. Letters in English, however, correspond to phonemes, i.e., the smallest distinctive segments in speech, though the correspondence is often inconsistent. Phonemes do not carry meaning by themselves. They have to combine with other phonemes to form meaning-carrying morphemes. So, perceptual differences found between characters in Chinese and letters in English should be regarded as a reflection of the differences between the two types of linguistic units tested, rather than as an indication of any true difference between the psycholinguistic processing of the two languages.

Liu (1988) found that a Chinese character standing alone was named as fast as the same character occurring as a constituent in a two-character word. In contrast, a simple word in English, such as air, is named slower in isolation than as a constituent of a compound word, such as airways. Hoosain interprets these findings as an indication that "constituent characters of Chinese words are handled somewhat independently, rather than as integral parts of words." However, there are several problems with Liu's study. First, Liu did not mention whether the constituent characters of his Chinese stimuli could also function as words by themselves. (No sample list of the Chinese words used was provided.) Second,

neither word frequency in English and Chinese nor character frequency in Chinese was controlled in the study, which makes any differences found between naming latencies of compound words and their constituents suspect. Third, native speakers of Chinese who had also learned to speak English were used as subjects for both Chinese and English. This use of non-native speakers of English for reading English in comparison with native speakers of Chinese reading Chinese is problematic. Fourth, as Hoosain points out himself, the English compounds like baseball were all presented with a hyphen in the middle, no matter whether they are conventionally printed that way or not. There is no telling what effect this arbitrary manipulation may have had. So, the results of this study as evidence of "another difference between Chinese and English" are, to say the least, inadequate.

In summarizing the section on perceptual units, Hoosain concludes that "particularly in cases where component characters are simple and highly familiar, or when individual characters have to be pronounced, there is some degree of individual salience for these characters that is not likely to be found in the case of bound morphemes in English. This individual salience has to do with the character as sensory unit." While the statement about the salience of individual characters may be true to a certain extent, the comparison between characters in Chinese and bound morphemes in English is inappropriate. In Chinese, most of the high frequency characters can function as monosyllabic words by themselves. Of the 1185 characters with frequency of occurrence of 100 or higher per million characters, only 44, i.e., 3.7%, can not function as monosyllabic words (Wang et al., 1986). The only comparable situation in English is in cases where words are also used as constituents in compound words. But the only study Hoosain cites that made this comparison is flawed. Without better evidence, therefore, the best conclusion that can be drawn about this matter is that further studies are needed.

In the discussion in 3.3 on getting at the sound of words in Chinese, Hoosain insists that "the only way to get at the sound of Chinese characters is through lexical information, whether directly or indirectly." His major argument is that in Chinese characters, "the phonetics are characters on their own, and their own pronunciations have to be learned on a character-as-a-whole-to-sound-as-a-whole basis and not through grapheme-phoneme conversion rules." Here Hoosain seems to confuse the difference in level of representation of speech sound with the presence or absence of grapheme-speech-sound correspondence. The lack of grapheme-phoneme correspondence in Chinese does not mean the lack of grapheme-speech-sound correspondence altogether. Graphemes in Chinese correspond to speech sound at the level of the syllable (DeFrancis, 1989; Mattingly, 1992). This correspondence is not one to one, but neither is the correspondence between letters and phonemes one to one in English. This confusion between levels of representation and existence of grapheme-speech-sound correspondence determines Hoosain's bias when reviewing studies on phonological access in reading Chinese.

In examining studies on naming latency for characters, Hoosain focuses on Seidenberg (1985). In that study, it was found that naming times were faster for phonetic compounds than for nonphonetic compounds, but only for low frequency characters and not for high frequency characters. Seidenberg also did a parallel study with English, in which he found that words with exceptional pronunciations were named slower than those with regular pronunciations. But again, this was found to be true only for low frequency words. Seidenberg took these results to mean that, for both Chinese and English, access to the pronunciation is through the lexicon for high frequency words and through grapheme-to-sound conversion for low frequency words.

Judging from the example given in Seidenberg's paper, there are problems in the material he used in his Chinese experiment. Both of the so-called nonphonetic compound characters can be used as phonetic radicals in many other characters. So, they are in principle also phonetic compounds, except that their semantic radicals are nil. Besides, Seidenberg did not control the frequency or the phonological consistency of the phonetic radicals used in his study. Those problems make Seidenberg's finding questionable.

Hoosain also notices some of the above mentioned problems in Seidenberg's study, but he questions the results from a different direction. He argues that access to whatever phonological information is provided by the phonetic radicals in Chinese cannot be compared to grapheme-phoneme conversion in alphabetic languages. And he raises an intriguing question: "How can it be argued that access to the sound of such nonphonetic compounds (when they stand alone) is through lexical information, but access to the sound of phonetic compounds (which are actually made up of these same nonphonetic compounds

now acting as phonetics) is through grapheme-to-phoneme conversion? After all, how is the sound of these phonetics arrived at in the first place?" While this question is well posed, Hoosain's answer is problematic. He insists that access to the pronunciations of all the characters in Chinese is through the lexicon, either directly or indirectly. This would imply that the naming latency should be the same whether the phonetic radicals have consistent or inconsistent phonological values in the characters that they are part of. However, this prediction does not agree with the findings by Fang, Horng, and Tzeng (1986), which is also mentioned by Hoosain. They demonstrated that high consistency in pronunciation shared among characters with the same phonetic radical facilitated naming latency for simple characters as well as for compound characters and pseudocharacters. This consistency effect was shown to be comparable to a similar effect found in naming latency for English words (Glushko, 1979).

Another phenomenon taken seriously by Hoosain is the minor finding by Seidenberg (1985) that the overall naming latency was much slower for his Chinese subjects than for his English subjects. Hoosain takes this as an indication that phonological access in Chinese is slower than in English. It is not noticed by Hoosain, however, that the English subjects Seidenberg used were college students, whereas the Chinese subjects he used were described only as native Cantonese. This makes one wonder what their educational backgrounds were and even how much reading experience they had had. Undoubtedly, these factors would have effectively influenced their naming latency.

To summarize the discussion of getting at the sound of words, there is so far no solid evidence for Hoosain's view that phonological access for characters in Chinese is solely through the lexicon. The studies by Seidenberg (1985) and by Fang et al. (1986) suggest that more evidence for the use of phonetic components in characters in getting at the sound of characters is very likely to be found.

In 3.4, Getting at the Meaning of Words, Hoosain claims to have found evidence that getting at the meaning of Chinese characters is more direct than is the case with printed words in alphabetic writing systems. At the outset, Hoosain rightly warns the reader to bear in mind that the "spoken language is prior to written language... Therefore sound-meaning relation is prior to script-meaning relation, giving rise to the question of the need for phonological recoding before access to meaning." By "prior," however, Hoosain means the order in historical development, both for human languages in general and for individual speakers of a language in particular, leaving plenty of room for the actual reading process to either involve or not involve so-called phonological mediation. This leads to the next question he asks, namely, "whether phonological recoding is involved to a different extent depending on the nature of the orthography." So, it follows, according to Hoosain, that "if the script primarily represents sounds of the language, script-meaning relation may not be so direct, and getting at the meaning more likely may be mediated by phonological recoding."

With these questions in perspective, Hoosain presents several studies he thinks show a more direct connection between script and meaning in Chinese than in alphabetic orthographies. The first series of studies involve the famous Stroop phenomenon. Stroop (1935) found that when the name of a color was written in ink of another color, it took subjects longer to name the color of the ink than it took them to name the colors of ink patches. This phenomenon was taken to mean that somehow it is unavoidable to process the meaning of the printed words even when the task is to attend to the color of the ink. This explanation, however, is questionable, since the phenomenon can also be understood as an indication that it is somehow unavoidable to process the pronunciation of the printed words, thus slowing down the naming process when there is disagreement in pronunciations between the printed color name and the color of the ink. With this alternative explanation, the even greater Stroop effect for Chinese color names found by Biederman and Tsao (1979) could, contrary to Hoosain's interpretation, be interpreted as an indication that it is even more unavoidable to process the pronunciation of Chinese characters. Without further evidence, though, the implication of the greater Stroop effect for Chinese should at least be considered undetermined.

In trying to find more evidence for direct character-meaning connection in Chinese, Hoosain cites the study by Tzeng and Wang (1983), but misunderstands the procedure used in that study.5 In the study, subjects were asked to choose from two numbers written in characters the one that is numerically greater than the other. It was found that it took the subjects longer to make the decision when the characters for the larger number had a smaller physical size on the

205 202 Ku screen. Hoosain, however, mistakenlydescribes The only study that offers a somewhat stronger the task as that of deciding on the physical size of case for possible fast access to the meaningsof the written numbers. He thus interprets the Chinese characters is the one by Hoosain and results as indicating that it is unavoidable to Osgood (1983) on affective meaning response process the meaning of the Chinesenumbers times. They asked subjects to report the affective when the task is only about the physical size of meaning of a word by saying either "positive" or the printed numbers. This interpretation is "negative" upon seeing the word displayed on a inappropriate because the actual task in the screen. Native speakers of Cantonese couldmake experiment required subjects to focus on the this judgment faster than native speakers of meaning of the numbers to begin with. English. Hoosain and Osgood interpreted this As for the finding that Chinese numbers are result as evidence that the processing of at least translated faster into English than the other way some aspects of meaning of Chinese words was around by native Cantonese speakers (Hoosain, faster than that of English words, and suggested 1986), which Hoosain uses as another piece of that this was because phonology was bypassed evidence foradirect connection between when accessing the meaning of printed Chinese characters and meaning, I find it inconclusive words. Fascinating as the finding may be, it is because no comparable results were obtained from somewhat in conflict with a recent study by native speakers of English. Similarly, the finding Perfetti and Zhang (1991, also see Perfetti, Zhang, that cross-language priming effects for native and Berent, in press), in which it was found that speakers of Chinese are greater from Chinese to phonemic information was immediately available English than in the other direction (Keatley, 1988) as part of character identification in Chinese. 
is not conclusive evidence for a direct connection Furthermore, these recent experimental results between meaning and characters in Chinese. suggest that whatever the semantic value of a Another study cited by Hoosain as showing character, it is activated no earlier than its evidenceforadirectcharacter-meaning phonological value. If the findings in both studies connection was conducted by Treiman, Baron, and (Hoosain and Osgood, 1983; and Perfetti and Luk (1981). In that study, native speakers of Zhang, 1991) are true, then the faster affective English and Cantonese were required to make meaning response time for Chinese words should truth/falsity judgments of sentences in their own only mean faster decision after the activation of languages. A sentence could be a "homophone" the phonology as well as the semantics of those sentence, such that it sounded true when read out words, rather than faster activation of the loud (e.g., A pair is a fruit). It was found that it semantics, assuming, of course, that the took longer to reject these homophone sentences phonological activation of the Chinese words is no than truly false sentences like "A pier is a fruit." faster than that of the English words. In contrast, this kind of delay due to homophony In conclusion, I do not find solid evidence for a was relatively small for Chinese, and this more direct connection between meaning and difference was taken as evidence that phonological characters in Chinese than between meaning and recoding was less involved in Chinese sentence printed words in alphabetic orthographies. Nor do processing. However, there is a problem in that I find solid evidence for lack of phonological study that Hoosain does not notice. In the sample mediation in accessing meaning from characters Chinese sentence presented in the paper, Mei2 in Chinese. Rather, the studies by Seidenberg shi4 yil zhong3 zhi2 wu4 6 ("Mei2 is a plant"), the (1983) and Fang et al. 
character mei2, meaning "coal," is said to be homophonous with the name of a plant. But the character for mei2 as the name of that plant is not a meaningful word in itself in the spoken language (neither in Cantonese nor in Mandarin). In the spoken language, it always combines with some other syllables to form words. The syllable mei2 by itself corresponds only to one word, namely, "coal." So, the smaller impairment for the Cantonese speakers in that experiment may well be a consequence of using improper test material rather than an indication of less phonological recoding in Chinese sentence processing.

(1986), which are also cited by Hoosain, show evidence of similarity between the processing of Chinese orthography and the processing of alphabetic orthographies.

Chapter 4 of Hoosain's book focuses on memory-related aspects of the Chinese language. In general, Hoosain shows in this chapter that phonological recoding is used in memorizing printed Chinese, but, at the same time, he cites some studies that he thinks show a greater role of visual coding for Chinese than for other languages.

In 4.1.1, Hoosain cites quite a few studies that found a greater digit span for Chinese than for some other languages. It is shown that this difference can be well accounted for by the working memory model proposed by Baddeley and Hitch (1974) and Baddeley (1983, 1986). This model assumes an "articulatory loop mechanism" for short-term memory which has a span of approximately two seconds' worth of speech sound. Hoosain shows that number names in Chinese have shorter duration in pronunciation than in some other languages. Because of this, Hoosain argues, more numbers can be kept in the articulatory loop by Chinese speakers. The discussion in this section is interesting, except that there is no emphasis on the fact that the phenomenon is purely linguistic and has almost nothing to do with orthography. Also, Hoosain does not emphasize that the finding is interesting because it confirms one of the hypothetical principles under which all human languages function (but see Ren and Mattingly, 1990; Mattingly, 1991, for criticism of the working memory hypothesis). Instead, Hoosain seems to be thrilled by the finding itself, that digit span in Chinese is larger than in some other languages, and uses this as an indication that differences among languages do affect the way information is processed.

In his discussion of the visuo-spatial scratch-pad and perceptual abilities, Hoosain cites a study by Woo and Hoosain (1984). They conducted a short-term memory test, in which subjects had to identify among a list of seven characters those that had appeared among the six characters shown to them a few seconds before. It was found that "weak readers made much more visual distracter errors (when target characters were mixed with similar looking distracters) but not more phonological distracter errors (when targets were mixed with similar sounding distracters)." Hoosain contrasts this finding with that of Liberman, Mann, Shankweiler, and Werfelman (1982), who found that, for American subjects, good beginning readers of English did not do better than weak beginning readers in memory for abstract visual designs or faces. However, they did perform better in memorizing nonsense syllables. In comparing this result with the results of Woo and Hoosain (1984), Hoosain comes to the conclusion that poor readers in Chinese have more visual problems than poor readers in English. Apparently, Hoosain misinterprets the nature and the implication of the results obtained by Woo and Hoosain. The finding that poor readers made more visual distracter errors indicates that they were relying more heavily than good readers on the visual characteristics of the characters when trying to encode them for immediate recall. As a result, poor readers were penalized more by the graphic similarities brought about by the visual distracters. The finding does not, as Hoosain interprets it, simply indicate that poor readers in Chinese have more visual problems.

In a comparable study by Ren and Mattingly (1990), it was found that good readers in Chinese were relatively more affected by phonologically similar series than poor readers when performing immediate recall of series of Chinese characters. These results are comparable to those found for English readers (Shankweiler, Liberman, Mark, Fowler, and Fischer, 1979). In neither of those two studies were the results interpreted by the authors as evidence that good readers had more phonological problems than poor readers. Rather, they were taken to suggest that good readers rely more heavily than poorer readers on phonological encoding, and hence were penalized more by the phonological similarity introduced in the experimental stimuli.7 So, the results of Woo and Hoosain (1984) and Ren and Mattingly (1990) both show that one of the crucial differences between good and poor readers in Chinese is in the manner in which they recode reading material: good readers use more phonological encoding, while poor readers rely more on visual encoding. This agrees, rather than contrasts, with findings about the differences between good and poor readers in English (Shankweiler et al.).

In a study by Chen and Juola (1982), native Chinese and English speaking subjects had to decide which of the items on separate test lists was graphemically, phonemically, or semantically similar to the item just presented visually. Chinese subjects were found to be fastest in making graphic similarity decisions about characters, while American subjects were found to be fastest in phonemic similarity judgments. Chen and Juola then concluded, as reported by Hoosain without reservation, that "Chinese words produce more distinctive visual information, but English words result in a more integrated code." Thus the two different "scripts activate different coding and memory mechanisms." Hoosain, however, does not notice at least two problems with the study. First, the study claimed to use words as stimuli both for English and Chinese. However, of the 144 characters used, 26 could function only as bound morphemes according to the frequency dictionary of Wang et al. (1986), and another 11 could not even be found in that dictionary. Using bound morphemes in Chinese necessarily involves more
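The working-memory account of the digit-span difference discussed above is, at bottom, simple arithmetic: if the articulatory loop holds roughly two seconds of speech, the predicted span is that duration divided by the mean spoken duration of a digit name. A minimal sketch of that arithmetic follows; the two-second constant comes from the Baddeley model as cited in the review, but the per-digit durations are illustrative placeholders, not measured values from any of the studies discussed.

```python
# Sketch of the articulatory-loop arithmetic behind the digit-span
# comparison. Only the ~2 s loop span is taken from the model cited
# in the review; the per-digit durations are HYPOTHETICAL values,
# chosen solely to illustrate the direction of the predicted effect.

LOOP_SPAN_S = 2.0  # approximate capacity of the articulatory loop, in seconds

# Assumed mean spoken duration of one digit name, in seconds.
mean_digit_duration = {
    "Chinese": 0.25,  # mostly short, single open syllables
    "English": 0.33,  # longer on average ("seven" has two syllables)
}

def predicted_digit_span(duration_s: float, span_s: float = LOOP_SPAN_S) -> int:
    """Number of digit names that fit in the loop: span / mean duration, floored."""
    return int(span_s // duration_s)

for language, duration in mean_digit_duration.items():
    print(language, predicted_digit_span(duration))
```

With these assumed durations the model predicts a span of 8 digits for Chinese and 6 for English, reproducing the qualitative pattern the review describes: the advantage falls out of spoken-word duration alone, with no reference to orthography.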

homophones, making graphemic discriminations more indispensable. Second, the task the subjects had to accomplish requires the kind of phonological awareness that most Chinese readers do not need in ordinary reading. Yet, lacking conscious knowledge about the phonemic structure of characters does not necessarily mean the absence, or even less use, of phonological recoding in actual reading. So, here Hoosain is confusing phonological recoding in actual reading with conscious phonological manipulation in performing certain cognitive experimental tasks.

Another study Hoosain cites as evidence for the existence of a visual coding strategy for Chinese materials is that of Mou and Anderson (1981). These authors found predominant use of phonological recoding in STM for Chinese. Yet, at the same time, they found evidence of visual recoding. In that study, the presentation of stimulus lists was simultaneous rather than serial. As pointed out by Yu, Jing, and Sima (1984) and Yu, Zhang, Jing, Peng, Zhang, and Simon (1985), measured STM capacity varies depending upon the visual presentation method; i.e., visual aspects of the stimuli have greater effect upon the results with simultaneous presentation than with serial presentation. Yu et al. (1985) also concluded, based on the results of 14 experiments they carried out in China and the USA, that "no effects were detected that are peculiar to ideographic or logographic languages in contrast to alphabetic languages."

In 4.4 Hoosain looks at memory in relation to reading problems in Chinese. He reports, in particular, an extensive study conducted by Stevenson, Stigler, Lucker, Lee, Hsu, and Kitamura (1982) that compared reading disability among Chinese, American, and Japanese children. The major finding was that the proportion of children with reading problems is very similar among the three different languages, thus disconfirming the previous popular belief that there are fewer reading problems among Chinese and Japanese children. The study did find, however, an interesting difference between Chinese and American as well as Japanese children. Of those Chinese children who failed the fifth grade test, most failed because of a poor comprehension score, whereas most of the fifth grade American or Japanese children who failed the test did so because of their poor vocabulary score. Hoosain suggests this is because pronunciation tests involve more straightforward answers than word meaning tests, whereas a more open-ended effort is required for questions about meaning. He further suggests, in 4.5, that because of the nature of the Chinese orthography, Chinese children have to do more rote learning. And, because this kind of learning calls for a large measure of sustained discipline, authoritarianism is thus tied to this system of learning, which in turn restrains students from giving more open-ended answers. This kind of reasoning, although interesting, is only one of the possible ways of accounting for the data. An alternative explanation is that the poor comprehension score for Chinese children may be due to improper testing design. Teaching of reading and writing in Chinese is in general character-oriented. However, characters correspond to morphemes, and most of the morphemes need to combine with other morphemes to form a word. So, knowing the pronunciation of a character and its meaning in one or several of the words that contain it does not necessarily lead to knowing its meaning in other words it is part of. This discrepancy between the target of teaching and the units of comprehension makes it difficult for people to design tests for measuring progress in comprehension, especially when it is assumed that knowing the pronunciation of all the characters in a text and their meaning in isolation should be sufficient for comprehension of the meaning of the text.

To summarize the discussion of Chapter 4, some of the alleged differences reported by Hoosain between the memory aspect of Chinese and other languages are found to be inappropriate interpretations of the experimental findings, while others derive from studies with defective designs. In conclusion, so-called bottom-up differences, or differences between processing of Chinese and other languages, may well be just a reflection of differences among languages themselves, or merely of the difficulty in finding parallel ways of measuring performance in different languages.

In Chapter 5, Neurolinguistic Aspects of the Chinese Language, Hoosain presents convincing evidence that the neurolinguistic aspects of Chinese are not essentially different from those of other languages, as suggested by some of the early studies. At the same time, however, there is a persisting confusion in Hoosain's review between language and orthography. Though this should not be entirely blamed on him, since most of the studies he reviews did not make this distinction in the first place, as a reviewer he should have been more aware of this confusion.

In 5.1, Hoosain shows that despite a claim (e.g., Hatta, 1977) that characters are processed in the right hemisphere, there are plenty of other studies that demonstrated left hemisphere advantage for printed Chinese words as well as English words. It is further shown that right hemisphere advantage for Chinese was only found with single characters, whereas two-character words always elicited left hemisphere advantage. Also, with single characters the findings are mixed: both right-hemisphere advantage and left-hemisphere advantage are found, and which hemispheric advantage is obtained seems to depend heavily on the test conditions. Right-hemisphere advantage seems to be related to short exposure time, low illumination, and high structural complexity of the characters. This is further confirmed by the finding (Ho and Hoosain, 1989) that, with short exposure time, low luminance, and high stroke number, right-hemisphere advantage could be observed even for low-frequency two-character words.

In 5.3 and 5.4, Hoosain reviews twenty-one studies on Chinese aphasics. His general conclusion is that "there is little evidence to suggest any overwhelming neurolinguistic effects of Chinese language uniqueness." Although it was suggested by some earlier studies that Chinese language functions are more lateralized in the right hemisphere, results of more extensive investigations indicate no such tendency. In general, the proportion of right-handed patients with aphasia after right hemisphere damage does not exceed the overall proportion of right-handed people who have their speech functions lateralized in the right hemisphere (about 4%). So, Hoosain concludes that "it is quite clear from these studies that specialization for language tends to be in the left hemisphere for the Chinese, as for speakers of alphabetic languages." (Notice, however, his use of the word 'language' when most of the studies were actually about reading.)

There is an indication from some studies that Chinese reading functions may be localized more posteriorly, involving the parietal and occipital lobes, and psycho-motor schemas may play a greater role in memory for Chinese words. However, as Hoosain points out, it is premature to conclude that there is more involvement of the occipito-parietal region for Chinese. As for reports that finger tracing has been used by some patients to help recognize Chinese characters, which Hoosain regards as essentially different from spelling out individual letters to help read words in an alphabetic orthography, it seems that those patients' reading ability has been impaired so much that they are doing only what beginning learners of an orthography are doing. It is very likely that both finger tracing and oral spelling reflect only the way word composition is learned for different orthographies,8 and this process could be quite peripheral to the essential mechanisms involved in fluent reading.

The finding that processing of tones in Chinese is located in the left hemisphere is interesting, although again this is an aspect of language and is thus not directly related to orthography. A study by Packard (1986) showed that patients with left hemisphere damage had defective production of lexical tones in Mandarin. On the other hand, Hughes, Chan, and Su (1983) found that patients with right hemisphere damage had defects in both production and comprehension of affective intonation, but at the same time had kept their ability to process lexical tones almost intact. The finding that processing of tones is located in the left hemisphere, together with the finding that processing of sign language (e.g., Damasio, Bellugi, Damasio, Poizner, and Van Gilder, 1986) is also located in the left hemisphere, reveals the universality shared by all human languages not only in terms of their ability to convey information, but also in terms of the central neural processing mechanism they all utilize.

To summarize this discussion of Chapter 5, the evidence Hoosain provides for similarity in lateralization of script as well as of general linguistic processing in the brain between Chinese and other languages is convincing. The tendency for right hemisphere advantage in processing isolated characters shown in some of the studies reviewed by Hoosain, however, does need further exploration.

In final conclusion, Hoosain's major attempt in this book is to introduce a new version of the linguistic relativity hypothesis. The original version is the famous Sapir-Whorf hypothesis that language influences thought. The new version is concerned less with structure of language or world view than with the manner in which linguistic information is processed, and in particular, with the script-sound-meaning aspects of the Chinese orthography and their effects on information processing. Hoosain's final conclusion in his book is that "there are definite correlations between language characteristics and cognitive performance." However, I find that this conclusion is based mainly on misconceptions about the Chinese language and its orthography, on results obtained under defective experimental designs, and on misinterpretations of the results due to those misconceptions. Besides, the difficulty in separating language differences and orthographic differences from processing differences also affects Hoosain's judgments. More specifically, the difficulty is in finding truly consistent measurements in comparing different languages and orthographies. Unless this difficulty is overcome, there is always the danger of mistaking the differences due to inconsistent measurements for true contrasts in the processing of different languages and orthographies.

So, as a final comment on the title of Hoosain's book: to parallel Hoosain's borrowing from physics of the notion of 'relativity' to underscore his vision of the linguistic processing of different languages, I would like to borrow, also from physics, the notion of 'uncertainty' or 'indeterminacy' to emphasize the caution we should exercise when comparing different languages and orthographies. It is often difficult to determine at the same time both the difference between two languages or writing systems and the differences in their processing. Comparing the processing of Chinese characters directly with the processing of letters, words, or bound or unbound morphemes in an alphabetic writing system is often risky; and attributing any differences found in this kind of comparison to cognitive or behavioral differences will often prove to be a pitfall.

REFERENCES

Baddeley, A. D. (1983). Working memory. Philosophical Transactions of the Royal Society of London, B302, 311-324.
Baddeley, A. D. (1986). Working memory. Oxford: Clarendon.
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. A. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory, Vol. 8 (pp. 47-90). New York: Academic Press.
Biederman, I., & Tsao, Y. C. (1979). On processing Chinese ideographs and English words: Some implications from Stroop-test results. Cognitive Psychology, 11, 125-132.
Chen, H. C., & Juola, J. F. (1982). Dimensions of lexical coding in Chinese and English. Memory & Cognition, 10, 216-224.
Cheng, C. M. (1981). Perception of Chinese characters. Acta Psychologica Taiwanica, 23, 281-358.
Damasio, A., Bellugi, U., Damasio, H., Poizner, H., & Van Gilder, J. (1986). Sign language aphasia during left-hemisphere Amytal injection. Nature, 322, 363-365.
DeFrancis, J. F. (1984). The Chinese language: Fact and fantasy. Honolulu: University of Hawaii Press.
DeFrancis, J. F. (1989). Visible speech: The diverse oneness of writing systems. Honolulu: University of Hawaii Press.
Fang, S. P., Horng, R. Y., & Tzeng, O. J. L. (1986). Consistency effects in the Chinese character and pseudo-character naming tasks. In H. S. R. Kao & R. Hoosain (Eds.), Linguistics, psychology, and the Chinese language (pp. 11-21). Hong Kong: University of Hong Kong Centre of Asian Studies.
Glushko, R. J. (1979). The organization and activation of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 5, 674-691.
Hatta, T. (1977). Recognition of Japanese kanji in the left and right visual fields. Neuropsychologia, 15, 685-688.
Ho, S. K., & Hoosain, R. (1989). Right hemisphere advantage in lexical decision with two-character Chinese words. Brain and Language, 27, 606-615.
Hoosain, R. (1986). Psychological and orthographic variables for translation asymmetry. In H. S. R. Kao & R. Hoosain (Eds.), Linguistics, psychology, and the Chinese language (pp. 203-216). Hong Kong: University of Hong Kong Centre of Asian Studies.
Hoosain, R., & Osgood, C. E. (1983). Information processing times for English and Chinese words. Perception & Psychophysics, 34, 573-577.
Hughes, C. P., Chan, J. L., & Su, M. S. (1983). Aprosodia in Chinese patients with right hemisphere lesions. Archives of Neurology, 40, 732-736.
Keatley, C. W. (1988). Facilitation effects in the primed lexical decision task within and across languages. Unpublished doctoral dissertation, University of Hong Kong.
Liberman, I. Y., Mann, V. A., Shankweiler, D., & Werfelman, M. (1982). Children's memory for recurring linguistic and nonlinguistic material in relation to reading ability. Cortex, 18, 367-375.
Liu, I. M. (1988). Context effects on word/character naming: Alphabetic versus logographic languages. In I. M. Liu, H. C. Chen, & M. J. Chen (Eds.), Cognitive aspects of the Chinese language (pp. 81-92). Hong Kong: Asian Research Service.
Mattingly, I. G. (1991). Modularity, working memory, and reading disability. In S. A. Brady & D. P. Shankweiler (Eds.), Phonological processes in literacy: A tribute to Isabelle Y. Liberman (pp. 163-171). Hillsdale, NJ: Lawrence Erlbaum Associates.
Mattingly, I. G. (1992). Linguistic awareness and orthographic form. In L. Katz & R. Frost (Eds.), Orthography, phonology, morphology, and meaning (pp. 11-26). Amsterdam: Elsevier Science Publishers.
Mou, L. C., & Anderson, N. S. (1981). Graphemic and phonemic codings of Chinese characters in short-term retention. Bulletin of the Psychonomic Society, 17, 255-258.
Packard, J. L. (1986). Tone production deficits in nonfluent aphasic Chinese speech. Brain and Language, 29, 212-223.
Peng, R. X. (1982). A preliminary report on statistical analysis of the structure of Chinese characters. Acta Psychologica Sinica, 14, 385-390.
Perfetti, C. A., & Zhang, S. (1991). Phonological processes in reading Chinese words. Journal of Experimental Psychology: Learning, Memory and Cognition, 17, 633-643.
Perfetti, C. A., Zhang, S., & Berent, I. (1992). Reading in English and Chinese: Evidence for a "universal" phonological principle. In L. Katz & R. Frost (Eds.), Orthography, phonology, morphology, and meaning. Amsterdam: Elsevier Science Publishers.
Ren, N., & Mattingly, I. G. (1990). Short-term serial recall performance by good and poor readers of Chinese. Haskins Laboratories Status Report on Speech Research, SR-103/104, 153-164.
Seidenberg, M. S. (1985). The time course of phonological code activation in two writing systems. Cognition, 19, 1-30.
Shankweiler, D., Liberman, I. Y., Mark, L. S., Fowler, C. A., & Fischer, F. W. (1979). The speech code and learning to read. Journal of Experimental Psychology: Human Learning and Memory, 5, 531-545.
Stevenson, H. W., Stigler, J. W., Lucker, G. W., Lee, S. Y., Hsu, C. C., & Kitamura, S. (1982). Reading disabilities: The case of Chinese, Japanese, and English. Child Development, 53, 1164-1181.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.
Treiman, R. A., Baron, J., & Luk, K. (1981). Speech recoding in silent reading: A comparison of Chinese and English. Journal of Chinese Linguistics, 9, 116-125.


Tzeng, O. J. L., Hung, D. L., & Wang, W. S.-Y. (1977). Speech recoding in reading Chinese characters. Journal of Experimental Psychology: Human Learning and Memory, 3, 621-630.
Tzeng, O. J. L., & Wang, W. S.-Y. (1983). The first two R's. American Scientist, 71, 238-243.
Wang, H., Chang, B. R., Li, Y. S., Lin, L. H., Liu, J., Sun, Y. L., Wang, Z. W., Yu, Y. X., Zhang, J. W., & Li, D. P. (1986). A frequency dictionary of current Chinese. Beijing: Beijing Language Institute Press.
Whorf, B. L. (1956). Language, thought, and reality: Selected writings of Benjamin Lee Whorf. Cambridge, MA: MIT Press.
Woo, E. Y. C., & Hoosain, R. (1984). Visual and auditory functions of Chinese dyslexics. Psychologia, 27, 164-170.
Yau, S. C. (1982). Linguistic analysis of archaic Chinese ideograms. Paper presented at the XV International Conference on Sino-Tibetan Languages and Linguistics, Beijing.
Yu, B., Jing, Q., & Sima, H. (1984). STM capacity for Chinese words and idioms: Chunking and acoustical loop hypotheses. Memory & Cognition, 13, 193-201.
Yu, B., Zhang, W., Jing, Q., Peng, R., Zhang, G., & Simon, H. A. (1985). STM capacity for Chinese and English materials. Memory & Cognition, 13, 202-207.

FOOTNOTES

*Language and Speech, 35, 325-340 (1992).
†Also University of Connecticut, Storrs.
1. Linguistically, Chinese is actually a large language family consisting of many mutually unintelligible languages. However, because the same writing system is used by the whole Chinese-speaking community in China, and because of the close cultural ties among the Chinese people as well as the centralized political traditions, the Chinese languages are often considered dialects of a single language.
2. The number 3 at the end of the spelling indicates the tone of the syllable.
3. Incidentally, but perhaps more importantly, there are many different shades of meaning attributed to this word, as is the case with almost all the words in any language. This quality is apparently lacking in any true picture of a bird.
4. A Chinese character is typically composed of a phonetic radical and a semantic radical. Sometimes there is more than one semantic radical in a character, and sometimes there is no phonetic radical at all. When the latter is the case, however, it is usually said that the character is a nonphonetic compound. This name is misleading, since the whole character by itself represents a syllable, and thus is no less phonetic than a phonetic radical in a compound character.
5. The misunderstanding was probably caused by the ambiguity of the description in the original paper by Tzeng and Wang (1983). I found an unambiguous description only in the caption for Figure 3 in the paper.
6. The spellings used here are in pinyin. They are only used here to represent the Chinese characters referred to. They do not reflect the phonological values of these morphemes in Cantonese.
7. The difference between the two studies (that is, in Woo and Hoosain, poor readers were more affected by visual similarity, whereas in Ren and Mattingly, good readers were more affected by phonological similarity) is probably due to the difference in presentation of the stimulus list. In the former, it was simultaneous display, whereas in the latter, it was serial display. See the discussion of the study by Mou and Anderson (1981) later in this review.
8. Incidentally, when I was at my preliminary stage of learning English, I did not have the privilege of access to a teacher or even another fellow student, so I learned my English words by writing them over and over again. So, today, I still have great difficulty spelling words orally or understanding other people's oral spelling. But I don't have reading difficulty in English. If I became aphasic, I don't think I would be able to use spelling to help recognize English words. On the other hand, many of my fellow Chinese can do oral spelling fluently, because they started learning English with a teacher.


Appendix

SR #        Report Date            NTIS #        ERIC #

SR-21/22    January-June 1970      AD 719382     ED 044-679
SR-23       July-September 1970    AD 723586     ED 052-654
SR-24       October-December 1970  AD 727616     ED 052-653
SR-25/26    January-June 1971      AD 730013     ED 056-560
SR-27       July-September 1971    AD 749339     ED 071-533
SR-28       October-December 1971  AD 742140     ED 061-837
SR-29/30    January-June 1972      AD 750001     ED 071-484
SR-31/32    July-December 1972     AD 757954     ED 077-285
SR-33       January-March 1973     AD 762373     ED 081-263
SR-34       April-June 1973        AD 766178     ED 081-295
SR-35/36    July-December 1973     AD 774799     ED 094-444
SR-37/38    January-June 1974      AD 783548     ED 094-445
SR-39/40    July-December 1974     AD A007342    ED 102-633
SR-41       January-March 1975     AD A013325    ED 109-722
SR-42/43    April-September 1975   AD A018369    ED 117-770
SR-44       October-December 1975  AD A023059    ED 119-273
SR-45/46    January-June 1976      AD A026196    ED 123-678
SR-47       July-September 1976    AD A031789    ED 128-870
SR-48       October-December 1976  AD A036735    ED 135-028
SR-49       January-March 1977     AD A041460    ED 141-864
SR-50       April-June 1977        AD A044820    ED 144-138
SR-51/52    July-December 1977     AD A049215    ED 147-892
SR-53       January-March 1978     AD A055853    ED 155-760
SR-54       April-June 1978        AD A067070    ED 161-096
SR-55/56    July-December 1978     AD A065575    ED 166-757
SR-57       January-March 1979     AD A083179    ED 170-823
SR-58       April-June 1979        AD A077663    ED 178-967
SR-59/60    July-December 1979     AD A082034    ED 181-525
SR-61       January-March 1980     AD A085320    ED 185-636
SR-62       April-June 1980        AD A095062    ED 196-099
SR-63/64    July-December 1980     AD A095860    ED 197-416
SR-65       January-March 1981     AD A099958    ED 201-022
SR-66       April-June 1981        AD A105090    ED 206-038
SR-67/68    July-December 1981     AD A111385    ED 212-010
SR-69       January-March 1982     AD A120819    ED 214-226
SR-70       April-June 1982        AD A119426    ED 219-834
SR-71/72    July-December 1982     AD A124596    ED 225-212
SR-73       January-March 1983     AD A129713    ED 229-816
SR-74/75    April-September 1983   AD A136416    ED 236-753
SR-76       October-December 1983  AD A140176    ED 241-973
SR-77/78    January-June 1984      AD A145585    ED 247-626
SR-79/80    July-December 1984     AD A151035    ED 252-907
SR-81       January-March 1985     AD A156294    ED 257-159
SR-82/83    April-September 1985   AD A165084    ED 266-508
SR-84       October-December 1985  AD A168819    ED 270-831
SR-85       January-March 1986     AD A173677    ED 274-022
SR-86/87    April-September 1986   AD A176816    ED 278-066
SR-88       October-December 1986  PB 88-244256  ED 282-278



SR-89/90    January-June 1987      PB 88-244314  ED 285-228
SR-91       July-September 1987    AD A192081    **
SR-92       October-December 1987  PB 88-246798
SR-93/94    January-June 1988      PB 89-108765  **
SR-95/96    July-December 1988     PB 89-155329  **
SR-97/98    January-July 1989      PB 90-121161  ED 321-317
SR-99/100   July-December 1989     PB 90-226143  ED 321-318
SR-101/102  January-June 1990      PB 91-138479  ED 325-897
SR-103/104  July-December 1990     PB 91-172924  ED 331-100
SR-105/106  January-June 1991      PB 92-105204  ED 340-053
SR-107/108  July-December 1991     PB 92-160522  ED 344-259
SR-109/110  January-June 1992      PB 93-142099  ED 352-594
SR-111/112  July-December 1992     PB 93-216018  ED 359-575

AD numbers may be ordered from:

U.S. Department of Commerce
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22151

ED numbers may be ordered from:

ERIC Document Reproduction Service
Computer Microfilm Corporation (CMC)
3900 Wheeler Avenue
Alexandria, VA 22304-5110

In addition, Haskins Laboratories Status Report on Speech Research is abstracted in Language and Language Behavior Abstracts, P.O. Box 22206, San Diego, CA 92122

**Accession number not yet assigned

Haskins Laboratories Status Report on Speech Research
SR-113 January-March 1993

Contents

Some Assumptions about Speech and How They Changed
    Alvin M. Liberman ... 1
On the Intonation of Sinusoidal Sentences: Contour and Pitch Height
    Robert E. Remez and Philip E. Rubin ... 33
The Acquisition of Prosody: Evidence from French- and English-Learning Infants
    Andrea G. Levitt ... 41
Dynamics and Articulatory Phonology
    Catherine P. Browman and Louis Goldstein ... 51
Some Organizational Characteristics of Speech Movement Control
    Vincent L. Gracco ... 63
The Quasi-steady Approximation in Speech Production
    Richard S. McGowan ... 91
Implementing a Genetic Algorithm to Recover Task-dynamic Parameters of an Articulatory Speech Synthesizer
    Richard S. McGowan ... 95
An MRI-based Study of Pharyngeal Volume Contrasts in Akan
    Mark K. Tiede ... 107
Thai
    M. R. Kalaya Tingsabadh and Arthur S. Abramson ... 131
On the Relations between Learning to Spell and Learning to Read
    Donald Shankweiler and Eric Lundquist ... 135
Word Superiority in Chinese
    Ignatius G. Mattingly and Yi Xu ... 145
Prelexical and Postlexical Strategies in Reading: Evidence from a Deep and a Shallow Orthography
    Ram Frost ... 153
Relational Invariance of Expressive Microstructure across Global Tempo Changes in Music Performance: An Exploratory Study
    Bruno H. Repp ... 171
A Review of Psycholinguistic Implications for Linguistic Relativity: A Case Study of Chinese by Rumjahn Hoosain
    Yi Xu ... 197
Appendix ... 209
