DOCUMENT RESUME

ED 212 010 CS 503 730

TITLE Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications, July 1-December 31, 1981. Status Report 67/68.
INSTITUTION Haskins Labs., New Haven, Conn.
SPONS AGENCY National Institutes of Health (DHEW), Bethesda, Md.; National Inst. of Child Health and Human Development (NIH), Bethesda, Md.; National Inst. of Education (ED), Washington, D.C.; National Inst. of Neurological and Communicative Disorders and Stroke (NIH), Bethesda, Md.; National Science Foundation, Washington, D.C.
PUB DATE 81
CONTRACT NICHHD-N01-HD-1-2420
GRANT NICHHD-HD-01994; NIE-G-80-0178; NIH-RR-05596; NINCDS-NS13870; NSF-MCS79-16177
NOTE 275p.

EDRS PRICE MF01/PC11 Plus Postage.
DESCRIPTORS *Acoustics; *Articulation (Speech); Communication (Thought Transfer); *Communication Research; Consonants; Context Clues; Hearing Impairments; Language Acquisition; Perception; Perceptual Motor Learning; Phoneme Grapheme Correspondence; *Phonetics; Reading; Sign Language; Silent Reading; *Speech Communication; Vowels

ABSTRACT
As one of a regular series, this report focuses on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Drawn from the period of July 1 to December 31, 1981, the 15 manuscripts cover the following topics: (1) phonetic trading relations and context effects; (2) temporal patterns of coarticulation; (3) temporal constraints on anticipatory coarticulation; (4) the phonetics of stop-consonant sequences; (5) impaired speech production of hearing-impaired speakers; (6) specialized processes of phonetic perception; (7) reading, prosody, and orthography; (8) children's memory for recurring linguistic and nonlinguistic material in relation to reading ability; (9) phonetic and auditory trading relations between acoustic cues in speech perception; (10) production-perception of phonetic contrast during phonetic change; (11) decay of auditory memory in vowel discrimination; (12) phonetic structure and meaning; (13) ecological acoustics; (14) linguistic conventions and speech-sign relationships; and (15) fricative-stop coarticulation. (RL)



Status Report on

SPEECH RESEARCH

A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for its Investigation, and Practical Applications

1 July - 31 December 1981

Haskins Laboratories 270 Crown Street New Haven, Conn. 06510

Distribution of this document is unlimited.

(This document contains no information not freely available to the general public. Haskins Laboratories distributes it primarily for library use. Copies are available from the National Technical Information Service or the ERIC Document Reproduction Service. See the Appendix for order numbers of previous Status Reports.)

SR-67/68 (1981) (July-December)

ACKNOWLEDGMENTS

The research reported here was made possible in part by support from the following sources:

National Institute of Child Health and Human Development Grant HD-01994 Grant HD-05677

National Institute of Child Health and Human Development Contract NO1-HD-1-2420

National Institutes of Health Biomedical Research Support Grant RR-05596

National Science Foundation Grant MCS79-16177 Grant PRF8006144 Grant MS-8111470

National Institute of Neurological and Communicative Disorders and Stroke Grant NS13870 Grant NS13617

National Institute of Education Grant G-80-0178


SR-67/68 (1981) (July-December)

HASKINS LABORATORIES

Personnel in Speech Research

Alvin M. Liberman,* President and Research Director
Franklin S. Cooper, Associate Research Director
Patrick W. Nye, Associate Research Director
Raymond C. Huey, Treasurer
Alice Dadourian, Secretary

Investigators: Arthur S. Abramson,* Peter Alfonso,* Cinzia Avesani,2 Thomas Baer, Fredericka Bell-Berti,* Catherine Best, Gloria J. Borden,* Susan Brady,* Giuseppe Cossu,3 Robert Crowder, Carol A. Fowler,* Louis Goldstein, Vicki Hanson, Katherine S. Harris,* Alice Healy, Kiyoshi Honda,1 Leonard Katz,* J. A. Scott Kelso, Andrea G. Levitt,* Isabelle I. Liberman,* Leigh Lisker,* Virginia Mann,* Charles Marshall, Ignatius G. Mattingly,* Nancy S. McGarr,* Lawrence J. Raphael,* Bruno H. Repp, Philip E. Rubin, Elliot Saltzman, Donald P. Shankweiler,* Michael Studdert-Kennedy,* Betty Tuller,* Michael T. Turvey,* Mario Vayra,2 Robert Verbrugge*

Technical and Support Staff: Eric L. Andreasson, Donald Bailey, Margo Carter, Elizabeth P. Clark, Vincent Gulisano, Terry Halwes, Sabina D. Koroluk, Bruce Martin, Agnes M. McKeon, Nancy O'Brien, Marilyn K. Parnell, Susan Ross, William P. Scully, Richard S. Sharkany, Leonard Szubowicz, Edward R. Wiley, David Zeichner

Students: Suzanne Boyce, Tova Clayman, Steven Eady, Jo Estill, Laurie B. Feldman, Carole E. Gelfer, Janette Henderson, Charles Hoequist, Robert Katz, Peter Kugler, Gerald Lame, Anthony Levas, Harriet Magen, Sharon Manuel, Suzi Pollock, Brad Rakerd, Daniel Recasens, Rosemarie Rotunno, Hyla Rubin, Judith Rubin, Arnold Shapiro, Suzanne Smith, Rosemary Szczesiul, Douglas Whalen, Deborah Wilkenfeld, David Williams

*Part-time
1Visiting from University of Tokyo, Japan
2Visiting from Scuola Normale Superiore, Pisa, Italy
3Visiting from Istituto di Neuropsichiatria Infantile, Sassari, Italy

SR-67/68 (1981) (July-December)

CONTENTS

I. Manuscripts and Extended Reports

Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception--Bruno H. Repp 1

Temporal patterns of coarticulation: Lip rounding-- Fredericka Bell-Berti and Katherine S. Harris 41

Temporal constraints on anticipatory coarticulation-- Carole E. Gelfer, Katherine S. Harris, and Gary Hilt 57

Is a stop consonant released when followed by another stop consonant? Janette B. Henderson and Bruno H. Repp 71

Obstruent production by hearing-impaired speakers: Interarticulator timing and acoustics--Nancy S. McGarr and Anders Löfqvist 83

On finding that speech is special--Alvin M. Liberman 107

Reading, prosody, and orthography--Deborah Wilkenfeld 145

Children's memory for recurring linguistic and nonlinguistic material in relation to reading ability--Isabelle Y. Liberman, Virginia A. Mann, Donald Shankweiler, and Michelle Werfelman 155

Phonetic and auditory trading relations between acoustic cues in speech perception: Preliminary results--Bruno H. Repp 165

Production and perception of phonetic contrast during phonetic change--Paul J. Costa and Ignatius G. Mattingly 191

Decay of auditory memory in vowel discrimination-- Robert G. Crowder 197

The emergence of phonetic structure--Michael Studdert-Kennedy 217

Auditory information for breaking and bouncing events: A case study in ecological acoustics--William H. Warren, Jr., and Robert R. Verbrugge 223

Speech and sign: Some comments from the event perspective. Report for the Language Work Group of the First International Conference on Event Perception--Carol Fowler and Brad Rakerd 241

Fricative-stop coarticulation: Acoustic and perceptual evidence--Bruno H. Repp and Virginia A. Mann 255


II. Publications

III. Appendix: DTIC and ERIC numbers (SR-21/22 - SR-67/68)

I. MANUSCRIPTS AND EXTENDED REPORTS

PHONETIC TRADING RELATIONS AND CONTEXT EFFECTS: NEW EXPERIMENTAL EVIDENCE FOR A SPEECH MODE OF PERCEPTION

Bruno H. Repp

Abstract. This article reviews a variety of experimental findings, most of them obtained in the last few years, that show that the perception of phonetic distinctions relies on a multiplicity of acoustic cues and is sensitive to the surrounding context in very specific ways. Nearly all of these effects have correspondences in speech production, and they are readily explained by the assumption that listeners make continuous use of their tacit knowledge of speech patterns. A general auditory theory that does not make reference to the specific origin and function of speech can, at best, handle only a small portion of the wealth of phenomena reviewed here. Special emphasis is placed on several recent studies that obtained different patterns of results depending on whether identical stimuli were perceived as speech or as nonspeech. These findings provide strong empirical evidence for the existence of a special speech mode of perception.

INTRODUCTION

Speech is a specifically human capacity. Just as humans are uniquely enabled to produce the complex stream of sound called speech, one might suppose that they make use of special perceptual mechanisms to decode this complex signal. Of course, since speech is remarkably different from all other environmental sounds, it is highly likely that there are perceptual and cognitive processes that occur only when speech is the input. Otherwise, speech simply would not be perceived as what it is. To make sense, the question of whether speech perception is different from other forms of perception is best restricted to those aspects of speech that are not obviously unique, e.g., to its being an acoustic signal that can be described in the same physical terms as other environmental sounds. Then the question may be raised whether the perceptual translation of this acoustic signal into the sequence of discrete linguistic units that we experience (i.e., phonetic perception) requires the assumption of special mechanisms, or whether it can be reduced to a combination of auditory processes known to be involved also in

A revised version is to appear in Psychological Bulletin. Acknowledgment. Preparation of this paper was supported by NICHD Grant HD01994 and BRS Grant RR05594 to Haskins Laboratories. Valuable comments on an earlier draft were obtained from Carol Fowler, Alvin Liberman, Michael Studdert-Kennedy, Janet Werker, and an anonymous reviewer. My intellectual debt to Alvin Liberman must be evident throughout the paper.

[HASKINS LABORATORIES: Status Report on Speech Research SR-67/68 (1981)]

the perception and interpretation of nonspeech sounds. Even this modest question, however, presupposes that the linguistic categories applied by a listener, even though they are appropriate only for speech, are not unique in any essential sense but rather can be viewed as labels applied to specific auditory patterns. This assumption is probably wrong, but it must be granted now for the argument to proceed.

The precise nature of the processes and mechanisms that support phonetic perception has been the subject of much discussion. A number of speech researchers hold the view that speech perception is special in the sense that it takes account of the origin of the signal in the action of a speaker's articulatory system. This general view underlies the well-known motor theory of speech perception (e.g., Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) as well as the theory of analysis-by-synthesis (Halle & Stevens, 1959). More recently, it has been fertilized and augmented by ideas derived from Gibson's (1966) theory of event perception (see, e.g., Bailey & Summerfield, 1980; Neisser, 1976; Summerfield, 1979), which postulates that all perception is directed towards the source of stimulation. While, in the Gibsonian framework, speech perception is not seen as basically different from the perception of other auditory (and visual) events, the special nature of the source (the human vocal tract) is acknowledged and emphasized. In this view, speech perception is special because the source of speech is special. There are other researchers, however, who would concur only with the second half of that statement (the special nature of the source), not with the first. They pursue the hypothesis that the processes involved in speech perception are essentially the same as those that support the auditory perception of nonspeech sounds, and that they operate without implicit reference to the sound-producing mechanisms that generate the speech signal. In this view, the specific complexity of speech perception results merely from the diversity and the number of elementary auditory processes required to deal with an intricately structured signal (see, e.g., Divenyi, 1979; Kuhl & Miller, 1978; Pastore, 1981; Schouten, 1980; Stevens, 1975). These two views are perhaps most clearly distinguished by their different orientations to the evolution of speech perception: Whereas, according to the first view, special perceptual processes evolved hand in hand with articulatory capabilities to handle the complex output of a speaker's vocal tract, the second view assumes that the vocal productions of early hominids were fitted into a mold created by the pre-existing sensitivities and limitations of their auditory systems.

Which of these two views is correct is, in part, an empirical question that rests on many possible sources of evidence, including the reactions to speech of animal and human infant subjects, traditional laboratory experiments, and electrophysiological and clinical observations. In this review, I will focus on a set of recent attempts to demonstrate the peculiarities of speech perception in the laboratory, using normal adult human subjects. This kind of evidence has been, and continues to be, central to the argument, as it is easier to obtain, permits a variety of approaches, and is perhaps more readily interpreted than some of the other research. This is not to deny that some of the most crucial results will come from infant and animal experiments; however, this research characteristically lags one step behind the standard laboratory findings, and studies that extend the latest findings on college students' perception to other subject populations are just getting under way as this review is being written.

Less than a decade ago, a rich set of experimental data apparently supported the existence of a special speech mode of perception, distinct from other kinds of auditory perception. However, within a few years that support seems to have all but evaporated. The history of these events will be summarized and commented upon in the first part of the present paper. Since the main purpose of that section is to set the stage for the following review, my treatment of what are complex and often controversial issues will necessarily be somewhat sketchy and betray my biases. In the second part, new evidence--much of it collected over the last few years--will be reviewed and discussed. I will conclude that we have, once again, strong experimental support for a special phonetic mode of perception.

THE OLD EVIDENCE

In a well-known paper, Wood (1975) listed six laboratory phenomena that, at that time, seemed to provide strong converging evidence for the existence of special processes in speech perception. One phenomenon is the "phoneme boundary effect," which is commonly subsumed under the more general term, categorical perception. It is the finding that two speech stimuli are easier to discriminate when they can be assigned to different linguistic categories than when, though separated by an equivalent physical difference, they are perceived as belonging to the same category. A second phenomenon is selective adaptation, the shift of the category boundary on a synthetic speech continuum following repeated presentation of one endpoint stimulus. Three other phenomena have to do with hemispheric specialization: the dichotic right-ear advantage, the right-ear advantage in temporal-order judgments of speech stimuli, and differences in evoked potentials from the two hemispheres in response to speech stimuli. A sixth phenomenon concerned asymmetric interference between auditory and phonetic stimulus dimensions in a speeded classification task. Many of the findings that Wood referred to under these headings have been excellently reviewed by Studdert-Kennedy (1976).

At the time the Wood and Studdert-Kennedy papers were written, all of the above-named phenomena seemed to be specific to speech; that is, they were apparently not obtained with nonspeech stimuli. However, a few years later, the picture had changed considerably. Using Wood's enumeration of findings as their starting point, both Cutting (1978) and Schouten (1980) reviewed more recent research using the various paradigms and concluded independently that there was no evidence for a special phonetic mode of perception. After that statement, the views of these two authors diverge: Cutting, a vigorous proponent of the Gibsonian view, argues for considering speech perception as merely one instance of auditory event perception (i.e., the perception of auditory events other than speech may be as--or nearly as--complex and special as speech perception), while Schouten, who represents a more narrowly psychophysical orientation, states rather bluntly that "speech and non-speech auditory stimuli are probably perceived in the same way" (p. 71), implying that all auditory perception rests on the same elementary processes.

The conclusions of both authors reflect their disillusion over the failure of a number of experimental techniques to produce results specific to speech. Since the relevant evidence has been competently reviewed by them and by others, I will deal with it only briefly, focusing primarily on its interpretation.

Categorical Perception

The "phoneme boundary effect" singled out by Wood (1975)--the enhanced discriminability across the phonetic boundary on a synthetic speech continuum--is merely one aspect of the complex phenomenon termed categorical perception. Other aspects are reduced context sensitivity in stimulus categorization and predictability of discrimination performance from identification scores (Repp, Healy, & Crowder, 1979). However, these latter two aspects have not been claimed to be specific to speech.

The speech-specificity of the phoneme boundary effect has been challenged on the grounds that analogous effects have been demonstrated for a variety of nonspeech continua: noise-buzz sequences (Miller, Wier, Pastore, Kelly, & Dooling, 1976), tone-onset-time (Pisoni, 1977), tone amplitude in the presence of a reference signal (Pastore, Ahroon, Baffuto, Friedman, Puleo, & Fink, 1977), visual flicker (Pastore et al., 1977), musical intervals (Burns & Ward, 1978), and amplitude rise-time (Cutting & Rosner, 1974). The results for the rise-time ("pluck"-"bow") continuum, which have been widely cited and followed up, and on which Cutting (1978) rested his whole argument, have recently been claimed to be artifacts due to faulty stimulus construction (Rosen & Howell, 1981), but the other findings appear to be solid. However, some of them are not very surprising. If a psychophysical continuum is chosen on which some kind of threshold is known to exist--such as the critical flicker fusion threshold--it is obvious that two stimuli from opposite sides of the threshold will be more discriminable than two stimuli from the same side. However, it does not follow that, therefore, the phoneme boundary effect on a speech continuum is also caused by a psychophysical boundary that happens to coincide with the phoneme boundary. The problem is that, in most cases, we have no good idea of what the psychophysical boundary ought to be. Moreover, a phoneme boundary effect may be caused by the phoneme boundary itself, as argued below. There are several reasons why the nonspeech studies referred to above have done relatively little to clarify the issue.

First of all, only results obtained with nonspeech stimuli that have something in common with speech are directly relevant to the question of whether a specific phoneme boundary falls on top of a psychoacoustic threshold. For example, the observations on the flicker fusion threshold (Pastore et al., 1977) cannot have any direct implications for speech perception. They show that categorical perception can occur in the nonspeech domain, but they do not prove that the causes are the same as in a particular speech case. Second, just how much certain nonspeech stimuli have in common with the speech stimuli they are intended to emulate is a matter of debate. It is doubtful, for example, whether the relative onset time of two sinusoids (Pisoni, 1977) successfully simulates the distinction between a voiced and a voiceless stop consonant (cf. Pastore, Harris, & Kaplan, 1981; Pisoni, 1980; Summerfield, in press), or whether amplitude rise-time has much to do with the fricative-affricate distinction (Remez, Cutting, & Studdert-Kennedy, 1980). Third, even those nonspeech continua (such as noise-buzz sequences) that appear to copy a speech cue more or less faithfully yield results that, on closer inspection, are not in agreement with speech results. For example, individual listeners in the Miller et al. (1976) study showed boundaries as short as 4 msec on a noise-buzz continuum, which is much shorter than any boundaries for English-speaking listeners on the supposedly analogous voice-onset-time dimension (see, e.g., Zlatin, 1974). Note also that auditory thresholds may shift with extended practice in the laboratory, while linguistic boundaries ordinarily do not; this creates a problem for comparing the locations of the two. Fourth, and most significantly, the various comparisons of categorical perception of speech and supposedly analogous nonspeech stimuli generally have not taken into account the fact that there are multiple cues for each phonetic contrast and that perception of one cue, as it were, is not independent of the settings of other relevant cues. This issue, which has received particular attention only in the last few years, will be central to the second part of the present paper. Fifth, there are a variety of other factors that influence the locations of phonetic boundaries: language experience, speaking rate, stress, phonetic context, semantic factors, and so on. It remains to be shown that psychophysical thresholds are sensitive to all, or even most, of these variables (or their psychoacoustic analogs). Finally, we note that there are examples of category boundary effects on nonspeech continua that have no obvious psychophysical boundaries, viz., for musical intervals (Burns & Ward, 1978; Siegel & Siegel, 1977) or chords (Blechner, 1977; Zatorre & Halpern, 1979), which suggests that well-established categories of non-psychophysical origin may dominate perception.

In view of these arguments, one plausible account of the phoneme boundary effect remains that it arises from the use of category labels in discrimination. The support for this hypothesis comes from studies that show a change in speech sound discriminability consequent upon a redefinition of linguistic categories for the same stimuli and the same listeners (e.g., Carden, Levitt, Jusczyk, & Walley, 1981). However, the use of category labels in discrimination is not unique to speech. The difference between speech and nonspeech in the discrimination paradigm probably rests on the nature of the categories: Phonetic categories are not only more deeply engrained than other categories, but they also bear a special relation to the acoustic signal. As Studdert-Kennedy (1976) has put it, speech sounds "name themselves." Therefore, linguistic categories will dominate perception in a discrimination task to a larger extent than nonspeech categories that frequently do not even exist pre-experimentally and, in those cases, merely serve to bisect the stimulus range. In addition, the acoustic distinctions underlying a category contrast may be finer in the case of speech and are also habitually ignored by listeners in a natural situation; therefore, they are more difficult to access in the context of a discrimination task.

The strongest evidence for the alternative hypothesis, that categorical perception of speech rests on nonlinguistic auditory discontinuities in perception, comes from research on human infants (for recent summaries, see Jusczyk, 1981; Morse, 1979; Walley, Pisoni, & Aslin, 1981) and nonhuman animals, particularly chinchillas (Kuhl, 1981; Kuhl & Miller, 1978). Allowing for the inevitable methodological differences and limitations, infants and (so far) chinchillas appear to perceive synthetic speech stimuli essentially the same way adults do, including superior discrimination of stimuli from different (adult) categories than of stimuli from the same category. These effects obviously reflect some "natural" boundaries, but it is not entirely clear whether these boundaries are strictly psychoacoustic in nature or whether they perhaps reflect some innate or acquired sensitivity to articulatory patterns. Even if they were psychoacoustic (this being the received interpretation of the infant and chinchilla findings), it is not certain that linguistic categories in fact depend on them. (See, however, Aslin & Pisoni, 1980, for a different view.) For example, children in the early stages of language acquisition often are not able to make the perceptual distinctions infants seem to be capable of (Barton, 1980). There are still many open questions here. A fair assessment of the situation may be that the evidence on phoneme boundary effects neither strongly supports nor disconfirms the existence of a special speech mode of perception.

Selective Adaptation

The shifting of phoneme boundaries on a continuum by repeated presentation of stimuli from one category has been a favorite pastime of some speech perception researchers ever since Eimas and Corbit (1973) discovered the technique. (See Diehl, 1981, for a recent critical review.) In hindsight, this effort seems not to have been worthwhile. Since various kinds of nonspeech dimensions show selective-adaptation effects, it was to be expected that auditory dimensions of speech can be adapted as well. On the whole, this is what a score of studies show. The technique was considered interesting because it was thought to reveal the existence of "phonetic feature detectors" (Eimas & Corbit, 1973). However, the evidence for specifically phonetic effects in selective adaptation is scant, and what there is can probably be explained as shifts in response criteria or as effects of remote auditory similarity. Recent experiments by Sawusch and Jusczyk (1981) and particularly by Roberts and Summerfield (1981) strongly suggest that there is no phonetic component in selective adaptation at all, and that the effect takes place exclusively at a relatively early stage in auditory processing.

The concept of phonetic feature detectors is useless not only for the explanation of selective adaptation results (cf. Remez, 1979) but also from a wider theoretical perspective. No one expresses this better than Studdert-Kennedy (in press) when he says that "we are dealing with tautology, not explanation. ... The error lies in offering to explain phonetic capacity by making a substantive physiological mechanism out of a descriptive property of language" (p. 225). For, "... the perceived feature is an attribute, not a constituent, of the percept, and we are absolved from positing specialized mechanisms for its extraction" (p. 227). Arguments such as these apply not only to the concept of phonetic feature detectors but to the concept of the feature detector in general. For these reasons, selective adaptation results cannot have any implications for or against the existence of a special speech mode.

Hemispheric Specialization

The empirical results supporting a hemispheric asymmetry for speech and language are rich and complex. While left-hemisphere advantages have been reported for certain kinds of nonspeech sounds, the evidence that speech processes are lateralized to the left hemisphere in the large majority of individuals is unassailable. It has been claimed, however, that precisely because certain nonspeech stimuli show similar effects, the lateralization of speech should be explained by a more general principle, e.g., by a specialization of the left hemisphere for auditory properties characteristic of speech (Cutting, 1978; Schouten, 1980), or by an analytic-holistic distinction between the two hemispheres (e.g., Bradshaw & Nettleton, 1981). In commenting on the last-named paper, Studdert-Kennedy (1981) has argued that the analytic-holistic hypothesis, while descriptively adequate, is ill-conceived from a phylogenetic viewpoint. Rather, since lateralization presumably evolved to support some behavior important to the species, it seems more likely that lateralization of motor control preceded or caused lateralization of speech processes, which in turn may be responsible for the superior analytic capabilities of the left hemisphere. The apparent specialization of the left hemisphere for certain auditory characteristics of speech may just as well be the consequence as the cause of the lateralization of linguistic functions. Thus, the existing evidence on hemispheric specialization can be interpreted in an alternative way that is more compatible with a biological viewpoint and that recognizes the special status of speech.

Other Laboratory Phenomena

Various other findings have been cited as evidence for or against a speech mode of perception. Thus, Wood (1975) mentions the phenomenon of asymmetric interference between auditory and linguistic dimensions in a speeded classification task. While this finding (whose methodological details need not concern us here) may reveal something about the auditory processing of speech, its implications for the existence of a special speech mode of perception are limited. Similar patterns of results have been obtained with nonspeech auditory stimuli (Blechner, Day, & Cutting, 1976; Pastore, Ahroon, Crimmins, Golowner, & Berger, 1976), suggesting that the asymmetry has a nonphonetic basis.

Schouten (1980) adds to Wood's list two findings that seem to have even less bearing on the question of a phonetic mode of perception: a difference in the stimulus duration needed for correct order judgments with sequences of speech or nonspeech sounds (Warren, Obusek, Farmer, & Warren, 1969), and an asymmetry in the perception of truncated CV and VC syllables (Pols & Schouten, 1978). The first finding probably reflects the fact that speech stimuli are more readily categorized than nonspeech stimuli, while the second finding seems altogether irrelevant, having most likely a psychoacoustic explanation. It is a mistake to believe (as Schouten apparently does) that the "case against a speech mode of perception" is strengthened by various findings of auditory (nonphonetic) effects in speech perception experiments. Such effects are likely to occur, for, after all, speech enters through the ears. The thesis of the present paper is, however, that these effects are relatively inconsequential for the linguistic processing of speech.

By focusing primarily on the experimental paradigms listed in Wood's (1975) article, Cutting (1978) and Schouten (1980) neglected a variety of other observations that suggest the existence of a speech mode of perception. Liberman et al. (1967) reviewed many properties that are peculiar to speech and seem to require special perceptual skills. Foremost among these properties is the invariance of phonetic perception over substantial changes in the acoustic information; consider the well-known /di/-/du/ example, which shows that the /d/ percept can be cued by radically different transitions of the second formant. To achieve the same classification without reference to the articulatory gesture common to /di/ and /du/, an exceedingly complex "auditory decoder" would be required.

Liberman et al. (1967) also noted that the formant transitions distinguishing /di/ and /du/ sound quite different from each other when they are presented in isolation and do not engage the speech mode. In fact, when second- or third-formant transitions are removed from a synthetic syllable and presented to one ear while the rest of the speech pattern is presented to the other ear, the transitions are found to do double duty: They are perceived as whistles or chirps in one ear, but they also fuse with the remainder of the syllable in the other ear to produce a percept equivalent to the original syllable (Rand, 1974; Cutting, 1976). This "duplex perception" demonstrates the simultaneous use of speech and nonspeech modes of perception and has recently been further explored in experiments that will be reviewed later in this paper.

Other authors have noted striking differences in subjects' responses depending on whether identical or similar stimuli were perceived as speech or nonspeech. For example, House, Stevens, Sandel, and Arnold (1962) found that an ensemble of speech stimuli was easier to learn than various ensembles of speechlike stimuli that, however, were not perceived as speech by the subjects (cf. also Grunke & Pisoni, Note 1). Several studies of categorical perception have shown that speech stimuli from a synthetic continuum are discriminated well across a phonetic category boundary, while nonspeech analogs or components of the same stimuli are discriminated poorly or at chance (e.g., Liberman, Harris, Eimas, Lisker, & Bastian, 1961; Liberman, Harris, Kinney, & Lane, 1961; Mattingly, Liberman, Syrdal, & Halwes, 1971). As long as two decades ago, House et al. (1962) concluded that "an understanding of speech perception cannot be achieved through experiments that study classical psychophysical responses to complex acoustic stimuli. ... Although speech stimuli are accepted by the peripheral auditory mechanism, their interpretation as linguistic events transfers their processing to some nonperipheral center where the detailed characteristics of the peripheral analysis are irrelevant" (p. 142). This conclusion is still valid, as the remainder of this paper will attempt to show.

Summary

Of the various paradigms reviewed by Cutting (1978) and Schouten (1980), some failed to support the existence of a speech mode of perception because they were irrelevant to begin with. As far as categorical perception and hemispheric specialization are concerned, some of the evidence may have been misinterpreted. The fact that categorical perception and left-hemisphere superiority can be obtained for certain nonspeech stimuli does away with earlier claims that these phenomena are speech-specific. However, it does not necessarily imply that similar patterns of results occur for the same reason in speech and nonspeech; and if they do, it is not necessarily true that the processes involved in the perception of nonspeech are more basic than, or the prerequisites for, those supporting speech perception. We have seen that there are other findings, not considered by Cutting and Schouten, that suggest that speech perception differs from nonspeech auditory perception. It must be acknowledged, however, that the empirical results are complex, and while they hardly argue against the existence of a speech mode, they do not provide an overwhelming amount of positive evidence either.

Certainly, the argument that speech perception is special would be strengthened if new, less controversial results could be brought to bear on the issue. The second part of this paper focuses on a set of rather recent findings that add a new dimension to the argument. Since these results are recent and have not been reviewed previously, they will be treated in more detail. They may be grouped into three categories: phonetic trading relations, context effects, and other perceptual integration phenomena. What is common to all of them is that they deal with integration (over frequency, time, or space) in phonetic perception.

THE NEW EVIDENCE

The Distinction Between Trading Relations and Context Effects

It is known from many previous studies that virtually every phonetic contrast is cued by several distinct acoustic properties of the speech signal. It follows that, within limits set by the relative perceptual weights and by the ranges of effectiveness of these cues, a change in the setting of one cue (which, by itself, would have led to a change in the phonetic percept) can be offset by an opposed change in the setting of another cue so as to maintain the original phonetic percept. This is a phonetic trading relation. According to Fitch, Halwes, Erickson, & Liberman (1980), there is a phonetic equivalence between two cues that trade with each other. I prefer to use this term in a slightly different way, for neither cue is perceived in isolation; rather, they are perceived together and integrated into a unitary phonetic percept. Therefore, the equivalence holds not so much between (a-b) units of Cue 1 and (c-d) units of Cue 2, but rather between the phonetic percept caused by setting a of Cue 1 and setting d of Cue 2 and the phonetic percept caused by setting b of Cue 1 and setting c of Cue 2. These two percepts are phonetically equivalent in the sense that they yield exactly the same distribution of identification responses and are difficult to discriminate (see below).
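A trading relation of this kind is often summarized with a simple cue-combination model. The sketch below is only an illustration under that assumption, not anything proposed in this paper; the weights and cue settings are hypothetical, chosen so that two stimuli with different cue settings predict the same response distribution.

```python
# Minimal sketch (not from the report): a two-cue logistic identification
# model of the kind often used to summarize cue trading.  All weights and
# cue settings below are hypothetical.
import math

def p_category(cue1, cue2, w1=1.0, w2=1.0, bias=0.0):
    """Probability of reporting the category favored by high values of both cues."""
    return 1.0 / (1.0 + math.exp(-(w1 * cue1 + w2 * cue2 + bias)))

# Settings (a, d) and (b, c): the change from a to b on Cue 1 is offset by an
# opposed change from d to c on Cue 2, so both stimuli yield the same response
# distribution -- they are phonetically equivalent in the sense used here.
a, b = 1.5, 0.5   # two settings of Cue 1
d, c = -0.5, 0.5  # two settings of Cue 2
print(p_category(a, d), p_category(b, c))  # both about 0.73
```

Under such a model, phonetic equivalence is a property of the predicted response distribution, not of the individual cue values.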

Trading relations occur among different cues for the same phonetic contrast. However, when the perception of a phonetic distinction is affected by preceding or following context that is not part of the set of direct cues for the distinction (as illustrated in the next paragraph), we speak of a context effect. The context may be "close," i.e., it may constitute portions of the same coherent speech signal; or it may be "remote," referring to the relation between separate stimuli in a sequence, or between a precursor and a test stimulus. (Of course, the distinction between close and remote context is, to some extent, arbitrary.) Effects of close context, which are of special interest to us, are similar to trading relations in that they can be cancelled by an appropriate change in one or another cue relevant to the critical phonetic distinction. Conversely, a trading relation could be described (inappropriately) as a context effect, with one cue (the context) affecting the perception of another (the target). Formally, trading relations and context effects are quite similar, but it is useful to distinguish them on theoretical grounds. The distinction is best illustrated with an example.

Mann and Repp (1980) presented listeners with fricative noises from a synthetic [ʃ]-[s] continuum, immediately followed by one of four periodic stimuli. The periodic stimuli derived from natural utterances of [ʃa], [sa], [ʃu], and [su], from which the fricative noise portion had been removed; thus, they contained formant transitions appropriate for either [ʃ] or [s], and the identity of the vowel was either [a] or [u]. The results showed that, for a given ambiguous noise stimulus, listeners reported more instances of "s" when the following formant transitions were appropriate for [s] rather than [ʃ], and they also reported more instances of "s" when [u] followed rather than [a]. The first effect is a trading relation, the second a context effect. The effect of formant transitions on perception of the [ʃ]-[s] distinction is a trading relation because the transitions are a cue to fricative place of articulation. They are also a direct consequence of fricative production, and this is obviously the reason why they are a cue to fricative perception. Note that the transitions are integrated with the fricative noise cue into a unitary phonetic percept; listeners do not perceive a noise plus transitions, or a fricative consonant followed by a stop consonant, although a stop would be perceived if the fricative noise were removed or silence were inserted between it and the periodic portion (Cole & Scott, 1973; Mann & Repp, 1980). The effect of vowel identity on fricative perception is different. Whether the vowel is [a] or [u] is not a consequence of fricative production, and vowel quality therefore does not constitute a direct cue for fricative perception. The vowel is not perceptually integrated with the noise cue--it remains audible as a separate phonetic segment. It is appropriate here to say that the perceived vowel quality modifies the perception or interpretation of the fricative cues. This is a context effect, as distinct from a trading relation.1

As we will see below, trading relations and context effects have distinct (though related) explanations in a theory of phonetic perception, and it is that theoretical view that underlies the distinction in the first place. However, before we turn to the issue of explanation, a brief review of empirical findings shall be presented.

Phonetic Trading Relations

Overview

The fact that there are multiple cues for most phonetic contrasts has been known for a long time. Much of this early knowledge derives from the extensive explorations at Haskins Laboratories since the late 1940s. For example, Delattre, Liberman, Cooper, and Gerstman (1952) showed that the first two formants are important cues to vowel quality; Harris, Hoffman, Liberman, Delattre, and Cooper (1958) demonstrated that both second- and third-formant transitions contribute to the place-of-articulation distinction in stop consonants; and Gerstman (1957) found that both frication duration and rise time are relevant to the fricative-affricate distinction. Lisker (1978b), drawing on observations collected over a number of years, listed no less than 16 distinguishable cues to the /b/-/p/ distinction in intervocalic position.

From these and many other studies, a nearly complete list of cues has been accumulated over the years. However, the data were typically collected by varying one cue at a time, although there are some exceptions, such as Hoffman's (1958) heroic study, which varied three cues to stop place of articulation simultaneously. Restrictions on the size of stimulus ensembles were imposed by the limited technology of the time, which made stimulus synthesis and test randomization very cumbersome. With the advent of modern computer-controlled synthesis and randomization routines, however, orthogonal variation of several cues in a single experiment became an easy task, and the limit to the number of stimuli was set by the patience of the listener rather than that of the investigator. The new technology led to a resurgence of interest in the way in which multiple cues cooperate in signalling a phonetic distinction. Since, for one reason or another, many of the early Haskins studies had remained unpublished, certain results that had been known for years by word of mouth or from preliminary reports only recently found their way into the literature, after having been replicated with contemporary methods.

A word is in order about the definition of cues. The traditional approach, exemplified especially by the Haskins work (including my own), has been to dissect a spectrographic representation of the speech signal, following essentially visual Gestalt principles. A cue, then, is a portion of the signal that can be isolated visually, that can be manipulated independently in a speech synthesizer constructed for that purpose, and that can be shown to have some perceptual effect. This way of defining cues has been challenged on two grounds: (1) The spectrogram is not the only, and not necessarily the best, representation of the speech signal. For example, the well-known work of Stevens and Blumstein (1978; Blumstein & Stevens, 1979, 1980) pursues the hypothesis that the shape of the total short-term spectrum at certain critical points in the signal constitutes a perceptual cue; thus, the individual formants and adjacent noise bursts are not treated as separate cues. Such a redefinition of cues is justified as long as it does not bypass the legitimate empirical issue of whether the elementary, spectrographically defined signal components are indeed integrated by the auditory system in this way (as they may be in the case of individual formants, but probably not in the case of other, more disparate types of cues). However, while definitions of such complex cues effectively combine information on one dimension (e.g., in the spectral domain), they typically sacrifice information on other dimensions (e.g., in the temporal domain). Thus, the onset spectra examined by Stevens and Blumstein are static and do not easily permit the description of dynamic change over time. The issue revolves, in large part, around the question of how the perceptually salient information in the signal is best characterized--a question that, of course, lies at the heart of the present paper as well. The essential problem is that the totality of the cues for a given phonetic contrast apparently cannot be captured in a fully integrated fashion as long as purely physical (rather than articulatory or linguistic) terms are used.2 (2) Another criticism of a more far-reaching sort denies altogether the usefulness of fractionating the speech signal into cues (see, e.g., Bailey & Summerfield, 1980). This view, which rests on the precepts of Gibsonian theory (Gibson, 1966), will be taken up in the concluding comments of this paper.

I will not attempt to review in detail all recent studies of phonetic trading relations, of which there are quite a few. A brief and selective overview shall suffice. Most studies had the purpose of clarifying the roles and surveying the effectiveness of different cues to various phonetic distinctions. Some studies that depart from this standard pattern will be considered later in more detail. Whereas the large majority of studies have used synthetic speech, some obtained similar information by cross-splicing components of natural utterances, or by combining such components with synthetic stimulus portions. Not all authors describe their findings as trading relations (a term used primarily by the Haskins group), but such relations are implied by the pattern of results.

Voicing cues. Many studies have investigated multiple cues to the voiced-voiceless distinction. For stop consonants in initial position, both voice onset time (VOT) and the first-formant (F1) transition contribute to the distinction (Stevens & Klatt, 1974; Lisker, Liberman, Erickson, Dechovitz, & Mandler, 1977). The critical feature of the F1 transition, which can be traded against VOT, is its onset frequency: If the onset frequency is lowered in a phonetically ambiguous stimulus, the VOT must be increased for a phonetically equivalent percept to obtain (Lisker, 1975; Summerfield & Haggard, 1977). Another cue that can be traded for VOT is the amplitude of the aspiration noise preceding the onset of voicing: If the amplitude of the noise is increased, its duration (i.e., VOT) must be decreased to maintain phonetic equivalence (Repp, 1979). The fundamental frequency (F0) at the onset of the voiced stimulus portion is another relevant cue (Haggard, Ambler, & Callow, 1970) that presumably can be traded against VOT (see Repp, 1976, 1978b).

For stop consonants in intervocalic position, Lisker (1978b) has catalogued all the different aspects of the acoustic signal that contribute to the voicing distinction. They include the duration and offset characteristics of the preceding vocalic portion, the duration of the closure interval, the amplitude of voicing during the closure, and the onset characteristics of the following vocalic portion. Lisker's catalogue is based on a large number of studies, not all of which have been published; however, see Lisker (1957, 1978a, 1978c), Lisker and Price (1979), and Price and Lisker (1979). Trading relations between voicing cues for intervocalic stops have also been studied in French (Serniclaes, 1974, Notes 2 & 3) and in German (Kohler, 1979).

The voicing distinction for stop consonants in final position has also been intensively studied. Here, the duration of the vocalic portion is important (especially if no release burst is present) as well as its offset characteristics, the properties of the release burst, and the duration of the preceding closure. Trading relations among these cues have been investigated by Raphael (1972, 1981), Wolf (1978), and Hogan and Rozsypal (1980), among others.

The voicing distinction for fricatives in initial position has been studied by Massaro and Cohen (1976, 1977), who focused on the trading relation between fricative noise duration and F0 at the onset of periodicity. In a similar fashion, Derr and Massaro (1980) and Soli (in press) studied the trading relations among duration of the periodic ("vowel") portion, duration of fricative noise, and F0 as cues to fricative voicing in utterance-final position. Earlier studies of these cues include Denes (1955) and Raphael (1972).

Place of articulation cues. Trading relations among place of articulation cues for stop consonants in initial position--F2 and F3 transitions, burst frequency and burst amplitude--were studied long ago by Harris et al. (1958) and Hoffman (1958), and more recently, by Dorman, Studdert-Kennedy, and Raphael (1977) and by Mattingly and Levitt (1980). For stop consonants in intervocalic position, Repp (1978a) found a trading relation between the formant transitions in and out of the closure, and Dorman and Raphael (1980) reported additional effects of closure duration and release burst frequency. Bailey and Summerfield (1980), in a series of painstaking experiments, investigated place cues for stops in fricative-stop-vowel syllables; these cues included the offset spectrum of the fricative noise, the duration of the closure period, and the formant frequencies at the onset of the vocalic portion. Repp and Mann (1981a) recently demonstrated a trading relation between fricative noise offset spectrum and vocalic formant transitions in similar stimuli. Fricative noise spectrum and vocalic formant transitions as joint cues to fricative place of articulation were investigated by Whalen (1981), Mann and Repp (1980), and Carden et al. (1981).

Manner cues. Cues to stop manner of articulation (i.e., to presence vs. absence of a stop consonant) following a fricative and preceding a vowel were investigated by Bailey and Summerfield (1980), Fitch et al. (1980), and Best, Morrongiello, and Robson (1981). In each case, the trading relation studied was that between closure duration and formant onset frequencies in the vocalic portion. The two last-named studies will be discussed in more detail below. Summerfield, Bailey, Seton, and Dorman (1981) have shown that duration and amplitude contour of the fricative noise preceding the silent closure also contribute to the stop manner contrast.

Several cues to the fricative-affricate distinction in initial position (rise time, noise duration) were investigated by Gerstman (1957); see also van Heuven (1979). In a more recent set of experiments, Repp, Liberman, Eccardt, and Pesetsky (1978) traded vocalic offset spectrum, closure duration, and fricative noise duration as cues to a four-way distinction between vowel-fricative, vowel-stop-fricative, vowel-affricate, and vowel-stop-affricate. Trading relations among cues to the fricative-affricate distinction in final position were reported by Dorman, Raphael, and Liberman (1979: Exp. 5) and Dorman, Raphael, and Isenberg (1980).

Phonetic Equivalence

It is obvious that, whenever two or more cues contribute to a given phonetic distinction, they can be traded against each other, within certain limits. What is not obvious is that two stimuli with equal response distributions are truly equivalent in perception. Since most data on trading relations were collected in identification tasks with a restricted set of response categories, subjects may have had no opportunity to report that certain stimuli sounded like neither of the alternatives. At a more subtle level, it may be the case that phonetically equivalent stimuli, even though they are labeled similarly, sound different in some way that subjects cannot easily explain in words. One way to assess this possibility is by means of a discrimination task.3

This was undertaken by Fitch et al. (1980) for the trading relation between silent closure duration and vocalic formant transition onsets as cues to stop manner in the "slit"-"split" distinction, and by Best et al. (1981) for the similar trading relation between silent closure duration and F1 transition onset in the "say"-"stay" contrast. First, these authors determined in an identification task how much silence was needed to compensate for a certain difference in formant onset frequency. Then they devised a discrimination task containing three different types of trials: On single-cue trials, the stimuli to be discriminated differed only in the spectral cue (formant onset frequency); they had the same setting of the temporal cue (silence). On cooperating-cues trials, the stimuli differed in both cues, such that the stimulus with the lower formant onsets (which favor "split" or "stay" percepts) also had the longer silence (which also favors "split" or "stay" percepts). On conflicting-cues trials, the stimuli again differed in both cues, but now the stimulus with the lower formant onsets had the shorter silence, so that one cue favored "split" ("stay") and the other "slit" ("say"). Since the silence difference chosen was the one found to compensate exactly for the spectral difference in the identification task, the stimuli in the conflicting-cues condition were (on the average) phonetically equivalent.4
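Purely for illustration, the three trial types can be written out as pairs of (silence, F1 onset) cue settings; the numbers below are invented, not the stimulus values of Fitch et al. (1980) or Best et al. (1981).

```python
# Illustrative sketch only -- not the authors' stimulus values.  Each stimulus
# is reduced to its two cue settings: silent closure duration (ms) and F1
# onset frequency (Hz).  Long silence and low F1 onset both favor "stay"/"split".
SHORT_SIL, LONG_SIL = 40, 60    # assume the 20-ms difference exactly offsets
LOW_F1, HIGH_F1 = 300, 450      # the F1 difference, per the identification data

trial_types = {
    # spectral cue differs, temporal cue held constant
    "single-cue":       ((SHORT_SIL, LOW_F1), (SHORT_SIL, HIGH_F1)),
    # lower F1 onset paired with longer silence: both cues pull apart
    "cooperating-cues": ((LONG_SIL, LOW_F1), (SHORT_SIL, HIGH_F1)),
    # lower F1 onset paired with shorter silence: cues cancel, so the two
    # stimuli are (approximately) phonetically equivalent
    "conflicting-cues": ((SHORT_SIL, LOW_F1), (LONG_SIL, HIGH_F1)),
}

for name, (stim_a, stim_b) in trial_types.items():
    print(f"{name:17s} A={stim_a}  B={stim_b}")
```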

The results of these experiments showed a clear difference among the three conditions: Subjects' discrimination performance in the category boundary region was best in the cooperating-cues condition, worst in the conflicting-cues condition, and intermediate in the single-cue condition. Thus, it is true that (approximately) phonetically equivalent stimuli, namely those in the conflicting-cues condition, are difficult to discriminate; they "sound the same," whereas stimuli in the cooperating-cues condition sound different, even though they exhibit the same physical differences on the two relevant dimensions. The pattern of discrimination results follows that predicted from identification data, showing that stimuli differing on two auditory dimensions simultaneously are still categorically perceived (given that perception is categorical when each of these dimensions is varied separately). It is likely that listeners could be trained to become more sensitive to the physical differences that do exist between phonetically equivalent stimuli, and the interesting question arises whether discrimination on cooperating-cues trials would continue to be superior to that on conflicting-cues trials. So far, no study has taken this approach. However, preliminary results from a related series of experiments (Repp, 1981b) indicate that some trading relations disappear when listeners try to discriminate pairs of stimuli that unambiguously belong to the same phonetic category (i.e., phonetically equivalent stimuli that are not from the boundary region), suggesting that these trading relations operate only when the stimuli are phonetically ambiguous. This leads us to the question of the origin of trading relations.

Explanation of Trading Relations: Phonetic or Auditory?

The large number of trading relations surveyed above poses formidable problems for anyone who would like to explain speech perception in purely auditory terms. Why should cues as diverse as, say, VOT and F1 onset, or silence and fricative noise duration, trade in the way they do? Auditory theory has only two avenues open: Either the cues are integrated into a unitary auditory percept at an early stage in perception (the auditory integration hypothesis), or selective attention is directed to one of the cues (which then must be postulated to be the essential cue for the relevant phonetic contrast), and the perception of that cue is affected by the settings of other cues (the auditory interaction hypothesis).

The auditory integration hypothesis is implicit in the work of Stevens and Blumstein (1978; Blumstein & Stevens, 1979, 1980). To account for the fact that release burst spectra and formant transition onset frequencies are joint cues to place of articulation of syllable-initial stop consonants, Stevens and Blumstein assume that the perceptually relevant variable is the integrated spectrum of the first 25 msec or so of a stimulus. In other words, the burst (which is usually shorter than 25 msec) and the onsets of the several formant transitions are considered an integral auditory variable. Since both cues are spectral in nature and occur within a short time period, this is not an unreasonable hypothesis, notwithstanding the different sources of excitation (noise vs. periodic) of the two sets of cues in voiced stops. In fact, Ganong (1978) found support for the perceptual integrality of burst and formant transition cues in an ingenious experiment involving interaural transfer of selective-adaptation effects. However, Stevens and Blumstein have had only limited success with automatic classification of stop consonants according to onset spectrum alone, and Kewley-Port (1981) recently demonstrated that automatic stop consonant identification can be improved by incorporating a measure of spectral change. Thus, even though onset spectrum may be an important cue, it does not contain all the relevant information in the signal.
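As a rough sketch of what integrating the spectrum of the first 25 msec could mean computationally (an assumed illustration only, not Stevens and Blumstein's actual procedure):

```python
# Rough sketch (not Stevens & Blumstein's actual procedure): treat the first
# 25 ms of a syllable as one integrated spectral cue by averaging short-term
# magnitude spectra over that window.  Sampling rate and frame sizes are
# arbitrary choices for illustration.
import numpy as np

def onset_spectrum(waveform, sample_rate=10000, onset_ms=25, frame_len=128):
    """Average magnitude spectrum over the first `onset_ms` of the signal."""
    n_onset = int(sample_rate * onset_ms / 1000)
    onset = waveform[:n_onset]
    hop = frame_len // 2
    frames = [onset[i:i + frame_len]
              for i in range(0, len(onset) - frame_len + 1, hop)]
    window = np.hanning(frame_len)
    spectra = [np.abs(np.fft.rfft(frame * window)) for frame in frames]
    return np.mean(spectra, axis=0)

# Example with a synthetic noise signal standing in for a burst-plus-onset
rng = np.random.default_rng(0)
fake_syllable = rng.standard_normal(2000)
print(onset_spectrum(fake_syllable).shape)  # (65,) spectral points
```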

The main problem with the auditory integration hypothesis seems to be that it applies only when the relevant cues are both spectral in nature, are of short duration, and occur simultaneously or in close succession. However, the cues are often spread out over a considerable stretch of time. For example, an explanation of the fact that both the formant transitions into and out of a stop closure contribute to the perceived place of articulation of a stop in medial position (Dorman & Raphael, 1980; Repp, 1978a; Repp & Mann, 1981a) would require integration of spectra across a closure, i.e., over as much as 100 msec. Such a long integration period seems unlikely; certainly, it is much longer than that envisioned by Stevens and Blumstein (1978). Trading relations that involve spectral and temporal cues (e.g., F1 onset and VOT for stop voicing in initial position) cannot be easily translated into purely spectral terms; and trading relations between purely temporal cues (e.g., silent closure duration and fricative noise duration for the fricative-affricate distinction in medial position) require a different explanation altogether. To be sure, there are some trading relations that do suggest auditory integration, such as that between VOT (i.e., aspiration noise duration) and aspiration noise amplitude (Repp, 1979), which is reminiscent of certain time-intensity reciprocities at the auditory threshold. In fact, preliminary data (Repp, 1981b) support this suggestion by showing that this trading relation operates independently of whether a listener is making phonetic or auditory judgments of speech stimuli. In other cases, however, the cues that participate in a trading relation are simply too diverse or too widely spread out to make auditory integration seem plausible. Or, to put it somewhat differently, whereas any such trading relation could be described as resulting from auditory integration, this integration would no longer seem to be motivated by general principles of auditory perception; thus, it would have to be considered a speech-specific process.

The auditory interaction hypothesis, which postulates that trading relations arise because perception of a primary cue is affected by other cues, has even less concrete evidence in its favor, in part because most of the relevant studies remain to be done. In particular, it is not clear whether auditory interactions (masking, contrast, etc.) of the kind and extent required to explain certain trading relations are at all plausible. For example, to explain the trading relation between VOT and F1 onset frequency as cues to stop consonant voicing, it would have to be the case that a noise-filled interval (VOT) sounds subjectively longer when followed by a periodic stimulus with a relatively low onset frequency. At present, there are no psychoacoustic data to support this hypothesis. Auditory psychophysics involving nonspeech stimuli of the degree of complexity of speech is still in its infancy (cf. Pastore, 1981). Perhaps, as more is learned about the perception of complex sounds and sound sequences, some auditory explanations of what now appear to be phonetic phenomena will be forthcoming.5 One serious problem that has vexed researchers since the time of the early Haskins research is that of finding appropriate nonspeech analogs for speech stimuli. If the analogs are too similar to speech, they may be perceived as speech and thereby cease to be good analogs and become bad speech. If they are too different from speech, the generalizability of the findings to speech may be questioned. There is a way out of this dilemma: If stimuli could be constructed that are sufficiently like speech to be perceived as speech by some listeners but not by others (perhaps prompted by different instructions), or even by the same listeners on different occasions, and if different results are obtained in the two conditions (e.g., two cues trade in one but not in the other), this would then be proof of specialized perceptual processes serving speech perception.

It is from this perspective that a recent study by Best et al. (1981) receives special importance. These authors investigated the trading relation between silent closure duration and F1 transition onset frequency as cues to stop manner in the "say"-"stay" contrast. After replicating the results obtained with the similar "slit"-"split" contrast by Fitch et al. (1980), they proceeded to test for the presence of a similar trading relation in "sinewave analogs" of the synthetic "say"-"stay" stimuli. Sinewave analogs are obtained by imitating the formant trajectories of (voiced) speech stimuli with pure tones. Such analogs of simple CV syllables have been used previously by Cutting (1974) and by Bailey, Summerfield, and Dorman (1977), whose work is discussed below; recently, Remez, Rubin, Pisoni, and Carrell (1981) successfully synthesized whole English sentences in that way. The interesting thing about these stimuli is that they are heard as nonspeech whistles by the majority of naive listeners, but they may be heard as speech when instructions point out their speechlikeness or spontaneously after prolonged listening. Once heard as speech, it is difficult (if not impossible) to hear them as pure whistles again, although the speech heard retains a highly artificial quality (Remez et al., 1981). This phenomenon was exploited by Best et al. in their main experiment.
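Schematically, a sinewave analog can be generated as sketched below; the "formant" trajectories are invented placeholders rather than those of any stimulus discussed here, and the point is only that each formant is replaced by a single frequency-modulated pure tone.

    # Sketch of sinewave-analog synthesis: one pure tone per formant, with the
    # tone frequency following that formant's (here invented) trajectory.
    import numpy as np

    def sinewave_analog(formant_tracks, sample_rate, duration):
        """formant_tracks: list of (start_hz, end_hz) pairs, linearly interpolated."""
        t = np.arange(int(duration * sample_rate)) / sample_rate
        signal = np.zeros_like(t)
        for f_start, f_end in formant_tracks:
            freq = np.linspace(f_start, f_end, t.size)         # tone tracks the formant
            phase = 2 * np.pi * np.cumsum(freq) / sample_rate  # integrate frequency
            signal += np.sin(phase)
        return signal / len(formant_tracks)

    # Hypothetical three-"formant" vocalic portion, 200 ms at 10 kHz:
    analog = sinewave_analog([(450, 700), (1700, 1200), (2600, 2400)], 10000, 0.2)
    print(analog.shape)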

They constructed sinewave analogs of a "say"-"stay" continuum by following a noise resembling [s]-frication with varying periods of silence and a sinewave portion whose component tones imitated the first three formants of the periodic portion of the speech stimuli. There were two versions of the sinewave portion, one with a low onset of the tone simulating F1, and one with a high onset. (In speech stimuli, less silence is needed to change "say" to "stay" when F1 has a low onset than when it has a high onset.) The sinewave stimuli were presented to listeners in an AXB format, where the critical X stimulus had to be designated as being more similar to either the A or the B stimulus, which were analogs of a clear "say" (no silence, high F1 onset) and a clear "stay" (long silence, low F1 onset), respectively. Some of the subjects were told that the stimuli were intended to sound like "say" or "stay," whereas others were only told that the stimuli were computer sounds. After the experiment, the subjects were divided into those who reported that they heard the stimuli as "say"-"stay," either spontaneously or after instructions, and into those who reported various auditory impressions or inappropriate speech percepts. Only members of the first group, who--according to their self-reports--employed a phonetic mode of perception, showed a trading relation between silence and F1 onset frequency, and this trading relation resembled that obtained with synthetic speech stimuli. None of the other subjects showed this pattern of results. These other subjects could be further subdivided into two groups: those who reported that the stimuli differed in the amount of separation between the two stimulus portions (noise and sinewaves), and those who reported that the stimuli differed in the quality of the onset of the second portion ("water dripping," "thud," etc.). The AXB results substantiated these reports: The results of the first group indicated that the subjects paid attention only to the silence cue, whereas the second group seemed to make their judgments primarily on the basis of the spectral cue (F1-analog onset frequency). The response patterns of the two groups were radically different from each other, and both were different from those of the group who heard the stimuli as speech. It seems reasonable to conclude that the subjects in the former two groups employed an auditory mode of perception. Being in this mode, they were unable to integrate the two cues into a unitary percept and instead focused on one or the other cue separately, thereby disconfirming the auditory integration hypothesis for this set of cues.6 There was some evidence of an auditory interaction in that those listeners who paid attention to the spectral cue were affected by the setting of the temporal cue. However, this effect was not sufficiently strong to account for the trading relation observed in speech-mode listeners; moreover, those subjects who focused on the silence cue (which is the primary cue for stop manner) were not affected at all by the setting of the spectral cue.

The results of Best et al. provide the strongest evidence we have so far that a trading relation is specific to phonetic perception: When listeners are not in the speech mode, the trading relation disappears and selective attention to individual acoustic cues becomes possible. The data argue against any auditory explanation of the trading relation at hand, and they support the existence of a phonetic mode of perception that is characterized by specialized ways of stimulus processing. Results from a recent study (Repp, 1981b) further confirm the phonetic nature of the trading relation between silence and F1 onset for the "say"-"stay" distinction by showing that it is obtained only in the phonetic boundary region of the speech continuum (i.e., when listeners can make a phonetic distinction) but not within the "stay" category (i.e., when listeners cannot make a phonetic distinction and must rely on auditory criteria for discrimination). We may suspect that many other trading relations will behave similarly. This is already indicated for the trading relation between closure duration and fricative noise duration in the "say shop"-"say chop" distinction (Repp, 1981b) and for that between fricative noise spectrum and formant transitions in the [ʃ]-[s] distinction (Repp, 1981a, discussed in the next section).

How, then, are trading relations to be explained, if not in terms of auditory interactions or integration? The proposed answer is this: Speech is produced by a vocal tract, and the production of a phonetic segment (assuming that such segments exist at some level in the articulatory plan) has complex and temporally distributed acoustic consequences. Therefore, the information supporting the perception of the same phonetic segment is acoustically diverse and spread out over time. The perceiver recovers the abstract units of speech by integrating the multiple cues that result from their production. The basis for that perceptual integration may be conceptualized in two ways. One is to state that listeners know from experience how a given phonetic segment "ought to sound" in a given context. Since phonetic contrasts almost always involve more than one acoustic property, trading relations among these properties must result when the stimulus is ambiguous because, in this view, it is being evaluated with reference to idealized representations or "prototypes" that differ on all these dimensions simultaneously: A change in one dimension can be offset by a change in another dimension, so that the perceptual distances from the prototypes remain constant. The other possibility is that perceptual integration does not require specific knowledge of speech patterns (whose form of memory storage is difficult to conceptualize) but is predicated directly upon the articulatory information in the signal. In other words, trading relations may occur because listeners perceive speech in terms of the underlying articulation, and inconsistencies in the acoustic information are resolved to yield perception of the most plausible articulatory act. This explanation thus requires that the listener have at least a general model of human vocal tracts and of their ways of action. The question remains: How much must an organism know about speech to exhibit a phonetic trading relation? An important issue for future research will be the question whether phonetic trading relations are obtained in human infants, and if not, how and when they begin to develop.7
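The prototype version of this account can be made concrete with a small sketch. The cue values, the equal weighting of the two dimensions, and the distance rule below are hypothetical simplifications; the point is only that nearest-prototype classification in a two-dimensional cue space yields a boundary along which one cue can be traded against the other.

    # Sketch of the prototype account, with made-up numbers: "say" and "stay"
    # prototypes differ on two cue dimensions at once (closure silence and
    # F1 onset, both in arbitrary normalized units), and a stimulus is
    # assigned to the nearer prototype.
    import numpy as np

    say_proto = np.array([0.0, 1.0])    # [silence, F1 onset]: no silence, high onset
    stay_proto = np.array([1.0, 0.0])   # long silence, low onset

    def classify(stimulus):
        d_say = np.linalg.norm(stimulus - say_proto)
        d_stay = np.linalg.norm(stimulus - stay_proto)
        return "stay" if d_stay < d_say else "say"

    # Raising the F1 onset must be offset by adding silence: a trading relation.
    print(classify(np.array([0.45, 0.2])))   # low F1 onset, modest silence -> stay
    print(classify(np.array([0.45, 0.8])))   # high F1 onset, same silence  -> say
    print(classify(np.array([0.90, 0.8])))   # high F1 onset, more silence  -> stay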

Context Effects

Effects Due to Immediate Phonetic Context

Like phonetic trading relations, certain kinds of phonetic context effects have been known for a long time. The most familiar example is, perhaps, the dependence of stop release burst perception on the following vowel. Liberman, Delattre, and Cooper (1952) showed that, when noise bursts of varying frequencies are followed by different steady-state periodic stimuli, the stop consonant categories reported by listeners may depend on the quality of the vowel. For example, if a noise burst centered at 1600 Hz is followed by steady states appropriate for [i] or [u], listeners report "p," but if [a] follows, they report "k."

A similar effect has been reported by Summerfield (1975), who found that the nature of the vowel influences the location of the boundary on a continuum of stop-consonant-vowel syllables varying in VOT. This context effect may actually be a trading relation because it probably reflects the influence of F1 onset (rather than vowel quality per se) on the voicing decision, i.e., a trading relation between F1 onset and VOT (cf. Summerfield & Haggard, 1974, 1977). Recently, Summerfield (in press) conducted an important series of experiments in which he tested whether this effect has an auditory basis. He used speech stimuli varying in VOT and in the F1 frequency of the following steady-state vocalic portion, and he compared their perception with that of two kinds of nonspeech analogs. One was a tone-onset-time (TOT) continuum (Pisoni, 1977) that varied the relative onset time of two pure tones of fixed frequency, matched in frequency and amplitude to the first two formants of the speech stimuli. The frequency of the lower tone was varied to simulate different F1 onset frequencies. The other set of nonspeech stimuli formed a noise-onset-time (NOT) continuum (cf. Miller et al., 1976) that varied the lead time of a noise-excited steady-state F2 relative to a periodically excited steady-state F1. Different F1 onset frequencies were simulated by varying the frequency of F1. The stimuli were presented for identification as "g" or "k" (speech) or as "simultaneous onset" vs. "successive onset" (nonspeech). While the VOT boundary exhibited the expected sensitivity to F1 onset frequency, neither nonspeech continuum evinced any reliable influence of F1(-analog) frequency on listeners' judgments. Pastore et al. (1981) recently reported a similar failure to find equivalent effects of two different secondary variables (rise time and trailing stimuli) on VOT and TOT category boundaries. These results suggest that the context effect obtained in speech does not have an auditory basis but is specific to the phonetic mode. (However, see Footnote 7.)
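For concreteness, one stimulus of this general tone-onset-time kind could be generated as sketched below; the tone frequencies, durations, and onset-time steps are placeholders, not the values used by Pisoni or Summerfield.

    # Sketch of a tone-onset-time (TOT) stimulus: two steady pure tones standing
    # in for F1 and F2, with the lower tone's onset delayed by the "TOT".
    # All parameter values are hypothetical.
    import numpy as np

    def tot_stimulus(f_low, f_high, tot_ms, duration_ms=200, sample_rate=10000):
        n_total = int(duration_ms * sample_rate / 1000)
        n_delay = int(tot_ms * sample_rate / 1000)
        t = np.arange(n_total) / sample_rate
        high = np.sin(2 * np.pi * f_high * t)     # "F2" tone, full duration
        low = np.sin(2 * np.pi * f_low * t)
        low[:n_delay] = 0.0                       # "F1" tone starts later
        return (high + low) / 2.0

    continuum = [tot_stimulus(500, 1500, tot) for tot in range(0, 51, 10)]  # 0-50 ms
    print(len(continuum), "stimuli,", continuum[0].size, "samples each")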

An effect of vocalic context on the perception of stop consonant place of articulation was investigated by Bailey et al. (1977). These authors constructed two synthetic speech continua ranging from [b]+vowel to [d]+vowel by varying the transition onset frequencies of F2 and F3. The two continua differed in the terminal (steady-state) frequency of F2, which was high in one and low in the other. On each continuum, the transition onsets were arranged so that the center stimulus had completely flat F2 and F3, while both transitions rose in one endpoint stimulus to the same degree as they fell in the other endpoint stimulus. When these stimuli were presented to subjects for classification in an AXB task, it turned out that the category boundaries were at different locations on the two continua, neither being exactly in the center: one boundary was displaced toward the [d] end, while the other was displaced toward the [b] end. Bailey et al. wished to test whether this difference (a kind of context effect, especially when "rising vs. falling transitions" is considered the relevant cue, rather than absolute transition onset frequency, which varied with context) has a psychoacoustic basis. They pioneered in using sinewave analogs for that purpose. The sinewave stimuli were presented in the same AXB paradigm to a group of subjects that was subdivided afterwards according to self-reports of whether or not the stimuli were heard as speech. It turned out that those listeners who claimed to hear [b] and [d] had their category boundaries on the two continua at different locations that corresponded to those found with speech stimuli. The other listeners, however, who reported only nonspeech impressions, had their boundaries close to the centers of both continua, as one might predict on psychophysical grounds. This experiment provided evidence that phonetic categorization is based on principles different from those of auditory psychophysics. Presumably--although this was not shown directly by Bailey et al.--the asymmetries in boundary location obtained with speech stimuli were in accord with the acoustic characteristics of typical stop consonants in these particular vocalic contexts.

Let us turn now to other context effects that are of special interest because they involve segments not as obviously interdependent as stop consonants and following vowels. One effect concerns the influence of vocalic context on fricative perception. If a noise portion ambiguous between [ʃ] and [s] is followed by a periodic portion appropriate for a rounded vowel such as [u], listeners are more likely to report "s" than if the following vowel is unrounded, e.g., [a] (Kunisaki & Fujisaki, Note 5; Mann & Repp, 1980; Whalen, 1981). A preceding vowel has a similar, but smaller, effect (Hasegawa, 1976). In addition to roundedness, other features of the vowel (such as the front-back dimension) also seem to play a role (Whalen, 1981). Repp and Mann (1981a) also discovered a small but reliable effect of a following stop consonant on fricative perception: Listeners are more likely to report "s" when the formant transitions in the following vocalic portion (separated from the noise by a silent closure interval) are appropriate for [k] than when they are appropriate for [t].

Several effects of context on the perception of stop consonants have been discovered in recent experiments. Mann and Repp (1980) found that, in fricative-stop-vowel stimuli, listeners are more likely to report "k" when vocalic stimuli with formant transitions ambiguous between [t] and [k] are preceded by an [s]-noise plus silence than when they are preceded by an [ʃ]-noise plus silence. They showed that the effect has two components, one due to the spectral characteristics of the fricative noise (perhaps an auditory effect) and the other to the category label assigned to the fricative (which must be a phonetic effect). Subsequently, Repp and Mann (1981a) showed the context effect to be independent of the effect of direct cues to stop place of articulation in the fricative noise offset spectra (which proves that it is a true context effect and not a trading relation), and they also ruled out simple response bias as a possible cause. In a further experiment, Mann (1980) found that, when stimuli ambiguous between [da] and [ga] were preceded by either [al] or [ar], listeners reported many more "g" percepts after [al] than after [ar]. In experiments with vowel-stop-stop-vowel stimuli, Repp (1978a, 1980a, 1980b) found various perceptual interdependences between the two stops cued by the formant transitions on either side of the closure interval; in particular, perception of the first stop was influenced strongly by the second.

How are all these effects to be explained? Auditory explanations would have to be formulated in the manner of the interaction hypothesis for trading relations: The perception of the relevant acoustic cues is somehow affected by the context. As in the case of trading relations, however, no plausible mechanisms that might mediate such effects have been suggested, and no similar effects with nonspeech analogs have been reported so far. On the other hand, reference to speech production provides a straightforward explanation of most, if not all, context effects. Just as trading relations reflect the dynamic nature of articulation (of a given phonetic segment), so are context effects accounted for by coarticulation (of different phonetic segments). The articulatory movements characteristic of a given phonetic segment exhibit contextual variations that may be either part of the articulatory plan (allophonic variation, or anticipatory coarticulation) or due to the inertia of the articulators (perseverative coarticulation). Presumably, human listeners possess implicit knowledge of this coarticulatory variation.

Coarticulatory effects corresponding to the perceptual phenomena just cited have been observed in most cases. Thus, it is well known that the release burst spectrum of stop consonants varies with the following vowel (Zue, Note 6) in a manner quite parallel to the perceptual findings of Liberman et al. (1952). Fricative noises exhibit a downward shift in spectrum when they precede or follow a rounded vowel, due to anticipatory or carry-over lip rounding (Fujisaki & Kunisaki, 1978; Hasegawa, 1976; Mann & Repp, 1980), which explains the effect of vocalic context on fricative perception. The formant transitions of stop consonants vary with preceding fricatives (Repp & Mann, 1981a, 1981b) and liquids (Mann, 1980) in a manner consistent with the corresponding perceptual effects. Thus, the available evidence suggests that most perceptual context effects are paralleled by coarticulatory effects. The implication is, then, that listeners expect coarticulation to occur and compensate for its absence in experimental stimuli by shifting their response criteria accordingly. For example, if an [ʃ]-like noise followed by [u] is not sufficiently low on the spectral scale (as it should be because of anticipatory lip rounding), it might be perceived as an "s." Thus, the evidence is highly persuasive that context effects, just like trading relations, reflect the listeners' intrinsic knowledge of articulatory dynamics.

A critical test of the auditory vs. phonetic explanations of context effects can again be performed with appropriate nonspeech analogs, or with stimuli that can be perceived as either speech or nonspeech. Two such studies (Bailey et al., 1977; Summerfield, in press) were discussed above. In a recent experiment, I took an alternative approach (Repp, 1981a): Rather than using nonspeech stimuli that can be perceived as speech, I used speech stimuli (a portion of) which can be fairly readily perceived as nonspeech. Although it is usually difficult to abandon the phonetic mode when listening to speech, except in cases where the speech is strongly distorted or poorly synthesized, fricative-vowel syllables offer an opportunity to do so because they contain a sizable segment of fairly steady-state noise whose auditory properties ("pitch," length, loudness) are relatively accessible. In my study, the fricative noise spectrum was varied along a continuum from [ʃ]-like to [s]-like, and the vowel was either [a] or [u]. It was known from earlier experiments (Mann & Repp, 1980) that listeners are more likely to label the fricative "s" in the context of [u] than in the context of [a]. A secondary cue to the [ʃ]-[s] distinction was deliberately confounded with the context effect: The [a] vocalic portion contained formant transitions appropriate for [ʃ], and the [u] portion contained transitions appropriate for [s]; this increased the differential effect of the two vocalic contexts on fricative identification. (Thus, this experiment tested a context effect and a trading relation at the same time.) The stimuli were subsequently presented in a same-different discrimination task where the difference to be detected was in the spectrum of the noise portion, and the vowels were either the same or different, but irrelevant in any case. The majority of naive subjects perceived these stimuli fairly categorically: Their discrimination performance was poor; the pattern of responses suggested that they relied on category labels; and there were pronounced effects of vocalic context, just as in previous labeling tasks. Two subjects, however, performed much better than the others. Their data resembled those of three experienced listeners who also participated in the experiment. Comments and introspections of these subjects suggested that they were able to bypass or ignore phonetic categorization and to focus instead on the spectral properties (the "pitch") of the fricative noise. The crucial result was that these listeners not only performed much better than the rest (which supports the hypothesis that they employed an auditory mode of perception), but that they did not show any effect of vocalic context. These results were confirmed in a follow-up study where naive listeners were induced (with some success) to adopt an auditory listening strategy. These experiments demonstrate that vocalic context affected the perceived phonetic category of the fricative but not the perceived pitch quality of the noise. Therefore, the context effect due to the quality of the vowel, as well as the cue integration underlying the contribution of the vocalic formant transitions to fricative identification, must be phonetic in nature.

Speaker Normalization Effects

A phenomenon related to the context effects just discussed is that of speaker normalization. In an experimental demonstration of this effect, the perception of a critical phonetic segment is influenced, not by a phonetic change in an adjacent segment, but by an acoustic change such as might result from a change in speaker. For example, a (roughly proportional) upward shift of vowel formants on the frequency scale signifies that the speech signal originated in a smaller vocal tract. (How listeners "decide" that the same vowel has been produced by a smaller vocal tract, rather than a different vowel by the same vocal tract, is an unresolved issue.) Such a change may influence the perception of phonetic segments in the vicinity, as long as the listener perceives the whole test utterance as coming from a single speaker's vocal tract.
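The acoustic rationale for this interpretation can be illustrated with the idealized uniform-tube model of a neutral vocal tract, whose resonances fall at odd multiples of c/4L and therefore all scale inversely with tract length; real vowels depart from the uniform tube, and the tract lengths and speed of sound used below are only textbook approximations.

    # Resonances of an idealized uniform tube closed at the glottis and open at
    # the lips: F_n = (2n - 1) * c / (4 * L). Shortening L raises every formant
    # in roughly the same proportion.
    c = 35000.0   # approximate speed of sound in warm, moist air, cm/s

    def neutral_formants(tract_length_cm, n_formants=3):
        return [(2 * n - 1) * c / (4 * tract_length_cm) for n in range(1, n_formants + 1)]

    print([round(f) for f in neutral_formants(17.5)])  # longer tract: ~500, 1500, 2500 Hz
    print([round(f) for f in neutral_formants(14.5)])  # shorter tract: every formant shifted up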

Although speaker normalization is a well-recognized problem in speech recognition research, there have been relatively few experimental studies. Rand (1971) constructed stop consonant continua ranging from /b/ to /d/ to /g/ by varying the onset of the F2 transition of three synthetic two-formant stimuli intended to represent, respectively, a given vowel produced by a large vocal tract, the same vowel produced by a small vocal tract (differing from the former only in F2 frequency), and a different vowel produced by a large vocal tract (differing from the former only in F1 frequency). The results showed similar category boundaries (expressed in terms of absolute F2 onset frequency) for the two stimulus continua associated with large vocal tracts, but a shift towards higher frequencies on the continuum associated with a small vocal tract. Rand interpreted his findings as evidence for perceptual normalization, although this may not be the only possible explanation.

In a more recent study, May (1976) followed fricative noises from a synthetic [ʃ]-[s] continuum with one of two synthetic periodic portions, intended to represent the same vowel produced by two differently sized vocal tracts. The [ʃ]-[s] boundary shifted as expected: Listeners reported more "s" percepts in the context of the larger vocal tract. Subsequently, Mann and Repp (1980) conducted a similar experiment in which synthetic fricative noises were followed by vocalic portions derived from natural utterances produced by a male or a female speaker. The results replicated those of May. These findings are consistent with the fact that smaller vocal tracts (females) produce fricative noises of higher average frequency than large vocal tracts (males) (Schwartz, 1968).

To these results must be added the evidence from studies that have shown speaker normalization effects due to "remote" context, i.e., due to other stimuli in a sequence or to precursor stimuli or phrases (e.g., Ladefoged & Broadbent, 1957; Strange, Verbrugge, Shankweiler, & Edman, 1976; Summerfield & Haggard, 1975). They all demonstrate the same point: Listeners interpret the speech signal in accordance with the perceived (or expected) dimensions of the vocal tract that produced it. Information about vocal tract size is picked up in parallel with information about articulator movements; these are, respectively, the static and dynamic (or structural and functional) aspects of articulatory information. Speaker normalization effects are difficult to explain in terms of a general auditory theory that does not make reference to the mechanisms of speech production. Although some effects could, in principle, result from auditory contrast, interactions of similar complexity have not yet been demonstrated in nonspeech contexts.

Rate Normalization Effects

The somewhat larger literature on perceptual effects of speaking rate has recently been thoroughly reviewed by Miller (1981). Rate normalization, like speaker normalization, is a kind of context effect, and it can be produced by either close or remote context. Rate normalization is said to occur when the perception of a phonetic distinction signalled by a temporal cue (i.e., by the duration of a stimulus portion, or by the rate of change in some acoustic parameter) is modified after a temporal change is introduced in portions of the context that are not themselves cues for the perception of the target segment.

Only a few representative findings shall be mentioned here. Miller and Liberman (1979) examined the stop-semivowel distinction (/ba/-/wa/), cued by the duration and rate of the initial formant transitions, and found that the category boundary shifted systematically with the duration of the vocalic portion (i.e., of the whole stimulus). A corresponding shift of the discrimination peak in an oddity task was reported by Miller (1980). This effect may have an auditory basis, for it has not only been found in human infants (Eimas & Miller, 1980) but also with analogous nonspeech stimuli (Carrell, Pisoni, & Gans, Note 7). However, it may also be argued that simple durational variation is not sufficient to create variations in perceived speaking rate.
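What such a rate adjustment amounts to computationally can be indicated with a deliberately oversimplified sketch; the linear boundary rule and its constants below are hypothetical and serve only to reproduce the direction of the reported effect, namely that a fixed intermediate transition duration is heard as "w" in a short syllable but as "b" in a long one.

    # Hypothetical rate-normalized criterion for the /ba/-/wa/ case: the
    # transition-duration boundary grows with overall syllable duration, so the
    # same 40-ms transition changes category with syllable length.
    def boundary_ms(syllable_ms, base=28.0, slope=0.05):
        return base + slope * syllable_ms        # invented linear rule

    for syllable_ms in (80, 160, 320):
        label = "w" if 40.0 > boundary_ms(syllable_ms) else "b"
        print(f"{syllable_ms:3d} ms syllable, 40 ms transition -> '{label}'")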

Fitch (1981) recently attempted to dissociate information about speaking rate from phonetically distinctive durational variation. The phonetic distinction studied was that between [dabi] and [dapi], as cued by the duration of the first stimulus portion ([dab] or [dap]). By manipulating the duration of natural utterances produced at different rates, she was able to show that speaking rate had a perceptual effect separate from that of physical duration. Thus, the information about speaking rate seems to be carried, in part, by more complex structural variables, such as the rate of spectral change in the signal. Soli (in press) has recently obtained similar results in a thorough investigation of cues to the [jus]-[juz] distinction. These findings are considerably more difficult to explain by psychoacoustic principles.

The most convincing instances of rate normalization derive from studies that varied remote context. The perception of a variety of phonetic distinctions is sensitive to the perceived rate of articulation of a carrier sentence (e.g., Miller & Grosjean, 1981; Pickett & Decker, 1960; Summerfield, 1981). Miller and Grosjean (1981) showed that the articulation rate of the carrier sentence was more important than its pause rate, even though the critical phonetic contrast ("rabid"-"rapid") was cued primarily by the perceived duration of a silent interval. Findings such as these suggest that speaking rate is a rather abstract property whose perception requires an appreciation of articulatory and linguistic variables (cf. also Grosjean & Lane, 1976). Summerfield (1981) has shown that the rate of a nonspeech carrier (a melody) does not affect speech perception, confirming that the listener's rate estimate must derive from speech to be relevant.

These findings are just a sampling of a much larger literature on perceptual adjustments for speaking rate (see Miller, 1981). Whether or not there are corresponding contextual effects in the judgment of auditory duration is not known (except for the above-cited study by Carrell et al., Note 7), although there is some plausibility in the hypothesis that the durations of adjacent or corresponding auditory intervals are judged relative to each other. Perhaps because this hypothesis seems more plausible than possible auditory explanations of other context effects in speech, there have been few attempts so far to simulate speaking rate effects using nonspeech analog stimuli. However, there is some evidence that even simple durational changes may be interpreted differently in speech and nonspeech modes. Smith (1978) presented two identical syllables in succession and varied their relative durations. Listeners had to judge either which syllable was more stressed (a linguistic judgment) or which syllable was longer in duration (an auditory judgment). The two kinds of judgment diverged: Stress judgments exhibited a tendency for the first syllable to be judged stressed, whereas duration judgments showed no such bias. These results indicate that the linguistic function of acoustic segment duration cannot be directly predicted from auditory judgments of that duration. Presumably, in speech perception, acoustic segment duration is interpreted, as are all other cues, within a framework of tacitly known articulatory patterns and constraints, such as the well-known lengthening of a final syllable (Klatt, 1976).

Sequential (Remote) Context Effects

Context effects due to preceding and following stimuli in a test sequence are a ubiquitous phenomenon and well known also in auditory psychophysics. They include effects of neighboring stimuli (preceding and/or following a target stimulus), as well as effects due to a whole series of preceding stimuli, referred to variously as selective adaptation, anchoring, range, or frequency effects. Even though these effects are clearly not in any way specific to speech--and speech stimuli are by no means immune to them, as was once believed with regard to anchoring (Sawusch & Pisoni, 1973; Sawusch, Pisoni, & Cutting, 1974)--the pattern of the data obtained for speech may nevertheless exhibit peculiarities not observed with nonspeech stimuli. The most striking of these is, of course, the relative stability of phonetic boundaries. Although all boundaries can be shifted to some extent by contextual influences, most boundaries do not change very much. (Isolated vowels are a significant exception--see below.) Presumably, this is so because listeners have internal criteria based on their long experience with speech, and especially with their native tongue. It might be argued that phonetic boundaries are stable because they coincide with auditory boundaries of some sort. However, the evidence for such a coincidence is not convincing (see my earlier discussion of categorical perception), and nonhuman subjects seem to exhibit much larger range-contingent boundary shifts for speech stimuli than adult human subjects (Waters & Wilson, 1976).

Another example of an interesting discrepancy between speech and nonspeech is provided by the pattern of vowel context effects. Repp et al. (1979) found not only that isolated synthetic vowel stimuli presented in pairs exhibit large contextual effects (as shown earlier by Fry, Abramson, Eimas, & Liberman, 1962; Lindner, 1966; Thompson & Hollien, 1970; and others), but also that backward contrast (the influence of the second stimulus on perception of the first) was stronger than forward contrast (the influence of the first stimulus on perception of the second). These results become interesting in the light of later findings that nonspeech stimuli show (surprisingly) much smaller contrast effects than isolated vowels and no (or the opposite) difference between forward and backward contrast. Healy and Repp (in press) obtained these results by comparing vowels from an [i]-[ɪ] continuum with brief nonspeech "timbre" stimuli (single-formant resonances of varying frequency, labeled as "low" or "high"). Fujisaki and Shigeno (1979) also compared vowels with timbre stimuli that, however, had the same duration, and still found a large difference in the magnitude of contrast effects, and larger backward than forward contrast for vowels only. Shigeno and Fujisaki (Note 8) compared phonetic category judgments of vowels varying in spectrum with pitch judgments of a single vowel varying in F0. While the former condition replicated earlier findings (large contrast effects, more backward than forward contrast), there were no contrast effects at all in the latter condition. While it seems possible that an auditory explanation of these results will eventually be found, the peculiar flexibility of vowel perception may also be grounded in the special status of vowels as nuclear elements in the speech message. Perhaps the modifiability of vowel perception corresponds to the remarkable contextual variability vowels exhibit in the speech signal.

Other Perceptual Integration Effects

A discussion of evidence for a phonetic mode of perception would not be complete without mention of two strands of research that make a particularly important contribution. They both deal with the integration of cues separated not in time but in space, or even occurring in different modalities.

Duplex Perception

Duplex perception is the newly coined (Liberman, 1979) name for a phenomenon originally discovered by Rand (1974) and described earlier in this paper: An isolated formant transition presented to one ear simultaneously with the "base" (a synthetic CV syllable bereft of that formant transition) in the other ear is perceived as a lateralized nonspeech "chirp" although, at the same time, it contributes (presumably, by some process of central integration) to the perception of the syllable in the other ear. The phenomenon by itself demonstrates that the same input may be perceived in auditory and phonetic modes at the same time: the transition is auditorily segregated, yet phonetically integrated with the base. Several recent studies show that various experimental variables affect either the auditory or the phonetic part of the duplex percept, but not both.

Thus, Isenberg and Liberman (1978) varied the intensity of the isolated transition. The subjects perceived changes in the loudness of the chirp, but they could not detect any change in the loudness of the syllable in the other ear, even though they perceived the phonetic segment specified by the transition. Liberman, Isenberg, and Rakerd (1981) immediately preceded the base with a fricative noise appropriate for [s], which (in the absence of any intervening silence) inhibited the perception of the stop consonant ([p] or [t]) that the base in conjunction with the transition in the other ear otherwise would have generated. Listeners found it difficult to discriminate [s]+[pa] and [s]+[ta] as long as they attended to the side on which the speech was heard, for both stimuli sounded like [sa]. However, their discrimination of [p]-chirps from [t]-chirps in the other ear was highly accurate. Recently, Mann, Madden, Russell, and Liberman (1981) used the duplex perception paradigm to examine further the effect (discovered by Mann, 1980) of a preceding liquid on stop consonant perception. When the syllables [al] or [ar] preceded the base of a stimulus from a [ta]-[ka] continuum, the context effect was obtained in phonetic perception (more [ka] percepts following [al]) while the perception of the isolated transition in the other ear was unaltered.

Effects similar to duplex perception have been reported, where some nonspeech stimulus in one ear affected phonetic perception in the other ear while retaining its nonspeech quality. For example, Pastore (1978) found that when the syllable [pa] in one ear was accompanied by a burst of noise in the other ear, phonetic perception changed to [ta]. Apparently, the noise--even though it did not have the appropriate timing, duration, and envelope--was interpreted by listeners as a [t]-release burst and was integrated with the syllable in the other ear. There is no doubt, however, that listeners nevertheless continued to hear a nonspeech sound in the ear in which the noise occurred. The finding of Repp (1976) that the pitch of an isolated vowel in one ear affects the perception of the voiced-voiceless distinction for stop-consonant-vowel syllables in the other ear may be taken as another instance of duplex perception. Presumably, listeners could have accurately judged the pitch of the isolated vowel without destroying its phonetic effect.

Duplex perception phenomena provide evidence for the distinction between auditory and phonetic modes of perception. They show that the auditory mode can gain access to the input from individual ears while the phonetic mode, under certain conditions, operates on the combined input from both ears. The "phonological fusion" discovered by Day (1968)--two dichotic utterances such as "banket" and "lanket" yield the percept "blanket"--is yet another example of the abstract, nonauditory level of integration that characterizes the phonetic mode.

Audio-Visual Integration

Perhaps the most important recent discovery in the field is the finding of an influence of visual articulatory information on phonetic perception (McGurk & MacDonald, 1976; MacDonald & McGurk, 1978; Summerfield, 1979). Of course, it has been known for a long time that lip reading aids speech perception, especially for the hard of hearing, but only recently has it become clear how tight audio-visual integration can be. McGurk and MacDonald (1976) presented a video display of a person's face saying simple CV syllables in synchrony with acoustic recordings of syllables from the same set. When the visual and auditory information disagreed, the visual information exerted a strong influence on the subjects' percepts, primarily due to the readily perceived presence vs. absence of visible lip closure. Thus, when a visual /da/ or /ga/ was paired with an auditory /ba/, subjects usually reported /da/.8

The interpretation of this finding is straightforward and of great theoretical significance. Clearly, subjects somehow combine the articulatory information gained from the visual display with that gained from the acoustic signal. In Summerfield's (1979) words, "optical and acoustic displays are co-perceived in a common metric closely related to that of articulatory dynamics" (p. 314). This phenomenon provides some of the strongest evidence we have for the existence of a speech-specific mode of perception that makes use of articulatory, as opposed to general auditory, information. The common metric of visual and auditory speech input represents a modality-independent, presumably articulation-based level of abstraction that is the likely site of the integration and context effects reviewed above. Phonetic perception in the auditory modality (when speech enters through the ears) is likely to be in every sense as abstract as it is in the visual modality (when articulatory movements are observed directly).

In a recent ingenious study, Roberts and Summerfield (1981) used the audio-visual technique to demonstrate that selective adaptation of phonetic judgments is a purely auditory effect. Although conflicting visual information changed the listeners' phonetic interpretation of an adapting stimulus, it had no effect whatsoever on the direction or magnitude of the adaptation effect. Besides its implications for the selective adaptation paradigm (cf. also Sawusch & Jusczyk, 1981), this elegant study provides further evidence for the autonomy of phonetic perception.

Disruption of Perceptual Integration

As was pointed out in the discussion of speaker normalization effects, a simulated change in vocal tract size (or in any other speaker characteristic, such as fundamental frequency) must not disrupt the perceptual coherence of an utterance if a normalization effect is to be observed. In the case of formant transitions leading into a vocalic stimulus portion, or of an aperiodic portion (fricative noise) being followed by a periodic portion, perceptual coherence is easily maintained when the formant frequencies of the vowel are changed. However, when two periodic signal portions appropriate to different vocal tracts are juxtaposed, a change in speaker may be perceived, and this may lead to the disruption of whatever perceptual interactions (trading relations or context effects) may have taken place between the two periodic signal portions. There are several examples of this phenomenon in the recent literature.

For example, Darwin and Bethell-Fox (1977) showed that, by changing fundamental frequency abruptly at points of transition, a speech stimulus originally perceived as a smooth alternation of a liquid consonant (or semivowel) and a vowel could be changed into a train of stop-vowel syllables perceived as being produced in alternation by two different speakers. The manipulation of F0 signalled a change in source and thus "split" the formant transitions into portions that effectively became new cues, signalling stop consonants rather than liquids or semivowels.

Dorman et al. (1979: Exp. 6) studied a situation in which the perception of a syllable-final stop consonant depends on whether or not there is a sufficient period of (near-)silence to indicate closure. An utterance such as /babda/ is generally perceived as /bada/ if the stop closure interval is removed. Dorman et al. found, however, that when the first syllable, /bab/, is produced by a male speaker and the second syllable, /da/, by a female speaker, the syllable-final stop in /bab/ is clearly perceived. Because of the perceived change in speakers, listeners no longer register the absence of a closure interval; the critical syllable-final stop is now in utterance-final position. Interestingly, two subjects who reported that they did not notice a change in speaker also failed to perceive the syllable-final stop consonant in the absence of closure.

Conversely, an interval of silence in an utterance may lose its perceptual value when a change of speaker is perceived to occur across it (Dorman et al., 1979: Exp. 7): When silence is inserted into the utterance "say shop" immediately preceding the fricative noise, listeners report "say chop." However, when "say" is spoken by a male voice and "shop" by a female voice, this effect no longer occurs; the silence loses its phonetic significance, and the second syllable remains "shop."

This effect was further investigated by Dechovitz, Rakerd, and Verbrugge (1980), who varied the perceived continuity of the test utterance "Let's go shop (chop)" by having speakers produce either the whole phrase or just "Let's go." Silence inserted into (or removed from) the interval between the "go" and the "shop (chop)" of a continuous utterance had the expected effect on phonetic perception: "shop" was perceived as "chop" when silence was present, and "chop" was perceived as "shop" when there was no silence. However, when the "Let's go" with phrase-final intonation was followed by either "shop" or "chop" from a different production, there were no such effects: "shop (chop)" remained "shop (chop)." Interestingly, these authors found that a change of speaker from female to male between "Let's go" and "shop (chop)" did not disrupt perceptual integration as long as the "Let's go" derived from a continuous utterance of "Let's go shop (chop)." This finding is in apparent contradiction to that of Dorman et al. (1979) described in the preceding paragraph. Dechovitz et al. interpreted it as showing that dynamic information for utterance continuity may override a perceived change in source (despite the concomitant auditory discontinuities). If this interpretation is correct, it may point to another instance where purely auditory principles fail to explain phonetic perception. Some of the variables that determine the perceived continuity of an utterance are likely to be auditory (cf. Bregman, 1978); however, there may also be speech-specific factors that reflect what listeners consider plausible and possible in the dynamic context of natural utterances.

CONCLUSIONS

The findings reviewed above provide a wealth of results that, in large measure, cannot be accounted for by our current knowledge of auditory psychophysics. Although there remains much to be learned about the perception of complex auditory stimuli, some trading relations and context effects seem a priori unlikely to reflect an auditory level of interaction, and at least one--audio-visual integration--simply cannot derive from that level. While efforts to delineate the role of general auditory processes in speech perception should certainly continue, it may be predicted that this role will be restricted largely to the perception of nonphonetic stimulus attributes.

This is not to say that auditory properties of the signal are not the basic carrier of the linguistic message. However, auditory psychophysics gains knowledge about the perception of these properties in large part from listeners' judgments in psychophysical experiments, and these judgments are made in a different frame of reference from the judgments of speech. Auditory variables, but not auditory judgments, are the basis of phonetic perception. Even those limitations imposed by the auditory system that have to do with detectability and resolution may not play any important role in phonetic distinctions. For instance, there is no reason why phonetic category boundaries could not be placed at suprathreshold auditory parameter settings that seem arbitrary from a psychophysical viewpoint but are well motivated by the articulatory and acoustic patterns that characterize a given language. And even though phonetic and auditory boundaries may sometimes coincide, there is the more fundamental question whether such "boundaries" play any role in the perception of natural speech, considering the fact that natural speech is different in a number of ways from the artificial stimuli employed in speech discrimination tasks. While the objection of ecologically invalid stimuli extends to most of the studies reviewed in this paper, the present emphasis has been on processes of perceptual integration that promise to be more general than static concepts such as boundary locations.

Two possible criticisms of the research reviewed here should be mentioned. One is that nearly all studies demonstrated perceptual integration in situations of high uncertainty produced by ambiguous settings of the primary cue(s) for a given phonetic distinction. The perceptual integration observed may have been motivated by that ambiguity. In that case, it may be that perceptual integration does not occur to the same extent in natural situations, where the primary cues are often sufficient for accurate phonetic perception.

The other criticism is that, although the trading relations and context effects reviewed here have been described as complex interactions between separate cues, it may well be that these cues do not function as perceptual entities that are "extracted" and then recombined into a unitary phonetic percept (cf. Bailey & Summerfield, 1980). In that view, cues serve only descriptive purposes; the perceptual interactions between them can be understood as resulting from the listeners' apprehension of the articulatory events they convey. While cues (i.e., acoustic segments) are indispensable for describing how the articulatory information is represented in the signal, we need not postulate special perceptual processes that construct or derive the articulatory information from these elementary pieces. Rather, the articulatory information may be said to be directly available (Gibson, 1966; Neisser, 1976). This is an attractive proposal; however, we should not forget that there are real questions to be answered about the mechanisms that accomplish phonetic perception and that we know so woefully little about at present. If cues and their interactions have no place in a description of these mechanisms, we are faced with the more fundamental problem of finding the proper ingredients for a model of speech perception.

There is reason to believe that the information processing approaches currently in vogue are not likely to lead us very far in that regard. To understand how our perceptual systems work, we need to understand how a complex biological system (our brain) integrates and differentiates information, how it is modified by experience, and how the structure of the input (i.e., the environment) gets to be represented in the system. These are complex biological questions whose solution will not come easily. Computer analogies are largely tautological and distract from the fundamental biological and philosophical problems that lie at the heart of the problem of perception (see, e.g., Hayek, 1952; Piaget, 1967; Studdert-Kennedy, in press, Note 9). In a particularly enlightening discussion, Fodor (Note 10) has recently argued for the modularity of the speech (and language) system, i.e., for its specificity and relative isolation from other perceptual and cognitive systems. He also pointed out that it is precisely such modular systems that we have some hope of understanding, whereas explanation of perception in terms of general principles remains interminably ad hoc. Thus, we should not be surprised to find that speech perception is accomplished by means entirely particular to that mode. The problem of how to investigate and describe those means will keep us busy for some time to come.

REFERENCE NOTES

1. Grunke, M. E., & Pisoni, D. B. Some experiments on perceptual learning of mirror-image acoustic patterns. Research on Speech Perception (Department of Psychology, Indiana University), 1979, 5, 147-182.

2. Serniclaes, W. La simultanéité des indices dans la perception du voisement des occlusives. [The simultaneity of cues in the perception of the voicing of stops.] Rapport d'Activités de l'Institut de Phonétique (Bruxelles: Université Libre), 1973, 7(2), 59-67.
3. Serniclaes, W. Traitement indépendant ou interaction dans le processus de structuration perceptive des indices de voisement? [Independent processing or interaction in the perceptual structuring of voicing cues?] Rapport d'Activités de l'Institut de Phonétique (Bruxelles: Université Libre), 1973, 7(2), 47-57.
4. Miller, J. L., & Eimas, P. D. Contextual perception of voicing by infants. Paper presented at the Biennial Meeting of the Society for Research in Child Development, Boston, MA, April 1981.
5. Kunisaki, O., & Fujisaki, H. On the influence of context upon perception of voiceless fricative consonants. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics (University of Tokyo), 1977, 11, 85-91.
6. Zue, V. W. Acoustic characteristics of stop consonants: A controlled study. Technical Report No. 523, Lincoln Laboratory (Massachusetts Institute of Technology, Lexington, Massachusetts), 1976.
7. Carrell, T. D., Pisoni, D. B., & Gans, S. J. Perception of the duration of rapid spectrum changes: Evidence for context effects with speech and nonspeech. Research on Speech Perception (Department of Psychology, Indiana University), 1980, 6, 421-436.
8. Shigeno, S., & Fujisaki, H. Context effects in phonetic and non-phonetic vowel judgments. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics (University of Tokyo), 1980, 14, 217-224.
9. Studdert-Kennedy, M. Are utterances prepared and perceived in parts? Perhaps. Paper presented at the First International Conference on Event Perception, University of Connecticut, Storrs, June 1981.
10. Fodor, J. A. The modularity of mind. Unpublished paper, M.I.T.

REFERENCES

Aslin, R. N., & Pisoni, D. B. Some developmental processes in speech perception. In G. H. Yeni-Komshian, J. F. Kavanagh, & C. A. Ferguson (Eds.), Child phonology. Vol. 2: Perception. New York: Academic Press, 1980. Pp. 67-96.
Bailey, P. J., & Summerfield, Q. Information in speech: Observations on the perception of [s]-stop clusters. Journal of Experimental Psychology: Human Perception and Performance, 1980, 6, 536-563.
Bailey, P. J., Summerfield, Q., & Dorman, M. On the identification of sine-wave analogues of certain speech sounds. Haskins Laboratories Status Report on Speech Research, 1977, SR-51/52, 1-25.
Barton, D. Phonemic perception in children. In G. H. Yeni-Komshian, J. F. Kavanagh, & C. A. Ferguson (Eds.), Child phonology. Vol. 2: Perception. New York: Academic Press, 1980. Pp. 97-116.
Best, C. T., Morrongiello, B., & Robson, R. Perceptual equivalence of acoustic cues in speech and nonspeech perception. Perception & Psychophysics, 1981, 29, 191-211.
Blechner, M. J. Musical skill and the categorical perception of harmonic mode. Unpublished doctoral dissertation, Yale University, 1977.
Blechner, M. J., Day, R. S., & Cutting, J. E. Processing two dimensions of nonspeech stimuli: The auditory-phonetic distinction reconsidered. Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 257-266.

Blumstein, S. E., & Stevens, K. N. Acoustic invariance in speech production. Journal of the Acoustical Society of America, 1979, 66, 1001-1017.
Blumstein, S. E., & Stevens, K. N. Perceptual invariance and onset spectra for stop consonants in different vowel environments. Journal of the Acoustical Society of America, 1980, 67, 648-662.
Bradshaw, J. L., & Nettleton, N. C. The nature of hemispheric specialization in man. The Behavioral and Brain Sciences, 1981, 4, 51-63.
Bregman, A. S. The formation of auditory streams. In J. Requin (Ed.), Attention and performance VII. Hillsdale, NJ: Erlbaum, 1978. Pp. 63-75.
Burns, E. M., & Ward, W. D. Categorical perception--phenomenon or epiphenomenon: Evidence from experiments in the perception of melodic musical intervals. Journal of the Acoustical Society of America, 1978, 63, 456-468.

Carden, G., Levitt, A. G., Jusczyk, P. W., & Walley, A. Evidence for phonetic processing of cues to place of articulation: Perceived manner affects perceived place. Perception & Psychophysics, 1981, 29, 26-36.
Cole, R. A., & Scott, B. Perception of temporal order in speech: The role of vowel transitions. Canadian Journal of Psychology, 1973, 27, 441-449.
Cutting, J. E. Two left-hemisphere mechanisms in speech perception. Perception & Psychophysics, 1974, 16, 601-612.
Cutting, J. E. Auditory and linguistic processes in speech perception: Inferences from six fusions in dichotic listening. Psychological Review, 1976, 83, 114-140.
Cutting, J. E. There may be nothing peculiar to perceiving in a speech mode. In J. Requin (Ed.), Attention and performance VII. Hillsdale, NJ: Erlbaum, 1978. Pp. 229-244.
Cutting, J. E., & Rosner, B. S. Categories and boundaries in speech and music. Perception & Psychophysics, 1974, 16, 564-570.
Darwin, C. J., & Bethell-Fox, C. E. Pitch continuity and speech source attribution. Journal of Experimental Psychology: Human Perception and Performance, 1977, 3, 633-672.
Day, R. S. Fusion in dichotic listening. Unpublished doctoral dissertation, Stanford University, 1968.
Dechovitz, D. R., Rakerd, B., & Verbrugge, R. R. Effects of utterance continuity on phonetic judgments. Haskins Laboratories Status Report on Speech Research, 1980, SR-62, 101-116.
Denes, P. Effect of duration on the perception of voicing. Journal of the Acoustical Society of America, 1955, 27, 761-764.
Derr, M. A., & Massaro, D. W. The contribution of vowel duration, F0 contour, and frication duration as cues to the /juz/-/jus/ distinction. Perception & Psychophysics, 1980, 27, 51-59.
Diehl, R. L. Feature detectors for speech: A critical reappraisal. Psychological Bulletin, 1981, 89, 1-18.
Diehl, R. L., Souther, A. F., & Convis, C. L. Conditions on rate normalization in speech perception. Perception & Psychophysics, 1980, 27, 435-443.
Divenyi, P. L. Some psychoacoustic factors in phonetic analysis. Proceedings of the Ninth International Congress of Phonetic Sciences, Vol. II. Copenhagen: University of Copenhagen, 1979. Pp. 445-452.
Dorman, M. F., & Raphael, L. J. Distribution of acoustic cues for stop consonant place of articulation in VCV syllables. Journal of the Acoustical Society of America, 1980, 67, 1333-1335.
Dorman, M. F., Raphael, L. J., & Isenberg, D. Acoustic cues for a fricative-affricate contrast in word-final position. Journal of Phonetics, 1980, 8, 397-405.
Dorman, M. F., Raphael, L. J., & Liberman, A. M. Some experiments on the sound of silence in phonetic perception. Journal of the Acoustical Society of America, 1979, 65, 1518-1532.
Dorman, M. F., Studdert-Kennedy, M., & Raphael, L. J. Stop-consonant recognition: Release bursts and formant transitions as functionally equivalent, context-dependent cues. Perception & Psychophysics, 1977, 22, 109-122.
Eimas, P. D., & Corbit, J. D. Selective adaptation of linguistic feature detectors. Cognitive Psychology, 1973, 4, 99-109.
Eimas, P. D., & Miller, J. L. Contextual effects in speech perception. Science, 1980, 209, 1140-1141.
Fitch, H. L. Distinguishing temporal information for speaking rate from temporal information for intervocalic stop consonant voicing. Haskins Laboratories Status Report on Speech Research, 1981, SR-65.
Fitch, H. L., Halwes, T., Erickson, D. N., & Liberman, A. M. Perceptual equivalence of two acoustic cues for stop-consonant manner. Perception & Psychophysics, 1980, 27, 343-350.
Fry, D. B., Abramson, A. S., Eimas, P. D., & Liberman, A. M. The identification and discrimination of synthetic vowels. Language and Speech, 1962, 5, 171-189.
Fujisaki, H., & Kunisaki, O. Analysis, recognition, and perception of voiceless fricative consonants in Japanese. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1978, 26, 21-27.
Fujisaki, H., & Shigeno, S. Context effects in the categorization of speech and nonspeech stimuli. In J. J. Wolf & D. H. Klatt (Eds.), Speech communication papers presented at the 97th Meeting of the Acoustical Society of America. New York: Acoustical Society of America, 1979.
Ganong, W. F., III. Selective adaptation effects of burst-cued stops. Perception & Psychophysics, 1978, 24, 71-83.

Gerstman, L. Cues for distinguishing among fricatives, affricates, and stop consonants. Unpublished doctoral dissertation, New York University, 1957.
Gibson, J. J. The senses considered as perceptual systems. Boston, Mass.: Houghton Mifflin, 1966.
Grosjean, F., & Lane, H. How the listener integrates the components of speaking rate. Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 538-543.
Haggard, M., Ambler, S., & Callow, M. Pitch as a voicing cue. Journal of the Acoustical Society of America, 1970, 47, 613-617.
Halle, M., & Stevens, K. N. Analysis by synthesis. In W. Wathen-Dunn & L. E. Woods (Eds.), Proceedings of the seminar on speech compression and processing, Vol. 2. AFCRC-TR-59-198. USAF Cambridge Research Center, 1959.

Harris, K. S., Hoffman, H. S., Liberman, A. M., Delattre, P. C., & Cooper, F. S. Effect of third-formant transitions on the perception of the voiced stop consonants. Journal of the Acoustical Society of America, 1958, 30, 122-126.
Hasegawa, A. Some perceptual consequences of fricative coarticulation. Unpublished doctoral dissertation, Purdue University, 1976.
Hayek, F. A. The sensory order. Chicago: University of Chicago Press, 1952.
Healy, A. F., & Repp, B. H. Context sensitivity and phonetic mediation in categorical perception. Journal of Experimental Psychology: Human Perception and Performance, in press.
Heuven, V. J. van. The relative contribution of rise time, steady time, and overall duration of noise bursts to the affricate-fricative distinction in English: A re-analysis of old data. In J. J. Wolf & D. H. Klatt (Eds.), Speech communication papers presented at the 97th Meeting of the Acoustical Society of America. New York: Acoustical Society of America, 1979.
Hoffman, H. S. Study of some cues in the perception of the voiced stop consonants. Journal of the Acoustical Society of America, 1958, 30, 1035-1041.
Hogan, J. T., & Rozsypal, A. J. Evaluation of vowel duration as a cue for the voicing distinction in the following word-final consonant. Journal of the Acoustical Society of America, 1980, 67, 1764-1771.
House, A. S., Stevens, K. N., Sandel, T. T., & Arnold, J. B. On the learning of speechlike vocabularies. Journal of Verbal Learning and Verbal Behavior, 1962, 1, 133-143.
Isenberg, D., & Liberman, A. M. Speech and nonspeech percepts from the same sound. Journal of the Acoustical Society of America, 1978, 64 (Supplement No. 1), S20. (Abstract)
Jusczyk, P. W. Infant speech perception: A critical appraisal. In P. D. Eimas & J. L. Miller (Eds.), Perspectives on the study of speech. Hillsdale, NJ: Erlbaum, 1981. Pp. 113-164.
Kewley-Port, D. Representations of spectral change as cues to place of articulation of stop consonants. Unpublished doctoral dissertation, City University of New York, 1981.
Kohler, K. J. Dimensions in the perception of fortis and lenis plosives. Phonetica, 1979, 36, 332-343.
Klatt, D. H. Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. Journal of the Acoustical Society of America, 1976, 59, 1208-1221.
Kuhl, P. K. Discrimination of speech by nonhuman animals: Basic auditory sensitivities conducive to the perception of speech-sound categories. Journal of the Acoustical Society of America, 1981, 70, 340-349.
Kuhl, P. K., & Miller, J. D. Speech perception by the chinchilla: Identification functions for synthetic VOT stimuli. Journal of the Acoustical Society of America, 1978, 63, 905-917.
Ladefoged, P. Information conveyed by vowels. Journal of the Acoustical Society of America, 1957, 29, 98-104.
Liberman, A. M. Duplex perception and integration of cues: Evidence that speech is different from nonspeech and similar to language. Proceedings of the Ninth International Congress of Phonetic Sciences, Vol. II. Copenhagen: University of Copenhagen, 1979.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. Perception of the speech code. Psychological Review, 1967, 74, 431-461.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. The role of selected stimulus variables in the perception of the unvoiced stop consonants. American Journal of Psychology, 1952, 65, 497-516.
Liberman, A. M., Harris, K. S., Eimas, P. D., Lisker, L., & Bastian, J.
An effect of learning on speech perception: The discrimination of durations of silence with and without phonemic significance. Language and Speech, 1961, 4, 175-185.
Liberman, A. M., Harris, K. S., Kinney, J. A., & Lane, H. The discrimination of relative onset time of the components of certain speech and nonspeech patterns. Journal of Experimental Psychology, 1961, 61, 379-388.
Liberman, A. M., Isenberg, D., & Rakerd, B. Duplex perception of cues for stop consonants: Evidence for a phonetic mode. Perception & Psychophysics, 1981, 30, 133-143.

Lindner, G. Veraenderung der Beurteilung synthetischer Vokale unter dem Einfluss des Sukzessivkontrastes. Zeitschrift fuer Phonetik, Sprachwissenschaft und Kommunikationsforschung, 1966.
Lisker, L. Closure duration and the intervocalic voiced-voiceless distinction in English. Language, 1957, 33, 42-49.
Lisker, L. Is it VOT or a first-formant transition detector? Journal of the Acoustical Society of America, 1975, 57, 1547-1551.
Lisker, L. Closure voicing, manner and place of consonant occlusion. Haskins Laboratories Status Report on Speech Research, 1978, SR-53 (Vol. 1), 79-86. (a)
Lisker, L. Rapid vs. rabid: A catalogue of acoustic features that may cue the distinction. Haskins Laboratories Status Report on Speech Research, 1978, SR-54, 127-132. (b)
Lisker, L. On buzzing the English /b/. Haskins Laboratories Status Report on Speech Research, 1978, SR-55/56, 181-188. (c)
Lisker, L., Liberman, A. M., Erickson, D. N., Dechovitz, D., & Mandler, R. On pushing the voice-onset-time (VOT) boundary about. Language and Speech, 1977, 20, 209-216.
Lisker, L., & Price, P. J. Context-determined effects of varying closure duration. In J. J. Wolf & D. H. Klatt (Eds.), Speech communication papers presented at the 97th Meeting of the Acoustical Society of America. New York: Acoustical Society of America, 1979.
MacDonald, J., & McGurk, H. Visual influences on speech perception processes. Perception & Psychophysics, 1978, 24, 253-257.
Mann, V. A. Influence of preceding liquid on stop consonant perception. Perception & Psychophysics, 1980, 28, 407-412.
Mann, V. A., Madden, J., Russell, J. M., & Liberman, A. M. Further investigation into the influence of preceding liquids on stop consonant perception. Journal of the Acoustical Society of America, 1981, 69 (Supplement No. 1). (Abstract)

Mann, V. A., & Repp, B. H. Influence of vocalic context on perception of the [ʃ]-[s] distinction. Perception & Psychophysics, 1980, 28, 213-228.
Mann, V. A., & Repp, B. H. Influence of preceding fricative on stop consonant perception. Journal of the Acoustical Society of America, 1981, 69, 548-553.
Massaro, D. W., & Cohen, M. M. The contribution of fundamental frequency and voice onset time to the /zi/-/si/ distinction. Journal of the Acoustical Society of America, 1976, 60, 704-717.
Massaro, D. W., & Cohen, M. M. Voice onset time and fundamental frequency as cues to the /zi/-/si/ distinction. Perception & Psychophysics, 1977, 22, 373-382.
Mattingly, I. G., & Levitt, A. G. Perception of stop consonants before low unrounded vowels. Haskins Laboratories Status Report on Speech Research, 1980, SR-61, 167-174.
Mattingly, I. G., Liberman, A. M., Syrdal, A. M., & Halwes, T. Discrimination in speech and nonspeech modes. Cognitive Psychology, 1971, 2, 131-157.
May, J. Vocal tract normalization for /s/ and /ʃ/. Haskins Laboratories Status Report on Speech Research, 1976, SR-48, 67-73.
McGurk, H., & MacDonald, J. Hearing lips and seeing voices. Nature, 1976, 264, 746-748.
Miller, J. D., Wier, C. C., Pastore, R., Kelly, W. J., & Dooling, R. J. Discrimination and labeling of noise-buzz sequences with varying noise-lead times: An example of categorical perception. Journal of the Acoustical Society of America, 1976, 60, 410-417.

Miller, J. L. Contextual effects in the discrimination of stop consonant and semivowel. Perception & Psychophysics, 1980, 28, 93-95.
Miller, J. L. The effect of speaking rate on segmental distinctions: Acoustic variation and perceptual compensation. In P. D. Eimas & J. L. Miller (Eds.), Perspectives on the study of speech. Hillsdale, NJ: Erlbaum, 1981.
Miller, J. L., & Grosjean, F. How the components of speaking rate influence perception of phonetic segments. Journal of Experimental Psychology: Human Perception and Performance, 1981, 7, 208-215.
Miller, J. L., & Liberman, A. M. Some effects of later-occurring information on the perception of stop consonant and semivowel. Perception & Psychophysics, 1979, 25, 457-465.
Morse, P. A. The infancy of infant speech perception: The first decade of research. Brain, Behavior, and Evolution, 1979, 16, 351-373.
Neisser, U. Cognition and reality. San Francisco: Freeman, 1976.
Pastore, R. E. Contralateral cueing effects in the perception of aspirated stop consonants. Journal of the Acoustical Society of America, 1978, 64 (Supplement No. 1), S17. (Abstract)
Pastore, R. E. Possible psychoacoustic factors in speech perception. In P. D. Eimas & J. L. Miller (Eds.), Perspectives on the study of speech. Hillsdale, NJ: Erlbaum, 1981.
Pastore, R. E., Ahroon, W. A., Baffuto, K. J., Friedman, C., Puleo, J. S., & Fink, E. A. Common factor model of categorical perception. Journal of Experimental Psychology: Human Perception and Performance, 1977, 3, 686-696.
Pastore, R. E., Ahroon, W. A., Puleo, J. S., Crimmins, D. B., Golowner, D. B., & Berger, R. S. Processing interactions between two dimensions of nonphonetic auditory signals. Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 267-276.
Pastore, R. E., Harris, L. B., & Kaplan, J. TOT: An acoustic CV syllable temporal onset analog? Journal of the Acoustical Society of America, 1981, 69 (Supplement No. 1), S93. (Abstract)
Piaget, J. Biology and knowledge. Chicago: University of Chicago Press, 1967.
Pickett, J. M., & Decker, L. R. Time factors in perception of a double consonant. Language and Speech, 1960, 3, 11-17.
Pisoni, D. B. Identification and discrimination of the relative onset of two component tones: Implications for the perception of voicing in stops. Journal of the Acoustical Society of America, 1977, 61, 1352-1361.
Pisoni, D. B. Adaptation of the relative onset time of two-component tones. Perception & Psychophysics, 1980, 28, 337-346.
Pols, L. C. W., & Schouten, M. E. H. Identification of deleted consonants. Journal of the Acoustical Society of America, 1978, 64, 1333-1337.
Price, P. J., & Lisker, L. (/b/-/p/) but (/p/-/b/). In J. J. Wolf & D. H. Klatt (Eds.), Speech communication papers presented at the 97th Meeting of the Acoustical Society of America. New York: Acoustical Society of America, 1979.
Rand, T. C. Vocal tract size normalization in the perception of stop consonants. Haskins Laboratories Status Report on Speech Research, 1971, SR-25/26, 141-146.
Rand, T. C. Dichotic release from masking for speech. Journal of the Acoustical Society of America, 1974, 55, 678-680.

Raphael, L. J. Preceding vowel duration as a cue to the perception of the voicing characteristics of word-final consonants in American English. Journal of the Acoustical Society of America, 1972, 51, 1296-1303.
Raphael, L. J. Durations and contexts as cues to word-final cognate opposition in English. Phonetica, 1981, 38, 126-147.
Remez, R. E. Adaptation of the category boundary between speech and nonspeech: A case against feature detectors. Cognitive Psychology, 1979, 11, 38-57.
Remez, R. E., Cutting, J. E., & Studdert-Kennedy, M. Cross-series adaptation using song and string. Perception & Psychophysics, 1980, 27, 524-530.
Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. Speech perception without traditional speech cues. Science, 1981, 212, 947-950.
Repp, B. H. Dichotic "masking" of voice onset time. Journal of the Acoustical Society of America, 1976, 59, 183-194.
Repp, B. H. Perceptual integration and differentiation of spectral cues for intervocalic stop consonants. Perception & Psychophysics, 1978, 24, 471-485. (a)
Repp, B. H. Interdependence of voicing and place decisions for stop consonants in initial position. Haskins Laboratories Status Report on Speech Research, 1978, SR-53 (Vol. II), 117-150. (b)
Repp, B. H. Relative amplitude of aspiration noise as a voicing cue for syllable-initial stop consonants. Language and Speech, 1979, 22, 173-189.
Repp, B. H. Bidirectional contrast effects in the perception of VC-CV sequences. Haskins Laboratories Status Report on Speech Research, 1980, SR-63/64, 157-176. (a)
Repp, B. H. Perception and production of two-stop-consonant sequences. Haskins Laboratories Status Report on Speech Research, 1980, SR-63/64, 177-194. (b)
Repp, B. H. Two strategies in fricative discrimination. Perception & Psychophysics, 1981, 30, 217-227. (a)
Repp, B. H. Auditory and phonetic trading relations between acoustic cues in speech perception: Preliminary results. Haskins Laboratories Status Report on Speech Research, 1981, SR-67/68, this volume. (b)
Repp, B. H., Healy, A. F., & Crowder, R. G. Categories and context in the perception of isolated steady-state vowels. Journal of Experimental Psychology: Human Perception and Performance, 1979, 5, 129-145.
Repp, B. H., Liberman, A. M., Eccardt, T., & Pesetsky, D. Perceptual integration of acoustic cues for stop, fricative, and affricate manner. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 621-637.
Repp, B. H., & Mann, V. A. Perceptual assessment of fricative-stop coarticulation. Journal of the Acoustical Society of America, 1981, 69, 1154-1163. (a)
Repp, B. H., & Mann, V. A. Fricative-stop coarticulation: Acoustic and perceptual evidence. Haskins Laboratories Status Report on Speech Research, 1981, SR-67/68, this volume. (b)
Roberts, M., & Summerfield, Q. Audio-visual adaptation in speech perception. Perception & Psychophysics, 1981, 30, 309-314.
Rosen, S., & Howell, P. Plucks and bows are not categorically perceived. Perception & Psychophysics, 1981, 30, 156-168.
Sawusch, J. R., & Jusczyk, P. Adaptation and contrast in the perception of voicing. Journal of Experimental Psychology: Human Perception and Performance, 1981, 7, 408-421.

Sawusch, J. R., & Pisoni, D. B. Category boundaries for speech and nonspeech sounds. Journal of the Acoustical Society of America, 1973, 54, 76. (Abstract)
Sawusch, J. R., Pisoni, D. B., & Cutting, J. E. Category boundaries for linguistic and non-linguistic dimensions of the same stimuli. Journal of the Acoustical Society of America, 1974, 55 (Supplement No. 1), S55. (Abstract)
Schouten, M. E. H. The case against a speech mode of perception. Acta Psychologica, 1980, 44, 71-98.
Schwartz, M. F. Identification of speaker sex from isolated voiceless fricatives. Journal of the Acoustical Society of America, 1968, 43, 1178-1179.
Searle, C. L., Jacobson, J. Z., & Rayment, S. G. Stop consonant discrimination based on human audition. Journal of the Acoustical Society of America, 1979, 65, 799-809.
Serniclaes, W. Perceptual processing of acoustic correlates of the voicing feature. Proceedings of the Speech Communication Seminar. Stockholm, 1974. Pp. 87-93.

Siegel, J. A., & Siegel, W. Categorical perception of tonal intervals: Musicians can't tell sharp from flat. Perception & Psychophysics, 1977, 21, 399-407.
Simon, C., & Fourcin, A. J. Cross-language study of speech-pattern learning. Journal of the Acoustical Society of America, 1978, 63, 925-935.
Smith, M. R. Perception of word stress and syllable length. Journal of the Acoustical Society of America, 1978, 63 (Supplement No. 1), S55. (Abstract)
Soli, S. D. Structure and duration of vowels together specify fricative voicing. Journal of the Acoustical Society of America, in press.
Stevens, K. N. The potential role of property detectors in the perception of consonants. In G. Fant & M. A. A. Tatham (Eds.), Auditory analysis and perception of speech. New York: Academic Press, 1975. Pp. 303-330.
Stevens, K. N., & Blumstein, S. E. Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America, 1978, 64, 1358-1368.
Stevens, K. N., & Klatt, D. H. Role of formant transitions in the voiced-voiceless distinction for stops. Journal of the Acoustical Society of America, 1974, 55, 653-659.
Strange, W., Verbrugge, R., Shankweiler, D. P., & Edman, T. R. Consonant environment specifies vowel identity. Journal of the Acoustical Society of America, 1976, 60, 213-224.
Studdert-Kennedy, M. Speech perception. In N. J. Lass (Ed.), Contemporary issues in experimental phonetics. New York: Academic Press, 1976. Pp. 243-293.
Studdert-Kennedy, M. On the biology of speech perception. In J. Mehler, M. Garrett, & E. Walker (Eds.), Perspectives in mental representation. Hillsdale, NJ: Erlbaum, in press.
Studdert-Kennedy, M. Cerebral hemispheres: Specialized for the analysis of what? The Behavioral and Brain Sciences, 1981, 4, 76-77.
Summerfield, A. Q. Information processing analyses of perceptual adjustments to source and context variables in speech. Unpublished doctoral dissertation, Queen's University of Belfast, 1975.
Summerfield, Q. Use of visual information for phonetic perception. Phonetica, 1979, 36, 314-331.

Summerfield, Q. Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception and Performance, 1981, 7, 1074-1095.
Summerfield, Q. Does VOT equal TOT or NOT? Examination of a possible auditory basis for the perception of voicing in initial stops. Journal of the Acoustical Society of America, in press.
Summerfield, Q., Bailey, P. J., Seton, J., & Dorman, M. F. Fricative envelope parameters and silent intervals in distinguishing 'slit' and 'split.' Phonetica, 1981, 38, 181-192.
Summerfield, A. Q., & Haggard, M. P. Perceptual processing of multiple cues and contexts: Effects of following vowel upon stop consonant voicing. Journal of Phonetics, 1974, 2, 279-295.
Summerfield, A. Q., & Haggard, M. P. Vocal tract normalization as demonstrated by reaction times. In G. Fant & M. A. A. Tatham (Eds.), Auditory analysis and perception of speech. London: Academic Press, 1975. Pp. 115-142.
Summerfield, Q., & Haggard, M. P. On the dissociation of spectral and temporal cues to the voicing distinction in initial stop consonants. Journal of the Acoustical Society of America, 1977, 62, 435-448.
Thompson, C. L., & Hollien, H. Some contextual effects on the perception of synthetic vowels. Language and Speech, 1970, 13, 1-13.
Walley, A. C., Pisoni, D. B., & Aslin, R. N. The role of early experience in the development of speech perception. In R. N. Aslin, J. Alberts, & M. R. Petersen (Eds.), Sensory and perceptual development. New York: Academic Press, 1981.
Warren, R. M., Obusek, C. J., Farmer, R. M., & Warren, R. P. Auditory sequence: Confusion of patterns other than speech or music. Science, 1969, 164, 586-587.
Waters, R. S., & Wilson, W. A., Jr. Speech perception by rhesus monkeys: The voicing distinction in synthesized labial and velar stop consonants. Perception & Psychophysics, 1976, 19, 285-289.
Whalen, D. H. Effects of vocalic formant transitions and vowel quality on the English [s]-[ʃ] boundary. Journal of the Acoustical Society of America, 1981, 69, 275-282.
Wolf, C. G. Voicing cues in English final stops. Journal of Phonetics, 1978, 6, 299-309.
Wood, C. C. Auditory and phonetic levels of processing in speech perception: Neurophysiological and information-processing analysis. Journal of Experimental Psychology: Human Perception and Performance, 1975, 104, 3-20.
Zatorre, R. J., & Halpern, A. R. Identification, discrimination, and selective adaptation of simultaneous musical intervals. Perception & Psychophysics, 1979, 26, 384-395.
Zlatin, M. A. Voicing contrast: Perceptual and productive voice onset time characteristics of adults. Journal of the Acoustical Society of America, 1974, 56, 981-994.
Zwicker, E., Terhardt, E., & Paulus, E. Automatic speech recognition using psychoacoustic models. Journal of the Acoustical Society of America, 1979, 65, 487-498.

45

FOOTNOTES

1. A rule of thumb for distinguishing a trading relation from a context effect is that the phonetic equivalence resulting from a trading relation is strong in the sense that two phonetically equivalent stimuli (syllables or words) are difficult to tell apart (Fitch et al., 1980), whereas the phonetic equivalence produced by trading a critical cue against some contextual influence is restricted to the target segment, as it always involves a readily detectable change in one or more contextual segments. To the extent that a change in context (e.g., vowel quality) also modifies critical cues (e.g., formant transitions), context effects may sometimes include disguised trading relations.

2The attempt to define integrated cues must be distinguished from independent efforts to represent the speech signal in a way that takes into account peripheral auditory transformations (Searle, Jacobson, & Rayment, 1979; Zwicker, Terhardt, & Paulus, 1979). Such representations are, of course, very useful and may lead to the redefinition of some cues; however, they do not, by themselves, solve the problem of cue definition.

3In essence, this kind of study investigates whether multidimensionally varying speech stimuli are perceived categorically. Traditional studies of categorical perception have been exclusively concerned with stimuli varying on a single dimension, or varying on several dimensions in a perfectly correlated fashion. Note that, in these studies, physically different stimuli from the region of the category boundary are not phonetically equivalent--they have different response distributions. As soon as two or more cues are varied, however, pairs of phonetically equivalent stimuli can be found for any given response distribution. Thus, the influence of phonetic categorization on discrimination judgments can be factored out, at least in principle (see Footnote 4).
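To make this logic concrete, here is a minimal sketch (our own illustration; the labeling proportions and the tolerance value are invented, not data from any study cited here) of how pairs of physically different but "phonetically equivalent" stimuli could be located in a two-cue labeling matrix:

```python
import numpy as np

# Hypothetical labeling data: proportion of one category response for stimuli
# varying on two cue dimensions (e.g., 5 steps of one cue x 5 steps of another).
p_label = np.array([
    [0.02, 0.05, 0.10, 0.20, 0.35],
    [0.05, 0.12, 0.25, 0.45, 0.60],
    [0.10, 0.28, 0.50, 0.70, 0.82],
    [0.22, 0.48, 0.72, 0.88, 0.94],
    [0.40, 0.65, 0.85, 0.95, 0.98],
])

def equivalent_pairs(p, tol=0.03):
    """Return pairs of stimuli that differ on both cue dimensions but whose
    labeling probabilities differ by less than `tol` -- candidates for
    phonetic equivalence despite their physical difference."""
    cells = [(i, j) for i in range(p.shape[0]) for j in range(p.shape[1])]
    pairs = []
    for a in range(len(cells)):
        for b in range(a + 1, len(cells)):
            (i1, j1), (i2, j2) = cells[a], cells[b]
            if i1 != i2 and j1 != j2 and abs(p[i1, j1] - p[i2, j2]) < tol:
                pairs.append(((i1, j1), (i2, j2), p[i1, j1], p[i2, j2]))
    return pairs

for s1, s2, p1, p2 in equivalent_pairs(p_label)[:5]:
    print(f"stimuli {s1} and {s2}: labeling {p1:.2f} vs. {p2:.2f}")
```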

4To produce precise (rather than just average) phonetic equivalence, it would not only be necessary to take into account the fact that individual listeners show trading relations of varying magnitude but also that (covert) labeling responses may change in the context of a discrimination task (Repp et al., 1979). Thus, the stimulus parameters would have to be adjusted separately for each listener, based on labeling data collected with the stimulus sequences of the discrimination task. This procedure would optimize the opportunity to verify the prediction that stimuli in the conflicting-cues condition are more difficult to discriminate than those in the cooperating-cues condition, with the single-cue condition in between. However, this order of difficulty is likely to obtain also when the choices of parameters are less than optimal.

5Most interestingly, the only completed study (so far) of a trading relation in human infants (Miller & Eimas, Note 4) has yielded a positive result: The boundary on a VOT continuum was significantly affected by the duration of the formant transitions, a variable that is confounded with F1 onset frequency (cf. Summerfield & Haggard, 1977). Kuhl and Miller (1978) obtained a similar result with chinchillas. This trading relation, at least, appears to be of auditory origin, even though the principle involved is not yet clear. It seems likely, though, that not all trading relations will follow this pattern.

6That the subjects focused on one cue only was a strategy furthered by the AXB classification task of Best et al. In a different paradigm, the subjects may pay attention to both cues at the same time (cf. Repp, 1981b). The important point is that, in the auditory mode, the cues are not integrated into a unitary percept, so that listeners may choose between selective-attention and divided-attention strategies.

7In that connection, the study of Simon and Fourcin (1978) might be mentioned, which showed that the trading relation between VOT and F1 transition trajectory as cues to stop consonant voicing emerged at age 4 in British children but was absent in 2- and 3-year-olds. Recently, however, Miller and Eimas (Note 4) found a related trading relation (between VOT and transition duration) in American infants. This conflict needs to be resolved.

8I have experienced this effect myself (together with a number of my colleagues at Haskins) and can confirm that it is a true perceptual phenomenon, not some kind of inference or bias in the face of conflicting information. The observer really believes that he or she hears what, in fact, he or she only sees on the screen; there is little or no awareness of anything odd happening. However, the effect is not always that strong; its presence and strength depend on the particular combination of syllables, in a way that can also, in part, be explained by reference to articulation. It is strongest when the visual information makes the auditory information impossible in articulatory terms. The details of the effect and of the relevant variables remain to be investigated.

9These experiments concern the disruption of perceptual integration of cues. However, context effects can presumably be similarly blocked by a change in apparent source. Diehl, Souther, and Convis (1980) recently reported a study in which a rate normalization effect (of a precursor on the /ga/-/ka/ distinction) was eliminated by a change of voice. Unfortunately, their data were not entirely consistent and call for replication.

TEMPORAL PATTERNS OF COARTICULATION: LIP ROUNDING*

Fredericka Bell-Berti+ and Katherine S. Harris++

Abstract. According to some theories, anticipatory coarticulation occurs when phones for which a feature is unspecified precede one for which the feature is specified, with consequent migration of the feature value to the antecedent phones. Carryover coarticulation, on the other hand, is often attributed to "articulatory sluggishness." In this paper, EMG evidence is provided that this formulation is inadequate, since the beginning of EMG activity associated with vowel lip rounding is independent of measures of the acoustic duration of adjacent consonants. We suggest that the often noted vowel-rounding gesture simply co-occurs during predictable intervals with portions of preceding and following lingual consonant articulations.

INTRODUCTION

A central problem in understanding the relationship between speech production and perception is the disparity between the perceptual representation of speech as a series of discrete events, composed of partially commutable elements, and the acoustic representation as a continuously varying stream, without obvious phonetic segment markers. This acoustic stream is generated by the activity of the several articulators, whose activity is apparently continuous and context dependent. Many theories of coarticulation attempt to solve the problem of context sensitivity by positing some kind of speech synthesis process that occurs in production and allows the fitting together of the discrete units into the continuous stream. The task of the theorist, then, is to write the adjustment rules.

In a widely cited theory of anticipatory coarticulation, Henke (1966) provides a fairly typical formulation. Each phone in an articulatory string

*In press, Journal of the Acoustical Society of America. Some of these data were presented at the 96th Meeting of the Acoustical Society of America, Honolulu, Hawaii, November 1978 [Journal of the Acoustical Society of America, 64, S92(A), 1978], and at the 97th Meeting of the Acoustical Society of America, Cambridge, Massachusetts, June 1979 [Journal of the Acoustical Society of America, 65, S22(A), 1979].
+Also St. John's University, Jamaica, New York 11439.
++Also The Graduate School, The City University of New York, New York, New York 10036.
Acknowledgment. This work was supported by NINCDS grants NS-13617 and NS-05332 and BRS grant RR-05596 to the Haskins Laboratories.

[HASKINS LABORATORIES: Status Report on Speech Research SR-67/68 (1981)]


is conceived as composed of a bundle of articulatory features. Anticipatory coarticulation occurs when phones for which a given feature is unspecified precede one for which the feature is specified, with consequent subjection of the antecedent phones to the feature value of the following phone. Since time is unspecified in the theory, the temporal duration occupied by the string of antecedent phones is presumably irrelevant; all will acquire the same feature value.
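The following sketch is a schematic illustration only, not Henke's actual computer model; the segment inventory, feature names, and the assumption that only vowels specify rounding are ours. It shows how a look-ahead feature-spreading rule assigns the rounding value of an upcoming vowel to all preceding unspecified phones, regardless of their number or duration:

```python
def spread_anticipatory(phones, feature):
    """phones: list of dicts mapping feature names to values (None = unspecified).
    Each unspecified phone takes the value of the nearest following phone that
    specifies the feature, however many phones intervene."""
    filled = [dict(p) for p in phones]
    next_value = None
    # scan right-to-left so each unspecified phone sees the upcoming value
    for p in reversed(filled):
        if p.get(feature) is None:
            p[feature] = next_value
        else:
            next_value = p[feature]
    return filled

# "lease tool": /l i s t u l/ -- only the vowels specify rounding here
utterance = [
    {"seg": "l", "round": None}, {"seg": "i", "round": "-"},
    {"seg": "s", "round": None}, {"seg": "t", "round": None},
    {"seg": "u", "round": "+"},  {"seg": "l", "round": None},
]
print(spread_anticipatory(utterance, "round"))
# Both /s/ and /t/ acquire [+round]: under such a rule, rounding onset is tied
# to the number of unspecified segments, not to a fixed time before the vowel.
```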

It has been claimed by Fowler (1980) that all such theories of coarticulation belong to the class of extrinsic timing models of speech production. Such models assume that the dimension of time is excluded from the specification of a phonological segment in the motor plan for the utterance. In Fowler's view, such accounts must therefore necessarily fail to explain or predict coarticulation. While one may or may not accept her argument in its larger theoretical framework, we believe that purely substantive evidence can be marshaled against such phonological segment theories as a class.

In an earlier report (Bell-Berti & Harris, 1979) we provided evidence that this formulation is inadequate, and have elsewhere suggested an alternative hypothesis (Bell-Berti, 1980; Bell-Berti & Harris, 1981). Specifically, we found that if a rounded vowel was preceded by one or two consonants presumably unspecified for rounding, the electromyographic activity associated with rounding began a constant time, rather than a constant number of segments, before the onset of the vowel.

The present experiment was designed to extend the earlier one in several ways. First, we have examined both anticipatory and carryover coarticulation of lip rounding. Often, "articulatory sluggishness" explanations are proposed for carryover coarticulation while "planning" explanations are proposed for anticipatory coarticulation (e.g., MacNeilage, 1970). However, if both anticipatory and carryover effects appear to be guided by the same articulatory rules, disparate explanations for these two effects seem less plausible.

Secondly, we have examined the special case in which coarticulation occurs from one vowel to another vowel, where both vowels are rounded and are separated by intervening consonants without rounding specification. In such cases, it has been shown that a "trough" will occur--that is, EMG activity will be reduced at some point in the vowel-to-vowel period. This situation is, of course, not explicable by the type of model of coarticulation exemplified by Henke's, as we (Bell-Berti & Harris, 1974) and others (Gay, 1978) have pointed out.

Thirdly, we extended the design of the experiment to include longer strings of consonants preceding or following the rounded vowel than the original maximum of two-element clusters. We also increased the subject pool and included subjects naive to the purposes of the experiment.

Fourthly, we checked the subjects to see if orbicularis oris activity occurred for segment sequences for which no lip rounding was specified. In a theory like Henke's, it is assumed that a feature, such as lip rounding, spreads from a phone for which it is specified to the preceding phones for which it is not. If the preceding phones carry a specification for the

feature, the experiment provides no test of the theory. Earlier studies (Daniloff & Moll, 1968) have been criticized by later authors (Benguerel & Cowan, 1974) for possible design flaws of this type. For the experiment described here, we assume that the alveolars, especially /s/, are neutral with respect to rounding. Hence, we would expect that in sequences of the form /isi/, no EMG evidence of rounding would be observed, since the vowel /i/ is traditionally characterized as spread, and the consonant /s/ is not traditionally characterized with respect to lip rounding (Bronstein, 1960). However, since traditional descriptions are often incomplete concerning fine-grained articulatory detail, it seemed worthwhile to make an explicit check of lip activity during the sequence /isi/ for each speaker.

As in the previous study, we have used an electromyographic indicator of rounding, the activity of the orbicularis oris muscle. The relationship between orbicularis oris activity and vowel rounding is well documented by a number of studies (Harris, Lysaught, & Schvey, 1965; Fromkin, 1966; Tatham & Morton, 1968; Sussman & Westbury, 1981).

Speech Materials

The experimental speech materials were two-word phrases spoken within the carrier phrase "It's a ___ again." The first word was one from the set "lee, lease, leased, loo, loose, loosed," while the second word was one from the set "tool, stool, teal, steel." All utterances whose second word was either "tool" or "stool" will be called the "anticipatory" set in the discussion below, since they were designed to examine anticipatory lip rounding. Conversely, those utterances whose first word was "loo," "loose," or "loosed" and whose second word was "teal" or "steel" will be called the "carryover" set.

In addition to these eighteen experimental utterances (12 in the anticipatory and six in the carryover sets), we examined an additional group that included "lee teal" and "lee seal," to determine whether a speaker produced either or both of the alveolar consonants /t/ or /s/ with orbicularis oris EMG activity in the absence of a rounded vowel.

The experimental utterances were placed in randomized lists that included additional items intended as foils. Five subjects read the randomized lists until 14 to 18 repetitions of each experimental utterance had been recorded; an additional subject produced only ten repetitions of each utterance type. Subjects were asked to read the sentences from an orthographic representation, and thus produced the phonetic sequences natural to the language (for example, "leased tool" was usually produced as [listul]).

Several of the subjects showed orbicularis oris activity for the rounding-neutral /isi/ control sequences; their data were not analyzed further. The data presented here are from the three remaining speakers.

Figure 1. a) Acoustic waveform of a single token of the utterance type "lease tool" for subject FBB, with consonant onset time (consonant string duration) indicated. b) Ensemble-average EMG activity from the orbicularis oris muscle for all tokens of "lease tool," for subject FBB. EMG onset time, at 5% of baseline-to-peak amplitude, is indicated, 160 msec before /t/ release.

EMG and Audio Data Collection

EMG potentials were recorded from several placements on the superior and inferior orbicularis oris muscles for each subject, using surface electrodes similar to those described by Allen, Lubker, and Harrison (1972). The electrodes were applied to the vermillion border of the lips, and spaced about a half centimeter apart. The EMG signals were recorded simultaneously with the audio signal on a multi-channel FM tape recorder. In later analyses, the channel yielding the EMG signal with the largest amplitude was chosen; in all cases, this was a superior lip placement. Signals from the lower lip placements did not appear to be qualitatively different, but had a lower signal-to-noise ratio.

Acoustic measurements. The acoustic recordings from each of the three subjects whose data were subjected to detailed analysis were digitized and analyzed using an oscillographic display of the digitized waveform. For each of the 18 two-word test utterances, the durations of the /lV/ and /Vl/ sequences were measured for each of the ten to eighteen repetitions, as were the durations of /s/ friction and /t/ closure and aspiration. Average durations of the /lV/, /Vl/, and consonant sequences were calculated from the individual token measurements.

Reference points were chosen for aligning tokens of each utterance type for EMG averaging. The point chosen for the 12 utterances of the anticipatory set was the release of the /t/ before /u/; for the carryover set, it was the moment of /t/ closure or the beginning of /s/ friction immediately after /u/ (Figure 1).

EMG measures. The EMG waveforms for each electrode position (channel) and utterance repetition were rectified, integrated (5 msec hardware integration), and digitized. The signals were smoothed, using a 35 msec triangular window, and the ensemble average was calculated for each utterance and channel from the integrated EMG waveforms, after aligning all tokens at the reference point in the acoustic waveform. These signal recording and processing techniques have been described in detail elsewhere (Kewley-Port, 1973).
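A minimal sketch of this processing chain is given below. It is an illustration only (in Python/NumPy, not the Haskins processing software); the 1-msec sampling step and the function names are assumptions:

```python
import numpy as np

def ensemble_average(tokens, ref_indices, window_ms=35, dt_ms=1.0):
    """Rectify each repetition, smooth with a triangular window, align all
    repetitions at their acoustic reference sample, and average.

    tokens      : list of 1-D arrays of EMG samples (one per repetition)
    ref_indices : sample index of the acoustic reference point in each token
    """
    n = int(window_ms / dt_ms)
    tri = np.bartlett(n)
    tri /= tri.sum()                                  # triangular smoothing kernel

    # rectification followed by smoothing, token by token
    smoothed = [np.convolve(np.abs(t), tri, mode="same") for t in tokens]

    # zero-pad so every token has the same number of samples before and
    # after its reference index, then average across repetitions
    pre = max(ref_indices)
    post = max(len(t) - r for t, r in zip(smoothed, ref_indices))
    aligned = np.zeros((len(smoothed), pre + post))
    for row, (t, r) in enumerate(zip(smoothed, ref_indices)):
        aligned[row, pre - r : pre - r + len(t)] = t

    return aligned.mean(axis=0)   # ensemble average; reference point at sample `pre`
```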

Using the ensemble averages, we determined the beginning of orbicularis oris activity for the utterances in the anticipatory set, and the end of this activity for the utterances in the carryover set. For the anticipatory set utterances, the beginning of activity was defined as the time at which orbicularis oris EMG activity reached 5% of its maximum amplitude.2 An example of an ensemble average of one utterance, from the data of subject FBB, is shown in Figure 1b, with this onset time indicated. For the carryover set, the end of activity was defined as the time at which orbicularis oris EMG activity fell to 5% of its maximum amplitude.
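The 5%-of-peak criterion can be expressed directly, for example as follows (again an illustration, not the measurement program actually used; the sampling step is an assumption):

```python
import numpy as np

def emg_onset_offset(avg, dt_ms=1.0, frac=0.05):
    """Return the time, in msec, at which the ensemble-averaged EMG first
    rises above `frac` of its baseline-to-peak amplitude, and the time of the
    last sample above that threshold (approximating where activity falls back
    to the criterion)."""
    baseline = avg.min()
    threshold = baseline + frac * (avg.max() - baseline)
    above = np.nonzero(avg > threshold)[0]
    return above[0] * dt_ms, above[-1] * dt_ms
```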

RESULTS

Anticipatory Coarticulation

If the beginning of the rounding gesture were linked to the beginning of the consonant string, reflecting the number of

Figure 2. Scatter plots of consonant string duration vs. EMG onset time in msec for anticipatory set utterances, for all three subjects. /i-u/ utterance data are presented in the left-hand column; /u-u/ utterance data are presented in the right-hand column.

Table 1

A. Anticipatory Coarticulation: Slope of best-fit line for consonant string duration vs. EMG onset time for /i-u/ and /u-u/ utterances.

               FBB                 NSM                 CEG

  /i-u/    m = -.3209*         m = .1049           m = .0006
           F = 19.19           F1,44 = 1.20        F1,41 = .00001

  /u-u/    m = .1927+          m = .1763           m = .2899
           F1,44 = 7.6

  *p < .05, but slope is negative.
  +p < .05; if the /utu/ case is not included, m = .1544, p > .05.

B. Carryover Coarticulation: Slope of best-fit line for consonant string duration vs. EMG offset time for /u-i/ and /u-u/ utterances.

               FBB                 NSM                 CEG

  /u-i/    m = .0161           m = .0674*          m = .4795
           F1,52 = .3249       F1,42 = 4.66        F = 10.74

  /u-u/    m = -.0566          m = .0162           m = -.3843**
           F = .1089           F1,41 = .2152       F = 23.42

  *p < .05. **p < .01, but slope is negative.

elements in the string, this activity should begin earlier when the consonant string is of longer duration. If, on the other hand, the beginning of the orbicularis oris activity were linked to the presence of a rounded vowel, there should be no correlation between the timing of the beginning of EMG activity and the duration of friction and closure. Since there is a general tendency for these events to be of shorter duration in clusters, it is necessary to examine a number of different consonant sequences, of different lengths, in order to distinguish between the consonant-linked and vowel-linked onset hypotheses. In the present set, the acoustic durations of the medial sequences ranged from 70 msec to about 420 msec.

The "onset time" of orbicularis oris EMG activity relative to consonant-string duration is shown, for the utterances of the /i-u/ anticipatory set, in the left-hand column of Figure 2. Each panel shows the data for one of the three subjects; each point represents the average consonant-string duration and EMG onset time for about 14 tokens of each type for two subjects, and 10 tokens of each type for the third. If anticipatory coarticulation were systematically related to the onset of the consonant string, we would expect the points to be fitted by a line having a positive slope; instead, however, the points are fitted by a line whose slope is not significantly different from zero in two cases, and is significantly negative in the third (Table 1).
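The slope test summarized in Table 1 amounts to an ordinary least-squares regression of EMG onset time on consonant-string duration, with an F test of the hypothesis that the slope is zero. A sketch of that computation (with invented numbers, assuming SciPy) is:

```python
import numpy as np
from scipy import stats

# Illustrative only: average consonant-string durations (msec) and EMG onset
# times (msec before the reference point) for one subject; numbers are made up.
duration = np.array([ 95, 130, 175, 210, 260, 320])
onset    = np.array([168, 172, 160, 175, 165, 170])

fit = stats.linregress(duration, onset)
n = len(duration)
# F with 1 and n-2 df is the square of the slope's t statistic
F = (fit.slope / fit.stderr) ** 2
print(f"m = {fit.slope:.4f}, F(1,{n-2}) = {F:.2f}, p = {fit.pvalue:.3f}")
```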

In the right-hand part of Figure 2, we have plotted the EMG onset time relative to consonant string duration for the /u-u/ utterances. The results fit the same general description as the /i-u/ case; that is, coarticulation began a constant interval before the onset of the second vowel, with a single exception for each of the three speakers--the case having the shortest consonant duration. A fairly straightforward explanation can be provided, if we assume that for this case the intervocalic interval may be shorter than the time necessary for muscle activity to fall to baseline for the first vowel and rise for the second. This hypothesis is supported by the fact that, for all three subjects, the minimum, or baseline, activity for /t/ strings is higher than for any other (Table 2).

Table 2

Minimum (baseline) EMG amplitude, in microvolts, between the two rounded vowels of the /u-u/ utterances.

Another interesting result for the two vowel conditions is that there is a difference in the intercept of the best straight-line fit for the /i-u/ and /u-u/ cases; that is, rounding for the second vowel begins earlier if the first vowel was /i/ than if it was /u/. Somewhat similar data are presented by Sussman and Westbury (1981), for /i-u/ sequences as contrasted with /a-u/ sequences. In their data, the difference in onset time is not significant for the /ikstu/ vs. /akstu/ comparison, although the difference in onset time is significant for the /iktu/ vs. /aktu/ comparison. If the differences in onset time are a consequence of the lip position for the first vowel, we might expect consistent amplitude differences for the second vowel, depending on the identity of the first. Such differences were reported by Sussman and Westbury for these cases (see their Figure 3). They do not comment on the latter case, where one might expect larger effects. Peak EMG amplitudes for our own data are presented in Table 3, and, although there is some tendency for peak values for the second vowel to covary with the identity of the first, there is no absolutely consistent result.

The analysis presented in Figure 2 does not examine possible effects of the location of word boundaries. Indeed, in the classic experiment of Daniloff and Moll (1968), no effects of word boundaries were observed, although some similar experiments have claimed to show effects of some kind of linguistic boundaries (e.g., McClean, 1973). Since there are complex but systematic effects of word boundaries on consonant duration (Lehiste, 1960), we re-examined the data for possible word-boundary effects, as shown in Figure 3. It was not possible to examine those utterances produced with a segment common to the end of the first word and the beginning of the second, because consonant duration could not be apportioned to one or another side of the word boundary. For example, as noted above, the sequence that was orthographically represented as "leased tool" was usually executed as [listul]; since /t/ was associated with both words, no separation could be made. For the subset of the utterances where an acoustic event could be associated with the word boundary, the results are as before--that is, there is no systematic relationship between onset of anticipatory coarticulation and word boundary (Figure 3). We would add that, for each utterance set for each subject, the range of EMG onset times for the orbicularis oris is considerably smaller than the range of consonant durations (Table 4, part A). If the onset of EMG activity were linked to the beginning of the measured durations, we would expect the ranges to be comparable.

Carryover Coarticulation

Examining the timing relationship between the end of orbicularis oris EMG activity and the duration of the consonant string following a rounded vowel, we found a pattern very much like that found for the anticipatory condition. Specifically, the "offset time" appears to be unaffected by the duration of the following consonant string (Figure 4). Rather, the slope of the line of best fit for each utterance set for each subject was not significantly different from zero (Table 1B). And, again as with the anticipatory coarticulation data, the range of EMG offset times is smaller than the range of consonant durations (Table 4, part B). In these data, however, lip position for the following vowel did not influence the timing of the end of the vowel gesture. That is, the following vowel is not anticipated in the timing of the end of the first vowel gesture.

Table 3

Peak EMG Amplitude (in Microvolts) for Vowels of the Second Syllable of "Anticipatory" Set Utterances, with /u-u/ Utterance Peak Amplitude at the Left and /i-u/ Utterance Peak Amplitude at the Right

                      peak amplitude    peak amplitude
                          /u-u/             /i-u/

  FBB    #t               236               362
         #st              314               301
         s#t              280               265
         st#t             239               269
         s#st             237               270

  NSM    #t               518               439
         s#t              480               552
         st#t             475               444
         #st              506               507
         s#st             434               452
         st#st            421               430

  CEG    #t               272               222
         s#t              235               246
         #st              228               274
         s#st             207               200
         st#t             190               200
         st#st            240               244

Figure 3. Scatter plots of the duration of word-initial consonant strings vs. EMG onset time in msec, for anticipatory set utterances, for all three subjects. /i-u/ utterance data are presented in the left-hand column; /u-u/ utterance data are presented in the right-hand column.

Table 4

Range, in Msec, of EMG Onset and Offset Times and Consonant String Durations

A. Anticipatory Coarticulation

               EMG Onset    Consonant String    Syllable-Initial
                               Duration         Consonant Duration

  FBB  iCnu        55            174                  113
       uCnu        95            172                  119

  NSM  iCnu       125            299                  176
       uCnu        70            296                  220

  CEG  iCnu        95            281                  174
       uCnu       120            298                  166

B. Carryover Coarticulation

               EMG Offset   Consonant String    Syllable-Final
                               Duration         Consonant Duration

  FBB  uCni        15            193
       uCnu        50            172                  123

  NSM  uCni        25            293                  211
       uCnu        20            296                  267

  CEG  uCni       140            260                  252
       uCnu       110            298                  244

Figure 4. Scatter plots of consonant string duration vs. EMG offset time in msec, for carryover set utterances, for all three subjects. /u-i/ utterance data are presented in the left-hand column; /u-u/ utterance data are presented in the right-hand column.

DISCUSSION

The data suggest that the beginning of EMG activity associated with lip-rounding gestures for vowels is more obviously related to other components of the vowel articulation than to aspects of the consonant string length. Similarly, the end of EMG activity associated with lip-rounding gestures is most straightforwardly described with relation to the end of the vowel, and not with relation to the following consonant string.

Previously published reports, suggesting that lip-rounding gestures migrate ahead to the beginning of a preceding consonant string, may be accounted for by referring to the timing of orbicularis oris activity for the second vowel in /u-u/ utterances having short-duration consonant strings. In these cases, lip-rounding activity seems to begin later (i.e., closer to the second vowel) than it does in utterances having longer consonant sequences. If one examines only a few utterance types with one or two short and one long consonant sequence (cf. Sussman & Westbury, 1981), and if an earlier vowel gesture either inhibits or masks the beginning of the rounding gesture in the short-string utterances, it may appear as though lip-rounding onset follows the beginning of the preceding consonant string. However, we believe that our data cannot be accounted for in this way, nor can the movement study of Engstrand (1980), which gives the same general picture.

This picture of coarticulation is quite different from the look-ahead scanner model presented by Sussman and Westbury (1981). In their model, if a prior vowel is biomechanically antagonistic to rounding, "temporal and amplitude adjustments are incorporated into the anticipatory rounding gesture." Rounding begins, presumably, some time after the end of the antagonistic vowel, but this time is simply displaced, by some amount, from the beginning of the intervocalic string. Thus, there is always a carryover effect of the preceding vowel on the onset of rounding; but for all consonant strings longer than some value, the onset of rounding varies with string duration, presumably as a reflection of the number of elements in the string. In the model proposed here, a preceding vowel may have some antagonistic effect on the onset of rounding, and hence, rounding may appear closer to the second vowel in cases where the consonant string is short, or when the vowel changes. However, rounding onset time does not covary with the number of consonant string elements beyond that point. We assume that the reason Sussman and Westbury apparently observed a string-element effect is that they compared a one-consonant sequence with a three-consonant sequence.
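The contrast between the two accounts can be stated as a pair of toy prediction rules (our own illustration, not either model as published; the 60-msec lag and 160-msec lead are invented constants, not fitted values):

```python
# Both functions return the predicted interval, in msec, between EMG onset
# and the acoustic onset of the rounded vowel, given the duration of the
# intervocalic consonant string.

def onset_lookahead(string_ms, lag_after_v1=60):
    """String-linked account: rounding starts a fixed lag after the end of the
    first vowel, so the onset-to-vowel interval grows with string duration."""
    return max(string_ms - lag_after_v1, 0)

def onset_time_locked(string_ms, lead_before_v2=160):
    """Vowel-linked account: rounding starts a roughly constant interval before
    the second vowel, truncated only when the whole intervocalic string is
    shorter than that interval (the short /utu/ exception noted above)."""
    return min(lead_before_v2, string_ms)

for d in (70, 120, 200, 300, 420):   # range of string durations in this study
    print(d, onset_lookahead(d), onset_time_locked(d))
```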

There is still a good deal that remains unclear about both models and data. We agree that the onset of rounding is clearly influenced by peripheral biomechanical concerns; thus, in the Sussman and Westbury data, rounding for /u/ begins at a different time following /i/ and /a/, and, in our data, at a different time for /u/ following /u/ and /i/. However, by examining a set of utterances whose consonant durations for each subject were fairly well distributed through a wide range of durations, we believe we have shown the rounding gesture to be linked to the vowel articulation. That is, the specification of lip position for the consonants is not altered by a migrating vowel feature. Instead, and as we have also suggested elsewhere (Bell-Berti & Harris, 1981), we see the vowel-rounding gesture beginning at a relatively fixed time before the acoustic onset of the vowel and simply co-occurring with some portion of the preceding lingual consonant articulations.

REFERENCES

Allen, G. D., Lubker, J. F., & Harrison, E. New paint-on electrodes for surface electromyography. Journal of the Acoustical Society of America, 1972, 52, 124. (Abstract)
Baer, T., Bell-Berti, F., & Tuller, B. On determining EMG onset time. In J. J. Wolf & D. H. Klatt (Eds.), Speech communication papers presented at the 97th Meeting of the Acoustical Society of America. New York: Acoustical Society of America, 1979.
Bell-Berti, F. Velopharyngeal function: A spatial-temporal model. In N. J. Lass (Ed.), Speech and language: Advances in basic research and practice (Vol. 4). New York: Academic Press, 1980.
Bell-Berti, F., & Harris, K. S. More on the motor organization of speech gestures. Haskins Laboratories Status Report on Speech Research, 1974, SR-37/38, 73-77.
Bell-Berti, F., & Harris, K. S. Anticipatory coarticulation: Some implications from a study of lip rounding. Journal of the Acoustical Society of America, 1979, 65, 1268-1270.
Bell-Berti, F., & Harris, K. S. A temporal model of speech production. Phonetica, 1981, 38, 9-20.
Benguerel, A.-P., & Cowan, H. A. Coarticulation of upper lip protrusion in French. Phonetica, 1974, 30, 41-55.
Bronstein, A. J. Pronunciation of American English. New York: Appleton-Century-Crofts, 1960.
Daniloff, R. G., & Moll, K. L. Coarticulation of lip rounding. Journal of Speech and Hearing Research, 1968, 11, 707-721.
Engstrand, O. Acoustic constraints or invariant input representation? An experimental study of selected articulatory movements and targets. Reports from Uppsala University Department of Linguistics, 1980, 7, 67-95.
Fowler, C. A. Coarticulation and theories of extrinsic timing control. Journal of Phonetics, 1980, 8, 113-133.
Fromkin, V. A. Neuromuscular specification of linguistic units. Language and Speech, 1966, 9, 170-199.
Gay, T. J. Articulatory units: Segments or syllables? In A. Bell & J. B. Hooper (Eds.), Syllables and segments. Amsterdam: North-Holland Publishing Company, 1978.
Harris, K. S., Lysaught, G. F., & Schvey, M. M. Some aspects of the production of oral and nasal labial stops. Language and Speech, 1965, 8, 135-147.
Henke, W. L. Dynamic articulatory model of speech production using computer simulation. Unpublished doctoral dissertation, Massachusetts Institute of Technology, 1966.
Kewley-Port, D. Computer processing of EMG signals at Haskins Laboratories. Haskins Laboratories Status Report on Speech Research, 1973, SR-33, 173-183.
Lehiste, I. An acoustic-phonetic study of internal open juncture. Phonetica, 1960, 5 (Supplement), 1-54.
MacNeilage, P. F. Motor control of the serial ordering of speech. Psychological Review, 1970, 77, 182-196.
McClean, M. Forward coarticulation of velar movement at marked junctural boundaries. Journal of Speech and Hearing Research, 1973, 16, 286-296.
Sussman, H. M., & Westbury, J. R. The effects of antagonistic gestures on temporal and amplitude parameters of anticipatory labial coarticulation. Journal of Speech and Hearing Research, 1981, 24, 16-24.
Tatham, M. A. A., & Morton, K. Some electromyographic data towards a model of speech production. University of Essex Language Centre Occasional Papers, 1968, 1, 1-59.

FOOTNOTES

1Optimum choice of timing measures from EMG signals depends on several considerations, including both the nature of the EMG data themselves and the use for which the measurements are intended. There are three sources of token-to-token variability in EMG signals whose relative magnitudes bear on the choice: uncorrected electrical noise, the statistical nature of motor-unit excitation, and articulatory timing variation. Effects of this third source are minimized by control of speaking rate and by judicious choice (and careful measurement) of the acoustic reference point. When the first two sources of variability are large--and especially when the EMG onsets are gradual--measurement from the average signal is preferred. Since we frequently encounter both gradual onsets and relatively noisy signals, use of the ensemble average in determining EMG onset time is generally the method of choice (Baer, Bell-Berti, & Tuller, 1979).
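As an illustration only, the following is a minimal sketch of estimating an onset time from an ensemble-averaged EMG signal, using the 5%-of-peak criterion described in footnote 2. The array layout, sampling rate, and synthetic data are assumptions for the example, not part of the original analysis.

import numpy as np

def emg_onset_from_ensemble(tokens, fs=200.0, threshold_frac=0.05):
    """Estimate an EMG onset time from an ensemble average.

    tokens: 2-D array (n_tokens x n_samples) of rectified, smoothed EMG,
            each token aligned at the same acoustic reference point.
    fs:     sampling rate of the processed EMG signal in Hz (illustrative).
    threshold_frac: onset is the first sample where the ensemble average
            exceeds this fraction of its peak (0.05 = the 5% criterion).
    Returns onset time in msec relative to the start of the analysis window.
    """
    avg = tokens.mean(axis=0)                       # ensemble average over tokens
    baseline = avg.min()                            # residual background level
    peak = avg.max()
    threshold = baseline + threshold_frac * (peak - baseline)
    onset_idx = int(np.argmax(avg >= threshold))    # first supra-threshold sample
    return 1000.0 * onset_idx / fs

# Example with synthetic tokens: noisy, gradual rises centered near sample 60.
rng = np.random.default_rng(0)
t = np.arange(200)
clean = 1.0 / (1.0 + np.exp(-(t - 60) / 8.0))       # gradual onset shape
tokens = clean + 0.05 * rng.standard_normal((20, t.size))
print(round(emg_onset_from_ensemble(tokens), 1), "msec")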

2This value was chosen because it assured that we were not identifying random background noise as the beginning of activity. This 5% point was exceeded for each speaker for the utterance "loo tool," which had a relatively short "consonant string" and, consequently, a minimum level of EMG activity between the two rounded vowels that did not fall below 5% of the peak activity. For these cases, we chose the time at which minimum activity occurred.

TEMPORAL CONSTRAINTS ON ANTICIPATORY COARTICULATION

Ge!fer,. Katherine S. 4arrta,0 and i ry Hilt

Abstract. Two accounts of coarticulation--1) that the anticipation of segmental gestures, and thus the extent of their influence, is determined primarily according to the compatibility of the feature specifications for preceding and anticipated phones, and 2) that the extent of anticipatory gestures is delimited according to temporal specifications intrinsic to the motor program--yield very different predictions regarding articulatory organization. These predictions were tested by varying the number of intervocalic consonants in a V1CnV2 utterance, where V2 was either /i/ or /u/ and Cn was /s/, /st/, or /st#st/. We were thus able to determine the extent of spectral changes within the consonant string as a function of the upcoming vowel. Our results lend support to the second account and suggest that the onset of a phone's influence on preceding segments is temporally constrained, presumably because anticipatory gestures are time-locked to the segments they characterize and are not freely-migrating features.

A significant issue in speech production theory is the extent to which articulatory gestures for speech segments are anticipated. From the long-standing realization that phones are, at least spectrographically, nondiscrete, theories of feature spreading were born in attempts to reconcile a continuous output with a presumed noncontinuous input (e.g., Daniloff & Hammarberg, 1973; Henke, 1967).

Numerous models of coarticulation have incorporated the notion that anticipation of articulatory gestures occurs primarily according to the compatibility of the feature specifications for preceding and anticipated phones (Benguerel & Cowan, 1974; Daniloff & Moll, 1968; Henke, 1967; McClean, 1973; Sussman & Westbury, 1981). Coarticulation, according to this view, is therefore limitless with regard to time and spreads over entire phonological units until it is blocked by an incompatible gesture. In anticipation of a rounded vowel, for example, lip rounding is said to extend over as many previous segments as are unspecified for lip configuration, with the extent of the anticipatory gesture varying directly with the length of the preceding string (see, for example, Benguerel & Cowan, 1974; Daniloff & Moll, 1968).

+Also The Graduate School, City University of New York.
++Also The Graduate School, City University of New York.
+++Also the Veterans Administration Medical Center, New York.
Acknowledgment: This work was supported by NINCDS Grant NS-13870 and Biomedical Research Support Grant RR-05596 to Haskins Laboratories.


Figure 1 shows two averaged spectra, with their respective peaks displayed above, for the minimal pair "lease ease" and "lease ooze" for one subject. Note the low-frequency resonance throughout the intervocalic portion of these utterances.

Figure 2 shows the averaged waveforms for the utterances "lease ease," "beast ease," and "least steel" for one subject. These are 200 msec samples that include the first 50 msec of the second vowel and the 150 msec preceding it. Thus, at every temporal point relative to the onset of V2, we are sampling a comparable portion of the acoustic signal for each utterance.

It should be noted that, for the /st#st/ utterances, despite the orthographic transcription, there is evidence of only one friction portion, one closure, and one release. The utterance appears to have been produced naturally, differing from the /st/ utterance only in the duration of that closure.

Spectral averaging solves some problems; however, it also presents others. Because individual tokens of a given utterance type are produced with variable durations, it is likely that the friction and vocalic portions will be averaged together as the distance from the second vowel increases. In order to minimize the possibility of confounding the data in this way, we took the range of durations for all tokens of each consonant string type and divided the tokens into long and short bins on this basis.

Measurements were made from spectral sections at 12.5 msec intervals over the 150 msec preceding the acoustic onset of the second vowel. For each minimal pair, F2 values for the utterances with final /u/ were subtracted from those of utterances with final /i/; since the initial portions were always identical, positive values therefore reflect the final vowel's influence, with larger differences indicating a greater effect.
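A minimal sketch of this kind of analysis is given below, assuming that F2 tracks for the /i/-final and /u/-final members of each minimal pair are already available as arrays sampled every 12.5 msec up to the onset of V2. The data layout, the simple midpoint criterion for the long/short split, and the synthetic values are illustrative assumptions only, not the authors' procedure.

import numpy as np

def f2_differences(pairs, step_ms=12.5, window_ms=150.0):
    """F2(/i/-final) minus F2(/u/-final) at fixed times before V2 onset.

    pairs: list of (f2_i, f2_u) tuples; each element is a 1-D array of F2
           values (Hz) sampled every step_ms and ending at the acoustic
           onset of V2 (assumed layout, not the authors' file format).
    Returns (times_ms, mean_difference), with times from -window_ms to 0.
    """
    n = int(window_ms / step_ms) + 1                  # samples per token (13 here)
    diffs = []
    for f2_i, f2_u in pairs:
        diffs.append(f2_i[-n:] - f2_u[-n:])           # positive = /u/ lowers F2 more
    times = -window_ms + step_ms * np.arange(n)
    return times, np.mean(diffs, axis=0)

def split_long_short(durations_ms):
    """Split token indices into long and short bins about the midpoint of the
    observed duration range (one simple way; the exact criterion is assumed)."""
    d = np.asarray(durations_ms, dtype=float)
    midpoint = 0.5 * (d.min() + d.max())
    return np.where(d > midpoint)[0], np.where(d <= midpoint)[0]

# Tiny synthetic example: two tokens of one minimal pair.
f2_i = np.full(13, 1800.0)                            # F2 before /i/ stays high
f2_u = np.linspace(1800.0, 1500.0, 13)                # F2 drops approaching /u/
times, diff = f2_differences([(f2_i, f2_u), (f2_i + 20, f2_u - 10)])
print(times[0], times[-1], round(diff[-1], 1))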

Figure 3 displays, along the y-axis, the F2 difference for all long and short pairs where V1 is /i/, after sorting. It should be noted that when tokens are sorted in this way, there is temporal overlap between utterance types. For example, the longest singleton /s/ string is longer than the shortest /st/ string, while the longest /st/ strings are comparable in duration to the shortest /st#st/ strings, which, it should be recalled, were produced with a single closure. Thus, these figures actually depict two--and sometimes three--comparisons: one for consonant strings of different phonetic structure and duration, one for consonant strings of identical phonetic structure but different durations, and, in some cases, one where phonetic structure differs but durations are comparable.

The data show that, despite temporal and phonetic differences or similarities, the critical variable appears to be time from the onset of the second vowel, such that there is a similar decrease in the F2 difference for each pair as their distance from V2 increases. In other words, it appears that, for utterances of this type, the influence of the second vowel is

Figure 1. Two averaged spectra with their respective peaks displayed above for the minimal pair "lease ease" and "lease ooze" for one subject.

Figure 2. Averaged waveforms for the utterances "lease ease," "beast ease," and "least steel" for one subject. Accompanying labels depict only the intended expression and are not transcriptions of the subjects' actual productions.

Figure 3. F2 differences in Hz for sorted tokens of minimal pairs where V1 is /i/. Long and short tokens are indicated by closed and open symbols, respectively. The different symbols denote consonant string type, with triangles for singleton intervocalic strings, squares for /st/ strings, and circles for /st#st/ strings. Values on the x-axis indicate time before the onset of V2, which is indicated by 0. Temporal points where symbols are absent correspond to the closure period of the stop consonant. The msec values next to the symbols in the legend indicate average consonant string durations for each minimal pair.

temporally determined, irrespective of the segmental composition of the preceding string.

Figure 4 depicts the same F2 difference as a function of time from the onset of the second vowel, but for pairs where V1 is /u/. Again, the long and short tokens of minimal pairs are plotted, and there is the same temporal overlap for tokens of different phonetic structure. Perhaps even more than the first figure, the data illustrate the tendency for all utterance types to show similar anticipatory effects at almost all sampled intervals.

Note, too, that at -150 msec we are sampling the F2 difference at the end of the first vowel for the shortest /s/ tokens. It is interesting that the magnitude of this difference is almost identical with that of the friction portion of the other pairs. This finding might be explained, not by anticipatory lip configurations as far back as the first vowel, which for /i/ and /u/ are incompatible, but by tongue configurations that are capable of anticipating upcoming phones without preventing the successful production of current ones. Thus, the job of coproduction may be divided between primary articulators.

Figure 5 shows the data for our second subject's minimal pairs where V1 is /i/. While the trend is similar in the sense that anticipatory effects are similar in magnitude at most intervals, the effects diminish more abruptly over time and at intervals closer to V2.

A possible explanation is the fact that, with the single exception of the /st#st/ pairs, all V1 offsets occur within this 150 msec window. This is unlike our first subject, whose consonant strings were of longer durations and, with one exception, fell outside this time frame. Thus, while it may be possible for these vowels to coarticulate, and therefore show anticipatory effects, there may be limits to these effects for vowels as opposed to friction, thus possibly accounting for the rapid fall-off in F2 differences.

It is interesting, too, that there are some negative values, indicating a higher F2 when /u/ rather than /i/ is the second vowel. However, almost all of these occur at 150 msec prior to the acoustic onset of V2, the most remote portion of our sample. And, while we have not tested these differences statistically, we would speculate that most of these values do not deviate significantly from zero. The value for the long /st/ pair, however, is at approximately minus 100 Hz, which is substantial, if not significant. And, since there is no other instance of such a negative value, it is possible that this reflects carry-over effects.

Figure 6 shows the data for the second subject's pairs where V1 is /u/, and it is similar to his other utterances in that there is an abrupt fall-off in the magnitude of the F2 difference at -75 msec. The general trend is, however, similar, although there is more scatter at the intervals farthest from V2, which we cannot explain. This differs not only from our other speaker, but also from this speaker's other utterances.

DISCUSSION

The data for both subjects show the tendency for coarticulatory effects to be maximal at points in time closest to the acoustic onset of the second

Figure 4. F2 differences in Hz for sorted tokens of minimal pairs where V1 is /u/.

Figure 5. F2 differences in Hz for sorted tokens of minimal pairs where V1 is /i/ for the second subject.

Figure 6. F2 differences in Hz for sorted tokens of minimal pairs where V1 is /u/ for the second subject.

vowel, independent of absolute duration and segmental composition of the preceding consonant string. And, while we do not observe the influence of V2 to be identical in magnitude at all points in time, the effects are systematic enough to support the notion that coarticulation is temporally constrained.

The data thus speak against the notion that anticipatory gestures automatically extend back to the onset of a preceding string. It was observed that the early portions of the longer strings failed to show substantial effects of the second vowel even though they were allegedly free to do so, in the sense that anticipation of V2 was in no way incompatible with their successful production. Furthermore, some of these F2 differences were actually reversed, indicating, perhaps, that carry-over effects were still operative during the early portion of these strings. In addition, coarticulatory effects for the shortest consonant strings were sometimes observable during the latter portion of the first vowel. Thus, we see both the absence of coarticulatory effects in places where segment-based models predict their occurrence, as well as the presence of effects where these models, by virtue of the hypothesized mechanisms, predict their absence.

Our acoustic data are consistent with those of Soli (1981), who found the frequency of F2 within friction to be lower in anticipation of /u/ vs. /i/. However, he attributes this difference, not to lip rounding, but to a different place of the primary constriction in anticipation of back vs. front vowels. His argument appears to derive primarily from data showing F2 frequencies to be similar preceding /a/ and /u/, where both are back vowels but only one is rounded. According to Soli, the effect of rounding, then, is to alter the fricative's overall spectral shape above 3 kHz. He maintains further that "while anticipatory vowel coarticulation appears to be limited to the final portion of the fricative," anticipatory lip rounding may occur throughout the fricative (p. 21).

While we consider Soli's general hypothesis regarding the acoustic effects of anticipatory tongue configurations to be a very tenable one, we would reject the notion that the general time course of anticipatory gestures differs significantly for different articulators.1 In other words, the fact that the lips are free to round during the course of a fricative preceding /u/ does not mean that they do so. This was demonstrated electromyographically by Bell-Berti and Harris (1979, in press) and cineradiographically by Engstrand (1981), whose data show lip rounding to occur at a fixed time before the acoustic onset of a rounded vowel and to be unaffected by the number of preceding consonant segments, the production of which in no way precluded lip rounding. In addition, Bell-Berti and Harris (in press) demonstrated that certain speakers round for /š/ in totally unrounded environments. Thus, one would naturally expect the electromyographic and acoustic records to differ depending on whether rounding is or is not an inherent feature of a speaker's fricative production.

The main point here is that while it may be that lip rounding and place of constriction exert different spectral influences, it is intuitively unreasonable as well as empirically unfounded to suppose that the general organization of anticipatory gestures should be articulator-specific.

The results of the present study suggest that the onset of a vowel's influence on preceding segments is temporally constrained, presumably because anticipatory gestures are time-locked to the segments they characterize as opposed to being freely-migrating features. Further interpretation of the data, however, is limited by the fact that only the acoustic waveform was analyzed. We are currently planning studies with simultaneous EMG recordings from orbicularis oris and pertinent intrinsic and extrinsic tongue musculature in order to determine whether we can account for our acoustic data and Soli's on the basis of tongue and/or lip configurations. In addition, using subjects who produce /š/ with and without rounded lips in nonrounded environments should provide an interesting comparison.

REFERENCES

Bell-Berti, F. Velopharyngeal function: A spatial-temporal model. In N. J. Lass (Ed.), Speech and language: Advances in basic research and practice (Vol. 4). New York: Academic Press, 1980, 291-316.
Bell-Berti, F., & Harris, K. S. Anticipatory coarticulation: Some implications from a study of lip rounding. Journal of the Acoustical Society of America, 1979, 65, 1268-1270.
Bell-Berti, F., & Harris, K. S. A temporal model of speech production. Phonetica, 1981, 38, 9-20.
Bell-Berti, F., & Harris, K. S. Temporal patterns of coarticulation: Lip rounding. Journal of the Acoustical Society of America, in press.
Benguerel, A.-P., & Cowan, H. A. Coarticulation of upper lip protrusion in French. Phonetica, 1974, 30, 41-55.
Daniloff, R. G., & Hammarberg, R. E. On defining coarticulation. Journal of Phonetics, 1973, 1, 239-248.
Daniloff, R. G., & Moll, K. L. Coarticulation of lip rounding. Journal of Speech and Hearing Research, 1968, 11, 707-721.
Engstrand, O. Acoustic constraints or invariant input representation? An experimental study of selected articulatory movements and targets. Reports from Uppsala University Department of Linguistics, 1981, 7, 67-95.
Fowler, C. A. Coarticulation and theories of extrinsic timing. Journal of Phonetics, 1980, 8, 113-133.
Heinz, J. M., & Stevens, K. N. On the properties of voiceless fricative consonants. Journal of the Acoustical Society of America, 1961, 33, 589-596.
Henke, W. Preliminaries to speech synthesis based on an articulatory model. Conference Preprints: 1967 Conference on Speech Communication and Processing (Air Force Cambridge Research Laboratories, Bedford, Massachusetts), 1967, 170-177.
McClean, M. Forward coarticulation of velar movement at marked junctural boundaries. Journal of Speech and Hearing Research, 1973, 16, 286-296.
Soli, S. Second formants in fricatives. Journal of the Acoustical Society of America, 1981, 69, S5. (Abstract)
Sussman, H. M., & Westbury, J. R. The effects of antagonistic gestures on temporal and amplitude parameters of anticipatory labial coarticulation. Journal of Speech and Hearing Research, 1981, 24, 16-24.
Yeni-Komshian, G., & Soli, S. D. Extraction of vowel information from fricative spectra. In J. J. Wolf & D. H. Klatt (Eds.), Speech communication papers presented at the 97th Meeting of the Acoustical Society of America. New York: Acoustical Society of America, 1979, 37-40.

FOOTNOTES

1It should be noted that while we and others (Yeni-Komshian & Soli, 1979; Soli, 1981) consistently note low-frequency resonances within friction, previous accounts of the acoustic theory of fricative production (e.g., Heinz & Stevens, 1961) all but dismiss the presence of low-frequency resonances, due either to the decoupling of the front and back cavities or to the cancellation of back-cavity resonances by the presence of zeroes.

IS A STOP CONSONANT RELEASED WHEN FOLLOWED BY ANOTHER STOP CONSONANT?

Janette B. Henderson and Bruno H. Repp

Abstract. When a stop consonant is followed by another stop consonant in English, the first stop is frequently said to be unreleased. For nonhomorganic stop consonant sequences, this statement might be taken to imply that the (necessary) articulatory release of the first stop has no observable acoustic consequences. To examine this claim, we recorded sentences, produced by several native speakers of American English at a conversational rate, containing word-internal sequences of two nonhomorganic stops, either across a syllable boundary (e.g., cactus, pigpen) or in word-final position (e.g., act, sobbed). Oscillograms of the critical words revealed that release bursts of the first stop occurred in the majority of tokens, except in those where the second stop was bilabial. The bursts were acoustically rather weak and difficult to detect by ear, which may account for their having been neglected in the literature. Instead of a simple "released-unreleased" distinction, we propose a five-way classification that makes use of articulatory, acoustic, and perceptual phonetic criteria.

INTRODUCTION

Sequences of two nonhomorganic stop consonants are frequently encountered in English: across word boundaries (e.g., bad dog), across syllable boundaries within words (e.g., cactus, pigpen), and in word-final position (e.g., act, sobbed). Textbooks of English phonetics generally point out that the first stop in such sequences is commonly unreleased or unexploded (e.g., Ladefoged, 1975; Kenyon, 1951; Abercrombie, 1967).


Method

With three voiceless and three voiced stops in English, there are 24 possible sequences of two stops with different places of articulation. Only four of these (/bd/, /gd/, /pt/, and /kt/) occur in word-final position, primarily in the past tense forms of verbs. All 24 sequences are permissible in word-medial position across a syllable boundary, but only two (/pt/ and /kt/) occur with any frequency, primarily in words of Romance origin. However, by including some compound words, we were successful in finding two examples of each of the 24 sequences in word-medial position.

We constructed meaningful sentences, each containing two of the words to be measured, and the subjects read from a typed list of these sentences. The sentences are shown in Appendix 1 with the critical words underlined. As can be seen, all stop sequences were immediately preceded and followed by a vowel, with primary stress on the preceding vowel. (Note that we were not concerned here with two-stop sequences across a word boundary, although two stops crossing a morpheme boundary in words such as bootcamp may be considered a rather similar instance.)

Six native speakers of American English, three male and three female, were selected as subjects. They were not informed about the purpose of the experiment, but were asked to first study the sentences and then read them at a normal conversational speed. Their productions were recorded on magnetic tape using a Sennheiser microphone, placed approximately 8 inches from the subject's lips, and a Crown tape recorder. The recordings were then digitized at 10 kHz using the Haskins Laboratories pulse code modulation system, and the waveforms were displayed on an oscilloscope. We zeroed in on the closure periods in the critical words to determine whether or not a release burst of the first stop was present. If present, such bursts appeared as distinct spikes of a few milliseconds duration, roughly in the center of the closure period. A typical example is shown in Figure 1a, with the closure and the release bursts for both stops indicated for the utterance scapegoat, produced by a female speaker (CG). In some cases, the release bursts were of very low amplitude, and two of the subjects produced a few tokens containing multiple or exaggerated bursts, but the token shown in Figure 1a is representative of the utterances containing release bursts.
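As an illustration of this kind of waveform inspection, the sketch below flags a putative C1 release spike inside a hand-located closure interval. The threshold, guard interval, function name, and synthetic signal are illustrative assumptions; the study itself relied on visual inspection of the oscillograms.

import numpy as np

def find_release_spike(waveform, closure_start, closure_end, fs=10000,
                       rel_threshold=4.0, guard_ms=10.0):
    """Look for a brief release spike inside a stop-closure interval.

    waveform: 1-D array of samples digitized at fs Hz (10 kHz, as in the text).
    closure_start, closure_end: sample indices bounding the closure, located
        by hand or by an envelope criterion (assumed to be given here).
    rel_threshold: a spike is flagged when the local peak exceeds this many
        times the median absolute level within the closure.
    guard_ms: samples this close to either closure edge are ignored so that
        the release burst of the second stop is not picked up.
    Returns the sample index of the putative C1 release burst, or None.
    """
    guard = int(guard_ms * fs / 1000.0)
    segment = np.abs(waveform[closure_start + guard:closure_end - guard])
    if segment.size == 0:
        return None
    floor = np.median(segment) + 1e-12
    peak_idx = int(np.argmax(segment))
    if segment[peak_idx] / floor >= rel_threshold:
        return closure_start + guard + peak_idx
    return None

# Synthetic closure: near-silence with a 3-msec spike part way through.
fs = 10000
closure = 0.002 * np.random.default_rng(1).standard_normal(800)
closure[400:430] += 0.2 * np.hanning(30)              # weak burst inside closure
wave = np.concatenate([np.zeros(100), closure, np.zeros(100)])
print(find_release_spike(wave, 100, 900, fs))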

The frequency of a release burst for the first stop in word-medial sequences is shown in Table 1. The columns represent the six possible sequences of two different places of stop articulation, while the rows represent the six individual subjects. The voicing feature of the stops has been ignored in this analysis, so that the percentage in each cell is based on eight words. Looking at the means in the right margin, we see that, overall, 58 percent of the words contained a release burst of the first stop, with the average percentages for individual speakers ranging from 46 to 81 percent. It is further evident from the means in the bottom row that the main determinant of the occurrence of release bursts was the place of articulation of the second stop.

Figure 1. Oscillogram of the word scapegoat produced by a female speaker. The word is shown excised from its sentence context with the release burst of the first stop in place (above) and removed (below); the closure period and the release bursts of both stops (C1 and C2) are indicated.

Table 1

Percentage of Words with C1 Release Bursts

                     Place of Stop Articulation

C1:        ALV     VEL     VEL     LAB     ALV     LAB     Mean
C2:        LAB     LAB     ALV     ALV     VEL     VEL

Speakers

NM         25.0    25.0    50.0    12.5    75.0    87.5    45.8
AB          0.0     0.0    50.0    87.5    87.5    87.5    52.1
BR          0.0     0.0    37.5    87.5    87.5   100.0    52.1
CG         12.5    12.5    75.0    75.0    75.0    87.5    56.3
JM          0.0    25.0    87.5    87.5    87.5    75.0    60.4
RK         12.5    87.5   100.0    87.5   100.0   100.0    81.3

Mean        8.3    25.0    66.7    72.5    85.4    89.6    58.1

Table 2

Percentage of Words with C1 Release Bursts

                Place of Stop Articulation

C1:        Labial      Velar       Mean
C2:        Alveolar    Alveolar

Speakers

NM         100.0        75.0        87.5
AB          75.0        75.0        75.0
BR         100.0       100.0       100.0
CG         100.0       100.0       100.0
JM         100.0        25.0        62.5
RK          50.0        75.0        62.5

Mean        87.5        75.0        81.25

When the second stop was labial, release bursts of the first stop tended to be absent (except for one speaker's velar-labial sequences); when it was alveolar, release bursts were present in the majority of utterances; and when it was velar, release bursts were even more common. The place of articulation of the first stop seemed to play only a minor role, and we also observed that the voicing feature had no consistent influence on the occurrence of release bursts.1

Table 2 shows the same analysis for the word-final stop sequences (Sentences 1-4 in the Appendix), with the columns representing the only two possible sequences of place of articulation, and the rows representing the same individual subjects. Again the voicing feature has been ignored, so that the percentage in each cell is based on four words, since no words containing stop sequences differing in voicing (/bt/, /kd/) occur in word-final position in English. The means in the right margin show that, overall, 81 percent of the words contained a release burst of the first stop, with the average percentages for individual speakers ranging from 63 to 100 percent. The means in the bottom row indicate that, as in word-medial position, the place of articulation of the first stop had only a small effect.
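For concreteness, the following minimal sketch shows how token-level burst judgments can be aggregated into percentage tables of this kind, ignoring voicing as in Tables 1 and 2. The record layout and the few example tokens are hypothetical.

from collections import defaultdict

# Hypothetical token records: (speaker, C1 place, C2 place, burst present).
tokens = [
    ("NM", "ALV", "LAB", False), ("NM", "ALV", "LAB", True),
    ("NM", "ALV", "VEL", True),  ("NM", "LAB", "VEL", True),
]

def percent_bursts(tokens):
    """Percentage of words with a C1 release burst for each (C1 place,
    C2 place) cell, pooling whatever tokens are passed in."""
    counts = defaultdict(lambda: [0, 0])          # cell -> [bursts, total words]
    for _speaker, c1, c2, burst in tokens:
        cell = counts[(c1, c2)]
        cell[0] += int(burst)
        cell[1] += 1
    return {cell: 100.0 * bursts / total for cell, (bursts, total) in counts.items()}

print(percent_bursts(tokens))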

Discussion

The pattern in these data can be understood by considering the articulatory maneuvers involved. When the second stop is labial, the speaker has the option of closing the lips before the earlier alveolar or velar closure is released; if this option is exercised, the release of the first stop occurs during the labial closure and therefore has minimal acoustic consequences. On the other hand, if the first stop is labial, then although an alveolar or velar closure may be established before the lips are parted, the labial release, when it occurs, will generally be audible, because the parting of the lips produces a detectable acoustic transient. When the two stops involve the tongue tip and the tongue body (alveolar and velar, in either order), hiding the first release would require that the tongue tip establish contact before the tongue body releases its contact, or vice versa. This seems a difficult maneuver that speakers do not commonly make. Our data, showing release bursts in the majority of alveolar-velar and velar-alveolar sequences, suggest instead that the second closure is typically established shortly after the release of the first, so that the two closures do not overlap.

Word-final stop sequences are the cases typically cited in discussions of "unreleased" stops; yet our data show that release bursts of the first stop are, if anything, even more frequent there than in word-medial position.

Although some authors (e.g., Abercrombie, 1967) mention faint release bursts, it is our impression that their occurrence has generally not been acknowledged. One reason for this may be that they are difficult to detect by ear. We conducted a brief experiment to address this issue.

PERCEPTUAL EXPERIMENT

Five utterances containing a release burst of the first stop in the critical word were selected from the recordings of one speaker. Using the Haskins Laboratories pulse code modulation system, we excerpted the critical words from their sentence context and then created a second version of each in which the release burst of the first stop was removed. Figure 1 shows the original and modified versions of one such word, scapegoat.

We then constructed two discrimination tests. In the Yes/No test, each of the ten stimuli (five words in two versions) occurred ten times in random order. In the two-interval forced-choice (2IFC) test, the original and modified versions of each word were arranged in pairs, with the modified version occurring either first or second; the resulting ten pairs likewise occurred ten times each in random order.
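A minimal sketch of constructing such test orders is given below; the word list, repetition counts, and randomization scheme are illustrative assumptions rather than a record of the actual test tapes.

import random

def make_yes_no_test(words, repetitions=10, seed=0):
    """Yes/No test: each word in both versions, fully randomized."""
    stimuli = [(w, version) for w in words for version in ("original", "burst-removed")]
    order = stimuli * repetitions
    random.Random(seed).shuffle(order)
    return order

def make_2ifc_test(words, repetitions=10, seed=1):
    """2IFC test: original and burst-removed versions paired, with the
    modified member occurring first or second at random."""
    rng = random.Random(seed)
    trials = []
    for _ in range(repetitions):
        for w in words:
            pair = [(w, "original"), (w, "burst-removed")]
            rng.shuffle(pair)                          # randomize interval order
            trials.append(tuple(pair))
    rng.shuffle(trials)
    return trials

words = ["scapegoat", "popgun", "hatpin", "tadpole", "hubcaps"]  # illustrative list
print(len(make_yes_no_test(words)), len(make_2ifc_test(words)))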

Nine listeners participated: the two authors and seven members of the Haskins Laboratories staff with phonetic training. In the Yes/No test, they were told that the tokens included words with and without a release burst of the first stop consonant in the critical cluster, and they were asked to judge, for each token, whether such a burst was present. In the subsequent 2IFC test, they were asked to listen to each pair and to indicate which member contained the release burst.

The results are summarized in Table 3, which shows the percentage of correct responses for each of the five stimulus words in the two tasks. Overall, performance was only slightly above chance, and it varied across words, apparently as a function of the amplitude of the release burst and its temporal separation from the much stronger release burst of the second stop.

Table 3

Mean Percentage Correct Discrimination

                     Discrimination Task

Stimulus             Yes/No        2IFC

                      49.4         55.0
                      61.0         58.3
                      62.2         64.0
                      64.4         62.2
                      67.8         80.5

Mean                  61.0         64.0

There was also considerable variability between subjects. In the Yes/No test, the two authors performed at 83 and 85 percent correct, respectively, whereas the scores of the other seven listeners ranged from 45 to 66 percent correct. In the 2IFC task, the corresponding values were 89 and 79 for the authors and 50-67 for the other subjects. Thus, if one excludes the two subjects who had pre-experimental experience with the stimuli and perhaps knew better what to listen for, there is little evidence that even phonetically trained listeners can detect the faint release bursts of so-called "unreleased" stops. This is, then, the likely reason why the bursts were not noticed by some earlier authors who relied on their auditory impressions.

CONCLUSIONS

In this paper, we have reported some data relevant to the statement that, in English, stops followed by a different stop are "unreleased." We have examined several possible interpretations of that statement: (1) If it is interpreted as referring to articulation, it is clearly false. (2) If it is interpreted as referring to the acoustic signal, it is not generally true unless the definition of what is to count as a "release burst" is restricted to acoustic events of a certain minimal duration and amplitude. While such a restrictive definition may have been implicit in some previous discussions of "unreleased" stops, it should be noted that, on the contrary, the term "burst" is appropriately applied only to the signal portion excluded by such a definition, i.e., to the brief transient generated by the stop release,

exclusive of any following aspiration (cf. Dorman, Studdert-Kennedy, & Raphael, 1977; Fant, 1973). (3) If the statement is interpreted as referring to perception, it appears to be accurate in so far as stops preceding another stop in conversational speech have release bursts that are difficult to detect by ear. In this sense, the stops in this study were indeed "unreleased." (4) The possibility remains that some phoneticians have used the term "unreleased" in a purely contrastive sense. In this usage, even a stop with a detectable release burst might qualify as "unreleased" relative to some standard for "released" stops. The stops recorded by Repp (1980, in press), whose release bursts were from 10-40 msec long and quite detectable, may fall in this category. An obvious problem here is the absence of any clearly defined criterion separating the two classes.

These considerations illustrate the confusion that can result from terminology that is not only vague about the level of description to which it refers (Repp, 1981), but also insufficiently defined at the level intended. Many phonetic distinctions that are couched in acoustic terminology have been drawn at some remove from the speech signal. In that respect, the term "unreleased" is similar to the term "unaspirated," which is commonly applied to consonants, such as English [d], that exhibit a good deal of aspiration in the acoustic signal. While these terms may be sufficient for the field phonetician, they do not reflect the level of detail that acoustic phoneticians are concerned with, and therefore are of limited use.

We propose the following, more detailed classification, in which "release" is reinstated as an articulatory term:

(1) Unreleased: The occlusion is maintained, as in a stop preceding a homorganic stop or in many utterance-final stops with delayed release.
(2) Silently released: No release burst in the acoustic record.
(3) Inaudibly released: Visible release burst in records of the signal, but not readily detectable by ear.
(4) Weakly released: Release burst detectable by ear but clearly weaker than in (5).
(5) Strongly released: Release burst is followed by substantial aspiration or voicing.

In this scheme, successive classes are separated by different criteria: (1) and (2) by an articulatory criterion, (2) and (3) by an acoustic criterion, (3) and (4) by a perceptual criterion, and (4) and (5) by a criterion of phonetic contrast or classification.
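The scheme can be summarized as a simple decision procedure; the sketch below is one way to express it, with the four Boolean criteria treated as given (how each is established in practice is, of course, the substance of the classification).

def classify_release(occlusion_maintained, burst_in_signal,
                     audible, strong_aspiration_or_voicing):
    """Map the four criteria of the proposed scheme onto the five classes.
    The Boolean inputs are assumed to come from articulatory records, the
    acoustic waveform, listening, and phonetic classification, respectively."""
    if occlusion_maintained:
        return "unreleased"                    # (1) articulatory criterion
    if not burst_in_signal:
        return "silently released"             # (2) acoustic criterion
    if not audible:
        return "inaudibly released"            # (3) perceptual criterion
    if not strong_aspiration_or_voicing:
        return "weakly released"               # (4) contrastive criterion
    return "strongly released"                 # (5)

print(classify_release(False, True, False, False))   # -> "inaudibly released"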

In summary, our studies indicate that, in English, stops preceding a nonhomorganic stop in conversational speech are generally released inaudibly or silently, silent releases being particularly common when the following stop is labial. The observations of Repp (1980, in press), on the other hand, suggest that similar stops produced in isolated disyllables are typically weakly released.

REFERENCES

Abercrombie, D. Elements of general phonetics. Edinburgh: Edinburgh University Press, 1967.
Catford, J. C. Fundamental problems in phonetics. Bloomington: Indiana University Press, 1977.
Dorman, M. F., Studdert-Kennedy, M., & Raphael, L. J. Stop-consonant recognition: Release bursts and formant transitions as functionally equivalent, context-dependent cues. Perception & Psychophysics, 1977, 22, 109-122.
Fant, G. Speech sounds and features. Cambridge, Mass.: M.I.T. Press, 1973, 110-142.
Jones, D. An outline of English phonetics (8th ed.). Cambridge: W. Heffer and Sons, 1956.
Kenyon, J. S. American pronunciation. Ann Arbor: George Wahr Publishing Co., 1951.
Ladefoged, P. A course in phonetics. New York: Harcourt Brace Jovanovich, 1975.
MacKay, I. R. A. Introducing practical phonetics. Boston: Little, Brown and Co., 1978.
Repp, B. H. Perception and production of two-stop-consonant sequences. Haskins Laboratories Status Report on Speech Research, 1980, SR-63/64, 177-194.
Repp, B. H. On levels of description in speech research. Journal of the Acoustical Society of America, 1981, 69, 1462-1464.
Repp, B. H. Perceptual assessment of coarticulation in two-stop sequences. Haskins Laboratories Status Report on Speech Research, in press, SR-69.

FOOTNOTE

1We considered the possibility that the absence of release bursts in some tokens was due to the substitution of glottal stops for alveolar (and, perhaps, velar) stops. In the informal judgment of the first author, some utterances may have contained glottal stops. In 18 of these, the putative glottal stop preceded a labial stop. Release bursts were observed in 4 of these 18 tokens (22 percent), which is slightly higher than the overall incidence of 17 percent in this context (cf. Table 1). Thus, to the extent that glottal stops did occur, they did not change the pattern of our results.

APPENDIX 1


The ... burrowing of the new-born pigs had turned their pigpen into a huge mudpuddle.

One of Deborah's favorite hobbies is tapdancing, especially to jazz and ragtime music.

... claim that Nixon was only a scapegoat in the ... of ... and subterfuge.

Margaret caught her ...-year-old ... trying to shoot a ... with his popgun.

In the Fall, the catkins hanging outside the backdoor of the cottage were really beautiful.

My grandmother always inserted a hatpin or a napkin into her cakes, to see if they were ready to be removed from the oven.

The marine biologists made a movie about the development of tadpoles ... through a trapdoor mechanism on the side of the artificial pond.

... the two meals for the ... tramp consisted largely of ... and oatmeal.

... tried to prevent the ... of his car by putting ... under the wheels, but after a few attempts at moving it, it sank up to the hubcaps.

OBSTRUENT PRODUCTION BY HEARING-IMPAIRED SPEAKERS: INTERARTICULATOR TIMING AND ACOUSTICS*

Nancy S. McGarr+ and Anders Löfqvist++

Abstract. This study examined the organization of laryngeal control and interarticulator timing in the production of obstruents and obstruent clusters by three severely-profoundly deaf adults. Laryngeal activity was monitored by transillumination; temporal patterns of oral articulation (lips and tongue-palate) were recorded using an electrical transconductance technique. For each of the deaf speakers, an inappropriate laryngeal abduction gesture was often found between words, a pattern never observed for hearing speakers. At the same time, the deaf speakers differed from each other with respect to type of errors, variability, and interarticulator coordination. For the most intelligible speaker, the timing of glottal opening with respect to oral articulation was most like that observed for normals. The second deaf speaker often failed to observe voicing contrasts with respect to glottal opening. This subject was nevertheless consistent in producing most plosives without a glottal opening, and all fricatives with an opening gesture. For the third deaf speaker, the pattern of errors was more complex and included both missing and inappropriate glottal opening gestures.

INTRODUCTION

The production of voiceless obstruents requires intricate coordination of several articulatory systems. At the laryngeal level, an abduction/adduction gesture normally occurs to stop glottal vibrations and assist in the buildup of oral pressure. Supralaryngeal adjustments are also necessary to produce a closure or constriction. Thus, laryngeal and supralaryngeal articulations involve simultaneous activities that must be temporally coordinated. Differences in the relative timing of the laryngeal and oral gestures are used

*Parts of this paper were presented at the 101st Meeting of the Acoustical Society of America, Ottawa, May 18-22, 1981.
+Also Center for Research in Speech and Hearing Sciences, The Graduate School and University Center, The City University of New York.
++Also Department of Phonetics, Lund University.
Acknowledgment. We are grateful to Thomas Baer and Katherine S. Harris for comments on an earlier version of this paper, and to Kiyoshi Honda, Day Zeichner, and Richard Sharkany for technical assistance during the experiments. This work was supported by NINCDS Grants NS-13617 and NS-13870 and by NIH Biomedical Research Support Grant RR-05596 to Haskins Laboratories.

in a wide variety of languages to produce contrasts such as those of voicing and aspiration (cf. Lisker & Abramson, 1964; Löfqvist & Yoshioka, 1981).

Since the larynx is placed in an inaccessible and invisible position, it is reasonable to assume that coordination of interarticulator gestures is learned by auditory monitoring of the acoustic signal. Developmental studies suggest that children master sound contrasts requiring laryngeal adjustments (e.g., voicing and aspiration) by attending to their acoustic and perceptual consequences (Kewley-Port & Preston, 1974; Gilbert, 1977; Macken & Barton, 1980). These studies also show that obstruent contrasts emerge relatively late in children's speech and that production is more variable in children than in adults. The acoustic cues for obstruents are complex, spread over time, and involve differences in the sound source and the spectral composition of the signal. For example, in the production of a voiceless fricative in a vocalic environment, the sound source changes from periodic to aperiodic and back to periodic. A voiceless aspirated stop in the same environment is associated with the following sequence of source changes: periodic during the preceding vowel, silence during the closure, transient noise and aspiration at the release, and periodicity again during the following vowel. In addition to being spread out over time, the acoustic attributes of obstruents often involve short-term spectral changes, where high-frequency components play an important role; examples of such attributes are release bursts and formant transitions for stops and friction noise for fricatives.

Given the complex articulatory and acoustic structure of obstruents, one would expect hearing-impaired speakers to have particular problems with this class of sounds. This is indeed the case, as shown in several descriptive and acoustic studies. For example, deaf speakers frequently fail to make the voiced-voiceless distinction (Hudgins & Numbers, 1942).


METHOD

Glottal activity was monitored by transillumination. A flexible fiberscope was inserted through the nose and positioned above the larynx. The amount of light passing through the glottis was sensed by a phototransistor placed on the surface of the neck below the cricoid cartilage and attached to the skin by a light-tight enclosure. The transillumination signal was recorded on one channel of an instrumentation tape recorder. During the recording session, the larynx was also monitored through the fiberscope.

Laryngeal articulatory movements obtained by transillumination have been shown to be practically identical to those obtained by fiberoptic filming of the larynx (Löfqvist & Yoshioka, 1980). Transillumination is thus an appropriate technique for studying laryngeal behavior in speech. It has better temporal resolution than fiberoptic filming and video recording, data collection and analysis are easy, and larger amounts of material can be analyzed than with any other technique available for laryngeal investigations.

Temporal patterns of oral articulation were recorded using an electrical transconductance technique. Small electrodes were placed on the upper and lower lips, so that onset and offset of lip and tongue-palate contact could be identified from the resulting signals, which were recorded on separate channels of the tape recorder together with the audio signal.

From these records, the duration of the oral closure or constriction was measured from the labial or tongue-palate contact signal, and the interval from the onset of closure or constriction (implosion) to peak glottal opening was calculated. This measurement provides an estimate of the relationship between onset of constriction or closure and the beginning of the adduction of the vocal folds. It is useful since it highlights differences in timing between obstruents, e.g., stops and fricatives (Löfqvist & Yoshioka, 1981). A second measurement of interarticulator timing was the interval from peak glottal opening to offset of labial or tongue-palate contact. This measure shows the relationship between onset of glottal adduction and release, and is particularly useful in examining timing differences between different stop categories (Löfqvist, 1980). The physiological measurements were supplemented by acoustic measurements of voice onset time for stops. All measurements were made interactively on a computer.

RESULTS

Single Obstruents

Figure 1 shows representative tokens of the hearing subject's productions of voiceless and voiced stops. A glottal abduction/adduction gesture is seen in the transillumination signal for the voiceless stop but not for the voiced cognate. Patterns of interarticulator timing are noted in the relationship between events recorded in the signals representing labial/tongue-palate contact and glottal opening, respectively. For the voiceless plosive, peak glottal opening occurs at the oral release, indicated by the offset of lip contact and the release burst. This pattern is the same as that found for other speakers of American English (Löfqvist & Yoshioka, 1981).

Figure 2 shows selected tokens of the same utterances produced by deaf speaker 1. Several patterns are different from normal. First, closure duration is considerably longer for the deaf than for the hearing speaker's productions. Second, there is evidence of an inappropriate glottal gesture. The deaf speaker made a glottal abduction/adduction gesture immediately preceding the test word, before the onset of lip closure for the initial stop. Thus, for both productions, glottal adduction starts before lip closure, and the glottis is in a position suitable for voicing at the release of the oral closure. The abduction/adduction gesture between words was fairly typical of the other deaf speakers as well, but was never observed for the hearing speaker.

From these raw data, a number of measurements were made that are summarized in Figures 3-4 and also in Figures 6-9. Line 1 in these figures shows the mean duration of closure or constriction. Line 2 shows, as a histogram, the number of instances of a glottal opening associated with the obstruent production. The third row shows the first measure of interarticulator timing: the interval between implosion and peak glottal opening. The second measure of interarticulator timing is the interval between peak glottal opening and release, indicated in numerals below the third row. A negative value implies that peak glottal opening occurred after the release. The presentation follows our general impression of rank order of overall speaker intelligibility: (1) the hearing speaker; (2) deaf speaker 1 (felt to be the most intelligible deaf speaker); (3) deaf speaker 2; and (4) deaf speaker 3.
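As an illustration of these two timing measures, the sketch below derives them from time-aligned contact and transillumination tracks. The signal representation, sampling rate, contact threshold, and synthetic example are assumptions for illustration only, not the interactive computer procedure actually used.

import numpy as np

def timing_measures(contact, glottal, fs=500.0, contact_level=0.5):
    """Two interarticulator timing measures from parallel signal tracks.

    contact: 1-D labial or tongue-palate contact signal (higher = contact);
    glottal: 1-D transillumination signal (higher = wider glottal opening);
    both sampled at fs Hz and time-aligned (assumed preprocessing).
    Returns (implosion_to_peak_opening_ms, peak_opening_to_release_ms);
    the second value is negative when peak opening follows the release.
    """
    contact = np.asarray(contact, float)
    glottal = np.asarray(glottal, float)
    above = contact >= contact_level
    implosion = int(np.argmax(above))                       # first contact sample
    release = int(len(above) - np.argmax(above[::-1]) - 1)  # last contact sample
    peak_opening = int(np.argmax(glottal))
    to_ms = 1000.0 / fs
    return ((peak_opening - implosion) * to_ms,
            (release - peak_opening) * to_ms)

# Synthetic example: a 300-msec closure with peak glottal opening near release.
fs = 500.0
n = 500
contact = np.zeros(n)
contact[100:250] = 1.0                                      # closure from 200-500 msec
glottal = np.exp(-0.5 * ((np.arange(n) - 245) / 20.0) ** 2)
print(timing_measures(contact, glottal, fs))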

Figure 1. Records of the hearing speaker's productions of the utterances "peal" (left) and "beak" (right). The curves represent labial/tongue-palate contact (top), glottal opening (middle), and audio envelope (bottom). Onset of labial closure for the word-initial labial stops in "peal" and "beak" and release of the oral closure are marked; the vertical line indicates the point at which peak glottal opening occurs.

Figure 2. Records of deaf speaker 1's productions of the same utterances, "peal" (left) and "beak" (right), as in Figure 1.

Figure 3. Summary of measurements for single voiceless obstruents. See text for further details on measurements.

Figure 4. Summary of measurements for single voiced obstruents.

Results for the single voiceless and voiced obstruents are summarized in Figures 3 and 4, respectively. Closure or constriction duration was always longer for the deaf subjects than for the hearing subject, consistent with previous reports. As is typical for hearing speakers, closure or constriction duration was longer for voiceless than for voiced segments. For the deaf speakers, the duration measurements for the voiceless and voiced segments overlapped (see also below, Figure 5).

The number of tokens for which a glottal gesture occurred is shown in line 2. These gestures were always correct for the hearing speaker and deaf speaker 1. That is, for single voiceless obstruents, each token was characterized by a single abduction/adduction gesture; for single voiced obstruents, there was no laryngeal gesture. For the other deaf speakers, the pattern varied. Deaf speakers 2 and 3 used an appropriate laryngeal gesture more often for the alveolar than for the bilabial obstruents. We will discuss the voiced obstruents of the deaf speakers below.

With respect to interarticulator timing, both the hearing speaker and deaf speaker 1 showed nearly similar patterns for all segments. For voiceless stops, the interval from implosion to peak glottal opening tends to be similar to closure duration. This means that peak glottal opening and oral release almost coincide. Thus, these two speakers both show a small negative number for the second measure of interarticulator timing, i.e., the interval from peak glottal opening to release. Even though the durations for the deaf speaker are prolonged overall, the relative timing of oral and laryngeal gestures is indistinguishable from normal. For the fricatives of these two speakers, the interval from implosion to peak glottal opening is roughly half of the duration of the oral constriction. Peak glottal opening thus occurs about 100 msec before release.

Deaf speaker 2 was inconsistent in production, since in most cases there was no active glottal opening gesture for the stops. For the fricatives, there was an appropriate laryngeal gesture and interarticulator timing was more normal. For deaf speaker 3, we again find an inconsistent pattern. For the labials, there was no glottal opening, whereas for the alveolars, a glottal opening gesture was made. The interarticulator timing in these cases is similar to normal. For the stops, the glottis did not begin to close until some time after the oral release, which is somewhat long, although not totally unusual. For the fricatives, although the durations are long overall, the relative timing pattern was similar to the pattern obtained for normal speakers.

Usually one does not discuss laryngeal-oral coordination for voiced obstruent production. But since deaf speakers are known to substitute voiceless for voiced segments, we have also examined these productions. Figure 4 shows these data. Here, we again find evidence that deaf speakers may use an inappropriate laryngeal abduction gesture for the production of some voiced sounds. But as before, the speakers are inconsistent in this aberrant pattern.

When the deaf speakers produced the appropriate laryngeal gestures for voiceless stops, their overall pattern of interarticulator timing resembled that of normals. Specifically, the oral release and peak glottal opening tend to correspond in time. For fricatives, peak glottal opening precedes offset of tongue-palate contact, as has been observed for normals. But a rather unexpected finding was obtained for these deaf subjects. In general, the laryngeal gesture for the voiceless fricative /s/ was produced correctly more often than for the voiceless plosives. For example, as shown in Figures 3 and 4, deaf speaker 2 consistently contrasted stops and fricatives at the glottal level: the former were nearly always produced with a closed glottis, while for the latter, the glottis was always open. However, as shown in Figure 5, the deaf speakers were unlike the normal speaker in that they were highly variable in their production from token to token. Standard deviations for the deaf speakers were, in many cases, fairly large. For the hearing speaker, the standard deviations were quite small--on the order of 10-25 msec--and are therefore not included in the figure.

Figure 5. Plot of articulatory measurements for the three deaf speakers' productions of single obstruents. Means and standard deviations are shown.

For all test words described above, obstruents were produced in the word-initial position. An allophonic variation in American English is that voiceless stops following a stressed vowel are unaspirated. Therefore, we also examined stops produced in two different positions of a bisyllabic word--"paper," where P1 is stressed and P2 is unstressed. These data are shown in Figure 6. The timing pattern for the initial stops in this test word was essentially the same as that described above for all speakers' production of a single voiceless stop. For the medial stop, the pattern is similar for the hearing subject and deaf speakers 2 and 3. Closure duration was shorter in these cases and there was a tendency not to use an abduction gesture in production. However, deaf speaker 1 produced both initial and medial stops in an almost identical way, with aspiration in both cases.


Table 3

Measurements of Voice Onset Time for Single Stop Consonants (msec, n = 6)

         H      D1     D2     D3
p    x̄   84     87     81     29
     s   8.1    5.6    3.9    6.9
b    x̄   15     16     11     25
     s   3.5    4.0    3.3    6.8
t    x̄   121    83     20     --
     s   9.5    13.8   16.1   3.0
d    x̄   23     47     21     59
     s   4.3    19.6   3.5    6.7
P1   x̄   68     74     11     2.5
     s   7.4    --     4.8    6.8
P2   x̄   14     71     --     --
     s   6.7    3.4    23.8   --



Figure 6. Summary of measurements for the two stops in "paper."

Table 3 shows measurements of voice onset time for single stops. These acoustical measurements match fairly well with the physiological data; i.e., voice onset time was generally longer when a glottal gesture was found. However, in contrast to the physiological data, the standard deviations for the acoustic measurements were fairly small.
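As a rough illustration of how the entries in Table 3 are obtained, the sketch below computes the mean and standard deviation of voice onset time over the six tokens of a segment for each speaker. The token values are invented placeholders, not the measured data; only the form of the computation is intended.

```python
# Hedged sketch: per-speaker, per-segment VOT summaries (mean and s.d. over
# n = 6 tokens), in the spirit of Table 3.  All token values are invented.
from statistics import mean, stdev

vot_msec = {
    "p": {"H": [84, 79, 90, 88, 81, 82], "D1": [87, 92, 83, 85, 90, 85]},
    # ... remaining segments and speakers would be added the same way
}

for segment, by_speaker in vot_msec.items():
    for speaker, tokens in by_speaker.items():
        print(f"{segment:>2} {speaker:>2}: mean = {mean(tokens):5.1f} msec, "
              f"s.d. = {stdev(tokens):4.1f} msec")
```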

Data for affricates are shown in Figure 7. These segments are known to be particularly difficult for deaf speakers to produce. For the hearing subject, the stop closure and the fricative portion of the voiceless affricate were 39 and 126 msec, respectively, with peak glottal opening occurring during the fricative portion. In contrast, for the deaf speakers there was in most cases no stop component. Consequently, the timing pattern resembled that of a fricative. All deaf speakers produced the voiced affricates with a laryngeal abduction gesture.

Clusters

Clusters have not been studied much in the speech of the hearing impaired. The common /st/ cluster was examined in the word-initial position and in the medial unstressed position of a two-syllable word. Figure 8 shows only one component of the cluster, since we were often unable to identify two separate gestures for the hearing-impaired speakers. Consequently, these productions mostly resemble the patterns described above for the single voiceless fricatives. For the hearing speaker, when a voiceless unaspirated stop followed a fricative, peak glottal opening is timed during the fricative segment and the glottis begins to close before the stop component begins. Deaf speaker 1 tended to use a timing pattern for an aspirated stop, with peak glottal opening at release. In some cases, two opening gestures occurred--one for the fricative and one for the stop. For deaf speakers 2 and 3, in most cases, interarticulator timing for the word-initial cluster more closely resembled that observed for single fricatives. These timing patterns were similar to normal in that peak glottal opening occurred during the fricative portion. No clear pattern emerges for these speakers' productions of /st/ in "jester."

We finally turn to clusters with either a word or morpheme boundary within the cluster (see Figure 9). In the first case, that of the word boundary ("less tea"), we would expect that the word-initial stop /t/ would be aspirated, since aspiration here is a way of signaling that a word boundary occurs between the /s/ and the /t/. In fact, all of the speakers, with the exception of deaf speaker 2, produced these tokens with two separate glottal gestures--one for the fricative and one for the stop. The patterns of deaf speaker 2 are consistent with the previous observation that this deaf speaker produced most stops without glottal opening, although for these test words he nevertheless respected the word boundary. The pattern of interarticulator timing is similar to that observed for other tokens of fricatives and aspirated stops.

Turning now to the effect of the morpheme boundary, the pattern for the fricative segment is similar to that for other single fricatives. For the stop segment, only the hearing speaker shows evidence of a separate laryngeal adjustment. Deaf speakers 1 and 2 did not use a glottal opening. For deaf speaker 3, no stop segment could be identified.



Figure 7. Summary of measurements for the affricates.


Figure 8. Summary of measurements for /st/ clusters.


Figure 9. Summary of measurements for clusters with word and morpheme boundaries within the cluster.

DISCUSSION

Normal speakers consistently use different patterns of laryngeal-oral coordination for voiceless stops and fricatives (Löfqvist & Yoshioka, 1981). Onset of glottal abduction generally tends to coincide with onset of oral closure or constriction, unless preaspiration occurs, in which case glottal abduction precedes implosion. For aspirated stops, peak glottal opening occurs at the release of the oral closure. This ensures a delay in voice onset time and also a high rate of air flow for the generation of frication noise immediately after the release. In fricatives, the peak glottal opening occurs closer to the onset of the oral constriction. The velocity of the abduction gesture is higher for fricatives than for stops, and the size of the glottal opening also tends to be larger for the fricatives. These differences in laryngeal control and interarticulator timing are most likely related to different aerodynamic requirements at implosion and release for fricatives and aspirated stops, respectively. The hearing speaker in this study followed these patterns.

The deaf subjects showed both similarities and dissimilarities with respect to normal speakers. The most obvious dissimilarity was a failure to produce the voiced/voiceless distinction. The deaf speakers either made a glottal gesture when none was required, or omitted the glottal gesture. Furthermore, even when a laryngeal gesture was produced, its timing relative to oral articulatory events could be more or less like normal; this pattern varied considerably among deaf speakers.

Not surprisingly, deaf speaker 1, the most intelligible, closely followed the normal pattern. For aspirated stops, peak glottal opening consistently occurred at the oral release. The same strategy was used in production of the second stop in the word "paper," although in this case the phonological rules of American English dictate that aspiration is not necessary. On the other hand, while timing for single fricatives was often produced correctly, the /st/ cluster showed different patterns of interarticulator timing. One example is the /st/ cluster in "steal," where the timing was observed to be like that for an aspirated stop. Again, this speaker uses an aspirated stop inappropriately in this example as part of a cluster.

Deaf speaker 2 differed from normal in still a grosser fashion. Stops were consistently produced without laryngeal activity, while fricatives were usually produced with an appropriate glottal gesture. For these latter cases, the interarticulator timing was relatively correct. Turning to deaf speaker 3, we found both incorrect and highly variable productions. However, when the relative timing is preserved between the articulators, the absolute durations of articulatory events are longer than those found for hearing speakers. This pattern of increased duration has often been noted in the speech of the deaf (Hudgins & Numbers, 1942; Calvert, 1961; Osberger & Levitt, 1979). In relation to these findings, it is interesting to note that hearing speakers, when deprived of auditory feedback, also show evidence of increased duration (Borden, 1980).

Another characteristic that marks the speech of the deaf as different from normal is variability in production at the physiological level. This variability appears to be an important factor in the speech of the deaf, suggesting that deaf speakers, even the less intelligible, do not produce an utterance in quite the same way each time it is perceived to be in error. However, we also observed that even when speakers were judged to be correct in their productions, there was considerable variability from token to token. These results are consistent with electromyographic data obtained for oral articulatory timing (tongue-lips) of a deaf talker (McGarr & Harris, in press). Variability in production was noted less at the acoustic level (VOT measurements), although fairly large standard deviations for deaf speakers' productions have been reported (Monsen, 1976). Such inconsistencies in production may be one reason why listeners find the speech of the deaf so difficult to understand.

As mentioned above, all deaf speakers were more successful in producing fricatives than stops. These results differ from those reported in the literature (Nober, 1967; Smith, 1975; Levitt, Stromberg, Smith, & Gold, 1980). On the one hand, we find our results perplexing, since one would expect that fricatives, because of their high-frequency spectra and articulatory invisibility, would be difficult for severely-profoundly deaf speakers to perceive and thus to produce. Alternatively, on the physiological level, one might postulate that voiceless fricatives, for example, require less precise interarticulator timing than voiceless stops. At the laryngeal level, the deaf speaker need only open the glottis, even if in a fairly stereotypic way as demonstrated by our subjects, and then direct the airstream in an outward direction. The distortion of the /s/ in the speech of the hearing impaired may thus more accurately reflect poor placement of the upper articulators rather than inappropriate laryngeal adjustments. Indeed, it is well known that normally the /s/ is produced at the level of the upper articulators with both channel and wake turbulence, the former being generated by the grooved portion of the tongue, and the latter generated when the airstream strikes the teeth. Deaf speakers are known to have difficulty positioning the tongue for the correct place of articulation (Huntington, Harris, & Sholes, 1968; McGarr & Harris, in press). Plosives, on the other hand, demand particularly fine interarticulator coordination between the larynx and the upper articulators and more precise management of the airstream.

The operation of the larynx in speech is analogous to that of an air valve, whereby the valve must be opened for voiceless sounds to let some air escape, and must also be closed at the appropriate times in order to preserve the breath stream. Studies of the respiratory patterns of deaf speakers have shown that these subjects evidence at least two kinds of problems. The first is that they initiate phonation at too low a level of vital capacity, and also that they produce a reduced number of syllables per breath (Forner & Hixon, 1977; Whitehead, in press). A second problem is mismanagement of the volume of air by inappropriate valving at the laryngeal level. Laryngeal valving has two functions, articulatory and phonatory. For the former, aerodynamic studies of deaf speech production do not consistently show that hearing-impaired speakers produce obstruents with abnormally high air flow rates (Whitehead, in press). One might infer phonatory valving problems from some descriptive studies that often ascribe breathy voice quality to deaf speakers (Hudgins & Numbers, 1942; Monsen, Engebretson, & Vemula, 1978; Stevens, Nickerson, & Rollins, in press). The results of the present study suggest valving problems of a somewhat different nature. That is, during pauses between words, each of the deaf speakers in this study inappropriately opened the glottis. Whether they actually took a breath, as is suggested in the early work of Hudgins (1937), or simply wasted air cannot be ascertained directly from our data. However, we would argue that the latter is more likely, since the glottal abduction gesture was smaller and shorter in duration between words than between utterances. This pattern differs from one hypothesized by Stevens et al. (in press). Based on spectrographic analysis of deaf children's productions, these authors proposed that the glottis is closed during pauses between words.

Turning to acoustics and perception, we find a rather straightforward relationship between physiological records and acoustic measurements for stops. The relationship between the physiological measurements and the listener judgments was not always direct. Perception of both voiced and voiceless obstruents could be found for tokens with and without a correct laryngeal gesture. For example, for the productions of deaf speaker 2, listeners heard /b/ for /p/, the common voiced for voiceless substitution, when no glottal opening was found (cf. Table 1 and Figure 3). However, for the alveolar stops of the same speaker, listeners reported a voiceless sound in all cases, including those without a glottal abduction. From Table 3 it appears that VOT was only 20 msec for these stops.

These results are not too surprising, since a straightforward relationship between physiology and listener judgments is unlikely in such a complex phenomenon as the voiced/voiceless distinction. This mismatch between physiological records and listener judgments of deaf speakers has also been noted by Mahshie (1980). Although in controlled studies using synthetic speech VOT has been shown to be an important determiner of the voiced/voiceless distinction, in real speech there are a host of acoustic cues that may be responsible for this perception. Measurements along a single acoustic dimension, therefore, cannot be readily expected to predict listener responses when other acoustic variables are not held constant. Cue interactions have repeatedly been shown to occur. Examples of such interactions that affect the perception of the voiced-voiceless distinction in stops are amplitude and duration of aspiration (Repp, 1979), and speech tempo and closure duration (Port, 1979; Fitch, 1981; see also Miller, 1981). Our VOT values for the deaf speakers were in the range of 20-30 msec, where interactions and boundary shifts are most likely to occur. This may be another reason why listeners to deaf speech have difficulty making judgments of particular phonetic segments.

Earlier, we argued that because the larynx is placed in an inaccessible and invisible position, mastery of laryngeal articulation is arrived at via the acoustic signal. The deaf speakers in this study all sustained severe-profound hearing losses, suggesting that oral-laryngeal articulation would be exceedingly difficult in light of reduced auditory acuity. In fact, deaf speakers are often said to place their articulators fairly accurately, especially for those places of articulation that are highly visible, but to fail to coordinate the movements between several articulators. Our data show that this notion of deaf speech is in part correct, yet our subjects were also capable of executing appropriate glottal gestures. We would argue that this is in part due to low-frequency residual hearing that conveys some voicing information, as well as to tactile feedback.


There are other findings in studies of deaf speech that are also perplexing and not satisfactorily accounted for by either residual hearing or taction: prepausal lengthening (Reilly, 1979), and pitch declination (Breckenridge, Note 1). If auditory monitoring of one's own voice were the sole prerequisite for the establishment of these phenomena, one would not necessarily expect to find them in profoundly deaf speakers. Quite possibly, they may be due to intrinsic factors of the speech production system. This idea may also account for why interarticulator timing was sometimes correct for the hearing-impaired subjects of this study. Laryngeal articulatory gestures are rather stereotypic and restricted to abduction and adduction. For example, production of a voiceless fricative involves opening the glottis and letting air through. This bears some resemblance to non-speech activities such as blowing and respiration. For the latter, it is reasonable to assume that there exist respiratory-laryngeal linkages whereby glottal abduction and adduction are automatically coordinated with respiratory activity. Speech production in both normals and the deaf most likely utilizes such linkages, although the details are unknown at present.

REFERENCE NOTE

1. Breckenridge, J. Declination as a phonological process. Unpublished manuscript, Bell Laboratories, 1977.


REFERENCES

Borden, G. J. Use of feedback in established and developing speech. In N. Lass (Ed.), Speech and language: Advances in basic research and practice (Vol. 3). New York: Academic Press, 1980, 223-242.
Brannon, J. Visual feedback of glossal motions and its influence upon the speech of deaf children. Unpublished doctoral dissertation, Northwestern University, 1964.
Calvert, D. Some acoustic characteristics of the speech of profoundly deaf individuals. Unpublished doctoral dissertation, Stanford University, 1961.
Carr, J. An investigation of the spontaneous speech sounds of five-year-old deaf-born children. Journal of Speech and Hearing Disorders, 1953, 18, 22-29.
Fitch, H. Distinguishing temporal information for speaking rate from temporal information for intervocalic stop consonant voicing. Haskins Laboratories Status Report on Speech Research, 1981, SR-65, 1-32.
Forner, L., & Hixon, T. J. Respiratory kinematics in profoundly hearing-impaired speakers. Journal of Speech and Hearing Research, 1977, 20, 373-408.
Gilbert, J. H. A voice onset time analysis of apical stop production in 3-year-olds. Journal of Child Language, 1977, 4, 103-110.
Heider, F., Heider, G., & Sykes, J. A study of the spontaneous vocalizations of fourteen deaf children. Volta Review, 1941, 43, 10-14.
Hirose, H. Posterior cricoarytenoid as a speech muscle. Annals of Otology, Rhinology and Laryngology, 1976, 85, 334-342.
Hirose, H., Yoshioka, H., & S. A cross-language study of laryngeal adjustments in consonant production. Annual Bulletin (Research Institute of Logopedics and Phoniatrics, University of Tokyo), 1978, 12, 61-71.
Hudgins, C. V. Voice production and breath control in the speech of the deaf. American Annals of the Deaf, 1937, 82, 338-363.
Hudgins, C. V., & Numbers, F. C. An investigation of the intelligibility of the speech of the deaf. Genetic Psychology Monographs, 1942, 25, 289-392.
Huntington, D., Harris, K. S., & Sholes, G. An electromyographic study of consonant articulation in hearing-impaired and normal speakers. Journal of Speech and Hearing Research, 1968, 11, 147-158.
Karlsson, I., & Nord, L. A new method of recording occlusion applied to the study of Swedish stops. STL-QPSR, 1970, 2/3, 8-18.
Kewley-Port, D., & Preston, M. Early apical stop production: A voice onset time analysis. Journal of Phonetics, 1974, 2, 195-210.
Levitt, H., Stromberg, H., Smith, C., & Gold, T. The structure of segmental errors in the speech of deaf children. Journal of Communication Disorders, 1980, 13, 419-441.
Lisker, L., & Abramson, A. A cross-language study of voicing in initial stops: Acoustical measurements. Word, 1964, 20, 384-422.
Löfqvist, A. Interarticulator programming in stop production. Journal of Phonetics, 1980, 8, 475-490.
Löfqvist, A., & Yoshioka, H. Laryngeal activity in Swedish obstruent clusters. Journal of the Acoustical Society of America, 1980, 68, 792-801.
Löfqvist, A., & Yoshioka, H. Interarticulator programming in obstruent production. Phonetica, 1981, 38, 21-34.
Macken, M., & Barton, D. The acquisition of the voicing contrast in English: A study of voice onset time in word-initial stop consonants. Journal of Child Language, 1980, 7, 41-74.
Mahshie, J. Laryngeal behavior in hearing-impaired speakers. Unpublished doctoral dissertation, Syracuse University, 1980.
Mangan, K. Speech improvement through articulation testing. American Annals of the Deaf, 1961, 106, 391-396.
Markides, A. The speech of deaf and partially hearing children with special reference to factors affecting intelligibility. British Journal of Disorders of Communication, 1970, 5, 126-140.
McGarr, N. S., & Harris, K. S. Articulatory control in a deaf speaker. In I. Hochberg, H. Levitt, & M. J. Osberger (Eds.), Speech of the hearing impaired: Research, training, and personnel preparation. Washington, D.C.: A. G. Bell Association, in press.
Miller, J. L. The effect of speaking rate on segmental distinctions: Acoustic variation and perceptual compensation. In P. D. Eimas & J. L. Miller (Eds.), Perspectives on the study of speech. Hillsdale, N.J.: Erlbaum, 1981.
Millin, J. Therapy for reduction of continuous phonation in the hard-of-hearing population. Journal of Speech and Hearing Disorders, 1971, 36, 496-498.
Monsen, R. The production of English stop consonants in the speech of deaf children. Journal of Phonetics, 1976, 4, 29-42.
Monsen, R., Engebretson, A. M., & Vemula, N. Some effects of deafness on the generation of voice. Journal of the Acoustical Society of America, 1978, 66, 1680-1690.
Nober, H. Articulation of the deaf. Exceptional Children, 1967, 33, 611-621.
Osberger, M. J., & Levitt, H. The effect of timing errors on the intelligibility of deaf children's speech. Journal of the Acoustical Society of America, 1979, 66, 1316-1324.
Port, R. The influence of tempo on stop closure duration as a cue for voicing and place. Journal of Phonetics, 1979, 7, 45-56.
Reilly, A. P. Syllabic nucleus duration in the speech of hearing and deaf children. Unpublished doctoral dissertation, The City University of New York, 1979.
Repp, B. Relative amplitude of aspiration noise as a voicing cue for syllable-initial stop consonants. Language and Speech, 1979, 22, 173-189.
Smith, C. Residual hearing and speech production of deaf children. Journal of Speech and Hearing Research, 1975, 18, 795-811.
Sonesson, B. On the anatomy and vibratory pattern of the human vocal folds. Acta Oto-laryngologica, 1960, Supplement 156.
Stevens, K. N., Nickerson, R., & Rollins, C. Suprasegmental and postural aspects of speech production and their effect on articulatory skills and intelligibility. In I. Hochberg, H. Levitt, & M. J. Osberger (Eds.), Speech of the hearing impaired: Research, training, and personnel preparation. Washington, D.C.: A. G. Bell, in press.
Subtelny, J. Speech assessment of the deaf adult. Journal of the Academy of Rehabilitative Audiology, 1975, 8, 110-116.
Whitehead, R. Some respiratory and aerodynamic patterns in the speech of the hearing impaired. In I. Hochberg, H. Levitt, & M. J. Osberger (Eds.), Speech of the hearing impaired: Research, training, and personnel preparation. Washington, D.C.: A. G. Bell, in press.
Yoshioka, H., Löfqvist, A., & Hirose, H. Laryngeal adjustments in the production of consonant clusters and geminates in American English. Journal of the Acoustical Society of America, 1981, 70, 1615-1623.
Zlatin, M. A., & Koenigsknecht, R. Development of the voicing contrast: A comparison of voice onset time in perception and production. Journal of Speech and Hearing Research, 1976, 19, 93-111.

FOOTNOTE

1For convenience in the following discussion, we will call the speech characteristics of the group "deaf speech," and the speakers of this speech will be called "deaf." By making this identification, we acknowledge that not all persons who sustain severe to profound hearing losses produce this characteristic speech.


ON FINDING THAT SPEECH IS SPECIAL*

Alvin M. Liberman+

Abstract. A largely unsuccessful attempt to communicate phonologic segments by sounds other than speech led my colleagues and me to ask why speech does it so well. The answer came the more slowly because we were wedded to a "horizontal" view of language, seeing it as a biologically arbitrary assemblage of processes that are not themselves linguistic. Accordingly, we expected to find the answer in general processes of auditory perception to which the acoustic signal had been made to conform by appropriate regulation of the movements of articulation. What we found was the opposite: specialized processes of phonetic perception that had been made to conform to the acoustic consequences of the way articulatory movements are regulated. The distinctively linguistic function of these specializations is to provide for efficient perception of phonetic structures that can also be efficiently produced. To find that a phonetic specialization exists accords well with a "vertical" view of language in which the underlying activities are seen as coherent and distinctive. Evidence for such special processes comes from experiments designed to investigate the integration of cues.

I welcome this opportunity to talk to my fellow psychologists about a subject that has, I think, been too much taken for granted. The subject is perception of phonetic segments, the consonants and vowels that lie near the surface of language. My aim is to promote the hypothesis that perception of those segments rests on specialized processes. These support a phonetic mode of perception, they serve a distinctively linguistic function, and they are part of the larger specialization for language.

*In press, American Psychologist. +Also University of Connecticut and Yale University. Acknowledgment. This paper is based on the Distinguished Scientific Contribution Award address given at the meeting of the American Psychological Association, Los Angeles, California, on August 25, 1981. Preparation of the paper, and much of the research on which it is based, was supported by the National Institute of Child Health and Human Development (HD-01994) and by a Biomedical Research Support Grant (RR05596). At Haskins Laboratories it is hard to know what is owed and to whom. I would, however, especially acknowledge my debt to Franklin S. Cooper, with whom I have been closely associated for 35 years. For help with this paper I thank Louis Goldstein, Isabelle Liberman, Virginia Mann, Sharon Manuel, Ignatius Mattingly, Patrick Nye, Bruno Repp, and Michael Studdert-Kennedy. I am grateful to J. A. Fodor for making available to me an early draft of his monograph, "The Modularity of Mind," which I found particularly relevant and stimulating.

HASKINS LABORATORIES: Status Report on Speech Research SR-67/68 (1981)

The phonetic specialization is apparently adapted to the singular code by which phonetic structure is connected to sound, a code that owes its character to the way the segments of the structure are articulated and coarticulated by the organs of the vocal tract. Not surprisingly, then, phonetic processes incorporate a link between perception and production. With that as key, an otherwise opaque code becomes perfectly transparent: the diverse, continuous, and tangled sounds of speech are automatically perceived as a scant handful of discrete and variously ordered segments. Moreover, the segments are given in perception as distinctively phonetic objects, without the encumbering auditory baggage that would make them all but useless for their proper role as vehicles of language.

But we do take speech and its acoustic nature for granted, so much so that it is, I suspect, hard to see why perception of phonetic segments should require processes of an other-than-auditory sort, and even harder, perhaps, to imagine what it might mean to perceive those segments as phonetic objects, free of a weighty burden of auditory particulars. It may help, then, to begin by recounting my experience with an attempt to transmit phonologic information by purely auditory means. That experience exposed for me the problem that a phonetic specialization might solve, though it did not, of course, reveal how the solution is achieved, nor did it show that the solution requires specialized processes. Evidence bearing on those matters is reserved for later sections.

Perceiving Phonologic Segments in the Auditory Mode: An Assumption That Failed

In the mid-Forties I began, together with colleagues at Haskins Laboratories, to design a reading machine for the blind (Cooper, 1950; Nye, 1965; Studdert-Kennedy & Cooper, 1966). This was, or was to have been, a device that would scan print and use its contours to control an acoustic signal. At the outset we assumed that our machine had only to produce, for each letter, a pattern of sound that was distinctively different from the patterns for other letters. Blind users would presumably learn to associate the sounds with the letters and thus come, in time, to read. The rationale, largely unspoken, was an assumption about the nature of speech--to wit, that the sounds of speech represent the phonemes (roughly, the letters of the alphabet) in a straightforward way, one segment of sound for each phoneme. Accordingly, the perception of speech was thought to be no different from the perception of other sounds, except as there was, in speech, a learned association between perceived sound and the name of the corresponding phoneme. Why not expect, then, that arbitrary but distinctive sounds would serve as well as speech, provided only that the users had sufficient training?

Given that expectation, we were ill prepared for the disappointing performance of the nonspeech signals our early machine produced. So we persisted, seeking to increase the perceptual distinctiveness of the sound alphabet and also the ease with which its units would form into words and sentences. But our best efforts were unavailing. No matter how we patterned them, the sounds evoked a clutter of auditory detail that subjects could not readily organize and identify. This discouraged the subjects, but not me, for I had faith that the difficulty would ultimately yield to practice and the principles of learning. What loomed as a far more serious failing was that modest increases in rate caused the unit sounds to dissolve into an imperspicuous buzz. Indeed, this happened at rates barely one tenth those at which the discrete units of phonetic structure can be conveyed by speech.

Having come, thus, to the conclusion that we should try to learn from speech, we began to study it. But our hope at that early stage was only that we might find principles of auditory perception, hitherto unnoticed, that the language system had somehow managed to exploit.1 These would not only be interesting in their own right, but also useful in enabling us to overcome the practical difficulty we had been having, since the auditory principles we hoped to find would presumably be applied to the design of nonspeech sounds our reading machine might be made to produce.

What I did not for a long time understand was that our practical difficulty lay, not in our having failed to find the right principles of auditory perception, but, much deeper, in our having failed to see that the principles we sought were simply not auditory. Perhaps I should have arrived at that understanding earlier had I not been in the grip of a misleading assumption that had decisively shaped my thinking about speech, language, and, indeed, almost anything else I might have found psychologically interesting. I was the more misled because the assumption reflected what I took to be the received view; in any case, I had never thought to question it.

In casting about for a word to characterize the view I speak of, I hit on "horizontal" as being particularly appropriate, only to discover that J. A. Fodor (Note 1) had chosen the same word to describe what I take to be much the same view. Apparently, we have here a metaphor whose time has come. As applied to language, the metaphor is intended to convey that the underlying processes are arranged in layers, none of them specific to language. On that horizontal orientation, language is accounted for by reference to whatever combination of processes it happens to engage. Hence our assumption, in the attempt to find a substitute for speech, that perception of phonologic segments is normally accomplished, presumably in the first layer, by processes of a generally auditory sort--that is, by processes no different from those that bring us the rustle of leaves in the wind or the rattle of a snake in the grass. To the extent we were concerned with the rest of language, we must have supposed, in like manner, that syntactic structures are managed by using the most general resources of cognition or intelligence. There were surely other processes on our minds when we thought about language--attention, memory, learning, for example--the exact number and variety depending on just which aspects of language activity our attention was directed to at the moment. But all the processes we might have invoked had in common that none was specialized for language. We were not prepared to give language a biology of its own, but only to treat it as an epiphenomenon, a biologically arbitrary assemblage of processes that were not themselves linguistic.

The opposite view--the one to which I now incline--is, by contrast, vertical. Seen this way, language does have its own biology. It is a coherent system, like echolocation in the bat, comprising distinctive processes adapted to a distinctive function. The distinctive processes are those that underlie the grammatical codes of syntax and phonology; their distinctive function is to overcome the limitations of communicating by agrammatic means. To appreciate those limitations, we need only consider how little we could say if, as in an agrammatic system, there were a straightforward relation between message and signal, one signal, however elaborately patterned, for each message. In such a system, the number of messages to be communicated could be no greater than the number of holistically and distinctively different signals that can be efficiently produced and perceived; and surely that number is very small, especially when the signal is acoustic. What the processes of syntax and phonology do for us, then, is to encode an unlimited number of messages into a very limited number of signals. In so doing, they match our message-generating capabilities to the restricted resources of our signal-producing vocal tracts and our signal-perceiving ears. As for the phonetic part of the phonologic domain, which is the subject of this paper, I will suggest that it, too, partakes of the distinctive function of grammatical codes, and that it is, accordingly, also special. (For further discussion, see Mattingly & Liberman, 1969; Liberman & Studdert-Kennedy, 1978; Liberman, 1970.)

The Special Function of the Phonetic Mode

To produce a large, indeed an infinite, number of messages with a small number of signals, a syntax would, in principle, suffice. Without a phonology, however, each smallest unit of an utterance would necessarily be a word, so a talker would have to make do with a very small vocabulary. The obvious function of the phonologic domain is, then, to construct words out of a few meaningless units, and thus to make possible the large vocabularies that human beings like to deploy. But the words of the vocabulary are presumably to be found in the deeper reaches of the phonology, where they are represented by the abstract phonemes that stand beneath the many phonetic variations at the surface, variations associated with phonetic context, word boundaries, rate of articulation, lexical stress, phrasal stress, idiolect, and dialect, to name the most obvious sources. What remains in speaking is, of course, to derive the surface phonetic structures, and then to transmit them by using the organs of articulation to produce and modify sounds. Transmitting those structures as sounds and at high rates becomes the distinctive function of the phonetic mode.

At average rates of speaking, talkers produce and listeners perceive about 8 to 10 segments per second. In the extreme, the rate may go to 25 or 30 per second, at least for short stretches. Plainly, such rates would be impossible if each segment were represented, as in the acoustic alphabets of our early reading machines, by a segment of sound. The organs of the vocal tract cannot make unit gestures that fast, and, even if they could, the rate of delivery of the resulting units of sound would overreach the temporal resolving power of the ear. The trick, then, is to evade the limitations on the rate at which discrete segments of sound can be transmitted and perceived, while yet preserving the discrete phonetic segments those sounds must convey.

The vocal tract solves its part of the problem by breaking the two or three dozen phonetic segments into a smaller number of features, assigning each feature to a gesture that can be made more or less independently, and then turning the articulators loose, as it were, to do what they can. A consequence is that gestures corresponding to features of successive segments are produced at the same time, or else greatly overlapped, according to the constraints and possibilities inherent in the masses to be moved and in the neuromuscular arrangements that move them. This is to say that the character of speech is determined largely by the nature of the mechanisms that do the speaking. But it could hardly be otherwise. For even if Nature had devised articulators that could make successive unit gestures at rapid rates--putting aside that this would presumably have destroyed the utility of the vocal tract for such other purposes as eating and breathing--the resulting drumfire of sound would, as I noted earlier, defeat the ear. At all events, the nature of the articulatory process produces a relation between phonetic segment and sound--the singular code I referred to in the introduction--that must, I think, take first place in any attempt to investigate and understand the perception of speech.

One characteristic of the code that should immediately engage our attention follows from the fact that one or another of the articulators is almost always moving. The consequence is that many, perhaps most, of the potential acoustic cues--that is, aspects of the sound that bear a systematic relation to the phonetic segment--are of a dynamic sort. Witness, for example, the changes in formant frequency caused by the movement from one articulatory position to another and known to be important cues for various consonants (and, indeed, for vowels) (Liberman, Delattre, Cooper, & Gerstman, 1954; O'Connor, Gerstman, Liberman, Delattre, & Cooper, 1957; Mann & Repp, 1980; Strange, Jenkins, & Edman, 1977). How do these time-varying acoustic cues evoke discrete and unitary phonetic percepts that have no corresponding time-varying quality?

Another characteristic of the code, owing again to the way the articulators produce and modulate the sound, is that the acoustic cues are numerous and diverse. In the contrast between the [b] of rabid and the [p] of rapid, for example, Lisker (1978) has so far identified sixteen cues, representing a variety of acoustic types. The many cues are not ordinarily of equal power--some will override others--but power does not appear to be determined primarily by acoustic prominence. How, then, is such a numerous variety of seemingly arbitrary cues bound into a unitary phonetic percept?

Finally, the processes of articulation, and more particularly coarticulation, cause the potential cues for a phonetic segment to be widely distributed through the signal and merged, often quite thoroughly, with potential cues for other segments. In a syllable like [bæg], to take a simple case, it is likely that a single parameter of the acoustic signal--say the second formant--carries information simultaneously about at least two of the constituent segments and, in some places, all three (Cooper, Delattre, Liberman, Borst, & Gerstman, 1952; Liberman, 1974). Indeed, it is this characteristic of speech, this encoding of several phonetic segments into one segment of sound, that is, as we have seen, an essential aspect of the process by which phonetic segments are produced and perceived at high rates. But the result is an acoustic amalgam, not an alphabet. How does the listener recover from it the string of discrete phonetic segments it encodes?

Of course, we might try to evade those questions, and the thorny problems they pose for the auditory mode, by supposing that the articulators produce, for each phonetic segment, at least one cue that represents the segment quite straightforwardly (Stevens & Blumstein, 1981). Because the relation of that cue to the phonetic segment is transparent to ordinary auditory processes, the listener might respond most attentively just to it, dismissing the others as so much chaff, or else learning to accept them as associated with, but wholly incidental to, the real business of talker and listener. Such evasion will be hard to maintain, however, if, as we now have reason to think, the typical listener is sensitive to all the phonetic information in speech sounds (Bailey & Summerfield, 1980).2 Certainly every potential cue so far tested has proved to be an actual cue, no matter how peculiar-seeming its relation to the phonetic segment.

We should suppose, then, that there is in speech perception a process by which the manifold of variously merged, continuous, and time-varying cues is made to form in the listener's mind the discrete and ordered phonetic segments that were produced by the speaker. But it seems hardly conceivable that this could be accomplished by processes of a generally auditory sort. Therefore, I assume, as I said in the introduction, that the process is a special one--a distinctively phonetic process, specifically adapted to the unique characteristics of the speech code. Since that code is opaque except as one understands the special way it comes about, I find it plausible to suppose, further, that a link between perception and production constrains the process, as if by knowledge of what a vocal tract does when it makes linguistically significant gestures (Cooper et al., 1952; Liberman, Delattre, & Cooper, 1952).

A Special Process of the Phonetic Mode: Integration of Cues

Of the many experimental results that bear on the existence and nature of distinctively phonetic processes, none is critical; what tells is the weight of the evidence and the way it converges on certain conclusions. Faced, thus, with many more results than I could hope to include, I had to choose between picking a closely related few and, alternatively, offering a token of each type. (For recent and comprehensive reviews, see Repp, 1981; Studdert-Kennedy, 1980.) I have chosen the related few, selecting them from recent studies that bear on the three questions raised by the characteristics of the speech code I referred to in the previous section. Aspects of these questions have long been worried about as the problem of "segmentation": how is the acoustic signal "divided" into phonetic segments (Cooper et al., 1952; Fant, 1962; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967)? Recently, Repp (1978) and Oden and Massaro (1978) have looked at the other side of the coin, putting attention on the problem of "integration": how do cues combine to produce the percept? It suits my purposes to adopt their perspective, and I will.

Integration of a time-varying sound. Frequency sweeps--called formant transitions--of the kind shown in Figure 1 can be sufficient cues for the perceived distinction between the stop consonants [d] and [g] in the syllables [da] and [ga] (Harris, Hoffman, Liberman, & Delattre, 1958). But, as I asked earlier, how are such frequency sweeps integrated (as information about the phonetic dimension of "place") into a unitary percept, [d] or [g], that has about it no hint of a corresponding sweep in pitch? Two interpretations are possible: one, that the integration is accomplished by ordinary auditory processes; the other, that special phonetic processes come into play.


(Figure 1 panels: [da] and [ga], normal (binaural) presentation; base to one ear and isolated transitions to the other ear, duplex-producing (dichotic) presentation.)

Figure 1. Schematic representation of the stimulus patterns used in the experiment on integration of the time-varying formant transitions (Mann, Madden, Russell, & Liberman, Note 2).

On an auditory interpretation, one might suppose, most simply, that this is an instance of low-level sensory integration, something like the well-known integration of intensity and time into the perception of loudness. That possibility is quickly ruled out, however, by the observation that when the transition cues are removed from the pattern and presented alone, as in the part of the figure at lower right, listeners do perceive a rising or falling "chirp," almost a glissando, that conforms reasonably to the time-varying percept that psychoacoustic considerations might have led us to expect (Mattingly, Liberman, Syrdal, & Halwes, 1971).

But the auditory theory is not so easily disposed of, because it can always fall back on the assumption that the formant transitions collaborate with the rest of the pattern in an interaction of a purely auditory sort, from which the percepts, [d] or [g], emerge. It matters little that there is nothing in what we know about perception of complex sounds to suggest that such interaction should occur, for we know very little about perception of complex sounds. Nor does it necessarily matter how implausible it is to suppose that the articulators could so comport themselves as to produce exactly the right combination of sounds, not just in this instance, but in the myriad others that must occur as the articulators react to variations in, for example, phonetic context, rate, and linguistic stress. Such considerations make an explanation based on auditory interaction endlessly ad hoc, but they do not, in principle, rule it out.

A phonetic interpretation, on the other hand, would have it that the integration of the formant transitions into a unitary percept reflects the operation of a device specialized to perceive the sounds in a linguistically appropriate way. As for what is linguistically appropriate, it is plain that perceiving the transitions as a rising or falling chirp is not. Language, after all, has no use for that kind of auditory information; it only requires to know whether the segment was [d] or [g]. Indeed, if the chirps and other curious auditory characteristics of speech sounds were heard as such, they would intrude as an intermediate stage of perception that had, itself, to be interpreted, however automatically. In that case, listening to speech would be like listening to the acoustic alphabets of our early reading machines, or to Morse code, and that would surely be awkward in the extreme.

What is required, if the time-varying transitions are to be perceived (appropriately) as unitary segments, is that the percept reflect neither the proximal sound nor the more distal movements it betokens, but rather the still more distal, and presumably more nearly unitary, neural command structure that occasioned the movements. A less timid writer might call that the talker's phonetic intent.

But whatever the percept exactly corresponds to, I suppose that Nature provided a device that is well adapted to its linguistic function, which is to make available to the listener just those phonetic objects he needs if he is to understand what the speaker said. But Nature could not have anticipated the development of synthetic speech and dichotic stimulation, so it is possible to defeat her design in such a way as to discover something about what the design is. To do this, we use a method that derives from a discovery by Rand (1974). (See also Isenberg & Liberman, 1978; Liberman, 1979.) Its special feature is a way of presenting patterns of synthetic speech so that an acoustic cue is perceived as a nonspeech sound and, simultaneously, as support for a phonetic percept. The obvious advantage of the method is that it holds the stimulus input constant while yet producing two percepts, thus providing a control for auditory interaction. Recently, the method has been applied by Mann, Madden, Russell, and Liberman (1981; Note 2) to determine how a time-varying formant transition is integrated into the perception of a stop consonant. The experiment was as follows:

To one ear we presented one or another of the nine formant transitions shown at the lower right of Figure 1. By themselves, these isolated transitions sound like time-varying chirps--that is, like reasonably faithful auditory reflections of the time-varying acoustic signal. To the other ear, we presented all the rest of the pattern--the base, so called--that is shown at the lower left of the figure. By itself, the base is always perceived as a stop-vowel syllable; most listeners hear it as [da], some as [ga].

When these two stimuli are presented dichotically, listeners report a duplex percept. On one side of the duplexity, the listeners perceive the syllable [da] or [ga], depending on the identity of the isolated transition. This speech percept is seemingly no different from the one that would have been produced had the base and the isolated transition been electronically mixed and presented in the normal manner. On the other side, and at the same time, the listeners perceive a nonspeech chirp, not perceptibly different from what they experience when the transition is presented by itself. Thus, given exactly the same acoustic context, and the same brain, the transition is simultaneously perceived in two phenomenally different ways: as critical support for a stop consonant, in which case it is integrated into a unitary percept, and as a nonspeech chirp, in which case it is not.
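The dichotic arrangement described in the two preceding paragraphs can be sketched in code: a pre-synthesized "base" is routed to one ear and an isolated transition-like frequency sweep to the other. This is only an assumed illustration of the presentation scheme; the file name, sample rate, frequencies, and durations are hypothetical and do not reproduce the actual stimuli.

```python
# Hedged sketch of a duplex-producing dichotic stimulus: base in the left
# channel, isolated "chirp" (formant-transition stand-in) in the right.
# File names and synthesis parameters are hypothetical.
import numpy as np
from scipy.io import wavfile

RATE = 10000  # samples per second (assumed)

def linear_chirp(f_start, f_end, dur, rate=RATE, amp=0.3):
    """Crude stand-in for an isolated formant transition: a linear sweep."""
    t = np.arange(int(dur * rate)) / rate
    freq = f_start + (f_end - f_start) * t / dur
    phase = 2.0 * np.pi * np.cumsum(freq) / rate
    return amp * np.sin(phase)

# Hypothetical pre-synthesized base (the pattern minus the distinguishing
# transition), loaded from disk and scaled to the range [-1, 1].
_, base = wavfile.read("base.wav")
base = base.astype(np.float32) / 32768.0

chirp = linear_chirp(1800.0, 1300.0, 0.05)          # one point on the continuum
chirp = np.pad(chirp, (0, len(base) - len(chirp)))  # place at syllable onset

stereo = np.stack([base, chirp], axis=1)            # left = base, right = chirp
wavfile.write("duplex_stimulus.wav", RATE, (stereo * 32767).astype(np.int16))
```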

To go beyond the phenomenology just described, we determined how the transitions would be discriminated, depending on which side of the duplex percept the listener was attending to. For that purpose, we sampled the continuum of formant transitions by pairs, choosing, as members of each to-be-discriminated pair, stimuli that were three steps apart on the continuum of formant transitions shown in Figure 1. These we presented in an AXB format (A and B being the two stimuli to be discriminated and X being the one or the other) to subjects who were instructed to decide, on the basis of any perceptible difference, whether X was more like A or like B. When the subject's attention was directed to the speech side of the duplex percept, we obtained the results represented in Figure 2 by the solid line; with attention directed to the nonspeech side, we obtained the results shown by the dashed line. The difference is obvious. When the transitions support stop consonants--that is, when they are perceived in the phonetic mode--the discrimination function has a rather high peak, the location of which corresponds closely to the phonetic boundary. This is the familiar tendency toward categorical perception that characterizes segments such as these, a tendency that is itself rather highly adaptive, since it is only the categorical information--the segment is categorically [d] or [g]--that is most relevant linguistically. When the same transitions are perceived, on the nonspeech side of the percept, as chirps, the discrimination function, shown as the dashed line and open circles, is different; in fact, it is nearly continuous.3 The discrimination functions confirm the more blatantly phenomenological results described earlier. Both indicate that integration of the formant transition into a phonetic percept is owing to a special process that makes available to perception a unitary phonetic object well suited to its role in language.

Figure 2. Discriminability of formant transitions when, on the speech side of the duplex percept, they supported perception of stop consonants, and when, on the nonspeech side, they were perceived as chirps (from Mann, Madden, Russell, & Liberman, Note 2).
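The AXB scoring just described can be summarized with a short sketch: on each trial the listener hears A, X, B, where X duplicates either A or B, and a response is counted correct when X is matched to its duplicate; percent correct is then tallied for each three-step pair. This is an assumed implementation for illustration, not the authors' procedure; the listener is simulated by an arbitrary response function.

```python
# Hedged sketch of AXB discrimination scoring along a nine-step continuum.
import random
from collections import defaultdict

pairs = [(i, i + 3) for i in range(1, 7)]   # three-step pairs: 1-4 ... 6-9

def run_axb(pairs, trials_per_pair, respond):
    """`respond(a, x, b)` returns 'A' or 'B'; stimuli are integer stand-ins."""
    correct = defaultdict(int)
    for a, b in pairs:
        for _ in range(trials_per_pair):
            x_is_a = random.random() < 0.5
            x = a if x_is_a else b
            if respond(a, x, b) == ("A" if x_is_a else "B"):
                correct[(a, b)] += 1
    return {p: 100.0 * correct[p] / trials_per_pair for p in pairs}

# A guessing listener hovers near 50% on every pair; a categorical listener
# would peak only on the pair that straddles the phonetic boundary.
print(run_axb(pairs, trials_per_pair=100, respond=lambda a, x, b: random.choice("AB")))
```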

The same phonetic process that integrates the transitions has other characteristics, of course, including one that has attracted attention for a long time: it adjusts perception to variations in the acoustic signal when those are caused by coarticulatory accommodation to changes in phonetic context; thus, it seems to rest on a link between perception and production (Liberman et al., 1952; Mann, 1980; Mann & Repp, 1981). A second part of the experiment just described was designed to examine that perceptual adjustment to phonetic context, and to exploit the duplex percept to identify the domain, auditory or phonetic, in which it occurs. To that end, we took advantage of an earlier experiment by Mann (1980) in which she had found that placing the syllables [al] or [ar] in front of the [da]-[ga] patterns caused the position of the [da]-[ga] boundary (on the continuum of formant transitions) to shift--toward the [g] end for [ar] and the [d] end for [al]. Since the shift was consistent with the change in [da]-[ga] articulation that can be shown to occur when the syllable [al] or [ar] is spoken immediately before, Mann inferred that this was, indeed, a case in which the perceptual system had automatically reflected coarticulation and its acoustic consequences.

Our further contribution to Mann's result was simply to repeat her experiment, but with the "duplex" procedure (and with measures of discrimination substituted for the phonetic identifications she had used). The outcome was quite straightforward. On the speech side of the duplex percept we (in effect) replicated the earlier result, as shown by the results displayed in Figure 3. Taking the discrimination data obtained with the isolated [da]-[ga] syllables (solid line connecting solid circles) as baseline, we see that placing the syllable [ar] in front caused the discrimination peak (and presumably the phonetic boundary) to move to the right, toward the [g] end of the continuum. When [al] preceded, the peak (and the boundary) apparently shifted in the opposite direction, that is, to the left, toward [d]; for some subjects, it shifted so far as to move off the stimulus continuum, so there is, for them, no effective boundary, which explains why the peak is so low. For present purposes, however, the point is simply that there are large effects of prior phonetic context on discrimination of the transitions when those are perceived on the speech side of the duplex percept. On the other hand, as we see in Figure 4, the nonspeech side of the percept is unaffected by phonetic context: discrimination of the formant transitions is the same whether the base was preceded by [al], by [ar], or by nothing.

Putting the two experiments together, we conclude that, given a single acoustic context, exactly the same formant transitions are perceived in two different modes. In the one mode, they evoke nonspeech chirps that have a time-varying quality corresponding, approximately, to the time-varying stimulus; changes in the transitions are perceived continuously; and perception is unaffected by phonetic context. This is, of course, the auditory mode. In the other mode, the same transitions provide critical support for the perception of stop consonants that lack the time-varying quality of the nonspeech chirps; changes in the transitions are perceived more or less categorically; and perception is markedly affected by phonetic context. This is the phonetic mode.


Figure 3. Discriminability of the formant transitions on the speech side of the duplex percept when the target syllables [da] and [ga] were in isolation and when they were preceded by the syllables [ar] and [al] (from Mann, Madden, Russell, & Liberman, Note 2).


Figure 4. Discriminability of the formant transitions on the nonspeech side of the duplex percept under conditions identical to those represented in Figure 3 (from Mann, Madden, Russell, & Liberman, Note 2).

[sa] / [spa] / [sta] -- NORMAL (BINAURAL) PRESENTATION

base with and without silence (to one ear); isolated transitions (to other ear) -- DUPLEX-PRODUCING (DICHOTIC) PRESENTATION

Figure 5. Schematic representations of the stimulus patterns used to determine whether the importance of silence as a cue is owing to auditory or phonetic factors. (From "Duplex perception of cues for stop consonants: Evidence for a phonetic mode," by A. M. Liberman, D. Isenberg, and B. Rakerd, Perception & Psychophysics, in press. Copyright by the Psychonomic Society, Inc. Reprinted by permission.)

Integration of sound and silence. Perception of a phonetic segment typically depends, as indicated earlier, on the integration of several--many may be a more appropriate word--acoustic cues. Even in the case of [da] and [ga] just described, there was one other cue, silence preceding the transitions, though I did not remark it. To show the effect of such silence--an effect long known to researchers in speech (Bastian, Delattre, & Liberman, 1959)--we must put the stop consonant and its transition cues into some other position, as in the examples [spa] and [sta] shown at the top of Figure 5. As we see there, an important cue for perception of stop consonants--in this case, [p] and [t]--is a short period of silence between the noise of the fricative and the formant transitions that introduce the vocalic part of the syllable (Dorman, Raphael, & Liberman, 1979).

But why is silence necessary, and in which domain, auditory or phonetic, is it integrated with the transition cues to produce stop consonants? On an auditory account, we might suppose that there is forward masking of the transition cues by the fricative noise, in which case the role of the intervening silence is to provide time for the transitions to evade masking. Failing that, we could, as always, invoke some previously unnoticed interaction between frequency sweeps (transitions) and silence that is presumed to be characteristic of the way the auditory system works.

A phonetic interpretation, on the other hand, takes account of the fact that presence or absence of silence supplies important phonetic information--to wit, that the talker closed his vocal tract, as he must to produce the [p] and [t] in [spa] and [sta], or that he did not, as he does not when he says [sa]. Presumably, the processes of the phonetic mode are sensitive to the phonetic significance of the information that silence imparts.

To decide between these interpretations, the phenomenon of duplex perception was again exploited (Liberman, Isenberg, & Rakerd, in press). As shown in Figure 5, base stimuli that sometimes did, and sometimes did not, have silence were presented dichotically with transition cues appropriate for [p] or for [t]. Two such dichotically yoked patterns were presented on each trial; subjects were asked to identify the speech percepts and to discriminate the nonspeech chirps. The result was that the subjects fused the transitions with the base and accurately perceived [sa], [spa], or [sta], depending on the presence or absence of silence in the base (to one ear) and the nature of the formant transitions (to the other). But the subjects also perceived the transitions as nonspeech chirps, and accurately discriminated them as same or different regardless of whether or not there was silence in the base. Thus, duplex perception did occur, and silence affected the identification of the speech, but not the discrimination of the nonspeech.

In a further experiment, the investigators provided a more severe test by asking subjects to discriminate their percepts on both sides of the duplexity. For that purpose, two dichotically yoked pairs of stimuli were presented on each trial, so arranged as to exhaust all combinations of silence-no silence in the base and [p]-[t] cues in the isolated transitions. Subjects were asked, for each pair of percepts, to rate their confidence that a difference of any kind had been detected. The results are shown in Figure 6. There are but two critical comparisons. The first is in the leftmost third of the figure, in the condition in which there was no silence in either of the two base stimuli presented to the one ear (labelled "No Silence-No Silence") and the two transition cues presented to the other ear were different (labelled simply "Different").

Figure 6. Ratings of perceived difference on the speech (open bars) and nonspeech (shaded bars) sides of the duplex percept. (From "Duplex perception of cues for stop consonants: Evidence for a phonetic mode," by A. M. Liberman, D. Isenberg, and B. Rakerd, Perception & Psychophysics, in press. Copyright by the Psychonomic Society, Inc. Reprinted by permission.)

On the speech side of the duplexity (open bar), we see that the difference between the transitions was not clearly detected, presumably because, in the absence of silence in either base stimulus, subjects perceived [sa] in both cases. But, on the nonspeech side (shaded bar), the same difference was detected; here the absence of silence in the base made no difference. The other critical comparison is seen in the bars immediately to the right, in the middle third of the figure, representing the condition that had, in the one ear, silence in one base stimulus but not the other, and, in the other ear, two transition cues that were the same. On the speech side of the duplex percept, we see that the patterns were perceived as very different, even though the transition cues were the same; presumably, this was because one percept, being influenced by the presence of silence, included a stop consonant, while the other, being influenced by the absence of silence, did not. The result on the nonspeech side stands in contrast. There, the percepts were judged to be not very different, accurately reflecting the fact

that they were, in fact, not different.

Thus, in both critical comparisons, silence affected discrimination of the transitions only on the speech side of the duplex percept. Apparently, its importance depends on distinctively phonetic processes, and its integration with the transitions occurs in the phonetic mode.

The integration of silence and transitions, as in the patterns just described, reinforces the suggestion, made earlier in regard to the integration of the transitions alone, that the perceived object is not to be found in the movements of the speech organs at the periphery, but rather at some still more distal remove, as suggested by Repp, Liberman, Eccardt, and Pesetsky (1978). To see the point more clearly, we should first take note of a finding that adds another cue for the [p] in [spa]: the shaping of the fricative noise that is caused by the way the vocal tract closes for [p] (Summerfield, Bailey, Seton, & Dorman, 1981). Now we have three acoustic cues that correspond neatly to three corresponding aspects of the articulation. There is, first, the shape of the fricative noise, which signals the closing of the tract; then the silence, which signals the closure itself; and finally the formant transitions, which signal the subsequent opening into the vowel. If these three acoustic cues are integrated into a percept that does not display at least three constituent elements, then the perceived object must be upstream from the peripheral articulation. A likely candidate, as suggested earlier, is the unitary command structure from which the various movements at the periphery unfolded.

Integration of periodic sound and noise. When a talker closes his vocal tract to produce a stop consonant and then opens it into a following vowel, the resulting silence and formant transitions are, as we have seen, integrated into a stop consonant. It is surely provocative that similar formant transitions are produced, but without the silence, when a talker almost closes his vocal tract so as to make the noise of a fricative (e.g., [s]), and then opens into the vowel, for in such cases the formant transitions do not support stops; they are, instead, integrated with the noise into the perception of a fricative (Harris, 1958; Mann & Repp, 1980; Whalen, 1981). Such integration is shown in Figure 7, where I have reproduced the results of a recent experiment by Repp (in press).


Figure 7. Identification functions for a [š]-[s] noise continuum when connected to [š]-appropriate or [s]-appropriate transitions and the vowels [a] or [u]. (From "Two strategies in fricative discrimination," by B. H. Repp, Perception & Psychophysics, in press. Copyright by the Psychonomic Society, Inc. Reprinted by permission.)


What we see in the figure are the judgments [š] or [s] made to stimuli that were constructed as follows. The experimental variable, ranged on the abscissa, was the position on the frequency scale of a patch of band-limited noise as it moved between a place appropriate for [š] and one appropriate for [s]. The parameters were the nature of the (following) formant transitions--appropriate, in the one case, for [s] and, in the other, for [š]--and the two vowels [a] and [u]. We see that the transitions (and also the vowels) affected the perception of the fricative.

Though not shown in this particular experiment, I would note, parenthetically, that patterns like these, but with 50 msec of silence inserted between the fricative noise and the vocalic section, will be perceived, not as fricative-vowel syllables, but as fricative-stop-vowel syllables (Mann & Repp, 1980). That is, inserting 50 msec of silence will cause the formant transitions to be integrated, not into fricatives, but into stops. It is difficult to account for that as an auditory effect, but easy to see how it might reflect a special sensitivity to information about a difference in articulation that changes the phonetic "affiliation" of the acoustic transitions.

In a further, and more severe, test of the integration of transitions and fricative noise that we saw in Figure 7, Repp measured the effect of the formant transitions on the way listeners discriminated variations in the frequency position of the noise patch, using for this purpose the highly sensitive method of "fixed standard." He found two distinctly different types of discrimination functions. One clearly showed an effect of the formant transitions and reflected nearly categorical perception; the other just as clearly showed no effect of the formant transitions and represented perception that was nearly continuous. Which type Repp obtained in each particular case depended, apparently, on the listener's ability to isolate or "stream" the noise--that is, to create an effect similar, perhaps, to the one obtained by Cole and Scott (1973) when they found with fricative-vowel syllables that, as a result of repeated presentation, the noise and vocalic sections would form separate "streams" that had little apparent relation to each other. At all events, we have here another instance, though occurring in a different phonetic class and obtained by very different methods, of a single acoustic pattern that is perceived in two distinctly different ways. One reflects the integration of cues in the phonetic mode, the other the "nonintegration" of the same acoustic elements in the auditory mode.

There is still another method that exploits the possibility of perceiving exactly the same stimulus pattern in two ways, and thus enables us to test yet again whether the integration of formant transitions and noise occurs in the phonetic or auditory modes. Here, the two ways of perceiving are not speech versus nonspeech, as in the experiments described thus far, but rather two kinds of speech--namely, fricatives and stops. The relevant experiment is a recent one by Carden, Levitt, Jusczyk, and Walley (1981). Starting with synthetic patterns that produced stop-vowel syllables, they varied the second-formant transitions and found the boundary between [b] and [d]. Then they placed in front of these patterns a fixed patch of band-limited noise, neutralized as between the fricatives [f] and [θ]. In these patterns, the formant transitions cue the difference between the fricatives, but, because the place of vocal-tract constriction is different for the two fricatives, on the one hand, and the two stops, on the other, the perceptual boundary on the continuum of formant transitions is now displaced.


Figure 8. Schematic representation of the stimulus patterns used to evaluate the perceptual equivalence of silence and formant transitions. (From "Perceptual equivalence of two acoustic cues for stop-consonant manner," by H. L. Fitch, T. Halwes, D. M. Erickson, and A. M. Liberman, Perception & Psychophysics, 1980, 27, 343-350. Copyright 1980 by the Psychonomic Society, Inc. Reprinted by permission.)

That is, exactly the same formant transitions distinguish the fricatives differently from the way they distinguish the stops. The effect seems most plausibly to be phonetic, reflecting the listener's "knowledge," as it were, of the difference in articulatory place of production between the stops, [b] and [d], on the one hand, and the fricatives, [f] and [θ], on the other. But, just to make sure, Carden and his collaborators presented the patterns with the noise patch to one group of subjects and boldly asked them to perceive stops; then, in precisely reverse fashion, they presented the patterns without the noise patch to a second group with instructions to perceive fricatives. The listeners reported boundaries on the continuum of transitions that were appropriate to the class of phonetic segments ([b] vs. [d] or [f] vs. [θ]) they were asked to hear. Thus, exactly the same acoustic patterns yielded different boundaries on the continuum of transitions, depending on whether the listeners were perceiving the patterns as stops or as fricatives. Discrimination functions were also obtained, and these confirmed the boundary shift. We see, then, that transition cues like those that integrate with silence to produce a stop consonant will integrate with noise to produce a fricative. In both cases, the integration is in the phonetic mode.

The equivalence of sound and silence when integrated. Implicit in the discussion so far is the assumption that when acoustic cues integrate to form a phonetic percept, they are, for that purpose, perceptually equivalent; otherwise, it would make no sense to speak of the percept as unitary. It is not implied that the cues are necessarily of equal importance or power, only that their separate contributions are not preserved as separate. But even that implication is of interest from a theoretical point of view, because the cues are often very different acoustically, having in common only that they are the common products of the same linguistically significant gesture. Hence their equivalence is to be attributed, most reasonably, to the link between perception and production that presumably characterizes phonetic processes.

But the implied equivalence of diverse cues is so far just that--implied. To test the equivalence more directly was the purpose of several experiments. One of these, by Fitch, Halwes, Erickson, and Liberman (1980), was designed to examine the equivalence of silence and formant transitions in perception of the stop consonant in split as opposed to its absence in slit. Synthetic patterns like those shown in Figure 8 were used. The variable was the duration of silence between the fricative noise and the vocalic portion of the syllable; the parameter of the experiment was the nature of the formant transitions at the start of the vocalic section, set so as to bias that section toward [lit], in the one case, and toward [plit] in the other. When stimuli that had been constructed in this way were presented for identification as slit or split, the results shown in Figure 9 were obtained. One sees there a trading relation not different in principle from those found by other investigators with other cues. (For a review, see, again, Repp, 1981.) The displacement of the two response functions indicates that, for the purpose of producing the [p] in split, about twenty msec of silence is equal to appropriate formant transitions.4 Thus, silence is equivalent to sound, but only, I should think, when both are produced as parts of the same phonetic act.
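The twenty-msec figure is, in effect, the horizontal displacement between the two identification functions at their 50 percent points. As a purely illustrative sketch (the data points and parameter values below are invented, not those of Fitch et al.), one could estimate such a trade by fitting logistic functions to the two labeling curves and comparing their boundaries:

    # Hypothetical sketch: estimating a silence/transition trading relation by
    # fitting logistic identification functions and comparing their 50% boundaries.
    # The data points below are invented for illustration only.
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(x, x0, k):
        return 1.0 / (1.0 + np.exp(-k * (x - x0)))

    silence_ms = np.array([16, 32, 48, 64, 80, 96, 112, 128])          # silent-interval continuum
    p_split_lit = np.array([.02, .05, .15, .45, .80, .95, .98, .99])   # [lit]-biased transitions
    p_split_plit = np.array([.05, .20, .55, .85, .96, .99, .99, 1.0])  # [plit]-biased transitions

    (b_lit, _), _ = curve_fit(logistic, silence_ms, p_split_lit, p0=[64, 0.1])
    (b_plit, _), _ = curve_fit(logistic, silence_ms, p_split_plit, p0=[64, 0.1])
    print(f"boundary shift ~ {b_lit - b_plit:.1f} msec of silence traded for the transition cue")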


Figure 9. Effect of silent interval on perception of /slit/ vs. /split/ for the two settings of the transition cue. (From "Perceptual equivalence of two acoustic cues for stop-consonant manner," by H. L. Fitch, T. Halwes, D. M. Erickson, and A. M. Liberman, Perception & Psychophysics, 1980, 27, 343-350. Copyright 1980 by the Psychonomic Society, Inc. Reprinted by permission.)

Of course, it might be argued that the splits produced by the two different combinations of silence and sound were not really equivalent, but the forced-choice identification procedure, permitting only the responses slit or split, gave the subjects no opportunity to say so. Against that possibility, we carried out another experiment, designed to determine how well the subjects could discriminate selected combinations of the stimuli on any basis whatsoever. The rationale for selection of stimuli was as follows. If the two cues, silence and sound, are truly equivalent in phonetic perception, their perceptual effects should be algebraically additive, as it were. Thus, given two synthetic syllables to be discriminated, and given a baseline level of discriminability determined for pairs of stimuli that differ in only one of the cues, it should be possible to add the second cue so as to increase or decrease discriminability, depending on whether the phonetic "polarity" of the two cues causes their effects to work together or at cross purposes. The cues should "summate," or "cooperate," when they are biased in the same phonetic direction--as when one of the syllables to be discriminated combines a silence cue that is longer by the amount of the "trade" with transition cues of the [plit] type, and the other syllable combines a silence cue that is shorter by the amount of the "trade" with transition cues of the [lit] type. They should "cancel" each other, or "conflict," when the opposite pairing is made--that is, when the longer silence cue is combined with transition cues of the [lit] type, and the shorter silence cue with transition cues of the [plit] type. Pairs of stimuli meeting those specifications, and sampling the continuum of silence durations, were presented for forced-choice discrimination. As shown in Figure 10, discrimination of patterns differing by both cues was, in fact, either better or worse than discrimination of patterns that differed by only one, depending on whether the cues were calculated to "cooperate" or to "conflict." Apparently, the effects of the two cues did converge on a single perceptual object. By this test, then, the cues may be said to be equivalent and the percept may be said to be truly unitary.
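The additivity argument can be pictured with a toy model in which each cue contributes, with some weight, to a single decision variable; pairs whose cue differences have the same phonetic polarity are then predicted to be more discriminable than pairs differing by one cue, and pairs with opposed polarities less so. The weights and stimulus values below are invented for illustration and are not estimates from the experiment.

    # Hypothetical sketch of the additivity argument: each cue pushes a single
    # decision variable toward "split" (+) or "slit" (-); weights are invented.
    def decision_value(silence_ms, transition_bias):
        # transition_bias: +1 for [plit]-type transitions, -1 for [lit]-type
        w_silence, w_transition = 0.05, 0.4      # arbitrary illustrative weights
        return w_silence * silence_ms + w_transition * transition_bias

    def predicted_discriminability(stim1, stim2):
        # larger separation on the common decision axis -> easier discrimination
        return abs(decision_value(*stim1) - decision_value(*stim2))

    one_cue     = ((60, -1), (80, -1))   # differ in silence only
    cooperating = ((60, -1), (80, +1))   # longer silence paired with [plit] transitions
    conflicting = ((60, +1), (80, -1))   # longer silence paired with [lit] transitions

    for name, pair in [("one cue", one_cue), ("cooperating", cooperating), ("conflicting", conflicting)]:
        print(name, predicted_discriminability(*pair))
    # cooperating > one cue > conflicting, mirroring the pattern in Figure 10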

That the equivalence of silence and sound in the above example is owing to phonetic processes is supported in an experiment by Best, Morrongiello, and Robson (1981). Indeed, it is supported there more strongly than in the experiment just described, because Best and her collaborators found that the equivalence was manifest only when the stimulus patterns were perceived as speech. As a first step, they performed an experiment very similar to the one by Fitch et al., except that the stimuli were [say]-[stay] instead of [slit]-[split] and the transition-cue parameter was simply the frequency at which the first formant started. With these stimuli, they obtained the identification functions shown in Figure 11. We see there almost exactly the same kind of trading relation between silence and formant transition that had been found in the earlier experiment. In the manner of Fitch et al., they also tested discrimination, finding, just as Fitch et al. had, that the two cues could be made to cooperate or to conflict depending on their phonetic polarities. But now they performed an experiment that proved to be particularly revealing. Borrowing a procedure that had been used successfully for a similar purpose (Lane & Schneider, Note 3; Bailey, Summerfield, & Dorman, 1977; Dorman, 1979)

and more recently made the object of further attention (Remez, Rubin, Pisoni, & Carrell, 1981), they replaced the formants of the vocalic portion of the syllable with sine waves, taking care that the sine waves followed exactly the course of the formants they replaced. The sounds that result are perceived by most people, at least initially, as nonspeech patterns of noises and tones.
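The replacement can be sketched, in outline, as the summation of a few sinusoids whose frequencies follow the formant tracks. The trajectories, duration, and sampling rate below are invented for illustration and make no claim about the stimuli Best and her colleagues actually synthesized.

    # Hypothetical sketch: a sine-wave "analogue" built from assumed formant tracks.
    # The formant trajectories and durations here are invented for illustration.
    import numpy as np

    def sine_wave_analogue(formant_tracks_hz, duration_s=0.3, sr=10000):
        """Sum one sinusoid per formant, each following its time-varying frequency."""
        t = np.arange(int(duration_s * sr)) / sr
        signal = np.zeros_like(t)
        for track in formant_tracks_hz:
            freqs = np.interp(t, np.linspace(0, duration_s, len(track)), track)
            phase = 2 * np.pi * np.cumsum(freqs) / sr     # integrate frequency to get phase
            signal += np.sin(phase)
        return signal / len(formant_tracks_hz)

    # e.g., rising F1, falling F2, flat F3 (all values are made up)
    tone_pattern = sine_wave_analogue([[300, 700], [1800, 1200], [2500, 2500]])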


Figure 10. Percent correct discrimination for pairs of stimuli that differ by one cue or by two cues of the same (cooperating cues) or opposite (conflicting cues) phonetic polarities. (From "Perceptual equivalence of two acoustic cues for stop-consonant manner," by H. L. Fitch, T. Halwes, D. M. Erickson, and A. M. Liberman, Perception & Psychophysics, 1980, 27, 343-350. Copyright 1980 by the Psychonomic Society, Inc. Reprinted by permission.)

Figure 11. Effect of silent interval on perception of /say/ vs. /stay/ for the two settings of the transition cue. (From "Perceptual equivalence of acoustic cues in speech and nonspeech perception," by C. T. Best, B. Morrongiello, and R. Robson, Perception & Psychophysics, 1981, 29, 191-211. Copyright 1981 by the Psychonomic Society, Inc. Reprinted by permission.)


Figure 12. Effect of silent interval on "identification" of sine-wave analogues of say-stay stimuli. Graph A is for those subjects who perceived these stimuli as speech ("say-stay" listeners). Graphs B and C are for those who perceived them as nonspeech, divided, according to their reports of what the sounds were like, into those who were apparently attending to the transition cue (Graph B, "spectral" listeners) or, alternatively, the silence cue (Graph C, "temporal" listeners). (From "Perceptual equivalence of acoustic cues in speech and nonspeech perception," by C. T. Best, B. Morrongiello, and R. Robson, Perception & Psychophysics, 1981, 29, 191-211. Copyright 1981 by the Psychonomic Society, Inc. Reprinted by permission.)

But some spontaneously perceive them as speech, and others perceive them so after it has been suggested to them that they might. It is possible, thus, to obtain identification and discrimination functions for the same stimuli when, in the one case, they are perceived as speech and when, in the other, they are not. (When perceived as nonspeech the patterns are, of course, not readily identifiable, but identification functions can be obtained by presenting, on each trial, the target stimulus--that is, the stimulus to be identified--together with the two stimuli at the extremes of the continuum, and then asking the subject to say whether the target stimulus is more like one or the other of the extremes. To insure comparability, the same procedure is used when the subjects are perceiving the stimuli as speech.) The results are shown in Figure 12. We see, in Figure 12a, that when the subjects were perceiving the patterns as speech ("say-stay" listeners), their identification functions exhibited the now familiar trading relation. But when the same stimuli were perceived as nonspeech, then, as shown in Figures 12b and 12c, two quite different patterns emerged, depending on whether, as inferred from the subjects' descriptions of the sound, they were attending to the transition cue ("spectral" listeners) or the silence cue ("temporal" listeners). It is, of course, precisely because the subjects could not integrate the cues in the nonspeech percept that they chose, as it were, between the one cue and the other. In any case, both of the identification functions in the nonspeech case are different from the one that characterizes the response to exactly the same stimuli when they were perceived as speech. (Discrimination functions obtained with the same stimuli were also different depending on whether or not the stimuli were perceived as speech, nicely confirming the result obtained with the identification measure.) Thus, with yet another method for obtaining speech and nonspeech percepts from the same stimulus, we again find evidence supporting the existence of a phonetic mode, and we see that the equivalence of integrated cues is to be attributed to the distinctively phonetic processes it incorporates.

The equivalence of sound and sight when integrated. Perhaps the most unusual evidence relevant to the issue I have been discussing comes from a startling discovery by McGurk and MacDonald (1976) about the influence on speech perception of optical information about the talker's articulation. (See also MacDonald & McGurk, 1978; Summerfield, 1979.) When subjects view a film of a talker saying one syllable, while a recorded voice says another, then, under certain conditions, they experience a unitary percept that overrides the conflicting optical and acoustic cues. Thus, for example, when the talker articulated [ga] or [da] and the voice said [ba], most subjects perceived [da]. In that case, the effect of the optical stimulus was, at the very least, to determine place of production. When, in a subsequent experiment by McGurk and Buchanan (Note 4), the talker was seen to produce the syllables [ba], [va], [ða], [da], [ʒa], [ga], and [ha], while the recorded voice said [ba] over and over again, most subjects perceived [ba], [va], [ða], [da], and then, for the remaining visual syllables, a variety of percepts other than [ba]. Here, both place of articulation and manner of articulation were determined by the optical input. (The difficulty of seeing farther back in the vocal tract than [da] accounts, presumably, for the fact that visual [ʒa], [ga], and [ha] were perceived as having generally more forward places of production.)

Having witnessed a demonstration of the McGurk-MacDonald effect, I take the liberty of offering testimony of my own. I found the effect compelling,

but, more to the point, I would agree that McGurk and Buchanan (Note 4) have captured my experience when they say, "...the majority of listeners have no awareness of bimodal conflict...," and then describe the percept as "unified." Surely, my percept was unified in the important sense that I could not have decided by introspective analysis that part was visual in origin and part auditory. Even in those cases in which, given conflicting optical and acoustic cues, I experienced two syllables, there was nothing about their quality that would have permitted me to know which I had seen and which I had heard.

By way of interpretation, MacDonald and McGurk (1978) indicate that their results bespeak a connection between perception and production, and McGurk and Buchanan (Note 4) echo a comment by Summerfield (1979), who observed, after having himself performed several experiments on the phenomenon, that the optical and acoustic signals are picked up in a "common metric of articulatory dynamics." I would agree, though I would, of course, prefer to call the common metric "phonetic." But a mode by any other name would bear as weightily on the issue I have put before you, for the important consideration here is that, in any ordinary sense of modality, the unified percept is neither visual nor auditory; it is, rather, something else.

Integration into ordered strings. Having so far considered only the perception of individual phonetic segments, we should put some attention on the fact that phonetic segments are normally perceived in ordered strings. This wants explicit treatment if only because, as the reader may recall, a characteristic of the speech code is that several phonetic segments are conveyed simultaneously by a single segment of sound. As the reader may also recall, it is just this characteristic of the code that enables the listener to evade the limitation imposed by the temporal resolving power of the ear. The further consequence for perception, which we will consider now, is that the listener cannot perceive phonetic segment by phonetic segment in left-to-right (or right-to-left) fashion; rather, he must take account of the entire stretch of sound over which the information is distributed. Such an acoustic stretch typically signals a phonetic structure that comprises several segments. I will offer only a brief example, taken from a recent study by Repp et al. (1978), and chosen because the relevant effect happens to cross a word boundary.

The experiment dealt with the effect of two cues, silence and noise duration, on perception of the locutions gray ship, gray chip, great ship, and great chip. In Figure 13 is a spectrogram of the words gray ship, with which the experiment began. The variable, shown in the figure, was the duration of silence between the two words. Given the results of previous research, we knew that increasing the silence would bias away from the fricative in ship and toward the affricate (stop-initiated fricative) in chip (Dorman, Raphael, & Isenberg, 1980; Dorman et al., 1979). The parameter, also shown in the figure, was the duration of the fricative noise, known from previous research to be a cue for the same distinction: increases in duration of the noise bias toward fricative and away from affricate (Gerstman, 1957; Dorman et al., 1979).

In Figure 14 are the results. We see in the graph at the upper left that when the noise duration was relatively short (62 msec), increasing the duration of the silence caused the percept to change from ship to chip.


Figure 13. Spectrogram of the words "gray ship," showing the fricative noise and the silent interval.


Figure 14. The effect of duration of silence, at each of four durations of fricative noise, on the perception and placement of stop (or affricate) manner. (From "Perceptual integration of acoustic cues for stop, fricative, and affricate manner," by B. H. Repp, A. M. Liberman, T. Eccardt, and D. Pesetsky, Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 621-637. Copyright 1978 by the American Psychological Association, Inc. Reprinted by permission.)


Thus, the effect of silence was to produce a stop-like consonant to its "right," just as it had done in the cases of slit-split and [sa]-[spa]-[sta] that were dealt with earlier. But, as shown in the graph at the lower right, when the duration of the fricative noise was relatively long (182 msec), increases in the duration of the silence caused the perception to change, not to gray chip, as before, but to great ship. That is, increasing the duration of the fricative noise in ship put a stop consonant at the end of the preceding word. The effect is superficially "right to left." But, of course, the effect is in neither direction; it is more properly regarded as a matter of apprehending a structure.
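To make the joint dependence concrete, a toy classifier of the sort sketched below maps the two cues onto the possible phrases; the threshold values are invented for illustration and are not the values measured by Repp et al.

    # Hypothetical sketch of the two-cue -> phonetic-structure mapping; thresholds
    # are invented for illustration, not taken from the experiment.
    def parse_phrase(silence_ms, noise_ms, silence_crit=50, noise_crit=120):
        long_silence = silence_ms > silence_crit
        long_noise = noise_ms > noise_crit
        if not long_silence:
            return "gray ship"                      # no stop manner anywhere
        return "great ship" if long_noise else "gray chip"
        # (with intermediate settings of both cues, listeners may also report "great chip")

    print(parse_phrase(silence_ms=70, noise_ms=62))    # -> "gray chip"
    print(parse_phrase(silence_ms=70, noise_ms=182))   # -> "great ship"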

Given, then, that the listener must recover several phonetic segments from the same span of sound, we ask three questions about the underlying process. First, how does the listener delimit the acoustic span? That is, how does he know when all the information that is to be provided has been provided? There is, after all, no acoustic signal that regularly marks the information boundary. Second, how does the listener store the information as it accumulates? And, third, what does he do while he waits? Does he simply resonate, as it were, or does he entertain hypotheses? If the latter, does he entertain all possible hypotheses? Does he weight them according to the likelihood they are correct? And how quickly does he abandon them as they are proved wrong?

If these questions seem familiar to students of sentence perception, it is, I think, because processes in the phonetic and syntactic domains do have something in common. In both cases, information is distributed in distinctively linguistic ways through the signal. As a consequence, the perceiver must recover distinctively linguistic structures. To that extent, the resemblance between processing in the two domains is not superficial. Nor is it, if we take the vertical view of language I earlier espoused, altogether surprising.

Afterwords, Omissions, and Prospects

Having set out years ago to study communication by acoustic alphabets, we might still be so occupied. For acoustic alphabets can be used for communication--witness Morse code--and there are innumerable experiments we could have done had we gone on trying to find the alphabet that works best. But it is not likely, as a practical matter, that we would ever have made a large improvement. Nor is it likely, from a scientific point of view, that we would ever have learned anything interesting. Acoustic alphabets cannot become part of a coherent process; I suspect, therefore, that there is nothing interesting to be learned.

But speech was always before us, proof that there is a better way. Inevitably, then, we put our attention there and, in so doing, began to bark up the right tree. It remained only to find that speech and language require to be understood in their own terms, not by reference to diverse processes of a horizontal sort. But once the vertical view is adopted, there is little doubt about what we must try to understand.


There is also little doubt, at any stage of the research on speech, about how much or how little we do understand, because there is a standard by which progress can be measured; we are not in the position of explaining behavior that we have ourselves contrived. Thus, to test what we think we know of the relation between phonetic structure and sound, we have only to see how that knowledge fares when used as a basis for synthesis. In fact, it does well enough to enable us to synthesize reasonably intelligible speech, which suggests that we do know something (Liberman, Ingemann, Lisker, Delattre, & Cooper, 1959; Klatt, 1980; Mattingly, 1980). But the speech is not nearly so good as the real thing--which proves, as if proof were needed, that we have something still to learn. Perhaps what we must learn most generally is to accept the hypothesis, alluded to earlier in the paper, that human listeners are sensitive to all the phonetically relevant information in the speech signal. If that hypothesis is true, and if the acoustic cues that convey the information are as numerous, various, and intertwined as we now believe them to be, then we should act on our assumption that the key to the phonetic code is in the manner of its production. That requires taking account of all we can learn about the organization and control of articulatory movements. It also requires trying, by direct experiment, to find the perceptual consequences (for the listener) of various articulatory maneuvers (by the speaker). To do that we must, of course, press forward with the development of a research synthesizer designed to operate from articulatory, rather than acoustic, controls (Mermelstein, 1973; Rubin, Baer, & Mermelstein, 1981; Abramson, Nye, Henderson, & Marshall, 1981). The perfection of such a device, itself an achievement of some scientific consequence, will enable us to find a more accurate, elegant, and useful characterization of the informational basis for speech perception.

It will not have escaped notice that the claim to understanding I have made is, in any case, a modest one. At most, we presume to know something about what phonetic processes do, and in what ways they are distinctive and coherent. As for mechanism, however, there is only the assumed link between perception and production, and even there we have no certain, or even clear, idea how such a link might be effected. If we knew more about mechanism, we would presumably be in a better position to design automatic speech recognizers of a nontrivial sort (Levinson & Liberman, 1981). At present, however, we can only claim to understand where the difficulties lie. That is an important step, to be sure, but it is only the first one, and it will almost surely prove to be the easiest.

Since I have taken the position that speech perception depends on biologically specialized processes, I should, at last, acknowledge that neurological and developmental studies are relevant. For if phonetic processes are distinctive and coherent from a perceptual point of view, we reasonably expect that they are so from a neurological point of view as well. We do, then, look to neuropsychological data to provide further tests of our hypotheses, to refine our characterizations, and, indeed, to supply new insights into the processes themselves. As for the biology of the matter, we must rely heavily, of course, on developmental studies of speech perception, especially when these include very young infants and comparisons across languages. Such studies enlighten us about what might have developed by evolution in the history of the race, and what remains to develop, presumably by epigenesis, in the history of the individual. Of course, neither the

neuropsychological nor the developmental studies will be useful unless we ask the right questions. But I believe we are learning how to do precisely that.

REFERENCE NOTES

1. Fodor, J. A. The modularity of mind. Unpublished manuscript, MIT, 1981.
2. Mann, V. A., Madden, J., Russell, J. N., & Liberman, A. M. Integration of time-varying cues and the effects of phonetic context. Unpublished manuscript, Haskins Laboratories, 1981.
3. Lane, H. L., & Schneider, B. A. Discriminative control of concurrent responses by the intensity, duration and relative onset time of auditory stimuli. Unpublished report, Behavior Analysis Laboratory, University of Michigan, 1963.
4. McGurk, H., & Buchanan, L. Bimodal speech perception: Vision and hearing. Unpublished manuscript, Department of Psychology, University of Surrey, 1981.

REFERENCES

Abramson, A. S., Nye, P. W., Henderson, J. B., & Marshall, C. W. Vowel height and the perception of consonantal nasality. Journal of the Acoustical Society of America, 1981, 70, 329-339.
Bailey, P. J., & Summerfield, Q. Information in speech: Observations on the perception of [s]-stop clusters. Journal of Experimental Psychology: Human Perception and Performance, 1980, 6, 536-563.
Bailey, P. J., Summerfield, Q., & Dorman, M. On the identification of sine-wave analogues of certain speech sounds. Haskins Laboratories Status Report on Speech Research, 1977, SR-51/52, 1-25.
Bastian, J., Delattre, P., & Liberman, A. M. Silent interval as a cue for the distinction between stops and semivowels in medial position. Journal of the Acoustical Society of America, 1959, 31, 1568. (Abstract)
Best, C. T., Morrongiello, B., & Robson, R. Perceptual equivalence of acoustic cues in speech and nonspeech perception. Perception & Psychophysics, 1981, 29, 191-211.
Carden, G., Levitt, A., Jusczyk, P. W., & Walley, A. Evidence for phonetic processing of cues to place of articulation: Perceived manner affects perceived place. Perception & Psychophysics, 1981, 29, 26-36.
Cole, R. A., & Scott, B. Perception of temporal order in speech: The role of vowel transitions. Canadian Journal of Psychology, 1973, 27, 441-449.
Cooper, F. S. Research machines for the blind. In P. A. Zahl (Ed.), Blindness: Modern approaches to the unseen environment. Princeton: Princeton University Press, 1950, 512-543.
Cooper, F. S., Delattre, P. C., Liberman, A. M., Borst, J. M., & Gerstman, L. J. Some experiments on the perception of synthetic speech sounds. Journal of the Acoustical Society of America, 1952, 24, 597-606.
Cooper, F. S., Liberman, A. M., & Borst, J. M. The interconversion of audible and visible patterns as a basis for research on the perception of speech. Proceedings of the National Academy of Sciences, 1951, 37, 318-327.
Dorman, M. F. On the identification of sine-wave analogues of CV syllables. In E. Fischer-Jørgensen, J. Rischel, & N. Thorsen (Eds.), Proceedings of the Ninth International Congress of Phonetic Sciences (Vol. II). Copenhagen: University of Copenhagen, 1979, 453-460.
Dorman, M. F., Raphael, L. J., & Isenberg, D. Acoustic cues for fricative-affricate contrast in word-final position. Journal of Phonetics, 1980, 8, 391-405.
Dorman, M. F., Raphael, L. J., & Liberman, A. M. Some experiments on the sound of silence in phonetic perception. Journal of the Acoustical Society of America, 1979, 65, 1518-1532.
Fant, C. G. M. Descriptive analysis of the acoustic aspects of speech. Logos, 1962, 5, 3-17.
Fitch, H. L., Halwes, T., Erickson, D. M., & Liberman, A. M. Perceptual equivalence of two acoustic cues for stop-consonant manner. Perception & Psychophysics, 1980, 27, 343-350.
Gerstman, L. J. Perceptual dimensions for the friction portions of certain speech sounds. Unpublished doctoral dissertation, New York University, 1957.
Harris, K. S. Cues for the discrimination of American English fricatives in spoken syllables. Language and Speech, 1958, 1, 1-7.
Harris, K. S., Hoffman, H. S., Liberman, A. M., & Delattre, P. C. Effect of third-formant transitions on the perception of the voiced consonants. Journal of the Acoustical Society of America, 1958, 30, 122-126.
Isenberg, D., & Liberman, A. M. Speech and non-speech percepts from the same sound. Journal of the Acoustical Society of America, 1978, 64 (Suppl. 1), S20. (Abstract)
Klatt, D. H. Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 1980, 67, 971-995.
Levinson, S. E., & Liberman, M. Y. Speech recognition by computer. Scientific American, 1981, 244, 64-76.
Liberman, A. M. The grammars of speech and language. Cognitive Psychology, 1970, 1, 301-323.
Liberman, A. M. The specialization of the language hemisphere. In F. O. Schmitt & F. G. Worden (Eds.), The neurosciences: Third study program. Cambridge, Mass.: MIT Press, 1974, 43-56.
Liberman, A. M. Duplex perception and integration of cues: Evidence that speech is different from nonspeech and similar to language. In E. Fischer-Jørgensen, J. Rischel, & N. Thorsen (Eds.), Proceedings of the Ninth International Congress of Phonetic Sciences (Vol. II). Copenhagen: University of Copenhagen, 1979.
Liberman, A. M., & Cooper, F. S. In search of the acoustic cues. In A. Valdman (Ed.), Papers in phonetics and linguistics to the memory of Pierre Delattre. The Hague: Mouton, 1972, 329-338.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. Perception of the speech code. Psychological Review, 1967, 74, 431-461.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. The role of selected stimulus variables in the perception of the unvoiced stop consonants. American Journal of Psychology, 1952, 65, 497-516.
Liberman, A. M., Delattre, P. C., Cooper, F. S., & Gerstman, L. J. The role of consonant-vowel transitions in the perception of the stop and nasal consonants. Psychological Monographs, 1954, 68, 1-13.
Liberman, A. M., Ingemann, F., Lisker, L., Delattre, P. C., & Cooper, F. S. Minimal rules for synthesizing speech. Journal of the Acoustical Society of America, 1959, 31, 1490-1499.
Liberman, A. M., Isenberg, D., & Rakerd, B. Duplex perception of cues for stop consonants: Evidence for a phonetic mode. Perception & Psychophysics, in press.
Liberman, A. M., & Studdert-Kennedy, M. Phonetic perception. In R. Held, H. W. Leibowitz, & H.-L. Teuber (Eds.), Handbook of sensory physiology, Vol. VIII: Perception. New York: Springer-Verlag, 1978.
Lisker, L. Rapid vs. rabid: A catalogue of acoustic features that may cue the distinction. Haskins Laboratories Status Report on Speech Research, 1978, SR-54, 127-132.
MacDonald, J., & McGurk, H. Visual influences on speech perception processes. Perception & Psychophysics, 1978, 24, 253-257.
Mann, V. A. Influence of preceding liquid on stop-consonant perception. Perception & Psychophysics, 1980, 28, 407-412.
Mann, V. A., Madden, J., Russell, J. N., & Liberman, A. M. Further investigation into the influence of preceding liquids on stop consonant perception. Journal of the Acoustical Society of America, 1981, 69 (Suppl. 1). (Abstract)
Mann, V. A., & Repp, B. H. Influence of vocalic context on perception of the [š]-[s] distinction. Perception & Psychophysics, 1980, 28, 213-228.
Mann, V. A., & Repp, B. H. Influence of preceding fricative on stop consonant perception. Journal of the Acoustical Society of America, 1981, 69, 548-558.
Mattingly, I. G. Phonetic representation and speech synthesis by rule. Haskins Laboratories Status Report on Speech Research, 1980, SR-61, 15-21.
Mattingly, I. G., & Liberman, A. M. The speech code and the physiology of language. In K. N. Leibovic (Ed.), Information processing in the nervous system. New York: Springer-Verlag, 1969, 97-114.
Mattingly, I. G., Liberman, A. M., Syrdal, A. M., & Halwes, T. Discrimination in speech and nonspeech modes. Cognitive Psychology, 1971, 2, 131-157.
McGurk, H., & MacDonald, J. Hearing lips and seeing voices. Nature, 1976, 264, 746-748.
Mermelstein, P. Articulatory model for the study of speech production. Journal of the Acoustical Society of America, 1973, 53, 1070-1082.
Nye, P. W. Psychological factors limiting the rate of acceptance of audio stimuli. In L. L. Clark (Ed.), Proceedings of the International Congress on Technology and Blindness. New York: American Foundation for the Blind, 1963, 99-109.
O'Connor, J. D., Gerstman, L. J., Liberman, A. M., Delattre, P. C., & Cooper, F. S. Acoustic cues for the perception of initial /w, j, r, l/ in English. Word, 1957, 13, 24-43.
Oden, G. C., & Massaro, D. W. Integration of featural information in speech perception. Psychological Review, 1978, 85, 172-191.
Rand, T. C. Dichotic release from masking for speech. Journal of the Acoustical Society of America, 1974, 55, 678-680.
Raphael, L. J., Dorman, M. F., & Liberman, A. M. Some ecological constraints on the perception of stops and affricates. Journal of the Acoustical Society of America, 1976, 60 (Suppl. 1), S25. (Abstract)
Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. Speech perception without traditional speech cues. Science, 1981, 212, 947-950.
Repp, B. H. Perceptual integration and differentiation of spectral cues for intervocalic stop consonants. Perception & Psychophysics, 1978, 24, 471-485.
Repp, B. H. Accessing phonetic information during perceptual integration of temporally distributed cues. Journal of Phonetics, 1980, 8, 185-194.
Repp, B. H. Phonetic trading relationships and context effects: New experimental evidence for a speech mode of perception. Haskins Laboratories Status Report on Speech Research, 1981, SR-67/68, this volume.
Repp, B. H. Two strategies in fricative discrimination. Perception & Psychophysics, in press.
Repp, B. H., Liberman, A. M., Eccardt, T., & Pesetsky, D. Perceptual integration of acoustic cues for stop, fricative, and affricate manner. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 621-637.
Repp, B. H., & Mann, V. A. Perceptual assessment of fricative-stop coarticulation. Journal of the Acoustical Society of America, 1981, 69, 1154-1163.
Rubin, P., Baer, T., & Mermelstein, P. An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 1981, 70, 320-328.
Stevens, K. N., & Blumstein, S. E. The search for invariant acoustic correlates for phonetic features. In P. D. Eimas & J. L. Miller (Eds.), Perspectives on the study of speech. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1981.
Strange, W., & Jenkins, J. J. Identification of vowels in vowel-less syllables. Journal of the Acoustical Society of America, 1977, 61 (Suppl. 1), S19. (Abstract)
Studdert-Kennedy, M. Speech perception. Language and Speech, 1980, 23, 45-66.
Studdert-Kennedy, M., & Cooper, F. S. High-performance reading machines for the blind: Psychological problems, technological problems, and status. In R. Dutton (Ed.), Proceedings of the International Conference on Sensory Devices for the Blind. London: St. Dunstan's.
Summerfield, Q. Use of visual information for phonetic perception. Phonetica, 1979, 36, 314-331.
Summerfield, Q., Bailey, P. J., Seton, J., & Dorman, M. F. Fricative envelope parameters and silent intervals in distinguishing 'slit' and 'split.' Phonetica, 1981, 38, 181-192.
Whalen, D. H. Effects of vocalic formant transitions and vowel quality on the English [s]-[š] boundary. Journal of the Acoustical Society of America, 1981, 69, 275-282.

FOOTNOTES

1At one point we assumed that these principles were so general as to extend to perception in all modalities. Indeed, we carried out experiments designed to explore the possibility that patterns could be preserved across vision and audition provided the stimulus coordinates were properly transformed (Cooper, Liberman, & Borst, 1951).

2In contrast to the remarkable sensitivity of the phonetic mode to all aspects of the acoustic signal that do convey phonetic information, there is its equally remarkable insensitivity to those aspects of the signal that do not. Thus, as is well known from many years of research on synthetic speech, the phonetic component of the percept is usually unaffected by gross variations in those aspects of the signal--for example, bandwidth of the formants--that are beyond the control of the articulatory apparatus and hence necessarily irrelevant for all linguistic purposes (Liberman & Cooper, 1972; Remez et al., 1981). The only effect of such variations is to make the speech sound unnatural or, in the most extreme cases, to make it impossible for the


listener to perceive the sound as speech.

3When the chirps are discriminated in isolation--that is, not as part of the duplex percept--the function has the same shape, but the level is displaced about 15 percentage points higher. The difference in level is presumably owing to the distraction produced in the duplex condition by the other side of the percept.

4The existence of these trading relations means that the location of a phonetic boundary on an acoustic continuum is not fixed; within limits it will move as the settings of the several cues are changed. The boundary will also move, of course, as a function of phonetic context. (See the discussion, above, of the effect of preceding context on the [da]-[ga] boundary and also, for example, Mann and Repp, 1981; Repp and Mann, 1981.)


READING, PROSODY, AND ORTHOGRAPHY

Deborah Wilkenfeld+

INTRODUCTION

Phonetic recoding strategies in silent reading have been reported by a number of investigators employing a variety of experimental techniques (see the review by Conrad, 1972) and testing in several languages and orthographic systems (Tzeng, Hung, & Wang, 1977; Erickson, Mattingly, & Turvey, 1977; Navon & Shimron, 1981). While the presence of a phonetic representation in reading has been convincingly demonstrated, the source of the effect and the role of the representation remain largely unexplored. The obvious explanation--that the effect results from a process of grapheme-to-phoneme conversion--is falsified by evidence for phonetic recoding in reading nonalphabetic orthographies (Tzeng et al., 1977; Erickson et al., 1977).

One strategy that might prove fruitful in untangling these puzzles is to specify what linguistic properties are embodied in the phonetic representation constructed by fluent readers. The presence of segmental phonetic features has been firmly established by the studies cited above, but evidence for suprasegmental features, such as word stress and sentence prosody, has not heretofore been sought, though anecdotal subjective reports suggest that these features are also present. Kleiman (1975) demonstrated an important role for recoding in the comprehension of written sentences, and stress has been shown to play a role in the perception of spoken utterances. Evidence for suprasegmentals in the phonetic representation of written language--which itself marks only the grossest suprasegmental properties of sentences--would be tantalizing for a model of reading based on a strong dependency of reading on speech perception.

In a small pilot experiment using the response bias technique (Mehler & Carey, 1967), the study reported here sought evidence that subjects encode

word stress in silent reading at the level of the single word.

+University of Connecticut.
Acknowledgment. This work was supported by NICHD Training Grant HD 00321 to Ignatius G. Mattingly and Donald Shankweiler at the University of Connecticut. I am indebted to Joseph NOW for his help with the statistical analysis of data and for the computer program used in the experiment; to Lyn Frazier for her suggestions in the early development of the method employed; and to Janet Fodor and Ignatius Mattingly for their patient support and many helpful discussions throughout the project.

[HASKINS LABORATORIES: Status Report on Speech Research SR-67/68 (1981)]

STIMULI

Test items in this experiment were ten words chosen from among those English disyllable homographs whose syntactic class depends on the placement of primary stress. For example, content is a noun when the first syllable is stressed and an adjective (or reflexive verb, "to content oneself") when the second syllable is stressed. Similarly, permit is a noun when the first syllable is stressed and a verb when the second syllable is stressed. The orthography does not represent the location of word stress for these words; presumably, in normal circumstances, sentential context provides the necessary information for choosing in these few ambiguous cases.

Test stimuli were lists composed of eight unambiguously stressed disyllabic words and a ninth, final word taken from the set of homographs. All of the unambiguous words in a single list were matched for placement of primary stress (i.e., all had first-syllable stress or all had second-syllable stress) but were of varied syntactic and semantic classes.

Test lists were embedded in a series of foil lists consisting of from eight to eleven words chosen at random. The ratio of foil sets to test sets was 7:1, yielding 80 lists.

In a pretest of the test stimuli, 20 subjects were asked to read aloud a list of 200 English words, among which the test words were embedded. Their assignment of stress for the homographs was recorded. Responses to this pretest were used as a baseline measure of preference in the experiment. Results appear in Table 1, Column A. Each test homograph was preceded in the main experiment by a list that shared the stress pattern of its less-preferred reading.

SUBJECTS

Subjects were 18 undergraduate volunteers enrolled in introductory linguistics courses at the University of Connecticut. All were native speakers of English. They were paid for their participation.

PROCEDURE

Subjects were told that the purpose of the main experiment was to measure the effect of reading rate on accuracy of recall. Each subject was tested separately. The subject was seated in front of a computer-controlled CRT

screen on which appeared, for each trial, a vertical list of eight to eleven words. The subject was instructed to read each word on the list silently from top to bottom, as quickly as possible without missing any of the words, and to signal the experimenter when he or she was finished by reading the last word on the list out loud. The list on the screen then disappeared and was replaced by a single word. The subject was instructed to respond "yes" if the word was on the preceding list and "no" if it was not. This probe word was never one of the homographs. Subjects' spoken responses were tape-recorded for transcription later. The entire session took approximately fifteen minutes.

RESULTS

The results of this experiment are summarized in Table 1.


Table 1

% Less-Preferred Stress

             A                   B                  C
Item         Pretest (N=20)      Bias condition     Bias condition
                                                    (memory question correct)

conduct      10% (initial)       72% (18)           82% (11)
object       20  (final)         17  (18)           13  (15)
pervert      40  (initial)       77  (18)           77  (18)
present      30  (final)         28  (18)           29  (14)
digest       20  (initial)       39  (18)           38  (16)
progress     40  (final)         33  (18)           29  (14)
permit       20  (initial)       33  (18)           46  (11)
subject      30  (final)         33  (18)           33  (18)
incline      10  (initial)        0  (17)            0  (17)
project      30  (initial)       53  (17)           56  (17)

Column A gives the percentage of times that the less-preferred stress pattern for each ambiguous item was given as a response in the pretest, and notes whether the less-preferred reading was as a noun (with first-syllable stress) or as a verb (with second-syllable stress). Column B gives the percentage of the trials in which the less-preferred stress pattern was elicited in the biasing condition. The number of subjects is given in parentheses in this column. Column C gives the percentage of trials in which the less-preferred pattern was elicited from subjects who answered the word recognition question correctly for that test list. The number of subjects who answered correctly appears in parentheses.

Comparison of Columns A and B indicates an effect of the biasing lists on the stress pattern of the ambiguous test items. In a Wilcoxon one-tailed test, this difference was significant at the .05 level.
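For concreteness, the sketch below shows one way such a one-tailed Wilcoxon signed-rank test could be run on the paired item percentages of Table 1 (Columns A and B). The use of scipy and the by-item pairing are illustrative assumptions, not a reconstruction of the original analysis.

```python
# Minimal sketch (assumed tooling: scipy), pairing the Table 1 item
# percentages from Column A (pretest) and Column B (bias condition).
from scipy.stats import wilcoxon

pretest = [10, 20, 40, 30, 20, 40, 20, 30, 10, 30]   # Column A
biased  = [72, 17, 77, 28, 39, 33, 33, 33,  0, 53]   # Column B

# One-tailed test: did the biasing lists raise the rate of
# less-preferred stress assignments?
stat, p = wilcoxon(biased, pretest, alternative="greater")
print(f"W = {stat}, one-tailed p = {p:.3f}")
```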

The biasing effect becomes even more apparent if we take into account subjects' performance on the recognition test. Column C gives the results just for subjects who answered the memory question correctly for the list in question. Comparison of Columns A and C shows a significant difference at the .01 level.

A further indication that the biasing manipulation was responsible for the effect observed is that a strong correlation (r = .81) was found between performance on the recognition task and number of shifted responses, accounting for 66% of the variation between subjects. This correlation is graphed in Figure 1. The graph shows a wide range of subject performance. If we look at both ends of this range, at the two least successful and the two most successful subjects, we find that where performance on the memory task was 69-70 percent, subjects gave the less-preferred reading only 20 percent of the time, while the two subjects who answered 88 percent of the recognition questions correctly gave the less-preferred reading 60 percent of the time.
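The link between the two figures quoted above is simply that shared variance is the square of the correlation coefficient; the minimal check below assumes nothing beyond that arithmetic.

```python
# Shared variance is the square of the Pearson correlation coefficient.
r = 0.81
print(f"r^2 = {r**2:.3f}")   # 0.656, i.e., about 66% of the between-subject variation
```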

DISCUSSION

The correlation found is open to two interpretations. Under one interpretation, a subject's success in the recognition task is attributable to the amount of attention paid to the task. The more attentive subjects were more likely to have thoroughly read the word lists; thus they were more likely to have recoded the items on the list, and so to have been primed by properties of the code.

Under another, more interesting interpretation, the more successful subjects did more phonetic recoding, as evidenced by the high likelihood that they would be primed by a phonetic property of the word lists. An incidental result of this recoding was the ability to better remember what they had read, and thus better performance on the recognition test.

Under the first of these interpretations, attention rather than the requirements of the reading task per se is what determines performance on the recognition test; the evidence found for mental representation of prosody is a by-product of a process, i.e., constructing the phonetic representation, which is perhaps just one of several representations constructed incidentally in the course of performing the experimental task.

Under the second interpretation, phonetic recoding is an integral part of good reading, and so if people are reading well, they must be constructing a phonetic representation. This will then prime pronunciation of the ambiguous item in the absence of contextual cues. The availability of the phonetic representation incidentally facilitates performance on the recognition test. Better recognition results from greater ease of access to, or greater completeness of, the phonetic representation, which may in turn indicate superior reading ability.

The first (attention) explanation suggests that any number of codes results from attending to the list, and does not give any reason to attribute special status to any code. Thus we should expect semantic and orthographic codes, for instance, to affect subjects' performance similarly to the phonetic code in memory tasks of the sort used in this experiment.

Figure 1. Relation between performance on the recognition test (abscissa: percent correct, 50-90) and subjects' shifted (less-preferred) responses.

The pattern of results reported for a similar task employed by Erickson et al. (1977) suggests that this is not the case; the orthographic and semantic properties of their word lists did not affect performance in a short-term recall task in the same way that the phonetic properties did.

It should be noted that the response shift was not equal for all the items tested. While a large effect was obtained for the words digest, permit, project, conduct, and pervert, other items (object, incline, progress) exhibited little effect (or even a reverse effect). Incline is the clearest case: in no trial was it possible to bias a subject in the test situation to pronounce incline as a noun, with first-syllable stress. The figures given in Column A are for pronunciation preferences across twenty subjects. These figures indicate that one pronunciation of incline, for example, was preferred over the other by eighteen subjects out of twenty. What they do not indicate is how strong each individual's preference is. Though the former is much easier to measure, it provides only a very rough estimate of the latter--which is, of course, what is really relevant to the biasing experiment. The failure of the biasing manipulation for incline may well be due to the fact that while approximately one person out of ten prefers it as a noun, most people may have it in their lexicons only as a verb. For these people, its stress pattern would be completely unshiftable, however psychologically real stress patterns are in reading. This suggests that for this kind of experiment it would be quite proper, and indeed optimal, to select words whose baseline frequency is about equal between noun and verb.

The objection might be made that the effect found in the present experiment is merely an artifact of the particular task employed, rather than a reflection of normal reading processes. To make this claim is to say that subjects employed strategies in the performance of this task that were constructed ad hoc for this purpose. But there is no logical requirement for such a strategy to include the construction of a phonetic representation; on the face of it, a visual representation would suffice. Nor is there any reason to expect all subjects to arrive at the same kind of special strategy. Yet the more successful subjects employed a phonetic coding strategy, while those subjects who could not do this did not seem to find another strategy that was similarly effective. Thus it appears that subjects were making the best use they could of reading skills that were already available for more ordinary purposes.

While it might be argued that the phonetic effects found by Conrad (1964) and Baddeley (1966), for example, and in the present experiment are due to rehearsal strategies for short-term recall, which have been shown to employ a phonetic representation (see Baddeley, 1976, Chapter 8, for discussion), this argument does not apply to effects found in the acceptability judgment task employed by Kleiman (1975), which did not require rehearsal. Thus construction of a phonetic representation cannot be viewed as a mere artifact of rehearsal.

It could also be argued that for semantically integrated sentences, readers might use a semantic code, and employ a phonetic code to facilitate memory only when the items in the experimental sequence do not cohere semantically. The findings of Baddeley and Hitch (1974) address this criticism. They compared reaction times in a grammaticality judgment task using ordinary sentences and sentences composed of phonetically similar (rhyming) words. Phonetic similarity increased response latencies to grammatical and ungrammatical sentences. This task does not involve rehearsal or short-term memory. But it does implicate the parser, lending support to the conclusion from Kleiman's study that the sentence parsing mechanism requires a phonetic representation, quite apart from any requirements of short-term memory. If subjects construct a fairly detailed phonetic representation in a relatively unnatural situation in which it affords them no apparent advantage, we might also expect them to do it in a more natural situation. In other words, if subjects encode prosody when they read lists of words silently in a task that does not require comprehension, then it is likely that they will also encode prosody when they read ordinary sentences in a task that necessarily invokes the higher-level processing involved in comprehension.

An important finding from this experiment is that readers construct a mental representation that includes features not represented in the stimulus. Thus, while it might be maintained that readers of English represent the segmental features of the words they read just because these can be extracted by rule from the letters of the orthographic system (at least in most cases), no such claim can be made for suprasegmental features such as stress, for there are no symbols in English orthography that indicate stress. In the stress-neutral pretest condition, subjects were always able to name the homographs. That this was not accomplished by simply applying rules to translate from orthography to phonology is strongly suggested by the fact that not all words having the same orthographic structure were consistently assigned the same pattern of stress by a single subject. More likely, a bias of some sort, due to factors such as frequency of occurrence, was responsible for a subject's choice in each case. Such a bias could only come from the lexicon. This is true in the case of vowel quality in homographs (lead, bow) as well. For these words, at least, naming written words must follow lexical access.

This must always be the case in naming Chinese logographs and Japanese kanji. These orthographic systems give very little phonological information, yet reading lists of words written in these orthographies results in a phonetic representation in short-term memory (Tzeng et al., 1977; Erickson et al., 1977). Thus almost all phonetic information must be supplied by the reader after lexical access.

Further support for the active participation of the lexicon in reading is provided by Hebrew. The Hebrew language is represented by an alphabetic orthography that keeps the vowel symbols fairly well separated from the consonant symbols. In texts intended for fluent adult readers, the vowel information is usually omitted entirely. However, it is the vowels of Hebrew that represent the inflectional system and carry most of the morphological and syntactic information. The task of the reader in Hebrew is to decide, presumably in the course of parsing procedures, the syntactic role of each word and its morphological composition in that role. Having derived this information, there is no reason to expect the reader of Hebrew to then add information about the vowels that would represent the word in speech. But the results of a study by Navon and Shimron (1981) suggest that they do indeed do so. Their subjects read lists of morphologically simple (uninflected) words in which vowel phonemes were represented by the optional vowel diacritics. Latencies in lexical decision tasks were increased by phonemically anomalous

diacritics but not by graphemically anomalous diacritics that preserved the phonology. The effect could not be attributed to visual factors.

Their results suggest that in the simple case of reading unambiguous uninflected words, with no concurrent processing demands such as those required for sentence comprehension, subjects both construct a phonetic representation and access the lexicon. (In this case, lexical access appears to follow grapheme-to-phoneme translation. However, there is ample evidence, as Navon and Shimron point out, for models of lexical access that include a visual route. In any case, the result is a phonetic representation.) Yet Kleiman's results suggest that it is just in those cases in which processing for comprehension is required that the phonetic representation is important. In the case of fluent readers of Hebrew in the ordinary situation of reading text, the construction of a phonetic representation is at least as likely to occur as in the simple case of lexical decision. However, here the construction of the phonetic representation must follow lexical access, as with English homographs, Chinese logographs, and Japanese kanji. But with Hebrew, it is also likely to be the case that the phonetic representation is the product of the parser, rather than of the lexicon, since it is the analysis resulting from the parsing process that indicates to the reader what the morphology of the word must be, and thus what vowels must be supplied.

The facts about Hebrew, on the one hand, and English, Chinese, and Japanese, on the other hand, suggest two hypotheses to account for the effect found in the present experiment. Under one hypothesis, which I will call the lexical bias hypothesis, prosodic priming is a result of activity in the lexicon. There is evidence that stress, or some abstract representation from which stress can be derived by rule (Chomsky & Halle, 1968), is a feature of lexical entries (Brown & McNeill, 1966), just as segmental phonological features and semantic features are. As such, stress can probably be primed similarly to semantic features (Meyer, Schvaneveldt, & Ruddy, 1975). As the activation of a single word may activate any number of lexical entries in the same semantic field, the activation of a single disyllable with first-syllable stress might activate (if slightly) all disyllables having first-syllable stress. The activation of nine such words may have the cumulative effect of activating the first-syllable-stressed entry for the homograph to a point where it is much more readily available than the second-syllable-stressed entry, and thus more likely to be reported in the priming situation.

The second hypothesis, suggested by the facts about Hebrew, may be called the parsing hypothesis. According to this hypothesis, even isolated words are parsed; that is, they are processed as one-word sentences (see Mattingly, Note 1). It is in the parser that the morphophonemic representation retrieved from the lexicon is assigned a phonetic representation. This type of model is well suited to an orthography such as Hebrew. In fact, if it is assumed that the entire linguistic system, of which word recognition is only a part, is designed for the processing of linguistic structures, this type of model is equally well suited to English and any other language. The prosodic priming effect can then be seen as the result of a bias induced in the parser as it constructs a complete phonetic representation, including prosody, for each of a series of one-word sentences. A small bit of evidence in support of this hypothesis for English is the apparent ease with which sentences containing homographs are read: In syntactic context, the grapheme sequence p-r-o-g-r-e-s-s (for example) may be instantly recognized as a noun or a verb as a result of information derived by the parser. The entire analysis of the sentence up to the point where the homograph is encountered determines what syntactic categories are likely to occur in a well-formed structure and guides lexical access to the appropriate entry, yielding, ultimately, the appropriate phonetic representation.

REFERENCE NOTE

1." Mattingly, I. G. On the- nature of phonological representations. Manuscript in preparation, 1981.

REFERENCES

Baddeley, A. D. Short-term memory for word sequences as a function of acoustic, semantic and formal similarity. Quarterly Journal of Experimental Psychology, 1966, 18, 362-365.
Baddeley, A. D. The psychology of memory. New York: Basic Books, 1976.
Baddeley, A. D., & Hitch, G. Working memory. In G. A. Bower (Ed.), The psychology of learning and motivation (Vol. 8). New York: Academic Press, 1974.
Brown, R. W., & McNeill, D. The tip of the tongue phenomenon. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 325-337.
Chomsky, N., & Halle, M. The sound pattern of English. New York: Harper & Row, 1968.
Conrad, R. Acoustic confusions in immediate memory. British Journal of Psychology, 1964, 55, 75-84.
Conrad, R. Speech and reading. In J. Kavanagh & I. Mattingly (Eds.), Language by ear and by eye: The relationships between speech and reading. Cambridge, Mass.: MIT Press, 1972.
Erickson, D., Mattingly, I. G., & Turvey, M. T. Phonetic activity in reading: An experiment with kanji. Language and Speech, 1977, 20, 384-403.
Kleiman, G. M. Speech recoding in reading. Journal of Verbal Learning and Verbal Behavior, 1975, 14, 323-339.
Mehler, J., & Carey, P. Role of surface and base structures in the perception of sentences. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 335-338.
Meyer, D. E., Schvaneveldt, R. W., & Ruddy, M. G. Loci of contextual effects on visual word recognition. In P. M. A. Rabbitt (Ed.), Attention and performance V. London: Academic Press, 1975, 98-118.
Navon, D., & Shimron, J. Does word naming involve grapheme-to-phoneme translation? Evidence from Hebrew. Journal of Verbal Learning and Verbal Behavior, 1981, 20, 97-109.
Tzeng, O. J. L., Hung, D. L., & Wang, W. S-Y. Speech recoding in reading Chinese characters. Journal of Experimental Psychology: Human Learning and Memory, 1977, 3, 621-630.

CHILDREN'S MEMORY FOR RECURRING LINGUISTIC AND NONLINGUISTIC MATERIAL IN RELATION TO READING ABILITY*

Isabelle Y. Liberman,+ Virginia A. Mann,++ Donald Shankweiler,+ and Michelle Werfelman+

Abstract. Good beginning readers typically surpass poor beginning readers in memory for linguistic material such as syllables, words, and sentences. Here we present evidence that this interaction between reading ability and memory performance does not extend to memory for nonlinguistic material like faces and nonsense designs. Using an adaptation of the continuous recognition memory paradigm of Kimura (1963), we assessed the ability of good and poor readers in the second grade to remember three different types of material: photographs of unfamiliar faces, nonsense designs, and printed nonsense syllables. For both faces and designs, the performance of the two reading groups was comparable; only when remembering the nonsense syllables did the good readers perform at a significantly superior level. These results support other evidence that distinctions between good and poor beginning readers do not turn on memory per se, but rather on memory for linguistic material. Thus they extend our previous finding that poor readers encounter specific difficulty with the use of linguistic coding in short-term memory.

The performance of good beginning readers on certain language-based short-term memory tasks, like their performance on many other language-related tasks, tends to be better than that of children who encounter difficulty in learning to read. The association between reading ability and such short-term memory skills is by now well documented. For example, children who are good readers tend to have a better memory for strings of written or spoken letters (Shankweiler, Liberman, Mark, Fowler, & Fischer, 1979). They are also more

*In press, Cortex.

+Also University of Connecticut.
++Also Bryn Mawr College.
Acknowledgment. This work was partially supported by a grant from the National Institute of Child Health and Human Development (Grant HD01994) to Haskins Laboratories and by an NICHD postdoctoral fellowship (HD05677) to the second author. For their generous cooperation on this project, we are also much indebted to the principals and second-grade teachers of the Northwest, Vinton, and Southeast Elementary Schools in the Town of Mansfield, Connecticut. Special thanks are due to the 36 children who participated and to the parents who gave them permission to do so. In addition, we wish to express our appreciation to Leonard Mark, who aided in testing the subjects, and to Robert Katz, who provided assistance with the statistical analysis of the data.

[HASKINS LABORATORIES: Status Report on Speech Research SR-67/68 (1981)]


successful at recalling strings of spoken words, and even at recalling the words of spoken sentences (Mann, Liberman, & Shankweiler, 1980).

However, our concern has been not simply to document this performance difference but instead to uncover the probable cause of the difference. We first approached this problem by turning what appeared to us to be the special advantages of good readers against them. Since we knew that for adults the presence of a high density of phonetically confusable items hinders the use of speech-related processes in short-term memory, we were led to examine the effect of the same manipulation on the performance of good and poor readers. We found that, like adults, good beginning readers appear to make effective use of phonetic coding in short-term memory, whereas poor readers do not. Thus we have shown that the memory performance of good readers falls sharply, even to the level of that of the poor readers, when they are asked to remember a letter string, word string, or sentence containing a high density of phonetically confusable items (letters with rhyming names, or words that rhyme with one another), whereas the performance of poor readers remains little changed by this type of material.

At this point in our investigations, we were led to ask whether there are any other differences between the short-term memory capacities of good and poor readers, beyond those that reflect differential use of a speech code. After all, studies of patients with lateralized brain disease have revealed that verbal and nonverbal short-term memory abilities may be relatively independent (see, for example, Kimura, 1963; Milner & Taylor, 1972; Warrington & Shallice, 1969). Hence it seemed at least possible that the ability of poor readers to use nonverbal short-term memory processes could be equal to that of good readers. While this possibility is supported by findings that good and poor readers are equally successful at remembering unfamiliar (Hebrew) orthographic designs (Vellutino, Steger, Kaman, & DeSetto, 1975), it might seem inconsistent with findings that good readers surpass poor readers in remembering abstract figural patterns (Morrison, Giordani, & Nagy, 1977) and spatial-temporal patterns (Corkin, 1974). In our opinion, however, neither of these latter findings can be regarded as conclusive evidence that poor readers have difficulty with nonlinguistic short-term memory, per se, since both derive from materials that lend themselves to verbal labeling and to the use of linguistic memory strategies (Liberman, Mark, & Shankweiler, 1978). Therefore, it remained to be determined whether or not poor readers encounter difficulty with memory processes other than those requiring use of a speech code. We sought to investigate this question in the present study by comparing the ability of good and poor readers to remember linguistic material with their ability to remember material that is not only nonlinguistic but also not readily susceptible to linguistic coding.

Our subjects were good and poor readers in the second grade, whose memory abilities were tested with an adaptation of the continuous recognition memory paradigm of Kimura (1963). Using that paradigm, we assessed the children's ability to remember each of three types of materials: nonsense designs, photographs of unfamiliar faces, and printed CVC nonsense syllables. Whereas the nonsense designs were those employed in Kimura's original study (1963), the facial photographs and nonsense syllables were our own innovation. Studies of adult patients with focal brain damage reveal that the ability to encode and remember the nonsense designs that Kimura employed

suffers as a consequence of right-hemisphere temporal-lobe excision but is relatively unimpaired by comparable excisions to the left, language-dominant, hemisphere (Kimura, 1963; Milner, 1974; Milner & Teuber, 1968). Likewise, the ability to encode and subsequently to recognize unfamiliar faces has been determined to be a right-hemisphere capacity that does not demonstrably depend on the language mediation skills of the left hemisphere (Leehey, Carey, Diamond, & Cahn, 1978; Yin, 1970). In contrast, the encoding and recognition of English-like nonsense syllables is a linguistic ability that suffers as a consequence of damage to the left hemisphere (Coltheart, 1980; Patterson & Marcel, 1977; Saffran & Marin, 1977).

We anticipated that the results obtained with good and poor readers in the case of nonsense designs and faces would differ from those obtained with nonsense syllables. Good readers were not expected to surpass poor readers in memory for either the nonsense designs or the faces, since neither of these sets of items lends itself readily to the use of language coding. In the event, however, that good readers should excel at recognizing either of these materials, it would be taken as evidence that the poor readers do indeed have broader deficiencies in remembering. We expected good readers to surpass poor readers in memory for nonsense syllables, on the assumption that their use of phonetic coding as a mnemonic device would be superior to that of poor readers.

METHOD

Subjects

The subjects in this experiment were 36 second-grade children who attended the public schools in Mansfield, Connecticut. An initial pretest group was selected on the basis of the children's Total Reading Score on the Stanford Achievement Tests, which had been administered earlier in the same school year. Candidates for the good reading group had received grade scores of from 3.1 to 5.0, whereas candidates for the poor reading group had received scores of 1.5 to 2.4. Final selection of 18 good readers and 18 poor readers was made on the basis of scores on the Word Recognition Subtest of the Wide Range Achievement Test (WRAT) (Jastak, Bijou, & Jastak, 1965). Children selected as good readers had WRAT reading grade equivalents ranging from 3.1 to 5.0, with a mean score of 4.0; children selected for the poor reading group received grade equivalents from 1.5 to 2.4, with a mean score of 2.1.

Mean ages for good and poor readers were 94.0 months and 94.2 months, respectively, and were not significantly different. Individual administration of the WISC-R revealed good readers to have a mean Full Scale IQ of 113.6, with mean Verbal and Performance IQs of 112.1 and 112.9, respectively. Poor readers received a mean Full Scale IQ of 107.7, with Verbal and Performance IQs of 104.9 and 109.1, respectively. There were no significant differences between good and poor readers on any of the IQ measures.

Materials

There were three different types of materials: nonsense designs, faces, and syllables. The tests using these three types of items were identical in manner of construction and presentation, each modeled on Kimura's (1963) recurring recognition memory task.

Nonsense designs. There were 80 nonsense-design stimuli, each of which was one of the 52 irregular line drawings of Kimura (1963). Four of the designs were used eight times each (the recurring designs), and the remaining 48 once each (the nonrecurring designs). Each stimulus was drawn on a 3 x 5 card. For the purpose of testing, the stimuli were divided into eight sets of ten; within each set of ten, the four recurring designs were randomly interspersed with six of the nonrecurring designs. The first set of ten stimuli constituted the inspection set; the remaining seven sets contained the actual test stimuli.
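The same list structure was used for all three materials, so a single sketch can illustrate it: four recurring items appear once in each of eight sets of ten, interleaved with six of the 48 nonrecurring items per set. The item labels and the use of Python's random module below are illustrative assumptions, not the original stimuli or software.

```python
# Illustrative sketch of the recurring-recognition list structure
# (placeholder item labels; not the original stimuli or software).
import random

def build_sets(recurring, nonrecurring, n_sets=8, seed=0):
    rng = random.Random(seed)
    assert len(recurring) == 4 and len(nonrecurring) == n_sets * 6
    pool = list(nonrecurring)
    rng.shuffle(pool)
    sets = []
    for i in range(n_sets):
        block = list(recurring) + pool[i * 6:(i + 1) * 6]
        rng.shuffle(block)               # recurring items randomly interspersed
        sets.append(block)
    return sets[0], sets[1:]             # inspection set, seven test sets

recurring = [f"R{i}" for i in range(4)]        # the four recurring items
nonrecurring = [f"N{i}" for i in range(48)]    # the 48 nonrecurring items
inspection, test_sets = build_sets(recurring, nonrecurring)
assert sum(len(s) for s in test_sets) == 70    # 70 test items per condition
```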

Faces. Face recognition stimuli were constructed using 52 black-and-white photographs, half of which were adult female faces and half adult male faces. In both the male and female stimulus sets, half were photographed looking to the left and half looking to the right. To minimize distinguishing details that might lend themselves to verbal labeling, no faces were used that displayed hair, eyeglasses, jewelry, or distinctive markings such as scars, distinctive makeup, etc. In addition, a uniform mask was applied to each picture to cover hair and background detail as well as to ensure a uniform size.

Again, a set of 80 stimuli was constructed. Four photographs occurred eight times each (two male faces and two female faces, two looking to the left and two looking to the right), whereas the remaining 48 occurred once each. The stimuli were divided into eight sets, with each set containing the four recurring photographs randomly interspersed among six nonrecurring ones. The first set served as the inspection set; the remaining seven sets contained the test stimuli.

Nonsense syllables. Stimuli for this part of the experiment were constructed from a set of 52 CVC nonsense syllables that had been selected from Hilgard (1962) to have a moderately low association value. Across the different syllables, the frequency of occurrence of each letter was controlled as much as possible. The vowels a, e, and u appeared 11 times each, i appeared nine times, and o appeared ten times. Every consonant (with the exception of a few, including s and x, that did not occur in initial position, and a few, including h and w, that did not occur in final position) occurred at least once, with some consonants occurring as often as six times.

From the syllables, a set of 80 stimuli was constructed. Four of the stimuli occurred eight times, while each of the remaining 48 occurred once. The stimulus cards were again divided into eight sets of ten each; within each set of ten, the four recurring syllables were randomly interspersed with six nonrecurring ones. The first set of ten constituted the presentation trials; the remaining seven sets contained the test stimuli.

Procedure

Each child was tested individually, with the nonsense designs being presented on the first day of testing and the faces and syllables on a second day. The procedure for the recurring recognition memory paradigm was adapted from Kimura (1963) and was the same for all three types of material.

The experimenter began each test by telling the child that some designs (or faces or syllables) would be shown, one at a time, and that the task was to look at each one very carefully and try to remember it. She then presented the inspection set of ten cards, showing each card for approximately three seconds. Subsequently, the child was told that more cards would follow, some of which would be identical to those presented in the inspection set, and some of which would be new cards. The instruction was to say "Yes" if a card had been seen before, and "No" if it had not. The test items were then presented to the child, who was required to respond to each one before being shown the next.

RESULTS

In order to evaluate the performance of the subjects, we first computed the percentage of correct responses made by each subject, separately for each of the three types of materials (nonsense designs, faces, and syllables). This was done by summing the number of correct recognitions and correct rejections and dividing by 70 (i.e., the total number of test items presented in each condition). After first noting that the performance of the subjects on all three types of material was consistently above the chance level of 50 percent correct, we turned to the major purpose of our study, which was to evaluate the extent of the difference between the performance of good and poor readers on each of the three different types of items.
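A minimal sketch of this scoring rule follows; the counts plugged in are made up, and only the formula (correct recognitions plus correct rejections over the 70 test items) comes from the text.

```python
# Percent correct: correct recognitions plus correct rejections over the
# 70 test items presented in each condition (the counts below are made up).
def percent_correct(correct_recognitions, correct_rejections, n_test_items=70):
    return 100.0 * (correct_recognitions + correct_rejections) / n_test_items

print(percent_correct(correct_recognitions=24, correct_rejections=38))  # about 88.6
```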

The results of an ANOVA computed on the variables of reading ability (good versus poor readers) and material type (designs versus faces versus syllables) revealed a significant effect of material type, F(2,68)=73.3, p<.001, reflecting the fact that designs and faces were typically harder to remember than syllables. There was further the anticipated interaction between the effect of item type and reading ability, F(2,68)=8.3, p<.001. As can be seen in Figure 1, good readers were not significantly better than poor readers at remembering either nonsense designs or faces (for nonsense designs, t(34)=1.11, p>.1; for faces, t(34)=0.1, p>.6). In fact, poor readers were slightly (although not significantly) better at remembering nonsense designs. Good readers, however, were significantly better than poor readers at remembering the nonsense syllables, t(34)=3.2, p<.005.
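The follow-up group comparisons reported above are independent-samples t-tests with 34 degrees of freedom (18 good versus 18 poor readers). The sketch below shows how such comparisons could be computed; the score arrays are hypothetical placeholders, scipy and numpy are assumed tools, and the omnibus mixed ANOVA is not reproduced here.

```python
# Hedged sketch of good-vs-poor comparisons per material type.
# Only the design (18 vs. 18 subjects, hence t with 34 df) comes from the text;
# the percent-correct scores below are hypothetical.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
good = {"designs": rng.normal(78, 8, 18), "faces": rng.normal(76, 8, 18),
        "syllables": rng.normal(90, 6, 18)}
poor = {"designs": rng.normal(80, 8, 18), "faces": rng.normal(76, 8, 18),
        "syllables": rng.normal(81, 6, 18)}

for material in ("designs", "faces", "syllables"):
    t, p = ttest_ind(good[material], poor[material])
    print(f"{material}: t(34) = {t:.2f}, p = {p:.3f}")
```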

DISCUSSION

The results, then, upheld our predictions. Poor readers were equal to good readers in ability to remember both nonsense designs and faces. In contrast, poor readers made significantly more errors than good readers in recognizing the nonsense syllables. Thus we find no evidence that children in the two reading groups differ in general memory ability. Rather, we again find them to differ only in memory for linguistic items. These findings help us to place in perspective two claims that are frequently made regarding the origins of many childhood reading problems. One claim sees a "general memory deficit" as central (Morrison et al., 1977). According to that hypothesis, which views poor readers as having difficulty with memory, per se, poor readers might be expected to show inferior performance for linguistic material and figural material alike. Clearly, our results are incompatible with this view, since it was found that good and poor readers differed solely in memory for the syllables.

Figure 1. Mean percentage of correct responses made by good and poor readers on nonsense designs, faces, and nonsense syllables.

A second theoretical claim suggests that failure of serial-order memory is the core problem (Bakker, 1972; Corkin, 1974; Holmes & McKeever, 1979). Our task did not require that subjects remember the order of items in the inspection set, yet we nonetheless obtained a difference between good and poor readers' ability to remember nonsense syllables. Thus the poor readers' memory problem goes beyond serial order alone. In this respect, the present findings confirm earlier results by Mark, Shankweiler, Liberman, and Fowler (1977) and Byrne and Shea (1979). We do realize, however, that a material-specific deficit in order memory could be a consequence of failure to make effective use of phonetic coding. Indeed, in a recent study (Katz, Shankweiler, & Liberman, Note 1), some of us found that good and poor readers selected by the same criteria as in the present study differed in ability to recall the order of the items. But the good readers excelled only when their task was to recall the order of items that could be coded in terms of linguistic labels. No difference was found in memory for the order of nonrecodable items. Thus the problems of poor readers in recall of items, per se, and in recall of item order appear to be linked to some difficulty with using a phonetic code--either a failure to recode phonetically or a weakened tendency to use this coding principle.

In summary, then, we have discovered an instance in which, despite identical procedures, good and poor readers differ in the ability to remember language-based material but fail to differ in memory for two types of nonverbal material. Thus we conclude that the short-term memory deficits of poor readers appear indeed to be restricted to the domain of phonetic representation in short-term memory. Several questions arise at this point, among them the question of why poor readers fail to make effective use of a phonetic code, and the question of how a deficient linguistic memory comes to be associated with problems in learning to read. At present we are addressing the first of these questions by examining the pattern of memory errors made by poor readers. Our approach to the second, however, is guided by a consideration of the relation between short-term memory and normal language processing (Baddeley, 1978; Liberman, Mattingly, & Turvey, 1972), which leads us to ask whether poor readers encounter difficulty on the type of language comprehension tasks used in studying aphasic patients (Caramazza & Zurif, 1978). We suspect that answers to these two questions may bring us closer to an understanding of the reading process as well as of the process of reading acquisition.

REFERENCE NOTE

1. Katz, R., Shankweiler, D., & Liberman, I. Y. Memory for item order and phonetic coding in the beginning reader. Manuscript submitted for publication, 1981.

REFERENCES

Baddeley, A. D. The trouble with levels: A reexamination of Craik and Lockhart's framework for memory research. Psychological Review, 1978, 85, 139-152.
Bakker, D. J. Temporal order in disturbed reading. Rotterdam: Rotterdam University Press, 1972.
Byrne, B., & Shea, P. Semantic and phonetic memory in beginning readers. Memory & Cognition, 1979, 7, 333-341.
Caramazza, A., & Zurif, E. B. The comprehension of complex sentences in children and aphasics: A test of the regression hypothesis. In A. Caramazza & E. B. Zurif (Eds.), Language acquisition and language breakdown: Parallels and divergences. Baltimore: Johns Hopkins University Press, 1978.
Coltheart, M. Deep dyslexia: A right-hemisphere hypothesis. In M. Coltheart, K. Patterson, & J. Marshall (Eds.), Deep dyslexia. Boston: Routledge & Kegan Paul, 1980.
Corkin, S. Serial-order deficits in inferior readers. Neuropsychologia, 1974, 12, 347-354.
Hilgard, E. R. Methods and procedures in the study of learning. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley, 1962.
Holmes, D. R., & McKeever, W. F. Material specific serial memory deficit in adolescent dyslexics. Cortex, 1979, 15, 51-62.
Jastak, J., Bijou, S. W., & Jastak, S. Wide Range Achievement Test. Wilmington, Del.: Guidance Associates, 1965.
Kimura, D. Right temporal-lobe damage. Archives of Neurology, 1963, 8, 264-271.
Leehey, S., Carey, S., Diamond, R., & Cahn, A. Upright and inverted faces: The right hemisphere knows the difference. Cortex, 1978, 14, 441-449.
Liberman, A. M., Mattingly, I. G., & Turvey, M. T. Language codes and memory codes. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory. Washington, D.C.: Winston, 1972.
Liberman, I. Y., Mark, L. S., & Shankweiler, D. Reading disability: Methodological problems in information-processing analysis. Science, 1978, 200, 801-802.
Mann, V. A., Liberman, I. Y., & Shankweiler, D. Children's memory for sentences and word strings in relation to reading ability. Memory & Cognition, 1980, 8, 329-335.
Mark, L. S., Shankweiler, D., Liberman, I. Y., & Fowler, C. A. Phonetic recoding and reading difficulty in beginning readers. Memory & Cognition, 1977, 5, 623-629.
Milner, B. Hemispheric specialization: Scope and limits. In F. O. Schmitt & F. G. Worden (Eds.), The neurosciences: Third study program. Cambridge, Mass.: MIT Press, 1974.
Milner, B., & Taylor, L. Right-hemisphere superiority in tactile pattern recognition after cerebral commissurotomy: Evidence for nonverbal memory. Neuropsychologia, 1972, 10, 1-15.
Milner, B., & Teuber, H.-L. Alteration of perception and memory in man: Reflections on methods. In L. Weiskrantz (Ed.), Analysis of behavioral change. New York: Harper and Row, 1968.
Morrison, F. J., Giordani, B., & Nagy, J. Reading disability: An information-processing analysis. Science, 1977, 196, 77-79.
Patterson, K. E., & Marcel, A. J. Aphasia, dyslexia and the phonological coding of printed words. Quarterly Journal of Experimental Psychology, 1977, 29, 307-318.
Saffran, E. M., & Marin, O. S. M. Reading without phonology: Evidence from aphasia. Quarterly Journal of Experimental Psychology, 1977, 29, 515-525.
Shankweiler, D., Liberman, I. Y., Mark, L. S., Fowler, C. A., & Fischer, F. W. The speech code and learning to read. Journal of Experimental Psychology: Human Learning and Memory, 1979, 5, 531-545.
Vellutino, F. R., Steger, J. A., Kaman, M., & DeSetto, L. Visual form perception in deficient and normal readers as a function of age and orthographic familiarity. Cortex, 1975, 11, 22-30.
Warrington, E. K., & Shallice, T. The selective impairment of auditory-verbal short-term memory. Brain, 1969, 92, 885-896.
Yin, R. K. Face recognition by brain-damaged patients: A dissociable ability? Neuropsychologia, 1970, 8, 395-402.

PHONETIC AND AUDITORY TRADING RELATIONS BETWEEN ACOUSTIC CUES IN SPEECH PERCEPTION: PRELIMINARY RESULTS

Bruno H. Repp

Abstract. When two different acoustic cues contribute to the perception of a phonetic distinction, a trading relation between the cues can be demonstrated if the speech stimuli are phonetically ambiguous. Do the cues trade also in unambiguous stimuli? Four different trading relations were examined using a fixed-standard AX discrimination task with stimuli either from the vicinity of the phonetic category boundary or from within a phonetic category. The results suggest that certain trading relations (presumably of auditory origin) hold in both conditions, while others are tied to the perception of phonetic contrasts and thus appear to be specific to the speech mode.

INTRODUCTION

Virtually any phonetic distinction has multiple correlates in the acoustic speech signal. That is, the articulatory adjustments required to change from one phonetic category to the other (other things equal) cause acoustic changes along several separable physical dimensions--spectrum, amplitude, time. While a listener typically perceives only a single change--viz., one of phonetic category--the physical changes that led to this unitary percept can only be described in the form of a list with multiple entries. When the signal properties thus listed are manipulated individually in an experiment, it is generally found that they all have perceptual cue value for the relevant phonetic distinction, although they may differ in their relative importance. If one cue in such an ensemble is changed to favor category B, another cue can be modified to favor category A, so that the phonetic percept remains unchanged. This is called a trading relation. Presumably, any two cues for the same phonetic distinction can be traded off against each other within limits set by their acceptable range of values and by their relative perceptual weights. Numerous recent studies of trading relations have been reviewed by Repp (1981b); some of them will be discussed further below.

The mechanisms by which a listener's brain combines a number of diverse cues into a single phonetic percept are not known, but there are two


Acknowledgment. This research was supported by NICHD Grant HD01994 and BRS Grant RR05596 to Haskins Laboratories. I am grateful to Janette Henderson for assistance in running subjects, and to Robert Crowder, Virginia Mann, Richard Pastore, and Sigfrid Soli for helpful comments on an earlier draft of the paper.

[HASKINS LABORATORIES: Status Report on Speech Research SR-67/68 (1981)]

contrasting views on that issue. One view (e.g., Liberman & Studdert-Kennedy, 1978; Repp, Liberman, Eccardt, & Pesetsky, 1978) holds that the perceptual integration of acoustic cues is motivated by their common origin in the production of a phonetic contrast; that is, listeners are assumed to possess and apply detailed tacit knowledge of the multiple acoustic correlates of articulatory maneuvers. The other view (best spelled out in Pastore, 1981) maintains that integration of, and trading relations between, acoustic cues might arise either from integration or from interactions (such as masking or contrast) at a purely auditory level of processing, without reference to the articulatory origin of the cues. The evidence so far (summarized in Repp, 1981b) strongly favors the first view. However, it is conceivable that, as more is learned about auditory mechanisms, certain trading relations between acoustic cues will find auditory explanations, particularly those that seem to have no good articulatory rationale. Since many perceptual trading relations have been demonstrated with synthetic stimuli and without a parallel examination of speech production, the relation of the perceptual results to what happens in articulation may not always be as close as has been supposed, and some trading relations may actually have been caused by auditory cue interactions.

Undoubtedly, detailed studies of speech production and speech acoustics, as well as auditory psychophysics, will shed further light on this issue. There is a more direct experimental approach, however, which makes use of the fact that, under certain circumstances, the same (or highly similar) stimuli may be heard either as speech or as nonspeech. Such different percepts may be achieved either by presenting speechlike stimuli to human listeners under different instructions, relying primarily on the subjects' postexperimental reports about whether the stimuli in fact sounded speechlike or not, or by contrasting human perception of speech with that of nonhuman animals. In either case, the demonstration of a trading relation in all subjects or in all conditions would favor an auditory account, while the finding that a trading relation holds only when human listeners claim to perceive the stimuli as speech, but not when they claim to hear nonspeech sounds or when the listeners are nonhuman, would constitute strong evidence in favor of the speech-specific (articulatory-phonetic) origin of the trading relation.

There are no completed studies of trading relations in animals, but interesting results are expected soon from several laboratories. For chinchillas, Kuhl and Miller (1978) have reported a shift in the voicing boundary for stop consonants with place of articulation--an effect that may, in part, be due to a trading relation between voice onset time and formant onset frequencies (cf. Summerfield & Haggard, 1977). A trading relation between these two variables has also been demonstrated in human infants (Miller & Eimas, Note 1); however, rather than pointing towards psychoacoustic interactions, this finding may indicate that human infants are biologically prepared for phonetic perception. The present experiments focus on several effects that have not yet been demonstrated in either infant or animal subjects.

In studies using adult human subjects, two methods have been applied to address the question of the origin of trading relations. One is to construct stimuli that contain the critical cues under investigation but are sufficiently different from speech in other respects, so as to be perceived as nonspeech by naive subjects but as speech by more experienced or specially instructed subjects. The technique of imitating the speech formants with pure tones has served this purpose well (Bailey, Summerfield, & Dorman, 1977; Best, Morrongiello, & Robson, 1981; Remez, Rubin, Pisoni, & Carrell, 1981). The other method is to use speech stimuli and to lead listeners, through special instructions and practice, to perceive them analytically--to segregate them into their auditory components, as it were. This is a notoriously difficult task, but it is possible with certain special stimuli, e.g., with fricative-vowel syllables (Repp, 1981a). In all of these studies--some of which will be described in more detail below--subjects' response patterns were radically different when the stimuli were heard as speech than when the same stimuli were heard as nonspeech; in particular, the trading relations or other contextual effects under investigation were observed only in the speech mode. However, as noted above, this result may not hold for all trading relations.

The present experiments explored a third method, which has the advantage of simplicity and general applicability, thus making possible the parallel investigation of a number of different trading relations. The method is a simplified version of a procedure used by Fitch, Halwes, Erickson, and Liberman (1980) to demonstrate the categorical perception of speech stimuli varying in two cue dimensions. Fitch et al. were concerned with a trading relation between a temporal and a spectral cue for the "slit"-"split" contrast: the amount of silence between the fricative noise and the periodic stimulus portion, and the presence or absence of formant transitions (appropriate for a labial stop) at the onset of the periodic portion. In an identification task, less silence was needed to change "slit" to "split" when formant transitions were present than when they were absent. In a subsequent oddity discrimination task, Fitch et al. compared performance on three types of trials: (1) spectral difference only ("one-cue condition"); (2) spectral and temporal difference, the stimulus with the formant transitions always having the longer silence ("two-cooperating-cues condition"); and (3) spectral and temporal difference, but the stimulus with the formant transitions now having the shorter silence ("two-conflicting-cues condition"). Subjects were considerably more accurate in the second than in the third condition, with performance in the first condition in between. This ordering of conditions was predicted from the way the stimuli were labeled by the subjects. In essence, these results revealed that speech stimuli varying on two dimensions are still categorically perceived. The listeners appeared to base their discrimination judgments on the phonetic labels of the stimuli, and thus the trading relation between the two cues was exhibited in discrimination as well as in labeling responses.

What would happen, however, if subjects could not rely on phonetic labels? Such a situation would arise if the stimuli to be discriminated were perceived as belonging to the same phonetic category. We know from many earlier studies of categorical perception that such discriminations are difficult to make, but subjects typically perform at a level better than chance, and their performance may be enhanced by increasing physical stimulus differences and/or by using a paradigm that reduces stimulus uncertainty. If subjects cannot rely on phonetic labels, they must make their discriminations on the basis of the auditory properties of the stimuli. If some of these properties interact at the auditory level of perception and thereby generate a trading relation, then this trading relation should be observed regardless of whether or not listeners can make phonetic distinctions. On the other hand, if a trading relation is phonetic in origin, then the unavailability of phonetic contrasts should lead to a disappearance of the trading relation. Since, in this case, the cues are presumably independent at the auditory level, a difference in two cues should be at least as easy to discriminate as a difference in one cue (cf. Espinoza-Varas, Note 2), regardless of whether the cue values are paired in the cooperating or the conflicting manner (a la Fitch et al., 1980).

Four different trading relations wereinvestigated in four parallel experiments that were identical except for the stimuli and their dimensions of variation. Therefore, the general method will be described first, followed by a discussion of the individual experiments.-

GENERAL METHOD

Stimulus Tapes

Each experiment employed speech stimuli (naturalor syntheticwords) varying on two cue dimensions for a specific phonetic contrast. One cue--the primary cue--was always temporalin nature andassumed severaldifferent values, whereas the other cae--the secondary cue--assumed only two different values. Two sets of four values of the primary cue were selected: One set of shorter values was intended to span the- phonetic category boundary (Between ,condition), while the other set had longer values intended to fall entirely within the corresponding phonetic category' (Within condition). Because Weber's Law holds, approximately for the discrimination of duration(e.g., Creelman, 1962), and to facilitate discrimination in the more difficult Within condition, the values in the Within stimulus set were spaced farther apart than those inthe Between :set. The two values of the secondary cue were, chosen ao as to be difficult to discriminate but still sufficiently different to generate an observable trading relation.

A fixed-standard AX (same-different) discrimination task was used. This task has several advantages, Which include low stimulus uncertainty (which tends to raise discrimination scores), relatively short test duration,and direct convertibility of the data into d' scores. The stimulus tapes for the Between and Within conditions were identical except for the settings of the primer; cue. The fixed :standard stimulus occurred first in each stimulus pair and was constant throughout each condition; it had the shortest ,slue of the primary cue and the more conflicting of the two values of the secondary cue

168 171 (i.e., that value which, more than the other value, favored the same phonetic categoryas did an increase in primary cue duration). Eachcondition contained four blocks of stimulus pairs. The first block of 48 pairs was for practice only: On halfthe trials, the standard was paired with itself; on the other half, it was followed by that stimulus which had the longest value of the primary cue but the same value as the standard of the secondary cue. In other words, the practice block_ contained only identical and (relatively easy) 1-cue trials. The first test block of 72 pairs contained the same pairs as the practice block plus 24 2-cue trials. On these latter trials,the difference in the primary cue between the standard and comparison stimuli was the same.as on 1-cu' trials,but therewas an added difference inthe secondary cue whose setting in the comparison stimulus "conflicted" with its longer value of the primary cue, thus makingdiscrimination more difficult if (and only if)the two cues engaged in the predicted trading relation. The remaining two test blocks of 72 trials each were similar exceptthat the magnitude of the difference in the primary cue was reduced, thin' making the task increasingly more difficult. Thiswasdonetocounteractpossible ceiling effects due to individual differences in discrimination accuracy and in phonetic boundary locations. It also served to explore a range of stimulus differences, since it was-not known in advance how well naive subjects would perform in this task.

The standard and comparison stimuli in a pair were separated by 500 msec of silence. The interpair interval W83 2 sec, and t'Aere were longer pauses between blocks.

Procedure

The subjects were tested individually or in small groups. The stimuli were presented over TDH-39 earphones at a comfortable intensity. Al), subjects listened first to the Within condition, followed by the Between condition and by a repetition of the Within condition. The repetition served to investigate whether experience with phonetic contrasts in the Between condition had any effect on subjects' strategies in the Within condition; it also gave a second chance to those subjects who n0L-d this condition very difficult the first time. Inall experiments except the first, the 'discrimination tests were followed by a brief labeling test in which the seven different stimuli used in the Between condition were presented 10 times in random order. (The labeling test for Exp.1 was administered at the e.sd of Exp. lib.) This test was added to verify the trading relation between the two cues.

Instructions were kept to a minimum. The subjects were told about the geneal procedure and a t the relative dirficulty of the task. They were not informed about the fference between the two experimental conditions (except that' the :stimuli uld be 'slightly different), and they were not told the relevant phonetic lapels or the auditory cue dimensions that varied. Rather, they were ,eft to discover theft by themselves as they listened to the 48 practice trials. For these trials only, the correct responses (s, d) were printed on the answer sheet, and the subjects merely checked them off as they went along.' It was hoped that, after this experience, the subjects would have some idea of the difference to listen for(i.e., that in the primary cue dimension). They were told that the differences in the subsequent test blocks were of ste some kind, but that they would get smaller in magnitude. They

1 72 169 were not informed about the introduction of another kind of difference (that in the secondary cue dimension) or of the consequent itai:ease in the true proportion, f "different" trials from 50 to 67 percent,ut it was mentioned thatisy, kind of difference perceived warranted a response of "different." Clearly, the procedure was designed to focus the subjects' attention on the primary cuei since only this cue varied in the practice block.

The subjects responded by writing down "s" or "d" on each trial, guessing if necessary. After each of the three test conditions, they were interviewed about their impressions and strategies. In the final labeling test, they chose from the two relevant categories (which they were told) and wrote down their responses in abbreviated form.

Analysis

Individual subject scores in each test block were converted into d' values, taking the proportions of "different" responses on 1-cue and 2-cue trials, respectively, as separate hit rates, and the proportion of "different" responses on trials of identical stimuli as the joint false-alarm rate.

Proportions of 0 and 1 were treated as .01 and .99, respectively, thus limiting d' to a maximum value of 4.66.
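As an illustration of this conversion, the d' values can be computed from standard signal-detection formulas as sketched below; this is not the authors' original analysis software, and the example proportions are hypothetical.

    from statistics import NormalDist

    def d_prime(hit_rate, false_alarm_rate):
        # Clip proportions of 0 and 1 to .01 and .99, as described above;
        # with exact z-values this caps d' at about 4.65 (4.66 with rounded tables).
        clip = lambda p: min(max(p, 0.01), 0.99)
        z = NormalDist().inv_cdf
        return z(clip(hit_rate)) - z(clip(false_alarm_rate))

    # Hypothetical block: 20/24 "different" responses on 1-cue trials,
    # 6/24 "different" responses on identical-stimulus pairs.
    d_1cue = d_prime(20 / 24, 6 / 24)   # about 1.64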

Three analyses of variance were conducted on subjects' d' scores in each experiment. The first analysis was on the Between condition only, with the factors Cues (1-cue vs. 2-cue) and Blocks (three levels of difficulty). The second analysis was on the Within condition only, with the factors Repetitions, Cues, and Blocks. (In Exp. 3, only the second repetition was analyzed.) The absence of any interactions between Repetitions and the other factors justified the combination of the two repetitions for the third analysis, which compared the Between and Within conditions with the factors Conditions, Cues, and Blocks. The critical effect in this last analysis was the Conditions by Cues interaction, which was expected to reveal whether or not the same trading relation (or other response pattern) held in the two conditions.

EXPERIMENT 1: "SAY"-"STAY"

The trading relation studied here concerned, as the primary cue, the amount of silence following the fricative noise and, as the secondary cue, the onset frequency of the first formant (F1) following the silence. This trading relation, which is similar to that for "slit"-"split" studied by Fitch et al. (1980), has been previously investigated by Best et al. (1981): Less silence is needed to change "say" to "stay" when F1 starts at a lower frequency. Best et al. confirmed this trading relation in two different discrimination tests (oddity and variable-standard AX). These tests actually included some within-category trials along with between-category trials, and the trading relation could be seen to disappear within the "stay" category. However, this result is not conclusive, since it may reflect a floor effect and is based on rather few responses. It is interesting to note, however, that the similar data of Fitch et al. (1980) for the "slit"-"split" contrast, although they are open to the same objections, actually suggest a reversal of the trading relation in the within-category regions: Whereas the ordering of performance on the three types of trials was cooperating cues > one cue > conflicting cues in the phonetic boundary region, it changed to cooperating cues = conflicting cues > one cue (at chance) within categories. This is exactly the pattern one should expect from a trading relation that is specific to phonetic perception.

This expectation was further confirmed by Best et al. (1981) in an elegant study with sine-wave analogs of "say"-"stay" stimuli. Subjects who reported that they heard the sine-wave stimuli as (highly unnatural) tokens of "say" or "stay" exhibited the same trading relation between silence duration and F1(-analog) onset frequency as was observed with the speech stimuli, whereas those subjects who heard the sine-wave stimuli as nonspeech showed a radically different pattern of responses that suggested that they paid selective attention to variations in one or the other cue. They neither integrated the cues into a unitary percept, nor did the settings of the unattended cue have much effect on the perception of the attended cue.

Given these rather convincing results, the present re-investigation of the "say"-"stay" contrast served not only to replicate the findings of Best et al. but also to validate the new procedure. The prediction was, then, that the trading relation between silence duration and F1 onset frequency would be observed in the Between condition but not in the Within condition.

Method

Subjects. Eleven volunteers were recruited by announcements on the Yale University campus and were paid for their participation. Most of them had served in earlier speech perception experiments. A different group of 9 subjects (those of Exp. 4b) took the brief labeling test.

Stimuli. The stimuli were hybrids composed of a natural-speech [s] noise followed by a synthetic periodic portion. The [s] noise derived from a male speaker's utterance of [se]. The periodic portion was produced on the OVE IIIc serial resonance synthesizer at Haskins Laboratories, following formant specifications provided by Best et al. (1981) in their Figure 1 (speaker S38). The fricative noise was 212 msec long, with a gradually rising amplitude over the first 170 msec and a rapid fall thereafter. The duration of the synthetic periodic portion was 300 msec. It had a fairly abrupt onset and a fundamental frequency that fell linearly from 110 to 80 Hz.

The two stimulus portions were concatenated after both had been digitized at 10 kHz using the Haskins Laboratories PCM system. The primary cue was the amount of silence between them. In the Between condition, the standard stimulus had no silence at all ("say"), and the comparison stimuli had 30, 20, and 10 msec, respectively, on "different" trials in the three test blocks. In the Within condition, the standard had 70 msec of silence ("stay"), and the comparison values were 130, 115, and 100 msec. The "say"-"stay" boundary was expected to be in the vicinity of 20 msec of silence. The secondary cue was the onset frequency of F1 in the periodic portion. On 1-cue trials, it was 200 Hz, whereas, on 2-cue trials, it was raised to 299 Hz--a cue favoring "say" and thus "conflicting" with the longer silence cue in the comparison stimuli. The difference in F1 between the two versions of the periodic stimulus portions gradually diminished over the first 40 msec (the extent of the F1 transition).

Figure 1. Results of Experiment 1. (Panels, left to right: Between condition, Within condition, and Labeling; abscissa: silence duration in msec; standard and comparison values as given in the text.)

Results

The results are shown in Figure 1. The first panel shows average d' scores in the Between condition. Discrimination performance was high in the first block but decreased rapidly as the difference in the primary cue was reduced, F(2,20) = 24.4, p < .001. As predicted from the trading relation between the primary and secondary cues, performance was higher on 1-cue than on 2-cue trials; however, this difference did not reach significance due to high intersubject variability, F(1,10) = 3.7, p < .10. The Blocks by Cues interaction was likewise nonsignificant.

The second panel of Figure 1 shows the results of the Within condition. These results represent the combined (i.e., averaged d') scores of the two repetitions of this condition, which exhibited highly similar response patterns. Performance was only slightly better in the second run, F(1,10) = 4.0, p < .10; no factor interacted with Repetitions. Discrimination scores started at a lower level in this condition than in the Between condition, even though the difference in the primary cue was twice as large. Performance declined over blocks, F(2,20) = 14.2, p < .001, and this effect did not interact with Cues. Most importantly, the difference between the two types of trials was reversed here, performance being better on 2-cue than on 1-cue trials, F(1,10) = 12.1, p < .01. This reversal was confirmed by a significant Conditions by Cues interaction in the joint analysis of the Between and Within conditions, F(1,10) = 6.6, p < .05.

The third panel of Figure 1 shows the labeling data for the stimuli used in the Between condition, obtained from a different group of subjects. One listener perceived all stimuli as "say" and was excluded. The data of the remaining eight listeners confirm that the standard stimulus (no silence) was heard as "say" and that the "say"-"stay" boundary fell between 20 and 25 msec, as expected. The labeling data also exhibit the trading relation between the two cues, with fewer "stay" responses to the 2-cue (i.e., conflicting-cues) stimuli. However, this difference once more did not reach significance because of high intersubject variability, F(1,7) = 4.0, p < .10.

Discussion

Basically, the results confirmed the predictions: A trading relation between the two cues appeared, though not very reliably, in the region of the "say"-"stay" boundary, whereas it was clearly absent within the "stay" category. This suggests, in accordance with the findings of Best et al. (1981), that the trading relation between silence duration and F1 onset frequency is phonetic, rather than auditory, in origin.

The present data are somewhat weakened by the nonsignificance of the trading relation in the Between condition and in the labeling task. However, we must also consider that (1) the difference in the secondary cue was rather small and (2) the stimuli were presented in a discrimination paradigm that may have facilitated the detection of auditory stimulus differences in the Between condition, even more so as this condition was preceded by the Within condition, which required auditory discrimination of similar differences. Any phonetic trading relation between the relevant cues (or, rather, its manifestation as superior performance on 1-cue trials) would be weakened by auditory discrimination beyond the detection of phonetic differences, since auditory discrimination bestows an advantage on 2-cue trials. Therefore, the critical result is the change across conditions in the relation between 1-cue and 2-cue discrimination--a change that was significant in the present experiment.

It is conceivable, of course, that an auditory trading relation between silence duration and F1 onset frequency exists when the silence is short but not when it is long. The most plausible form of this hypothesis would be that the presence of a silent interval is more difficult to detect when F1 has a higher onset, but that the perceived duration of longer silent intervals is not affected by F1 onset frequency. This hypothesis is consistent with the present data, but it seems unlikely in view of the Best et al. (1981) findings. Specifically, these authors found that subjects who perceived sine-wave analogs of "say"-"stay" stimuli nonphonetically and focused on the silence cue were not at all affected by F1(-analog) onset frequency, even when the silence durations were in the short range.

In the Best et al. study, it was found that listeners who followed an auditory strategy focused on one cue and ignored the other. In the present Within condition, selective attention to the silence cue would have resulted in equal scores on 1-cue and 2-cue trials, both declining over blocks, whereas selective attention to the spectral cue would have resulted in much better performance on 2-cue than on 1-cue trials, with no decline in 2-cue discrimination performance over blocks. However, no subject exhibited this second pattern, and few exhibited the first. Thus, the average data (Fig. 1) are fairly typical of the individual subject; they are not an artifact of averaging over subjects with radically different strategies. It seems likely, then, that the present subjects took both cues into account, even though the practice trials encouraged selective attention to the primary cue and subjects' reports indicated that they had little awareness of the (rather small) difference in the secondary cue. In that case, the higher scores on 2-cue than on 1-cue trials simply show that stimuli differing on two dimensions are easier to discriminate than stimuli differing on one dimension only, which is perfectly plausible and consistent with the relative auditory independence of the two cues shown by Best et al. (1981). Their finding that subjects paid selective attention to one or the other cue was probably due to their paradigm, an AXB classification task in which the two cues were perfectly correlated in the reference stimuli (A, B). Thus, their subjects were encouraged to select one cue and ignore the other, redundant one; in fact, this strategy simplified the subjects' task. The present discrimination task, on the other hand, while it emphasized the silence cue, encouraged listeners to pay attention to all possible stimulus differences. The ability of subjects to make use of both cue dimensions in one task is not inconsistent with their ability to select only one of them in a different task, since either strategy may be followed with independent auditory dimensions.

It should be noted that the advantage of 2-cue over 1-cue trials in the Within condition did not increase over blocks (as might be expected if subjects began to direct their attention to the secondary cue as the difference in the primary cue got smaller) but remained constant at about 0.3 d', which provides an estimate of the (rather poor) discriminability of the secondary-cue difference, assuming that the discriminabilities of the two cues were additive. Another feature of the data worth mentioning is the apparent convergence of the 1-cue and 2-cue scores in the last block of the Between condition. Although this effect was not significant, it was quite clearly exhibited by several individual subjects. Note that the phonetic trading relation between the cues is expected to disappear not only within the "stay" category but also within the "say" category--a situation approximated by the last block of the Between condition.

EXPERIMENT 2: "SAY SHOP"-"SAY CHOP"

The trading relation investigated in this experiment involved the same primary cue as in Experiment 1, the duration of the silence, but a different secondary cue--the duration of the fricative noise following the silence. The trading relation between these two cues was demonstrated by Repp et al. (1978): More silence was needed to turn "say shop" into "say chop" when the fricative noise was long than when it was short.

This trading relation has much in common with that of Experiment 1; however, it does involve two cues varying along the same physical dimension (duration), which makes an auditory interaction perhaps more likely than between a temporal and a spectral dimension. For example, there might be a contrastive effect here, such that a long fricative noise makes the preceding silence sound relatively short (or vice versa), which would lead to the observed trading relation. The present study put this hypothesis to a test, using the same paradigm as Experiment 1. If there is an auditory interaction between the two temporal cues, then it should surface regardless of whether or not subjects perceive phonetic contrasts.

Method

Subjects. Ten volunteers participated, two of whom had also been subjects in Experiment 1 and some of whom had previously been subjects in Experiment 3b.

Stimuli. The stimuli were created on the OVE synthesizer. Formant parameters were copied from a spectrogram of "say shop" produced by a male speaker (as used in Repp et al., 1978). Synthetic stimuli were used because it turned out to be difficult to change the duration of a natural fricative noise without audible clicks or other discontinuities. The initial "say" portion was followed by a variable silent interval, a fricative noise of variable duration, and a 125-msec final periodic portion ("op") whose first 10 msec overlapped the last 10 msec of the fricative noise. The fricative noise reached maximum amplitude after 50 msec. Fundamental frequency rose from 85 to 100 Hz during the "say" portion and fell from 100 to 90 Hz during the "op" portion.

The primary cue was the amount of silence preceding the fricative noise. In the Between condition, the standard stimulus had no silence at all ("say shop"), and the comparison stimuli had 30, 20, and 10 msec, respectively, on "different" trials in the three test blocks, just as in Experiment 1. In the Within condition, the standard had 40 msec of silence ("say chop"), and the comparison values were 100, 80, and 60 msec. The "say shop"-"say chop" boundary was expected to be in the vicinity of 20 msec of silence. The secondary cue was the duration of the fricative noise in the second syllable. On 1-cue trials, its duration was 110 msec, whereas, on 2-cue trials, it was 130 msec, thus biasing perception more towards "say shop." The duration of the noise was changed at the synthesis stage by extending its central steady-state portion. The stimulus tapes were recorded directly from the synthesizer, without digitization of stimuli, so the fricative noise waveforms exhibited natural random variability across tokens.

Figure 2. Results of Experiment 2. (Panels, left to right: Between condition, Within condition, and Labeling; abscissa: silence duration in msec; standard and comparison values as given in the text.)

Results

The results are shown in Figure 2. The first panel shows that the average performance level in the Between condition was similar to that in Experiment 1 (where the same values of silence had been employed), with a similarly striking decline over blocks, F(2,18) = 11.8, p < .001. However, there was no difference between 1-cue and 2-cue trials; in other words, the trading relation did not emerge.

In the Within condition (second panel of Fig. 2), performance was somewhat lower despite the larger differences in the primary cue. Performance declined over blocks, F(2,18) = 16.9, p < .001. In addition, however, accuracy on 2-cue trials was a good deal better than on 1-cue trials, F(1,9) = 32.3, p < .001. This difference seemed to increase over blocks, but the Cues by Blocks interaction did not reach significance. There was no significant effect involving Repetitions. The joint analysis of the Between and Within conditions revealed a significant Conditions by Cues interaction, F(1,9) = 22.4, p < .002, which confirmed the different effects that addition of a secondary cue had in the two conditions.

The labeling results (third panel of Fig. 2), obtained from the same group of subjects, revealed that the standard was always heard as "say shop" and that the phonetic category boundary fell between 20 and 25 msec, as expected. However, there was also the expected trading relation, with more "say chop" responses to stimuli containing the shorter noise, F(1,9) = 16.9, p < .01. Thus, the trading relation was exhibited in labeling but not in Between discrimination.

The reliability of the pattern of results shown in Figure 2 was confirmed by the results of the author and his research assistant, who took the test as pilot subjects. Both showed the pattern in especially clear form: no trading relation in the Between condition but a large advantage for 2-cue trials in the Within condition.

Discussion

Except for the complete absence of a trading relation in the Between condition, the present data are quite similar to those of Experiment 1, suggesting that the trading relation between silence and fricative noise durations is similar to that between silence duration and F1 onset frequency, and that both are phonetic in origin. Both, of course, concern the perception of the same phonetic contrast--stop manner. As in Experiment 1, the critical finding is the Conditions by Cues interaction, which reflects the change in the difference between 1-cue and 2-cue trials across conditions. The absence of a trading relation in the Between condition is probably due to listeners' detection of auditory differences in addition to the phonetic contrast. Since the difference in the secondary cue was more noticeable here than in Experiment 1 (as suggested by the larger difference between 1-cue and 2-cue trials in the Within condition), the resulting auditory advantage for 2-cue trials may have completely canceled the advantage for 1-cue trials due to the phonetic trading relation in the Between condition.

The difference between the 1-cue and 2-cue d' functions in the Within condition suggests that the discriminability of the secondary cue difference was modest at the outset and increased to 0.9 d' in the last block, where discrimination on 1-cue trials was at chance. Although this increase did not reach significance, it does suggest that some subjects directed their attention towards the noise duration difference as the silence duration difference got smaller. The data also suggest, surprisingly, that the difference between a 110-msec and a 130-msec noise was much easier to detect than the difference between a 40-msec and a 60-msec silence (Within condition, last block). This finding contradicts Weber's Law: the 20-msec difference is a much larger fraction of the 40-msec silence than of the 110-msec noise, so the silence difference should have been the easier one. It therefore indicates that silence and noise durations are not equivalently represented on the subjective temporal dimension.

An auditory hypothesis compatible with the present data would be that the detection of silence is not affected by the duration of a following noise segment, while the perceived duration of a longer silence is increased when the duration of the noise is increased. The direction of this hypothetical effect does not seem right, but at present there is no direct evidence against this hypothesis. The relevant psychoacoustic experiments remain to be done.

EXPERIMENT 3: "GOAT"-"COAT"

This experiment was concerned with a trading relation reported by Repp (1979): When voice onset time (VOT) is used as the primary cue to the voicing of an utterance-initial stop consonant, less increase in VOT is needed to turn a voiced stop into a voiceless one when the amplitude of the aspiration noise (whose duration is the VOT) is reduced. This trading relation is different in two important respects from those investigated in Experiments 1 and 2. First, the two interacting cues are both properties of the same signal portion, viz., of the aspiration noise that precedes voicing onset. Second, it appears that there is no good articulatory rationale for this trading relation. Although the relevant measurements have not been done, it seems likely that the amplitude of aspiration, measured at a fixed distance from the release, would be about the same in voiced and voiceless stops. It is true, of course, that voiced stops have a much shorter period of aspiration, and this necessary covariation of aspiration duration and time-integrated amplitude may be sufficient to account for the perceptual trading relation. Still, the articulatory explanation seems less compelling than that for other effects, where different cues can be shown to be acoustically diverse consequences of the same articulatory act (cf. Repp et al., 1978). Moreover, there are well-known instances of trade-offs between duration and amplitude at the auditory threshold and in judgments of loudness (e.g., Garner & Miller, 1947; Small, Brandt, & Cox, 1962). For these reasons, the present trading relation may well be auditory in origin. If so, it was predicted to occur in both conditions of Experiment 3; that is, performance was expected to be higher on 1-cue than on 2-cue trials in both the Between and Within conditions.

Experiment 3 was run twice. The first run (Exp. 3a) was only partially successful because the stimuli in the Between condition turned out to have missed the boundary (their VOTs were too long), so that the Between condition was effectively another Within condition. Also, the VOT differences were rather small, so that the subjects found the task very difficult. Therefore, a replication (Exp. 3b) was conducted with shorter VOT values in the Between condition and larger VOT differences. Results from both runs will be reported. The labeling test was administered at the end of Experiment 3b.

Method

Subjects. Eight volunteers participated in Experiment 3a. All of them had previously been subjects in Experiment 1. There were nine subjects in Experiment 3b, two of whom had also been in Experiment 3a.

Stimuli. In contrast to the previous stimuli, the present ones were modified natural speech. A female speaker recorded the words "goat" and "coat." They were digitized at 10 kHz, and a VOT continuum was constructed by first replacing the burst and aspiration portions of "goat" (22 msec) with the first 22 msec of "coat" and by then substituting additional equivalent amounts of aspiration noise from "coat" (VOT = 66 msec) for each successive pitch period of "goat." For a detailed description of this procedure, see the appendix in Ganong (1980).

Stimuli from this continuum were used in the Between condition only. For the Within condition, where VOTs longer than that of the natural "coat" were required, the stimuli were generated by a different procedure. Note that, in the method described above, total stimulus duration remains constant as VOT is increased, while the periodic stimulus portion is progressively shortened. This is standard procedure for VOT continua and probably does not matter when relatively short VOTs are to be discriminated. However, when VOTs are made rather long, little is left of the periodic portion, and informal observations have shown that removal of even a single pitch period may become perceptually quite salient. That is, subjects may discriminate such stimuli not on the basis of VOT but on the basis of changes in the duration and intonation of the "vowel." To prevent this from happening in the present Within condition, the periodic stimulus portion was held constant, and VOT was further increased by duplicating randomly selected segments of the final portion of the aspiration noise, where the formant transitions presumably were close to asymptote.
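A rough sketch of this VOT-lengthening step is given below. It is an assumed reconstruction, not the original Haskins software; the segment length, the size of the "final portion," and the placement of the duplicated segments at the end of the noise are illustrative simplifications.

    import numpy as np

    def lengthen_aspiration(noise, extra, tail_fraction=0.3, seg=80, rng=None):
        # noise: 1-D array of aspiration-noise samples (10-kHz sampling assumed);
        # extra: number of samples to add by duplicating randomly chosen
        # segments drawn from the final portion of the noise, where the
        # formant transitions are presumably near asymptote.
        rng = rng or np.random.default_rng(0)
        tail_start = int(len(noise) * (1 - tail_fraction))
        pieces = [noise]
        added = 0
        while added < extra:
            i = int(rng.integers(tail_start, len(noise) - seg))
            pieces.append(noise[i:i + seg])
            added += seg
        return np.concatenate(pieces)[:len(noise) + extra]

    # e.g., extend a 73-msec noise (730 samples at 10 kHz) by 35 msec:
    # longer_noise = lengthen_aspiration(noise, extra=350)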

Thus, the stimuli in the Between condition had a total duration of 228 msec (VOT plus periodic portion), with the periodic portion diminishing as VOT increased, whereas the stimuli in the Within condition had a constant periodic portion of 155 msec, and total duration increased with VOT. All stimuli included, in addition, a rather powerful final [t] release burst of approximately 112 msec duration, which was separated from the end of the periodic portion by a 133-msec silent closure interval.

Figure 3. Results of Experiments 3a and 3b. (Panels, left to right: Between condition, Within condition, and Labeling; abscissa: VOT in msec; standard and comparison values as given in the text.)

The primary cue in this study was, of course, VOT (i.e., the duration of the aperiodic portion at stimulus onset). In the Between condition of Experiment 3b (that of Experiment 3a will not concern us here, since performance was at chance), the standard had a VOT of 38 msec (which seems rather long but was still heard as "goat"), and the comparison stimuli had VOTs of 55, 49, and 44 msec, respectively. In the Within condition of Experiment 3b, the standard had a VOT of 73 msec ("coat"), and the comparison stimuli had values of 108, 98, and 85 msec, respectively. In Experiment 3a, the same standard was used, but the comparison stimuli had values of 98, 91, and 85 msec. The secondary cue was the amplitude of the aperiodic stimulus portion. On 2-cue trials, it was reduced by 6 dB SPL in the comparison stimulus, counteracting the longer VOT of that stimulus. This manipulation was performed on the digitized waveform, using computer instructions.
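The 6-dB reduction described above amounts to scaling the waveform samples by a factor of 10^(-6/20), about 0.50. A minimal sketch (hypothetical variable names, not the original computer instructions):

    import numpy as np

    def attenuate(samples, db=6.0):
        # Reduce amplitude by `db` decibels; 6 dB roughly halves the waveform amplitude.
        return samples * 10.0 ** (-db / 20.0)

    # Applied only to the aperiodic (aspiration) portion of the 2-cue comparison:
    # stimulus[:vot_samples] = attenuate(stimulus[:vot_samples])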

Results

Within-category discrimination of the "goat"-"coat" stimuli proved to be a difficult task for naive subjects. One problem seemed to be to discover the dimension on which the stimuli differed. (Recall that the nature of the difference was not revealed in the instructions but had to be detected during the practice block.) In Experiment 3a, performance in the first presentation of the Within condition was close to chance (average d' = 0.31), and there was no difference between 1-cue and 2-cue trials. A similar result was obtained in the Between condition where, because of inappropriately long VOTs, all but one subject heard only "coat" and performed at chance level. The single subject who appeared to be able to make use of phonetic contrasts performed quite well and had higher scores on 1-cue than on 2-cue trials, in accord with the expected trading relation. Prompted by subjects' complaints over the difficulty of the task, the experimenter told them before the repetition of the Within condition what kind of difference to listen for, and he produced exaggerated examples of stops with different amounts of aspiration to illustrate the point. This had a striking effect on (most) subjects' performance. The results from this final condition of Experiment 3a are presented in the second panel of Figure 3 (the functions labeled "a"). It can be seen that performance was better on 1-cue than on 2-cue trials, F(1,7) = 5.7, p < .05. This pattern contrasts with that obtained in the Within conditions of Experiments 1 and 2, where the opposite difference was observed. Due to large variability, neither the decline in performance across blocks nor the Blocks by Cues interaction reached significance.

The subjects in Experiment 3b were told right at the outset to direct their attention to the initial portion of the stimuli; however, they were not told the precise nature of the difference to listen for. Surprisingly, the hint did not help. Performance in the first Within condition was poor, despite the increased VOT differences (average d' = 0.23), and there was no clear difference between 1-cue and 2-cue trials. Therefore, these data were again discarded. However, the choice of VOT values for the Between condition was more successful this time; these results are shown in the first panel of Figure 3. Subjects performed at a level comparable to that in Experiments 1 and 2, although the durational differences were somewhat smaller here. Performance declined over blocks, F(2,16) = 5.6, p < .05. Scores were higher on 1-cue than on 2-cue trials, F(1,8) = 5.5, p < .01, which reflects the expected trading relation.

The results of the repetition of the Within condition are shown in the second panel of Figure 3 (labeled "b"). These subjects, too, were told what difference to listen for before they repeated the Within condition. However, their performance improved less than that of the subjects in Experiment 3a.

Although better than chance, on the average, scores were low and highly variable. Neither the Blocks effect nor the Cues effect was significant; note, however, a tendency for 1-cue discrimination to be higher than 2-cue discrimination. This tendency is supported not only by the results of Experiment 3a but also by the data of a research assistant who served as a pilot subject and showed a striking advantage for 1-cue trials in both conditions. The Cues by Conditions interaction was nonsignificant.

The third panel of Figure 3 shows labeling data deriving from six of the subjects plus the research assistant. (Three subjects had already been tested before it was decided to add the labeling test.) These data confirm that the standard stimulus (VOT = 38 msec) was perceived as "goat," and they also show the expected trading relation, although it fell short of significance, F(1,6) = 5.5, p < .10.

Discussion

The results of this experiment are stronger in terms of what they do not show than in what they do show. The most significant finding is the absence of an advantage for 2-cue trials in the Within condition. The data suggest that, on the contrary, there was an advantage for 1-cue trials in both the Within and Between conditions. This pattern of results is the one expected for a trading relation of psychoacoustic origin. The interaction between aspiration noise duration and amplitude may be similar to other kinds of auditory time-intensity trade-offs.

EXPERIMENT 4: "CHOP"-"SHOP"

The trading relation studied in this last experiment has been known for a long time: it concerns fricative noise duration and rise-time (the time from noise onset to the point of maximum amplitude) as joint cues to the fricative-affricate distinction. Gerstman (1957) showed that to turn an utterance-initial [ʃ] into a [tʃ], the noise duration needs to be shortened more if its rise-time is slow; or, conversely, its rise-time must be shortened more if noise duration is long. Gerstman excluded the rise-time portion from his measure of noise duration, thus confounding total noise duration with the rise-time variable. Van Heuven (1979) recently reanalyzed Gerstman's data and found that total noise duration accounted for nearly all the variance; rise-time made only a small contribution to perception. Still, it can hardly be doubted that amplitude rise-time has some cue value for the fricative-affricate distinction. Although some relevant studies have confounded rise-time with amplitude at onset, which itself may be an important cue (e.g., Dorman, Raphael, & Liberman, 1979: Exp. 5), others have shown rise-time proper to be a sufficient cue (e.g., Cutting & Rosner, 1974; Rosen & Howell, 1981). Thus, it seems likely that rise-time can be traded against noise duration, at least within certain limits.

Like the trading relation investigated in Experiment 3, that between the present two cues engages two properties of the same signal portion. It is possible that these properties interact at the auditory level to determine the perceived duration of the noise, or possibly its perceived abruptness of onset. However, the present trading relation, unlike the one investigated in Experiment 3, also has a good articulatory explanation: Naturally produced fricatives and affricates differ in both noise duration and rise-time. Experiment 4 was expected to shed light on the origin of this trading relation.

Experiment 4 actually consisted of two experiments, identical except for the stimuli. In Experiment 4a, the full "chop"-"shop" stimuli were used. In Experiment 4b, only the fricative noise portions were presented. This second experiment was intended to serve as a kind of nonspeech control for the first, since informal observations had suggested that the isolated fricative noises did not invite phonetic categorization as "sh" or "ch," or in any case were more difficult to label than the full stimuli. It was expected that whatever phonetic effects might be present in the Between condition of Experiment 4a would be absent in the corresponding condition of Experiment 4b.

Method

Subjects. Nine volunteers participated, five of whom had also been subjects in earlier experiments. All subjects took Experiment 4a first, then Experiment 4b on a separate day.

Stimuli. The stimuli were created on the OVE IIIc synthesizer; they were derived from the second halves of the stimuli of Experiment 2. The choice of cue values for the Between condition was guided by Gerstman's (1957) data. The primary cue was fricative noise duration. In the Between condition, the duration was 70 msec for the standard (intended to be heard as "chop") and 100, 90, and 80 msec, respectively, for the comparison stimuli. In the Within condition, the standard had a 140-msec noise ("shop"), and the comparison values were 200, 180, and 160 msec. The secondary cue was the rise-time of the noise. On 1-cue trials, it was 60 msec; on 2-cue trials, it was reduced to 30 msec (favoring "chop" percepts). In each case, the amplitude rise was linear and onset amplitude was set at the minimum value possible in synthesis; amplitude parameter values for the two different rise-times began to diverge after the initial 5 msec. The accuracy of the rise-times was verified by digitizing and displaying the waveforms of the stimuli. Stimulus tapes were recorded directly from the synthesizer to avoid artifacts due to "frozen" noise waveforms.
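For illustration, the two linear onset envelopes can be sketched as follows. This is an assumed reconstruction at a 10-kHz sampling rate with unit peak amplitude, and it ignores the detail that the two envelopes diverged only after the first 5 msec.

    import numpy as np

    def rise_envelope(noise_ms, rise_ms, fs=10000, floor=0.01):
        # Linear amplitude rise from a near-zero floor to full amplitude
        # over `rise_ms`, then constant for the rest of the noise.
        n = int(noise_ms * fs / 1000)
        r = int(rise_ms * fs / 1000)
        env = np.ones(n)
        env[:r] = np.linspace(floor, 1.0, r)
        return env

    env_1cue = rise_envelope(140, 60)   # slow rise, favoring "shop"
    env_2cue = rise_envelope(140, 30)   # fast rise, favoring "chop"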

Results

The results of Experiment 4a are the functions labeled "a" in Figure 4. Performance in the Between condition was again comparable to that in previous experiments: the decline over blocks was significant, F(2,16) = 13.0, p < .001. However, there was no difference between 1-cue and 2-cue trials. A slight advantage for 1-cue trials at the outset changed to a slight advantage for 2-cue trials in the last block, but the Blocks by Cues interaction was not significant.

Surprisingly, the results of the Within condition were remarkably similar to those of the Between condition. There was no significant effect involving the Repetitions factor. Performance declined over blocks, F(2,16) = 26.5, p < .001, and an advantage for 2-cue trials emerged in the second and third blocks. The Blocks by Cues interaction reached significance here, F(2,16) = 4.2, p < .05. This interaction was also obtained in the joint analysis of the Between and Within conditions, F(2,16) = 7.4, p < .01, with no triple interaction involving Conditions, which confirms the similarity of the response patterns in the two conditions.

Figure 4. Results of Experiments 4a and 4b. (Panels, left to right: Between condition, Within condition, and Labeling; abscissa: fricative noise duration in msec; standard and comparison values as given in the text.)

The labeling data were less tidy than in the earlier experiments; in particular, the standard stimulus was not an unequivocal "chop" for all listeners. However, the trading relation between the noise duration and rise-time cues was present and significant, F(1,8) = 21.0, p < .01.

The results of Experiment 4b (fricative noise portions only) are labeled "b" in Figure 4. Performance was strikingly better here than in Experiment 4a. Also, in contrast to Experiment 4a, a large advantage for 2-cue trials can be seen, both in the Between condition, F(1,8) = 19.7, p < .01, and in the Within condition, F(1,8) = 47.9, p < .001. The results had in common with those of Experiment 4a the Blocks by Cues interaction: The advantage for 2-cue trials increased over blocks, particularly in the Between condition, F(2,16) = 11.3, p < .001. The interaction did not reach significance in the Within condition, where it may have been due to a ceiling effect in Block 1. The different patterning of this interaction in the two conditions was reflected in a significant Conditions by Blocks by Cues interaction, F(2,16) = 6.4, p < .01. There was no effect involving Repetitions in the Within condition.

Discussion

The "chop"-"shop" stimuli were -the most problekaticones of th) present set. Not only was the phonetic contrast less 4lear -cut,but the author also noted as a pilot subject that the stimuli were prone to auditorysegregation: After some minutes of listening, the fricative noise would suddenly "stream t away" from the periodic, portion,thereby destroying the speechlikeness and perceptual coherehoe of the stimuli. These observations are'in accord with the results, which show little difference between the Between and Within conditions, suggestingthat listeners may have made little or no use of phonetic labels in Betweaa discrimination. The Blocks by Cues interaction may indicate that :subjects made some use of phOnetic labels in thefirst block of both conditions and abandoned this strategy later. This is not implausible in view of the possibility that the standard stimulus in the Within conditionmay not have been an unequivocal "shop"; it is also supported by the reports of some subjects who claimed to have heard a 0]40) contrast in the Within condition. However, thisinterpretation iscalledinto questionbythe existence of a similar Blocks by Cues interaction in Experiment 4b, where phonetic labeling presumably played no role. 'We may presume, then, that the interaction reflect% a change in auditory strategies: As long as differences in noise duration were large, listeners paid attention to thatcue dimension, and on as the differences got smaller was their attention directed to the rise-time differences as well.

Two aspects of the present results are clear. First, fricative noise duration and rise-time do not seem to engage in an auditory trading relation; otherwise, an advantage for 1-cue trials should have been observed in the Within condition, just as in Experiment 3. Therefore, the trading relation observed in the labeling task is likely to be phonetic in nature, and its failure to show up in Between discrimination may be ascribed to procedural factors and to the above-mentioned stimulus problems. Second, the periodic portion of the "chop"-"shop" stimuli seemed to interfere with auditory memory for the duration of the fricative noise, or with the perception of that duration in the first place: Discrimination was considerably easier when the noises were presented in isolation. Perhaps this difference reflects differently-sized auditory units; it might disappear when the noise is perceptually segregated from the periodic portion, whether as the consequence of prolonged listening or of a listener-controlled strategy. However, Repp (1981a) found that isolated fricative noises differing in spectrum (rather than duration) were more accurately discriminated in isolation than when followed by a periodic portion, even by subjects who were able to perceptually segregate the noise from the periodic portion. Thus, even though the stimulus components could be isolated by perceptual strategies, they were not completely independent in auditory memory.

GENERAL DISCUSSION

Even though the present results must be considered preliminary, they are encouraging, and the technique used promises to provide a relatively effortless way of determining the origin of a trading relation. The postexperimental labeling tests showed the expected trading relations in all cases (although they were not statistically reliable in two). Thus, the stimuli seemed appropriate, even though they had not been formally pretested. However, the expected trading relations were not consistently present in the Between discrimination conditions. In two studies ("say"-"stay," "goat"-"coat"), they showed up, but not very reliably; in the other two ("say shop"-"say chop," "chop"-"shop"), they were definitely absent. The proposed reason for this was that the fixed-standard AX paradigm encouraged listeners to make maximal use of whatever auditory differences they could detect between the stimuli. For example, Carney, Widin, and Viemeister (1977) and Ganong (1977) successfully used the same paradigm to get subjects to discriminate small differences in VOT within a phonetic category. Auditory discrimination, in addition to discrimination based on phonetic labels, would tend to reduce the trading relation observed in the Between condition, unless the trading relation itself is of auditory origin. It also seems that differences in fricative noise duration were relatively salient, which may explain the absence of an advantage for 1-cue trials in the Between conditions for both "say shop"-"say chop" and "chop"-"shop."

The critical data came from the Within conditions of the different experiments. In two studies ("say"-"stay," "say shop"-"say chop"), there was an advantage for 2-cue trials, which contrasted with the pattern of results in the Between condition. This outcome suggests strongly that the trading relations between the relevant cues are phonetic in origin, confirming earlier results by Best et al. (1981) for the "say"-"stay" contrast. These trading relations--between silent closure duration and F1 onset frequency in the case of "say"-"stay," and between silent closure duration and fricative noise duration in the case of "say shop"-"say chop"--are well explained by reference to articulation, since in each case changes in the two cues are tightly correlated in the production of the relevant phonetic contrast. In a third study ("chop"-"shop"), the results were more ambiguous because similar results were obtained in the Between and Within conditions, and the advantage for 2-cue trials was not as clear-cut. However, since a clear trading relation was obtained in the labeling task, the trading relation is likely to be of phonetic origin. The articulatory rationale applies here, too: Both fricative noise duration and rise-time change together in the production of the fricative-affricate contrast. Thus, three of the trading relations investigated appear to be phonetic in nature, and each of them has an articulatory explanation.

Only the "goat" -"coat* stimuli yielded a different pattern. Here, there was an advantage for 1-cue trials in both the Between and Within conditions, suggesting an auditory origin for this trading relation. Significantly, this trading relshionisoleo the only onethat has no obvious articulatory correlates: Aspiration'amplitude ter as does not seem to vary in the voicing contrast for stop consonants. Thus, the present results fit the predicted pattern: A trading relation is phonetic in origin if it has articulatory correlates, but auditory In origin if it does not.

The results of the Within conditions also tell us something about the auditory perception of speech parameters. In some cases ("say"-"stay," "say shop"-"say chop," isolated noises of "chop"-"shop"), the two cue dimensions seemed to be independent and simultaneously accessible to the subjects. In the case of "goat"-"coat," on the other hand, they seemed to interact. This difference is reminiscent of the distinction between "separable" and "integral" stimulus dimensions (Garner, 1974; Lockhead, 1970). Integral dimensions are those where, in order for one dimension to exist, the other must be specified, and where selective attention to one dimension alone is not possible (Garner, 1974). Aspiration noise duration and amplitude seem to fit that description. However, the pairs of cues involved in the "say"-"stay" and "say shop"-"say chop" distinctions do not: they seem to be separable at the auditory level, perhaps because they are also separated in time. In order to prove their auditory separability, it would be necessary to show that they can be selectively attended to, as Best et al. (1981) have done for "say"-"stay." The present task did not require selective attention, although it permitted such a strategy; the subjects, however, seemed to pay attention to both cue dimensions, which is certainly an option with separable cues. It is not clear where the "chop"-"shop" results stand in that regard; they are the ones most in need of replication.

Even though two cues may be auditorily separable, it is significant that they can nevertheless be integrated into a single phonetic percept. Presumably, this is achieved by a higher-level, speech-specific process that combines cues according to implicit knowledge about the articulatory and/or acoustic patterns of speech. It is not necessary to envision this process as one of cue extraction followed by cue combination according to certain rules (the traditional machine metaphor); more vaguely, but probably more appropriately, it may be understood as a consequence of perceiving articulatory change through the acoustic signal, and referring the perceived changes to internal criteria that specify the phonetic categories of the language. If so, it seems likely that attempts to explain phonetic trading relations by auditory psychophysics will, in most cases, remain futile.


REFERENCES

Bailey, P. J., Summerfield, Q., & Dorman, M. On the identification of sine-wave analogues of certain speech sounds. Haskins Laboratories Status Report on Speech Research, 1977, SR-51/52, 1-25.

Best, C. T., Morrongiello, B., & Robson, R. The perceptual equivalence of two acoustic cues for a speech contrast is specific to phonetic perception. Perception & Psychophysics, 1981, 29, 191-211.

Carney, A. E., Widin, G. P., & Viemeister, N. F. Noncategorical perception of stop consonants differing in VOT. Journal of the Acoustical Society of America, 1977, 62, 961-970.

Creelman, C. D. Human discrimination of auditory duration. Journal of the Acoustical Society of America, 1962, 34, 582-593.

Cutting, J. E., & Rosner, B. S. Categories and boundaries in speech and music. Perception & Psychophysics, 1974, 16, 564-570.

Dorman, M. F., Raphael, L. J., & Liberman, A. M. Some experiments on the sound of silence in phonetic perception. Journal of the Acoustical Society of America, 1979, 65, 1518-1532.

Fitch, H. L., Halwes, T., Erickson, D. M., & Liberman, A. M. Perceptual equivalence of two acoustic cues for stop-consonant manner. Perception & Psychophysics, 1980, 27, 343-350.

Ganong, W. F., III. Selective adaptation and speech perception. Unpublished doctoral dissertation, M.I.T., 1977.

Ganong, W. F., III. Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance, 1980, 6, 110-125.

Garner, W. R. The processing of information and structure. Potomac, MD: Erlbaum, 1974.

Garner, W. R., & Miller, G. A. The masked threshold of pure tones as a function of duration. Journal of Experimental Psychology, 1947, 37, 293-303.

Gerstman, L. Cues for distinguishing among fricatives, affricates, and stop consonants. Unpublished doctoral dissertation, New York University, 1957.

Heuven, V. J. van. The relative contribution of rise time, steady time, and overall duration of noise bursts to the affricate-fricative distinction in English: A re-analysis of old data. In J. J. Wolf & D. H. Klatt (Eds.), Speech communication papers presented at the 97th Meeting of the Acoustical Society of America. New York: Acoustical Society of America, 1979.

Kuhl, P. K., & Miller, J. D. Speech perception by the chinchilla: Identification functions for synthetic VOT stimuli. Journal of the Acoustical Society of America, 1978, 63, 905-917.

Liberman, A. M., & Studdert-Kennedy, M. Phonetic perception. In R. Held, H. W. Leibowitz, & H.-L. Teuber (Eds.), Handbook of sensory physiology, Vol. VIII. Heidelberg: Springer-Verlag, 1978.

Lockhead, G. R. Identification and the form of multidimensional discrimination space. Journal of Experimental Psychology, 1970, 85, 1-10.

Pastore, R. E. Possible psychoacoustic factors in speech perception. In P. D. Eimas & J. L. Miller (Eds.), Perspectives on the study of speech. Hillsdale, N.J.: Erlbaum, 1981.

Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. Speech perception without traditional speech cues. Science, 1981, 212, 947-950.

Repp, B. H. Relative amplitude of aspiration noise as a voicing cue for syllable-initial stop consonants. Language and Speech, 1979, 22, 173-189.

Repp, B. H. Two strategies in fricative discrimination. Perception & Psychophysics, 1981, 29, 217-227. (a)

Repp, B. H. Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Haskins Laboratories Status Report on Speech Research, 1981, SR-67/68, this volume. (b)

Repp, B. H., Liberman, A. M., Eccardt, T., & Pesetsky, D. Perceptual integration of acoustic cues for stop, fricative, and affricate manner. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 621-637.

Rosen, S., & Howell, P. Plucks and bows are not categorically perceived. Perception & Psychophysics, 1981, 30, 156-163.

Small, A. M., Jr., Brandt, J. F., & Cox, P. G. Loudness as a function of signal duration. Journal of the Acoustical Society of America, 1962, 34, 513-514.

Summerfield, A. Q., & Haggard, M. P. On the dissociation of spectral and temporal cues to the voicing distinction in initial stop consonants. Journal of the Acoustical Society of America, 1977, 62, 435-448.

PRODUCTION AND PERCEPTION OF PHONETIC CONTRAST DURING PHONETIC CHANGE*

Paul J. Costa+ and Ignatius G. Mattingly

Abstract. Ten productions of each of the two words cod and card in the southeastern subdialect of the Eastern New England dialect, said to differ phonetically only in vowel length (and of a number of foil words involving other phonetic contrasts), were recorded in a neutral carrier sentence by each of nine phonetically naive urban Eastern New England speakers unaware of the purpose of the investigation. Spectrographic measurements revealed fairly consistent differences in the vocalic segment durations of cod and card for most speakers, but no speaker could reliably identify his own intended productions (though identification of foils was perfect). Evidently a phonetic change is in progress, and our results suggest that during such a change, contrasts in production may persist after they have ceased to be perceptually relevant.

It is usually taken for granted in phonetics that, given a regular alternation in the production of two distinct lexical items, these two items will be perceived as different. Labov, Yaeger, and Steiner (1972), however, have reported instances of "near mergers," in which, despite consistent acoustic differences between two phonetic types in a dialect, speakers of the dialect failed informal commutation tests. The purpose of this study is to describe a case in which the usual assumption seems to be contradicted. The case in point is taken from the southeastern subdialect of the Eastern New England dialect--henceforth SENE--spoken in and around Fall River, Massachusetts.

It has been stated by Thomas (1958) and by Kenyon (1937), on the basis of data collected in the '30s for the Linguistic Atlas of New England, that in the case of low vowels, vowel length is distinctive for SENE. For example, there is said to be only a vowel length distinction between the two words cod and card. This implies that the acoustic signals for such pairs may differ solely in the duration of the vocalic segment. We have found, however, that while there is a fairly reliable durational difference in the production of cod and card, speakers of this dialect cannot consistently label their own productions. In other words, the distinction in production is virtually ignored in perception.

*Paper presented at the meeting of the Acoustical Society of America, Ottawa, Canada, May 21, 1981.
+Department of Linguistics, University of Connecticut.
Acknowledgment. Support for the preparation of this paper was provided by a grant from the National Science Foundation.


Figure 1. (Histograms of cod productions, plotted above the horizontal axis, and card productions, plotted below it, as a function of vocalic duration in signed standard-deviation units; darkened portions indicate correct labeling responses. See text.)

In a subsequent meeting with each subject, which took place fromone hour to two days after the production experiment, a perception test was given. The stimuli for each subjectwerethe 100 sentences he had spoken In the production experiment. Word pair* other than card and cod remained as foils. Each subject was asked to write down the test word in each sentence.

Wide band spectrograms were made of the ten tokens of cod and theten tokens of care asspokenbyeachsubject. Three successive durational measurements were noted: I) the Voice onset WA* .forCk),2) the vocalic duration measured from voice onset to Cie (d) closure, and 3) the closure duration for Id). For each speaker the V'T and closure duration varied rrom token to token without a oonsistsnt ratters. On the other hand, the tint seasuresents Far thevocalic) duratio lied a rather consistent pattern. heasurementiv of vocalic duration plus w closure duration, or both, *Oro

use consistent than vocalic duration , Vocalic durlion averages t the ten sneakers for cod ranged remit # 320 mama, While averages for card ranged from 240 to 00 msec, The ditie,_Ae. in the speaker average ranged Pram 30 to 40 SW, In all, three tubjeots bade a definite split in their productions, foursubjects were moderately consistent, andtwowerevery Inconsistent.

In order to pool the data in a way that would exclude, so far as possible, intersubject variation in speaking rate, we represented the vocalic duration of each token in signed units of standard deviation, using the average of each subject's durations for both cod and card as the mean for that subject. Thus, if each subject had produced all his tokens of card with longer durations than any of his tokens of cod, all card tokens would have greater signed values of standard deviation than any cod token.
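For concreteness, the pooling just described can be sketched as follows. This is an illustrative reconstruction in Python, not the original analysis; the function and the sample durations are hypothetical.

    # Illustrative sketch: express each token's vocalic duration as a signed
    # z-score computed within its own speaker, using that speaker's combined
    # cod + card tokens as the reference set.
    import statistics

    def pooled_z_scores(durations_by_subject):
        """durations_by_subject: {subject: {'cod': [msec, ...], 'card': [msec, ...]}}
        Returns (word, z) pairs pooled over subjects."""
        pooled = []
        for tokens in durations_by_subject.values():
            both = tokens['cod'] + tokens['card']      # combine the two words
            mean = statistics.mean(both)               # subject's own mean
            sd = statistics.stdev(both)                # subject's own SD
            for word in ('cod', 'card'):
                pooled.extend((word, (d - mean) / sd) for d in tokens[word])
        return pooled

    # Hypothetical speaker with slightly longer 'card' vowels:
    example = {'S1': {'cod': [210, 230, 220], 'card': [250, 265, 255]}}
    print(pooled_z_scores(example))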

Figure 1 shows the data pooled in this way. The number of cod productions for a particular range of standard deviation values is plotted as a histogram above the horizontal axis. In the same way, card productions are plotted below the horizontal axis. While there is substantial overlap, it is clear that the proportion of cod productions decreases, and the proportion of card productions increases, as the standard deviation goes from extreme negative values (corresponding to relatively short durations) at the left to extreme positive values (corresponding to relatively long durations) at the right. Thus the production data are consistent with the vowel length distinction described by Thomas and by Kenyon: the two words do differ in vocalic duration in production.

Perception is quite a different matter. Individual labeling results break the subjects down into three groups: two speakers with relatively consistent perception, four with inconsistent perception, and three with an overwhelming response bias towards one target or the other.

Figure 1 also shows the pooled labeling data, correct responses being indicated by the darkened portion of each histogram and errors by the white portion. It is obvious that subjects are identifying the intended productions at chance level: they cannot distinguish cod from card.

To determine whether subjects' judgments were influenced by vocalic duration, regardless of what they had intended as speakers, we replotted the same data according to the perceptual judgments. In Figure 2, cod judgments are plotted above the horizontal axis and card judgments below. If duration had influenced these judgments, the proportion of cod responses would have decreased, and the proportion of card judgments would have increased, from left to right with increasing values of standard deviation. But no such correlation appears. Not even the positive and negative extremes of standard deviation are consistently labeled. Thus we have evidence that a distinction reliably made in production has no effect upon perception.

A possible explanation for this curious state of affairs is that since the thirties, when the data were gathered on which Kenyon's and Thomas' descriptions were based, the long and short variants of the vowel have begun to merge in this dialect. If such a linguistic change were in progress, we might indeed expect to find that habits of production persisted after the distinction had ceased to have any linguistic significance. This would mean merely that speakers were wasting effort in distinguishing words that had effectively become homophones. Note that the converse possibility--a linguistic distinction maintained in perception but unsupported in production--is unlikely, since it would result in misunderstandings.

The descriptions of Thomas and Kenyon, however, are based on impressionistic field records, and we cannot be certain that a perceptual distinction existed even when the dialect was described. If it did not, then we cannot conclude that a change is in progress.

But whether a change is in progress or not, there is another way to interpret this phenomenon. The pronunciations of such words as cod and card may function to mark a dialectal rather than a lexical difference. It would be interesting to determine whether subjects could make a dialect judgment on the basis of these words if, and only if, they knew which lexical items were intended. We intend to pursue this question further.

REFERENCES

Kenyon, J. S. American pronunciation. Ann Arbor, Mich.: George Wahr, 1937.
Kurath, H. A phonology and prosody of modern English. Ann Arbor, Mich.: University of Michigan Press, 1964.
Labov, W., Yaeger, M., & Steiner, R. A quantitative study of sound change in progress (Report on Contract NSF-GS-3287). Philadelphia: University of Pennsylvania, U.S. Regional Survey, 1972.
Thomas, C. K. An introduction to the phonetics of American English (2nd ed.). New York: Ronald Press, 1958.


DECAY OF AUDITORY MEMORY IN VOWEL DISCRIMINATION*

Robert G. Crowder+

Abstract. Two experiments on same-different vowel discrimination are reported. In each, the main variable was the duration of a silent delay between the two items being judged. As would be expected from the assumption that such judgments depend at least partly on auditory sensory memory, it was found that longer delays led to poorer discrimination than shorter delays. The auditory memory loss seems to be asymptotic at about three seconds, whether it is measured by correct discrimination or (as in one part of the second experiment) by the influence of the first vowel on identification of the second.

INTRODUCTION

For all the work some of us have done on auditory sensory memory, we know very little about its time course. What evidence there is comes either from scattered reports using totally noncomparable methods or from experimental techniques that are not ideal for addressing the decay question. Still, most experts would probably agree that echoic memory does not remain available forever and that it decays more slowly than iconic memory. There have been two research programs that sought data relevant to the decay question, both using a form of masking to uncover properties of auditory memory.

In Massaro's experiments related to this topic, for example (see Massaro, 1970), a stimulus tone selected from two possibilities is presented for a recognition response (high/low). The phenomenon of interest is that an unrelated masking tone presented just after the test stimulus impairs correct responding in a way that depends on the interval between target and mask. If the mask is delayed by about 250 msec the response is unimpaired, but more immediate masks reduce performance considerably. It is the damage done by the mask that has led Massaro to infer the existence of auditory memory from this demonstration. In the stimulus suffix effect, discovered by Dallett (1965) and elaborated by Crowder and Morton (1969), the target memory trace is a hypothesized package of sound information about the last item in a memory-span type list.

*Also Journal of Experimental Psychology: Human Learning and Memory, in press.
+Yale University, New Haven, CT.
Acknowledgment. I acknowledge with pleasure the assistance of Virginia Walters in this research. Thanks also to Bruno Repp for his comments on an earlier draft of this paper. This research was supported by an NSF grant to R. Crowder, and by NICHD Grant HD-01994 and NIH Grant RR-05596 to Haskins Laboratories.



Performance on that last item is badly damaged by an extra word, the stimulus suffix, presented as if it were the next item in the list. The suffix can be semantically unrelated to the rest of the list and need not be recalled. Again, auditory storage is inferred from the vulnerability of the target (the last memory item in the list) to masking from the suffix.

In both the recognition masking of tones and the stimulus suffix paradigm, increasing the interval between the target and the mask leads to improved performance, up to a point. Massaro (1970) found this improvement reached asymptote at about 250 msec, and Crowder (1969) found that a suffix delayed by more than about 2 seconds had no effect on performance. In both cases, asymptotes were used as estimates of the duration of auditory memory (Crowder, 1969, page 261; Massaro, 1972, page 129). The reasoning was that masking would become ineffective when the target information in the sensory store had decayed. Although this claim by itself is quite true, it is invalid to conclude anything about decay from the time at which masking becomes no longer effective: When the mask is delayed in these paradigms, the sensory trace might remain intact, but meanwhile the subject has had the opportunity to encode the information in it. If the subject has time enough to incorporate the information contained in the sensory trace into some more permanent format, then it makes no difference whether the mask does or does not destroy this information later.

Watkins and Todres (1980; see also Watkins & Watkins, 1980, Experiment 6) have recently reported several experiments on delayed suffixes. They offered evidence that the interval before a delayed suffix was indeed being used by subjects for readout of auditory information into some more permanent form of memory. They also confirmed Crowder's (1971, page 339) speculation that decay might be very much slower than originally conjectured by Crowder and Morton.1 Watkins and Todres have correctly observed that whereas the absence of a suffix effect after some delay says nothing about whether the auditory trace survives that long, the presence of a suffix effect at some delay does suggest the survival of the trace for at least that long. They found that if they prevented subjects from engaging in readout of the target information during the delay (between target and suffix), an appreciable suffix effect was obtained after 20 seconds.

Although the suffix experiment may thus be forced to yield acceptable inferences about the maximal survival time of auditory memory, it is not ideal for this purpose. The portion of performance in the suffix experiment that is interesting for the analysis of auditory memory -- relative performance on the last serial position -- is superimposed on a background of highly complicated and strategy-prone short-term memory functions. For example, to demonstrate the 20-second-delayed suffix effect Watkins and Todres had to engage the subjects in a lively mental arithmetic task between the last memory item and the suffix item. We know that even so modest a task as remembering a series of meaningless items in order engages several types of mechanism -- grouping, cumulative rehearsal, efforts at semantic coding, articulatory loops, and so on -- and many of these mechanisms are quite likely to interact with serial position. Accordingly, it would be a boon to be able to study auditory memory and its decay properties in the context of a simpler task. That is the purpose of the research reported in this paper.

Pisoni (1973) has used a same-different speech discrimination task to study the decay of auditory memory (see also Repp, Healy, & Crowder, 1979; the background for these investigations is covered in Crowder, in press). In this task, the subject hears two speech sounds -- perhaps vowels similar to the /i/ and /I/ in BEET and BIT. The two sounds are typically quite close to each other acoustically, so that perhaps they both sound like one or the other of these two phonetic segments. The subject must decide whether the two are identical physically or not. Especially in the case where both items sound like only one of the possible phonetic segments, the reasoning is that auditory memory must have some role in correct performance. Consider the subject receiving the second of the two items to be judged: If the second item has the same name (phonetic label) as the first, then the only way the subject can tell whether they are physically identical is by remembering the sound of the first until the second arrives.

Pisoni (1973) set the delay between the two vowel stimuli at intervals from one-half to two seconds. He found that performance was poorer at the longer separations, as would be expected if the sound of the first item -- its auditory memory trace -- were decaying during the interval between them. The logic of isolating sensory memory contributions through manipulation of delay in a successive discrimination task is not at all unconventional. Kinchla (1973) observes that such a task provides "...a rather direct approach to 'sensory memory' processes." In his experiment, subjects heard a compound tone and then, after a variable delay, had to make a same/different discrimination between a single probe tone and its corresponding element in the original compound; performance steadily decreased from compound-probe intervals of one-half to two seconds. Hanson (1977) found poorer performance in a "physical match" (same/different) task with interstimulus intervals of 570 msec than 250 msec, using stop-vowel CV syllables.

The present experiments were planned to test other intervals than those Pisoni used, in order to get an estimate of decay rate in auditory memory. This research cannot settle whether the auditory memory believed to support same-different speech discrimination is the same auditory memory that has been studied in the suffix experiments (Precategorical Acoustic Storage); that question needs a different kind of experiment. However, the same-different discrimination task is obviously a more direct and simple context in which to measure the auditory store and thus a useful context in which to ask about decay.

EXPERIMENT 1

Experiment 1 comprised two parts. In the first, there were 10 main conditions defined by 10 stimulus onset asynchronies separating the two items on each trial. These were set at 0, 200, 400, 600, 800, 1000, 1200, 1400, 1600, and 1800 msec. Since the vowels were 300 msec long, the first two of these conditions included physical overlap between the two items. It developed that at least one of the overlap situations was sharply inferior to the longer stimulus onset asynchrony conditions and subjects complained that they were confusing. Thus, after testing 20 subjects in the original design we eliminated the two shortest stimulus onset asynchrony conditions and continued for another 20 subjects.

Method

Stimuli. The stimulus items were three-formant, steady-state, synthetic vowels similar to those used by Repp et al. (1979). Those stimuli spanned the continuum from the vowel /i/ to /I/. The first formant center frequencies ranged from 289 to 397 Hz, the second from 2296 to 2030 Hz, and the third from 3010 to 2432 Hz, all in roughly logarithmic steps, for the continuum of eight. In this study, the fourth and fifth tokens were left out so as to enhance the contrast between within- and between-category decisions. The present set of vowels is Stimuli 1, 2, 3, 6, 7, and 8 from Table 1 of Repp et al. (1979, page 139). The formant bandwidths were 63, 94, and 110 Hz, respectively. The vowels were 300 msec long and were produced on the Haskins Laboratories OVE IIIc synthesizer. Overall amplitude rose sharply over the first 30 msec, then remained uniform until a symmetric fall over the last 30 msec. Fundamental frequency decreased gradually from 125 to 80 Hz throughout the utterance.
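As an aside, "roughly logarithmic steps" between fixed endpoints can be produced as in the sketch below. This is only an illustration of that spacing under the endpoint values quoted above; it is not the actual set of synthesis parameters.

    # Illustration only: eight geometrically spaced F1 values between the
    # quoted endpoints; the real continuum values may differ slightly.
    import numpy as np

    f1_steps = np.geomspace(289, 397, num=8)   # equal ratios between neighbors
    print(np.round(f1_steps, 1))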

A different test tape was prepared for each of the 10 stimulus onset asynchrony conditions. On each, there were 18 pairs of identical tokens (1-1, 2-2, and so on, each repeated three times) where the correct answer was SAME. The other 42 pairs on each tape contained 16 "one step" DIFFERENT trials (1-2, 2-1, 2-3, and so on), 8 "two step" pairs, and 18 more widely spaced DIFFERENT pairs contrasting the /i/ items (1, 2, 3) with the /I/ items (6, 7, and 8). These 60 trial types were arranged on the tape in a different random order for each stimulus onset asynchrony.

Design and Procedure. The subjects in Part One (10 different stimulus onset asynchrony conditions, including 0 and 200 msec) received their tapes in an order determined by a balanced Latin square (complete control over first-order sequential effects). The subjects in Part Two followed the same Latin square design, but the tapes with 0 and 200 msec stimulus onset asynchrony were simply deleted; thus they had 120 fewer trials than the first group of subjects. Instructions were explicit about the experimental design and stressed that the criterion for a "same" response was to be exact physical identity.

Following each trial, there was a five-second pause before the next trial. There were no warning sounds to mark trials or response periods. The subjects had a numbered answer sheet with the letters s and d, which they were supposed to circle, indicating their response for that trial. A practice tape consisting of 6 sample trials was presented after the instructions.

Subjects. The subjects were 40 college-age adults from the New Haven area, some Yale students serving as part of a course requirement and some volunteering to serve for pay.

Results and Discussion

The mean overall proportions correct for the two parts of Experiment 1 (SAME and DIFFERENT trials combined) are shown in the first two columns of Table 1 as a function of stimulus onset asynchrony. For this analysis, the two kinds of trials were not weighted (see the d' analysis, below). Two things are quite clear from inspection: There is some loss in discrimination as a function of delay, as we would expect from the Pisoni result. Secondly, the function is far from asymptotic over the range studied.

Analysis of variance on these data confirmed the reliability of the basic delay effect. For this analysis the two parts were combined and only the data from stimulus onset asynchronies 400-1800 msec were examined. To have included the 0-msec delay would have produced a misleadingly high F ratio because this condition was so extraordinarily poor relative to the others. For the combined data, the delay effect was highly significant statistically, F(7,273) = 5.84, p < .001.

It can be objected that these data may be influenced to some unknown degree by changes in response criteria for calling two items physically identical, across the different delay conditions. Such arguments (see Macmillan, Kaplan, & Creelman, 1977) make the case for analyzing the data in terms of Statistical Decision Theory. Tables have recently become available for transforming variable-standard same/different data into d' (Kaplan, Macmillan, & Creelman, 1978). The task is conceived as one where the subject is set to "detect sameness," and one scores as a false alarm the proportion of SAME responses when the two items were in fact different.2 The d' values relevant to this analysis are shown in the second two columns of Table 1. The conclusions of the conventional analysis are completely sustained by this measure of sensitivity. Analysis of variance based on the 40 subjects from both parts of the experiment, on the conditions they had in common (stimulus onset asynchrony 400 through 1800 msec), confirmed the reliability of the delay effect, F(7,63) = 3.92, MSe = .162, p < .05.3 Thus, no changing criterion for "sameness" across different stimulus onset asynchronies can be held responsible for the declining performance observed here. Note that although bias is not changing over intervals in a way that produces the decay effect, there is an overall strong bias in responding. This is indicated by the large d' values and the relatively low (about 70%) rates of correct responding. The overall probability of saying SAME when the two stimuli were identical was very high, .919, and the corresponding rate of false alarms, SAME given different, was .378. A similar bias was observed in the second experiment. To repeat, the important consideration is that a changing bias cannot account for the result of interest here.
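The scoring convention just described can be made concrete with a small sketch. This is not the Kaplan, Macmillan, and Creelman (1978) variable-standard procedure itself, which requires their tables; it only shows how hit and false-alarm rates were defined and how a conventional Gaussian d' would be computed from them, using invented counts.

    # Sketch only: 'hit' = SAME response to a physically identical pair,
    # 'false alarm' = SAME response to a different pair. A textbook Gaussian
    # d' is shown for illustration; the paper used the variable-standard
    # same/different tables of Kaplan, Macmillan, and Creelman (1978).
    from statistics import NormalDist

    def same_different_rates(same_trials, diff_trials):
        hits = sum(r == 'same' for r in same_trials) / len(same_trials)
        fas  = sum(r == 'same' for r in diff_trials) / len(diff_trials)
        return hits, fas

    def gaussian_dprime(hit_rate, fa_rate):
        z = NormalDist().inv_cdf
        return z(hit_rate) - z(fa_rate)

    # Hypothetical responses pooled over one four-listener 'supersubject':
    responses_on_same_pairs = ['same'] * 92 + ['diff'] * 8
    responses_on_diff_pairs = ['same'] * 38 + ['diff'] * 62
    h, f = same_different_rates(responses_on_same_pairs, responses_on_diff_pairs)
    print(h, f, round(gaussian_dprime(h, f), 2))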

Although the majority of trials in this experiment contained, by design, items from the same phonetic category, there were enough between-category pairs to inspect for a difference between the size of the delay effect on between- and within-category trials. This was done using stimulus pairs as the sampling variable. For each of the 12 within-category pairs where the correct response was "different," the number of errors made by all subjects at stimulus onset asynchronies 600-1000 and 1200-1800 was tallied separately. The reliability of the decay effect for the within-category data was verified by a paired t-test, t(11) = 2.83, p < .01. The same was done for the 18 between-category pairs, and here too the decay effect was reliable, t(17) = 6.43, p < .005. Going by the size of the t values, one might suppose the effect was larger for the between-category pairs, and indeed the raw differences between the short- and long-delay conditions were significantly larger for the between-category pairs than for the within-category pairs, t(28) = 3.00, p < .005. However, the between-category pairs all spanned a larger physical distance than the within-category pairs, and so there were many fewer errors in the former.

Table 1

Discrimination performance in Experiment 1

Stimulus Onset        Proportion Correct            d'
Asynchrony (msec)     Part 1      Part 2      Part 1      Part 2

    0                  .497         --          1.01        --
  200                  .724         --          1.86        --
  400                  .727        .728         3.21       2.94
  600                  .707        .729         3.17       3.51
  800                  .747        .709         3.48       3.01
 1000                  .721        .713         3.02       4.23
 1200                  .718        .713         3.17       3.12
 1400                  .696        .696         2.64       2.74
 1600                  .679        .696         2.79       2.77
 1800                  .682        .684         2.70       2.71

If one wishes to take the ratio of the difference between the short (S) and long (L) intervals relative to the total errors made for a pair -- (L - S)/(L + S) -- the difference is reinforced. By this latter measure, there was a significantly larger delay effect in the between-category data than in the within-category data, t(28) = 5.40, p < .005. The conclusion has to be that delay does not have a larger effect on within-category pairs than on between-category pairs. This was the outcome of the Pisoni (1973) and Repp et al. (1979) studies, too.
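A worked example of the relative measure may help; the error counts below are invented purely to show the arithmetic and why normalization by total errors matters.

    # Hypothetical error counts at the Short and Long intervals for two pairs:
    # an easy (between-category-like) pair with few errors, and a hard
    # (within-category-like) pair with many errors.
    easy_S, easy_L = 6, 18
    hard_S, hard_L = 40, 50

    raw_easy, raw_hard = easy_L - easy_S, hard_L - hard_S          # 12 vs. 10
    rel_easy = (easy_L - easy_S) / (easy_L + easy_S)               # 0.50
    rel_hard = (hard_L - hard_S) / (hard_L + hard_S)               # ~0.11
    print(raw_easy, raw_hard, round(rel_easy, 2), round(rel_hard, 2))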

The component of performance in discrimination that can be assigned to auditory memory has quite plainly not reached asymptote by the longest interval tested in Experiment 1. The main purpose of Experiment 2 was to expand the range of intervals tested.

EXPERIMENT 2

The stimulus onset asynchrony values used in this second study were 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000 msec. In most other respects the experiment was similar to Experiment 1, except for one additional feature: Experiment 2 also included a complete run through the materials for each subject in which identification, rather than discrimination, was required. Repp et al. (1979) had found that the items within a pair exerted mutually symmetrical, contrastive effects on labeling. That is, they observed that when members of a pair were being labeled phonetically as /i/ or /I/, the identity of the other pair member influenced the item being labeled. The effect was contrastive, which means that if an ambiguous vowel between /i/ and /I/ were presented, hearing it in the context of an unambiguous /i/ made it sound more like /I/. That the effect was symmetrical means that the first item in the pair influenced the second about as much as the other way around. Repp et al. suggested that these context effects on phonetic labeling were caused by mechanisms within auditory memory, because in conditions where auditory memory was removed by delay or by masking, little contextual influence was found.

By analogy, the contrastive effects in phonetic labeling can be compared with visual brightness contrast: A given shade of gray appears brighter if it occurs in the context of a dark background than it appears in a light background. For any successive contrast to work, it might be suggested that the two items would have to reside together in memory. If so, then we can understand why Repp et al. found less contrast when they compromised the auditory storage of items during the interstimulus interval. It follows that contrast effects could be used as an independent measure of the duration of auditory memory.

A word should be added about what causes contrastive context effects. The generalization of importance is that one vowel affects the label applied to another provided they are different and provided they occupy auditory memory together. In recent publications (Crowder, 1978, in press) I have begun to elaborate a theory that covers these findings. The central assumption relevant to context effects (Crowder, in press) is that auditory-memory representations interact by frequency-specific inhibition of each other. That is, if auditory memory representations of two items occur close together in time, and on the same channel, they will tend to inhibit each other, and this inhibition will be greatest in spectral regions where they contain overlapping energy. If two vowels are similar except for the placement of one or two formants, this frequency-specific inhibition will produce contrast: The formants associated with the vowels used here have very considerable overlap relative to their center frequency differences. This means that two vowels' formants will have an area of intersection and each will have an area not in common with the other. If inhibition between them is frequency specific, the intersection of the vowels' formants will suffer the most, leaving the non-intersecting formant area in each vowel relatively intact. Since the non-intersecting regions were what made the two vowels distinctive in the first place, eliminating the region in common will enhance their distinctiveness mutually, and will lead to contrastive identification. See Crowder (in press) for further explanation. This interpretation is consistent with a theory that applies equally well to the suffix and vowel-discrimination tasks and covers essentially all known evidence on the suffix effect (Crowder, 1978).
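A toy numerical sketch of this idea follows. It is a minimal illustration of my own, with made-up formant values and a crude Gaussian spectrum; it assumes that "inhibition" removes the portion of the two spectra that overlaps (their pointwise minimum) and shows that what remains is more distinctive than the originals.

    # Toy illustration of frequency-specific mutual inhibition (not the model's
    # actual implementation): two similar vowel spectra lose their shared
    # energy, and the residues are more distinct than the originals.
    import numpy as np

    freqs = np.linspace(200, 3000, 600)            # frequency axis in Hz

    def spectrum(formants, bw=90.0):
        """Crude spectrum: sum of Gaussian peaks at hypothetical formant values."""
        return sum(np.exp(-0.5 * ((freqs - f) / bw) ** 2) for f in formants)

    v1 = spectrum([290, 2290, 3010])               # an /i/-like vowel (invented)
    v2 = spectrum([400, 2030, 2430])               # an /I/-like vowel (invented)

    shared = np.minimum(v1, v2)                    # overlapping energy
    r1, r2 = v1 - shared, v2 - shared              # residues after inhibition

    def similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(round(similarity(v1, v2), 3), round(similarity(r1, r2), 3))
    # The residues are less similar than the originals, i.e., more contrastive.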

Method

Stimuli. A different set of vowels was used in Experiment 2, primarily in order to increase the generality of the research. The 13-item continuum used in this study crossed the vowel space in such a path as to include approximate prototypes of /ɑ/, /ʌ/, and /æ/, which correspond to the vowel sounds in COT, CUT, and CAT, respectively. To achieve this, the formant frequencies shown in Table 2 were set on the OVE IIIc synthesizer. Included in Table 2 are the pooled identification results when each of the thirteen tokens was paired with itself -- that is, on SAME trials collapsed over inter-item delays. These data show that the subjects were quite willing to accept this as a three-vowel continuum. In other respects, the stimulus items were similar to those of Experiment 1.

Each test tape contained 34 pairs, of which 13 were SAME trials (1-1, 2-2,...,13-13), 11 were two-step DIFFERENT trials (1-3, 2-4,...,11-13), and 10 were three-step DIFFERENT trials (1-4, 2-5,...,10-13). It was arbitrarily decided to use only DIFFERENT trials that ascended in terms of the numbering of Table 2 (that is, 1-4, but not 4-1). These 34 pair types were randomly ordered 10 times and placed on tapes otherwise differing only in the stimulus onset asynchrony -- 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, or 5000 msec. The interval between trials was 4 seconds.
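For bookkeeping, the 34 pair types can be enumerated as in the sketch below; this is only an illustration of the counts just given, not part of the original materials.

    # Bookkeeping sketch: the 34 ascending pair types on each Experiment 2 tape.
    same_pairs       = [(i, i)     for i in range(1, 14)]   # 13 SAME pairs
    two_step_pairs   = [(i, i + 2) for i in range(1, 12)]   # 11 two-step DIFFERENT
    three_step_pairs = [(i, i + 3) for i in range(1, 11)]   # 10 three-step DIFFERENT
    all_pairs = same_pairs + two_step_pairs + three_step_pairs
    assert len(all_pairs) == 34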

Design and Procedure. The subjects went through the 10 tapes twice, first in an identification experiment and second in a same/different discrimination experiment. In the former, they were instructed to listen carefully to the second stimulus in each pair and to identify it by circling one of three words (COT, CUT, or CAT) on a numbered answer blank. It was expected that the first item in each pair would provide a contextual influence on this labeling, to the extent the two items occupied auditory memory together. The 10 tapes were presented in a balanced Latin square order.

In the second part of the experiment, the same 10 tapes were presented to each subject in the reverse order to that used in the first part. This time the instructions were to make a same/different judgment for each pair, based on the same criteria explained in the previous experiment. Again, a practice tape was used to provide familiarity with the sounds.

TABLE 2

Stimuli used in Experiment 2

Stimulus        Formant Structure (Hz)        Labels on SAME Trials
Number          F1      F2      F3            /ɑ/      /ʌ/      /æ/

 1  /ɑ/         728    1091    2431           .969     .015     .015
 2              713    1107    2431           .964     .031     .005
 3              702    1123    2431           .967     .023     .010
 4              687    1139    2431           .918     .072     .011
 5              668     --     2431           .667     .323     .010
 6              653     --     2396           .536     .459     .005
 7  /ʌ/         639    1189    2396           .182     .813     .003
 8              644     --     2396           .026     .933     .041
 9              644     --     2396           .023     .795     .182
10              649    1456    2396           .010     .436     .334
11              653    1543    2413           .005     .221     .774
12              658    1635    2413           .003     .038     .969
13  /æ/         658    1719    2413           .003     .005     .990


Subjects. The subjects were 40 young adults from the same source as in Experiment 1.

Results and Discussion: Discrimination

The discrimination results are given in Figure 1, which shows the overall proportion of correct same/different judgments as a function of stimulus onset asynchrony. As in the first experiment, performance began to drop sharply between one and three seconds. However, the function shows little change after three seconds, suggesting that auditory memory -- to the extent it represents a decaying source of information for same/different responding -- has been lost by three seconds.

The same picture is provided by the d' analysis shown in Figure 2. If anything, the results are cleaner when corrected this way for possible criterion effects. Statistical analysis confirmed the reliability of the findings in Figures 1 and 2. Separate analyses of the untransformed error types, "same" on DIFFERENT trials and "different" on SAME trials, showed that each component of the pooled errors in Figure 1 was statistically significant, F's (9,531) = 4.82 and 6.49, respectively (MSe's = 4282.1, 1860.1), p's < .0001. Analysis of variance on d' used supersubjects of four individuals each. There were 10 of these supersubjects, and the variance associated with stimulus onset asynchrony was highly significant, F(9,81) = 7.55, MSe = 2163.74, p < .001. As in Experiment 1, there was no evidence that the delay was more potent for the within- than for the between-category pairs: In this study, the identification results provided only five pairs that could convincingly be called within-category (1-3, 1-4, 2-4, 7-9, and 11-13 -- see Table 2). One of these showed reduced errors from the short- to the long-interval conditions while the other four showed increased errors. The between-category pairs, however, showed reliable and consistent delay effects, t(15) = 3.78, p < .005. As before, the auditory component was not by any means restricted to the cases where the items being discriminated match in phonetic category.

Performance remained quite good even after the component being attributed here to auditory memory had decayed to asymptote. However, not too much importance should be attached to the specific levels of correct responding. These reflect, among other things, the mixture of easy, three-step discriminations (where performance ranged from .875 to .825) and the more difficult two-step discriminations (.670 to .580). Furthermore, there was a strong bias for responding "same," as is evident in the correct "same" responses on trials where the two items were identical, where hits ranged from .960 in the 500-msec stimulus onset asynchrony condition to .875 in the 4000-msec condition. Corresponding "same" responses on DIFFERENT trials ranged from .235 in the 500-msec condition to .312 in the 3000-msec condition. The mean proportions in Figure 1 thus represent none of the exact performance levels obtained. The important thing, of course, is the regularity of the data and not the absolute levels of accuracy.

Correct performance was also influenced by the particular items being discriminated along the continuum from /ɑ/ through /ʌ/ to /æ/ (see Table 3).

Figure 1. Proportion of correct responses, overall, in Experiment 2.

Figure 2. Same/different discrimination sensitivity (d') as a function of stimulus onset asynchrony in Experiment 2. Each point represents performance of ten supersubjects based on four individuals apiece.

TABLE 3

Proportion correct same/different discrimination (combined data)

                                DIFFERENT
SAME                   Two-Step                Three-Step
Pair    Proportion     Pair    Proportion      Pair     Proportion

 1-1      .945         1-3       .140          1-4        .373
 2-2      .947         2-4       .167          2-5        .657
 3-3      .940         3-5       .465          3-6         --
 4-4      .937         4-6       .515          4-7        .747
 5-5      .870         5-7       .490          5-8        .883
 6-6      .880         6-8       .767          6-9        .957
 7-7      .855         7-9       .885          7-10       .987
 8-8      .920         8-10      .843          8-11       .973
 9-9      .917         9-11      .825          9-12       .975
10-10     .917         10-12     .887          10-13      .975
11-11     .897         11-13     .885
12-12     .910
13-13     .957

Table 3 shows the proportion correct overall for each of the SAME and DIFFERENT pairs used in the experiment. Quite clearly, the /æ/ end of the continuum was easier than the /ɑ/ end. These differences no doubt reflect the spacing of tokens shown in Table 2. However, the important question is whether the main decay results were general across these stimuli, which differed widely otherwise in discrimination difficulty. The answer is reassuring: Among the 13 types of SAME trials (1-1, 2-2, and so on), performance at the shortest interval was better than performance at the longest interval in ten cases, with one tie and two reversals, p = .019 by a sign test. Among the 21 DIFFERENT trial types, there were 17 pairs showing the same difference, with one tie and three reversals, p = .006 by a sign test. Thus the extreme variability in pair difficulty is another reason for skepticism about the absolute values of the means shown in Figures 1 and 2, but it does not discount the generality of the time profile shown there.
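The sign tests reduce to binomial tail probabilities once ties are dropped. The sketch below is only an illustration; it uses the SAME-trial counts reported above and reproduces a probability of about .019.

    # Sign test as a one-tailed binomial tail probability, ties excluded.
    from math import comb

    def sign_test_p(successes, n):
        """P(X >= successes) for X ~ Binomial(n, 0.5)."""
        return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

    # SAME trial types: 10 of 12 nontied comparisons favored the shortest interval.
    print(round(sign_test_p(10, 12), 3))   # ~0.019, as reported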

One might very well wonder whether the group asymptote of 3000 msec is representative of the performance of many individual subjects. The analyses of variance reported here insure that the decay effect generalized across variability due to subjects, and evidence has been presented, above, for such generality across items. But the generality of the asymptote requires stronger arguments. There are not enough data for each subject to calculate individual regressions of performance on delay. However, the values for the ten supersubjects could be inspected across the ten delays for that purpose. As a rough estimate of where these ten functions reached asymptote, the interval with the lowest d' was determined. For one supersubject, this minimum was at 500 msec stimulus onset asynchrony; for another, it was at 5000; and for two each of the remaining eight, it fell at 3000, 3500, 4000, and 4500 msec. This near-rectangular distribution of the minima is consistent with the generalization that performance does not change after 3000 msec.

Results and Discussion: Identification

The identification results from SAME trials have already been displayed in Table 2. These data are collapsed over stimulus onset asynchrony but, as will be seen presently, stimulus onset asynchrony did not matter for the SAME trials. The identification data of Table 2 show there were two boundaries -- that between /ɑ/ and /ʌ/ falling between stimuli 6 and 7, and the one between /ʌ/ and /æ/ falling between stimuli 9 and 10. The question is now whether these boundaries shifted when subjects were identifying the exact same tokens but in the context of a prior item from "higher up" on the numbered continuum of Table 2 (recall that the prior context always came from this direction).

To replicate the Repp et al. finding of contrast, the present results would have to show that a given token sounded as though it came from "lower down" on the continuum if it occurred on a DIFFERENT trial than if it came on a SAME trial. In terms of boundary locations, this means the boundaries would shift in the opposite direction -- to a smaller numerical value.

The data relevant to this point are shown in Figure 3, which gives a summary of context effects. Here, the identification data are broken down into the different stimulus onset asynchrony conditions -- grouped by twos for stability. Boundaries were estimated by linear regression on stimuli 3-8 for the /ɑ/-/ʌ/ transition and on stimuli 8-12 for the /ʌ/-/æ/ transition.4 The two boundaries associated with the three phonetic segments were collapsed in such a way that the numerical boundary measures on the vertical axis show the mean stimulus number of the two boundaries.
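One way to estimate such a boundary is sketched below. The text does not spell out the regression variables, so fitting identification proportions against stimulus number and solving for the 50% crossover should be treated as an assumption; the proportions come from the SAME-trial columns of Table 2.

    # Sketch (assumed procedure): fit a straight line to the proportion of
    # /ʌ/ labels across stimuli 3-8 and take the 50% crossover as the
    # /ɑ/-/ʌ/ boundary.
    import numpy as np

    stimuli  = np.array([3, 4, 5, 6, 7, 8], dtype=float)
    prop_cut = np.array([.023, .072, .323, .459, .813, .933])   # /ʌ/ labels, Table 2

    slope, intercept = np.polyfit(stimuli, prop_cut, 1)          # least-squares line
    boundary = (0.5 - intercept) / slope                         # stimulus number at 50%
    print(round(boundary, 2))   # about 5.8, in line with the SAME-trial boundary in Figure 4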

Figure 3 shows clearly that for SAME trials, the stimulus onset asynchrony made no difference. However, on DIFFERENT trials, boundary locations shifted in the expected direction -- toward lower numerical values -- when there had been a recent context item. If the stimulus onset asynchrony was longer than three seconds, it was as if there had been no context at all, but at shorter intervals context changed the labels applied to the second member of the pair. The convergence of phonetic labeling on SAME and DIFFERENT trials at three seconds is consistent with the suggestion that contrast operates when the two items in question occupy auditory memory together. The particular time interval at which these data converge is in approximate agreement with the estimate of asymptotic decay that was based on discrimination, reported above.

Statistical analyses confirmed the reliability of the picture presented in Figure 3. For the short stimulus onset asynchronies combined (500 msec through and including 2500 msec), 26 out of 37 nontied subjects placed stimuli 6 and 9 farther down the numbered continuum on DIFFERENT trials than on SAME trials, p < .01.5 The context effect was surprisingly general across stimuli as well as across subjects: For the short and long intervals, as defined above, each of the 11 stimuli labeled in a DIFFERENT context (numbers 3, 4,...,13) was given a mean "placement score" along the continuum. This placement was simply a weighted average of the three phonetic labels assigned by subjects.6 The same placement score was available from the SAME trials.
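The placement score can be sketched as follows, using the 1-2-3 coding described in footnote 6; the response counts in the example are invented.

    # Sketch of the placement score: code /ɑ/, /ʌ/, /æ/ responses as 1, 2, 3
    # and average them for a given stimulus in a given context condition.
    def placement_score(label_counts):
        """label_counts: counts of 'a' (/ɑ/), 'uh' (/ʌ/), and 'ae' (/æ/) responses."""
        coding = {'a': 1, 'uh': 2, 'ae': 3}
        total = sum(label_counts.values())
        return sum(coding[lab] * n for lab, n in label_counts.items()) / total

    # Hypothetical stimulus labeled on SAME vs. DIFFERENT trials:
    same_trials = {'a': 20, 'uh': 18, 'ae': 2}
    diff_trials = {'a': 12, 'uh': 24, 'ae': 4}
    print(placement_score(same_trials), placement_score(diff_trials))
    # A higher score on DIFFERENT trials means placement farther down the
    # continuum, i.e., a contrastive shift away from the lower-numbered context.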

The question was whether a given stimulus item would receive lower (that is, farther down the list) placement on the DIFFERENT trials than on the SAME trials. At the short intervals, this was the result for 9 of the 11 items, p = .033. Furthermore, 10 of the 11 items also showed the full pattern of Figure 3 -- a bigger directional difference between SAME and DIFFERENT trials at the short than at the long intervals, p = .006. Thus, the contrastive context effects on labeling generalize both across subjects and across individual vowel tokens.

It is somewhat surprising that the context effects proved so consistent across stimuli. One would have expected primarily the ambiguous items to show influence of context. Therefore, further analyses were undertaken to examine the relation between the degree of the context effect and the position of a stimulus on the continuum. For this purpose, only the short (500 to 2500 msec) stimulus onset asynchrony data were used. For each of the 11 stimuli that were labeled on DIFFERENT trials, two placement scores were compared, one on SAME trials and one on DIFFERENT trials. A positive difference means the vowel in question showed different phonetic labeling, in the predicted direction, when it followed another vowel from the continuum. These differences in placement are shown in Figure 4 in arbitrary numbers that reflect the calculation of placement scores. The figure makes obvious that, although all but two items showed a "positive" context effect, as reported above, the size of the context effect was related in an interpretable fashion to the category boundaries derived from SAME trials.

Figure 3. The relation between boundary placement and stimulus onset asynchrony for SAME and DIFFERENT trials in the phonetic identification phase of Experiment 2. Each number on the vertical axis represents the mean of the two boundary values.

Figure 4. The relation between stimulus number and the size of context effects on labeling. A high positive score means a particular vowel was labeled as coming from farther away from its prior context vowel than it would have been if that prior context had been the same vowel itself. The arrows show combined category boundaries for SAME trials (at 5.80 and 9.96).

There were two peaks in the contextual influence, and they coincide closely with the two category boundaries. In other words, as one might expect, it was the ambiguous items that were most susceptible to context.

GENERAL DISCUSSION

The main goal of these studies was to provide parametric data on the decay of auditory sensory memory. The results give a consistent estimate that this decay is asymptotic at close to three seconds, for the successive discrimination task used here. The phonetic labeling data of Figure 3 show another manifestation of auditory memory -- context influence on identification -- and this influence disappears at just the same time.

Experiments using related techniques to investigate memory for tones (for example, Harris, 1952; Moss, Myers, & Filmore, 1970) do not necessarily converge on the same estimate; however, there are typically not enough intervals studied in these experiments to establish an asymptote, and, even if there were, the stimuli and tasks are different enough to discourage comparison. On the other hand, the estimate of three seconds is close to the value suggested by Crowder and Morton (1969), even though that estimate was only a shot in the dark.

Although the high performance levels in these experiments demonstrate that other factors besides transient auditory memory support performance in this task setting, it is a relatively uncomplicated task compared to the suffix experiment. If further research suggests that the successive vowel discrimination task used here taps the same auditory memory store that has been so extensively studied in the suffix experiment, it may be advisable to focus on the former rather than the latter in future work, because it is so much more direct a method. Perhaps the least encouraging evidence on this point is the finding of Watkins and Todres (1980) and of Watkins and Watkins (1980) that suffix-like effects occur following filled delays of up to 20 seconds. It will be for further research to clarify what the boundary conditions on this delayed suffix effect are and to establish whether it has the same functional properties as the immediate suffix effect, such as sensitivity to phonetic class and to physical source channel.

The most intuitively plausible model for how auditory memory is used in speech discrimination is that subjects try first to make a same/different decision based on phonetic labels and, only after that has failed, turn to consult auditory memory. The rule is "If the two sounds have different names, say 'different'; otherwise compare the sounds themselves." This model (see Crowder, in press, and Pisoni, 1973, for details) is apparently wrong. It anticipates that effects owing to auditory memory would be stronger in the within-category discriminations than in the between-category discriminations. Neither the present studies, the results of Pisoni (1973), nor those of Repp et al. (1979) gave evidence for the predicted interaction.
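Stated as a procedure, the two-stage model that the data argue against amounts to the sketch below. The representation of the "sounds" (here, single numbers on the stimulus continuum) and the similarity threshold are placeholders of my own.

    # Sketch of the 'labels first, auditory memory second' decision rule.
    def auditory_similarity(sound1, sound2):
        return 1.0 - abs(sound1 - sound2) / 12.0   # toy trace comparison, 13-step continuum

    def two_stage_same_different(label1, label2, sound1, sound2, threshold=0.95):
        if label1 != label2:                       # stage 1: compare phonetic labels
            return 'different'
        return 'same' if auditory_similarity(sound1, sound2) >= threshold else 'different'

    # A within-category pair (both labeled /ɑ/) forces reliance on the auditory trace:
    print(two_stage_same_different('a', 'a', 3, 5))    # -> 'different'
    print(two_stage_same_different('a', 'a', 3, 3))    # -> 'same'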

Perhaps subjects adopt some private categorical discrimination that does not match the conventional phonetic categories but nonetheless serves a similar role in performance on the within-category pairs. After listening to the items in the stimulus ensemble for some time, subjects might very well develop such private categories and use them to compare the two sounds. In that case, with "functional categories" constructed for the minimal within-category pairs, it would not be so surprising that there was about the same auditory influence in the within- and between-category judgments.

Does the three-second estimate from this research suggest any functional role for auditory memory outside the narrow task confines of this procedure? Of course, there are now only the most preliminary efforts to connect laws of information processing to real-time language processing. However, Stevens (1978, page 14) has noted the relationship between sentence-level utterances and breathing. He observes some relation between syntactic structure and the pauses introduced by a speaker for the inspiration of breath. As Stevens notes, the mechanics of breathing limit sentences, or other major syntactic structures, to a length of not more than two or three seconds. Thus, the three-second figure is of some linguistic interest in ways that may be related to speech production or comprehension. But this comment is no more than suggestive: for one thing, the echoic decay estimate comes from a situation where the trace is held in complete silence, whereas the two- to three-second limit associated with the breath group is typically filled with speech.

REFERENCES

Crowder, R. G. Improved recall for digits with delayed recall cues. Journal of Experimental Psychology, 1969, 82, 258-262.
Crowder, R. G. Waiting for the stimulus suffix: Decay, delay, rhythm, and readout in immediate memory. Quarterly Journal of Experimental Psychology, 1971, 23, 324-340.
Crowder, R. G. Sensory memory systems. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception (Vol. VIII). New York: Academic Press, 1978.
Crowder, R. G. The role of auditory memory in speech perception and discrimination. In T. Myers, J. Laver, & J. Anderson (Eds.), The cognitive representation of speech. Amsterdam: North-Holland Publishing Co., in press.
Crowder, R. G., & Morton, J. Precategorical acoustic storage (PAS). Perception & Psychophysics, 1969, 5, 365-373.
Dallett, K. M. "Primary memory": The effects of redundancy upon digit repetition. Psychonomic Science, 1965, 3, 237-238.
Hanson, V. L. Within-category discriminations in speech perception. Perception & Psychophysics, 1977, 21, 423-430.
Harris, J. D. The decline of pitch discrimination with time. Journal of Experimental Psychology, 1952, 43, 96-99.
Kaplan, H. L., Macmillan, N. A., & Creelman, C. D. Tables of d' for variable-standard discrimination paradigms. Behavior Research Methods & Instrumentation, 1978, 10, 796-813.
Kinchla, R. A. Selective processes in sensory memory: A probe-comparison procedure. In S. Kornblum (Ed.), Attention and performance IV. New York: Academic Press, 1973.
Macmillan, N. A., Kaplan, H. L., & Creelman, C. D. The psychophysics of categorical perception. Psychological Review, 1977, 84, 452-471.
Massaro, D. W. Preperceptual auditory images. Journal of Experimental Psychology, 1970, 85, 411-417.
Massaro, D. W. Preperceptual images, processing time, and perceptual units in auditory perception. Psychological Review, 1972, 79, 124-145.
Moss, S. M., Myers, J. L., & Filmore, T. Short-term recognition memory of tones. Perception & Psychophysics, 1970, 7, 369-373.
Pisoni, D. B. Auditory and phonetic memory codes in the discrimination of consonants and vowels. Perception & Psychophysics, 1973, 13, 253-260.
Repp, B. H., Healy, A. F., & Crowder, R. G. Categories and context in the perception of isolated steady-state vowels. Journal of Experimental Psychology: Human Perception and Performance, 1979, 5, 129-145.
Sorkin, R. D. Extension of the theory of signal detectability to matching procedures in psychoacoustics. Journal of the Acoustical Society of America, 1962, 34, 1745-1751.
Stevens, K. N. The speech signal. In J. F. Kavanagh & W. Strange (Eds.), Speech and language in the laboratory, school, and clinic. Cambridge, MA: MIT Press, 1978.
Watkins, M. J., & Todres, A. K. Suffix effects manifest and concealed: Further evidence for a 20-second echo. Journal of Verbal Learning and Verbal Behavior, 1980, 19, 46-53.
Watkins, O. C., & Watkins, M. J. The modality effect and echoic persistence. Journal of Experimental Psychology: General, 1980, 109, 251-278.

FOOTNOTES

1Our statement (Crowder & Morton, 1969, page 366) was that a store lasting at least on the order of a "few seconds" would be adequate for the function -- the role we had proposed for auditory memory.

2Sorkin (1962) has shown why a straightforward application of standard tables to the same/different situation is inappropriate.

3For purposes of getting d' values, subjects were combined into "supersubjects" of four because many individual hit rates were close to or at 1.00. The mean data look essentially the same whether overall hit and false alarm rates are taken before calculation of d' or the latter is calculated for each supersubject. For the purpose of statistical tests on d' values, however, it is convenient to set up the supersubjects first.

4A check on the data of Table 2 will verify that labeling performance within these ranges is unambiguously linear for the group data.

5These two stimuli, 6 and 9, were chosen because they represent performance minima on the identification function; each lies closest to its respective category boundary and should therefore be maximally ambiguous and, hence, especially subject to context.

6Specifically, the three identification responses /ɑ/, /ʌ/, and /æ/ were assigned the numbers 1, 2, and 3, respectively. The total response to a stimulus for a given subject could then be characterized as an average of the numbers assigned to it. These averages were then compressed to a range from .33 to 1.00 for analysis.

THE EMERGENCE OF PHONETIC STRUCTURE*

Michael Studdert-Kennedy+

Abstract. To explain the unique efficiency of speech as an acoustic carrier of linguistic information and to resolve the paradox that units corresponding to phonetic segments are not to be found in the signal, consonants and vowels were said to be "encoded" into syllabic units. This approach stimulated a decade of research into the nature of the speech code and of its presumably specialized perceptual decoding mechanisms, but began to lose force as its implicit circularity became apparent. An alternative resolution of the paradox proposes that the signal carries no message: it carries information concerning its source. The message, that is, the phonetic structure, emerges from the peculiar relation between the source and the listener, as a human and as a speaker of a particular language. This approach, like its predecessor and like much recent work in child phonology and phonetic theory, takes the study of speech to be a promising entry into the biology of language.

The earliest claim for the special status of speech as an acoustic signal sprang from the difficulty of devising an effective alternative code to use in reading machines for the blind. Many years of sporadic, occasionally concentrated, effort have still yielded no acoustic system by which blind (or sighted) users can follow a text much more quickly than the 35 words a minute of skilled Morse code operators. Given the very high rates at which we handle an optical transform of language in reading and writing, this failure with acoustic codes is particularly striking. Evidently, the advantage of speech lies not in the modality itself, but in the particular way it exploits the modality. What acoustic properties set speech in this privileged relation to language?

The concept of "encodedness" was an early attempt to answer this question (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). Liberman and his colleagues embraced the paradox that, although speech carries a linguistic message, units corresponding to those of the message are not to be found in the signal. They proposed that speech should be viewed not as a cipher on linguistic structure -- offering the listener a signal isomorphic, unit for unit, with the message -- but as a code. The code collapsed the phonemic segments (consonants and vowels) into acoustic syllables, so that cues to the component segments were subtly interleaved.

*Also in Cognition, 1981, 10, 301-306.
+Also Queens College and Graduate Center, City University of New York.
Acknowledgment. I thank Alvin Liberman, Ignatius Mattingly, and Bruno Repp for much fruitful discussion and advice. Preparation of the paper was supported in part by NICHD Grant HD-01994 to Haskins Laboratories.


The function of the code was to finesse the limited temporal resolving power of the ear. We typically speak and comfortably understand speech at a rate of 10-15 phonemes/second, close to the rate at which discrete elements merge into a buzz. By packaging consonants and vowels into syllabic units, the argument went, we reduce this rate by a factor of two or three and so bring the signal within the resolving range of the ear.

This complex code called for specialized decoding mechanisms. More than a decade of research was devoted to establishing the existence of a specialized phonetic decoding device in the left cerebral hemisphere and to isolating the perceptual stages by which the supposed device analyzed the syllable into its phonetic components. This information-processing approach to speech perception exploited a variety of experimental paradigms that had seemed valuable in visual research (see Darwin, 1976, and Studdert-Kennedy, 1976, 1980, for reviews), but led eventually to a dead end, as it gradually became apparent that the undertaking was mired in tautology. A prime example was the proposal to "explain" sensitivity to features, whether phonetic or acoustic, as due to feature-detecting devices, and to look for evidence of such mechanisms in infants.

Current research has drawn back and is now moving along two different, though not necessarily divergent, paths. The first bypasses the problems of segmental phonetic perception and focuses on what some believe to be the more realistic problem of describing the contributions of prosody, syntax, and pragmatics to understanding speech. The second path, with which I am concerned, reverses the procedure of the earlier encoding approach. Instead of assuming that linguistic units should somehow be represented as segments in the signal and then attempting to circumvent the paradox of their absence by tailoring a perceptual mechanism for their extraction, the new approach simply asks: What information does the speech signal, in fact, convey? If we could answer this question, we might be in a position not to assume and impose linguistic structure, but to describe how it emerges.

Consider the lexicon of an average middle-class American child of six years. The child has a lexicon of about 13,000 words (Miller, 1977), most of them learned over the previous four years at a rate of 7 or 8 a day. What makes this feat possible? Of course, the child must want to talk, and the meanings of the words she learns must match her experience: cat and funny, say, are more likely to be remembered than trepan and surd. But logically prior to the meaning of a word is its physical manifestation as a unit of muscular action in the speaker and as an auditory event in the listener. Since the listening child readily becomes a speaker, even of words that she does not understand, the sound of a word must, at the very least, carry information on how to speak it. More exactly, the sound reflects a pattern of changes in laryngeal posture and in the supralaryngeal cavities of the vocal tract. The minimal endowment of the child is therefore a capacity to reproduce a functionally equivalent motor pattern with her own apparatus. What properties of the speech signal guide the child's reproduction?

We do not know the answer to this question. We do not even know the appropriate dimensions of description. But several lines of evidence suggest that the properties may be more dynamic and more abstract than customary descriptions of spectral sections and spectral change. For example, some half dozen studies have demonstrated "trading relations" among acoustically noncommensurate portions of the signal (e.g., Liberman & Pisoni, 1977; Repp, Liberman, Eccardt, & Pesetsky, 1978; Fitch, Halwes, Erickson, & Liberman, 1980). Perhaps the most familiar example is the relation between onset frequency of the first formant transition and delay in voicing at the onset of a stop consonant-vowel syllable: reciprocal variations in spectral structure and duration of delay produce equivalent phonetic percepts (Summerfield & Haggard, 1977). Presumably, the grounds of this and other such equivalences lie in the articulatory dynamics of natural speech, of which we do not yet have an adequate account. (For a review of studies of this type, see Repp, 1981.)

A second line of evidence comes from studies of sine-wave speech synthesis. Remez, Rubin, Pisoni, and Carrell (1981) have shown that much, if not all, of the information for the perception of a novel utterance is preserved if the acoustic pattern, stripped of variations in overall amplitude and in the relative energy of formants, is reduced to a pattern of modulated sine waves following the approximate center frequencies of the three lowest formants. Here, it seems, nothing of the original signal is preserved other than changes, and derivatives of changes, in the frequency positions of the main peaks of the vocal tract transfer function (cf. Kuhn, 1975).
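The idea of a sine-wave replica can be illustrated with a minimal synthesis sketch. The formant tracks below are invented, and this is only a cartoon of the general technique, not the Remez et al. procedure.

    # Minimal cartoon of a sine-wave 'speech' replica: three sinusoids that
    # follow hypothetical center-frequency tracks of the lowest three formants,
    # with no harmonic structure or amplitude detail from the original signal.
    import numpy as np

    sr, dur = 16000, 0.5
    t = np.linspace(0, dur, int(sr * dur), endpoint=False)

    # Invented formant tracks (Hz) gliding over the utterance:
    f1 = np.linspace(300, 700, t.size)
    f2 = np.linspace(2200, 1200, t.size)
    f3 = np.linspace(3000, 2500, t.size)

    def fm_sine(freq_track):
        phase = 2 * np.pi * np.cumsum(freq_track) / sr    # integrate frequency
        return np.sin(phase)

    replica = sum(fm_sine(f) for f in (f1, f2, f3)) / 3.0  # three-tone replica
    print(replica.shape, float(replica.max()))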

Finally, several recent audio-visual studies have shown that phonetic judgments of a spoken syllable can be modified if the listener simultaneously watches a video presentation of a face mouthing a different syllable: for example, a face uttering [ga] on video, while a loudspeaker presents [ba], is usually judged to be saying [da] (McGurk & MacDonald, 1976; Summerfield, 1979). The phonetic percept, in such a case, evidently derives from some combination of abstract, dynamic properties that characterize both auditory and visual patterns.

Moreover, infants are sensitive to dynamic correspondences between speech heard and speech seen. Three-month-old infants look longer at the face of a woman reading nursery rhymes if auditory and visual displays are synchronized than if the auditory pattern is delayed by 400 milliseconds (Dodd, 1979). This finding evidently reflects more than a general preference for audiovisual synchrony, since six-month-old infants also look longer at the video display of a face repeating a disyllable that they hear (e.g., [lulu]) than at the synchronized display of a face repeating a different disyllable (e.g., [mama]) (MacKain, Studdert-Kennedy, Spieker, & Stern, Note 1).

The point here is not the cross-modal transfer of a pattern, which can be demonstrated readily in lower animals. Rather, it is the inference from this cross-modal transfer, and from the other evidence cited, that the speech signal conveys information about articulation by means of an abstract (and therefore modality-free) dynamic pattern. The infant studies hint further that the infant learns to speak by discovering its capacity to transpose that pattern into an organizing scheme for control of its own vocal apparatus.

Here we should note that, while the capacity to imitate general motor behavior may be quite common across animal species, a capacity for vocal imitation is rare. We should also distinguish social facilitation and general observational learning from the detailed processes of imitation, evidenced by the cultural phenomenon of dialects among whales, seals, certain songbirds, and humans. Finally, we should note that speech (like musical performance and, perhaps, dance) has the peculiarity of being organized, at one level of execution, in terms of a relatively small number of recurrent and, within limits, interchangeable gestures. Salient among these gestures are those that correspond to the processes of closing and opening the vocal tract, that is, to the onsets (or offsets) and to the nuclei of syllables.

We do not have to suppose that the child must analyze adult speech into features, segments, syllables, or even words, before she can set about imitating what she has heard. To suppose this would be to posit for speech a mode of development that precisely reverses the normal (phylogenetic and ontogenetic) process of differentiation. And, in fact, the earliest utterances used for symbolic or communicative ends seem to be prosodic patterns, which retain their unity across a wide variety of segmental realizations (Menn, 1976). Moreover, the early words also seem to be indivisible: for example, the child commonly pronounces certain sounds correctly in some words, but not in others (Menyuk & Menn, 1979). This implies that the child's first pass at the adult model of a word is an unsegmented sweep, a rough, analog copy of the unsegmented syllable. And there is no reason to believe that the child's percept is very much more differentiated than her production. Differentiation begins, perhaps, when, with the growth of vocabulary, recurrent patterns emerge in the child's motor repertoire. Words intersect, and similar control patterns coalesce into more or less invariant segments. The segmental organization is then revealed to the listener by the child's distortions. Menn (1978, 1980) describes these distortions as the result of systematic constraints on the child's output: the execution of one segment of a word is distorted as a function of the properties of another. She classifies these constraints in terms of consonant harmony (e.g., [gʌk] for duck), consonant sequence (e.g., [nos] for snow), relative position (e.g., [dæge] for 'gator), and absolute position (e.g., [ɪʃ] for fish).

Here we touch on deep issues concerning the origin and nature of phonological rules. But the descriptive insights of Menn and others working in child phonology are important to the present argument because they seem to justify a view of the phonetic segment as emerging from recurrent motor patterns in the execution of syllables rather than as imposed by a specialized perceptual device. As motor differentiation proceeds, these recurrent patterns form classes, defined by their shared motor components--shared, in part, because the vocal tract has relatively few independently movable parts. These components are, of course, the motor origins of phonetic features (cf. Studdert-Kennedy & Lane, 1980). Some such formulation is necessary to resolve the paradox of a quasi-continuous signal carrying a segmented linguistic message. The signal carries no message: it carries information concerning its source. The message lies in the peculiar relation between the source and the listener, as a human and as a speaker of a particular language.

Readers familiar with the work of Turvey and Shaw (e.g., 1979) will recognize that the present sketch of a new approach to speech perception owes much to their ecological perspective (as also to Fowler, Rubin, Remez, & Turvey, 1980). What may not be generally realized is that this perspective is highly compatible with much recent work in natural phonology (e.g., Stampe, 1979), child phonology (e.g., Menn, 1980), and phonetic theory (e.g., Lindblom, 1980; MacNeilage & Ladefoged, 1976; Ohala, in press). For example, Lindblom and his colleagues have, for several years, been developing principles by which the feature structure of the sound systems of different languages might be derived from perceptual and articulatory constraints. More generally, Lindblom (1980) has stressed that explanatory theory must refer "...to principles that are independent of the domain of the observations themselves" (p. 18) and has urged that phonetic theory "...move [its] search for basic explanatory principles into the physics and physiology of the brain, nervous system and speech organs..." (p. 18). In short, if language is a window on the mind, speech is the thin end of an experimental wedge that will pry the window open. The next ten years may finally see the first steps toward a genuine biology of language.

REFERENCE NOTE

1. MacKain, K., Studdert-Kennedy, M., Spieker, S., & Stern, D. Cross-modal coordination in infants' perception of speech. Paper presented at the International Conference on Child Psychology, Vancouver, B.C., August 1981.

REFERENCES

Darwin, C. J. The perception of speech. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception, Vol. VII: Language and speech. New York: Academic Press, 1976, 175-226.
Dodd, B. Lip reading in infants: Attention to speech presented in- and out-of-synchrony. Cognitive Psychology, 1979, 11, 478-484.
Fitch, H. L., Halwes, T., Erickson, D. M., & Liberman, A. M. Perceptual equivalence of two acoustic cues for stop-consonant manner. Perception & Psychophysics, 1980, 27, 343-350.
Fowler, C. A., Rubin, P., Remez, R. E., & Turvey, M. T. Implications for speech production of a general theory of action. In B. Butterworth (Ed.), Language production. New York: Academic Press, 1980.
Kuhn, G. M. On the front cavity resonance and its possible role in speech perception. Journal of the Acoustical Society of America, 1975, 58, 428-433.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. Perception of the speech code. Psychological Review, 1967, 74, 431-461.
Liberman, A. M., & Pisoni, D. B. Evidence for a special speech-perceiving subsystem in the human. In T. H. Bullock (Ed.), Recognition of complex acoustic signals. Berlin: Dahlem Konferenzen, 1977, 59-76.
Lindblom, B. The goal of phonetics, its unification and application. Phonetica, 1980, 11, 7-26.
MacNeilage, P., & Ladefoged, P. The production of speech and language. In E. Carterette & M. P. Friedman (Eds.), Handbook of perception (Vol. VII): Language and speech. New York: Academic Press, 1976, 75-120.
McGurk, H., & MacDonald, J. Hearing lips and seeing voices. Nature, 1976, 264, 746-748.

Menn, L. Pattern control and contrast in beginning speech: A case study in the development of word form and function. Bloomington, Ind.: Indiana University Linguistics Club, 1976.
Menn, L. Phonological units in beginning speech. In A. Bell & J. B. Hooper (Eds.), Syllables and segments. Amsterdam: North-Holland, 1978.
Menn, L. Phonological theory and child phonology. In G. Yeni-Komshian, J. F. Kavanagh, & C. A. Ferguson (Eds.), Child phonology: Perception and production (Vol. 1). New York: Academic Press, 1980.
Menyuk, P., & Menn, L. Early strategies for the perception and production of words and sounds. In P. Fletcher & M. Garman (Eds.), Language acquisition. New York: Cambridge University Press, 1979, 49-70.
Miller, G. A. Spontaneous apprentices. New York: Seabury Press, 1977, Ch. 7.
Ohala, J. The origin of sound patterns in vocal tract constraints. In P. F. MacNeilage (Ed.), Speech production. New York: Springer-Verlag, in press.
Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. Speech perception without traditional speech cues. Science, 1981, 212, 947-950.
Repp, B. H. Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Haskins Laboratories Status Report on Speech Research, 1981, SR-67/68, this volume.
Repp, B. H., Liberman, A. M., Eccardt, T., & Pesetsky, D. Perceptual integration of acoustic cues for stop, fricative, and affricate manner. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 621-637.
Stampe, D. A dissertation on natural phonology. New York: Garland, 1979.

Studdert-Kennedy, M. Speech perception. In N. J. Lass (Ed.), Contemporary issues in experimental phonetics. New York: Academic Press, 1976, 243-293.
Studdert-Kennedy, M. Speech perception. Language and Speech, 1980, 21, 45-66.
Studdert-Kennedy, M., & Lane, H. The structuring of language: Clues from the differences between signed and spoken language. In U. Bellugi & M. Studdert-Kennedy (Eds.), Signed and spoken language: Biological constraints on linguistic form. Deerfield Beach, Fla.: Verlag Chemie, 1980, 29-40.
Summerfield, Q. Use of visual information for phonetic perception. Phonetica, 1979, 36, 314-331.
Summerfield, Q., & Haggard, M. On the dissociation of spectral and temporal cues to the voicing distinction in initial stop consonants. Journal of the Acoustical Society of America, 1977, 62, 436-448.
Turvey, M. T., & Shaw, R. E. The primacy of perceiving: An ecological reformulation of perception for understanding memory. In L.-G. Nilsson (Ed.), Perspectives on memory research: Essays in honor of Uppsala University's 500th anniversary. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1979.

AUDITORY INFORMATION FOR BREAKING AND BOUNCING EVENTS: A CASE STUDY IN ECOLOGICAL ACOUSTICS

William H. Warren, Jr. and Robert R. Verbrugge+

Abstract. The mechanical events of bouncing and breaking glass are acoustically specified by single vs. multiple damped quasi-periodic pulse patterns, with an initial noise burst in the case of breaking. Subjects show high accuracy in distinguishing natural tokens of these two events and tokens constructed by adjusting the periodicities of spectrally identical components. Differences in average spectral frequency are therefore not necessary for perceiving this contrast, though differences in spectral consistency over successive pulses apparently are important. Initial noise corresponding to glass rupture is not necessary to distinguish breaking from bouncing, but may be important for identifying breaking in isolation. The data indicate that higher-order temporal invariants in the acoustic signal provide information for the auditory perception of these events.

Research in auditory perception has emphasized the detection and processing of sound elements with quasi-stable spectral structure, such as tones, formants, and bursts of noise. In the spectral domain, these elements are distinguished by frequency peak or range, bandwidth, and amplitude. In the temporal domain, acoustic analysis has often been limited to the durations of sound elements and the intervals between them. Much of traditional perceptual research, including that of classical psychoacoustics, has focused on listeners' responses to essentially time-constant functions of frequency, amplitude, and duration, on the assumption that complex auditory percepts are compositions over sound elements with those properties (Fletcher, 1934; Helmholtz, 1863/1954; Plomp, 1964; see Green, 1976).

The perceptual role of time-varying properties of sound has received comparatively little attention. Some exceptions to this can be found in research on amplitude and frequency modulation, particularly as they relate to classical auditory phenomena such as beats and periodicity pitch. In general, however, research on time-varying properties has been most common in the study

+Also University of Connecticut.
Acknowledgment. The authors wish to thank Michael Studdert-Kennedy, Michael Turvey, and Robert Shaw for their comments. This research was supported by a grant from the National Institute of Child Health and Human Development (HD-01994) and a Biomedical Research Support Grant (RR-05596) to Haskins Laboratories, and by a National Science Foundation graduate fellowship to the first author.


of classes of natural events, such as human speech, music, and animal communication, where an analysis of sound into quasi-stable elements is often problematic. In the case of speech, for example, many phonemic contrasts can be defined by differences in the direction and rate of change of major speech resonances (see Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Liberman, Delattre, Gerstman, & Cooper, 1956). Some research on the perception of music has also demonstrated the perceptual significance of time-varying properties. Identification of musical instruments, for example, is strongly influenced by the temporal structure of transients that accompany tone onsets (Luce & Clark, 1967; Saldanha & Corso, 1964). In particular, the relative onset timing and the rates of amplitude change of upper harmonics have been found to be critical properties of attack transients that permit distinctions among instrument families (Grey, 1977; Grey & Gordon, 1978). Animal vocalizations are similarly rich in time-varying properties (such as rhythmic pulsing, frequency modulation, and amplitude modulation), and many of these properties have been shown to be critical for distinguishing the species, sex, dangerousness, location, and motivational state of the producer (e.g., Brown, Beecher, Moody, & Stebbins, 1978; Konishi, 1978; Petersen, Beecher, Zoloth, Moody, & Stebbins, 1978).

It is noteworthy that in each of these areas of research on natural events, the discovery or explanation of perceptually significant, time-varying acoustic properties has been motivated by an analysis of the time-varying behavior of the sound source. In the case of speech, for example, an analysis of speech production has been an integral part of the search for the acoustic basis for speech perception (e.g., Fant, 1960; Fowler, 1977; Fowler, Rubin, Remez, & Turvey, 1980; Liberman et al., 1967; Verbrugge, Rakerd, Fitch, Tuller, & Fowler, in press). It is also worth noting that researchers in these areas have often found it more useful to characterize perceptual information in terms of higher-order structure in sound--that is, in terms of functions over the traditional measures of frequency, amplitude, and duration. Given the time-varying behavior of the sound sources involved, it is not surprising that many of these functions are time-dependent in nature, defining rates of change and styles of change in lower-order acoustic variables. Finally, it is not uncommon for researchers in these fields to view this temporal structure as a property of the sound stream itself, rather than as a property that must be introduced by a perceiver while constructing a percept.

The role of time-varying properties in the perception of other familiar events in the human environment is largely unknown, and research on the subject has been sparse. Our goal in this paper is to demonstrate by argument and example that higher-order temporal structure can be important for distinguishing such events.

It is apparent from everyday experience that listeners can detect significant aspects of the environment by ear, from a knock at the door to the condition of an automobile engine and the gait of an approaching friend. Such naturalistic observations were recently verified in experiments by VanDerveer (Note 1, Note 2). She presented 30 recorded items of natural sound in a free identification task and found that many events, such as clapping, footsteps, jingling keys, and tearing paper, were identified with greater than 95% accuracy. Subjects tended to respond by naming a mechanical event that produced the sound, and reported their experiences in terms of sensory qualities only when source recognition was not possible. VanDerveer (Note 1) also found that confusion errors in identification tasks and clustering in sorting tasks tended to group acoustic events by common temporal patterns. For example, hammering was confused with walking, and the scratching of fingernails was confused with filing, but hammering and walking were not confused with the latter two events.

These results support the general claim that sound in isolation permits accurate identification of classes of sound-producing events when the temporal structure of the sound is specific to the mechanical activity of the source (Gibson, 1966b; Schubert, 1974; Warren & Verbrugge, in press). If higher-order information is found to be specific to events, while values of lower-order variables per se are not, then it may be more fruitful to view the auditory system as being designed for the perception of source events (via higher-order acoustic functions), rather than for the detection of quasi-stable sound elements. Schubert (1974) put this succinctly in his "Source Identification Principle" for auditory perception: "Identification of sound sources, and the behavior of those sources, is the primary task of the [auditory] system" (p. 126).

This general perspective on auditory perception is coming to be called "ecological acoustics," on a direct analogy to the ecological optics advocated by Gibson (1961, 1966b) as an approach to vision. The ecological approach leads to research that is similar in many respects to the work summarized above on speech, music, and animal communication. In general terms, the strategy for research is to identify the higher-order properties that are defined over the course of a natural sound-producing event, and then to assess the ability of listeners to utilize that potential information. A physical analysis of the source and its behavior is an essential part of the strategy, both for identifying acoustic variables that might otherwise be missed, and for bounding the set of possible variables in a principled fashion. Furthermore, demonstrating the specificity of acoustic structure to the source event is crucial to avoid the introduction of ad hoc processing principles to buttress perception (Shaw, Turvey, & Mace, in press).

In addition to offering a research strategy, the ecological approach seeks a general analysis of events and a description of the perceptual information specific to them. This analysis is based on the observation that identifiable objects participate in identifiable transformations or "styles of change" (Gibson, 1966a; Pittenger & Shaw, 1975; Shaw & Cutting, 1980; Shaw, McIntyre, & Mace, 1974; Shaw & Pittenger, 1978; Johansson, Hofsten, & Jansson, 1980). More precisely, a class of objects may be functionally defined in terms of structure that is preserved and destroyed under certain transformations. The information that specifies the kind of object and its properties under change is known as the structural invariant of an event (Pittenger & Shaw, 1975). Reciprocally, the information that specifies the style of change is known as the transformational invariant, which may be described jointly in terms of the geometric properties that remain constant and those that vary systematically under change (Pittenger & Shaw, 1975; Mark, Todd, & Shaw, in press).

By such an analysis, events can be organized into equivalence classes ("types") that are defined by sets of transformational and structural invariants.

Figure 1. Cartoons of the mechanical events of (a) bouncing and (b) breaking.

Consider, for example, the style of change of walking and the animals with appropriate limb structures, or the style of change of burning and the objects that are combustible under terrestrial conditions. Within any equivalence class of events, an indefinite number of particular instances ("tokens") are possible, each preserving the invariants of the class but individuated in space and time--that charging rhino, or this burning bridge. For any perceptible event, information about its class membership is, by hypothesis, available by means of the physical media it disturbs. An analysis of such potential information and its relationship to the source event is a major goal of ecological physics.

The present paper explores the acoustic aspects of dropping a glass object and its subsequent bouncing or breaking. Bouncing and breaking are two distinct styles of change that may be wrought over a variety of objects, such as bottles, plates, pottery, and other ceramics. These two events would be identified by Gibson (1979, pp. 84-85) as changes of the layout of surfaces due to physical force--bouncing as a case of successive collisions, breaking as a compound event of surface rupturing followed by successive collisions (and possible further rupturing) of the broken pieces. The two styles of change constitute disjoint equivalence classes of events: the breaking and the bouncing of semi-elastic objects. By acoustic and perceptual studies of these events, we hope to discover the transformational invariants that distinguish them. (Structural invariants specifying individual properties of the objects, such as size, shape, and material, and individual transformation properties, such as height of drop, force of impact, and angle of impact, are discussed in Warren & Verbrugge, in press.)

Consider first the mechanical action of a bottle bouncing on a hard surface (see Figure 1a). Each collision consists of an initial impact that briefly sets the bottle into vibration at a set of frequencies determined by its size, shape, and material composition. This is reflected in the acoustic signal as an initial burst of noise followed by spectral energy concentrated at a particular set of overtone frequencies. Over a series of bounces, the collisions between object and ground occur with declining impact force and decreasing ("damped") period, although some irregularities in the pattern may occur due to the asymmetry of the bottle. The spectral components are similar across bounces, relative overtone amplitudes varying slightly due to the varying orientations of the bottle at impact. (The spectrum within each pulse is quasi-stable, and is conventionally described in terms of spectral peaks in a cross-section of the signal.) These acoustic consequences may be described as a single damped quasi-periodic pulse train in which the pulses share a similar cross-sectional spectrum (Figure 2a). It is this single pulse train that we suggest constitutes a transformational invariant of temporal patterning for the bouncing style of change.

Turning to the mechanical action of breaking (Figure 1b), it is evident that a catastrophic rupture occurs upon impact. Assuming an idealized case, the resulting pieces then continue to bounce without further breakage, each with its own independent collision pattern. The acoustic consequences appear as an initial rupture burst dissolving into overlapping multiple damped quasi-periodic pulse trains, each train having a different cross-sectional spectrum and damping characteristic (Figure 2b). We propose that a compound signal, consisting of a noise burst followed by such multiple pulse trains, constitutes the transformational invariant that specifies the compound event of breaking.

Figure 2. Spectrograms of natural tokens: (a) bouncing, (b) breaking.
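To make the two proposed temporal invariants concrete, here is a minimal synthesis sketch in Python (not part of the original study; all frequencies, intervals, and decay constants are illustrative). Bouncing is modeled as one damped quasi-periodic pulse train with a fixed set of overtones; breaking as a brief noise burst followed by several superimposed trains, each with its own spectrum, periodicity, and damping:

    import numpy as np

    SR = 20000  # Hz

    def pulse_train(freqs, onsets, decay=0.02, sr=SR, dur=1.5):
        """One damped quasi-periodic pulse train: at each onset, ringing at a
        fixed set of overtone frequencies that decays exponentially."""
        t = np.arange(int(dur * sr)) / sr
        y = np.zeros_like(t)
        for i, onset in enumerate(onsets):
            seg = t[t >= onset] - onset
            ring = sum(np.sin(2 * np.pi * f * seg) for f in freqs)
            y[t >= onset] += (0.8 ** i) * np.exp(-seg / decay) * ring
        return y

    def damped_onsets(first_interval, ratio=0.8, n=8):
        """Quasi-periodic onset times: each interval shrinks by a fixed ratio."""
        times, interval = [0.0], first_interval
        for _ in range(n - 1):
            times.append(times[-1] + interval)
            interval *= ratio
        return times

    # Bouncing: a single train, a single spectrum.
    bounce = pulse_train([700, 1900, 3100], damped_onsets(0.25))

    # Breaking: an initial noise burst plus asynchronous trains with
    # different spectra and dampings.
    rng = np.random.default_rng(0)
    burst = np.concatenate([rng.normal(0, 1, int(0.05 * SR)),
                            np.zeros(int(1.45 * SR))])
    pieces = [pulse_train(f, damped_onsets(iv, 0.75), d)
              for f, iv, d in [([900, 2500], 0.20, 0.015),
                               ([1300, 3300], 0.17, 0.012),
                               ([600, 1700], 0.23, 0.018)]]
    breaking = burst + sum(pieces)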

Aside from these aspects of temporal pattern and initial noise, certain crude spectral differences between breaking and bouncing can be observed by comparing spectrograms of natural cases (Figure 2). First, the spectra of breaking events are distributed across a wider range of frequencies than are those of bouncing events. Second, the overtones of breaking are denser in the frequency domain. Both of these properties can be traced to the contrast between a single object in vibration and a number of disparate objects simultaneously in vibration.

The following experiments test the hypothesis that temporal patterning, rather than some quasi-stable spectral property, distinguishes breaking from bouncing. By superimposing recordings of individual pieces of broken glass, cases of breaking and bouncing can be constructed from a common set of pieces by varying the temporal correspondences among their collision patterns. Experiment 1 establishes that listeners can identify natural cases of breaking and bouncing with high accuracy. Experiment 2 examines performance on constructed cases that include an initial breakage burst, and compares it with the results for natural sound. Finally, Experiment 3 assesses the contribution of the burst by removing it from both natural and constructed cases of breaking.

EXPERIMENT 1: NATURAL SOUND

The first experiment determines whether natural sound provides sufficient acoustic information for listeners to distinguish the events of breaking and bouncing.

Method

Materials. Natural recordings were made of three glass objects dropping onto a concrete floor covered by linoleum tile in a sound-attenuated room. Using a Crown 800 tape deck, the sound of each object was recorded when the object was dropped from a 1 ft. height (bouncing), and when it was dropped from a 2 to 5 ft. height (breaking). This yielded three tokens of bouncing and three tokens of breaking. The objects used and the durations of the bouncing (BNC) and breaking (BRK) events are as follows: (1) 32 oz. jar: BNC1 = 1600 msec, 22 bounces; BRK1 = 1200 msec. (2) 64 oz. bottle:

BNC2 = 1600 msec, 15 bounces; BRK2 = 550 msec. (3) 1 litre bottle: BNC3 = 1300 msec, 17 bounces; BRK3 = 700 msec. The recordings were digitized at a 20 kHz sampling rate using the Pulse Code Modulation (PCM) system at Haskins Laboratories. A test tape was then recorded; it contained 20 trials of each natural token in randomized order for a total of 120 test trials. A pause of 3 sec occurred between trials, and a pause of 10 sec occurred after every six trials.
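The test-tape layout amounts to a simple schedule. The sketch below (Python; an illustration of the protocol only, in which the 10-sec pause is assumed to replace the usual 3-sec pause on every sixth trial) builds the 120-trial randomized order:

    import random

    tokens = ["BNC1", "BNC2", "BNC3", "BRK1", "BRK2", "BRK3"]
    trials = tokens * 20          # 20 trials of each token, 120 in all
    random.shuffle(trials)        # randomized order

    # Pair each trial with the pause (in seconds) that follows it.
    schedule = [(tok, 10.0 if i % 6 == 0 else 3.0)
                for i, tok in enumerate(trials, start=1)]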

Subjects. Fifteen graduate and undergraduate students participated in the experiment for payment or course credit.

Procedure. Subjects were run in groups of two to five and listened to the tape binaurally through headphones. They were told that they would be hearing recordings of objects that had either bounced or broken after being dropped, but were told nothing about the nature of the objects involved. Their three-choice task was to identify each event as a case of breaking or bouncing, with a "don't know" option, by placing a check in the appropriate column on an answer sheet. The "don't know" category was included to minimize the possibility that subjects would choose one of the two event categories even when they found the sound unconvincing, as they would be forced to do in a two-choice situation. They were specifically instructed to ignore the nature of the object involved and attend to "what's happening to it." Subjects received no practice trials or feedback. There was a short break after 60 trials, and a test session lasted about 20 min.

Results and Discussion

Overall performance on natural bouncing tokens was 99.3% correct ("bouncing" judgments), and on breaking tokens was 98.5% correct ("breaking" judgments). "Don't know" responses accounted for 0.2% of all answers on bouncing tokens and 0.7% on breaking tokens. Experiment 1 clearly demonstrates that sufficient information is present in the acoustic signal to permit unpracticed listeners to distinguish the events of bouncing and breaking.

EXPERIMENT 2: CONSTRUCTED SOUND

Experiment 2 attempted to model the time-varying information contained in natural recordings by using constructed cases of bouncing and breaking, eliminating average spectral differences between the two.

Method

Materials. Tokens intended to model bouncing and breaking were constructed by the following method. Initially, individual recordings were made of four major pieces of glass from a broken bottle as each piece was dropped and bounced separately from a low height. These recordings were combined in two ways using the PCM system.

To construct a bouncing token, the temporal pattern of each piece was adjusted to match a single master periodicity arbitrarily borrowed from a recording of a natural bouncing bottle (Figure 3a). This was accomplished by inserting tape hiss between the bounce pulses in recordings of the individual pieces. After all four pieces had been adjusted so that their onsets matched the same pulse pattern, they were superimposed by summing the instantaneous amplitudes of the digitized recordings. The result was a combined pulse pattern with synchronized onsets for all bounces, preserving the invariant of a single damped quasi-periodic pulse train to model bouncing (Figure 4a).
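As a rough computational analogue of this editing step (a sketch under simplifying assumptions, not the PCM-system procedure itself: silence stands in for tape hiss, and each piece's bounce pulses are assumed to have been segmented already), the pulses of every piece are re-timed to the master onset pattern and the re-timed waveforms are summed sample by sample:

    import numpy as np

    def retime(pulses, master_onsets, sr, dur):
        """Place one piece's successive pulses at the master onset times,
        separated by silence (the original procedure inserted tape hiss)."""
        out = np.zeros(int(dur * sr))
        for pulse, onset in zip(pulses, master_onsets):
            start = int(onset * sr)
            out[start:start + len(pulse)] += pulse[:len(out) - start]
        return out

    def bouncing_token(pieces, master_onsets, sr=20000, dur=1.5):
        """Sum the instantaneous amplitudes of the re-timed pieces; all onsets
        coincide, so the result is a single damped pulse train."""
        return sum(retime(p, master_onsets, sr, dur) for p in pieces)

    # Tiny synthetic demo: two 'pieces', each contributing three decaying clicks.
    sr = 20000
    click = np.hanning(400) * np.sin(2 * np.pi * 1500 * np.arange(400) / sr)
    pieces = [[click, 0.6 * click, 0.3 * click],
              [click[::-1], 0.5 * click[::-1], 0.2 * click[::-1]]]
    token = bouncing_token(pieces, [0.0, 0.3, 0.5], sr=sr, dur=1.0)

A breaking token would differ only in giving each piece its own onset list and prepending the rupture burst.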

A breaking token was constructed by readjusting the same four pieces to match four different temporal patterns (Figure 3b). As a first approximation, these master patterns were borrowed from measurements of four different bouncing bottles, since the likely patterns of individual pieces of glass in the course of natural breaking were unknown. These four patterns were initiated simultaneously, preceded by 50 msec of noise burst taken from the original rupture. The result after superimposing these four independent temporal series was a combined pattern with asynchronous pulse onsets, preserving the temporal invariant of multiple damped quasi-periodic pulse trains to model breaking (Figure 4b).

Figure 3. Schematic diagrams of constructed tokens, combining four component pulse trains: (a) bouncing, with synchronous pulse onsets; (b) breaking, with initial noise burst and asynchronous pulse onsets.

Figure 4. Spectrograms of constructed tokens: (a) bouncing (SYN1), (b) breaking (ASYN1).

Note that the variables of temporal patterning and initial noise were confounded in this experiment. To the experimenters' ears the burst improved the quality of apparent breakage, but this assumption was later tested in Experiment 3.

Hence, the only differences between bouncing and breaking tokens were in the temporal registration of pulse onsets and the presence (or absence) of initial noise. The range and distribution of average spectral frequencies were similar in the two cases. Mean overall durations differed, averaging 1107 msec for bouncing tokens and 733 msec for breaking tokens; in general this factor is related to object elasticity and the height of drop, and it is therefore not a likely candidate for information specific to a style of change.

There were certain problems with the constructed cases. The process of superimposing pulse patterns also summed tape hiss and hum, so that background noise was increased. Moreover, constructing the sound of a single bouncing object by combining the spectral components of four independently bouncing pieces produced in one case a noise that sounded more like metal than glass material; nevertheless, the temporal invariant was preserved. The other two bouncing tokens sounded like glass. Finally, the use of only four pieces of glass to simulate breaking, the assumption that their periodicities were akin to those of a bouncing bottle, and the assumption of no further breakage after the initial catastrophe, were all rather arbitrary idealizations. Nevertheless, if temporal patterning constitutes information for breaking and bouncing, subjects should be able to make reliable judgments of these tokens.

Three cases of bouncing and three corresponding cases of breaking were produced by this method, each pair constructed from a unique set of original pieces and matched to a unique set of master periodicities. The original objects, and the durations of the bouncing or synchronous (SYN) and breaking or asynchronous (ASYN) tokens constructed from their pieces, were as follows: (1) 32 oz. jar: SYN1 = 1000 msec, 8 bounces; ASYN1 = 950 msec. (2) 32 oz. jar: SYN2 = 1400 msec, 13 bounces; ASYN2 = 650 msec. (3) 64 oz. bottle: SYN3 = 920 msec, 9 bounces; ASYN3 = 600 msec.

Subjects. Fifteen graduate and undergraduate students participated in the experiment for payment or course credit. None of them had participated in Experiment 1.

Procedure. The procedure was the same as that in Experiment 1, with the exception that trials were presented in blocks of ten rather than blocks of six. Instructions to the subjects were the same, including the instruction to ignore object properties and concentrate on the style of change.

Results and Discussion

The results for each constructed token appear in Table 1, and are consistent with the predictions of the temporal patterning hypothesis. Bouncing judgments on synchronous tokens averaged 90.7%, and breaking judgments on asynchronous tokens averaged 86.7% (these judgments being treated as "correct").

Table 1

Percent Correct Judgments on Constructed Tokens (Experiment 2)

Token     Bouncing            Breaking

1         93.7 (18.7, 1.7)    89.0 (17.8, 2.9)
2         94.3 (18.9, 2.0)    71.3 (14.3, 4.4)
3         84.3 (16.9, 3.7)    99.7 (19.9, 0.3)

Overall   90.7                86.7

Note: Mean scores and standard deviations are in parentheses. Scores are based on 20 trials per subject per cell, N = 15.

236 "correct".). "Don't know" answers accounted for 0.1% of allresponses on bouncing tokens, and 1.3% on breaking tokens. Considering the artificial nature of the constructed cases and the idealizations involved, their identi- fiability may be considered quite high.

Some departures from the general pattern were found for token ASYN2, which showed a markedly lower level of correct performance (71.3%), a higher standard deviation for "breaking" judgments, and a relatively high rate of "don't know" responses (4.0%). These differences were primarily due to the low performance of five subjects who averaged 44% correct on this token, while the performance of the other ten averaged 85.0%. It may be noted that the summed background noise was greater in ASYN2 than in the other two breaking cases. The fact that overall performance in this case was well above chance indicates that even the token of lowest identifiability contained sufficient information to distinguish the two events.

It is not surprising that some tokens of constructed breaking are more convincing than others, as there are certainly some natural instances that are more compelling than others. The differences among tokens may involve both the spectral distinctiveness of the broken pieces and their degree of asynchrony. In pilot tests, when the pulses from a single piece were adjusted to four different periodicities, the resulting sum of the four patterns did not specify breaking. Apparently, distinct spectral properties for each piece are necessary to distinguish multiple pulse trains (see Figure 1b). The reciprocal bouncing case, in which successive pulses were borrowed from different bottles, similarly failed to yield a coherent bouncing event. Hence, spectral similarity across pulses appears to be necessary to specify the unity of a single pulse train.

In general, performance with constructed sound was similar to that found for natural sound in Experiment 1. Although performance with constructed cases was somewhat lower than with natural cases, the differences were only about 10% on average, and performance with both natural and constructed cases was far above the chance level. The data permit us to conclude that temporal patterning is compelling information for breaking and bouncing. In other words, constructed and natural cases appear to signify the same general equivalence classes of breaking and bouncing events to a listener.

EXPERIMENT 3: INITIAL NOISE SPECIFIC TO RUPTURING

To isolate the variable of single vs. multiple pulses and assess the importance of the initial noise burst in specifying breaking, the first two experiments were repeated with initial noise removed from both natural and constructed cases. Pilot work indicated that the first 80 msec of the signal in natural breaking and bouncing cases was not, in isolation, sufficient to distinguish the two events. Experiment 3 was conducted to determine whether the initial noise, in addition to the pulse patterns, was necessary to distinguish breaking from bouncing.

Method

Materials. Both natural and constructed tokens were prepared. Bouncing tokens were the same as those used in the two previous experiments. For breaking cases, the constructed tokens from Experiment 2 were modified by removing the 50 msec of initial noise that had been added for that experiment. The natural breaking tokens from Experiment 1 were modified by removing the naturally occurring burst. Since there was no distinct boundary in the natural waveform between the rupture burst and the subsequent collision pulses, the natural tokens were edited by removing noise identifiable on an oscillogram and by listening for the absence of a burst. This technique resulted in the removal of the initial 80 msec from BRK1, 50 msec from BRK2, and 60 msec from BRK3. In sum, there were three tokens of bouncing and three tokens of breaking (without initial noise) in both the natural and constructed conditions.

Subjects. Thirty graduate and undergraduate students participated in the experiment for payment. None had participated in the previous experiments.

Procedure. The natural and constructed conditions were run separately with two different groups of 15 subjects. The procedure and instructions were the same as before, with each group receiving 120 randomly ordered trials in blocks of six.

Results and Discussion

The results for each token appear in Table 2. With natural cases, the overall performance was 99.8% correct on bouncing tokens and 96.0% correct on breaking tokens; with the constructed cases it was 93.0% for bouncing and 86.7% for breaking. These results were nearly identical to those of Experiment 1 with natural sound and Experiment 2 with constructed sound. "Don't know" answers accounted for 0.0% of all responses on natural bouncing tokens, 1.0% on natural breaking tokens, 2.0% on constructed bouncing tokens, and 4.0% on constructed breaking tokens.

Hence, removal of initial noise from breaking tokens does not reduce their discriminability. Finding this result for the natural cases indicates that the burst is not necessary to distinguish the two events. The same finding with constructed cases demonstrates that variation in the temporal patterning of pulse onsets is alone sufficient to discriminate breaking and bouncing.

However, we may question whether pulse patterning alone is sufficient to specify a breaking event in isolation. Following the test sessions, a number of subjects in Experiment 3 reported that natural and constructed breaking cases without initial noise often provided weak instances of the event, some sounding more like "wind chimes," "bells," "spoons dropping," or "ice cubes in a glass"--in other words, like multiple collisions of initially independent objects. Others reported precisely what was presented: "pieces falling after the break, without an initial crash." Although the acoustic structure was sufficient to distinguish the event of breaking from that of bouncing, and not ambiguous enough to merit a "don't know," it could nevertheless specify wind chimes, not breaking glass, when heard in isolation. Since breaking is a compound event, it is not surprising that the causal transition from one to many pieces must be specified by an initial rupture noise.


Table 2

Percent Correct Judgments on Natural and Constructed Tokens Without Initial Noise (Experiment 3)

                      Natural                               Constructed
Token     Bouncing            Breaking            Bouncing            Breaking

1          99.7 (19.9, 0.3)    93.7 (18.7, 2.4)    94.0 (18.8, 2.2)    83.7 (16.7, 2.9)
2          99.7 (19.9, 0.3)    97.7 (19.5, 1.6)    97.3 (19.5, 0.9)    76.7 (15.3, .1)
3         100.0 (20.0, 0.0)    96.7 (19.3, 2.3)    87.1 (17.5, 2.7)    99.7 (19.9, 0.3)

Overall    99.8                96.0                93.0                86.7

Note: Mean scores and standard deviations are in parentheses. Scores are based on 20 trials per subject per cell, N = 15 in the Natural condition and N = 15 in the Constructed condition.


In general, these observations are consistent with our original hypothesis that breaking is specified by a complex acoustic configuration, consisting of an initial noise followed by multiple quasi-periodic pulse trains. Further work remains to be done to determine whether the initial noise is necessary for identifying breakage under conditions less constrained than in the present experiment.

GENERAL DISCUSSION

The preceding experiments have attempted to determine whether higher-order, time-varying properties constitute effective acoustic information for the events of bouncing and breaking. The results show that differences in the temporal patterning of component pulse onsets are sufficient to perceptually distinguish the two events, with or without an initial burst. These temporal invariants override any contribution of average spectral properties in distinguishing the events. The results provide evidence that certain damped periodic patterns, plus initial noise, constitute transformational invariants that specify breaking and bouncing to a listener.

However, if these temporal patterns are to convey the distinct events of breaking and bouncing, they must be carried by signals with certain spectral properties. Specifically, a single damped quasi-periodic pulse train must be of constant resonance if it is to cohere as the bouncing of a single object. Reciprocally, multiple damped quasi-periodic pulse trains must have different frequency spectra if they are to separate perceptually as independently bouncing sherds, which together specify the breaking of an object into pieces. Hence, a combination of temporal and spectral patterns constitutes the information necessary and sufficient to specify breaking and bouncing.

The amplitude and periodicity requirements of such patterns in bouncing events were considered in two simple demonstrations worth mentioning here. Iterating a recording of one bounce pulse to match the timing of a natural bouncing sequence produced a clear bouncing event, although the usual declining amplitude gradient was absent. However, adjusting the pulse pattern to create equal 100 msec intervals between pulse onsets, thereby eliminating the damping of the periodic pattern, destroyed the effect of perceived bouncing. The rapid staccato sound was like that produced by a negentropic machine, such as a jackhammer. A damped series of collisions, as constrained by gravity and the imperfect elasticity of the system, appears necessary to the information for bouncing. Experiments are in progress to assess the efficacy of period damping in specifying elasticity or "bounciness" itself.
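The gravitational constraint mentioned here can be made explicit with elementary mechanics: for an idealized object dropped from height h that rebounds with coefficient of restitution e, each inter-bounce interval is e times the previous one, so the onsets are necessarily damped rather than evenly spaced. A minimal sketch (the drop height and restitution value are illustrative, not taken from the study):

    import math

    def bounce_onsets(drop_height_m, restitution, n_bounces, g=9.81):
        """Impact times (s) for an idealized bouncing object: free fall to the
        first impact, then intervals shrinking geometrically by the
        coefficient of restitution."""
        t_first = math.sqrt(2 * drop_height_m / g)
        onsets, interval = [t_first], 2 * restitution * t_first
        for _ in range(n_bounces - 1):
            onsets.append(onsets[-1] + interval)
            interval *= restitution
        return onsets

    # A drop from about 1 ft (0.3 m) with e = 0.75 gives intervals that shrink
    # toward zero, unlike the constant 100-msec spacing that destroyed the
    # impression of bouncing in the demonstration above.
    print([round(x, 3) for x in bounce_onsets(0.3, 0.75, 8)])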

The experiments exemplify an ecological approach to auditory perception, seeking to identify higher-order acoustic information for complex events. The acoustic consequences of two distinct mechanical events were analyzed for their temporal and spectral structure, and the invariant properties sufficient to convey aspects of the events to a listener were empirically determined. Such work is preliminary to modeling auditory mechanisms capable of detecting these invariants (see Mace, 1977).


REFERENCE NOTES

1. VanDerveer, N. J. Acoustic information for event perception. Paper presented at the Celebration in honor of Eleanor J. Gibson, Cornell University, June 1979.
2. VanDerveer, N. J. Confusion errors in identification of environmental sounds. Paper presented at the meeting of the Acoustical Society of America, Cambridge, Massachusetts, June 1979.

REFERENCES

Brown, C. H., Beecher, M. D., Moody, D. B., & Stebbins, W. C. Localization of primate calls by Old World monkeys. Science, 1978, 201, 753-754.
Fant, G. Acoustic theory of speech production. The Hague: Mouton, 1960.
Fletcher, H. Loudness, pitch and the timbre of musical tones and their relation to the intensity, the frequency, and the overtone structure. Journal of the Acoustical Society of America, 1934, 6, 59-69.
Fowler, C. A. Timing control in speech production. Unpublished doctoral dissertation, University of Connecticut, Storrs, 1977.
Fowler, C. A., Rubin, P., Remez, R. E., & Turvey, M. T. Implications for speech production of a general theory of action. In B. Butterworth (Ed.), Language production. New York: Academic Press, 1980.
Gibson, J. J. Ecological optics. Vision Research, 1961, 1, 253-262.
Gibson, J. J. The problem of temporal order in stimulation and perception. Journal of Psychology, 1966, 62, 141-149. (a)
Gibson, J. J. The senses considered as perceptual systems. Boston: Houghton Mifflin, 1966. (b)
Gibson, J. J. The ecological approach to visual perception. Boston: Houghton Mifflin, 1979.
Green, D. M. An introduction to hearing. Hillsdale, NJ: Erlbaum, 1976.
Grey, J. M. Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 1977, 61, 1270-1277.
Grey, J. M., & Gordon, J. W. Perceptual effects of spectral modifications on musical timbres. Journal of the Acoustical Society of America, 1978, 63, 1493-1500.
Helmholtz, H. L. F. von. On the sensations of tone as a physiological basis for the theory of music (A. J. Ellis, Trans.). New York: Dover, 1954. (Originally published, 1863.)
Johansson, G., Hofsten, C. von, & Jansson, G. Event perception. Annual Review of Psychology, 1980, 31, 27-63.
Konishi, M. Ethological aspects of auditory pattern recognition. In R. Held, H. Leibowitz, & H.-L. Teuber (Eds.), Handbook of sensory physiology, Vol. VIII: Perception. New York: Springer-Verlag, 1978.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. Perception of the speech code. Psychological Review, 1967, 74, 431-461.
Liberman, A. M., Delattre, P. C., Gerstman, L. J., & Cooper, F. S. Tempo of frequency change as a cue for distinguishing classes of speech sounds. Journal of Experimental Psychology, 1956, 52, 127-137.
Luce, D., & Clark, M. Physical correlates of brass-instrument tones. Journal of the Acoustical Society of America, 1967, 42, 1232-1243.
Mace, W. M. James J. Gibson's strategy for perceiving: Ask not what's inside your head, but what your head's inside of. In R. E. Shaw & J. Bransford (Eds.), Perceiving, acting, and knowing: Toward an ecological psychology. Hillsdale, NJ: Erlbaum, 1977.
Mark, L. S., Todd, J. T., & Shaw, R. E. The perception of growth: How different styles of change are distinguished. Journal of Experimental Psychology: Human Perception and Performance, in press.
Petersen, M. R., Beecher, M. D., Zoloth, S. R., Moody, D. B., & Stebbins, W. C. Neural lateralization of species-specific vocalizations by Japanese macaques (Macaca fuscata). Science, 1978, 202, 324-327.
Pittenger, J. B., & Shaw, R. E. Aging faces as viscal-elastic events: Implications for a theory of nonrigid shape perception. Journal of Experimental Psychology: Human Perception and Performance, 1975, 1, 374-382.
Plomp, R. The ear as a frequency analyzer. Journal of the Acoustical Society of America, 1964, 36, 1628-1636.
Saldanha, E. L., & Corso, J. F. Timbre cues and the identification of musical instruments. Journal of the Acoustical Society of America, 1964, 36, 2021-2026.
Schubert, E. D. The role of auditory perception in language processing. In D. D. Duane & M. B. Rawson (Eds.), Reading, perception and language. Baltimore: York Press, 1974.
Shaw, R. E., & Cutting, J. E. Cues from an ecological theory of event perception. In U. Bellugi & M. Studdert-Kennedy (Eds.), Signed and spoken language: Biological constraints on linguistic form. Weinheim: Verlag Chemie, 1980.
Shaw, R. E., & Pittenger, J. B. Perceiving change. In H. L. Pick & E. Saltzman (Eds.), Modes of perceiving and processing information. Hillsdale, NJ: Erlbaum, 1978.
Shaw, R. E., McIntyre, M., & Mace, W. M. The role of symmetry in event perception. In R. B. MacLeod & H. L. Pick (Eds.), Perception: Essays in honor of James J. Gibson. Ithaca, NY: Cornell University Press, 1974.
Shaw, R. E., Turvey, M. T., & Mace, W. M. Ecological psychology: The consequence of a commitment to realism. In W. Weimer & D. Palermo (Eds.), Cognition and the symbolic processes, II. Hillsdale, NJ: Erlbaum, in press.
Verbrugge, R. R., Rakerd, B., Fitch, H., Tuller, B., & Fowler, C. A. Perception of speech events: An ecological approach. In R. E. Shaw & W. M. Mace (Eds.), Event perception: An ecological perspective. Hillsdale, NJ: Erlbaum, in press.
Warren, W. H., & Verbrugge, R. R. Toward an ecological acoustics. In R. E. Shaw & W. M. Mace (Eds.), Event perception: An ecological perspective. Hillsdale, NJ: Erlbaum, in press.


SPEECH AND SIGN: SOME COMMENTS FROM THE EVENT PERSPECTIVE. REPORT FOR THE LANGUAGE WORK GROUP OF THE FIRST INTERNATIONAL CONFERENCE ON EVENT PERCEPTION*

Carol Fowler+ and Brad Rakerd++

Signed and spoken utterances have at least two aspects that are of interest to a perceiver. First of all, they have a physical aspect, the significance of which is given in the lawful relations among utterances, the information-bearing media structured by them, and the perceptual systems of observers and listeners. Secondly, they have a linguistic aspect, the significance of which is given in the conventional or ruleful relations between forms and meaning.1 In part because our time was limited and in part because so little work has been done on the conventional significance of events (as opposed to the intrinsic significance; cf. Gibson, 1966),2 our work group chose to focus on the physical aspect. Nevertheless, it will be seen that we did have a speculative word or two to say about the origins of some linguistic conventions, and we would draw attention to the report of the Event/Cognition group, as well as to Verbrugge's remarks (discussant for the address by Studdert-Kennedy), for more elaborate treatments of this important topic.

Roughly, our daily discussions centered around five topic areas: (1) useful descriptions of signed and spoken events; (2) natural constraints on linguistic form; (3) the origins of some linguistic conventions; (4) the ecology of conversation; and (5) conducting language research from an event perspective. Our review of these topics will highlight what seemed to us to be the obvious applications of the event approach and also its apparent limitations.

USEFUL DESCRIPTIONS OF SIGNED AND SPOKEN EVENTS

We considered the minimal linguistic event to be an utterance, and identified as such anything that a talker (signer) might choose to say (or sign). Obviously, this definition is unsatisfactory on a number of grounds; however, it does identify the minimal event of interest as being articulatory (gestural) in origin, and rejects as irrelevant those properties of articulation

*The conference was held June 7-12, 1981, in Storrs, Connecticut. The participants in the Language Work Group were Hollis Fitch, Carol Fowler, Nancy Frishberg, Kerry Green, Harlan Lane, Mark Mandell, Brad Rakerd, Robert Remez, Philip Rubin, Judy Shepard-Kegl, Winifred Strange, Michael Studdert-Kennedy, Betty Tuller, and Jerry Zimmerman.
+Also Dartmouth College.
++Also University of Connecticut.
Acknowledgment. This work was supported by NICHD grant HD-01994 to Haskins Laboratories.


(gesture) that are not intended to have linguistic significance. We first attempted to verify that utterances have the "nested" character of other ecological events and that the nestings are perceived; next we considered how to discover the most useful characterization of utterances for the investigators' purposes of studying them as perceived events.

Signing and Speaking as Nested Events

Natural events are nested in the sense that relatively slower, longer-term, or more global events are composed of relatively faster, shorter-term, or more local ones. For example, a football game is a longer-term event composed of shorter-term plays. It is clear from research, particularly Johansson's (e.g., 1973, 1975) on the perception of form and motion in point-light displays, that viewers are sensitive to the nested structure of events. In his address to this conference, Johansson described an example of light points placed on a rolling wheel. When a single point is affixed to the rim, a viewer who sees only that point gets no sense of the wheel's motion; instead, the percept is of a light moving in a cycloid pattern. However, when a second light is attached, now, to the hub of the wheel, the viewer perceives rolling instead of the cycloid motion. Thus, two appropriately placed lights provide sufficient optical information to specify the distal event of rolling.

In geometric terms, rolling involves two kinds of motion: translatory and rotary. These are temporally nested; a series of rotations occurs as the wheel translates over the ground plane. The translatory component affects the behavior of both light points (since both are attached to the translating wheel), but only the point on the rim is affected by the rotary component as well (since it rotates about the point on the hub). Apparently, perceptual sensitivity to the translation (as specified by the correlated activity of the two lights) forms a sort of "backdrop" for detection of the rotation; in essence, the translational component is "factored out" of the cycloid movement of the rim light, thereby revealing its rotational component.
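The decomposition exploited by Johansson's display is easy to state numerically. The sketch below (illustrative only; the wheel radius and rolling speed are arbitrary) generates hub and rim trajectories for a rolling wheel and shows that, once the common translatory component (the hub path) is factored out, the rim point traces a pure rotation:

    import numpy as np

    r, v = 1.0, 2.0                       # wheel radius and rolling speed
    t = np.linspace(0.0, 4.0, 200)

    hub = np.stack([v * t, np.full_like(t, r)], axis=1)   # pure translation
    theta = v * t / r                                     # rolling constraint
    rim = hub + r * np.stack([-np.sin(theta), -np.cos(theta)], axis=1)  # cycloid

    # Relative to the hub, the rim point is simply a circle of radius r.
    relative = rim - hub
    print(np.allclose(np.hypot(relative[:, 0], relative[:, 1]), r))  # True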

Now let us consider whether these observations apply to signing and its perception. In American Sign Language (ASL), signs are specified by three properties: the shape of the hand or hands, the place of articulation of the sign within a signing space, and the movement of the hand or hands. Signs can be inflected by modulating the movement. For example, a 'distributional' inflection, indicating that all of the individuals under discussion are affected by some act, is produced by sweeping the arm through the central body plane. By signing, say, GIVE while tracing such an arc sweep, the signer communicates GIVE TO ALL OF THEM. Likewise, a 'temporal' inflection, one indicating the repeated occurrence of an act, is produced by rotating the wrist about a body-centralized point; with this gesture, GIVE is modified to mean GIVE AGAIN AND AGAIN.

Finally, and most importantly for the current discussion, several inflections can be superimposed. Carrying our previous example a step further, it proves possible to sign the complexly inflected verb GIVE TO ALL OF THEM AGAIN AND AGAIN. This is accomplished by rotating the wrist while the arm sweeps through its arc. Notice that when this is done, the optical information for the 'temporal' inflection undergoes a radical transformation; the wrist no

longer rotates about a single point fixed at the center of the body, but rather about a point moving with the arm's arc. It appears that observers treat the sweeping motion (common to all points of the hand, wrist, and arm) as both specifying one signed event (the 'distributional' inflection), and as providing a moving frame of reference for the interpretation of the nested 'temporal' inflection.

Spoken language, with its syntactic units--phonological segments, morphemes, words, and syntactic phrases--and its metrical units--syllables, feet, phonological phrases (see Selkirk, 1980)--lends itself readily to the characterization 'nested.' We will take an example of nested articulatory and perceived events from a relatively low-level phenomenon, coarticulation. In fluent speech, the productions of successive phonetic segments overlap such that the articulatory gestures often satisfy requirements for two or more segments at the same time. Typically, for example, unstressed vowels coarticulate with the stressed vowels of adjacent syllables. It is therefore tempting to think of the production of the unstressed vowels as being nested within that of their stressed counterparts, and to think of unstressed vowels as being perceived relative to their stressed-vowel context. This way of thinking is promoted by findings (Fowler, 1981) that under some conditions listeners behave as if they factor out contributions of the context when judging the quality of unstressed vowels--more or less as Johansson's subjects seem to have factored out common and relative motions in an optical display.

In trisyllabic nonsense words with medial /ə/, the medial vowel coarticulates with both of its flanking stressed vowels, such that the F2 of /ə/ in, for instance, /ibəbi/ is higher than it is in /ubəbu/. (Compatibly, F2 is high for /i/ and low for /u/.) When extracted from their contexts, the medial /bə/ syllables do sound quite different, but when presented in context they sound alike; more alike, in fact, than do two acoustically identical /bə/ syllables presented in different contexts.

A nested-events account of these data would hold that when the /bə/ syllables are extracted from the context in which they had been produced, the perceiver has no way to detect (factor out) the contribution that the stressed vowels have made to that portion of the acoustic signal in which /ə/ correlates predominate over the correlates of other segments, any more than Johansson's subjects can separate the rotary from the translatory components of movement when they see just the one light on the rim of a wheel. Presentation in the context of flanking vowels, on the other hand, allows the perceiver to factor out components in common with those vowels, and to recognize the quality of what is left. This leads to the perceived identity of the /bə/ syllables in /ibəbi/ and /ubəbu/ (the same syllables in different trisyllabic contexts).

Several theories of speech perception, including Gibson's (1966) and one more familiar to speech investigators, the motor theory (e.g., Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967), adopt a position consistent with an event perspective: namely, that the perceived categories of speech are articulatory in origin. Gibson's view is distinguished from the other by its working assumption that the perceived articulatory categories are fully reflected (however complexly) in the acoustic signal and hence need not be reconstructed by articulatory simulations. What are the reasons for this major disagreement among theorists who agree on the question of what is perceived? One reason may be that they differ in terms of how they describe the acoustic signal or even the articulatory event.

In speech, articulatory activities and their acoustic correlates are both richly structured, and consequently can be described in a great many different ways. Each of the various descriptions may be most appropriate for certain purposes, but none is privileged for all purposes, and just one or a few are privileged for the purposes of understanding what a talker is doing and why a listener perceives what he or she perceives. A theorist who is convinced that the acoustic support for perceptual categories is inadequate may be correct; but, alternatively, she or he may have selected a description of articulatory events and their acoustic correlates that fails to reveal the support.

There are many reasons why a particular description might be inappropriate for aiding our understanding of speech perception and production. It may be pitched at an inappropriate level of detail (as when, in Putnam's [1973] example, information about the positions and velocities of the elementary particles of a peg and pegboard is invoked to explain why a square peg won't fit in a round hole). Or, for any level of detail, it could be inappropriate because it classifies components in ways that fail to capture the talker's organization of them or the listener's perceived organization. Appropriate descriptions of vocal activity during speech, then, must capture the organization imposed by the talker; those of the acoustic signal must capture those acoustic reflections of the articulatory organization that are responsible for the listener's perception of it.

Appropriate descriptions of perceived articulatory categories. In some time frame, a talker might be said to have raised his larynx (thereby decreasing the volume of the oral cavity), abducted the vocal folds, increased their stiffness, closed the lips, and raised the body of the tongue toward the palate. This description lists a set of apparently separate articulatory acts. In fact, however, the first three of them have the joint effect of achieving voicelessness; these and the next, lip closure, are the principal components of /p/ articulation; and all five acts together are essential to the production of the syllable /pi/. Thus, the aggregate of occurrences in this time frame has a coordinated structure of relations something like the following: (((larynx raising, vocal fold abduction, vocal fold stiffening)(lip closure))(tongue-body gesture)).

If an investigator settles for the first description, a list of the acts of individual articulators, then, from his perspective, information about the phonetic segments of an utterance is already absent from articulation and he cannot expect to find any evidence of it in the acoustic signal. Consequently, when a perceiver recovers segments in speech, the recovery must be considered reconstructive. Before settling for this conclusion, however, the investigator can try standing back a little from his flat perspective on the vocal tract activity and looking for organizations among gestures that were not initially apparent. These organizations will only be revealed from a temporal perspective broad enough that coupled changes among the coordinated structures can be observed. Certainly, if there are coordinative articulatory relations among gestures, and if the relations have acoustic reflections, then the listener is likely to be sensitive to the coordinated structure, rather than to the unstructured list of gestures from which it is built; for by detecting the structure of the relations among these gestures, the listener detects the talker's structure (here, the featural and phonetic segmental structure of the utterance), which is what she or he must do if the utterance is to be understood.

In support of this general approach to phonetic perception, there is some evidence that listeners do perceive aggregates of articulatory acts as if those acts were coordinated gestural structures. One example of this involves the perception of voicing. Following the release of a voiceless stop consonant, the fundamental frequency (f0) of the voice is relatively high and falls (Halle & Stevens, 1971; Hombert, 1978; Ohala, 1979). Following a voiced stop, f0 is low and rises. Although the reasons for this differential patterning of f0 are not fully understood (Hombert, 1978; Hombert, Ohala, & Ewan, 1979), it is generally agreed that it results from the timing of certain laryngeal adjustments and from certain aerodynamic conditions that the talker establishes in maintaining voicelessness or voicing during the production of the consonant (cf. Abramson & Lisker, 1965). That is, the talker does not plan to produce a high falling f0 contour following release of a /p/; the talker plans to maintain voicelessness of the consonant, and an unintended consequence of that effort is a pitch perturbation following release. Compatibly, listeners do not normally hear this pitch difference as such (that is, they do not notice a higher pitched vowel following /p/ than /b/). Instead, in the context of a preceding stop, a high falling f0 contour in a vowel may serve as information for voicelessness of a preceding consonant (Haggard, Ambler, & Callow, 1970; Fujimura, 1971), even though, when removed from the consonantal context, the f0 contours are perceived as pitch changes (Hombert, 1978; Hombert et al., 1979).

Also suggestive of the perceptual extraction of coordinated articulatory structures are occasions when the perceiver appears to be misled. Ohala (1974, in press) believes that certain historical sound changes can be explained as results of listeners' having failed to recognize some unplanned articulatory consequence as unplanned. An example related to the first one is the development of distinctive tones in certain languages. These languages evolved from earlier versions without tone systems, but with distinctions in voicing between pairs of consonants. Over time, the f0 differences just described between syllables differing in initial stop voicing became exaggerated and the voicing distinction was lost. Ohala's interpretation of the source of the change is that in these languages listeners tended to hear the f0 differences on the post-consonantal vowels as if pitch had been a controlled articulatory variable, rather than an uncontrolled consequence of adjustments related to voicing. Therefore, when these individuals produced the vowels, they produced controlled (and larger) differences in the f0 of vowels following voiced and voiceless initial consonants. Eventually, because the f0 differences had become highly distinctive, the now redundant voicing distinction was lost, and syllables that formerly had differed in voicing of the initial consonant now differed in tone. According to Ohala, this process occurred in the separation of Punjabi from Hindi.


Appropriate descriptions of the acoustic signal. Because very little is known about how a talker organizes articulation, descriptions of the acoustic signal useful for purposes of understanding perception cannot be guided strongly by information about articulatory categories. However, we do know enough to recognize that the usual method of partitioning the acoustic signal into segments or into "cues" can be improved on. Such partitioning often obscures the existence of information for the phonetic segmental structure of speech because the structure of measured acoustic segments is not coextensive with the phonetic structure of the utterance. For one thing, phonetic segments as produced have a time course that measured acoustic segments do not reflect. The component articulatory gestures of a phonetic segment gradually increase in relative prominence over the residual gestures for a preceding segment, and consequently the acoustic signal gradually comes to reflect the articulatory character of the new segment more strongly than that of the old one. Thus, phonetic segments are not discrete on the time axis, although they can be identified as qualitatively separate and serially ordered by tracking the waxing and waning of their predominance in the acoustic signal (cf. Fant, 1960).

Acoustic segments, on the other hand, are discrete. (Such segments are stretches of the acoustic signal bounded by abrupt changes in spectral composition.) An individual acoustic segment spans far less than all of the acoustic correlates of a phonetic segment and, in general, it reflects the overlapping production of several phonetic segments (cf. Fant, 1960). Looking at the signal as a series of discrete acoustic segments, then, obscures another way of looking at it: as a reflection of a series of overlapping phonetic segments successively increasing and declining in prominence.

Partitioning acoustic signals into acoustic segments also promotes assigning separate status to different acoustic "cues" for a phonetic feature, even though such an assignment tends to violate the articulatory fact that many of these cues, no matter how distinct their acoustic properties may be, are inseparable acoustic products of the gestures for a single phonetic segment (Lisker & Abramson, 1964; Abramson & Lisker, 1965). The findings of "trading relations" among acoustically distinctive parts of the speech signal indicate that these cues are not separable for perceivers any more than they can be for talkers. For example, certain pairs of syllables differing on two distinct acoustic dimensions (the duration of a silent interval following frication noise and the presence or absence of formant transitions into the following vocalic segment) are indistinguishable by listeners (Fitch, Halwes, Erickson, & Liberman, 1980). Within limits, a syllable with a long silent interval and no transitions sounds the same as one with a short silent interval and transitions. It is as if the transitions in the second syllable are indistinguishable from the extra silence in the first. A perceptual theory in which this observation is natural and expected is difficult to imagine, unless the theory recognizes that detecting acoustic segments per se is not all there is to perceiving speech. We would argue that the cues in these stimuli are indistinguishable to the degree that they provide information about the same articulatory event. Thus, an increment of silence "trades" with the formant transitions because both cues specify production of /p/. It is our view that source-free descriptions of acoustics will never succeed in capturing what a speech event sounds like to a perceiver, because it is information carried in the signal, not the signal itself, that sounds like something.


NATURAL CONSTRAINTS ON LANGUAGE FORM

Shifting perspectives from ongoing articulation and its reflections in proximal stimulation, we considered how, over the long term, properties of the articulators in speech or of the limbs in sign may have shaped linguistic forms. Similarly, we considered how perceptual systems and acoustic or optical media, with their differential tendencies to be structured by various properties of distal events, may have shaped the forms of sign and speech.

Sign has several regular properties suggestive of natural constraints on manual-language forms. One (Battison, 1974; cited in Siple, 1978, and Klima & Bellugi, 1979) takes the form of a symmetry constraint on two-handed signs: if both hands move in the production of a sign, the shapes and movements of the two hands must be the same and symmetrical. This constraint is compatible with anecdotal evidence (from novice piano players, for example), and more recently with experimental evidence (Kelso, Southard, & Goodman, 1979; Kelso, Holt, Rubin, & Kugler, in press), that it is difficult to engage in different activities with the two hands. One reason for this may be a tendency for actors to reduce the number of independently controlled degrees of freedom in complex tasks by organizing structures coordinatively (e.g., Turvey, 1977). Kelso's experiments suggest that the two arms and hands tend to be organized coordinatively even when such an organization would seem unnecessary or even undesirable (Kelso et al., 1979; Kelso et al., in press); when subjects were required to engage in different activities with the two hands or arms, the "different" movements tended to retain similar properties.

A second constraint, called the "Dominance" constraint by Battison, may have a similar origin in general constraints on movement organization. For signs in which just one hand moves and the other hand serves as a base for the movements (a place of articulation), the base hand must either have the same configuration as the moving hand or one of a very limited set of other configurations.

An example of a constraint in spoken languages may be the tendency for syllable structures to respect a "sonority hierarchy" (e.g., Kiparsky, 1979) whereby sonority (roughly, vowel-likeness) increases inward toward the vowel from both syllable edges. Hence, for example, /tr/, a sequence in which sonority increases from left to right, is an acceptable prevocalic sequence, but postvocalically the order must be /rt/.
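As a toy illustration of the sonority generalization just stated, the sketch below checks whether a syllable's consonant sequences rise in sonority toward the vowel. The numeric sonority ranks are illustrative assumptions made for this sketch, not a claim about any particular phonological analysis.

    # Illustrative sonority ranks (higher = more vowel-like); the values and
    # classes are assumptions for this sketch only.
    SONORITY = {**{c: 1 for c in "ptkbdg"},   # stops
                **{c: 2 for c in "fvsz"},     # fricatives
                **{c: 3 for c in "mn"},       # nasals
                **{c: 4 for c in "lr"},       # liquids
                **{c: 5 for c in "aeiou"}}    # vowels

    def respects_sonority(syllable):
        """True if sonority rises toward the vowel from both syllable edges."""
        ranks = [SONORITY[seg] for seg in syllable]
        peak = max(range(len(ranks)), key=lambda i: ranks[i])
        onset_ok = all(ranks[i] < ranks[i + 1] for i in range(peak))
        coda_ok = all(ranks[i] > ranks[i + 1]
                      for i in range(peak, len(ranks) - 1))
        return onset_ok and coda_ok

    print(respects_sonority("tra"))   # True: /tr/ is acceptable prevocalically
    print(respects_sonority("art"))   # True: postvocalically the order is /rt/
    print(respects_sonority("atr"))   # False: sonority rises away from the vowel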

As for language features owing to properties of perceptual systems and stimulating media, Lindblom's proposed constraints on the evolution of vowel systems provide an example in spoken languages (1980; see also Bladon & Lindblom, 1981). Lindblom has proposed that vowel systems maximize the perceptual distances among member vowels. Based on estimates of distances among vowels in perceptual space, he succeeds in predicting which vowels will tend to occur across languages in vowel systems of various sizes. This implies a constraint on phonological inventories: that perceivers be able to recover distinct phonetic segments when distinct ones are intended. Talkers cannot elect to realize distinct phonetic segments by using articulatory gestures (however distinct they may be themselves) that fail to leave distinguishing traces in the acoustic medium or in the neural medium of perceptual systems. (Analogous articulatory constraints also operate to shape vowel systems. Thus, the relatively densely populated front vowel space and the sparsely populated back vowel space doubtless reflect the relatively greater agility and precision of movement of the tongue tip and blade compared with the tongue body.)
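A minimal sketch of the dispersion idea just described: pick the inventory of a given size that maximizes the smallest pairwise perceptual distance among its members. The candidate vowels, their F1/F2 values, and the mel-scaled Euclidean distance used here are all illustrative assumptions; this is not Lindblom's actual model or data.

    import itertools
    import math

    # Rough, illustrative F1/F2 values (Hz) for a handful of candidate vowels.
    CANDIDATES = {"i": (280, 2250), "e": (400, 2000), "a": (750, 1300),
                  "o": (450, 850), "u": (310, 750), "y": (280, 1900),
                  "E": (550, 1750), "O": (600, 950)}

    def mel(f):
        # Mel transform as a crude stand-in for perceptual frequency spacing.
        return 2595 * math.log10(1 + f / 700)

    def distance(v1, v2):
        (f1a, f2a), (f1b, f2b) = CANDIDATES[v1], CANDIDATES[v2]
        return math.hypot(mel(f1a) - mel(f1b), mel(f2a) - mel(f2b))

    def best_inventory(size):
        # Inventory of the requested size with the largest minimum distance.
        return max(itertools.combinations(CANDIDATES, size),
                   key=lambda inv: min(distance(a, b)
                                       for a, b in itertools.combinations(inv, 2)))

    for n in (3, 5):
        print(n, best_inventory(n))   # small systems favor /i a u/-like corners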

Lane proposed that similar perceptual and articulatory constraints may shape the evolution of sign inventories. Facial expressions provide information in ASL, and perceivers tend to focus on a signer's face. This creates a gradient of acuity peaking at the face. According to Siple (1978), signs made well away from the face tend to be less similar to one another than signs made in its vicinity; in addition, two-handed signs made in the periphery are subject to the Symmetry and Dominance constraints just described, which provide redundancy for the viewer, who may not see them as clearly as signs produced near the face. Lane suggested that the relative frequency of signs in various locations in signing space might be predicted jointly by the acuity gradient favoring signs located near the face and a work-minimizing constraint favoring signs closer to waist level.

THE ORIGINS OF SOME LINGUISTIC CONVENTIONS

As we noted earlier, the conventional rather than necessary relationship between linguistic forms and their message function is central to the nature of language, freeing linguistic messages from having to refer to the here and now, and thereby allowing past, future, fictional and hypothetical events all to be discussed. For Gibson, this property of language removes it from the class of things that can be directly perceived:3

[Perceptual cognition] is a direct response to things based on stimulus information; [symbolic cognition] is an indirect response to things based on stimulus sources produced by another human individual. The information in the latter is coded; in the former case it cannot properly be called that (1966, p. 91).

The study group did not discuss language comprehension in relation to event theory, perhaps because event theory currently offers little guidance on that subject. However, there was discussion of the origins of some linguistic conventions. Several examples suggest an origin of certain conventional relations as elaborations of intrinsic ones. The example of tonogenesis given earlier illustrates this idea. Ohala proposes that in some languages distinctive tones originated as controlled exaggerations of the pitch perturbations on vowels caused by the voicing or voicelessness of a preceding consonant.

A second example is so-called "compensatory lengthening" (e.g., Grundt, 1976; Ingria, 1979), a historical change whereby languages concurrently lost a final consonant in some words and gained a phonological distinction of vowel length, with the words that formerly had ended in a consonant now ending in a phonologically long vowel. In spoken languages, the measured length of vowels shortens when they are spoken before consonants (e.g., Lindblom, Lyberg, & Holmgren, 1981). Of course, since vowels coarticulate with final consonants, this measured shortening may not reflect "true" shortening; presumably, acoustic evidence of their coarticulating edges is obscured by acoustic correlates of the overlaid consonant. In any case, the loss of a final consonant leads to measured lengthening of the vowel. If that unintended lengthening was perceived as controlled lengthening (just as, hypothetically, uncontrolled pitch perturbations were perceived as controlled pitch contours), and was subsequently produced as a controlled lengthening, it could serve as the basis for a phonological distinction in vowel length.

A final example in speech apparently has an analogue in sign. Some speech production investigators have proposed that vowels and consonants are produced by relatively separate articulatory organizations in the vocal tract, and that vowel production may go on essentially continuously during speech production, uninterrupted by concurrently produced consonants (e.g., Ohman, 1966; Perkell, 1969; Fowler, 1980). These proposals are based on observations that vowel-to-vowel gestures that occur during consonant production (Ohman, 1966; Perkell, 1969) sometimes look very similar to vowel-to-vowel gestures in VV sequences (Kent & Moll, 1972). Also, a relatively separate organization of vowel and consonant production, with continuous production of vowels, may promote such linguistic conventions as vowel infixing in consonantal roots in Arabic languages (McCarthy, 1981) and vowel harmony in languages including Turkish (and in infant babbling [e.g., Menn, 1980]).

Vowel infixing will provide an illustration. In Arabic languages, verb roots are triconsonantal; for example, the root 'ktb' means "write." Verb voice and aspect (e.g., active/passive, perfective/imperfective) are indicated by morphemes consisting entirely of vowels. In McCarthy's recent analysis (1981), the consonantal roots and vowel morphemes are interleaved according to specifications of a limited number of word templates and a small number of principles for assigning the component segments to the templates. Some derivationally related words in Arabic are: katab, ktabab, kutib, and kuutib. The consonantal root in each case is 'ktb'; the vowel morphemes are 'a' (perfective, active) and 'ui' (perfective, passive); and the relevant word templates are CVCVC, CCVCVC, and CVVCVC (where C is a consonant and V is a vowel). The general rules for assigning roots and morphemes to templates are (1) to assign the component segments left to right in the template, and (2) if there are more C slots than consonants or more V slots than vowels, to spread the last consonant or last vowel over the remaining C or V slots. The only exception to this generalization is the 'i' in 'ui', which is always assigned to the right-most V in the template. Below are two illustrations of verb formation according to this analysis:

[Two association diagrams appear here in the original, linking the root 'ktb' and the vowel morphemes to the templates CVCVC and CVVCVC to yield katab and kuutib.]
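The left-to-right association and spreading just summarized can be made concrete in a short sketch. This is our illustration of the rules as stated in the text, not McCarthy's (1981) formalism; the special docking of 'i' on the right-most V slot is handled as the single exception noted above.

    def associate(segments, slots):
        # Left-to-right association, spreading the final segment over leftovers.
        return [segments[min(i, len(segments) - 1)] for i in range(len(slots))]

    def fill_template(template, root, vowel_morpheme):
        out = [None] * len(template)
        c_slots = [i for i, s in enumerate(template) if s == "C"]
        v_slots = [i for i, s in enumerate(template) if s == "V"]

        vowels = list(vowel_morpheme)
        # Exception noted in the text: the 'i' of the passive morpheme 'ui'
        # always links to the right-most V slot.
        if vowels == ["u", "i"]:
            out[v_slots[-1]] = "i"
            v_slots, vowels = v_slots[:-1], ["u"]

        for slot, seg in zip(c_slots, associate(list(root), c_slots)):
            out[slot] = seg
        for slot, seg in zip(v_slots, associate(vowels, v_slots)):
            out[slot] = seg
        return "".join(out)

    for tmpl, vm in [("CVCVC", "a"), ("CCVCVC", "a"), ("CVVCVC", "ui")]:
        print(tmpl, "->", fill_template(tmpl, "ktb", vm))
    # CVCVC  -> katab
    # CCVCVC -> ktabab
    # CVVCVC -> kuutib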

An analogous system in ASL was also discussed. A particular root morpheme can be associated with different sign templates to express derivationally or inflectionally related versions of the morpheme. The templates have slots for locations (L) and movements (M), where the former specify person and number and the latter specify aspect. To take an example, the template that underlies I GIVE HIM is (LM + ML). Movements and locations are assigned to it as in McCarthy's analysis:

[An association diagram appears here in the original, linking the movements and locations of the root sign to the (LM + ML) template.]

A template can include several L's and M's, more, in fact, than there are distinct movements in a root morpheme. In this case, the movements of the root morpheme are assigned left to right in the template until they are exhausted, and then the right-most movement spreads to fill the empty M slots. In I GIVE TO X, Y, AND Z, the template and assignments of root morpheme movements are as follows:

[An association diagram appears here in the original, showing the template for I GIVE TO X, Y, AND Z, with the root morpheme's movements assigned left to right to the M slots and the right-most movement spread over the remaining slots.]

Analyzed this way, the meshing of movements and locations is similar to the meshing of vowels and consonants in languages with infixing and vowel harmony systems. This leads to the question of whether the system is favored as a linguistic device and, if so, whether it is favored by virtue of the signer's motor organization for producing it. It might be favored, for example, if the motor organization underlying sign production readily produced cyclic repetitions of a movement (as those underlying stepping, breathing, chewing, and perhaps vowel production do), and if minimal adjustments to the organization would enable shifts in location without changing the form of the movement.

THE ECOLOGY OF CONVERSATION

A scan of the various conference addresses shows the close ties between the event approach and Gibson's theory of perception. Indeed, Gibson's radical rethinking of classic perceptual problems includes the notion that a perceiver does not operate on a series of "frozen moments," but rather in an ongoing stream of events. We therefore thought it useful to examine the ecology of the speech event, and in doing so we were reminded that both the speaker and the listener (the signer and the observer) have a stake in the success of a communicative episode. This is a rather unusual circumstance; it invites both the familiar analysis of the perceiver as an active seeker of information (cf. Gibson, 1966) and a less familiar analysis of the producer as an active provider of informational support.

As to the perceiver's active role, we first of all see behavior intended to enhance signal detection: the head can be rotated to an optimal orientation, the source can be approached, and so on. Beyond this there can be direct communicative intervention, in that the perceiver can make requests for repetition or clarification. On the producer's part, there are the well-known redundancies of language: in essence, more than enough information is provided to ensure the accuracy of communication. Also, perhaps to avoid syntactic ambiguities, the talker may provide careful prosodic marking of clause boundaries and the like (e.g., Cooper & Paccia-Cooper, 1980). And finally, a talker will enunciate more clearly (and a signer gesture more distinctly) when there is a great distance to the perceiver or when the message context makes a particular word unpredictable.

CONDUCTING LANGUAGE RESEARCH FROM AN EVENT PERSPECTIVE

If there is a theme to the event conference, it is surely that psychologists have paid too little attention to the systematic (and potentially informative) nature of change. With respect to speech, this can be seen in the common practice of decomposing the speech stream into a succession of discrete acoustic segments (e.g., release bursts, aspiration, formant transitions, and the like). A sizable literature speaks, in turn, of the difficulty of bringing these acoustic segments into simple correspondence with linguistic segments. In the case of sign, the perceptual significance of change was overlooked in early attempts to devise sign glossaries; investigators were preoccupied with cataloguing the featural properties of hand shapes and failed at first to recognize the importance of the gestures being made with the hands (Klima & Bellugi, 1979, chapter 12 and passim; Bellugi & Studdert-Kennedy, 1980).

The members of our group were agreed that a shift of emphasis is needed: investigators of both speech and sign should give greater consideration to the time-varying properties of those events. To begin with, this will involve focusing on the dynamics of the source events themselves. These investigations of the sources can then suggest compatible and appropriate perceptual analyses. Recent work using Johansson's point-light techniques to study the coordinated activities of the signer, and the perception of lexical movements and inflections (e.g., Poizner, Bellugi, & Lutes-Driscoll, 1981), seems to offer promising beginnings for such an approach.

Alternatively, analyses of time-varying properties of the signal may provide guidance in understanding the ways in which talkers and signers structure articulatory activity (cf. Fowler, 1979; Tuller & Fowler, 1980). On this issue, our group spent a good deal of time considering the recent work of Remez, Rubin, and their colleagues (Remez, Rubin, & Carrell, 1981; Remez, Rubin, Pisoni, & Carrell, 1981). They have shown that the phonetic message of an utterance can be preserved in sinewave approximations that reproduce only the center frequencies of its first three formants. These stimuli have no short-term acoustic constituents that vocal tracts can produce and consequently lack many acoustic elements heretofore identified by investigators as speech cues. Presumably the stimuli are intelligible because information is provided by relations among the three sinusoids, information that the sinusoidal variations are compatible with a vocal origin.
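To give a concrete sense of what such a sinewave approximation involves, here is a minimal sketch: one time-varying sinusoid per formant center-frequency track, summed. The formant tracks and amplitudes below are invented placeholders; in the actual studies they were measured from natural utterances and synthesized with the Haskins facilities, not with this code.

    import numpy as np

    def sinewave_replica(formant_tracks, amp_tracks, sr=10000):
        # Sum one sinusoid per track; phase is the running integral of the
        # instantaneous frequency.
        out = np.zeros(formant_tracks.shape[1])
        for freqs, amps in zip(formant_tracks, amp_tracks):
            phase = 2 * np.pi * np.cumsum(freqs) / sr
            out += amps * np.sin(phase)
        return out / np.max(np.abs(out))

    # Illustrative (invented) tracks: three "formants" gliding over 300 msec.
    sr, dur = 10000, 0.3
    n = int(sr * dur)
    tracks = np.vstack([np.linspace(300, 700, n),     # F1-like track
                        np.linspace(2200, 1100, n),   # F2-like track
                        np.linspace(3000, 2500, n)])  # F3-like track
    amps = np.vstack([np.full(n, 1.0), np.full(n, 0.5), np.full(n, 0.25)])
    signal = sinewave_replica(tracks, amps, sr)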

These findings are important not because they show sinewaves per se to be important to speech perception. After all, naive listeners did not spontaneously hear the sinewaves as phonetic events. Instead, the findings are important in showing that time-varying properties of the signal can provide sufficient information for word and segment identification in speech. In this respect, as Remez and Rubin point out (Note 1), their demonstration is closely analogous to Johansson's demonstrations with point-light displays of moving figures: in both demonstrations, change provided essential information for form. The conclusion we draw from all of the examples considered here is that students of language should not be misled by the timeless quality of linguistic forms. Signing and speaking are coherent activities and natural classes of events. It is only reasonable to expect that the signatures of these events will be written in time as well as space.

REFERENCE NOTE

1. Remez, R. E., & Rubin, P. E. The stream of speech. Paper distributed at the First International Conference on Event Perception, Storrs, Ct., June, 1981.

REFERENCES

Abramson, A. S., & Lisker, L. Voice onset time in stop consonants: Acoustic analysis and synthesis. In D. E. Cummins (Ed.), Proceedings of the 5th International Congress of Acoustics. Liege: Imp. G. Thone, 1965, A-51.
Battison, R. Phonological deletion in American Sign Language. Sign Language Studies, 1974, 5, 1-19.
Bellugi, U., & Studdert-Kennedy, M. (Eds.). Signed and spoken language: Biological constraints on linguistic form. Berlin: Dahlem Konferenzen, 1980.
Bladon, R., & Lindblom, B. Modeling the judgment of vowel quality differences. Journal of the Acoustical Society of America, 1981, 69, 1414-1422.
Cooper, W. E., & Paccia-Cooper, J. Syntax and speech. Cambridge, Mass.: Harvard University Press, 1980.
Fant, G. Acoustic theory of speech production. The Hague, Netherlands: Mouton, 1960.
Fitch, H. L., Halwes, T., Erickson, D. M., & Liberman, A. M. Perceptual equivalence of two acoustic cues for stop consonant manner. Perception & Psychophysics, 1980, 27, 343-350.
Fowler, C. A. "Perceptual centers" in speech production and perception. Perception & Psychophysics, 1979, 25, 375-388.
Fowler, C. A. Coarticulation and theories of extrinsic timing control. Journal of Phonetics, 1980, 8, 113-133.
Fowler, C. A. Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research, 1981, 46, 127-139.
Fujimura, O. Remarks on stop consonants: Synthesis experiments and acoustic cues. In L. L. Hammerich, R. Jakobson, & E. Zwirner (Eds.), Form and substance: Phonetic and linguistic papers presented to Eli Fischer-Jørgensen. Copenhagen: Akademisk Forlag, 1971.
Gibson, J. J. The senses considered as perceptual systems. Boston, Mass.: Houghton Mifflin, 1966.
Grundt, A. Compensation in phonology: Open syllable lengthening. Bloomington, Ind.: Indiana University Linguistics Club, 1976.
Haggard, M. P., Ambler, S., & Callow, M. Pitch as a voicing cue. Journal of the Acoustical Society of America, 1970, 47, 613-617.
Halle, M., & Stevens, K. A note on laryngeal features. Quarterly Progress Report, Research Laboratory of Electronics (Massachusetts Institute of Technology), 1971, 101, 198-213.
Hombert, J.-M. Consonant types, vowel quality and tone. In V. Fromkin (Ed.), Tone: A linguistic survey. New York: Academic Press, 1978.
Hombert, J.-M., Ohala, J., & Ewan, W. Phonetic explanation for the development of tones. Language, 1979, 55, 37-58.
Ingria, R. Compensatory lengthening as a metrical phenomenon. Linguistic Inquiry, 1979, 11, 465-495.
Johansson, G. Visual perception of biological motion and a model for its analysis. Perception & Psychophysics, 1973, 14, 201-211.
Johansson, G. Visual motion perception. Scientific American, 1975, 232(6), 76-89.
Kelso, J. A. S., Holt, K., Rubin, P., & Kugler, P. Patterns of human interlimb coordination emerge from the properties of nonlinear oscillators: Theory and data. Journal of Motor Behavior, in press.
Kelso, J. A. S., Southard, D., & Goodman, D. On the coordination of two-handed movements. Journal of Experimental Psychology: Human Perception and Performance, 1979, 5, 229-238.
Kent, R. D., & Moll, K. Tongue body articulation during vowel and diphthongal gestures. Folia Phoniatrica, 1972, 24, 278-300.
Kiparsky, P. Metrical structure assignment is cyclic. Linguistic Inquiry, 1979, 10, 421-441.
Klima, E. S., & Bellugi, U. The signs of language. Cambridge, Mass.: Harvard University Press, 1979.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. Perception of the speech code. Psychological Review, 1967, 74, 431-461.
Lindblom, B. The goal of phonetics, its unification and application. Phonetica, 1980, 37, 7-26.
Lindblom, B., Lyberg, B., & Holmgren, K. Durational patterns of Swedish phonology: Do they reflect short-term memory processes? Bloomington, Ind.: Indiana University Linguistics Club, 1981.
Lisker, L., & Abramson, A. S. A cross-language study of voicing in initial stops: Acoustical measurements. Word, 1964, 20, 384-422.
McCarthy, J. J. A prosodic theory of nonconcatenative morphology. Linguistic Inquiry, 1981, 12, 373-418.
Menn, L. Phonological theory and child phonology. In G. Yeni-Komshian, J. F. Kavanagh, & C. A. Ferguson (Eds.), Child phonology (Vol. 1). New York: Academic Press, 1980.
Ohala, J. Experimental historical phonology. In J. M. Anderson & C. Jones (Eds.), Historical linguistics II: Theory and description in phonology. Amsterdam: North-Holland, 1974.
Ohala, J. The production of tone. In V. Fromkin (Ed.), Tone: A linguistic survey. New York: Academic Press, 1979.
Ohala, J. The listener as a source of sound change. In M. F. Miller (Ed.), Papers from the parasession on language and behavior. Chicago: Chicago Linguistic Society, in press.
Ohman, S. E. G. Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 1966, 39, 151-168.
Perkell, J. S. Physiology of speech production: Results and implications of a quantitative cineradiographic study. Cambridge, Mass.: M.I.T. Press, 1969.
Poizner, H., Bellugi, U., & Lutes-Driscoll, V. Perception of American Sign Language in dynamic point-light displays. Journal of Experimental Psychology: Human Perception and Performance, 1981, 7, 430-440.
Putnam, H. Reductionism and the nature of psychology. Cognition, 1973, 2, 131-146.
Remez, R. E., Rubin, P. E., & Carrell, T. D. Phonetic perception of sinusoidal signals: Effects of amplitude variation. Journal of the Acoustical Society of America, 1981, 69, S114. (Abstract)
Remez, R. E., Rubin, P. E., Pisoni, D. B., & Carrell, T. D. Speech perception without traditional speech cues. Science, 1981, 212, 947-950.
Selkirk, E. O. The role of prosodic categories in English word stress. Linguistic Inquiry, 1980, 11, 563-605.
Siple, P. Linguistic and psychological properties of American Sign Language: An overview. In P. Siple (Ed.), Understanding language through sign language research. New York: Academic Press, 1978.
Tuller, B., & Fowler, C. A. Some articulatory correlates of perceptual isochrony. Perception & Psychophysics, 1980, 27, 277-283.
Turvey, M. T. Preliminaries to a theory of action with reference to vision. In R. Shaw & J. Bransford (Eds.), Perceiving, acting and knowing: Toward an ecological psychology. Hillsdale, N.J.: Erlbaum, 1977.

FOOTNOTES

1We do not intend to suggest by the word conventional that the linguistic aspects of utterances have been established by popular acclaim. We intend only to distinguish the linguistic aspects from the physical aspects in terms of their relative arbitrariness. Let us consider a physical example first: the articulatory and acoustic differences between the versions of /d/ in /di/ and /du/ are necessary and lawful, given the nature of vocal tracts. This contrasts with the aspiration difference between the versions of /p/ in "pie" and "spy," the production of which is required of English speakers only by convention or rule. We know this to be the case since speakers of other languages (e.g., French) make no such distinction.

2In Gibson's view:

The relation of a perceptual stimulus to its causal source in the environment is of one sort; the relation of a symbol to its referent is of another sort. The former depends on the laws of physics and biology. The latter depends on a linguistic community, which is a unique invention of the human species. The relation of perceptual stimuli to their sources is an intrinsic relation such as one of projection, but the relation of symbols to their referents is an extrinsic one of social agreement. The conventions of symbolic speech must be learned, but the child can just about as easily learn one language as another. The connections between stimuli and their sources may well be learned in part, but they make only one language, or better, they do not make a language at all. The language code is cultural, traditional and arbitrary; the connection between stimuli and sources is not (p. 91).

3It is interesting in this regard that theories of perception developed within the information-processing framework have relied almost exclusively on verbal materials as stimuli and propose that perception is indirect.

FRICATIVE-STOP COARTICULATION: ACOUSTIC AND PERCEPTUAL EVIDENCE

Bruno M. Repp and Virginia A. Mann+

Abstract. Eight native speakers of American English each produced 10 tokens of all possible CV, FCV, and VFCV utterances with V = [a] or [u], F = [s] or [ʃ], and C = [t] or [k]. Acoustic analysis showed that the formant transition onsets following stop consonant release were systematically influenced by the preceding fricative, although there were large individual differences. In particular, F3 and F4 tended to be higher following [s] than following [ʃ]. The coarticulatory effects were equally large in FCV (e.g., /sta/) and VFCV (e.g., /asta/) utterances; that is, they were not reduced when a syllable boundary intervened between fricative and stop. In a parallel perceptual study, the CV portions of these utterances (with release bursts removed to provoke errors) were presented to listeners for identification of the stop consonant. The pattern of place-of-articulation confusions, too, revealed coarticulatory effects due to the excised fricative context.

INTRODUCTION

In two previous papers (Mann & Repp, 1981; Repp & Mann, 1981) we described an effect of a preceding fricative on stop consonant perception: When a stimulus ambiguous between [ta] and [ka] was preceded by a fricative noise appropriate for [s] (plus a brief silence appropriate for stop closure), listeners reported [ta] more often than [ka]. A preceding [ʃ] noise, on the other hand, had little effect on the perceived place of stop articulation. In a series of experiments, we eliminated several possible explanations of the contrasting effects of [s] and [ʃ], such as a simple response bias, auditory contrast, or direct cues to stop place of articulation in the fricative noise. We concluded that the perceptual context effect most likely reflects listeners' exploitation of a coarticulatory interaction between a stop consonant and a preceding fricative, namely, a shift in place of stop consonant articulation towards that of the fricative.

+Also Bryn Mawr College.
Acknowledgment. This paper is a revised and expanded version of a paper presented at the 101st Meeting of the Acoustical Society of America in Ottawa, Ontario, May 1981. Our research was supported by NICHD Grant HD-01994 and BRS Grant RR-05596 to Haskins Laboratories, and by NICHD Postdoctoral Fellowship HD00677 to the second author. We thank Christine Cook and Joyce Schoenheimer, who ran subjects and scored data for the perceptual experiment, and Janette Henderson for doing some of the acoustic measurements.

[HASKINS LABORATORIES: Status Report on Speech Research SR-67/68 (1981)]

In our second paper (Repp & Mann, 1981), we reported data that supported this hypothesis. Starting with fricative-stop-vowel utterances obtained from a single speaker, we examined listeners' stop consonant perception after the fricative noise and the stop release burst had been removed. The stops in these truncated CV syllables were more often perceived as having a relatively forward place of articulation when the excised fricative had been [s] than when it had been [ʃ]. In addition, acoustic measurements of the same stimuli showed that the onset frequency of the second formant (F2) following the stop release was lower by about 100 Hz in the context of [s], relative to [ʃ] context. A possible difference in F3 onset in the opposite direction was also indicated. Thus, F2 and F3 onsets were more widely separated in [s] context than in [ʃ] context, a pattern that is consistent with the hypothesized forward shift in place of stop articulation following [s], considering the well-known fact that F2 and F3 onsets are more widely separated in [ta] than in [ka].

While these data suggested that fricative-stop coarticulation can occur, their generality was uncertain. In the present paper, we report acoustic measurements and supplementary perceptual tests using utterances collected from eight new speakers.

ACOUSTIC MEASUREMENTS

Speakers. Four males (AA, LL, RM, VG) and four females (VM, SP, PP, FBB), all native speakers of American English, were enlisted. They included two senior phoneticians (AA, LL), an experienced speech scientist (FBB), a graduate student in phonetics (PP), and four speakers with little formal phonetic training.

Table 1

The set of utterances used.

CV:    [ta]   [ka]   [tu]   [ku]
FCV:   [sta]  [ʃta]  [ska]  [ʃka]  [stu]  [ʃtu]  [sku]  [ʃku]
VFCV:  [asta] [aʃta] [aska] [aʃka] [ustu] [uʃtu] [usku] [uʃku]

Utterances. The experimental utterances included all possible combinations of an initial vowel ([a], [u], or absent), a fricative ([s], [ʃ], or absent), a stop ([t] or [k]), and a final vowel ([a] or [u]), with the restriction that the two vowels, if present, be the same. Table 1 lists the individual utterances, both in phonetic notation and in the spelling in which they were read by the subjects. Note that the stop consonants, although unaspirated in both FCV and VFCV contexts, were phonologically voiceless in utterances where they were part of a syllable-initial fricative-stop cluster, but phonologically voiced in VFCV utterances where they were in syllable-initial position.1 Thus, this set of utterances enabled us to assess not only the effect of a preceding fricative on stop articulation but also the sensitivity of that effect to the presence of an intervening syllable boundary.

Ten randomized lists of these utterances were typed on a sheet of paper. The lists included four other utterances whose analysis we will not report here. The CV syllables ([ta], [ka], [tu], [ku]) were added after speakers VM and SP had been recorded; thus, CV data were available for six speakers only.

Recording procedure. The utterances were produced in a sound-attenuated room in front of a Shure dynamic microphone and recorded on a Crown 800 tape recorder. Speakers were given sample pronunciations by the experimenter and were instructed to read at an even pace and as naturally as possible. Speakers varied in their assignment of stress in the disyllabic (VFCV) utterances: Three (AA, LL, VM) stressed the second syllable while the other five stressed the first syllable. This unintended variation in stress offered the opportunity to observe any possible effects of this variable.

Measurement procedure. Individual utterances were input from audio tape to a Federal UA-6A spectrum analyzer. The results of the spectral analysis were stored in the memory buffer of a GT-40 computer and displayed on a Hewlett-Packard oscilloscope. By using a cursor below a spectrogram of the whole utterance, individual time frames could be selected whose smoothed average spectrum was displayed above the spectrogram, while the corresponding portion of the digitized waveform appeared on a second screen. Thus, the selection of frames for spectral analysis was guided by both waveform and spectrographic information. Spectral cross-sections were computed over a 25.6-msec time frame; the step size from one frame to the next was 12.8 msec. The spectrum was displayed as a point plot with a resolution of 40 Hz. Spectral peaks corresponding to formants were determined from this display by eye and noted down by hand. Appropriate adjustments were made for asymmetric shapes of formant peaks; occasional multiple peaks due to a formant straddling two or more individual harmonics were averaged. In doubtful cases, the spectra of the preceding and following time frames were taken as a guideline.

Because of the laborious nature of this manual procedure, the measurements had to be restricted to the most crucial aspects of the stimuli: the onset frequencies of F2 and F3 (and, in some cases, F4) following the stop release. The release burst of the stop typically had a highly irregular spectrum (especially for alveolar stops). It was ignored, and measurements were taken from the first frame that showed a clear formant pattern, normally including F1 (signifying the onset of voicing). Additional measurements were taken from the next two frames (only from the next frame in the case of speaker AA, whose utterances were the first measured), so that formant transitions were tracked over approximately 50 msec.

[Figure 1. Formant transition patterns for individual speakers' productions of [ta], [ka], [tu], and [ku], averaged over five different contexts and depicted as trajectories in the F2-F3 plane (frequencies in kHz). Data for [ka] are missing for speakers AA, LL, and a third speaker because of unreliable F3 measurements.]
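As a rough illustration of this kind of frame-by-frame spectral measurement (25.6-msec analysis frames stepped by 12.8 msec, with spectral peaks read off as formant estimates), the sketch below frames a digitized signal and picks the largest peak in each of two search bands. The search ranges, windowing, and automatic peak picking are our assumptions; they stand in for the by-eye judgments and the specific hardware described above.

    import numpy as np

    def formant_onsets(signal, sr=10000, frame_ms=25.6, step_ms=12.8,
                       search_ranges=((900, 1800), (1800, 3200))):
        # For each frame, return the frequency of the largest spectral peak
        # inside each search range (a crude stand-in for F2 and F3 estimates).
        frame = int(sr * frame_ms / 1000)
        step = int(sr * step_ms / 1000)
        freqs = np.fft.rfftfreq(frame, 1 / sr)
        estimates = []
        for start in range(0, len(signal) - frame + 1, step):
            windowed = signal[start:start + frame] * np.hamming(frame)
            spectrum = np.abs(np.fft.rfft(windowed))
            peaks = []
            for lo, hi in search_ranges:
                band = (freqs >= lo) & (freqs <= hi)
                peaks.append(freqs[band][np.argmax(spectrum[band])])
            estimates.append(peaks)
        return np.array(estimates)   # shape: (n_frames, n_ranges)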

Note that this procedure provides a conservative estimate of coarticulatory effects due to the fricative, since any such effects are likely to be most pronounced at the point of stop release and to decrease with distance from the release. Although coarticulatory changes in the release burst may exist (cf. Repp & Mann, 1981, for indirect evidence), they cannot be assessed easily by the present method. Thus, the present investigation was concerned solely with coarticulatory changes in the formant transitions following the release burst.

The raw data consisted of the frequencies of F2 and F3 (and, sometimes, F4) for three (two in the case of AA) consecutive frames of each of ten tokens of 20 utterances (16 in the case of VM and SP) produced by eight speakers. Missing data due to omissions, mispronunciations, or gross acoustic anomalies were rare. A more common source of missing data was the weakness of some formants in certain utterances, particularly F3 in utterances containing [ku]. For some speakers, as noted below, no reliable data for F3 could be obtained in these instances.

Results and Discussion

The measurements of F2 and F3 in FCV and VFCV utterances were subjected to separate 5-way analyses of variance with the factors Syllable Boundary (FCV vs. VFCV), Fricative ([s] vs. [ʃ]), Vowel ([a] vs. [u]), Stop ([t] vs. [k]), and Time (3 frames). Speaker AA was not included in these analyses because of missing data.

Figure 1 gives an impression of the general frequency characteristics of the formant transitions, regardless of preceding context. The transitions are depicted as trajectories in the F2-F3 plane, separately for each speaker's productions of [ta], [ka], [tu], and [ku], averaged over the five contexts: [-], [s-], [ʃ-], [as-] (or [us-]), and [aʃ-] (or [uʃ-]). Except for the few cases with missing data points, each trajectory is based on three points in time separated by 12.8 msec, with 50 measurements per point. In the left panel, it can be seen that all speakers had falling F2 transitions in both [ta] and [ka], but two different patterns emerged for F3: For five speakers (LL, RM, VG, SP, PP), the F3 transitions were falling for [ta] and slightly falling for [ka]; for the remaining three speakers (AA, VM, FBB), F3 was completely flat for [ta] but rising for [ka]. These individual differences may indicate that the second group of speakers produced [a] with a relatively high F3. In the right panel, we see that all speakers (except for one speaker in [ku]) showed falling F3 transitions in [tu] but a flat F3 in [ku]. Note that after about 50 msec of formant movement, the formants of [ta] and [ka], and of [tu] and [ku], were still widely separated, suggesting rather long formant transitions and/or variations in vowel quality dependent on the preceding stop (particularly in [u]).

Tbe trends shown in Figure 1 are all highly significant. and they are general4 1^ aaraimait with other data in the literature. We will. not dwel . on them here, a' our'primary concern was the effect of preceding fricative

259 context. We examined this effect in terms of the difference'in formant onset fr *kAes following (a) and IS).

Table 2 shows these differences (in Hz) for F2, broken down by individual utterance pairs and speakers but averaged over the three time frames. A positive difference indicates that F2 was higher following [s] than following [ʃ]. Underlines indicate differences that were significant at the p < .01 level in individual t-tests. It can be seen that, on the average, F2 was 4 Hz lower following [s] than following [ʃ], a nonsignificant difference. Nevertheless, out of 64 individual comparisons, 20 were significant, a proportion far exceeding chance. Of these 20 differences, 8 were positive and 12 negative, which confirms the absence of any general trend. Since there was no pattern in the data, these significant coarticulatory effects must be considered entirely idiosyncratic.
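The per-cell comparisons reported in Tables 2-4 amount to a difference of mean onset frequencies between the [s] and [ʃ] versions of an utterance, tested over tokens. The sketch below shows the form of that computation on invented token values; the real measurements are not reproduced here, and the use of an independent-samples t-test is our assumption about the per-cell tests.

    import numpy as np
    from scipy import stats

    def coarticulation_effect(onsets_s, onsets_sh):
        # onsets_s, onsets_sh: onset frequencies (Hz), one value per token,
        # already averaged over the three measurement frames.
        diff = np.mean(onsets_s) - np.mean(onsets_sh)
        t, p = stats.ttest_ind(onsets_s, onsets_sh)
        return diff, p

    # Invented token values for one speaker's [sta] vs. [ʃta] F2 onsets:
    f2_sta = np.array([1540, 1525, 1560, 1510, 1535, 1550, 1520, 1545, 1530, 1555])
    f2_shta = np.array([1580, 1565, 1600, 1570, 1590, 1585, 1575, 1595, 1560, 1605])
    print(coarticulation_effect(f2_sta, f2_shta))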

In the analysis of variance, however, there was a significant triple interaction between Fricative, Stop, and Time, F(2,12) = 14.0, p < .001: The F2 transitions of alveolar stops started an average of 40 Hz lower in [s] context than in [ʃ] context, and this difference diminished over time. The F2 transition of velar stops, on the other hand, was essentially unaffected by fricative context. No other effect involving the Fricative factor was significant, except for one marginally significant 4-way interaction with no clear associated pattern.

The F3 measurements are shown in Table 3. The picture was quite different here. On the average, F3 was 46 Hz higher following [s] than following [ʃ], F(1,6) = 51.8, p < .001. Of the 64 individual comparisons, 28 were significant, and every single one of them was positive. Thus, even though there was considerable variability across speakers and tokens, the evidence for coarticulatory variation in F3 is very strong. The correlation between the entries in Tables 2 and 3 is -0.07, indicating no relation between context-induced shifts in F2 and in F3.

The coarticulatory effect on F3 did not decrease over time, suggesting that fricative context may have influenced not only the articulation of the following stop but also that of the following vowel. Two interactions involving the Fricative factor reached significance in the analysis of variance. One, between Fricative, Syllable Boundary, and Time, F(2,12) = 4.2, p < .05, revealed that the coarticulatory effect increased over time in FCV utterances but did not change at all over time in VFCV utterances. According to the second interaction, between Fricative, Vowel, Stop, and Time, F(2,12) = 8.0, p < .01, the coarticulatory effect increased over time in [u] context and for alveolar stops in [a] context, but decreased over time for velar stops in [a] context. The reasons for these complex patterns are not clear.

Table 4 shows the F4 measurements, which were obtained for only five speakers and yielded reliable data for only about half the comparisons (mostly those involving stops preceding [u]).2 Nevertheless, the pattern was very clear: Out of 19 individual comparisons, 18 were positive, and 13 of these were significant. Thus, there was a clear tendency for F4 to be higher following [s] than following [ʃ]. This tendency seemed to be even stronger than that for F3, the average difference in Table 4 being more than twice as large (102 Hz) as that in Table 3. However, the changes in F3 and in F4 were not significantly correlated (r = 0.21).

Table 2

Coarticulation Effects on F2: [F2]s - [F2]ʃ, in Hz.

Utterances        AA    LL    RM    VG    VM    SP    PP    FBB    Mean

. . (sto.]-[Stl 10 -11 :II -65 32 -24 -21 Al 77 - [ske.]-[4k0] 36 -13 1 85 52 8 0 17 23 (stu]-(Stu]_ 98 5 -64 12 -76 -12 -47 -44 -8 [sku]-(Sku] 4 -20 76 7 49 -164 -44 -147 -30 bi.sto.)-(aStoJ 4 :-.35 Ai -57 -13 15 -3 -4 -20 (aska] -[asko.] 131 51 3 -3 !7 44 -33 40 46 EustuMuStu] -22 9 -81 :13, -15 4 21 -71 -30 (usku]-(ulku) -10 9 -8 -15 -31 -1 33 WI -8

Mean 31 -1 -22 -7 17 -16 -12 -24 -4

Note: Underlines indicate that the difference is significant (p < .01) by t-test.

Table 3

Coarticulation Effects on F3: [F3]s - [F3]ʃ, in Hz.

Utterances        AA    LL    RM    VG    VM    SP    PP    FBB    Mean

[sta.] -[Ste.] -20 101 (54) ill 43 37 27 117 50 Eskej..[Skej 86 1 76 61 -21 64 29 LI 43 Estul-[Stu] 74 12 igl 67 28 fa 75 A 66 tskul-tkui (82) 12 (19) 0 71 112 li (44) lestea-Wt.04 21: 33 -24 21. 12 a -1 145 43 [asko.)-(04k0.1 (60) 8 104 40 -55 11 12 45 37 Eustul-[uStu] 108 61 15 64 88 24 125 1 61 luskul-EuSkul 25 TO -29 25 Tg) 55 (22)

Mean              60    54    48    50     8    43    62    62     46

Note: Underlines indicate that the difference is significant (p < .01) by t-test. Differences in parentheses are based on a small number of tokens only.


Table 4

Coarticulation Effects on F4: [F4]s - [F4]ʃ, in Hz.

Utterances        AA    VM    SP    PP    FBB

EstuJ-iStul 35 .17.in 47 L00-45I4u] 16 .133, 4441-DStAJ -1 211 filisk4J-WW II 27 (ustal-EuSta) 100 36 260 84 Luskul-Culkul 106 12, 4in

Note: Underlines indicate that the difference is significant (p < .01) by t-test.


Table 5

Confusion Matrices for Truncated Stops in [a] and [u] Contexts.

Percent Responses

                        V = [a]                         V = [u]

Utterance

VgitV) 16 13 55 10 6 6 5 80 8 1 -. ((f)tV3 16 9 52 17 6 3 6 80 9 4. E(a)kV) la 8 21 41 6 63 4 3 19 [(1)kil3 26 6 14 46 8 70 3 2 14 11 i(Vs)tV3 6 13 be 9 8 7 3 8* 5 MOM 9 10 63 12 6 3 3 87 6 ((Vs)kV) 10 10 32 42 6 52 5 5 29 LtYj)kV) 14 8 30 41 7 62 3 4 23


A comparison of the F3 data from each fricative context with the measurements for CV utterances did not confirm our expectation (based on the earlier perceptual data) that the coarticulatory effect would be primarily due to [s]. On the contrary, the data suggest that it was almost entirely due to [ʃ]. However, this difference was in large measure due to a single subject (PP), and because this analysis could be done on only five speakers' utterances, the effects did not reach statistical significance.

We recognize that it is difficult to infer articulatory processes from acoustic data. Given our hypothesis that the place of stop articulation shifts towards that of the preceding fricative (Repp & Mann, 1981), one might expect that the formant transitions of a stop following [s] would be more [t]-like (indicating a forward shift) than those of a stop following [ʃ], which would be more [k]-like (indicating a backward shift). Since [t] has a somewhat higher F3 onset than [k] in both vocalic contexts (cf. Figure 1), our finding of a higher F3 onset following [s] is consistent with these expectations. What is not consistent is (1) the absence of any coarticulatory shifts in F2, particularly in [-u] context where [t] and [k] are characterized by widely differing frequencies (cf. Figure 1), and (2) the finding of higher F4 onsets following [s], for our data indicate that F4 is considerably higher in [ku] than in [tu], with less difference between [ka] and [ta]. In view of these ambiguities, we turned to a perceptual test in the hope that it might shed some light on the direction of the shifts in stop place of articulation.

PERCEPTUAL DATA

To complement our acoustic measurements, we gathered perceptual data for a subset of the utterances described above, supposing that labeling responses to FCV and VFCV utterances from which the fricative noise and release burst had been removed might provide another means of assessing any coarticulation between fricative and stop, a procedure used successfully by Repp and Mann (1981). We began by focusing only on those utterances that contained the vowel [a] but later extended our experiment to utterances containing [u].

Method

Subjects. The subjects were ten college students, all native speakers of English; eight of them were paid volunteers and two participated as part of a class project.

Stimuli. To create the truncated syllables, the utterances were digitized at 10 kHz using the Haskins Laboratories PCM system. Individual utterances were displayed on a storage oscilloscope, and the beginning of the first clear pitch pulse following the stop release burst was located in the waveform. Only the stimulus portion following that point was retained. The burst duration (from burst onset to the cutoff point) was recorded. This was done for five tokens of each of the eight speakers' utterances containing the vowel [a] and for four speakers' (AA, LL, PP, FBB) utterances containing the vowel [u].

The truncated syllables were assembled into sequences and recorded onto audio tape. A separate tape was created for each speaker and for each vowel, each tape containing repetitions of each of that speaker's stimuli in separately randomized blocks. The interstimulus interval was 2 sec, with 7.5 sec between blocks.
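A minimal sketch of this block randomization, under the assumption that each tape simply presents the same stimulus set in a freshly shuffled order within every block; the stimulus labels, block count, and seed are illustrative, and only the 2-sec interstimulus interval and 7.5-sec pause between blocks come from the text.

import random

ISI_SEC = 2.0          # interstimulus interval given in the text
BLOCK_PAUSE_SEC = 7.5  # pause between blocks given in the text

def make_tape_order(stimuli, n_blocks, seed=0):
    # Each block is a separately randomized ordering of the full stimulus set.
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = list(stimuli)
        rng.shuffle(block)
        blocks.append(block)
    return blocks

# Hypothetical use: five tokens of each of eight utterance types for one speaker.
# labels = [f"{b}_{f}{s}V_{t}" for b in ("FCV", "VFCV")
#           for f in ("s", "sh") for s in ("t", "k") for t in range(5)]
# tape = make_tape_order(labels, n_blocks=4)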

Procedure. All subjects participated in two different sessions of approximately one hour. The [a] tapes for speakers LL, IN, NM, and SP were played in the first session, and those for speakers AA, PP, VG, and MB were played in the second session, in the order listed. Six of the subjects returned for a third session in which all of the [u] tapes were played. The stimuli were presented in a quiet room over TDH-39 earphones. Subjects were required to label each stimulus as containing an initial "b," "th" (as in "that"), "d," "g," or, if necessary, no consonant.

Results and Discussion

The data obtained with speaker SP's [a] utterances were excluded from analysis because listeners found it difficult to hear any stops and responded fairly randomly. The combined confusion matrix for the remaining seven speakers' [a] utterances is shown in the left half of Table 5. Comparing utterances differing only in the nature of the original fricative, it is evident that "d" (and "th") responses were somewhat more frequent when the fricative context had been [s], and that "g" (and "b") responses were more frequent when the fricative context had been [ʃ]. Except for the trend in the "b" responses, this pattern is consistent with our hypothesis that [s] leads to a forward shift in the place of articulation of a following stop.
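For concreteness, a confusion matrix of the kind shown in Table 5 can be tabulated from long-format labeling data along the following lines; this is a sketch only, and the file name and column names are assumptions.

import pandas as pd

responses = pd.read_csv("labeling_responses.csv")  # hypothetical: one row per trial
a_context = responses[responses["vowel"] == "a"]

# Percent responses in each labeling category, by utterance type.
confusions = (pd.crosstab(a_context["utterance_type"], a_context["label"],
                          normalize="index") * 100).round(1)
print(confusions)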

Responses of "d" and "g" were subjected to separate four-way analyses of variance with the factors Speaker, Stop ([t] vs. [k]), Fricative ([s] vs. [ʃ]), and Syllable Boundary (FCV vs. VFCV). We found that, while the effect of fricative context on "g" responses did not reach significance, that on "d" responses did, F(1,9) = 14.5, p < .01. However, the extent of this difference varied across speakers, F(6,54) = 8.3, p < .001. It was also greater for alveolar stops than for velar ones, F(1,9) = 8.1, p < .05, and greater for FCV utterances than for VFCV utterances, F(1,9) = 13.8, p < .01. Several other statistical interactions were significant, indicating variability among the utterances produced by different speakers but consistency in the subjects' perception.
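The analysis just described can be approximated with a repeated-measures ANOVA on per-cell response proportions. The sketch below is not the analysis software actually used; it assumes a balanced long-format data file with hypothetical column names and illustrates the idea for the "d" responses.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

responses = pd.read_csv("labeling_responses.csv")  # hypothetical long-format file

# Proportion of "d" responses per listener in every
# Speaker x Stop x Fricative x Boundary cell.
cells = (responses
         .assign(d_resp=(responses["label"] == "d").astype(float))
         .groupby(["listener", "speaker", "stop", "fricative", "boundary"],
                  as_index=False)["d_resp"].mean())

anova = AnovaRM(cells, depvar="d_resp", subject="listener",
                within=["speaker", "stop", "fricative", "boundary"]).fit()
print(anova)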

To see whether the speaker variability in the perceptual data was related to the variability observed in the acoustic measurements of F3, we subtracted the percentage of "d" responses (which showed a significant effect of fricative context) for each utterance that had contained [s] from that for the corresponding utterance that had contained [ʃ], and then correlated these difference scores (4 values for each of 7 speakers) with the corresponding F3 differences. The correlation was positive and significant, r(26) = .45, p < .05. Thus, pairs of utterances showing a relatively large acoustic effect of fricative context (i.e., raising of F3 following [s]) also tended to elicit a larger difference in "d" responses (i.e., fewer "d" responses to utterances that had originally included [ʃ]).

The confusion matrix for the [u] utterances (right half of Table 5) shows that truncated alveolar stops were most often identified as "d," but truncated velar stops elicited predominantly "b" responses, a finding that may be explained by the similarity of the (equally "grave") formant transitions of labial and velar stops in [u] context (cf. Kewley-Port, 1981), together with a possible listener bias to respond "b" in this context. The table reveals little systematic variation contingent on the original fricative context, except for a trade between "b" and "g" responses to velar stops: when the preceding fricative had been [s], "b" responses were less frequent, and "g" responses more frequent, than when it had been [ʃ]. These differences, as reflected in the Stop by Fricative interaction, were significant in univariate analyses of both "g" and "b" responses (for the latter, F(1,3) = 15.0, p < .05). However, there were a number of significant interactions with other factors, especially with Speakers, reflecting again high between-speaker variability coupled with relatively small between-listener variability. There was no significant correlation with the acoustic measurements for the [u] utterances.
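The correlation reported above reduces to a difference-score computation followed by a Pearson correlation. A minimal sketch, assuming a table with one row per matched pair of [s]- and [ʃ]-context utterances and hypothetical file and column names; the sign conventions are illustrative.

import pandas as pd
from scipy.stats import pearsonr

pairs = pd.read_csv("matched_pairs.csv")  # hypothetical: one row per utterance pair

perceptual_diff = pairs["pct_d_after_sh"] - pairs["pct_d_after_s"]
acoustic_diff = pairs["f3_onset_after_s"] - pairs["f3_onset_after_sh"]  # Hz

r, p = pearsonr(perceptual_diff, acoustic_diff)
print(f"r = {r:.2f}, p = {p:.3f}")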

CONCLUSIONS

The results of our present studies, even though they are based on a very large body of data, are not quite as clear as we had hoped. Nevertheless, two conclusions seem appropriate. First, we have obtained rather solid acoustic evidence for a coarticulatory shift in stop production contingent on the preceding fricative context. This shift was reflected in generally higher onset values of F3 and F4 following [s] than following [ʃ]. Second, we have found additional evidence for fricative-induced shifts in stop production in listeners' perception of the vocalic formant transitions, although the correlation between the acoustic and perceptual findings was weak. Variability of the coarticulatory effects across speakers and tokens was unexpectedly large. Unfortunately, neither the acoustic nor the perceptual data have a straightforward articulatory interpretation, which leaves open the question of whether the place of stop articulation indeed shifts toward that of a preceding fricative, or whether some more complex articulatory adjustment is involved. Presumably, only direct observations of speech production will shed light on this issue. In separate studies, we have pursued the question further by examining the role of fricative-stop coarticulation in the corresponding perceptual context effects.

REFERENCES

Mann, V. A., & Repp, B. H. Influence of preceding fricative on stop consonant perception. Journal of the Acoustical Society of America, 1981, 69, 548-558.
Repp, B. H., & Mann, V. A. Perceptual assessment of fricative-stop coarticulation. Journal of the Acoustical Society of America, 1981, 69.

FOOTNOTES

1. Although the stops were phonologically voiceless ([t] and [k]), they were unaspirated in these contexts, and listeners therefore tended to hear the truncated stops as "d" or "g" rather than "t" or "k."

2. Average F4 onset frequencies for five individual speakers (based on a subset of the utterances) were 2862 Hz, 3733 Hz, 3962 Hz (SP), 4303 Hz (PP), and 3626 Hz.

3. To check for any possible differences in burst duration contingent on the preceding fricative, an analysis of variance was conducted on the burst duration measurements. For the [a] utterances, there was no significant effect of the preceding fricative. Bursts were, however, significantly longer for velar stops (24 msec) than for alveolar ones (16 msec), F(1,7) = 39.2, p < .001. Bursts were also significantly longer following a syllable boundary, F(1,7) = 11.3, p < .02, although the difference was only 2 msec. In the [u] utterances, too, bursts were longer for velar stops (24 msec) than for alveolar ones (20 msec), F(1,3) = 28.5, p < .05, and bursts tended to be longer following [s] (24 msec) than following [ʃ] (20 msec), F(1,3) = 10.7, p < .05, both effects being due to unusually short bursts for alveolar stops following [ʃ] (17 msec). The syllable-boundary effect was reversed here but nonsignificant.

II. PUBLICATIONS

III. APPENDIX


PUBLICATIONS

Abramson, A. S., Nye, P. W., Henderson, J. B., & Marshall, C. W. Vowel height and the perception of consonantal nasality. Journal of the Acoustical Society of America, 1981, 70, 329-339.
Baer, T. Observation of vocal fold vibration: Measurement of excised larynges. In K. N. Stevens & M. Hirano (Eds.), Vocal fold physiology. Tokyo: University of Tokyo Press, 1981, 119-133.
Bell-Berti, F., & Harris, K. S. Temporal patterns of coarticulation: Lip rounding. Journal of the Acoustical Society of America, in press.
Bellugi, U., & Studdert-Kennedy, M. (Eds.). Signed and spoken language: Biological constraints on linguistic form. Weinheim: Verlag Chemie, 1980.
Fowler, C. A., & Tassinary, L. G. Natural measurement criteria for speech: The anisochrony illusion. In J. Long & A. Baddeley (Eds.), Attention and performance IX. Hillsdale, N.J.: Erlbaum, 1981.
Healy, A. F. The effects of visual similarity on proofreading for misspellings. Memory & Cognition, 1981, 9, 453-460.
Henderson, J. B., & Repp, B. H. Is a stop consonant released when followed by another stop consonant? Phonetica, in press.
Katz, L., & Baldasare, J. Syllable coding in printed word recognition by children and adults. Journal of Educational Psychology, in press.
Liberman, I. Y., & Mann, V. A. Should reading remediation vary with the sex of the child? In A. Ansara, N. Geschwind, A. Galaburda, M. Albert, & N. Gartrell (Eds.), Sex differences in dyslexia. Baltimore: The Orton Dyslexia Society, 1981, 151-168.
May, J. G. Acoustic factors that may contribute to categorical perception. Language and Speech, 1981, 24, 273-284.
McGarr, N. S. The effect of context on the intelligibility of hearing and deaf children's speech. Language and Speech, 1981, 24, 255-264.
Metz, D. E., Whitehead, R. L., & McGarr, N. S. Physiological aspects of speech produced by deaf persons. Audiology: A Journal for Continuing Education, in press.
Osberger, M. J., & McGarr, N. S. Speech production characteristics of the hearing impaired. In N. Lass (Ed.), Speech and language: Advances in basic research and practice (Vol. 8). New York: Academic Press, in press.
Remez, R. E., Cutting, J. E., & Studdert-Kennedy, M. Cross-series adaptation using song and string. Perception & Psychophysics, 1980, 27, 524-530.
Remez, R. E., & Rubin, P. E. The stream of speech. Scandinavian Journal of Psychology, in press.
Repp, B. H. Perceptual equivalence of two kinds of ambiguous speech stimuli. Bulletin of the Psychonomic Society, 1981, 18, 12-14.
Repp, B. H. Two strategies in fricative discrimination. Perception & Psychophysics, 1981, 30, 217-227.
Rubin, P., Baer, T., & Mermelstein, P. An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 1981, 70, 321-328.
Studdert-Kennedy, M. Cerebral hemispheres: Specialized for the analysis of what? The Behavioral and Brain Sciences, 1981, 4, 76-77.
Studdert-Kennedy, M. The emergence of phonetic structure. Cognition, 1981, 10, 301-306.
Studdert-Kennedy, M. A note on the biology of speech perception. In J. Mehler, M. Garrett, & E. Walker (Eds.), Perspectives on mental representation. Hillsdale, N.J.: Erlbaum, in press.
Studdert-Kennedy, M., & Bellugi, U. Introduction. In U. Bellugi & M. Studdert-Kennedy (Eds.), Signed and spoken language: Biological constraints on linguistic form. Weinheim: Verlag Chemie, 1980, 41-56.
Studdert-Kennedy, M., & Lane, H. Clues from the differences between signed and spoken language. In U. Bellugi & M. Studdert-Kennedy (Eds.), Signed and spoken language: Biological constraints on linguistic form. Weinheim: Verlag Chemie, 1980, 29-40.
Verbrugge, R. R. Transformations in knowing: A realist view of metaphor. In R. P. Honeck & R. R. Hoffman (Eds.), Cognition and figurative language. Hillsdale, N.J.: Erlbaum, 1980.
Verbrugge, R. R. Two feasts of metaphor. (Review of Metaphor and thought by A. Ortony (Ed.), and On metaphor by S. Sacks (Ed.).) Contemporary Psychology, 1980, 25, 827-828.
Verbrugge, R. R., & Rakerd, B. Vowel perception: A review of theory and research. In N. J. Lass (Ed.), Speech and language: Advances in basic research and practice (Vol. 8). New York: Academic Press, in press.
Verbrugge, R. R., Rakerd, B., Fitch, H., Tuller, B., & Fowler, C. A. The perception of speech events: An ecological perspective. In R. E. Shaw & W. Mace (Eds.), Event perception. Hillsdale, N.J.: Erlbaum, in press.
Warren, W. H., & Verbrugge, R. R. Toward an ecological acoustics. In R. E. Shaw & W. Mace (Eds.), Event perception. Hillsdale, N.J.: Erlbaum, in press.
Watson, B. C., & Alfonso, P. J. A comparison of LRT and VOT values between stutterers and normal speakers. Journal of Fluency Disorders, in press.
Whalen, D. H. When anaphors are metaphors. In J. Copeland & P. W. Davis (Eds.), The seventh LACUS forum. Columbia, S.C.: Hornbeam Press, 1981, 276-283.


APPENDIX

DTIC (Defense Technical Information Center) and ERIC (Educational Resources Information Center) numbers:

Status Report                          DTIC           ERIC

SR-21/22   January - June 1970         AD 719382      ED-044-679
SR-23      July - September 1970       AD 723586      ED-052-6511
SR-24      October - December 1970     AD 727616      ED-052-653
SR-25/26   January - June 1971         AD 730013      ED-056-560
SR-27      July - September 1971       AD/4933        ED-011-433
SR-28      October - December 1971     AD 742140      ED-061-837
SR-29/30   January - June 1972         AD 750001      ED-011-484
SR-31/32   July - December 1972        AD 757954      ED4M-285
SR-33      January - March 1973        AD 76E373      ED-.41-263
SR-34      April - June 1973           AD 766178      ED-061-295
SR-35/36   July - December 1973        AD 774799      ED-094-444
SR-37/38   January - June 1974         AD 783548      ED-094-445
SR-39/40   July - December 1974        AD A007342     ED-102-633
SR-41      January - March 1975        AD A013325     ED-109-722
SR-42/43   April - September 1975      AD A018369     ED-117-770
SR-44      October - December 1975     AD A02V59      ED-119-273
SR-45/46   January - June 1976         AD A026196     ED-123-678
SR-47      July - September 1976       AD A031789     ED-128-870
SR-48      October - December 1976     AD A036735     ED-135-028
SR-49      January - March 1977        AD A041460     ED-141-864
SR-50      April - June 1977           AD A044820     ED-144-138
SR-51/52   July - December 1977        AD A049215     ED-147-892
SR-53      January - March 1978        AD A055653     ED-155-760
SR-54      April - June 1978           AD A067070     ED-161-096
SR-55/56   July - December 1978        AD A065575     ED-166-757
SR-57      January - March 1979        AD A083179     ED-170-823
SR-58      April - June 1979           AD A0;1663     ED-17a-967
SR-59/60   July - December 1979        AD A08204      ED-181-5
SR-61      January - March 1980        AD A085320     ED-185-636
SR-62      April - June 1980           AD A095062     ED-196-099
SR-63/64   July - December 1980        AD A095840     ED-197-416
SR-65      January - March 1981        AD A099958     ED-201-022
SR-66      April - June 1981           AD A105090     ED-206-038
SR-67/68   July - December 1981        **             **

Information for ordering any of these issues may be found below.

**DTIC and/or ERIC order numbers not yet assigned.

AD numbers may be ordered from:

U.S. Department of Commerce
National Technical Information Service
5285 Port Royal Road
Springfield, Virginia 22151

ED numbers may be ordered from:

ERIC Document Reproduction Service
Computer Microfilm International Corp. (CMIC)
P.O. Box 190
Arlington, Virginia 22210

Haskins Laboratories Status Report on Speech Research is abstracted in Language and Language Behavior Abstracts, P.O. Box 22206, San Diego, California 92122.
