The Role of Second Formant Transitions in the Stop-Semivowel Distinction
Perception & Psychophysics, 1981, 29(2), 121-128

EILEEN C. SCHWAB, JAMES R. SAWUSCH, and HOWARD C. NUSBAUM
State University of New York, Buffalo, New York 14226

An experiment was conducted which assessed the relative contributions of three acoustic cues to the distinction between stop consonant and semivowel in syllable initial position. Subjects identified three series of syllables which varied perceptually from [ba] to [wa]. The stimuli differed only in the extent, duration, and rate of the second formant transition. In each series, one of the variables remained constant while the other two changed. Obtained identification ratings were plotted as a function of each variable. The results indicated that second formant transition duration and extent contribute significantly to perception. Short second formant transition extents and durations signal stops, while long second formant transition extents and durations signal semivowels. It was found that second formant transition rate did not contribute significantly to this distinction. Any particular rate could signal either a stop or semivowel. These results are interpreted as arguing against models that incorporate transition rate as a cue to phonetic distinctions. In addition, these results are related to a previous selective adaptation experiment. It is shown that the "phonetic" interpretation of the obtained adaptation results was not justified.

This work was supported by NIMH Grant MH31468-01 and NSF Grant BNS7817068 to SUNY/Buffalo and NINCDS Grant NS-12179 to Indiana University (which supported development of the speech synthesizer). The authors would like to thank James Pomerantz for his comments on an earlier draft of this manuscript. Requests for reprints should be sent to any author at the Department of Psychology, 4230 Ridge Lea Road, Buffalo, New York 14226. Copyright 1981 Psychonomic Society, Inc.

A fundamental claim of bottom-up theories of speech perception is that phonetic labeling is the direct result of analyzing a number of acoustic features of the speech waveform (e.g., see Fant, 1967). Regardless of the specific mechanism employed for this acoustic analysis, these data-driven theories must specify a set of basic acoustic properties which are coded during speech perception. One problem with this approach, pointed out by Studdert-Kennedy (1977), is that the choice of these features by theorists has been entirely post hoc. At present there are no unifying auditory principles guiding this theoretical feature selection process. In other words, bottom-up theories tend to choose those acoustic properties which can be successfully employed to perform phonetic labeling (see Stevens, 1980). Thus, feature processing theories typically invoke sufficiency criteria without regard for whether or not the acoustic properties employed are perceptually significant for humans (see Norman, 1980, for a related discussion).

It is very tempting to assume that the human perceptual system uses all the acoustic information available in making phonetic decisions. However, demonstrating the sufficiency of a set of acoustic features for phonetic labeling (e.g., Stevens & Blumstein, 1978) does not guarantee that the human speech perceiver takes advantage of this information.

It becomes very important, then, to determine exactly what acoustic information is utilized by humans in the course of phonetic perception. This inventory of acoustic cues will provide a basis for evaluating the psychological validity of theories of speech perception. In addition, assessing the entire repertoire of perceptually significant cues may constrain the types of perceptual mechanisms used in cue extraction (e.g., spectral templates vs. formant trackers). Finally, the full specification of these acoustic-phonetic cues might allow us to determine if there exist any general auditory principles of speech perception (cf. Studdert-Kennedy, 1977). Currently, such principles (if they exist) may be obscured by an incomplete picture of human acoustic information processing during speech perception. Before any auditory principles can be defined, it will be necessary to systematically investigate the separate and conjoint effects of the full spectrum of acoustic information available in speech.

The phonetic distinctions of voicing (e.g., Lisker & Abramson, 1964; Summerfield & Haggard, 1977) and place of articulation (e.g., Dorman, Studdert-Kennedy, & Raphael, 1977; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) in stop consonants are examples of two distinctions that have been studied intensively and extensively. Yet, even for these phonetic contrasts, all possible cue specifications and interactions have not been fully determined. For other phonetic distinctions, such as manner of articulation, the research to date has not explored cue structure in sufficient depth, especially when some of the cues are intrinsically related. In many instances, manipulation of one of these cues necessitates a change in at least one other cue. For example, a change in second formant transition extent (the frequency excursion from onset to steady state) may change the overall F2-F3 transition pattern (e.g., rising vs. diverging), the spectrum at syllable onset, perceptual summation of F2 and F3 onsets, and transition rate. Any one, or all, of these features, which are intrinsically interrelated, could be perceptually relevant. This problem is exemplified by considering the phonetic distinction between stop consonants (e.g., [b]) and semivowels (e.g., [w]).

Several earlier studies have examined acoustic cues that serve to distinguish stops and semivowels (Hillenbrand, Minifie, & Edwards, 1979; Liberman, Delattre, Gerstman, & Cooper, 1956; Miller & Liberman, 1979; O'Connor, Gerstman, Liberman, Delattre, & Cooper, 1957; Suzuki, Note 1). The first published study of stop-semivowel cues used two-formant stimuli to examine the effect of transition tempo (Liberman et al., 1956). Tempo was varied by increasing the duration of the transitions (and decreasing the rate of the transitions by an appropriate amount) while holding transition frequency extent constant. Subjects identified synthetic stimuli which ranged perceptually from [bɛ] to [wɛ] and [gɛ] to [jɛ] (as in "yet"). Adult subjects were able to utilize the tempo of the F1 and F2 transitions as a cue to distinguish stop consonant from semivowel. These results, indicating the usefulness of the tempo cue, have been extended to infants. Hillenbrand et al. (1979) examined the ability of infants to discriminate between [bɛ] and [wɛ], which were cued by changes in transition tempo. The first experiment used synthetic stimuli similar to those of Liberman et al. (1956). The second experiment used computer-modified tokens of natural speech. In both experiments, infants were able to discriminate stop from semivowel on the basis of the tempo cue. In another experiment, Liberman et al. (1956) examined the effect of transition tempo before a variety of vowels. For all stimuli, each transition began at the same frequency (120 Hz for F1 and 600 Hz for F2). So, transition extent was [...] hence, rate) increased, F2 transition extent (and hence, rate) decreased. It is possible that these extent cues could interact and thus increase the variance of the boundary locations when identification functions are plotted against transition rate. Since the change in transition rate for the two formants was not consistent across vowels, the relative contributions of duration and rate cues could not be determined unequivocally.

The previous studies manipulated the tempo of all formants in a stimulus and observed the effect on perception. The next two studies manipulated the transition rate and extent of only one formant and observed the effect on perception. Suzuki (Note 1) examined the effect of F1 transition rate and extent on the perception of intervocalic stops and semivowels. It was reported that, in general, large F1 frequency extents were perceived as stops. Suzuki found that an increase in transition rate reduced the frequency extent required to perceive a stop. However, an examination of the data indicates that an increase in transition rate was accompanied by a decrease in transition duration. Thus, the results could also be indicating that a decrease in F1 transition duration reduces the frequency extent required to perceive a stop. Another study examined the acoustic cues that serve to distinguish semivowels and liquids (O'Connor et al., 1957). In part of this study, subjects identified stimuli that varied in F2 frequency extent before a variety of vowels. O'Connor et al. found a relationship between frequency extent and the perception of semivowels. When the F2 transition was in the appropriate direction (rising for [w] and falling for [j]), they found that a decrease in the extent of the F2 transition resulted in a decrease in semivowel responses. Since transition duration was held constant, a decrease in transition extent resulted in a concurrent decrease in transition rate.

These previous studies indicate that the extent, duration, and rate of consonant transitions are major cues to manner of articulation. Unfortunately, we cannot evaluate the relative contribution of each cue, since these cues have been confounded in previous studies. Since rate is defined as frequency extent divided