Journal of Experimental Psychology: Human Perception and Performance 1975, Vol. 104, No. 2, 105-120

Aspects of Phonological Fusion James E. Cutting Yale University and Haskins Laboratories, New Haven, Connecticut

Phonological fusion occurs when the of two different speech stimuli are combined into a new percept that is longer and linguistically more complex than either of the two inputs. For example, when PAY is presented to one ear and LAY to the other, the subject often perceives PLAY. The present article is an investigation of the conditions necessary and sufficient for fusion to occur. The rules governing phonological fusion appear to be the same for synthetic and natural speech, but synthetic stimuli fuse more readily. Fusion occurs considerably more often in di- chotic stimulus presentation than in binaural presentation. The phenomenon is remarkably tolerant of differences in relative onset time between the to-be-fused stimuli and of relative differences in fundamental frequency, intensity, and vocal tract configuration. Although phonological fusion is insensitive to such nonlinguistic stimulus parameters, it is sensitive to lin- guistic variations at the semantic, phonemic, and acoustic levels.

Most of the dichotic listening literature ments of both stimuli are combined to form has dealt with the phenomenon of perceptual a new percept that is longer and linguistically rivalry. Given a different stimulus presented more complex than either of the two inputs. to each ear at the same time, the subject This phenomenon is called phonological typically reports hearing one or both of fusion because it conforms to phonological them. The different information contained rules of English: Given BANKET/LAN- in each stimulus is not combined into a sin- KET, the subject reports hearing BLAN- gle percept. Thus, for example, given the KET, not LBANKET (Day, Note 1). dichotic digits FOUR/SIX, the subject Nonword fusions also occur; for example, never reports hearing SORE or FIX. Per- GAB/LAB yields GLAB, not LGAB (Day, ceptual fusion does occur when certain vari- 1968). According to phonological rules of ables are taken in account. In several types cluster information in English, initial stop of fusion phenomena, the stimulus variables + liquid clusters are allowed, but that facilitate fusion appear to be psycho- initial liquid + stop clusters are not. Day linguistic in nature. For example, given the (1970) found that when stimuli are pre- dichotic pair PAHDUCT/RAHDUCT, the sented dichotically with these constraints re- subject often reports hearing PRODUCT moved, fusion occurs in both directions. (Day, 1968). In this type of fusion, seg- Given the stimuli TASS/TACK, for exam- This research was supported by National Insti- ple, the subject reports hearing TASK on tute of Child Health and Human Development some trials and TACKS on others. Both Grant HD-01994 to the Haskins Laboratories, and /sk/ and /ks/ clusters are permissible in was based in part on a PhD thesis submitted to final position in English. the Department of Psychology of Yale University. Thanks go to Ruth S. Day, with whom I have Phonological fusion cannot be explained as never had a colorless conversation; Franklin S. a response bias for acceptable English . Cooper, with whom I have never had a fruitless Day (1968) found that when different pro- conversation; and the members of the New Haven ductions of PAHDUCT are presented to Dance Ensemble, with whom I never talked about this at all. each ear, subjects report hearing PAH- The author is now at Wesleyan University, DUCT. That is, they do not report hearing Middletown, Connecticut and Haskins Laborato- the acceptable English that corre- ries. Requests for reprints should be sent to James E. Cutting, Haskins Laboratories, 270 sponds most closely to the nonword inputs. Crown Street, New Haven, Connecticut 06510. Likewise, RAHDUCT/RAHDUCT yields 105 106 JAMES E. CUTTING

RAHDUCT. Only when the stimuli are each formant gliding to the resting frequency of PAHDUCT/RAHDUCT does fusion occur, the following . Stop began with 50 msec of formant transitions gliding into the and it occurs regardless of which stimulus following vowel. Voice onset time was 0 msec is presented to which ear. In the present for voiced consonants and +50 msec for voiceless article, I will consider a fusion response to consonants. Natural speech versions of the same occur when a stimulus pair, each item of items were spoken by the author and recorded on audio tape. An effort was made to match utter- which has n phonemes, yields a percept of ances in terms of pitch, intensity, and duration, n + 1 phonemes (following Day, 1968). but some variation was unavoidable. Both syn- The studies presented here are an anal- thetic and natural speech stimuli were digitized ysis of phonological fusion and the condi- and stored on disk file for preparation of diotic tions necessary and sufficient for fusion to and dichotic tapes (Cooper & Mattingly, 1969). Subjects. Sixteen Yale University undergradu- occur. Of primary interest are the modes ates participated in identification and fusion tasks. of presentation that optimize the occurrence Each received course credit for his or her efforts. of fusion responses and the stimulus vari- As in all experiments in this paper, subjects were ables under those conditions that enhance right-handed native speakers with no history of hearing trouble and with no and detract from the fused percept. Both experience listening to synthetic speech. They linguistic and physical (nonlinguistic) vari- listened in groups of four to tapes played on an ables are of interest. Ampex AG-500 dual-track tape recorder. Audi- To undertake such a project, one must be tory signals were sent through a listening sta- tion to matched Telephonies earphones (Model able to control the stimuli in a precise man- TDH39). Gains on the tape recorder and listen- ner, and such control is only available with ing station were adjusted so that stimuli were pre- a computer-driven speech synthesizer. How- sented at approximately 80 db. (re 20 /u,N/m2). ever, synthetic speech has a characteristic These criteria and procedures were used for all experiments in the present article. No subject "cold-in-the-nose" quality which may effect served in more than one experiment. The fusion phonological fusion. To insure that the task in the present study preceded the identification present studies are not concerned with arti- task, but it is more instructive to consider them facts inherent to synthetic speech, fusion was in reverse order. first observed for synthetic and natural Task 1: Identification of Individual Stimuli speech tokens of the same items. All stimuli, natural and synthetic, were recorded on a diotic identification tape. The tape consisted SYNTHETIC VERSUS NATURAL SPEECH of a random sequence of 120 trials: (4 sets of Experiment 1: Baseline Data stimuli) X (2 versions of each set: natural and synthetic) X (3 stimuli per set) X (5 observa- General Method tions per stimulus). Subjects wrote down the en- tire word that they heard presented. There was a Tape preparation. Four sets of stimuli of the 3-sec interval between items. same general pattern were selected: the PAY set (PAY, LAY, RAY) ; the BED set (BED, LED, Results RED) ; the CAM set (CAM, LAMB, RAM) ; and the GO set (GO, LOW, ROW). Two ver- All stimuli were highly identifiable. Each sions of each stimulus were prepared, one in syn- synthetic and natural speech item was cor- thetic speech and one in natural speech. Synthetic stimuli were prepared on the Haskins Laborato- rectly identified on better than 93% of all ries' parallel-resonance synthesizer. All stimuli trials. Cluster responses (such as PLAY) were identical in pitch contour, rising from 116 to were reported on less than 2% of all trials. 124 Hz and then falling to 82 Hz. Within each Identification tasks were a part of each ex- set all stimuli were identical in duration (3SO, 450, 400, and 350 msec for the four sets, respectively), periment presented in this article. Since the and all had the same acoustic structure except for stimuli and results generally did not differ the initial ISO msec. Liquid stimuli within each from those in Experiment 1, the identifica- set (those beginning with /!/ and /r/) differed tions are not discussed in all experiments. only in the direction and extent of the third- formant transition. This acoustic cue distinguishes Task 2: Identification of Fusible Pairs the two liquids (O'Connor, Gerstman, Liberman, D&lattre, & Cooper, 1957). Each liquid stimulus All stimuli used in the identification task were began with 50-msec steady state prerelease formant also used in the fusion task. However, instead of resonances, followed by 100-msec transitions in presenting one stimulus at a time, two stimuli PHONOLOGICAL FUSION 107 were presented, one to each ear. The dichotic pair TABLE 1 consisted of a stop stimulus and a liquid stimulus PERCENT FUSION RESPONSES FOR EACH PAIR IN from the same set; for example, PAY/LAY or EXPERIMENT 1, IN BOTH SYNTHETIC SPEECH PAY/RAY. All stimuli and possible fusions were AND NATURAL SPEECH VERSIONS monosyllabic, high-frequency English words: PAY/ LAY -» (yields) PLAY, PAY/RAY -» PRAY, Percent fusion BED/LED -» BLED, BED/RED -» BREAD, Fusible pair CAM/LAMB -» CLAM, CAM/RAM -> CRAM, Synthetic Natural M GO/LOW -» GLOW, and GO/ROW -» GROW. Most of these items and fusions have Thorndike- PAY/LAY 78 41 59 Lorge (1944) word frequencies of at least 100 per PAY/RAY 70 33 51 million. No alveolar stop-consonant stimuli were (55) chosen because /tl/ and /dl/ clusters do not ap- BED/LED 48 33 40 pear in initial position in English. BED/RED 46 19 33 (37) Two dichotic tapes were prepared, one using CAM/RAM 63 29 46 synthetic stimuli and one using natural speech CAM/LAMB 57 17 37 stimuli. Three lead, times were selected: The plo- (42) sive portion, of the stop initial stimulus began SO GO/LOW 68 41 54 msec before the onset of the liquid initial stimulus, GO/ROW 60 35 47 the stop and liquid began simultaneously, or the (51) liquid began SO msec before the stop. Day (1968) and Day and Cutting (Note 2) found that within Stop + /!/ 64 36 50 Stop + /r/ 58 26 42 such a restricted range, lead time has little effect M (61) (31) on fusion. The pulse code modulation system at the Haskins Laboratories allows temporal align- ments to within .5-msec accuracy (Cooper & Note. Values in parentheses are the means of the means. Mattingly, 1969). There were 96 randomly or- dered items per tape: (4 sets of fusible stimuli) for both synthetic and natural speech pairs. X (2 liquids per set) X (3 lead times) X (2 chan- The most common was a stop + /!/ fusion nel arrangements per pair) X (2 observations per dichotic item). All subjects listened to both types response for a stop + /r/ stimulus pair: for of stimuli: Half listened first to the synthetic example, PAY/RAY -» PLAY. Whereas speech pairs and then to the natural speech pairs, /!/ was substituted for /r/ quite frequently, while the other half listened in the reverse order. the reverse substitution was much less fre- Subjects were instructed to write down what they heard on each trial (one word or two, real words quent. Day (1968) and Cutting and Day or nonsense) and to respond after each trial. Be- (1975) have reported this /l/-for-/r/ sub- fore the task began, they listened to several prac- stitution phenomenon. It cannot be ac- tice pairs and wrote their responses in order to counted for by the frequency of these clus- familiarize themselves with the task and the stimuli. ters in English. In this study, /l/-for-/r/ substitutions occurred for both synthetic and Results and Discussion natural speech stimuli at approximately the Fusion responses occurred readily, but same rate. The relative high frequency of were more frequent for synthetic stimuli these responses in the present study seems than for natural speech stimuli. Subjects unusual, since Day (1968) found them to fused synthetic pairs on 61% of all trials, occur on only 4% of all fusions of stop + while fusing on only 31% of all natural /r/ pairs. The phenomenon will be con- speech trials. This 2:1 ratio was highly sidered in more detail in Experiments 8 significant by a sign test: All 16 subjects and 10. showed results in this direction (2 = 3.75, Frequency of fusion responses differed p < .0001). The order in which subjects across sets of stimuli. Table 1 shows the listened to synthetic and natural speech fusion rates for the four sets in both syn- pairs was not a significant factor. thetic and natural speech versions. Fusion In general, fusion was more frequent for rates for the PAY and GO sets were con- stop + /!/ pairs than for stop + /r/ pairs sistently higher than those for the BED and (e.g., PAY/LAY versus PAY/RAY), as CAM sets. These differences may be re- shown in Table 1. This pattern was re- lated to the frequency of occurrence of the ported previously by Day (1968). Anoma- fused responses as words in English; ac- lous fusion responses frequently occurred cording to Thorndike and Lorge (1944) 108 JAMES E. CUTTING and Carroll, Davies, and Richman (1971), subjects binaural pairs were presented by mixing the words PLAY, PRAY, GLOW, and the two channels of the dichotic tape. The re- maining eight subjects listened to a new tape of GROW occur more frequently in general dichotic and binaural trials randomly intermixed publications than BLED, BREAD, CLAM, (Group 3). This tape was exactly twice as long and CRAM. Day (1968) has shown that (192 items) as the tapes of Experiment 1, and the fusions occur more frequently when the dichotic and binaural pairs were constructed digi- tally by computer at the time of recording. fused outcome is an acceptable English word than when it is not (although nonword fu- Results and Discussion sions do occur, e.g., GORIGIN/LORIGIN Fusion was more frequent for dichotic -^GLORIGIN). Perhaps within the do- presentations than for binaural presenta- main of words, frequency of occurrence in tions: 45% and 15%, respectively. This the natural language also plays a role in fu- 3:1 ratio was highly significant (z = 4.8, p sion. There were no ear effects nor chan- < .0001) : All 24 subjects yielded fusion re- nel effects in the present experiment or in sults in this direction. Other than this main any experiment described in this article. effect, there were no statistically reliable There were no significant effects of lead time configuration here, but see Experiment 3. differences in fusion. For example, the di- chotic fusion rates for Groups 1, 2, and 3 Overview of Experiment 1 were 53%, 38%, and 45%, respectively; Although there were differences in fusion corresponding binaural fusion rates were between synthetic and natural speech stim- 16%, 8%, and 23%. Large variances in fu- uli, the rules that govern their fusibility sion rate within groups kept these differ- appear to be comparable. This conclusion ences from revealing any other statistically is based on two distinct results. First, the reliable pattern. As in Experiment 1, /!/- pattern of fusions for the four sets of stimuli for-/r/ substitutions occurred read- was quite similar for both natural and syn- ily, and there was only a slight effect of lead thetic speech stimuli. In both cases the time. Such differences are considered in PAY and GO sets fused more readily than more detail in Experiment 3. Differences in the CAM and BED sets. Second, the pat- fusion rate among stimulus sets were similar tern of "misperceptions" for the two types of to those in Experiment 1. stimuli was almost identical. The phoneme More frequent fusion in dichotic than in /!/ was quite frequently substituted for /r/ binaural presentation demonstrates that pho- on a fusion response; for example, PAY/ nological fusion is an active and central RAY -» PLAY. The reverse substitution process. "Pre-fused," or electrically mixed, was much less common. Thus, it appears synthetic stimuli are not perceived as a pho- that generalizations can be made about as- nological mixture of the two items. The pects of phonological fusion from the results reason fusion is so infrequent in the binaural of tasks using synthetic stimuli. case presumably stems from the fact that the electrically mixed stimuli beat against one ASPECTS OF STIMULUS PRESENTATION another, creating a very "dirty" signal. No such contamination occurs for dichotic pairs. Experiment 2: Dichotic Versus Unique left-ear and right-ear stimuli appear Binaural Presentation to be perceptually combined at some central Method locus. The synthetic stimulus tapes used in Experi- ment 1 also were employed here. Twenty-four Experiment 3: Variation in Lead Time different Yale University undergraduates listened to dichotic pairs, in which each item was presented Day (1970; Note 1) and Day and Cutting to a separate ear, and to binaural pairs, in which (Note 2) demonstrated that variation in the items were electrically mixed and presented to alignment of natural speech fusible stimuli both ears. Eight subjects listened first to a se- has little effect on fusion rate. Fusion oc- quence of dichotic trials, then to a sequence of binaural trials (Group 1), while eight others lis- curs almost as readily when the liquid stim- tened in reverse order (Group 2). For these ulus (e.g., LANKET) leads the stop stim- PHONOLOGICAL FUSION 109 ulus (BANKET) by as much as 150 msec, as when the stop leads the liquid by the •• -One-item Responses : — Fusion Responses same extent. Moreover, fusion rate in both instances is nearly identical to that for the C/D UJ simultaneous onset case. This result is par- in Z O So- ticularly surprising when one considers that cl- es neither the /b/ in BANKET or OC 40- /!/ in LANKET was longer than 90 msec 5 3°- O in duration. In Experiments 1 and 2, lead a: uj 20- times of 0 msec, + SO msec (when the stop Q. leads), and —50 msec (when the liquid 10- leads) yielded nearly identical fusion rates. The present experiment was designed, in part, to investigate relative onsets (lead stop led liquid led times) of longer than 150 msec to discover TIME(msec) the interval at which fusion disintegrates. FIGURE 1. Comparison of fusion rate and num- In addition, Experiment 3 explores the ber of one-item responses at 11 lead times in relationship between fusion and backward Experiment 3. masking. Consider the dilemma in the com- Within a set the items were identical in all re- parison of two hypothetical dichotic pairs. spects except for the initial formant transitions Imagine first the nonfusible stimulus pair requisite for the perception of a stop consonant PAY/DAY. If one staggers their onsets versus /r/ versus /!/. Dichotic pairs were as- by 50 msec, the second item will most likely sembled for all combinations of fusible items within a set: PAY/LAY -> PLAY, PAY/RAY mask the first (see Pisoni, 1973, and Stud- -» PRAY, KICK/LICK -» CLICK, and KICK/ dert-Kennedy, Shankweiler, & Schulman, RICK —> CRICK. Eleven lead times were chosen: 1970, among many others). Thus, if DAY 0, ±50, ±100, ±200, ±400, and ±800 msec. Since leads PAY, the subject may report hearing all stimuli were 350 msec or less in duration, the 400- and 800-msec relative onset items involved only PAY. In a fusible pair, PAY/LAY, temporally nonoverlapping stimuli. All pairs were the later-arriving item will not mask the represented in two tapes, each with a different first. Even if LAY begins 50 msec before random order. Each tape consisted of 88 pairs: PAY, the subject seldom reports hearing (2 sets of stimuli) X (2 stop/liquid pairs per set) PAY alone, but usually fuses them into X (11 lead times) X (2 channel arrangements per pair). Eight Yale University undergraduates PLAY. This response in this particular served as subjects for course credit. Each sub- situation is the crux of a dilemma. The seg- ject listened to both tapes, yielding 176 trials per ments /p/ and /!/ are not only combined subject. into a cluster but also perceptually reordered Results and Discussion in time: /!/ began before /p/ in the stimuli, but in the response they are reversed. The As shown in Figure 1, fusion occurred rules of language dictate the perception of most readily at short lead times, especially PLAY, whereas the rules of backward when the stop stimulus led the liquid by 50 masking and central processing in general and 100 msec. Here, fusion responses oc- predict the perception of PAY. Perhaps curred on 63% and 59% of all trials, respec- varying relative onset times beyond 50 and tively. Fusions were infrequent at the even 150 msec in the phonological fusion sit- -200, ±400, and ±800 msec leads, where uation would strain fusion and favor back- they occurred on an average of less than ward masking. 10% of the time. Intermediate fusion rates occurred at the 0, -50, -100, and +200 Method msec leads. Both sets of stimuli fused read- The PAY set (PAY, LAY, RAY) was again ily, with fusions slightly more frequent for employed. In addition, the KICK set (KICK, the PAY set. LICK, RICK) was generated on the Haskins" synthesizer. The PAY set stimuli were 350 msec The fusion rate at the short leads (0 and in duration, and the KICK set stimuli 325 msec. ±50 msec) in the present experiment is 110 JAMES E. CUTTING similar to that found in Experiments 1 and tation of the number of PAY or KICK 2. One would expect fusion to occur at the responses. longer lead times where members of a pair Between +200- and 0-msec lead times, do not overlap in time, but one might expect phonological rules dominated. Most one- that they would depress fusion response item responses within this lead-time do- rates at short leads. The absence of a de- main were fusions. However, between 0- crease suggests that fusions are not a re- and — 200-msec leads, two types of single- sponse bias resulting from lack of knowledge item responses occurred at near equal about the stimuli. The fusions that did frequencies. One-item fusions, PLAY, dem- occur at long leads occurred primarily at the onstrate the influence of language rules, beginning of the task when subjects were whereas one-item nonfusions, PAY, dem- less familiar with the items. Fusions at onstrate backward masking and the in- shorter leads continued at the same rate fluence of central processing constraints. throughout the task. Again, /l/-for-/r/ Within this lead-time domain, 58% of all substitutions were frequent. one-item responses are fusions and 42% are nonfusions. In Experiments 1 and 2 Fusion and -Backward Masking phonological rules always dominated over To consider the conflict between phono- the rules of backward masking. In the pres- logical rules and the rules of backward ent experiment, in which lead times were masking, it is necessary to observe more extended to more than double the stimulus than just fusion responses. For example, duration, backward masking did occur, but not only are there fusion and nonfusion re- occurred only for stimulus pairs in which sponses, but there are one-item and two- the liquid stimulus led the stop by less than item responses as well. These four cate- 200 msec; even in pairs like LAY-before- gories are not mutually exclusive. Consider PAY, backward masking occurred less than the stimulus pair PAY/LAY: There are half the time. In PAY-before-LAY pairs, one-item fusions, PLAY; one-item non- there was essentially no backward masking, fusions, PAY or LAY; two-item fusions there were few LAY responses, and PLAY (responses in which at least one fusion oc- dominated at all the short leads. curs), PLAY + PAY, PLAY + LAY, or Although differences in lead time of even PLAY + PLAY; and of course two- greater than about 200 msec reduced fusion item nonfusions, PAY + LAY. All of to a minimum, fusion occurred within the these responses occurred in the present ±100-msec domain with considerable fre- study, but at different frequencies according quency. Compare this domain with the to lead times. As might be expected, one- ±2-msec tolerance of relative onset differ- item responses predominated at the short ences for sound localization, another form leads while two-item responses occurred at of auditory fusion, and one sees that the the long leads. In Figure 1, the one-item tolerance for lead-time differences in pho- response curve is superimposed on the fu- nological fusion is several orders of magni- sion curve. Notice that it is nearly sym- tude greater. metrical, unlike the fusion response curve. Except for the 200-, 400-, and 800-msec Overview of Experiments 2 and 3 relative onsets, the fusion curve is primarily Mode of stimulus presentation has con- a subset of the single response curve. That siderable effect on fusion; dichotic fusions is, between ±100 msec the subject typically of synthetic speech stimuli are much more wrote down a single response, and often frequent than binaural fusions, suggesting that response was a fusion (e.g., PLAY). that central processes are much more im- When the subject wrote 'down a single re- portant than peripheral processes in phono- sponse that was not a fusion, 91% of those logical fusion. The fact that fusion is rela- responses were stop stimuli alone (e.g., tively insensitive to lead-time variation lends PAY). Thus, the area between the two credibility to this view. In visual masking, curves in Figure 1 is primarily a represen- for example, central processes appear to op- PHONOLOGICAL FUSION 111

erate over a longer time span than periph- ±50 msec. Twelve different Yale University eral processes (Turvey, 1973). If phono- undergraduates served as subjects in each of the three experiments. logical fusion is a central phenomenon, per- haps certain physical parameters of the stim- Results uli would have little effect on the frequency Fusion occurred readily for all stimuli at which fusion responses occur, just as they pairs. In Experiment 4 the frequency of have little effect on central processes in vi- fusion response rates was identical for pairs sion (Turvey, 1973). that shared the same pitch and those that PHYSICAL ASPECTS OF STIMULI differed in pitch: 50% each. Fusions for all four types of pairs were within a few Experiments 4-6: Pitch, Intensity, percentage points of one another. The pat- and Vocal Tract Size tern was the same in Experiments 5 and 6: Method Pairs with the same intensity and pairs with different intensities all fused at a rate of The same stimulus sets were used as in Ex- periment 3: the PAY set (PAY, LAY, RAY) 36%, and pairs with the same vocal tract and the KICK set (KICK, LICK, RICK). For configuration and those with different vocal Experiment 4, two versions of each stimulus were tract configurations fused at a rate of 59% synthesized, one at a relatively high fundamental each. In other words, within each experi- frequency (pitch) and one at a relatively low pitch. All stimuli had a falling pitch contour. ment there was no contribution nor decre- High-pitch stimuli began at 140 Hz and fell to ment in fusion rate attributable to variation 120 Hz, whereas low-pitch stimuli began at 120 of pitch, intensity, or vocal tract size. Hz and fell to 100 Hz. For Experiment 5, all Lead time had a nominal effect on fusion, the low-pitch stimuli of Experiment 4 were em- ployed, but they were synthesized at two relative but there were no interactions with any intensities, 80 db. and 65 db. re 20 /iN/m2. For other factor. The fusion rates for the +50, Experiment 6, all low-pitch high-intensity stimuli 0, and —SO msec lead times were, respec- were employed, but they were synthesized as if tively, 55%, 48%, and 48% in Experiment spoken by two different speakers: one a normal adult male and another a speaker whose apparent 4; 40%, 35%, and 34% in Experiment 5; vocal tract size was only 5/6 as large as the and 65%, 56%, and 57% in Experiment 6. adult male. Because these stimuli had a low The /l/-for-/r/ substitutions were frequent pitch, the tokens from the smaller vocal tract con- in all three studies, and the PAY set fused figuration sounded as if they were produced by a male midget. Variation in apparent vocal tract at a slightly higher rate than the KICK set. size was accomplished by elevating the formant Experiment 7: Multidimensional frequencies of the PAY and KICK set stimuli by a factor of 6/5. All stimuli were highly identifi- Stimulus Variation able, as determined by identification tests. Method Dichotic pairs were assembled from all possible combinations of fusible items within each set. The PAY set and the KICK set were synthe- Consider the pair PAY/LAY in Experiment 4. sized in eight different versions: (2 pitches, those The members of the pair shared the same pitch, frequencies used in Experiment 4) X (2 intensi- PAY-high (for high pitch)/LAY-high and PAY- ties, those used in Experiment 5) X (2 vocal low/LAY-low, or they differed in pitch, PAY- tract configurations, those used in Experiment 6). high/LAY-low and PAY-low/LAY-high. The One dichotic tape consisted of items within a set same pattern was followed in Experiment 5, ex- that shared none of the values in the dimensions cept that relative intensity was substituted for of pitch, intensity, and vocal tract size. Thus, relative pitch. In Experiment 6 the pattern was for example, PAY-low(pitch)-low(intensity)-small similar. Paired items either shared vocal tract (vocal tract) was paired with LAY-high-high- size (PAY-large/LAY-large and PAY-small/ large. This tape consisted of 96 items: (2 sets of LAY-small) or they differed in vocal tract size stimuli) X (2 liquids per set) X (3 lead times) (PAY-large/LAY-small and PAY-small/LAY- X (8 different mismatched pairings, across the large). Two dichotic tapes with different random dimensions of pitch, intensity, and vocal tract orders were prepared for all three experiments. size). Channel arrangements of the stimuli were Each tape contained 96 items: (2 sets of stimuli) arranged such that stops and liquids were equally X (2 stop/liquid pairs per set) X (4 pitch, in- represented on both channels. A second 96-item tensity, or vocal tract combinations) X (3 lead dichotic tape was prepared to consist of fusible times) X (2 channel arrangements per pair). The items that were all low-pitch, high-intensity, large leads used in all three experiments were 0 and vocal tract stimuli: (2 sets of stimuli) X (2 liquids 112 JAMES E. CUTTING per set) X (3 lead times) X (2 channel arrange- LINGUISTIC ASPECTS OF STIMULI ments) X (4 observations per pair). Twenty-four Yale University undergraduates listened to both fu- Three linguistic levels are considered in sion tapes. Half the subjects listened first to the the remaining experiments: the level of se- multidimensionally varied tape and then to the un- mantics, the level of the phoneme, and the varied tape (Group 1), whereas the other half listened in reverse order (Group 2). level of acoustic structure as it pertains to language. The phoneme level might appear to be the primary linguistic level at which Results phonological fusion occurs, since phonemes Differences in pitch, intensity, and vocal are fused into clusters. However, this need tract size among the fusible stimuli had re- not be the case. Therefore two other levels markably little effect on fusion. Overall were chosen, one higher (semantics) and fusion rate was 47% for the multidimen- one lower (acoustics) than the phoneme sionally varied pairs and 44% for the un- level. The experiments begin at the se- varied pairs, a result that shows the reverse mantic level and move "downward" to the trend of what might have been expected. phoneme and acoustic levels. The two groups of subjects differed some- Experiment 8: Semanti-c Level: what in fusion rate for each class of stimulus Sentence Contexts pairs. Group 1 fused on 46% of all varied trials and on only 38% of the subsequent Day. (1968) found that semantic cues at unvaried trials. This trend was not signifi- the word level influenced fusion rate. Fu- cant; only 7 out of 12 subjects showed this sion rates were higher when the fused out- pattern. Group 2 fused on 49% of the un- comes were real words than when they were varied trials and on 45% of the subsequent nonwords (PAHDUCT/RAHDUCT -» varied trials. This trend was not significant PRODUCT vs. PAHLOW/RAHLOW -» either; again, 7 out of 12 subjects yielded PRAHLOW). Nonword fusions did occur results in this direction. The accustomed (GORIGIN/LORIGIN -> GLORIGIN), nominal effect of lead times and the "usual" although at a reduced rate. amount of /l/-for-/r/ substitutions were The present experiment was designed to found in the data. Again, PAY-set pairs observe the effects of semantic cues at the fused somewhat more often than KICK-set sentence level on fusion rate. Since Ex- pairs. periments 1-7 found that /!/ was frequently substituted for /r/ in fusion responses, the present study was also designed to observe Over-view of Experiments 4-7 the effect of semantic context on /l/-for- Within the relatively large ranges of val- /r/ substitutions. ues explored in the dimensions of pitch (20 Hz), intensity (15 db.), and vocal tract Method size (a ratio of 5:6), the physical aspects Two sets of stimuli were used, the PAY set of the signal have remarkably little effect on (PAY, LAY, RAY) and the GO set (GO, LOW, ROW). Fusible pairs were presented in isola- fusion. Even when the three dimensions tion, as in previous experiments, and imbedded in vary, such stimulus pairs as PAY-low-low- sentence contexts. The PAY set appeared in the small/LAY-high-high-large fuse as readily context THE TRUMPETER S FOR US as PAY-low-low-small/LAY-low-low-small. (Sentence Frame 1) and THE MINISTER — — — S FOR US (Sentence Frame 2), whereas Taken together with the results of Experi- the GO set appeared in, THE COALS ARE ments 2 and 3, the results of these experi- ING AGAIN (Sentence Frame 3) and THE ments strongly suggest that the perceptual TREES ARE ING AGAIN (Sentence combination of the stimuli occurs after the Frame 4). These sentences were made into di- chotic pairs such that THE TRUMPETER linguistic information is extracted from each PAYS FOR US, for example, was presented to of the signals. Otherwise, physical variation one ear and THE TRUMPETER LAYS FOR would surely affect the rate at which fusion US to the other. All fusible pairs appeared in occurs. both semantically probable and improbable con- PHONOLOGICAL FUSION 113

texts. Thus, when PAY/LAY appears in Sen- regardless of liquid presented. Sentence tence Frame 1, the response might be THE Frames 2 and 4 yielded a more predictable TRUMPETER PLAYS FOR US, a reasonable sentence; but when the same pair appears in Sen- set of results: In both cases, stop + /!/ fu- tence Frame 2, the response might be THE MIN- sions are semantically probable and as fusions ISTER PLAYS FOR US, a semantically less were very frequent. They occurred on 96% probable sentence. A reverse pattern of expected and 84% of all trials, respectively. There results follows from PAY/RAY pairs, and Sen- tence Frames 3 and 4 with GO-set stimuli paral- were virtually no THE TRUMPETER lel Sentence Frames 1 and 2. PRAYS FOR US and THE COALS ARE Two tapes were prepared, one with fusible tar- GROWING AGAIN responses, regardless gets imbedded in sentences and the other with of what liquid stimulus was presented. Other target pairs presented in isolation. The sentence tape consisted of 64 items: (4 sentence frames) X responses to sentence frames were primarily (2 stop/liquid pairs per set) X (2 channel ar- responses in which only the stop stimulus rangements) X (4 observations per sentence). was reported; for example, THE MINIS- Dichotic sentence pairs were presented at a simul- TER PAYS FOR US. Pairs of stimuli in taneous onset with 12 sec between trials. Subjects wrote down the entire sentence they heard. The the isolated-pair condition yielded similar isolated-pair tape also had 64 pairs: (2 sets of liquid substitution results: For example, stimuli) X (2 stop/liquid pairs per set) X (2 when PAY/RAY fused, PLAY responses channel arrangements) X (8 observations per were given 85% of the time, a pattern nearly pair). Again, only the simultaneous onset time was used, but the intertrial interval was 4 sec. identical to that for sentence trials. The Subjects wrote down "what they heard"—one reverse substitution rarely occurred. word or two, acceptable words or nonsense. Half The present experiment showed that mean- of the 16 subjects listened first to the sentence ing at the sentence level can not nullify the tape and then to the isolated-pair tape, while the others listened in reverse order. /l/-substitution effect. Relative frequency of occurrence of the fused words cannot ac- Results count for the substitutions either: GLOW- ING, for example, is much less frequent Fusion was significantly greater in the than GROWING (Carroll, et al., 1971; sentence condition than in the isolated-pair Thorndike & Lorge, 1944). Word-level condition. All subjects showed this trend considerations, however, may provide a clue. (s = 3.8, p < .001). Fusion rate was 89% Day (1968) found that, although subjects for sentence trials and 63% for isolated-pair usually reported hearing GROCERY when trials. The order in which subjects listened given GOCERY/ROCERY, sometimes they to the tapes was not a significant factor. reported hearing GLOCERY, a nonword. Fusion rates were comparable for stop + In the present series of experiments, both /r/ and stop + /!/ items, as well as for the stop + /r/ and stop + /!/ fusions for both sets of stimuli in each condition. a given set were acceptable English words. Stop + /!/ responses dominated all sen- Day (1968), on the other hand, chose stim- tence contexts even when they were seman- uli that could fuse meaningfully with only tically inappropriate. For Sentence Frame one of the liquids, /!/ or /r/. She found 1, THE MINISTER PLAYS FOR US oc- that PAHDUCT/RAHDUCT -* PROD- curred on 74% of all trials. Certainly, the UCT and not PLODUCT, and that minister PLAYING is semantically less GEEDY/REEDY -» GREEDY not likely than PRAYING (which occurred on GLEEDY. Such results suggest that mean- only 16% of all Sentence Frame 1 trials, re- ing at the word level can override the /l/- gardless of the liquid stimulus presented) substitution effect. This effect is consid- even in today's climate of changing roles. ered again in Experiment 10. Likewise, in Sentence Frame 3, THE TREES ARE GLOWING AGAIN oc- Experiment 9: Phoneme Level: curred on 83% of all trials, despite the fact Stops and that it is not very probable. GROWING re- Phonological fusion occurs when pho- sponses occurred on only 4% of such trials, nemes from different dichotic stimuli are 114 JAMES E. CUTTING perceived as a cluster. Experiments 9 and not be accounted for by the relative fre- 10 examined the phonemic components, the quency of the possible fusion responses in stop and the liquid, to assess their impor- English: for example, BLED and GLOW tance in the fusion phenomenon. are much less common than FLED and In Experiments 1-8, stop consonants FLOW (Carroll et al., 1971). Further- served as the first phoneme of the to-be- more, initial /f/ + liquid clusters occur at fused consonant-consonant-vowel cluster. about the same frequency as initial /b/ + The present experiment compared the fusi- liquid clusters in English, and considerably bility of stop/liquid and /liquid more frequently than initial /g/ + liquid pairs. Both initial clusters occur very fre- clusters (Denes, 1965; Hultzen, Allen, & quently in English, but Day (1968) reported Miron, 1964). results of pilot work showing infrequent fu- The order in which subjects listened to sions for stimuli involving fricative pho- the stop and fricative test tapes was not a nemes. significant factor. Fusion rates for stop + /!/ and stop + /r/ stimuli were within a Method few percentage points. Fricative + /!/ and In addition to the BED set (BED, LED, RED) fricative + /r/ fusion rates were also com- and the GO set (GO, LOW, ROW), the fricative parable. The /l/-for-/r/ substitutions were stimuli FED and FOE were synthesized. The fricative /f/ was chosen because it is the only frequent for stop/liquid pairs, but not for fricative in English that clusters with both /r/ fricative/liquid pairs. In fact, /r/ substitu- and /!/ in initial position. Fricative stimuli were tions occurred on 64% of all trials in which identical to the stop stimuli in duration, pitch, and fricative + /!/ stimuli fused, while corre- intensity, and differed only in the acoustic struc- ture of the first 100 msec. BED and GO have sponding /!/ substitutions were rare. A formant transitions in this region requisite for the second type of substitution also occurred. perception of stop consonants, plus a portion (SO Fricative/liquid stimuli did not always yield msec) of steady state vowel. FED and FOE, on fricative + liquid responses; for example, the other hand, began with bandpass noise (above 1,500 Hz) and aspirated transitions requisite for FED/RED -» BRED. In fact, about 70% of the perception of /f/. A given liquid stimulus all fricative/liquid pair fusions were actually such as LED was paired with both a stop con- stop + liquid responses. Stop-for-fricative sonant stimulus (BED/LED) and a fricative substitutions were not the result of poor stimulus (FED/LED). All stimuli and possible fusions were English words or names: BED/LED fricative stimuli, since subjects identified -» BLED, BED/RED -> BREAD, FED/LED them in isolation on a diotic test with a high -> FLED, FED/RED -> FRED, GO/LOW -> degree of accuracy. Instead, these substitu- GLOW, GO/ROW -» GROW, FOE/LOW -» tions appear to be an extension of the dif- FLOW, and FOE/ROW -» FRO. ferences in fusibility between the stops and One tape was prepared with stop/liquid pairs and another tape with fricative/liquid pairs. Each fricatives. tape consisted of 120 dichotic trials: (2 sets of stimuli) X (2 consonant/liquid pairs per set) X Experiment 10: Phoneme Level: (3 lead times) X (2 channel arrangements) X (5 Liquids and Semivowels observations per pair). Twelve subjects listened to both tapes: half listened first to the fricative/ The present experiment examined the sec- liquid stimuli and then to the stop/liquid stimuli, ond consonant in the to-be-fused cluster. while the others listened in the reverse order. Stimuli beginning with semivowels (i.e., /w/ and /y/) were generated to see whether Results they would fuse as readily as the liquids, Fusions occurred more readily for stop/ and to see whether the /l/-substitution effect liquid pairs than fricative/liquid pairs. Fu- would be extended to /w/ and /y/. sion rates were 57% and 18%, respectively. This 3:1 ratio was highly significant, with Method all subjects showing greater fusion rates for Two sets of stimuli were used: the KICK set stop/liquid items (z = 3.18, p < .001). Un- (KICK, LICK, RICK used previously, plus WICK) and the COO set (COO, LIEU, RUE, like the results of Experiments 1 and 3, fu- YOU) generated on the Raskins' synthesizer. sion rate differences of this experiment can- The only stop consonant that clusters with all PHONOLOGICAL FUSION 115 liquids and semivowels in English is /k/; yet /ky/ F2 VARIATION occurs only before the vowel /u/, while /kw/ does WICK LICK Llty YOU not occur before /u/. Thus, it was necessary to synthesize two sets of stimuli, one for /I, r, w/ and the other for /I, r, y/. (Note that LIEU could also be represented as LOU.) Liquid and semivowel stimuli within the same set were iden- tical in all respects except for the direction and slope of the second- and third-formant transi- tions, as shown in Figure 2. These are the cues that distinguish all liquids and semivowels (Lisker, 1957; O'Connor et al., 1957). All stimuli and possible fusion responses for both sets were common English words or names: FIGURE 2. Spectrograms of liquid and semivowel KICK/LICK -> CLICK, KICK/RICK -> stimuli used in Experiment 10. CRICK, KICK/WICK -> QUICK, COO/LIEU -> CLUE, COO/RUE -» CREW, and COO/ YOU -» CUE. A tape was prepared with 108 trouble processing /r/ than /!/, and often dichotic items: (2 sets of stimuli) X (3 liquid and hear /!/ in both cases (Rosen, 1962). semivowel stimuli per set) X (3 lead times) X. (2 channel arrangements) X (3 observations per The articulation pattern of /r/ is less stable pair). Twelve subjects listened to two passes than /!/ (Bronstein, 1960; Delattre, Note through the tape, reversing headphones after the 3), and /r/ is more difficult to pronounce first pass. correctly under conditions of delayed audi- tory feedback than /!/ (Applegate, 1968). Results and Discussion The dichotic fusion results are complemen- KICK/LICK, KICK/RICK, and KICK/ tary to these studies: Perhaps /r/ is less WICK pairs all fused at an average rate of stable than /!/ in perception and produc- 70%, while COO/LIEU, COO/RUE, and tion, and the fusion results of this stud}' are COO/YOU all fused at an average rate of a manifestation of /r/ instability. 42%. There were no significant differences within each set. However, regardless of Overview of Experiments 9 and 10 which stimuli were presented, most re- When the first consonant in the to-be- sponses were stop + /!/; /!/ was substi- fused cluster is a stop, fusion rate is high, tuted for /r/, as in previous studies, and it but when it is a fricative, fusion occurs con- was also substituted for /w/ and /y/. siderably less often. The role of the second CLICK and CLUE responses occurred hi consonant in the to-be-fused cluster is less 89% of all trials in which fusions occurred. clear: Fusion occurred equally well for all Again, word frequency of possible fusions stop/liquid and stop/semivowel pairs, yet cannot account for the substitutions. For all pairs tended to yield a stop + /I/ re- example, QUICK is much more common sponse. than CLICK, and CREW is more common than CLUE (Carroll et al., 1971). Never- Experiment 11: Acoustic Level: theless, the /!/ substitutions for both sets of Liquid Transitions stimuli yielded relatively common English Linguistic cues at the sentence and pho- words. The data of Day (1968) suggest neme levels affect fusion rate. Perhaps lin- that when /!/ substitutions do not yield ac- guistic cues at the level of acoustic struc- ceptable words, they occur considerably less tures are also important. Since the liquid often. is perceptually interpolated between the stop Difficulties with /r/ versus /!/ occur in a and the vowel in the fusion response, one wide variety of other situations besides pho- key to fusion may lie in the acoustic struc- nological fusion. Children, for example, ture of the liquid. Experiments 11—13 ex- have more difficulty in pronouncing /r/ amine aspects of the acoustic structure of than /!/ and sometimes pronounce both liquids. phonemes as /!/ (Morley, 1957; Murray, Experiment 10 showed that for the pres- 1962; Powers, 1957); the deaf have more ent sets of stimuli, liquids (/r, I/) and semi- 116 JAMES E. CUTTING

100 stimuli. Sixteen Yale undergraduates participated in two tasks, dichotic fusion and then diotic iden- tification of the liquid-to-vowel stimuli in isola- O 75 tion. Since the results of the identification task liquid vowel were highly relevant to the fusion task, those data UJ (j are discussed first. O — 50 A diotic identification tape of 80 randomized CC <± liquid-to-vowel items was prepared: (2 sets of stimuli) X (4 stimuli per array) X (10 observa- tions per stimulus). There was a 3-sec interval between each item. Subjects wrote down the sin- gle item that they heard presented on each trial. Dichotic items were constructed by pairing the stop stimuli with all items in the liquid-to-vowel arrays. The tape consisted of 96 pairs: (2 sets of 6° stimuli) X (4 stimuli per array) X (3 lead times) X (2 channel arrangements) X (2 observations per pair). Subjects listened to two passes through the tape, reversing headphones after the first pass. As usual they wrote their responses, indicating 30 oUJ what they heard. cc. Results and Discussion Stimuli 1 and 2 were identified as be- ginning with /!/ on 88% of all trials, as 1234 shown in the top panel of Figure 3. Stim- BOTH ARRAYS uli 3 and 4 were identified as beginning with FIGURE 3. Results of identification and fusion /!/ on only 8% of all trials. All subjects tasks involving liquid-to-vowel stimulus arrays in showed this quantal trend. There was no Experiment 11. difference between the LAY-to-AY and LICK-to-ICK stimulus arrays. (/w, y/) tend to yield stop + /!/ Fusion occurred at a rate of 52% for pairs fusions. It would appear that any combi- containing Stimulus 1 or Stimulus 2, the nation of rising and falling second- and stimuli which had been identified as begin- third-formant transitions is sufficient for ning with a liquid. Other pairs yielded stop + /!/ fusions to occur. If any com- only 6% fusions (stop + liquid responses), bination is sufficient for fusion, perhaps no as shown in the lower panel of Figure 3. transitions are needed at all. For example, Thus, liquid-like transitions are necessary PAY/AY, a pair without any liquid transi- for phonological fusion: Fusion occurs in di- tions, might also yield stop + /!/ fusions. rect proportion to the extent that liquid-like The present study varied the slope of the items are indeed perceived as liquids in liquid transitions to determine the extent of isolation. formant transitions necessary for fusion to Experiment 12: Acoustic Level: occur and to gain additional support for the Degraded Liquids view that fusion is not a response bias. Experiment 11 showed that transitions in Method the liquid stimulus were necessary for fusion The PAY set and the KICK set were altered to occur. The present experiment was de- to include five stimuli, one stop stimulus and four signed to determine which formant transition stimuli that formed a liquid-to-vowel continuum. (or combination of transitions) in the li- At one end of the continuum, Stimulus 1 had full liquid transitions in all formants, as found in LAY quid is necessary for fusion. and LICK, At the other end of the continuum, Stimulus 4 had the same duration but began with Method a steady state vowel, AY and ICK. Between the The PAY set and the KICK set were again extremes were Stimuli 2 and 3 with intermediate used. Liquid stimuli appeared in many forms. amounts of formant transitions. Equal increments Some were degraded in that certain formants were of acoustic change occurred between successive omitted from their acoustic structure. Figure 4 PHONOLOGICAL FUSION 117

shows the component parts of liquid stimuli. All TABLE 2 possible combinations of all formants were used. PERCENT FUSION RESPONSES FOR ALL Eleven liquid-like stimuli resulted in each set: 2 PAIRS IN EXPERIMENT 12 three-formant stimuli identical to those used in previous studies, 5 two-formant stimuli, and 4 one- Dichotic pairs Percent fusion formant stimuli, as listed in Table 2. Each of the 11 liquid-like stimuli was paired with its appropriate stop stimulus. In addition, Stop + three-formant liquids 1.2.3/1/ 57 two control pairs were constructed per set: One 1,2,3/r/ 51 pair was a stop/stop pair (PAY/PAY and KICK/ Stop + two-formant liquids KICK like those used by Day, 1968), and the 1, 2 57 other was a stop presented to one ear and nothing 1, 3/1/ 23 to the other (PAY/ and KICK/ ). 1, 3/r/ 51 No fusion responses should occur for control pairs 2, 3/1/ 55 if fusion occurs only for pairs containing liquid- 2, 3/r/ 54 like stimuli. Twelve subjects listened to a di- Stop + one-formant liquids 1 19 chotic tape of 156 items: (2 sets of stimuli) X 2 60 (13 pairs per set) X (3 lead times) X (2 channel 3/1/ 16 arrangements per pair). 3/r/ 64 Control items Results and Discussion Stop + stop 6 Stop -j- blank There were two general fusion rates: 2 Most pairs fused at a rate of about 55%, but a few pairs fused at a considerably lower formant transitions in the mid-frequency rate, as shown in Table 2. Pairs that rarely range (1,000-2,000 Hz), while all other liq- fused contained liquid-like stimuli with only the first formant or the third formant of /!/. uid-like stimuli had transitions in this region Figure 4 shows that these stimuli lacked (either the second formant or the third formant of /r/). LIQUID STIMULI Stop + three-formant liquid stimuli fused at rates comparable to previous studies: 54%. Stop + two-formant liquid pairs fused at a rate of 52%, with the exception of the pairs with only first and third for- LAY mants of /!/, where fusion level was only & 23%. All subjects showed this drop in fu- RAY sion rate (z = 3.18, p < .001). Stop + one-formant liquid pairs also showed high and low fusion rates, following a pattern similar to that of pairs with two-formant liquids. Again, all subjects showed these x quantal differences in fusion rate. As in o previous studies, stop + /!/ responses oc- curred on more fused trials than did stop + /r/ responses. Stop/stop pairs yielded few 3- stop + /!/ responses. Such responses would LICK be "false fusions," since the subject would & be reporting a liquid when, in fact, none was RICK presented (Day, 1968). A specific acoustic cue for phonological fusion of stop/liquid stimuli appears to be I- H the second-formant transition or the third- formant /r/ transition of the liquid stimulus. FIGURE 4. Schematic spectrograms of liquid A single rising formant transition in the stimuli and their component parts used in Experi- ment 12. (Fl = first formant, F2 = second for- range 1,000-2,000 Hz is necessary for fusion mant, F3 = third formant.) to occur. 118 JAMES E. CUTTING

RISING FALLING SJ EADV In addition to the stop/chirp pairs, there were STATT E control pairs. Ordinary fusible items such as PAY/LAY and PAY/RAY were employed to ob- tain baseline fusion rates. Stop/stop pairs were also included to set a lower boundary on fusion rate since Experiment 12 found few stop + liq- quid responses to such trials. Thus the control pairs provided boundary conditions within which to compare the fusion rates for stop/chirp items. Three lead times were used such that stop/chirp stimulus pairs had the same temporal relationships as the stop and the second-formant transition of the full liquid in previous experiments. Twelve CHIRPS subjects listened to a dichotic tape of 180 items: FIGURE 5. Schematic spectrograms of liquid (2 sets of stimuli) X (IS pairs per set, 12 stop/ stimuli and liquid "chirps" used in Experiment 13. chirp pairs plus such control pairs as PAY/LAY, (The original second-formant transition from the PAY/RAY, and PAY/PAY) X (3 lead times) X liquid stimuli is the rising chirp stimulus desig- (2 channel arrangements per pair). r nated 2 .) Results Experiment 13: Acoustic Level: Fusions, or stop + liquid responses, oc- Liquid "Chirps" curred at a substantially reduced rate for all stop/chirp pairs. While fusion rate for The previous study found that a mid- stop/liquid control pairs was 47%, a rate range formant transition was necessary for comparable to previous studies, fusion rates fusion to occur. The present study was de- for stop/chirp pairs averaged only 8%. signed to determine whether that transition This difference was highly significant (z = or any other is sufficient for fusion when 3.18, p < .001). paired with a stop stimulus. When the Fusion did occur, however, for selected second-formant transition is removed from stop/chirp pairs as shown in Table 3. Pairs a liquid stimulus and synthesized by itself, with rising chirps yielded an average fusion it sounds similar to a bird's twitter, hence rate of 14%, higher than all other stop/chirp the name "chirp." Mattingly, Liberman, pairs combined (z — 2.6, p < .005). Within Syrdal, and Halwes (1971) and Wood this category of rising chirps, fusion oc- (1975) have demonstrated that these chirps curred most readily for pairs involving chirp are not processed as speech. Chirp stimuli number 2r. Eight of 12 subjects fused at a have two general features, relative frequency higher rate for these pairs than for any other range and direction (rising vs. falling). stop/chirp pair (s = 4.0, p < .0001), but even here fusions occurred significantly less Method often than for the stop/liquid control pairs Two stop stimuli were synthesized, PAY and (2 = 3.18, p < .001). Thus, even the chirp KICK. Liquid stimuli were degraded so that only the second-formant transition remained, a 100- stimulus most appropriate to the full liquid msec chirp rising rapidly from a value of 1,200 TABLE 3 Hz to 1,800 Hz. This and other chirps were syn- thesized at the same amplitude as the second- PERCENT FUSION RESPONSES FOR STOP/CHIRP formant transition in the full liquid stimulus. PAIRS IN EXPERIMENT 13 Twelve chirp stimuli were used; there were four frequency values for rising, falling, and steady Class of chirp stimulus state chirps, as shown in Figure 5. Specific stim- SHmnl irnh rt frequency domain uli are numbered from low to high according to Rising Falling Steady M ordinal position on the frequency scale, and each state is given a superscript to designate the class to which it belongs. Endpoints for rising and falling 1 (600-1,200 Hz) 14 5 5 8 chirps were 600, 1,200, 1,800, 2,400, and 3,000 Hz, 2 (1,200-1,800 Hz) 24 6 7 12 while steady state chirps had frequencies of 900, 3 (1,800-2,400 Hz) 12 3 5 7 1,500, 2,100, and 2,700 Hz. The original second- 4 (2,400-3,000 Hz) 7 8 3 6 formant transition from the liquid stimuli is the M 14 6 5 rising chirp stimulus designated 21'. PHONOLOGICAL FUSION 119 stimulus is not entirely sufficient for fusion and stop/liquid pairs. Thus, the high to occur at an unreduced rate. rate of THE TREES ARE GLOWING The /l/-for-/r/ substitutions occurred for AGAIN responses appears to be the result stop/liquid control pairs at rates comparable of the synergy of cues from three very dif- to previous studies. Fusion of stop/chirp ferent levels of language. For further dis- pairs, however, were not dominated by stop cussion of linguistic influences on phonolog- + /!/ responses. In fact, only stop/21' pairs ical fusion, see Cutting and Day (1975). yielded more stop + /!/ fusions than stop + /r/, /w/, or /y/ fusions. Fusions for the ADDITIONAL COMMENTS lowest frequency chirps paired with a stop Day (1973; Note 1) found marked indi- stimulus were dominated by stop + /w/ re- vidual differences in phonological fusion sponses, whereas those for highest frequency using natural speech disyllabic pairs. In chirps were dominated by stop + /y/ and the present series of fusion experiments, stop + /!/ responses. False fusion re- using synthetic speech monosyllabic fusible sponses for stop/stop pairs occurred only items, such individual differences did occur, 2% of the time. but to an extent less marked than that found Overview of Experiments 8-13 by Day. Consider fusion rates for the 140 individuals in Experiments 1, 2, and 4-10 Three levels of linguistic cues were ex- (the other 48 individuals are eliminated be- plored (the semantic level, the phoneme cause fusion broke down in their tasks), and level, and the acoustic level), and the effect consider the number of subjects whose fu- of cues at each level on fusion rate was ob- sion rates fell in the five quintiles between served. The results suggest that the cogni- 0% and 100% fusion. Only 7 subjects' tive processes involved in phonological fu- rates fell between 0% and 20%, whereas 39 sion are influenced by cues at all three lin- fell between 21% and 40%, and 54 between guistic levels. Fusion rate was enhanced 41% and 60%. Only 14 subjects' fusion when fusible pairs were imbedded in sen- rates were between 61% and 80%, whereas tence contexts; fusion occurred best for cer- 26 were between 81% and 100%. This tain classes of phonemes; and specific acous- overall bimodal trend is indicative of sub- tic cues were found to be important for jects within each of the nine experiments in fusion. Thus, linguistic cues within and question. It is similar to that found by Day, outside of the consonant/liquid pairs have but the lower fusion mode has been shifted a distinct effect on fusion rate. Given the "upward" in frequency through the use of appropriate experimental situation, cues higher fusing synthetic pairs. from all three levels appear to work in con- Phonological fusion has a direct analog in cert. Consider one of the sentence pairs vision. In fact, Ruth Day discovered this from Experiment 8: THE TREES ARE phenomenon in search of an auditory paral- GOING AGAIN presented to one ear and lel to the work of Rommetveit and his co- THE TREES ARE ROWING AGAIN to workers (Rommetveit, Berkeley, & Br0g- the other. Semantic cues at the sentence ger, 1968; Rommetveit & Kleven, 1968; level increased fusion rate for GO/ROW Rommetveit, Toch, & Svendsen, 1968a, pairs beyond the rate they yielded when pre- 1968b). They found that when letters such sented in isolation. Cues at the phoneme as SHAR were presented to one eye and level were also influential: Experiment 9 SHAP to the other eye, many subjects re- demonstrated that GO/ROW pairs fused at ported seeing SHARP. a higher rate than FOE/ROW pairs. Ex- periment 12 found that the second-formant transition in the liquid was a specific cue for CONCLUSION fusion. Moreover, this cue appears to be Phonological fusion occurs in dichotic lis- pertinent to the /l/-for-/r/ substitutions: tening in a very systematic fashion. Within Fusion rates and /!/ substitutions were ap- a relatively wide range of variation, tem- proximately equal for stop/second-formant poral alignment and physical attributes of 120 JAMES E. CUTTING the signals appear to matter very little. Lin- Lisker, L. Minimal cues for separating /w, r, 1, guistic aspects of the stimuli, however, mat- y/ in intervocalic position. Word, 1957, 13, 257- 267. ter quite a lot. Stop consonants and liquids Mattingly, I. G., Liberman, A. M., Syrdal, A. K., fuse most readily, and fuse in the order in & Halwes, T. Discrimination in speech and non- which they are phonologically reasonable in speech modes. Cognitive Psychology, 1971, 2, English. Given a stop stimulus and a liquid 131-157. Morley, M. E. Development of speech disorders stimulus of the same general pattern, lin- in childhood. London: Livingstone, 1957. guistic attributes within and outside of the Murray, R. S. The development of /r/ in the dichotic pairs contribute to the percept. speech of preschool children (Doctoral disserta- tion, Stanford University, 1962). Dissertation REFERENCE NOTES Abstracts International, 1963, 23, 4462. (Uni- 1. Day, R. S. Temporal-order judgments in versity Microfilms No. 63-2734) speech: Are individuals language-bound or stim- O'Connor, J. D., Gerstman, L. S., Liberman, A. M., ulus-bound. Paper presented at the meeting of Delattre, P. C., & Cooper, F. S. Acoustic cues the Psychonomic Society, St. Louis, Missouri, for the perception of initial /w, y, r, I/ in Eng- November 1969. (Also in Haskins Laboratories lish. Word, 1957, 13, 25-43. Status Report on Speech Research, 1969, SR- Pisoni, D. B. Perceptual processing time for con- 21/22, 71-87.) sonants and vowels. Journal of the Acoustical 2. Day, R. S., & Cutting, J. E. Levels of proces- Society of America, 1973, 53, 369. (Abstract) sing in speech perception. Paper presented at Powers, M. H. Functional disorders of articula- the meeting of the Psychonomic Society, San tion : Symptomatology and etiology. In L. E. Antonio, Texas, November 1970. Travis (Ed.), Handbook of speech pathology. New York: Appleton-Century-Crofts, 1957. 3. Delattre, P. C. A dialect study of American Rommetveit, R., Berkeley, M., & Br0gger, J. Gen- /r/ by x-ray motion picture. The general pho- eration of words from tachistoscopically pre- netic characteristics of languages (Final Re- sented nonword strings of letters. Scandinavian port). University of California at Santa Bar- Journal of Psychology, 1968, 9, 150-156. bara, 1967. Rommetveit, R., & Kleven, J. Word generation: REFERENCES A replication. Scandinavian Journal of Psy- chology, 1968, 9, 277-281. Applegate, R. B. Segmental analysis of articula- Rommetveit, R., Toch, H., & Svendsen, D. Ef- tory errors under delayed auditory feedback. fects of contingency and contrast on the cogni- Project on Linguistic Analysis (University of tion of words. Scandinavian Journal of Psy- California at Berkeley), 1968, 8, 1-27. chology, 1968, 9, 138-144. (a) Bronstein, A. M. The pronunciation of American Rommetveit, R., Toch, H., & Svendsen, D. Se- English. New York: Harper & Row, 1960. mantic, syntactic, and associative context effects Carroll, J. B., Davies, P., & Richman, B. (Eds.). in stereoscopic rivalry situation. Scandinavian Word frequency book. New York: Houghton & Journal of Psychology, 1968, 9, 145-149. (b) Mifflin, 1971. Rosen, J. Phoneme identification in sensorineural Cooper, F. S., & Mattingly, I. G. Computer-con- deafness (Doctoral dissertation, Stanford Uni- trolled PCM system for investigation of dichotic versity, 1962). Dissertation Abstracts Interna- speech perception. Journal of the Acoustical So- tional, 1962, 23, 2253. (University Microfilms ciety of America, 1969, 46, 115. (Abstract) No. 62-5510) Cutting, J. E., & Day, R. S. The perception of Studdert-Kennedy, M., Shankweiler, D. P., & stop-liquid clusters in phonological fusion. Jour- Schulman, S. Opposed effects of a delayed chan- nal of , 1975, 3, 9-23. nel on perception of dichotically and monotically Day, R. S. Fusion in dichotic listening (Doctoral presented CV . Journal of the Acousti- dissertation, Stanford University, 1968). Dis- cal Society of America, 1970, 48, 599 602. sertation Abstracts International, 1969, 29, 2649B. Thorndike, E. L., & Lorge, I. The teacher's word- (University Microfilms No. 69-211) book of 30,000 words. New York: Columbia Day, R. S. Temporal-order perception of a re- University, Teachers College, Bureau of Publica- versible phoneme cluster. Journal of the Acous- tions, 1944. tical Society of America, 1970, 48, 95. (Ab- Turvey, M. T. On peripheral and central proces- stract) ses in vision: Inferences from an information- Day, R. S. Digit span memory in language-bound processing analysis of masking with patterned and stimulus-bound subjects. Journal of the stimuli. Psychological Review, 1973, 80, 1-52. Acoustical Society of America, 1973, 54, 287. Wood, C. C. Auditory and phonetic levels of pro- (Abstract) cessing in speech perception: Neurophysiological Denes, P. B. On the statistics of spoken English. and information-processing analyses. Journal of Journal of the Acoustical Society of America, Experimental Psychology: Human Perception 1965, 35, 892-904. and Performance,'1975, 104, 3-20. Hultzen, L., Allen, J., & Miron, M. Tables of transitional frequencies of English phonemes. Urbana: University of Illinois Press, 1964. (Received May 1, 1974)