
CHAPTER 10

Auditory Perception and Cognition

STEPHEN McADAMS AND CAROLYN DRAKE

The environment in which we live is extraordinarily rich. As we scurry about in our little animal lives, we perceive sound sources and the sequences of events they emit and must adapt our behavior accordingly. How do we extract and make use of the information available in this highly structured acoustic array?

Sounds arise in the environment from the mechanical excitation of bodies that are set into vibration. These bodies radiate some of their vibratory energy into the surrounding air (or water) through which this energy propagates, getting bounced off some objects and partially absorbed by different materials. The nature of the acoustic wave arising from a source depends on the mechanical properties both of that source and of the interaction with other objects that set it into vibration. Many of these excitatory actions are extended through time, and this allows a listener to pick up important information concerning both the source and the action through an analysis of the sequences of events produced. Further, most environments contain several vibrating structures, and the acoustic waves impinging on the eardrums represent the sum of many sources, some near, others farther away.

To perceive what is happening in the environment and adjust its behavior appropriately to the sound sources present, a listening organism must be able to disentangle the acoustic information from the many sources and evaluate the properties of individual events or sequences of events arising from a given source. At a more cognitive level, it is also useful to process the temporal relations among events in more lengthy sequences to understand the nature of actions on objects that are extended in time and that may carry important cultural messages, such as in speech and music for humans. Finally, in many cases, with so much going on, listening must be focused on a given source of sound. Furthermore, this focusing process must possess dynamic characteristics that are tuned to the temporal evolution of the source that is being tracked in order to understand its message.

Aspects of these complex areas are addressed in this chapter to give a nonexhaustive flavor for current work in auditory perception and cognition. We focus on auditory scene analysis, timbre and sound source perception, temporal pattern processing, and attentional processes in hearing, and finish with a consideration of developmental issues concerning these areas. The reader may wish to consult several general texts for additional information and inspiration (Bregman, 1990; Handel, 1989; McAdams & Bigand, 1993; Warren, 1999, with an accompanying CD), as well as compact discs of audio demos (Bregman & Ahad, 1995; Deutsch, 1995; Houtsma, Rossing, & Wagenaars, 1987).

AUDITORY SCENE ANALYSIS

It is useful for an organism to build a mental representation of the acoustic environment in terms of the behavior of sound sources (objects set into vibration by actions upon them) in order to be able to structure its behavior in relation to them. We can hear in the same room and at the same time the noise of someone typing on a keyboard, the sound of someone walking, and the speech of someone talking in the next room. From a phenomenological point of view, we hear all of these sounds as if they arrive independently at our ears without distortion or interference among them, unless, of course, one source is much more intense than the others, in which case it would mask them, making them inaudible or at least less audible.

The acoustic waves of all sources are combined linearly in the atmosphere, and the composite waveform is then analyzed as such by the peripheral auditory system (Figure 10.1; see Chap. 9, this volume). Sound events are not opaque like most visual objects are. The computational problem is thus to interpret the complex waveform as a combination of sound-producing events. This process is called auditory scene analysis (Bregman, 1990) by analogy with the analysis of a visual scene in terms of objects (see Chap. 5, this volume, for a comparison of how these two sensory systems have come to solve analogous problems). Contrary to vision, in which a contiguous array of stimulation of the sensory organ corresponds to an object (although this is not always the case, as with partially occluded or transparent objects), in hearing the stimulation is a distributed array mapped onto the basilar membrane. For a complex sound arising from a given source, the auditory system must thus reunite the sound components coming from the same source that have previously been channeled into separate auditory nerve fibers on the basis of their frequency content. Further, it must separate the information coming from distinct sources that contain close frequencies that would stimulate the same auditory nerve fibers. This is the problem of concurrent organization. The problem of sequential organization concerns perceptually connecting (or binding) over time successive events emitted by the same source and segregating events coming from independent sources in order to follow the message of only one source at a time.

Figure 10.1 Spectrogram of (a) a target sound, the word grand-père ("grandfather" in French), and (b) the target sound embedded in a noisy environment (a children's playground with voices and ducks).
NOTE: A spectrogram represents time on the horizontal axis and frequency on the vertical axis. The level at a given frequency is coded by the darkness of the spectrographic trace. Note that in many places in the mixture panel, the frequency information of the target sound is strongly overlapped by that of the noisy environment. In particular, the horizontal lines representing harmonic frequency components of the target word become intermingled with those of other voices in the mixture.

This section examines the mechanisms that are brought into play by the auditory system to analyze the acoustic events and the behavior over time of sound sources. The ultimate goal of such a system would be to segregate perceptually actions that occur simultaneously; to detect new actions in the environment; to follow actions on a given object over time; to compute the properties of sources to feed into categorization, recognition, identification, and comprehension processes; and to use knowledge of derived attributes to track and extract sources and messages. We consider in order the processes involved in auditory event formation (concurrent grouping), the distinction of new event arrival from change of an ongoing event, auditory stream formation (sequential grouping), the interaction of concurrent and sequential grouping factors, the problem posed by the transparency of auditory events, and, finally, the role of schema-based processes in auditory organization.

Auditory Event Formation (Concurrent Grouping)

The processes of concurrent organization result either in the perceptual fusion or grouping of components of the auditory sensory representation into a single auditory event or in their perceptual segregation into two or more distinct events that overlap in time. The nature of these components of the sensory representation depends on the dual coding scheme in the auditory periphery. On the one hand, different parts of the acoustic frequency spectrum are represented in separate anatomical locations at many levels of the auditory system, a representation that is called tonotopic (see Chap. 9, this volume). On the other hand, even within a small frequency range in which all the acoustic information is carried by a small number of adjacent auditory nerve fibers, different periodicities in the stimulating waveform can be discerned on the basis of the temporal pattern of neural discharges that are time-locked to the stimulating waveform (see Chap. 9, this volume). The term auditory event refers to the unity and limited temporal extent that are experienced when, for example, a single sound source is set into vibration by a time-limited action on it. Some authors use the term auditory objects, but we prefer to distinguish objects (as vibrating physical sources) from perceptual events. A single source can produce a series of events.

A relatively small number of acoustic cues appear to signal either common behavior among acoustic components (usually arising from a single source) or incoherent behavior between components arising from distinct sources. The relative contribution of a given cue for scene analysis, however, depends on the perceptual task in which the listener is engaged: Some cues are more effective in signaling grouping for one attribute, such as identifying the pitch or vowel quality of a sound, than for another attribute, such as judging its position in space. Furthermore, some cues are more resistant than are others to environmental transformations of the acoustic waves originating from a vibrating object (reflections, reverberation, filtering by selective absorption, etc.). Candidate cues for increasing segregation of concurrent sounds include inharmonicity, irregularity of spacing of frequency components, asynchrony of onset or offset of components, incoherence of change over time of level and frequency of components, and differences in spatial position.

Harmonicity

In the environment two unrelated sounds rarely have frequency components that line up such that each frequency is an integer multiple of the fundamental frequency (F0), which is called a harmonic series. It is even less likely that they will maintain this relation with changes in frequency over time. A mechanism that is sensitive to deviations from harmonicity and groups components having harmonic relations could be useful for grouping acoustic components across the spectrum that arise from a single source and for segregating those that arise from distinct sources.

Two main classes of stimuli have been used to study the role of harmonicity in concurrent grouping: harmonic complexes with a single component mistuned from its purely harmonic relation and complexes composed of two or more sets of harmonic components with a difference in fundamental frequency (Figures 10.2a-b).

Listeners report hearing out a single, mistuned harmonic component from the rest of the complex tone if its harmonic rank is low and the mistuning is around 2% of its nominal frequency (Moore, Peters, & Glasberg, 1985). If mistuning is sufficient, listeners can match the pitch of the segregated harmonic, but this ability deteriorates at component frequencies above approximately 2000 Hz, where temporal information in the neural discharge pattern is no longer reliably related to waveform periodicities (Hartmann, McAdams, & Smith, 1990). A mistuned harmonic can also affect the virtual pitch (see Chap. 11, this volume) of the whole complex, pulling it in the direction of mistuning. This pitch shift increases for mistunings up to 3% and then decreases beyond that, virtually disappearing beyond about 8% (Hartmann, 1988; Hartmann et al., 1990; Moore, Glasberg, & Peters, 1985). This relation between mistuning and pitch shift suggests a harmonic-template model with a tolerance function on the harmonic sieve (Duifhuis, Willems, & Sluyter, 1982) or a time-domain autocoincidence processor (de Cheveigné, 1993) with a temporal margin of error. Harmonic mistuning can also affect vowel perception by influencing whether the component frequency is integrated into the computation of the spectral envelope that determines the vowel identity (Darwin & Gardner, 1986). By progressively mistuning this harmonic, a change in vowel percept has been recorded up to about 8%

Figure 10.2 Stimuli used to test concurrent grouping cues.
NOTE: a) Harmonicity tested with the mistuned harmonic paradigm. A harmonic stimulus without the fundamental frequency still gives a pitch at that frequency (dashed line). A shift of at least 2% but no more than 8% in the frequency of the fourth harmonic causes the harmonic to be heard separately, but it still contributes to a shift in the pitch of the complex sound. b) Harmonicity tested with the concurrent vowel paradigm. In the left column two vowels (indicated by the spectral envelopes with formant peaks) have the same fundamental frequency (F0). The resulting spectrum is the sum of the two, and the new spectral envelope does not correspond to either of the vowels, making them difficult to identify separately. In the right column, the F0 of one of the vowels is shifted, and two separate groups of harmonics are represented in the periodicity information in the auditory nerve, making the vowels more easily distinguished. c) Spectral spacing. An even harmonic in an odd-harmonic base, or vice versa, is easier to hear out than are the harmonics of the base. d) Onset/offset asynchrony. When harmonics start synchronously, they are fused perceptually into a single perceptual event. An asynchrony of the onset of at least 30-50 ms makes the harmonic easier to hear out. An asynchrony of the offset has a relatively weak effect on hearing out the harmonic. e) Level comodulation (comodulation masking release). The amplitude envelopes of a sine tone (black) and a narrow-band noise with a modulating envelope (white) are shown. The masking threshold of the sine tone in the noise is measured. When flanking noise bands with amplitude envelopes identical to that of the on-signal band are added, the masked threshold of the sine tone decreases by about 3 dB. f) Frequency comodulation. A set of harmonics that are coherently modulated in frequency (with a sinusoidal vibrato in this example) are heard as a single event. Making the modulation incoherent on one harmonic causes it to be heard out separately because of the time-varying inharmonicity that is created.
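The harmonic-sieve account mentioned in the text can be sketched computationally: each component frequency is accepted into the group for a candidate F0 if it falls within a tolerance of an integer multiple of that F0, and rejected (heard out as a separate event) otherwise. The following is an illustrative sketch only, not any published model; the function name and the 3% tolerance (roughly where pitch shifts peak before the component segregates) are our own choices.

```python
def harmonic_sieve(freqs, f0, tolerance=0.03):
    """Partition component frequencies into those accepted by a harmonic
    sieve for fundamental f0 (within `tolerance` of an integer multiple)
    and those rejected as likely belonging to another source.
    Illustrative sketch; the 3% tolerance is an assumption."""
    grouped, rejected = [], []
    for f in freqs:
        rank = max(1, round(f / f0))              # nearest harmonic rank
        mistuning = abs(f - rank * f0) / (rank * f0)
        (grouped if mistuning <= tolerance else rejected).append(f)
    return grouped, rejected

# A 200-Hz complex whose 4th harmonic is mistuned upward by 8%:
# the sieve rejects the mistuned component, mirroring the report
# that it is heard as a separate event.
grouped, rejected = harmonic_sieve([200.0, 400.0, 600.0, 864.0, 1000.0], 200.0)
```

Note that such a model predicts segregation of the mistuned component but not, by itself, the residual pitch shift that the component continues to exert up to about 8% mistuning.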


mistuning, as for the pitch-shift effect. Note that above 2% mistuning, the component is heard as a separate event, but it continues to affect pitch and vowel perception up to a mistuning of about 8%.

A difference in fundamental frequency (ΔF0) of 2% across two sets of harmonics forming different synthetic speech formants¹ gives an impression of two sources with the same vowel identity (Cutting, 1976; Gardner, Gaskill, & Darwin, 1989), but much larger ΔF0s are necessary to affect the identity of consonant-vowel (CV) syllables. In four-formant CV syllables in which perceptual segregation of the second formant (F2) changes the identity from /ru/ to /li/, a 58% ΔF0 on F2 gave /li/ plus an additional buzz, whereas 4% gave an impression of two sources, both being /ru/ (Darwin, 1981). Fundamental frequency differences increase the intelligibility of a target speech stream mixed with a nonsense stream up to approximately 3%. Intelligibility remains relatively constant thereafter, but with a drop at the octave (Brokx & Nooteboom, 1982). At this 2:1 frequency ratio, the frequency components of the higher-F0 complex would coincide perfectly with the even-ranked components of the lower-F0 complex. In more controlled experiments in which pairs of synthesized vowels are mixed and must both be identified, performance increases up to a 3% ΔF0 (Assmann & Summerfield, 1990; Scheffers, 1983).

One question raised by these studies is how the harmonic structure of the source is exploited by a segregation mechanism. At the small differences in F0 that give significant identification improvement, it is clear that cochlear frequency selectivity would be insufficient. A mechanism might be used that operates on the temporal fine structure of the neural discharge pattern in the auditory nerve fiber array. Two possibilities have been examined by experimentation and modeling (de Cheveigné, 1993). One proposes that the extraction process uses the harmonicity of the target event. The other hypothesizes inversely that the harmonicity of the background is used to cancel it out. In line with the cancellation hypothesis, results from concurrent double-vowel experiments show that it is easier to identify periodic target vowels than inharmonic or noisy ones, but that the harmonicity of the target vowel itself has no effect (de Cheveigné, Kawahara, Tsuzaki, & Aikawa, 1997; de Cheveigné, McAdams, Laroche, & Rosenberg, 1995; de Cheveigné, McAdams, & Marin, 1997; Lea, 1992). The harmonic cancellation mechanism is extremely sensitive because improvement in vowel identification can be obtained with a ΔF0 as small as 0.4% (de Cheveigné, 1999).

Regularity of Spectral Spacing

The results from studies of harmonicity suggest a role for temporal coding rather than spectral coding in concurrent grouping. This view is complicated, however, by results concerning the regularity of spectral spacing, that is, the pattern of distribution of components along the frequency dimension. If a listener is presented with a base spectrum composed of only the odd harmonics plus one even harmonic, the even harmonic is more easily heard out than are its odd neighbors (Figure 10.2c). This is true even at higher harmonic numbers, where spectral resolution² is reduced.
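The cancellation hypothesis can be illustrated with a simple time-domain comb filter: delaying a mixture by the period of the interfering background F0 and subtracting nulls every harmonic of that F0 exactly, leaving mostly the target. This is a toy sketch of the principle only, not de Cheveigné's actual model; the sample rate, the two F0 values, and the reduction of each "vowel" to a single harmonic are our simplifications.

```python
import math

sr = 8000                                 # sample rate (arbitrary choice)
f0_bg, f0_target = 100.0, 125.0           # background and target fundamentals
period_bg = round(sr / f0_bg)             # background period in samples (80)

# Mixture of two periodic sources, each reduced to one sinusoid for brevity.
mix = [math.sin(2 * math.pi * f0_bg * n / sr) +
       0.5 * math.sin(2 * math.pi * f0_target * n / sr)
       for n in range(sr)]

# Cancellation comb filter: y[n] = x[n] - x[n - T_bg].
# Any component periodic at the background period is nulled, while the
# target (a different period) passes through with altered amplitude.
out = [mix[n] - mix[n - period_bg] for n in range(period_bg, sr)]
```

Because the filter depends only on the background period, this design is consistent with the finding that identification depends on the harmonicity of the interfering sound rather than on that of the target.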

¹Formants are regions in the frequency spectrum where the energy is higher than in adjacent regions. They are due to the resonance properties of the vocal tract and determine many aspects of consonant and vowel identity (see Chap. 12, this volume).

²The frequency organization along the basilar membrane in the cochlea (see Chap. 9, this volume) is roughly logarithmic, so higher harmonics are more closely spaced than are lower harmonics. At sufficiently high ranks, adjacent harmonics no longer stimulate separate populations of auditory nerve fibers and are thus "unresolved" in the tonotopic representation.

Note that the even harmonic surrounded by odd harmonics would be less resolved on the basilar membrane than would either of its neighbors. Contrary to the ΔF0 cue, harmonic sieve and autocoincidence models cannot account for these results (Roberts & Bregman, 1991). Nor does the underlying mechanism involve a cross-channel comparison of the amplitude modulation envelope in the output of the auditory filter bank, because minimizing the modulation depth or perturbing the modulation pattern by adding noise does not markedly reduce the difference in hearing out even and odd harmonics (Roberts & Bailey, 1993). However, perturbing the regularity of the base spectrum by adding extraneous components or removing components reduces the perceptual "popout" of even harmonics (Roberts & Bailey, 1996), confirming the spectral pattern hypothesis.

Onset and Offset Asynchrony

Unrelated sounds seldom start or stop at exactly the same time. Therefore, the auditory system assumes that synchronous components are part of the same sound or were caused by the same environmental event. Furthermore, the auditory system is extremely sensitive to small asynchronies in analyzing the auditory scene. A single frequency component in a complex tone becomes audible on its own with an asynchrony as small as 35 ms (Rasch, 1978). Onset asynchronies are more effective than offset asynchronies are in creating segregation (Figure 10.2d; Dannenbring & Bregman, 1976; Zera & Green, 1993). When a component is made asynchronous, it also contributes less to the perceptual properties computed from the rest of the complex. For example, a 30-ms asynchrony can affect timbre judgments (Bregman & Pinker, 1978). Making a critical frequency component that affects the estimation of a vowel sound's spectral envelope asynchronous by 40 ms changes the vowel identity (Darwin, 1984). Furthermore, the asynchrony effect is abolished if the asynchronous portion of the component (i.e., the part that precedes the onset of the vowel complex) is grouped with another set of components that are synchronous with it alone and that have a common F0 that is different from that of the vowel. This result suggests that it is indeed a grouping effect, not the result of adaptation (Darwin & Sutherland, 1984).

The effect of a mistuned component on the pitch of the complex (discussed earlier) is increasingly reduced for asynchronies from 80 to 300 ms (Darwin & Ciocca, 1992). This latter effect is weakened if another component groups with a preceding portion of the asynchronous component (Ciocca & Darwin, 1993). Note that in these results the asynchronies necessary to affect pitch perception are much greater than are those that affect vowel perception (Hukin & Darwin, 1995a).

Coherence of Change in Level

From Gestalt principles such as common fate (see Chap. 5, this volume), one might expect that common direction of change in level would be a cue for grouping components together; inversely, independent change would signal that segregation was appropriate. The evidence that this factor is a grouping cue, however, is rather weak. In experiments by Hall and colleagues (e.g., Hall, Grose, & Mendoza, 1995), a phenomenon called comodulation masking release is created by placing a narrow-band noise masker centered on a target frequency component (sine tone) that is to be detected (Figure 10.2e). The masked threshold of the tone is measured in the presence of the noise. Then, noise bands with similar or different amplitude envelopes are placed in more distant frequency regions. The presence of similar envelopes (i.e., comodulation) makes it possible to detect the tone in the noise at a level of about 3 dB lower than in their absence. The masking seems to be released to some extent by the presence of comodulation on the distant noise bands. Some authors have attributed this phenomenon to the grouping of the noise bands into a single auditory image that then allows the noise centered on the tone to be interpreted as part of a different source, thus making detection of the tone easier (Bregman, 1990, chap. 3). Others, however, consider either that cross-channel detection of the amplitude envelope simply gives a cue to the auditory system concerning when the masking noise should be in a level dip, or that the flanking maskers suppress the on-signal masker (Hall et al., 1995; McFadden & Wright, 1987).

Coherence of Change in Frequency

For sustained complex sounds that vary in frequency, there is a tendency for all frequencies to change synchronously and to maintain the frequency ratios. As such, one might imagine that frequency modulation coherence would be an important cue in source grouping (Figure 10.2f). The effects of frequency modulation incoherence may have two origins: within-channel cues and cross-channel cues. Within-channel cues would result from the interactions of unresolved components that changed frequency incoherently over time, creating variations in beating or roughness in particular auditory channels. They could signal the presence of more than one source. Such cues are detectable for both harmonic and inharmonic stimuli (McAdams & Marin, 1990) but are easier to detect for the former because of the reliability of within-channel cues for periodic sounds. Frequency modulation coherence is not, however, detectable across auditory channels (i.e., in distant frequency regions) above and beyond the mistuning from harmonicity that they create (Carlyon, 1991, 1992, 1994). Although frequency modulation increases vowel prominence when the ΔF0 is already large, there is no difference between coherent and incoherent modulation across the harmonics of several vowels either on vowel prominence (McAdams, 1989) or on vowel identification (Summerfield & Culling, 1992). However, frequency modulation can help group together frequency components for computing pitch. In a mistuned harmonic stimulus, shifts in the perceived pitch of the harmonic complex continue to occur at greater mistunings when all components are modulated coherently than when they are unmodulated (Darwin, Ciocca, & Sandell, 1994).

Spatial Position

It was thought early on that different spatial positions should give rise to binaural cues that could be used to segregate temporally and spectrally overlapping sound events. Although work on speech comprehension in noisy environments (e.g., Cherry's 1953 "cocktail party effect") emphasized spatial cues to allow listeners to ignore irrelevant sources, the evidence in support of such cues for grouping is in fact quite weak. An interaural time difference (ITD) is clearly a powerful cue for direction (see Chap. 9, this volume), but it is remarkably ineffective as a cue for grouping simultaneous components that compose a particular source (Culling & Summerfield, 1995; Hukin & Darwin, 1995b).

The other principal grouping cues generally override spatial cues. For example, the detection of changes in ITD on sine components across two successive stimulus intervals is similar when they are presented in isolation or embedded within an inharmonic complex. However, detection performance is much worse when they are embedded within a harmonic complex; thus harmonicity overrides spatial incoherence (Buell & Hafter, 1991). Furthermore, mistuning a component can affect its lateralization (Hill & Darwin, 1996), suggesting that grouping takes place on the basis of harmonicity, and only then is the spatial position computed on the basis of the lateralization cues for the set of components that have been grouped together (Darwin & Ciocca, 1992).

Lateralization effects may be more substantial when the spatial position is attended to over an extended time, as would be the case in paying sustained attention to a given sound source in a complex environment (Darwin & Carlyon, 1995). Listeners can attend across time to one of two spoken sentences distinguished by small differences in ITD, but they do not use such continuity of ITD to determine which individual frequency components should form part of a sentence. These results suggest that ITD is computed on the peripheral representation of the frequency components in parallel to a grouping of components on the basis of harmonicity and synchrony. Subsequently, direction is computed on the grouped components, and the listener attends to the direction of the grouped object (Darwin & Hukin, 1999).

General Considerations Concerning Concurrent Grouping

Note that there are several possible cues for grouping and segregation, which raises the possibility that what the various cues signal in terms of source structures in the environment can diverge. For example, many kinds of sound sources are not harmonic, but the acoustic components of the events produced by them would still start and stop at the same time and probably have a relatively fixed spatial position that could be attended to. In many cases, however, redundancy of segregation and integration cues works against ambiguities in inferences concerning grouping on the basis of sensory information. Furthermore, the cues to scene analysis are not all-or-none. The stronger they are, the more they affect grouping, and the final perceptual result is the best compromise on the basis of both the strength of the evidence available and the perceptual task in which the listener is engaged (Bregman, 1993). As many of the results cited earlier demonstrate, the grouping and segregation of information in the auditory sensory representation precedes and thus determines the perceptual properties of a complex sound source, such as its spatial position, its pitch, or its timbre. However, the perceived properties can in turn become cues that facilitate sustained attending to, or tracking of, sound sources over time.

New Event Detection versus Perception of a Changing Event

The auditory system appears to be equipped with a mechanism that triggers event-related computation when a sudden change in the acoustic array is detected. The computation performed can be a resampling of some property of the environment, such as the spatial position of the source, or a grouping process that results in the decomposition of an acoustic mixture (Bregman, 1991). This raises the questions of what constitutes a sudden change indicating the arrival of a new event and how it can be distinguished from a more gradual change that results from an evolution of an already present event.

An example of this process is binaural adaptation and the recovery from such adaptation when an acoustic discontinuity is detected. Hafter, Buell, and Richards (1988) presented a rapid (40/s) series of clicks binaurally with an interaural time difference that gave a specific lateralization of the click train toward the leading ear (Figure 10.3a). As one increases the number of clicks in the train, accuracy in discriminating the spatial position between two successive click trains increases, but the improvement is progressively less (according to a compressive power function) as the click train is extended in duration. The binaural system thus appears to become


Figure 10.3 Stimuli used to test the resetting of auditory sampling of the environment upon new event detection.
NOTE: a) Binaural adaptation release. A train of clicks (interclick separation = 2.5 ms) is sent to the two ears with a small interaural time difference (ITD) that displaces the perceived lateralization of the sound toward the leading ear (the right ear in this example). The just-noticeable ITD decreases as a function of the number of clicks in the train, but the relative contribution of later clicks is lesser than is that of the earlier clicks, indicating binaural adaptation. Release from adaptation is triggered by the detection of a new event, such as a discontinuity in the click train (e.g., a silent gap of 7.5 ms). b) Suddenness of change. The amplitude envelopes of harmonic components are shown. All harmonics are constant in level except one, which increases in level in the middle. A slow change in level (>100 ms) is heard as a change in the timbre of the event, whereas a sudden change (<100 ms) is heard as a new (pure-tone) event.
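The click-train stimulus of panel (a) can be generated directly from the parameters in the caption: clicks every 2.5 ms, a small interaural delay applied to the lagging ear, and an optional 7.5-ms silent gap whose discontinuity is said to release binaural adaptation. This is a hedged sketch; the function name, the sample rate, the 0.5-ms ITD, and the unit-impulse clicks are our own choices, not values from the original studies.

```python
def click_train(n_clicks=16, ici_ms=2.5, itd_ms=0.5, gap_after=None,
                gap_ms=7.5, sr=48000):
    """Return (left, right) sample lists for a binaural click train.
    Unit-impulse clicks occur every `ici_ms`; the right channel lags by
    `itd_ms`, lateralizing the train toward the left (leading) ear.
    If `gap_after` is set, a `gap_ms` silent gap is inserted after that
    many clicks. Illustrative sketch only."""
    ici = int(sr * ici_ms / 1000)        # interclick interval in samples
    itd = int(sr * itd_ms / 1000)        # interaural delay in samples
    gap = int(sr * gap_ms / 1000)        # discontinuity gap in samples
    length = n_clicks * ici + itd + gap
    left = [0.0] * length
    right = [0.0] * length
    t = 0
    for i in range(n_clicks):
        left[t] = 1.0
        right[t + itd] = 1.0             # lagging ear
        t += ici
        if gap_after is not None and i + 1 == gap_after:
            t += gap                     # silent discontinuity in the train
    return left, right

left, right = click_train(gap_after=8)   # 7.5-ms gap after the 8th click
```

Comparing listeners' lateralization accuracy for trains with and without such a gap is the kind of manipulation used to demonstrate release from binaural adaptation.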

progressively quiet beyond stimulus onset for constant stimulation. However, if some kind of discontinuity is introduced in the click train (a longer or shorter gap between clicks, or a brief sound with a sudden onset in a remote spectral region, even of fairly low intensity), the spatial environment is suddenly resampled at the moment of the discontinuity. A complete recovery from the process of binaural adaptation appears to occur in the face of such discontinuities and indicates that the auditory system is sensitive to perturbations of regularity. Hafter and Buell (1985) proposed that at a fairly low level in the auditory system, multiple bands are monitored for changes in level that might accompany the start of a new signal or a variation in the old one. Sudden changes cause the system to resample the binaural inputs and to update its spatial map at the time of the restart, suggesting that knowledge about the direction of a source may rely more on memory than on the continual processing of ongoing information.

Similarly, an acoustic discontinuity can provoke the emergence of a new pitch in an otherwise continuous complex tone. A sudden interaural phase disparity or frequency disparity in one component of a complex tone can create successive-difference cues that make the component emerge (Kubovy, 1981; Kubovy, Cutting, & McGuire, 1974). In this case, the successive disparity triggers a recomputation of which pitches are present. Thus, various sudden changes trigger resampling. But how fast a change is "sudden"? If listeners must identify the direction of change in pitch for successive pure-tone events added in phase to a continuous harmonic complex (Figure 10.3b), performance is a monotone decreasing function of rise time; that is, the more sudden the change, the more the change is perceived as a new event with its own pitch, and the better is the performance. From these results Bregman, Ahad, Kim, and Melnerich (1994) proposed that "sudden" can be defined as basically less than 100 ms for onsets.

Auditory Stream Formation (Sequential Grouping)

The processes of sequential organization result in the perceptual integration of successive events into a single auditory stream or their perceptual segregation into two or more streams. Under everyday listening conditions, an auditory stream corresponds to a sequence of events emitted by a single sound source.

General Considerations Concerning Sequential Grouping

Several basic principles of auditory stream formation emerge from research on sequential grouping. These principles reflect regularities in the physical world that shaped the evolution of the auditory mechanisms that detect them.

1. Source properties change slowly. Sound sources generally emit sequences of events that are transformed in a progressive manner over time. Sudden changes in event properties are likely to signal the presence of several sources (Bregman, 1993).

2. Events are allocated exclusively to streams. A given event is assigned to one or another stream and cannot be perceived as belonging to both simultaneously (Bregman & Campbell, 1971), although there appear to be exceptions to this principle in interactions between sequential and concurrent grouping processes and in duplex perception (discussed later).

3. Streaming is cumulative. The auditory system appears by default to assume that a sequence of events arises from a single source until enough evidence to the contrary can be accumulated, at which point segregation occurs (Bregman, 1978b). Also, if a cyclical sequence is presented over a long period of time (several tens of seconds), segregation tends to increase (Anstis & Saida, 1985).

4. Sequential grouping precedes stream attribute computation. The perceptual properties of sequences depend on what events are grouped into streams, as was shown for concurrent grouping and event attributes. A corollary of this point is the fact that the perception of the order of events depends on their being assigned to the same stream: It is easier to judge temporal order on within-stream patterns than on across-stream patterns that are perceptually fragmented (Bregman & Campbell, 1971; van Noorden, 1975).

The cues that determine sequential auditory organization are closely related to the Gestalt principles of proximity and similarity (see Chap. 5, this volume). The notion of proximity in audition is limited here to the temporal distance between events, and similarity encompasses the acoustic similarity of successive events. Given the intrinsically temporal nature of acoustic events, grouping is considered in terms of continuity and rate of change in acoustic properties between successive events. In considering the acoustic factors that affect grouping in the following, keep in mind that not all acoustic differences are equally important in determining segregation (Hartmann & Johnson, 1991).

Frequency Separation and Temporal Proximity

A stimulus sequence composed of two alternating frequencies in the temporal pattern
ABA-ABA- (where "-" indicates a silence) is heard as a galloping rhythm if the tones are integrated into a single stream and as two isochronous sequences (A-A-A-A- and B---B---) if they are segregated. At slower tempos and smaller frequency separations, integration tends to occur, whereas at faster tempos and larger frequency separations, segregation tends to occur. Van Noorden (1975) measured the frequency separation at which the percept changes from integration to segregation or vice versa for various event rates. If listeners are instructed to try to hear the gallop rhythm or conversely to focus on one of the isochronous sequences, temporal coherence and fission boundaries are obtained, respectively (see Figure 10.4). These functions do not have the same form. The fission boundary is limited by the frequency resolution of the peripheral auditory system and is relatively unaffected by the event rate. The temporal coherence boundary reflects the limits of inevitable segregation and strongly depends on tempo. Between the two is an ambiguous region where the listener's perceptual intent plays a strong role.

Streaming is not an all-or-none phenomenon with clear boundaries between integration and segregation along a given sensory continuum, however. In experiments in which the probability of a response related to the degree of segregation was measured (Brochard, Drake, Botte, & McAdams, 1999), the probability varied continuously as a function of frequency separation. This does not imply that the percept is ambiguous. It is either one stream or two streams, but the probability of hearing one or the other varies for a given listener and across listeners.

It is not the absolute frequency difference that determines which tones are bound together in the same stream, but rather the relative differences among the frequencies. Bregman (1978a), for example, used a sequential tone pattern ABXY. If A and B are within a critical band (i.e., they stimulate overlapping sets of auditory nerve fibers) in a high frequency region, and if X and Y are within a critical band in a low frequency region, then A and B form one stream, and X and Y form another stream (see Figure 10.5). If X and Y are now moved to the same frequency region as A and B such that A and X are close and B and Y are close, without changing the frequency ratios between A and B nor between X and Y, then the relative frequency differences predominate and streams of A-X and B-Y are obtained.

The abruptness of transition from one frequency to the next also has an effect on stream segregation. In the studies just cited, one tone stops on one frequency, and the next tone begins at a different frequency. In many sound sources that produce sequences of events and vary the fundamental frequency, such as the voice, such changes may be more gradual. Bregman and Dannenbring (1973) showed that the inclusion of frequency ramps (going toward the next tone at the end and coming from the previous tone at the beginning) or even complete frequency glides between tones yielded greater integration of the sequence into a single stream.

The Cumulative Bias toward Greater Segregation

Anstis and Saida (1985) showed that there is a tendency for reports of a segregated percept to increase over time when listening to alternating-tone sequences. This stream biasing decays exponentially when the stimulus sequence is stopped and has a time constant of around 4 s on average (Beauvois & Meddis, 1997). Anstis and Saida proposed a mechanism involving the fatigue of frequency jump detectors to explain this phenomenon, but Rogers and Bregman (1993a) showed that an inductor sequence with a single tone could induce a bias toward streaming in the absence of jumps.
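The alternating-tone sequences discussed above are straightforward to synthesize. The sketch below is illustrative only (the chapter does not specify tone durations, levels, or sample rates; those values are assumptions here): a small `semitone_sep` and long `ioi_ms` favor the integrated gallop percept, whereas a large separation and fast tempo favor segregation.

```python
import math

def tone(freq, dur_ms, fs=44100, amp=0.5):
    """A pure tone as a list of samples."""
    n = int(fs * dur_ms / 1000)
    return [amp * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]

def aba_sequence(f_a=1000.0, semitone_sep=6, ioi_ms=100, tone_ms=40,
                 n_cycles=10, fs=44100):
    """Repeating ABA- pattern (van Noorden-style): A tones at f_a, B tones
    semitone_sep semitones higher, and '-' a silent slot one IOI long."""
    f_b = f_a * 2 ** (semitone_sep / 12)              # frequency separation
    gap = [0.0] * int(fs * (ioi_ms - tone_ms) / 1000)
    silent_slot = [0.0] * int(fs * ioi_ms / 1000)
    cycle = (tone(f_a, tone_ms, fs) + gap +
             tone(f_b, tone_ms, fs) + gap +
             tone(f_a, tone_ms, fs) + gap + silent_slot)
    return cycle * n_cycles

sig = aba_sequence()
```

Sweeping `semitone_sep` upward or downward over repetitions would approximate the procedures sketched in Figure 10.4c and 10.4d for measuring the temporal coherence and fission boundaries.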
The biasing mechanism requires

[Figure 10.4, panels a-e: a) 1 stream: gallop rhythm; b) 2 streams: isochronous rhythms; c) temporal coherence boundary procedure; d) fission boundary procedure; e) frequency separation (semitones) as a function of interonset interval (ms), showing the 1-stream, ambiguous (1 or 2 streams), and 2-stream regions.]
Figure 10.4 Van Noorden's temporal coherence and fission boundaries. NOTE: A repeating "ABA-" pattern can give a percept of either a) a gallop rhythm or b) two isochronous sequences, depending on the presentation rate and AB frequency difference. c) To measure the temporal coherence boundary (TCB), the initial frequency difference is small and increases while the listener attempts to hold the gallop percept. d) To measure the fission boundary (FB), the initial difference is large and is decreased while the listener tries to focus on a single isochronous stream. In both cases, the frequency separation at which the percept changes is recorded. The whole procedure is repeated at different interonset intervals, giving the curves shown in (e). SOURCE: Adapted from van Noorden (1975, Figure 2.7). Copyright © 1975 by Leon van Noorden. Adapted with permission.

[Figure 10.5: two frequency-by-time configurations of pure tones A, B, X, and Y; AB isolated (left) and AB absorbed (right).]
Figure 10.5 Two stimulus configurations of the type used by Bregman (1978) to investigate the effect of relative frequency difference on sequential grouping. NOTE: Time is on the horizontal axis, and frequency is on the vertical axis. On the left, pure tones A and B are isolated from pure tones X and Y and are heard in a single stream together. Dashed lines indicate stream organization. On the right, A and B have the same frequency relation, but the insertion of X and Y between them causes their reorganization into separate streams because the A-X and B-Y differences are proportionally much smaller than are the A-B and X-Y differences.

the inductor stimulus to have the same spatial location and loudness as the test sequence (Rogers & Bregman, 1998). When discontinuities in these auditory attributes are present, the test sequence is more integrated, a sort of sequential counterpart to the resampling-on-demand mechanism discussed previously. Such a mechanism would have the advantage of preventing the auditory system from accumulating data across unrelated events.

Timbre-Related Differences

Sequences with alternating tones that have the same fundamental frequency (i.e., same virtual pitch) but that are composed of differently ranked harmonics derived from that fundamental (i.e., different timbres) tend to segregate (Figure 10.6a; van Noorden, 1975). Differences in spectral content can thus cause stream segregation (Hartmann & Johnson, 1991; Iverson, 1995; McAdams & Bregman, 1979). It is therefore not pitch per se that creates the streaming. Spectral differences can compete even with pitch-based patterns in determining the melodies that are heard (Wessel, 1979). Note that in music both pitch register and instrument changes produce discontinuities in spectral content.

Although other timbre-related differences can also induce segregation, not all perceptual attributes gathered under the term timbre (discussed later) do so. Listeners are unable to identify interleaved melodies any better when different amplitude envelopes are present on the tones of the two melodies than when they are absent, and differences in auditory roughness are only weakly useful for melody segregation, and only for some listeners (Hartmann & Johnson, 1991). However, dynamic (temporal) cues can contribute to stream segregation. Iverson (1995) played sequences that alternated between different musical instruments at the same pitch and asked listeners for ratings of the degree of segregation.

[Figure 10.6, panels a-c: a) spectral discontinuity (alternating harmonic ranks on a constant fundamental); b) discontinuity in period (high-pass filtered harmonic complexes with different fundamentals); c) discontinuity in level (ΔL = 0 dB: integration; ΔL = 5 dB: segregation; ΔL = 10 dB: roll effect; ΔL = 18 dB: continuity).]
Figure 10.6 Stimuli used to test sequential grouping cues. NOTE: a) Spectral discontinuity. An alternating sequence of tones with identical fundamental frequencies gives rise to a perception of constant pitch but differing timbres when the spectral content of the tones is different. This discontinuity in spectral content also creates a perceptual segregation into two streams. b) Discontinuity in period. A harmonic complex that is filtered in the high-frequency region gives rise to a uniform pattern of excitation on the basilar membrane, even if the period of the waveform (the fundamental frequency) is changed. The upper diagram has a lower F0 than the lower diagram. There can be no cue of spectral discontinuity in a sequence of tones that alternates between these two sounds, yet segregation occurs on the basis of the difference in period, presumably carried by the temporal pattern of neural discharges in the auditory nerve. c) Discontinuity in level. A sequence of pure tones of constant frequency but alternating in level gives rise to several percepts depending on the relative levels. A single stream is heard if the levels are close. Two streams at half the tempo are heard if the levels differ by about 5 dB. A roll effect in which a louder half-tempo stream is accompanied by a softer full-tempo stream is obtained at certain rapid tempi when the levels differ by about 10 dB. Finally, at higher tempi and large differences in level, a louder pulsing stream is accompanied by a softer continuous tone.

Multidimensional scaling analyses of these ratings revealed a perceptual dimension related to the temporal envelope of the sound in addition to a more spectrum-based dimension. The temporal coherence boundary for alternating-tone or gallop sequences is situated at a smaller frequency separation if differences in temporal envelope are also present, suggesting that dynamic cues can combine with spectrum- or frequency-based cues (Singh & Bregman, 1997). Listeners can also use voice-timbre continuity rather than F0 continuity to disambiguate intersections in voices that cross in F0 contour (Culling & Darwin, 1993). When there is no timbre difference between crossing glides, bouncing contours are perceived (a pitch-similarity effect), but when there is a difference, crossing contours are perceived (a timbre-similarity effect).

Differences in Period

Several studies have led to the conclusion that local differences in excitation patterns on the basilar membrane and fine-grained temporal information may also contribute interactively to stream segregation (Bregman, Liao, & Levitan, 1990; Singh, 1987), particularly when the harmonics of the stimuli are resolved on the basilar membrane. Vliegen and Oxenham (1999) used an interleaved melody recognition task in which the tones of a target melody were interleaved with those of a distractor sequence. If segregation does not occur at least partially, recognition of the target is nearly impossible. In one condition, they applied a band-pass filter that let unresolved harmonics through (Figure 10.6b). Because the harmonics would not be resolved in the peripheral auditory system, there would be no cue based on the tonotopic representation that could be used to segregate the tones. However, segregation did occur, most likely on the basis of cues related to the periods of the waveforms carried in the temporal pattern of neural discharges. In a study in which integration was required to succeed, a time shift was applied to the B tones in a gallop sequence (ABA-ABA-), and listeners were required to detect the time shift. This task is more easily performed when the A and B tones are integrated and the gallop rhythm is heard than when they are segregated and two isochronous sequences are heard (Figure 10.4; Vliegen, Moore, & Oxenham, 1999). It seems that whereas period-based cues in the absence of distinctive spectral cues can be used for segregation, as shown in the first study, they do not induce obligatory segregation when the listener tries to achieve integration to perform the task. Period-based cues are thus much weaker than spectral cues (Grimault, Micheyl, Carlyon, Arthaud, & Collet, 2000), suggesting that temporal information may be more useful in tasks in which selective attention can be used in addition to primitive scene-analysis processes.

Differences in Level

Level differences can also create segregation, although this effect is quite weak. Van Noorden (1977) found that alternating pure-tone sequences with constant frequency but level differences on the order of 5 dB segregated into loud and soft streams with identical tempi (Figure 10.6c). Hartmann and Johnson (1991) also found a weak effect of level differences on interleaved melody recognition performance. When van Noorden increased the level difference and the sequence rate was relatively fast (greater than 13 tones/s), other perceptual effects began to emerge. For differences of around 10 dB, a percept of a louder stream at one tempo accompanied by a softer stream at twice that tempo was obtained. For even greater differences (>18 dB), a louder intermittent stream was accompanied by a continuous softer stream.
In both cases, the more intense event would seem to be interpreted as being composed of two events of identical spectral content. These percepts are examples of what Bregman (1990, chap. 3) has termed the old-plus-new heuristic (discussed later).

Differences in Spatial Location

Dichotically presented alternating-tone sequences do not tend to integrate into a trill percept even for very small frequency separations (van Noorden, 1975). Similarly, listeners can easily identify interleaved melodies presented to separate ears (Hartmann & Johnson, 1991). Ear of presentation is not, however, a sufficient cue for segregation. Deutsch (1975) presented simultaneously ascending and descending musical scales such that the notes alternated between ears; that is, the frequencies sent to a given ear hopped around (Figure 10.7). Listeners reported hearing an up-down pattern in one ear and a down-up pattern in the other, demonstrating an organization based on frequency proximity despite the alternating ear of presentation. An interaural time difference is slightly less effective in creating segregation than is dichotic presentation (Hartmann & Johnson, 1991).

[Figure 10.7, panels a-c]

Figure 10.7 Melodic patterns of the kind used by Deutsch (1975). NOTE: a) Crossing scales are played simultaneously over headphones. In each scale, the tones alternate between left (L) and right (R) earpieces. b) The patterns that would be heard if the listener focused on a given ear. c) The patterns reported by listeners.
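The note-to-ear assignment behind Deutsch's scale illusion can be sketched as a simple rule. The scale choice (C major, C4 to C5) and the exact alternation scheme are illustrative assumptions; the point is only that each ear physically receives a sequence that hops around in frequency even though the two scales themselves move stepwise.

```python
# C-major scale, C4 to C5, equal-tempered frequencies in Hz
SCALE = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]

def scale_illusion_channels():
    """Assign each simultaneous note pair of an ascending and a descending
    scale to alternating ears: on even beats the ascending note goes to
    the right ear, on odd beats to the left (Deutsch-style alternation)."""
    ascending, descending = SCALE, SCALE[::-1]
    left, right = [], []
    for i, (up, down) in enumerate(zip(ascending, descending)):
        if i % 2 == 0:
            right.append(up)
            left.append(down)
        else:
            left.append(up)
            right.append(down)
    return left, right

left_ear, right_ear = scale_illusion_channels()
```

Each physical channel jumps between registers, yet listeners report the smooth up-down and down-up contours, that is, grouping by frequency proximity rather than by ear of presentation.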

Interactions between Concurrent and Sequential Grouping Processes

Concurrent and sequential organization processes are not independent. They can interact and even enter into competition, the final perceptual result depending on the relative organizational strength of each one. In the physical environment, there is a fairly good consensus among the different concurrent and sequential grouping cues. However, under laboratory conditions or as a result of compositional artifice in music, they can be made to conflict with one another: Bregman and Pinker (1978) developed a basic stimulus (Figure 10.8) for testing the situation in which a concurrent organization (fusion or segregation of B and C) and a sequential organization (integration or segregation of A and B) were in competition for the same component (B). When the sequential organization is reinforced by the frequency proximity of A and B and the concurrent organization is weakened by the asynchrony of B and C, A and B form a single stream, and C is perceived with a pure timbre. If the concurrent organization is reinforced by synchrony while the sequential organization is weakened by separating A and B in frequency, A forms a stream by itself, and B fuses with C to form a second stream with a richer timbre.

The Transparency of Auditory Events

In line with the belongingness principle of the Gestalt psychologists, Bregman (1990,

[Figure 10.8: left, A and B close in frequency, B and C asynchronous; right, A and B distant in frequency, B and C synchronous. Percepts: left, A and B form one stream and the timbre of C is pure; right, A and B are segregated and the timbre of C is richer.]
Figure 10.8 Schematic representation of some of the stimulus configurations used by Bregman and Pinker (1978) to study the competition between concurrent and sequential grouping processes. NOTE: Pure tone A alternates with a complex tone composed of pure tones B and C. The relative frequency proximity of A and B and the asynchrony of B and C are varied. When A and B are close in frequency and B and C are sufficiently asynchronous (left diagram), an AB stream is formed, and C is perceived as having a pure timbre. When A and B are distant in frequency and B and C are synchronous (right diagram), A forms a stream by itself, and B and C fuse into a single event with a richer timbre.

chap. 7) has proposed the principle of exclusive allocation: A given bit of sensory information cannot belong to two separate perceptual entities simultaneously. In general, this principle seems to hold: Parts of a spectrum that do not start at the same time are exhaustively segregated into temporally overlapping events, and tones presented sequentially are exhaustively segregated into streams. There are, however, several examples of both speech and nonspeech sounds that appear to violate this principle.

Duplex Perception of Speech

If the formant transition specifying a stop consonant such as /b/ (see Chap. 12, this volume) is excised from a consonant-vowel syllable and is presented by itself, a brief chirp sound is heard. The remaining base part of the original sound without the formant transition gives a /da/ sound. If the base and transition are remixed in the same ear, a /ba/ sound results. However, when the formant transition and base sounds are presented to opposite ears, listeners hear both a /ba/ sound in the ear with the base (integration of information from the two ears to form the syllable) and a simultaneous chirp in the other ear (Cutting, 1976; Rand, 1974). The formant transition thus contributes both to the chirp and to the /ba/, hence the term duplex. It is not likely that this phenomenon can be explained by presuming that speech processing is unconstrained by primitive scene analysis mechanisms (Darwin, 1991).

To account for this apparent paradox, Bregman (1990, chap. 7) proposes a two-component theory that distinguishes sensory evidence from perceptual descriptions. One component involves primitive scene analysis processes that assign links of variable strength among parts of the sensory evidence. The link strength depends both on the sensory evidence (e.g., the amount of asynchrony or mistuning for concurrent grouping, or the degree of temporal proximity and spectral dissimilarity for sequential grouping) and on competition among the cues. The links are evidence for belongingness but do not necessarily create disjunct sets of sensory information; that is, they do not provide an all-or-none partitioning. A second component then builds descriptions from the sensory evidence that are exhaustive partitionings for a given perceptual situation. Learned schemas can intervene in this process, making certain descriptions more likely than others, perhaps as a function of their frequency of occurrence in the environment. It is at this latter level that evidence can be interpreted as belonging to more than one event in the global description. But why should one allow for this possibility in auditory processing? The reason is that acoustic events do not occlude other events in the way that most (but not all) objects occlude the light reflected from other objects that are farther from the viewer. The acoustic signal arriving at the ears is the weighted sum of the waveforms radiating from different vibrating objects, where the weighting is a function of distance and of various transformations of the original waveform due to the properties of the environment (reflections, absorption, etc.). It is thus possible that the frequency content of one event coincides partially with that of another event. To analyze the properties of the events correctly, the auditory system must be able to take into account this property of sound, which, by analogy with vision, Bregman has termed transparency.

This theory presumes (a) that primitive scene analysis is performed on the sensory input prior to the operation of more complex pattern-recognition processes, (b) that the complex processes that build perceptual descriptions are packaged in schemas embodying various regularities in the sensory evidence, (c) that higher-level schemas can build from regularities detected in the descriptions built by lower-level schemas, (d) that the descriptions are constrained by criteria of consistency and noncontradiction, and (e) that when schemas (including speech schemas) make use of the information that they need from a mixture, they do not remove it from the array of information that other description-building processes can use (which may give rise to duplex-type phenomena). Although many aspects of this theory have yet to be tested empirically, some evidence is consistent with it, such as the fact that duplex perception of speech can be influenced by primitive scene-analysis processes. For example, sequential organization of the chirp component can remove it from concurrent grouping with the base stimulus, suggesting that duplex perception occurs in the presence of conflicting cues for the segregation and the integration of the isolated transition with the base (Ciocca & Bregman, 1989).

Auditory Continuity

A related problem concerns the partitioning, on the basis of the surrounding context, of sensory information present within overlapping sets of auditory channels. The auditory continuity phenomenon, also called auditory induction, is involved in perceptual restoration of missing or masked sounds in speech and music interrupted by a brief louder sound or by an intermittent sequence of brief, loud sound bursts (for reviews, see Bregman, 1990, chap. 3; Warren, 1999, chap. 6). If the waveform of a speech stream is edited such that chunks of it are removed and other chunks are left, the speech is extremely difficult to understand (Figure 10.9a). If the silent periods are replaced with noise that is loud enough to have masked the missing speech, were it present, and whose spectrum includes that of the original speech, listeners claim to hear continuous speech (Warren, Obusek, &

[Figure 10.9, panels a-b: a) perceptual restoration of interrupted speech (interrupted speech vs. continuous speech plus pulsing noise); b) auditory continuity of a pulsed sine tone (continuous tone plus pulsing noise vs. pulsing tone plus pulsing noise).]
Figure 10.9 Stimuli used to test the auditory continuity phenomenon. NOTE: a) Speech that is interrupted by silences is heard as such and is difficult to understand. If the silences are filled with a noise of bandwidth and level sufficient to have masked the absent speech signal, a pulsing noise is heard accompanied by an apparently continuous speech stream. b) Auditory continuity can also be demonstrated with a pulsed sine tone. When the silent gaps are filled with noise, a continuous tone is heard along with the pulsing noise. However, if small silent gaps of several milliseconds separate the tone and noise bursts, indicating to the auditory system that the tone actually ceased, then no continuity is obtained. Furthermore, the continuity effect does not occur if the noise does not have any energy in the frequency region of the tone.

Ackroff, 1972). Speech intelligibility can even improve if contextual information that facilitates identification of key words is present (Warren, Hainsworth, Brubaker, Bashford, & Healy, 1997).

Similar effects of continuity can be demonstrated with nonspeech stimuli, such as a sine tone interrupted by noise (Figure 10.9b) or by a higher-level sine tone of similar frequency. An intermittent sequence superimposed on a continuous sound is heard, as if the more intense event were being partitioned into two entities, one that was the continuation of the lower-level sound preceding and following the higher-level event and another that was a sound burst. This effect works with pure tones, which indicates that it can be a completely within-channel operation. However, if any evidence exists that the lower-level sound stopped (such as short, silent gaps between the sounds), two series of intermittent sounds are heard. Furthermore, the spectrum of the interrupting sound must cover that of the interrupted sound for the phenomenon to occur; that is, the auditory system must have evidence that the interrupting sound could have masked the softer sound (Figure 10.9).

The partitioning mechanism has been conceived by Bregman (1990) in terms of an "old-plus-new" heuristic. The auditory system performs a subtraction operation on the high-level sound. A portion of the energy equivalent to that in the lower-level sound is assigned to the continuous stream, and the rest is left to form the intermittent stream. Indeed, the perceived levels of the continuous sound and intermittent sequence depend on the relative level change and are consistent with a mechanism that partitions the energy (Warren, Bashford, Healy, & Brubaker, 1994). However, the perceived levels are not consistent with a subtraction performed either in units of loudness (sones) or in terms of physical pressure or power (McAdams, Botte, & Drake, 1998). Furthermore, changes occur in the timbre of the high-level sounds in the presence of the low-level sounds compared to when these are absent (Warren et al., 1994). The relative durations of high- and low-level sounds are crucial to the phenomenon. The continuity effect is much stronger when the interrupting event is short compared to the uninterrupted portion. The perceived loudness is also a function of the relative levels of high and low portions, their relative durations, and the perceptual stream to which attention is being directed (Drake & McAdams, 1999). Once again, this continuity phenomenon demonstrates the existence of a heuristic for partitioning acoustic mixtures (if there is sufficient sensory evidence that a mixture indeed exists). It provides the listener with the ability to deal efficiently and veridically with the stimulus complexity resulting from the transparency of auditory events.

Schema-Based Organization

Much mention has been made of the possibility that auditory stream formation is affected by conscious, controlled processes, such as searching for a given source or event in the auditory scene. Bregman (1990) proposed a component that he termed schema-based scene analysis in which specific information is selected on the basis of attentional focus and previously acquired knowledge, resulting in the popout of previously activated events or the extraction of sought-after events. Along these lines, van Noorden's (1975) ambiguous region is an example in which what is heard depends in part on what one tries to hear. Further, in his interleaved melody recognition experiments, Dowling (1973a) observed that a verbal priming of an interleaved melody increased identification performance.

Other top-down effects in scene analysis include the role of pattern context (good continuation in Gestalt terms) and the use of previous knowledge to select target information from the scene. For example, a competition between good continuation and frequency proximity demonstrates that melodic pattern can affect the degree of streaming (Heise & Miller, 1951). Frequency proximity alone cannot explain these results.

Bey (1999; Bey & McAdams, in press) used an interleaved melody recognition paradigm to study the role of schema-based organization. In one interval an interleaved mixture of target melody and distractor sequence was presented, and in another interval an isolated comparison melody was presented. Previous presentation of the isolated melody gave consistently better performance than when the mixture sequence was presented before the comparison melody. Furthermore, if the comparison melody was transposed by 12, 13, or 14 semitones (requiring the listener to use a pitch-interval-based representation instead of an absolute-pitch representation to perform the task), performance was similar to when the isolated comparison melody was presented after the mixture. These results suggest that in this task an absolute-pitch representation constitutes the "knowledge" used to extract the melody. However, performance varied as a function of the frequency separation of the target melody and distractor sequence, so performance depended on both sensory-based organizational constraints and schema-based information selection.

TIMBRE PERCEPTION

Early work on timbre perception paved the way to the exploration of sound source perception. The word timbre gathers together a number of auditory attributes that until recently have been defined only by what they are not: Timbre is what distinguishes two sounds that are otherwise equal in pitch, loudness, and subjective duration. Thus, an oboe and a trumpet playing the same note, for example, would be distinguished by their timbres. This definition indeed leaves everything to be defined. The perceptual qualities grouped under this term are multiple and depend on several acoustic properties (for reviews, see Hajda, Kendall, Carterette, & Harshberger, 1997; McAdams, 1993; Risset & Wessel, 1999). In this section, we examine spectral profile analysis, the perception of auditory roughness, and the multidimensional approach to timbre perception.

Spectral Profile Analysis

The sounds that a listener encounters in the environment have quite diverse spectral properties. Those produced by resonating structures of vibrating objects have more energy near the natural frequencies of vibration of the object (string, plate, air cavity, etc.) than at more distant frequencies. In a frequency spectrum in which amplitude is plotted as a function of frequency, one would see peaks in some frequency regions and dips in others. The global form of this spectrum is called the spectral envelope. The extraction of the spectral envelope by the auditory system would thus be the basis for the evaluation of constant resonance structure despite varying fundamental frequency (Plomp & Steeneken, 1971; Slawson, 1968) and may possibly contribute to source recognition. This extraction is surely strongly involved in vowel perception, the quality of which is related to the position in the spectrum of resonance regions called formants (see Chap. 12, this volume). Spiegel and Green (1982) presented listeners with complex sounds in which the amplitudes were equal on all components except one. The level of this component was increased to create a bump in the spectral envelope.
They showed sounds coming from the same position in that a listener is able to discriminate these space and having the same pitch, loudness, spectral envelopes despite random variations Timbre Perception 419 in overall intensity, suggesting that it is truly range of modulations has been demonstrated the profiles that are being compared and not in primary auditory cortex in awake monkeys just an absolute change in intensity in a given (Fishman, Reser, Arezzo, & Steinschneider, auditory channel. This analysis is all the eas­ 2000). For two simultaneous complex har­ ier if the number of components in the spec­ monic sounds, deviations from simple ratios trum is large and the range of the spectrum between fundamental frequencies create sen­ is wide (Green, Mason, & Kidd, 1984). Fur­ sations of beats and roughness that corre­ ther, it is unaffected by the phase relations late very strongly with judgments of musical among the frequency components compos­ dissonance (Kameoka & Kuriyagawa, 1969; ing the sounds (Green & Mason, 1985). The Plomp & Levelt, 1965). Musical harmony mechanism that detects a change in spectral thus has a clear sensory basis (Helrnholtz, envelope most likely proceeds by estimating 1885; Plomp, 1976), although contextual fac­ the level in each auditory channel and then tors such as auditory grouping (Pressnitzer, by combining this information in an optimal McAdams, Winsberg, & Fineberg, 2000) and way across channels (for a review, see Green, acculturation (Carterette & Kendall, 1999) 1988). may intervene. One sensory cue contributing to rough­ ness perception is the depth of modulation Auditory Roughness in the signal envelope after auditory filtering Auditory roughness is the sensory compo­ (Aures, 1985; Daniel & Weber, 1997; nent of musical dissonance (see Chap. 
11, this Terhardt, 1974), which can vary greatly for volume), but is present also in many envi­ sounds having the same power spectrum but ronmental sounds (unhappy newborn babies differing phase relations among the compo­ are pretty good at progressively increasing nents. The importance of such phase relations the roughness component in their vocaliza­ has been demonstrated in signals in which tions, the more the desired attention is de­ only one component out of three was changed layed). In the laboratory roughness can be in phase in order to leave the waveform en­ produced with two pure tones separated in velope unmodified (Pressnitzer & McAdams, frequency by less than a critical band (the 1999b). Marked differences in roughness per­ range of frequencies that influences the out­ ception were found. The modulation envelope put of a single auditory nerve fiber tuned to shape, after auditory filtering, thus contributes a particular characteristic frequency). They also to roughness, but as a factor secondary interact within an auditory channel produc­ to modulation depth. In addition, the coher­ ing fluctuations in the amplitude envelope at ence of amplitude envelopes across auditory a rate equal to the difference between their filters affects roughness: In-phase envelopes frequencies. When the fluctuation rate is less create greater roughness than do out-of-phase than about 20 Hz to 30 Hz, auditory beats envelopes (Daniel & Weber, 1997; Pressnitzer are heard (cf. Plomp, 1976, chap. 3). As the & McAdams, 1999a). rate increases, the perception becomes one of auditory roughness, peaking at around 70 Hz The Multidimensional Approach (depending on the center frequency) and then to Timbre decreasing thereafter, becoming smooth again when the components are completely resolved One important goal in timbre research has into separate auditory channels (cf. Plomp, been to determine the relevant perceptual di­ 1976, chap. 4). 
The temporal coding of such a mensions of timbre (Plomp, 1970; Wessel, 420 Auditory Perception and Cognition

1979). A systematic approach was made possible by the development of multidimensional scaling (MDS) analyses, which are used as exploratory data analysis techniques (McAdams, Winsberg, Donnadieu, De Soete, & Krimphoff, 1995). These consist in presenting pairs from a set of sounds to listeners and asking them to rate the degree of similarity or dissimilarity between them. The (dis)similarity ratings are translated into distances by a computer algorithm that, according to a mathematical distance model, projects the set of sound objects into a multidimensional space. Similar objects are close to one another, and dissimilar ones are far apart in the space.

Figure 10.10 Three-dimensional timbre spaces found by (a) Grey (1977) and (b) Grey and Gordon (1978) from multidimensional scaling analyses of similarity judgments.
NOTE: Pairs of musical instrument sounds from the original Grey (1977) study were modified by exchanging their spectral envelopes in Grey and Gordon (1978) to test the hypothesis that dimension I was related to spectral envelope distribution. Note that the pairs switch orders along dimension I, with small modifications along other dimensions in some cases.
SOURCE: Adapted from Grey and Gordon (1978, Figures 2 and 3) with permission. Copyright © 1978 by the Acoustical Society of America.

A structure in three dimensions representing the sounds of 16 musical instruments of the string and wind families was interpreted qualitatively by Grey (1977). The first dimension (called brightness; see Figure 10.10a) appeared to be related to the spectral envelope. The sounds having the most energy in harmonics of high rank are at one end of the dimension (oboe, O1), and those having their energy essentially confined to low-ranking harmonics are at the other end (French horn, FH). The second dimension (called spectral flux) was related to the degree of synchrony in the attacks of the harmonics as well as their degree of incoherent fluctuation in amplitude over the duration of the sound event. The flute (FL) had a great deal of spectral flux in Grey's set, whereas the clarinet's (C1) spectrum varied little over time. The position on the third dimension (called attack quality) varied with the quantity of inharmonic energy present at the beginning of the sound (often called the attack transients). Bowed strings (S1) can have a biting attack due to these transients, which is less the case for the brass instruments (e.g., trumpet, TP). To test the psychological reality of the brightness dimension, Grey and Gordon (1978) resynthesized pairs of sounds, exchanging their spectral envelopes (the patterns of bumps and dips in the frequency spectrum). This modification created a change in position along the brightness dimension, with a few shifts along other axes that can be fairly well predicted

by the side-effects of the spectral envelope change on other acoustic parameters (Figure 10.10b).

This seminal work has been extended to (a) synthesized sounds representing orchestral instruments or hybrids among them, including more percussive sounds produced by plucking or striking strings or bars (Krumhansl, 1989; McAdams et al., 1995); (b) recorded sounds, including percussion instruments such as drums and gongs in addition to the standard sustained-vibration instruments such as winds and bowed strings (Iverson & Krumhansl, 1993; Lakatos, 2000); and (c) car sounds (Susini, McAdams, & Winsberg, 1999). In some cases, new acoustical properties corresponding to the perceptual dimensions still need to be developed to explain these new timbre spaces. For example, attack quality is strongly correlated with the rise time in the amplitude envelope when impact-type sounds are included (Krimphoff, McAdams, & Winsberg, 1994). More advanced MDS procedures allow for individual items to be modeled not only in terms of dimensions shared with all the other items but also as possessing perceptual features that are unique and specific to them (Winsberg & Carroll, 1988). They also provide for the estimation of latent classes of listeners that accord differing weights to the perceptual dimensions (Winsberg & De Soete, 1993) and for the establishment of functional relations between the perceptual dimensions and the relevant acoustic parameters using MDS analyses constrained by stimulus properties (Winsberg & De Soete, 1997). All these advances in data analysis have been applied to musical timbre perception (Krumhansl, 1989; McAdams & Winsberg, 2000; McAdams et al., 1995).

Timbre Interval Perception

On the basis of such a multidimensional model of timbre perception, Ehresman and Wessel (1978) proposed a definition of a timbre interval, by analogy with a pitch interval in musical scales. According to their conception, a timbre interval is an oriented vector in the perceptual space. The equivalent of a transposition of a pitch interval (changing the absolute pitches while maintaining the pitch interval) would thus be a translation of the vector to a different part of the space, maintaining its length and orientation. Both musician and nonmusician listeners are sensitive to such abstract relations among timbres. However, when intervals are formed from complex musical instrument sounds, the specific features possessed by certain timbres often distort them (McAdams & Cunibile, 1992).

SOUND SOURCE PERCEPTION

Human listeners have a remarkable ability to understand quickly and efficiently the current state of the world around them based on the behavior of sound-producing objects, even when these sources are not within their field of vision (McAdams, 1993). We perceive the relative size and form of objects, properties of the materials that compose them, as well as the nature of the actions that set them into vibration. On the basis of such perception and recognition processes, listening can contribute significantly to the appropriate behaviors that we need to adopt with respect to the environment. To begin to understand these processes more fully, researchers have studied the perception of object properties such as their geometry and material composition, as well as the acoustic cues that allow recognition of sound sources. Although pitch can communicate information concerning certain source properties, timbre seems in many cases to be the primary vehicle for sound source and event recognition.

Perception of Source Shape

Listeners are able to distinguish sources on the basis of the geometry of air cavities and of solid objects. Changes in the positions of the two hands when clapping generate differences in the geometry of the air cavity between them that is excited by their impact and can be discriminated as such (Repp, 1987). Rectangular bars of a given material and constant length, but varying in width and thickness, have mechanical properties that give modes of vibration with frequencies that depend on these geometrical properties. When asked to match the order of two sounds produced by bars of different geometries to a visual representation of the cross-sectional geometry, listeners succeed as a function of the difference in ratio between width and thickness, for both metal and wood bars (Lakatos, McAdams, & Causse, 1997). There are two potential sources of acoustic information in these bars: the ratio of the frequencies of transverse bending modes related to width and thickness, and the frequencies of torsional modes that depend on the width-thickness ratio. Both kinds of information are more reliably present in isotropic metal bars than in anisotropic wood bars, and indeed listeners' judgments are more coherent and reliable in the former case.

The length of dowels and the dimensions of plates can also be judged in relative fashion by listeners. Carello, Anderson, and Kunkler-Peck (1998) demonstrated that listeners can reproduce proportionally (but not absolutely) the length of (unseen) dowels that are dropped on a floor. They relate this ability to the inertial properties of the dowel, which may give rise to both timbral and rhythmic information (the "color" of the sound and the rapidity of its clattering), although the link between perception and acoustic cues was not established. Kunkler-Peck and Turvey (2000) presented the sounds of (unseen) rectangular and square plates to listeners and asked them to reproduce the vertical and horizontal dimensions for objects formed of metal, wood, and Plexiglas. Again listeners reproduced the proportional, but not absolute, dimensions. They further showed that listeners could identify the shape of plates across materials when asked to choose from among rectangles, triangles, and circles.

Perception of Source Material

There are several mechanical properties of materials that give rise to acoustic cues: density, elasticity, damping properties, and so on. With the advent of physical modeling techniques in which the vibratory processes extended in space and time are simulated with computers (Chaigne & Doutaut, 1997; Lambourg, Chaigne, & Matignon, 2001), fine-grained control of complex vibratory systems becomes available for perceptual experimentation, and a true psychomechanics becomes possible. Roussarie, McAdams, and Chaigne (1998) used a model for bars with constant cross-sectional geometry but variable material density and internal damping factors. MDS analyses of dissimilarity ratings revealed that listeners are sensitive to both mechanical properties, which are carried by pitch information and by a combination of amplitude envelope decay and spectral centroid, respectively, in the acoustic signal. Roussarie (1999) has further shown that listeners are sensitive to elasticity, viscoelastic damping, and thermoelastic damping in thin plates that are struck at constant force. He used a series of simulated plates that were hybrids between an aluminum plate and a glass plate. MDS analyses of dissimilarity ratings revealed monotonic relations between the perceptual and mechanical parameters. However, when listeners were required to identify the plate as either aluminum or glass, they used only the acoustic information related to damping factors, indicating an ability to select the most appropriate from among several sources of acoustic information, according to the perceptual task that must be performed.

Recognition and Identification of Sources and Actions

Studies of the identification of musical instruments have revealed the properties of the acoustic structure of a complex sound event to which listeners are sensitive (for a review, see McAdams, 1993). For example, if the attack transients at the beginning of a sound are removed, a listener's ability to identify instruments that have characteristic transients is reduced (Saldanha & Corso, 1964). Other studies have demonstrated the importance of spectral envelope and temporal patterns of change for the identification of modified instrument tones (Strong & Clark, 1967a, 1967b).

A large class of sounds that has been relatively little studied until recently, ostensibly because of difficulties in analyzing and controlling them precisely under experimental conditions, consists of the complex acoustic events of our everyday environment. The breaking of glass, porcelain, or clay objects; the bouncing of wooden, plastic, or metal objects; the crushing of ice or snow underfoot; the scraping of objects against one another: all carry acoustic information about the nature of the objects involved, about the way they are interacting, and even about changes in their geometric structure (a broken plate becomes several smaller objects that vibrate independently).

Warren and Verbrugge (1984) asked listeners to classify a sound event as a breaking or bouncing glass object. The events were created by letting jars fall from different heights or from various acoustic manipulations of the recordings. Bouncing events were specified by simple sets of resonances with the same accelerating rhythmic pattern as the object came to rest. Breaking events were specified by a broad-spectrum burst followed by a multiplicity of different resonating objects with uncorrelated rhythmic patterns. Thus, both the resonance cues (derived from a large unitary object or multiple smaller objects) and the rhythmic cues (unitary or multiple accelerating patterns) were possible sources of identification. To test these cues, the authors used synthetically reconstructed events in which the rhythmic patterns and spectral content (the various resonances present) were controlled. Most of the variance in identification performance was accounted for by the rhythmic behavior.

Cabe and Pittenger (2000) studied listeners' sensitivity to the acoustic information in a given action situation: accurately filling a vessel. When water is poured into a cylindrical vessel and the rate of (silent) outflow and turbulent (splashy) inflow are controlled, listeners can identify whether the vessel is filling, draining, or remaining at a constant level. Their perception ostensibly depends on the fundamental resonance frequency of the unfilled portion of the vessel. Participants were asked to fill a vessel themselves while seeing, holding, and hearing the water pouring, or while only hearing the water pouring. They were fairly accurate at judging the moment at which the brim level was attained, although they tended to underestimate the brim level under auditory-only conditions. Consideration of the time course of available acoustic information suggests that prospective behavior (anticipating when the full-to-the-brim level will be reached) can be based on acoustic information related to changes in fundamental resonance frequency.

Work in the realm of auditory kinetics concerns listeners' abilities to estimate the kinetic properties of mechanical events (mass, dropping height, energy). Guski (2000) studied the acoustic cues that specify such kinetic properties in events in which collisions between objects occur. He found that listeners' estimates of the striking force of a ball falling on a drumhead are linearly related to physical work (the energy exchange between the ball and the drumhead at impact). They were less successful in judging the mass of the ball and could not reliably judge the dropping height. The acoustic cues that seem to contribute to these judgments include the peak level (a loudness-based cue) and the rhythm of the bouncing pattern of the ball on the drumhead. The former cue strongly affects judgments of force, which become unreliable if sound events are equalized in peak level. The rhythm cue depends on the energy of the ball and is thus affected by the height and the mass (hence, perhaps, the lower reliability of direct judgments of these parameters). The time interval between the first two bounces explains much of the variance in judgments of force and is strongly correlated with the amount of physical work. The timbre-related spectral and temporal characteristics of individual bounces were not examined.

This nascent area of research provides evidence for human listeners' remarkable sensitivity to the forms and material compositions of the objects of their environment, purely on the basis of the sounds they produce when set into vibration. The few results already available pave the way for theoretical developments concerning the nature of auditory source and event recognition, and may indeed reveal aspects of this process that are specific to the auditory modality.

TEMPORAL PATTERN PROCESSING IN HEARING

As described in the auditory scene analysis section, the acoustic mixture is perceptually organized into sound objects and streams. In this way, sound events that originate from a single source are perceptually grouped together and are not confused with events coming from other sources. Once auditory events and streams have been perceptually created, it is necessary to establish the relationship between successive events within a stream. We enter the realm of sequence perception, in which each event takes its existence from its relation with the surrounding events rather than from its own specific characteristics. An essential function of the perceptual system is thus to situate each event in time in relation to other events occurring within a particular time span. How do listeners follow the temporal unfolding of events in time? How do they select important information? How are they able to predict "what" and "when" something will occur in the future? The way in which the auditory system identifies the characteristics of each event (the "what") has been described in the section on sound source perception. We now consider the type of temporal information coding involved (the "when").

The Limits of Sequence Perception

Temporal processes occur over a wide range of rates (see Figure 10.11). In this section we focus on those that are relevant to sequence perception and temporal organization. Sound events are perceived as isolated if they are separated from surrounding events by about 1.5 s. This constitutes the upper limit of sequence perception and probably corresponds to the limits of echoic (sensory) memory (Fraisse, 1963). At the other extreme, if the onsets of successive tones are very close in time (less than about 100 ms), a single event is perceived, and the lower limit of sequence perception is surpassed. The zone of sequence perception falls between these two extremes (100-1500 ms interonset interval, or IOI). This range can be subdivided into three separate zones depending on sequence rate or tempo (ten Hoopen et al., 1994). It has been suggested that different

[Figure 10.11 appears here: a schematic of the temporal continuum along a frequency axis running from 20 kHz down to 0.5 Hz, with labeled zones for localization, roughness and beats (20-600 µs IOI), pitch perception (0.25-60 ms IOI), sequence perception (100-1500 ms IOI), and temporal organization (>2000 ms IOI).]

Figure 10.11 The temporal continuum.
NOTE: Event rate greatly influences the perceived nature of event sequences. At extremely fast rates, event attributes such as spatial position, pitch, and timbre are perceived, and at intermediate rates sequence attributes are perceived. Higher-level cognitive processes allow listeners to organize event structures over longer time spans. Event rate can be described in terms of the number of events per second (Hz) or the duration between the onsets of successive events (interonset interval, or IOI).
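The rate regimes that Figure 10.11 lays out can be summarized in a small lookup function. This is only an illustrative sketch: the boundary values are approximate ones taken from the figure and the surrounding text, the zones are simplified into non-overlapping bands (in the figure they overlap), and the function name is our own.

```python
# Illustrative sketch of the temporal continuum of Figure 10.11.
# Boundaries are approximate values from the figure and text; real
# perceptual zones overlap, so this non-overlapping mapping is a
# simplification, not a perceptual model.

def perceptual_regime(ioi_ms: float) -> str:
    """Classify an interonset interval (in ms) into an approximate regime."""
    if ioi_ms < 0.6:        # ~20-600 microseconds: binaural localization cues
        return "localization"
    elif ioi_ms < 60:       # ~0.25-60 ms: periodicity heard as pitch,
        return "pitch / roughness and beats"   # with roughness and beats
    elif ioi_ms < 100:      # below ~100 ms: successive onsets fuse
        return "single event (fusion)"
    elif ioi_ms <= 1500:    # 100-1500 ms: the zone of sequence perception
        return "sequence perception"
    else:                   # beyond ~1.5-2 s: events are isolated and must be
        return "temporal organization"  # related by higher-level processes

for ioi in (0.3, 5, 80, 500, 3000):
    print(ioi, "ms ->", perceptual_regime(ioi))
```

For example, a 500-ms IOI (two events per second) falls squarely in the sequence-perception zone, whereas a 3-s IOI exceeds the roughly 1.5-s upper limit described in the text.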

processes are involved at different sequence rates: an intermediate zone (300-800 ms IOI) for which sequence processing is optimal and within which Weber's law applies; a fast zone (100-300 ms IOI) with a fixed time constant (Weber's law does not apply); and a slow zone (800-1500 ms IOI), also with a fixed time constant.

The Problem of Temporal Coding

Psychologists investigating temporal processing have to confront a tricky problem: the absence of an observable sensory organ for coding time (unlike the ear for hearing and the eye for vision). A detailed discussion of the possible existence of an internal clock or a psychological time base is beyond the scope of this chapter (see Block, 1990; Church, 1984; Killeen & Weisse, 1987). For our present concerns, it suffices to say that there is no single, identifiable part of the brain devoted to time perception. Numerous models have been proposed to explain temporal behavior, either by the ticks of a clock (Creelman, 1962; Divenyi & Danner, 1977; Getty, 1975; Kristofferson, 1980; Luce, 1972; Sorkin, Boggs, & Brady, 1982; Treisman, 1963) or by concepts of information processing independent of a clock (Allan, 1992; Block, 1990; Michon, 1975; Ornstein, 1969). Not surprisingly, each model best explains the data for which it was developed. Thus, models without clocks best explain behavior that requires only temporal order processing, whereas models with clocks best explain behavior requiring relative or absolute duration coding.

For our present interests, it suffices to consider that our perceptual system provides an indication of the duration of events in relative terms (same, longer, or shorter) rather than absolute terms (the first tone lasts x beats, the second y beats; or the first tone occurred at 12:00:00, the second at 12:00:01). In the next sections we discover how information concerning these relative durations allows individuals to create elaborate and reliable temporal representations that are sufficient for

adapting their behaviors to the environment. Of course, this representation is a simplification of the physically measurable temporal structure of events, but it does allow the listener to overcome processing constraints.

Coding Temporal Characteristics of Tones and Sequences

Single Tones

In the case of a single tone, the perceptual system probably retains an indication of the duration of the sound (a in Figure 10.12). However, the onset is usually more precisely coded than is the offset, due to the usually more abrupt nature of tone onsets (Schulze, 1989; P. G. Vos, Mates, & van Kruysbergen, 1995). It is less likely that there remains an absolute coding of the precise moment in time at which the tone occurred (b in Figure 10.12). Rather, there is probably an indication of the time of occurrence of the tone relative to other events. Listeners are able to indicate quite precisely whether two tones are perceived as occurring simultaneously if their onsets occur within a relatively small time span (40 ms; Rasch, 1979). However, much larger differences in tone onsets are required to give a judgment about order, for instance, by indicating which tone occurs first (about 100 ms; Fitzgibbons & Gordon-Salant, 1998).

Tone Sequences

If a tone is perceived as belonging to a sequence rather than being an isolated event, several types of temporal information must be coded. First, the temporal distance between successive events must be coded. A possible solution would be the coding of the duration of the interval between the offset of one tone and the onset of the following tone (c in Figure 10.12). However, numerous studies (Schulze, 1989; P. G. Vos et al., 1995) indicate that this parameter is not the most perceptually salient. Rather, the most dominant information concerns the duration between successive onsets (IOIs; d in Figure 10.12). For instance, the capacity to detect slight changes in tempo

[Figure 10.12 appears here: a schematic of a single tone and of a tone sequence, with labels a = tone duration; b = moment of tone onset; c = duration of silence between tones; d = duration between tone onsets (IOI); the successive intervals of the sequence are marked IOI1 through IOI5.]

Figure 10.12 The physical parameters of isolated tones and tones within a sequence are not all of equal perceptual importance. NOTE: The moment of tone onset (b) and the duration between successive tone onsets (d) appear to be the most perceptually salient.

between two sequences is unaffected by either tone duration (a) or off duration (c); rather, it is determined by the duration between the onsets of successive tones (d; J. Vos & Rasch, 1981; P. G. Vos et al., 1995; P. G. Vos, 1977). More precisely, it is not the physical onset of the tone but rather the perceptual center (P-center) of the tone that determines the IOI, which in turn is influenced by tone duration, off duration, and the shape of the tone attack (P. G. Vos et al., 1995).

In the case of sequences of several events, it could be supposed that the temporal coding may involve the coding of the duration between the onset of each successive tone in the sequence. For instance, in the case of the sequence in Figure 10.12, this would involve the coding of IOI1, IOI2, IOI3, IOI4, and IOI5. This sort of coding is probably possible if the sequence does not contain more than five or six events, does not last longer than several seconds, and does not display a clearly defined temporal structure (i.e., irregular sequences; Povel, 1981). However, this type of processing appears to be the exception rather than the rule.

Basic Temporal Organization Principles in Sequence Processing

Multiple-Look Model

Is the processing of a sequence of events the same as the sum of the processing of each individual event? The multiple-look model (Drake & Botte, 1993; Schulze, 1978; P. G. Vos, van Assen, & Franek, 1997) suggests that this is not the case. Listeners were presented with two isochronous sequences varying slightly in tempo (mean IOI). The sequences contained either a single interval (two tones) or a succession of intervals (2, 4, 6). Listeners were required to indicate which was the faster of the two sequences (tempo discrimination). The just noticeable difference (JND) in relative tempo decreased as the number of intervals in the sequence increased. These findings suggest that the more observations the system has concerning the duration of intervals within the sequence (remember that they are all identical), the more precise is the memory code for that interval, and thus the more precise is the temporal coding. Therefore, the temporal coding of intervals contained within a sequence is more precise than is that of isolated intervals.

A second parameter enters into the equation: the degree of regularity of the sequence. If the sequence is regular (isochronous, with all IOIs equal), the multiple-look process can work perfectly (no variability). However, if a high degree of irregularity is introduced into the sequence by varying the IOIs (increased standard deviation between IOIs), relative tempo JNDs are considerably higher, indicating less efficient processing. It is therefore suggested that the multiple-look process incorporates an indication of both the mean and the variability of intervals within a sequence: The multiple-look process works more efficiently as the number of intervals increases and as the degree of variability decreases.

However, most environmental sequences (footsteps, music, speech) are not entirely regular but contain a low level of temporal irregularity. Does this interfere with a precise temporal coding? It appears not. Relative temporal JNDs are as low for quasi-regular sequences (with a low standard deviation) as they are for truly regular sequences (standard deviation = 0). These findings suggest the existence of a tolerance window by which the system treats sequences that vary within this window as if they were purely regular sequences.

Temporal Window

A third parameter influencing the multiple-look process concerns sequence rate or tempo: The faster the sequence, the greater the number of events over which an increase in sensitivity is observed. For relatively slow sequences (800–1500 ms IOI), JNDs decrease up to 2 or 3 intervals and then remain stable, whereas for fast sequences (200–400 ms IOI) adding additional intervals up to about 20 results in decreased JNDs. This finding suggests the existence of another factor involved in sequence processing: a temporal window.

The idea is that all events would be stored in a sensory memory buffer lasting several seconds (3 s according to Fraisse, 1982; 5 s according to Glucksberg & Cowen, 1970). It probably corresponds to echoic or working memory (see Crowder, 1993). During this time the exact physical characteristics remain in a raw state without undergoing any conscious cognitive processing (Michon, 1975, 1978). Higher-level processes have access to this information as long as it is available (until it decays), which allows the system to extract all relevant information (e.g., interval duration; Fraisse, 1956, 1963; Michon, 1975, 1978; Preusser, 1972). The existence of such an auditory buffer has been suggested by many psychologists under various names: psychological present (Fraisse, 1982), precategorical acoustic storage (Crowder & Morton, 1969), brief auditory store (Treisman & Rostron, 1972), nonverbal memory trace (Deutsch, 1975), and primary or immediate memory (Michon, 1975). One can imagine a temporal window gliding gradually through time, with new events arriving at one end and old events disappearing at the other due to decay. Thus only events occurring within a span of a few seconds would be accessible for processing at any one time, and only events occurring within this limited time window can be situated in relation to each other by the coding of relevant relational information.

Segmentation into Groups

One way to overcome processing limitations and to allow events to be processed together is to group the events into small perceptual units. These units result from a comparison process that compares incoming events with events that are already present in memory. If a new event is similar to those that are already present, it will be assimilated. If the new event differs too much (by its acoustical or temporal characteristics), the sequence will be segmented. This segmentation leads to the closure of one unit and the opening of the next. Elements grouped together will be processed together within a single perceptual unit and thus can be situated in relation to each other.

An essential question concerns the factors that determine when one perceptual unit ends and the next one begins. A first indication was provided by Gestalt psychologists, who described the physical characteristics of sounds that determine which events are perceived as belonging to a single unit (Koehler, 1929; Wertheimer, 1925). They considered that unit boundaries are determined by a relatively important perceptual change (pitch, loudness, articulation, timbre, duration) between successive events. For instance, two notes separated by a long temporal interval or by a large jump in pitch are perceived as belonging to separate groups. They described three principles: temporal proximity (events that occur relatively close in time), similarity (events that are relatively similar in timbre, pitch, loudness, and duration), and continuity (events oriented in the same direction, i.e., a progressive increase in pitch). Much research has confirmed the essential role of these principles in sequence segmentation (Bregman, 1990; Deutsch, 1999; Dowling & Harwood, 1986; Handel, 1989; Sloboda, 1985). They also appear to function with musical sequences (Clarke & Krumhansl, 1990; Deliège, 1987).
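The Gestalt boundary principles just described can be sketched as a simple procedure that opens a new perceptual unit whenever the change between successive events exceeds a tolerance. The sketch below is purely illustrative: the threshold values (an IOI 1.8 times longer than the preceding one, or a pitch leap of 5 semitones) are arbitrary assumptions for demonstration, not values drawn from the studies cited above.

```python
def segment(events, ioi_ratio=1.8, pitch_jump=5):
    """Split a sequence of (onset_ms, pitch_semitones) events into groups.

    A boundary is placed before an event whose inter-onset interval (IOI)
    is much longer than the previous one (temporal proximity) or whose
    pitch leaps by many semitones (similarity). Thresholds are arbitrary
    illustrations of the Gestalt principles, not empirical values.
    Assumes at least one event.
    """
    groups = [[events[0]]]
    prev_ioi = None
    for prev, cur in zip(events, events[1:]):
        ioi = cur[0] - prev[0]
        boundary = ((prev_ioi is not None and ioi > ioi_ratio * prev_ioi)
                    or abs(cur[1] - prev[1]) >= pitch_jump)
        if boundary:
            groups.append([cur])   # close the current unit, open a new one
        else:
            groups[-1].append(cur)  # assimilate the event into the open unit
        prev_ioi = ioi
    return groups
```

For example, a sequence containing a long silent gap midway is split into two groups at the gap, mirroring the temporal-proximity principle.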

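The statistical intuition behind the multiple-look model described earlier can also be illustrated numerically: if each interval of an isochronous sequence is registered with independent noise, pooling n "looks" reduces the standard error of the mean-IOI (tempo) estimate by roughly a factor of the square root of n, which is one way to understand why relative-tempo JNDs shrink as intervals are added. The simulation below is a sketch under arbitrary assumptions (e.g., Gaussian coding noise of 10 ms); it is not a model fitted to the data discussed above.

```python
import math
import random

def tempo_estimate_sd(true_ioi_ms, n_intervals, obs_noise_ms=10.0,
                      n_trials=4000, rng=None):
    """Monte Carlo standard deviation of the mean-IOI estimate when
    n_intervals noisy observations of an isochronous sequence are pooled
    ("multiple looks"). Noise level and trial count are arbitrary."""
    rng = rng or random.Random(1)  # fixed seed for reproducibility
    estimates = []
    for _ in range(n_trials):
        looks = [true_ioi_ms + rng.gauss(0.0, obs_noise_ms)
                 for _ in range(n_intervals)]
        estimates.append(sum(looks) / n_intervals)  # pooled tempo estimate
    mean = sum(estimates) / n_trials
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / n_trials)
```

With these illustrative settings, pooling four intervals roughly halves the variability of the tempo estimate relative to a single interval, in qualitative agreement with the decrease in JND as intervals accumulate.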
Temporal Regularity Extraction

Segmentation into basic perceptual units provides one means of overcoming memory limits and allows adjacent events to be processed together. However, this process poses the inconvenience of losing information concerning the continuity of the sequence over time between successive perceptual units. In order to maintain this continuity, a second basic process may occur in parallel: the extraction of temporal regularities in the sequence in the form of an underlying pulse. In this way the listener may extract certain temporal relationships between nonadjacent events. Thus, the process of segmentation breaks the sequence down into small processing units, whereas the process of regularity extraction pulls together temporally nonadjacent events.

Rather than coding the precise duration of each interval, our perceptual system compares each newly arriving interval with preceding ones. If the new interval is similar in duration to preceding intervals (within an acceptable temporal window, called the tolerance window), it will be categorized as "same"; if it is significantly longer or shorter than the preceding intervals (beyond the tolerance window), it will be categorized as "different." There may be an additional coding of "longer" or "shorter." Thus, two or three categories of durations (same/different, or same/longer/shorter) may be coded, but note that this is a relative, rather than an absolute, coding system.

One consequence of this type of processing is that if a sequence is irregular (each interval has a different duration) but all the interval durations remain within the tolerance window, then we will perceive this sequence as a succession of "same" intervals and therefore perceive a regular sequence. Such a tolerance in our perceptual system is quite understandable when we examine the temporal microstructure of performed music: Local lengthenings and shortenings of more than 10% are quite common and are not necessarily picked up by listeners as being irregularities per se (Drake, 1993a; Palmer, 1989; Repp, 1992).

Coding events in terms of temporal regularity is thus an economical processing principle that has other implications. If an incoming sequence can be coded in such a fashion, processing resources are reduced, thus making it easier to process such a sequence. Indeed, we can say that the perceptual system exploits this predisposition by actively seeking temporal regularities in all types of sequences. When listening to a piece of music, we are predisposed to finding a regular pulse, which is emphasized by our tapping our foot in time with the music (the tactus in musical terms). Once this underlying pulse has been identified, it is used as an organizational framework with respect to which other events are situated.

The Hypothesis of a Personal Internal Tempo

The fact that we appear to be predisposed to processing temporal regularities and that temporal processing is optimum at an intermediate tempo has led Jones and colleagues (Jones, 1976; Jones & Boltz, 1989) to suggest that we select upcoming information in a cyclic fashion. Each individual would have a personal tempo, a rate at which incoming events would be sampled. For a given individual, events that occur at his or her personal tempo would be processed preferentially to events occurring at different rates.

Temporal Organization over Longer Time Spans

The temporal processes described so far occur within a relatively short time window of several seconds. When the sequence becomes more complicated (longer or having more events), the number of events that must be processed quickly goes beyond the limits of the buffer. However, it is clear that we are able to organize events over longer time spans; otherwise the perception of music and speech would be impossible. Consequently, simple concatenation models (Estes, 1972), which propose that the characteristics of each event are maintained in memory, are not able to account for the perception of long sequences because of problems of processing and memory overload. Imagine the number of intervals that would need to be stored and accessed when listening to a Beethoven sonata! We now explore how several coding strategies have developed to overcome these limits and to allow the perception of longer and more complex sequences.

The organization of information into hierarchical structures has often been proposed as a means of overcoming processing and memory limits. This idea, originally formulated by Miller (1956), presupposes that the information processing system is limited by the quantity of information to be processed. By organizing the information into larger units, the limiting factor becomes the number of groups, not the number of events. Applying this principle to music perception, Deutsch and Feroe (1981) demonstrated that the units at each hierarchical level combine to create structural units at a higher hierarchical level. The presence of regularly occurring accents allows the listener to extract regularities at higher hierarchical levels, and thus attention can be guided in the future to points in time at which upcoming events are more likely to occur. This process facilitates learning and memorization (Deutsch, 1980; Dowling, 1973b; Essens & Povel, 1985; Halpern & Darwin, 1982). Sequences that are not structured in such a hierarchical fashion are harder to organize perceptually and require greater attentional resources. Events that occur at moments of heightened attention are encoded and subsequently remembered better than are events occurring at other moments in time (Dowling, 1973b). Learning and memorization deteriorate, and listeners have difficulty in analyzing both temporal and nontemporal aspects of the sequence (Boltz, 1995). This type of hierarchical organization appears to function with music, of course, but with other types of natural sequences as well, such as speech, walking, and environmental sounds (Boltz, 1998; Boltz, Schulkind, & Kantra, 1991; Jones, 1976).

Two Types of Temporal Hierarchical Organizations

Two types of temporal hierarchical organizations have been proposed (see Figure 10.13; Drake, 1998; Jones, 1987; Lerdahl & Jackendoff, 1983) to work in parallel, each based on a basic process. One hierarchical organization is based on the combination of perceptual units (hierarchical segmentation organization). A second is created from the combination of underlying pulses (hierarchical metric organization). In both cases basic units combine to create increasingly larger perceptual units, encompassing increasingly larger time spans. Thus, a particular musical sequence may evoke several hierarchical levels at once. In theory, listeners may focus on each of these levels, switching attention from one to the other as desired.

Hierarchical segmentation organization involves the integration of small, basic groups into increasingly larger units that, in the end, incorporate whole musical phrases and the entire piece (levels A2, A3, and A4 in Figure 10.13). Similar segmentation processes probably function at each hierarchical level, except, of course, that the segmentation cues are more salient (larger temporal separation or larger pitch jump) at higher hierarchical levels (Clarke & Krumhansl, 1990; Deliège, 1993; Penel & Drake, 1998).

Hierarchical metric organization involves the perception of temporal regularities at

[Figure 10.13 schematic: a notated musical rhythm with, above it, hierarchical metric organization at levels B1 through B6 (regularity extraction) and, below it, hierarchical segmentation organization at levels A1 through A4 (segmentation into groups).]

Figure 10.13 Grouping and metrical hierarchies in musical sequences. NOTE: When listening to a tone sequence (here a musical rhythm), two basic processes function the same way in everyone: The sequence is perceptually segmented into small perceptual units (segmentation into groups, A), and a regular underlying pulse that retains a continuity between events is established (regularity extraction, B). Two types of hierarchical organization, each based on a basic process (hierarchical segmentation organization and hierarchical metric organization, respectively), allow the listener to organize events perceptually over longer time spans. The functioning of these organizational processes differs considerably across individuals.

multiple hierarchical levels. Regularly occurring events can be situated within a hierarchical structure by which multiples of the reference period (usually two, three, or four) are incorporated into larger units (level B3 in Figure 10.13), and these units can themselves be incorporated into increasingly larger units (levels B4, B5, and B6). Subdivisions (level B1) of the reference period are also possible. In Western tonal music, this type of organization corresponds to the metric structure involving the integration of regularly occurring beats, measures, and hyper-measures. Thus, events separated by time spans larger than the reference period can be processed together. Jones (1987) emphasized how these regularities allow the creation of expectations. The perception of regularly occurring accents (events that stand out perceptually from the sequence background) facilitates the identification and implementation of higher hierarchical levels.

Many models of meter have been proposed (Desain, 1992; Lee, 1991; Longuet-Higgins & Lee, 1982; Longuet-Higgins & Lee, 1984; for a review of these models, see Essens, 1995), and these are perhaps the most popular aspect of temporal processing. Most of the models try to demonstrate that a computer is able to identify correctly the metric structure of a piece of music, thus demonstrating which factors may intervene (e.g., the position of long or accented events). These models are certainly interesting for both computer science and musicology, but their usefulness for psychologists is currently limited because of the rarity of appropriate comparison with human behavior. There are, however, two notable exceptions. In the first, Povel and Essens (1985) considered that an internal clock provides a regular beat around which rhythms are perceptually organized. They demonstrated that sequences that strongly induce a clock (sounds occurring at the same time as beats) are easier to reproduce than are sequences that do not induce a clock as strongly (sounds occurring between beats). In the second, Large and Jones (1999) proposed a neural net model designed to follow, in real time, the coding of an auditory sequence and to allow predictions about upcoming events. By the use of coupled oscillators and with the parameter values established by experimental data and theory, the model predictions are successfully confirmed by new experimental data.

ATTENTIONAL PROCESSES IN HEARING

Perceptual Organization Is an Active Process

We have described how listeners perceptually organize auditory scenes into objects, streams, and sources and how they follow changes over time. Listeners are active and, through attentional processes, are able to influence the resultant perceptual organization to a certain extent. What a listener tries to hear can affect the way a sound sequence is organized perceptually, indicating an interaction between top-down selection and perceptual integration or segregation (van Noorden, 1977). Indeed, the listener is "free" to focus attention on any individual auditory object or stream within an auditory scene, as was indicated in the section on auditory scene analysis (Alain, 2000; Giard, Fort, Mouchetant-Rostaing, & Pernier, 2000). Luck and Vecera (Chap. 6, this volume) reviewed recent literature in visual attention. The reader is encouraged to compare the two sensory modalities in order to note similarities and differences between them.

Attentional Focus

In audition, attention is often considered as a spotlight aimed at a single auditory event (Näätänen, 1992; Scharf, 1998). Each perceived object within an auditory scene is potentially available for attentional focus. Listeners are unable to focus on more than one object at a time (Bigand, McAdams, & Foret, 2000; Bregman, 1990). The process of focally attending to one particular object is known as selective or focal attending. Also, situations have been described in which listeners divide their attention between one or more objects, switching back and forth between them in a process known as divided attention. Selective or divided attention requires effort and becomes more difficult in more complex situations. For instance, focusing attention on a stream within a two-stream context is easier than focusing on the same stream within a three- or four-stream context (Brochard et al., 1999).

Attentional focus or selective attention results in enhanced processing of the selected object and limited processing of the remaining nonselected auditory scene. When attention is engaged, the representation of the object is more salient; semantic analysis is possible; cognitive decisions are faster; and the representation better resists decay. Conversely, when attention is directed elsewhere, stimuli are superficially processed; representations decay rapidly; and long-term storage is prevented. In short, selective attention enhances the representation of a target and inhibits the representation of distractors.

Such benefits have been demonstrated in the detection of pure tones (Scharf, 1998; Scharf, Quigley, Aoki, & Peachey, 1987). The detection of a pure tone in noise is facilitated if attention is drawn to that frequency by preceding it with a cue at that frequency. Facilitation progressively declines as the frequency difference between the cue and the target is increased. Scharf (1998) suggests that attention acts as a filter enhancing the detection of the target and attenuating the detection of a distractor (whose properties differ from those of the target). Contrary to Broadbent (1958), Scharf considers that these attentional filters are fitted with frequency-selective auditory filters. If a signal has a frequency tuned to the central frequency of the attentional filter, the representation of this signal is amplified. If the pitch of the signal falls outside the bandwidth of the attentional filter, the signal is attenuated and neglected. Similar effects are obtained with more complex stimuli (simultaneous presentation of complex sequences composed of subsequences varying in tempo and frequency). When attention is focused on one subsequence (by preceding the complex sequence with a single-sequence cue), it is easy to detect a temporal irregularity located within it. However, when the temporal irregularity is located in a nonfocused subsequence (no preceding cue), the intensity level has to be increased by 15 dB to allow the same level of detection performance (Botte, Drake, Brochard, & McAdams, 1997).

Determinants of Attentional Focus

What determines which object will receive this privileged attentional focus? Two sets of factors have been described: stimulus-driven attentional capture and directed attentional focus.

Stimulus-Driven Attentional Capture

Certain characteristics of an auditory event or of an individual increase the probability that a particular stream or object will capture attention. First, physical characteristics of events may make one particular object more perceptually salient than others: For instance, loud events tend to capture attention more than relatively quiet events do (Cowan, 1988; Sokolov, 1963); in the case of music, relatively high-pitched events tend to capture attention more than low-pitched events do (Huron & Fantini, 1989). Second, sudden changes in the sound environment (the appearance or disappearance of a new object or stream) will tend to attract attention away from previously focused objects: The sudden appearance of a motor engine or the stopping of a clock's ticking usually leads to a change in attentional focus (Cowan, 1988; Sokolov, 1963). Furthermore, events with particular personal significance (your own name, or your own baby's crying) have enhanced perceptual salience and lead to a change in attentional focus for that particular object (Moray, 1959; Wood & Cowan, 1995).

Personal characteristics, such as a person's spontaneous internal tempo (or referent period), may lead to a preferential focusing on events occurring at one particular rate (Boltz, 1994; Jones, 1976). People with faster referent periods will tend to focus spontaneously on streams containing faster-occurring events. Similarly, the more skilled an individual is at organizing events over longer time spans (such as musicians compared with nonmusicians), the more likely it is that the individual will focus at a higher hierarchical level within a complex sequence such as music (Drake, Penel, & Bigand, 2000).

Directed Attentional Focus

Thus, in any given sound environment, attention will tend to be focused spontaneously on one particular object according to the factors just described. Listeners can actively control attention, but this ability is limited by characteristics of both the stimulus and the listener. In the case of a complex sound sequence composed of multiple subsequences, not all subsequences are equally easy to direct attention to. For instance, it is much easier to focus attention on subsequences with the highest or lowest pitch, rather than an intermediate pitch (Brochard et al., 1999). A corresponding finding has been observed in the perception of musical fugues, in which it is easier to detect changes in the outer voices, in particular the voice with the highest pitch (Huron & Fantini, 1989). Similarly, it is easier to focus on one particular stream if it has been cued by tones resembling the target tones in some respect. The more physically similar a stream is to the object of spontaneous focus, the easier it is for a listener to direct attention to that particular object.

Individual characteristics also play an important role. The more a listener is familiar with a particular type of auditory structure, the easier it is for him or her to focus attention toward different dimensions of that structure. For instance, musicians can switch attention from one stream to another more easily than can nonmusicians, and musicians have access to more hierarchical levels than do nonmusicians (Drake et al., 2000; Jones & Yee, 1997).

Role of Attention in Auditory Organization

Much debate concerns whether stream formation is the result of attentional processes, precedes them, or is the result of some kind of interaction between the two.

More traditional approaches limit the role of attention to postorganizational levels (after object and stream formation). In the first cognitive model of attention, proposed by Broadbent (1958), unattended information is prevented from reaching a central limited-capacity channel of conscious processing by a filter located after sensory storage and before the perceptual stage. Unattended information would therefore receive only extremely limited processing. Bregman (1990) holds stream formation to be a primarily primitive, preattentive process. Attention intervenes either as a partial selection of information organized within a stream (a top-down influence on primitive organization) or as a focusing that brings into the foreground a particular stream from among several organized streams (the attended stream then benefits from further processing).

An alternative position (Jones, 1976; Jones & Boltz, 1989; Jones & Yee, 1993) considers that attention intervenes much earlier in processing, during the formation of streams themselves. The way in which an auditory scene is organized into streams depends on the listener's attentional focus: If an individual directs attention at some particular aspect of the auditory signal, this particular information will be a determinant in stream creation. Thus, Jones proposed that stream formation is the result of dynamic attentional processes with temporal attentional cycles. In this view, attention is viewed as a structuring and organizing process. It orchestrates information selection with a hierarchy of coupled oscillators. Therefore, fission is a breakdown of tracking caused by a noncorrelation between cyclic attentional processes and the periodicity of event occurrence. However, no effect of temporal predictability (Rogers & Bregman, 1993b) or of frequency predictability (van Noorden, 1977) on stream segregation has been found.

One way to disentangle this problem is to measure precisely when and where in the auditory pathway attention exerts its prime influence. Electrophysiological measures (Näätänen, 1992; Woods, Alho, & Algazi, 1994) consistently show that the earliest attentional modulation of the auditory signal appears 80 ms to 100 ms after stimulation, which is when the signal reaches the cortical level. Some authors have suggested that the efferent olivocochlear bundle that projects onto the outer hair cells (see Chap. 9, this volume)
automatic and impermeable to attentional pro­ Using electrophysiological measures, cesses and above which the information pro­ unattended sounds seem to be organized per­ duced by this stage can be further processed ceptually. Sussman, Ritter, and Vaughan if attended to. This view is consistent with (1999) found electrophysiological evidence data showing that listeners have difficulty de­ for a preattentive component in auditory tecting a temporal deviation in one of several stream formation using the mismatch nega­ simultaneous streams if they are not cued to tivity (MMN; NiHiHinen, 1995), an electroen­ the target stream. It is also consistent with an cephalographic (EEG) component based on apparent inhibition of nonfocused events as a an automatic, preattentive deviant detection kind of perceptual attenuation that has been system. MMN responses were found on de­ estimated to be as much as 15 dB (Botte et aI., viant sub sequences only when presented at a 1997). fast tempo that usually yields two auditory streams under behavioral measures. Within­ stream patterns whose memory traces give DEVELOPMENTAL ISSUES rise to the MMN response appeared to have emerged prior to or at the level of the MMN Up to this point, we have examined the func­ system. Sussman, Ritter, and Vaughan (1998) tioning of the human perceptual system in its observed, however, that if tempo and fre­ final state: that of the adult. However, much quency separation were selected so that the can be learned about these systems by com­ stimulus sequence was in the ambiguous zone paring this "final" state with "initial" or "tran­ (van Noorden, 1977; see also Figure 10.4), sitional" states observed earlier in life. 
Dif­ the MMN was observed for the deviant se­ ferences may be related to (a) the functional quence only in the attentive condition (attend characteristics and maturity of the auditory to the high stream)-not in the inattentive con­ system (e.g., speed of information transmis­ dition with a distracting (reading) task. This sion, neuronal connectivity, and properties of physiological result confirms the behavioral the cochlea), (b) acculturation through passive finding that streaming can be affected by at­ exposure to regularities in the sound environ­ tentional focus in the ambiguous region. In ment, and (c) specific learning, such as music a similar vein, Alain et al. (1994) found an tuition, in which musicians learn explicit or­ interaction between the automatic perceptual ganizational rules. The relative importance of analysis processes and volitional attentional these factors can be demonstrated by compar­ processes in EEG measures. ing the behavior of (a) infants (as young as In a paradigm derived from work on con­ possible to minimize all influences of learn­ current grouping (mistuned harmonic segre­ ing), (b) children and adults varying in age gation), Alain, Amott, and Picton (in press) (opportunities for acculturation increase with found two major ERP components related to age), and (c) listeners with and without musi­ segregation: one related to automatic segre­ cal training (explicit learning-together with gation processes that are unaffected by at­ many other factors-increases with training). tentional focus but that varied with stimulus Of considerable interest is the question of the conditions related to segregation, and another relative roles of passive exposure (and matu­ that varied depending on whether listeners ration) compared with explicit training in the 436 Auditory Perception and Cognition development of these higher-level processes. 
ies that do exist, both the ages examined and Because it is impossible to find people with­ the techniques used vary considerably. Re­ out listening experience (except the recent searchers usually compare the performance emergence of profoundly deaf children fitted of three sets of listeners: infants (aged 2 days with hearing-aids and cochlear implants later to 10 months), children in midchildhood in life), the usual experimental strategy has (aged 5-10 years), and adults (usually psy­ been to compare the performances of listen­ chology students aged 18-25 years). Obvi­ ers who have received explicit training in a ously, there is considerable room for change particular type of auditory environment (al­ within each set of listeners (new-born in­ most always music) with the performance of fants and l-year-olds do not have much in listeners who have not received such training. common!). There are also considerable peri­ A newly emerging idea is to compare listen­ ods in the developmental sequence that re­ ers from different cultures who have been ex­ main almost completely unexplored, proba­ posed to different sound environments; com­ bly because of experimental difficulties. For monalties in processing are considered to instance, we have very little idea of the per­ be fundamental, universal, or innate (Drake, ceptual abilities of toddlers and teenagers, in press; Krurnhansl, Louhivuori, Toiviainen, but probably not for the same reasons. The laervinen, & Eerola, 1999; Krurnhansl et aI., tasks adopted are usually appropriate for each 2000). age group, but the degree of comparability The dominant interactive hypothesis between those used for different groups is (Bregman, 1990; Deutsch, 1999; Dowling & debatable. 
Harwood, 1986; Drake et aI., 2000; Handel, Despite these problems and limitations, 1989; lones, 1990; Sloboda, 1985) is that low­ general confirmation of the interactive de­ level perceptual processes (such as perceptual velopmental hypothesis is emerging, and it attribute discriminations, stream segregation, is possible to draw up a tentative picture of and grouping) are hard wired or innate, and the auditory processes that are functional at thus more or less functional at birth. Slight im­ birth, those that develop considerably dur­ provements in functioning precision are pre­ ing childhood through passive exposure, and dicted to occur during infancy, but no sig­ those whose development is enhanced by nificant changes in functioning mode should specific training. We would like to empha­ be observed. However, higher-level cognitive size, however, that this picture remains highly processes (such as attentional flexibility and speculative. hierarchical organization) are less likely to be functional at birth. They will thus develop Auditory Sensitivity and Precision throughout life by way of passive exposure and explicit learning. These more "complex" Most recent research indicates that infants' processes do not appear from nowhere, but detection and discrimination thresholds are rather emerge from low-level processes as higher than those of adults, despite the pres­ they extend in scope and combine into larger, ence of an anatomically mature peripheral au­ more elaborate constructs. ditory system at birth (e.g., Berg, 1993; Berg Our knowledge about infants' and chil­ & Smith, 1983; Little, Thomas, & Letterman, dren's auditory perception and cognitive skills 1999; Nozza & Wilson, 1984; Olsho, Koch, is currently. extremely piecemeal (Baruch, Carter, Halpin, & Spetner, 1988; Schneider, 2001). Only a handful of auditory processes Trehub, & Bull, 1980; Sinnot, Pisoni, & have been investigated. 
Moreover, of the stud- Aslin, 1983; Teas, Klein, & Kramer, 1982; Developmental Issues 437

Trehub, Schneider, & Endman, 1980; Werner, from inharmonic tones (Clarkson & Clifton, Folsom, & ManeI, 1993; Werner & Marean, 1995). 1991). Take, for example, the case of ab­ solute thresholds. The magnitude of the ob­ Primary Auditory Organization Processes served difference between infants and adults has decreased significantly over recent years, In a similar fashion, certain primary audi­ mainly because of improvements in measure­ tory organization processes appear to func­ ment techniques. However, even with the most tion early in life in the same way as they do in sophisticated methods (observer-based psy­ adults, although not necessarily as efficiently choacoustic procedures), infants' thresholds and not in more complex conditions. remain 15 dB to 30 dB above those of adults. At six months the difference is 10 dB to 15 dB Stream Segregation (Olsho et al., 1988). The difference contin­ As stream segregation is such an important ues to decrease with age (Schneider, Trehub, process in auditory perception, it is surpris­ Morrongiello, & Thorpe, 1986), and by the ing to note how few studies have investi­ age of 10 years, children's thresholds are gated its development. Stream segregation comparable with those of adults (Trehub, has been demonstrated at ages 2 months Schneider, Morrongiello, & Thorpe, 1988). to 4 months for frequency-based streaming It would be tempting to suggest that sim­ (Demany, 1982) as well as in new-born in­ ilar patterns of results have been found for fants, but only in easy streaming conditions the perception of various auditory attributes. (slow tempi and large pitch jumps) for timbre In this field, however, the experimental data and spectral position-based streaming are so sparse and the developmental sequence (McAdams & Bertoncini, 1997). Surprisingly, so incomplete that such a conclusion would even less is known about how stream segrega­ be premature. The best we can say is that tion develops during childhood. 
One excep­ most of the existing data are not incompatible tion is the study by Andrews and Dowling with the proposed pattern (loudness: Jensen & (1991), who demonstrated that even 5-year­ Neff, 1993; Schneider & Trehub, 1985; olds are able to identify a familiar tune within Sinnot & Aslin, 1985; frequency selectivity: a complex mixture based on pitch and timbre Olsho, 1985; Olsho et al., 1988; Sinnot & differences between tones. These abilities de­ Aslin, 1985; Jensen & Neff, 1993; timbre: velop with age, although the differences may Trehub, Endman, & Thorpe, 1990; Clarkson, be due to changes in performance level rather Martin, & Miciek, 1996; AlIen & Wightman, than to changes in functioning mode. Elderly 1995; and temporal acuity: Werner & Rubel, people do not show a deficit in this pro­ 1992). cess compared with young adults (Trainor & Despite this poorer performance level in Trehub, 1989). infants, the same functioning mode seems to underlie these low-level auditory processes. For instance, similar patterns of results in in­ Segmentation into Groups fants and adults indicate the existence in both Segmentation into groups has been demon­ groups of tuning curves and auditory filters strated in 6- to 8-month-old infants (Trainor (Abdala & Folsom, 1995), the double coding & Adams, 2000; Krurnhansl & Jusczyk, 1990; of pitch (pitch height and chroma; Demany & Thorpe & Trehub, 1989) and in 5- to 7-year­ Armand, 1984, 1985), and the phenomenon of old children (Bamberger, 1980; Drake, 1993a, the missing fundamental and pitch extraction 1993b; Drake, Dowling & Palmer, 1991; 438 Auditory Perception and Cognition
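The dependence of streaming on tempo and pitch separation described above can be sketched as a toy decision rule in the spirit of the fission and temporal-coherence boundaries familiar from the adult streaming literature. The boundary values below are illustrative assumptions for demonstration only, not fitted psychophysical data.

```python
# Toy illustration (NOT a fitted model): an alternating high-low tone
# sequence tends to split into two streams when the frequency separation
# is large and the tempo is fast. Boundary values here are assumptions.

def predicted_organization(df_semitones: float, ioi_ms: float) -> str:
    """Predict the perceptual organization of an ABAB tone sequence.

    df_semitones: frequency separation between the A and B tones.
    ioi_ms: inter-onset interval between successive tones
            (larger IOI = slower tempo).
    """
    fission = 4.0                    # below ~4 st, integration dominates
    coherence = 4.0 + 0.1 * ioi_ms   # slower tempi tolerate larger jumps
    if df_semitones < fission:
        return "one stream"
    if df_semitones > coherence:
        return "two streams"
    return "ambiguous"               # attention can tip the organization

# Fast tempo + large pitch jump: the "easy streaming conditions"
print(predicted_organization(df_semitones=12, ioi_ms=60))   # two streams
# Same jump at a slow tempo falls in the ambiguous region
print(predicted_organization(df_semitones=12, ioi_ms=200))  # ambiguous
```

The middle region between the two boundaries is exactly where, as noted above, attentional focus has been shown to bias the percept toward one or two streams.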

Thorpe & Trehub, 1989). Similar segmentation principles are used by adult musicians and nonmusicians, although musicians are more systematic in their responses (Fitzgibbons, Pollatsek, & Thomas, 1974; Peretz & Morais, 1989).

Temporal Regularity Extraction

Temporal regularity extraction is functional at an early age: The capacity to detect a small change in tempo of an isochronous sequence is present in 2-month-old infants (Baruch & Drake, 1997). Children are able to synchronize with musical sequences by the age of 5 years (Dowling, 1984; Dowling & Harwood, 1986; Drake, 1997; Fraisse, Pichot, & Clairouin, 1969), and their incorrect reproductions of musical rhythms almost always respect the underlying pulse (Drake & Gérard, 1989). Both musicians and nonmusicians use underlying temporal regularities to organize complex sequences (Povel, 1985).

Tempo Discrimination

Tempo discrimination follows the same pattern over age: The same zone of optimal tempo is observed at 2 months as in adults, although there is a significant slowing and widening of this range during childhood (Baruch & Drake, 1997; Drake et al., 2000).

Simple Rhythmic Forms and Ratios

Like children and adults, infants demonstrate a processing preference for simple rhythmic and melodic forms and ratios (e.g., isochrony and 2:1 ratios; Trainor, 1997). Two-month-old infants discriminate and categorize simple rhythms (Demany, McKenzie, & Vurpillot, 1977; Morrongiello & Trehub, 1987), and young children discriminate and reproduce better rhythmic sequences involving 2:1 time ratios compared with more complex ratios (Dowling & Harwood, 1986; Drake, 1993b; Drake & Gérard, 1989).

Higher-Level Auditory Organization

In contrast to the general absence of changes with age and experience noted in the preceding sections, considerable differences are expected in both the precision and the mode of functioning of more complex auditory processes. Because higher-level processes are conceived as being harder, researchers have simply not looked for (or published) their functioning in infants and children: It is never clear whether the absence of an effect is due to the lack of the process or to methodological limits. Very few studies investigate these processes in infants or children. However, some studies do demonstrate significant differences between different groups of adults (musicians and nonmusicians, different cultures, etc.). Whereas few studies have investigated higher-level processing in infants for simple and musical sequences, more is known about such processes for speech signals (see Chap. 12, this volume).

Hierarchical Segmentation Organization

This type of organization has been investigated with a segmentation paradigm that has been applied to children and adults but not to infants. Participants are asked to listen to a piece of music and then to indicate where they perceive a break in the music. In both children (Bertrand, 1999) and adults (Clarke & Krumhansl, 1990; Deliège, 1990; Pollard-Gott, 1983), the smallest breaks correspond to low-level grouping processes, described earlier, that are based on changes in the physical characteristics of events and temporal proximity. Larger groupings over longer time spans were observed only in adults. These segmentations correspond to greater changes in the same parameters. Such a hierarchy in segmentation was observed in both musicians and nonmusicians, although the principles used were more systematic in adult musicians. This
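The preference for simple time ratios can be made concrete with a small sketch that counts how many adjacent inter-onset-interval (IOI) pairs in a rhythm depart from 1:1 or 2:1 relations. The scoring rule is a hypothetical illustration of the idea, not a published complexity model.

```python
from fractions import Fraction

# Illustrative sketch (hypothetical scoring rule): rhythms whose adjacent
# IOIs stand in simple ratios (1:1, 2:1) are reproduced better by young
# listeners than rhythms built on more complex ratios such as 3:2.
SIMPLE_RATIOS = {Fraction(1, 1), Fraction(2, 1), Fraction(1, 2)}

def rhythm_complexity(iois_ms):
    """Count adjacent IOI pairs whose ratio is not a simple 1:1 or 2:1."""
    complex_pairs = 0
    for a, b in zip(iois_ms, iois_ms[1:]):
        ratio = Fraction(a, b)  # exact ratio of successive intervals
        if ratio not in SIMPLE_RATIOS:
            complex_pairs += 1
    return complex_pairs

# An isochronous-then-halved pattern uses only 1:1 and 2:1 relations:
print(rhythm_complexity([400, 400, 200, 200]))  # 0 complex pairs
# A 3:2 relation counts as complex:
print(rhythm_complexity([300, 200, 200]))       # 1 complex pair
```

On this crude count, the patterns that infants and children discriminate and reproduce well score zero, whereas rhythms involving 3:2 or more complex relations score higher.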

type of organization has not been investigated in infants.

Hierarchical Metric Organization

The use of multiple regular levels has also not been investigated in infants. Whereas 4-year-old children do not demonstrate the use of this organizational principle, its use does increase with age. By the age of about 7 years, children use the reference period and one hierarchical level above or below this level (Drake, Jones, & Baruch, 2000). However, the use of more hierarchical levels seems to be restricted to musicians (Bamberger, 1980; Drake, 1993b), and musicians tend to organize musical rhythms around higher hierarchical levels than do nonmusicians (Drake et al., 2000; Monahan, Kendall, & Carterette, 1987; Stoffer, 1985). The ability to use the metrical structure in music performance improves rapidly with musical tuition (Drake & Palmer, 2000; Palmer & Drake, 1997).

Melodic Contour

Infants aged 7 to 11 months can detect changes in melodic contours better than they can detect local changes in intervals, a pattern that is reflected in adult nonmusicians but not in adult musicians (Ferland & Mendelson, 1989; Trehub, Bull, & Thorpe, 1984; Trehub, Thorpe, & Morrongiello, 1987; for more details, see Chap. 11, this volume).

Focal Attending

Nothing is known about infants' ability to focus attention on particular events in the auditory environment. Four-year-old children spontaneously focus attention on the most physically salient event and have difficulty pulling attention away from this object and directing it toward others. Considerable improvements in this ability are observed up to the age of 10 years, and no additional improvement is observed for adult nonmusicians. However, adult musicians show a greatly enhanced ability to attend selectively to a particular aspect of an auditory scene, and this enhancement appears after only a couple of years of musical tuition (Drake et al., 2000).

Summary

The interactive developmental hypothesis presented here therefore provides a satisfactory starting point for future research. The existing data fit relatively well into this framework, but future work that more extensively investigates the principal findings concerning auditory perception and cognition in adults could conceivably invalidate such a position and lead to the creation of a completely different perspective.

CONCLUSIONS

One major theme running through this chapter has been that our auditory perceptual system does not function like a tape recorder, recording passively the sound information arriving in the ear. Rather, we actively strive to make sense out of the ever-changing array of sounds, putting together parts that belong together and separating out conflicting information. The result is the perception of auditory events and sequences. Furthermore, based on previous information and current desires, attentional processes help determine the exact contents of the perceived sound environment.

Such a dynamic approach to auditory perception and cognition has been developing gradually over the last 30 years, being greatly influenced by a handful of creative thinkers and experimentalists, to whom we hope we have done justice. In this chapter, the first in the Stevens' Handbook series devoted to auditory perception and cognition, we have tried to bring together evidence from a range of complementary fields in the hopes that the resulting juxtaposition of ideas will facilitate and enable future research to fill in the gaps.

REFERENCES

Aures, W. (1985). Ein Berechnungsverfahren der Rauhigkeit [A roughness calculation method]. Acustica, 58, 268-281.
Bamberger, J. (1980). Cognitive structuring in the apprehension and description of simple rhythms. Archives of Psychology, 48, 177-199.
Abdala, C., & Folsom, R. C. (1995). The development of frequency resolution in humans as revealed by the auditory brainstem response recorded with notched noise masking. Journal of the Acoustical Society of America, 98(2), 921-930.
Alain, C. A. (2000). Selectively attending to auditory objects. Frontiers in Bioscience, 5, 202-212.
Alain, C. A., Arnott, S. R., & Picton, T. W. (2001). Bottom-up and top-down influences on auditory scene analysis: Evidence from event-related brain potentials. Journal of Experimental Psychology: Human Perception and Performance, 27(5), 1072-1089.
Alain, C. A., Woods, D. L., & Ogawa, K. H. (1994). Brain indices of automatic pattern processing. NeuroReport, 6, 140-144.
Allan, L. (1992). The internal clock revisited. In F. Macar, V. Pouthas, & W. J. Friedman (Eds.), Time, action and cognition: Towards bridging the gap (pp. 191-202). Dordrecht: Kluwer Academic.
Allen, P., & Wightman, F. (1995). Effects of signal and masker uncertainty on children's detection. Journal of Speech and Hearing Research, 38(2), 503-511.
Andrews, M. W., & Dowling, W. J. (1991). The development of perception of interleaved melodies and control of auditory attention. Music Perception, 8(4), 349-368.
Anstis, S., & Saida, S. (1985). Adaptation of auditory streaming to frequency-modulated tones. Journal of Experimental Psychology: Human Perception and Performance, 11, 257-271.
Assmann, P. F., & Summerfield, Q. (1990). Modeling the perception of concurrent vowels: Vowels with different fundamental frequencies. Journal of the Acoustical Society of America, 88, 680-697.
Baruch, C. (2001). L'audition du bébé et du jeune enfant [Hearing in infants and young children]. Année Psychologique, 101, 91-124.
Baruch, C., & Drake, C. (1997). Tempo discrimination in infants. Infant Behavior and Development, 20(4), 573-577.
Beauvois, M. W., & Meddis, R. (1997). Time decay of auditory stream biasing. Perception & Psychophysics, 59, 81-86.
Berg, K. M. (1993). A comparison of thresholds for 1/3-octave filtered clicks and noise bursts in infants and adults. Perception & Psychophysics, 54(3), 365-369.
Berg, K. M., & Smith, M. C. (1983). Behavioral thresholds for tones during infancy. Journal of Experimental Child Psychology, 35, 409-425.
Bertrand, D. (1999). Groupement rythmique et représentation mentale de mélodies chez l'enfant [Rhythmic grouping and mental representation of melodies in children]. Liège, Belgium: Université de Liège.
Bey, C. (1999). Reconnaissance de mélodies intercalées et formation de flux auditifs: Analyse fonctionnelle et exploration neuropsychologique [Recognition of interleaved melodies and auditory stream formation: Functional analysis and neuropsychological exploration]. Unpublished doctoral dissertation, École des Hautes Études en Sciences Sociales (EHESS), Paris.
Bey, C., & McAdams, S. (in press). Schema-based processing in auditory scene analysis. Perception & Psychophysics.
Bigand, E., McAdams, S., & Foret, S. (2000). Divided attention in music. International Journal of Psychology, 35, 270-278.
Block, R. A. (1990). Models of psychological time. In R. A. Block (Ed.), Cognitive models of psychological time (pp. 1-35). Hillsdale, NJ: Erlbaum.
Boltz, M. G. (1994). Changes in internal tempo and effects on the learning and remembering of event durations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(5), 1154-1171.
Boltz, M. G. (1995). Effects of event structure on retrospective duration judgments. Perception & Psychophysics, 57, 1080-1096.
Boltz, M. G. (1998). Task predictability and remembered duration. Perception & Psychophysics, 60(5), 768-784.
Boltz, M. G., Schulkind, M., & Kantra, S. (1991). Effects of background music on the remembering of filmed events. Memory and Cognition, 19(6), 593-606.
Botte, M.-C., Drake, C., Brochard, R., & McAdams, S. (1997). Perceptual attenuation of nonfocused auditory streams. Perception & Psychophysics, 59(3), 419-425.
Bregman, A. S. (1978a). Auditory streaming: Competition among alternative organizations. Perception & Psychophysics, 23, 391-398.
Bregman, A. S. (1978b). Auditory streaming is cumulative. Journal of Experimental Psychology: Human Perception and Performance, 4, 380-387.
Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge: MIT Press.
Bregman, A. S. (1991). Using brief glimpses to decompose mixtures. In J. Sundberg, L. Nord, & R. Carleson (Eds.), Music, language, speech and brain (pp. 284-293). London: Macmillan.
Bregman, A. S. (1993). Auditory scene analysis: Hearing in complex environments. In S. McAdams & E. Bigand (Eds.), Thinking in sound: The cognitive psychology of human audition (pp. 10-36). Oxford: Oxford University Press.
Bregman, A. S., & Ahad, P. (1995). Demonstrations of auditory scene analysis: The perceptual organization of sound. Montreal, Quebec, Canada: McGill University. Compact disc available at http://www.psych.mcgill.ca/labs/auditory/bregmancd.html.
Bregman, A. S., Ahad, P., Kim, J., & Melnerich, L. (1994). Resetting the pitch-analysis system: 1. Effects of rise times of tones in noise backgrounds or of harmonics in a complex tone. Perception & Psychophysics, 56, 155-162.
Bregman, A. S., & Campbell, J. (1971). Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology, 89, 244-249.
Bregman, A. S., & Dannenbring, G. L. (1973). The effect of continuity on auditory stream segregation. Perception & Psychophysics, 13, 308-312.
Bregman, A. S., Liao, C., & Levitan, R. (1990). Auditory grouping based on fundamental frequency and formant peak frequency. Canadian Journal of Psychology, 44, 400-413.
Bregman, A. S., & Pinker, S. (1978). Auditory streaming and the building of timbre. Canadian Journal of Psychology, 32, 19-31.
Broadbent, D. E. (1958). Perception and communication. London: Pergamon.
Brochard, R., Drake, C., Botte, M.-C., & McAdams, S. (1999). Perceptual organization of complex auditory sequences: Effect of number of simultaneous subsequences and frequency separation. Journal of Experimental Psychology: Human Perception and Performance, 25, 1742-1759.
Brokx, J. P. L., & Nooteboom, S. G. (1982). Intonation and the perceptual separation of simultaneous voices. Journal of Phonetics, 10, 23-36.
Buell, T. N., & Hafter, E. R. (1991). Combination of binaural information across frequency bands. Journal of the Acoustical Society of America, 90, 1894-1900.
Cabe, P. A., & Pittenger, J. B. (2000). Human sensitivity to acoustic information from vessel filling. Journal of Experimental Psychology: Human Perception and Performance, 26, 313-324.
Carello, C., Anderson, K. A., & Kunkler-Peck, A. J. (1998). Perception of object length by sound. Psychological Science, 9, 211-214.
Carlyon, R. P. (1991). Discriminating between coherent and incoherent frequency modulation of complex tones. Journal of the Acoustical Society of America, 89, 329-340.
Carlyon, R. P. (1992). The psychophysics of concurrent sound segregation. Philosophical Transactions of the Royal Society, London B, 336, 347-355.
Carlyon, R. P. (1994). Detecting mistuning in the presence of synchronous and asynchronous interfering sounds. Journal of the Acoustical Society of America, 95, 2622-2630.
Carterette, E. C., & Kendall, R. A. (1999). Comparative music perception and cognition. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 725-791). San Diego: Academic Press.
Chaigne, A., & Doutaut, V. (1997). Numerical simulations of xylophones: I. Time-domain modeling of the vibrating bars. Journal of the Acoustical Society of America, 101, 539-557.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America, 25, 975-979.
Church, R. M. (1984). Properties of the internal clock. In J. Gibbon & L. Allan (Eds.), Timing and time perception (Vol. 423, pp. 566-582). New York: New York Academy of Sciences.
Ciocca, V., & Bregman, A. S. (1989). The effects of auditory streaming on duplex perception. Perception & Psychophysics, 46, 39-48.
Ciocca, V., & Darwin, C. J. (1993). Effects of onset asynchrony on pitch perception: Adaptation or grouping? Journal of the Acoustical Society of America, 93, 2870-2878.
Clarke, E. F., & Krumhansl, C. (1990). Perceiving musical time. Music Perception, 7(3), 213-252.
Clarkson, M. G., & Clifton, R. K. (1995). Infants' pitch perception: Inharmonic tonal complexes. Journal of the Acoustical Society of America, 98(3), 1372-1379.
Clarkson, M. G., Martin, R. L., & Miciek, S. G. (1996). Infants' perception of pitch: Number of harmonics. Infant Behavior and Development, 19, 191-197.
Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraint within the human information-processing system. Psychological Bulletin, 104, 163-191.
Creelman, C. D. (1962). Human discrimination of auditory duration. Journal of the Acoustical Society of America, 34, 582-593.
Crowder, R. G. (1993). Auditory memory. In S. McAdams & E. Bigand (Eds.), Thinking in sound: The cognitive psychology of human audition (pp. 113-145). Oxford: Oxford University Press.
Crowder, R. G., & Morton, J. (1969). Precategorical acoustic storage (PAS). Perception & Psychophysics, 5, 365-373.
Culling, J. F., & Darwin, C. J. (1993). The role of timbre in the segregation of simultaneous voices with intersecting F0 contours. Perception & Psychophysics, 34, 303-309.
Culling, J. F., & Summerfield, Q. (1995). Perceptual separation of concurrent speech sounds: Absence of across-frequency grouping by common interaural delay. Journal of the Acoustical Society of America, 98, 785-797.
Cutting, J. E. (1976). Auditory and linguistic processes in speech perception: Inferences from six fusions in dichotic listening. Psychological Review, 83, 114-140.
Daniel, P., & Weber, R. (1997). Psychoacoustical roughness: Implementation of an optimized model. Acustica, 83, 113-123.
Dannenbring, G. L., & Bregman, A. S. (1976). Stream segregation and the illusion of overlap. Journal of Experimental Psychology: Human Perception and Performance, 2, 544-555.
Darwin, C. J. (1981). Perceptual grouping of speech components differing in fundamental frequency and onset time. Quarterly Journal of Experimental Psychology, 33A, 185-208.
Darwin, C. J. (1984). Perceiving vowels in the presence of another sound: Constraints on formant perception. Journal of the Acoustical Society of America, 76, 1636-1647.
Darwin, C. J. (1991). The relationship between speech perception and the perception of other sounds. In I. G. Mattingly & M. G. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception (pp. 239-259). Hillsdale, NJ: Erlbaum.
Darwin, C. J., & Carlyon, R. P. (1995). Auditory grouping. In B. C. J. Moore (Ed.), Hearing (pp. 387-424). San Diego: Academic Press.
Darwin, C. J., & Ciocca, V. (1992). Grouping in pitch perception: Effects of onset asynchrony and ear of presentation of a mistuned component. Journal of the Acoustical Society of America, 91, 3381-3390.
Darwin, C. J., Ciocca, V., & Sandell, G. R. (1994). Effects of frequency and amplitude modulation on the pitch of a complex tone with a mistuned harmonic. Journal of the Acoustical Society of America, 95, 2631-2636.
Darwin, C. J., & Gardner, R. B. (1986). Mistuning a harmonic of a vowel: Grouping and phase effects on vowel quality. Journal of the Acoustical Society of America, 79, 838-845.
Darwin, C. J., & Hukin, R. W. (1999). Auditory objects of attention: The role of interaural time differences. Journal of Experimental Psychology: Human Perception and Performance, 25, 617-629.
Darwin, C. J., & Sutherland, N. S. (1984). Grouping frequency components of vowels: When is a harmonic not a harmonic? Quarterly Journal of Experimental Psychology, 36A, 193-208.
de Cheveigné, A. (1993). Separation of concurrent harmonic sounds: Fundamental frequency estimation and a time-domain cancellation model of auditory processing. Journal of the Acoustical Society of America, 93, 3271-3290.
de Cheveigné, A. (1999). Waveform interactions and the segregation of concurrent vowels. Journal of the Acoustical Society of America, 106, 2959-2972.
de Cheveigné, A., Kawahara, H., Tsuzaki, M., & Aikawa, K. (1997). Concurrent vowel identification: I. Effects of relative level and F0 difference. Journal of the Acoustical Society of America, 101, 2839-2847.
de Cheveigné, A., McAdams, S., Laroche, J., & Rosenberg, M. (1995). Identification of concurrent harmonic and inharmonic vowels: A test of the theory of harmonic cancellation and enhancement. Journal of the Acoustical Society of America, 97, 3736-3748.
de Cheveigné, A., McAdams, S., & Marin, C. M. H. (1997). Concurrent vowel identification: II. Effects of phase, harmonicity, and task. Journal of the Acoustical Society of America, 101, 2848-2856.
Deliège, I. (1987). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff's grouping preference rules. Music Perception, 4, 325-360.
Deliège, I. (1990). Mechanisms of cue extraction in musical grouping: A study of Sequenza VI for Viola Solo by L. Berio. Psychology of Music, 18, 18-45.
Deliège, I. (1993). Mechanisms of cue extraction in memory for musical time. Contemporary Music Review, 9(1&2), 191-205.
Demany, L. (1982). Auditory stream segregation in infancy. Infant Behavior and Development, 5, 261-276.
Demany, L., & Armand, F. (1984). The perceptual reality of tone chroma in early infancy. Journal of the Acoustical Society of America, 76, 57-66.
Demany, L., & Armand, F. (1985). À propos de la perception de la hauteur et du chroma des sons purs [On the perception of the pitch and chroma of pure tones]. Bulletin d'Audiophonologie, 1-2, 123-132.
Demany, L., McKenzie, B., & Vurpillot, E. (1977). Rhythm perception in early infancy. Nature, 266, 718-719.
Desain, P. (1992). A (de)composable theory of rhythm perception. Music Perception, 9(4), 439-454.
Deutsch, D. (1975). Two-channel listening to musical scales. Journal of the Acoustical Society of America, 57, 1156-1160.
Deutsch, D. (1980). The processing of structured and unstructured tonal sequences. Perception & Psychophysics, 28, 381-389.
Deutsch, D. (1995). Musical illusions and paradoxes. La Jolla, CA: Philomel Records. Compact disc available at http://www.philomel.com.
Deutsch, D. (Ed.). (1999). The psychology of music (2nd ed.). San Diego: Academic Press.
Deutsch, D., & Feroe, J. (1981). The internal representation of pitch sequences in tonal music. Psychological Review, 88(6), 503-522.
Divenyi, P. L., & Danner, W. F. (1977). Discrimination of time intervals marked by brief acoustic pulses of various intensities and spectra. Perception & Psychophysics, 21(2), 125-142.
Dowling, W. J. (1973a). The perception of interleaved melodies. Cognitive Psychology, 5, 322-337.
Dowling, W. J. (1973b). Rhythmic groups and subjective chunks in memory for melodies. Perception & Psychophysics, 14, 37-40.
Dowling, W. J. (1984). Development of musical schemata in children's spontaneous singing. In W. R. Crozier & A. J. Chapman (Eds.), Cognitive processes in the perception of art (pp. 145-163). Amsterdam: North-Holland.
Dowling, W. J., & Harwood, D. L. (1986). Music cognition. New York: Academic Press.
Drake, C. (1993a). Perceptual and performed accents in musical sequences. Bulletin of the Psychonomic Society, 31(2), 107-110.
Drake, C. (1993b). Reproduction of musical rhythms by children, adult musicians and adult nonmusicians. Perception & Psychophysics, 53(1), 25-33.
Drake, C. (1997). Motor and perceptually preferred synchronisation by children and adults: Binary and ternary ratios. Polish Journal of Developmental Psychology, 3(1), 41-59.
Drake, C. (1998). Psychological processes involved in the temporal organization of complex auditory sequences: Universal and acquired processes. Music Perception, 16(1), 11-26.
Drake, C., & Bertrand, D. (2001). The quest for universals in temporal processing in music. Annals of the New York Academy of Sciences, 930, 17-27.
Drake, C., & Botte, M.-C. (1993). Tempo sensitivity in auditory sequences: Evidence for a multiple-look model. Perception & Psychophysics, 54, 277-286.
Drake, C., Dowling, W. J., & Palmer, C. (1991). Accent structures in the reproduction of simple tunes by children and adult pianists. Music Perception, 8, 315-334.
Drake, C., & Gérard, C. (1989). A psychological pulse train: How young children use this cognitive framework to structure simple rhythms. Psychological Research, 51, 16-22.
Drake, C., Jones, M. R., & Baruch, C. (2000). The development of rhythmic attending in auditory sequences: Attunement, referent period, focal attending. Cognition, 77, 251-288.
Drake, C., & McAdams, S. (1999). The continuity illusion: Role of temporal sequence structure. Journal of the Acoustical Society of America, 106, 3529-3538.
Drake, C., & Palmer, C. (1993). Accent structures in music performance. Music Perception, 10(3), 343-378.
Drake, C., & Palmer, C. (2000). Skill acquisition in music performance: Relations between planning and temporal control. Cognition, 74(1), 1-32.
Drake, C., Penel, A., & Bigand, E. (2000). Tapping in time with mechanically and expressively performed music. Music Perception, 18, 1-23.
Duifhuis, H., Willems, L. F., & Sluyter, R. J. (1982). Measurement of pitch in speech: An implementation of Goldstein's theory of pitch perception. Journal of the Acoustical Society of America, 83, 687-695.
Ehresman, D., & Wessel, D. L. (1978). Perception of timbral analogies (IRCAM Report No. 13). Paris: IRCAM.
Essens, P. J. (1995). Structuring temporal sequences: Comparison of models and factors of complexity. Perception & Psychophysics, 57(4), 519-532.
Essens, P. J., & Povel, D. J. (1985). Metrical and nonmetrical representations of temporal patterns. Perception & Psychophysics, 37(1), 1-7.
Estes, W. K. (1972). An associative basis for coding and organization in memory. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory. New York: Wiley.
Ferland, M. B., & Mendelson, M. J. (1989). Infant categorization of melodic contour. Infant Behavior and Development, 12, 341-355.
Fishman, Y. I., Reser, D. H., Arezzo, J. C., & Steinschneider, M. (2000). Complex tone processing in primary auditory cortex of the awake monkey: I. Neural ensemble correlates of roughness. Journal of the Acoustical Society of America, 108, 235-246.
Fitzgibbons, P. J., & Gordon-Salant, S. (1998). Auditory temporal order perception in younger and older adults. Journal of Speech, Language, and Hearing Research, 41(5), 1052-1060.
Fitzgibbons, P. J., Pollatsek, A., & Thomas, I. B. (1974). Detection of temporal gaps within and between perceptual tonal groups. Perception & Psychophysics, 16, 522-528.
Fraisse, P. (1956). Les structures rythmiques [Rhythmic structures]. Louvain, Belgium: Publications Universitaires de Louvain.
Fraisse, P. (1963). The psychology of time. New York: Harper & Row.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149-180). New York: Academic Press.
Fraisse, P., Pichot, P., & Clairouin, G. (1969). Les aptitudes rythmiques: Étude comparée des oligophrènes et des enfants normaux [Rhythmic aptitudes: A comparative study of mentally deficient and normal children]. Journal de Psychologie Normale et Pathologique, 42, 309-330.
Gardner, R. B., Gaskill, S. A., & Darwin, C. J. (1989). Perceptual grouping of formants with static and dynamic differences in fundamental frequency. Journal of the Acoustical Society of America, 85, 1329-1337.
Getty, D. J. (1975). Discrimination of short temporal intervals: A comparison of two models. Perception & Psychophysics, 18, 1-8.
Giard, M.-H., Collet, L., Bouchet, P., & Pernier, J. (1993). Modulation of human cochlear activity during auditory selective attention. Brain Research, 633, 353-356.
Giard, M.-H., Fort, A., Mouchetant-Rostaing, Y., & Pernier, J. (2000). Neurophysiological mechanisms of auditory selective attention in humans. Frontiers in Bioscience, 5, 84-94.
Glucksberg, S., & Cowen, G. N. (1970). Memory for non-attended auditory material. Cognitive Psychology, 1, 149-156.
Green, D. M. (1988). Auditory profile analysis: Some experiments on spectral shape discrimination. In G. M. Edelman, W. E. Gall, & W. M. Cowan (Eds.), Auditory function: Neurobiological bases of hearing (pp. 609-622). New York: Wiley.
Green, D. M., & Mason, C. R. (1985). Auditory profile analysis: Frequency, phase, and Weber's law. Journal of the Acoustical Society of America, 77, 1155-1161.
Green, D. M., Mason, C. R., & Kidd, G. (1984). Profile analysis: Critical bands and duration. Journal of the Acoustical Society of America, 74, 1163-1167.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61, 1270-1277.
Grey, J. M., & Gordon, J. W. (1978). Perceptual effects of spectral modifications on musical timbres. Journal of the Acoustical Society of America, 63, 1493-1500.
Grimault, N., Micheyl, C., Carlyon, R. P., Arthaud, P., & Collet, L. (2000). Influence of peripheral resolvability on the perceptual segregation of harmonic complex tones differing in fundamental frequency. Journal of the Acoustical Society of America, 108, 263-271.
Guski, R. (2000). Studies in auditive kinetics. In A. Schick, M. Meis, & C. Reckhardt (Eds.), Contributions to psychological acoustics: Results of the 8th Oldenburg Symposium on Psychological Acoustics (pp. 383-401). Oldenburg: Bis.
Hafter, E. R., & Buell, T. N. (1985). The importance of transients for maintaining the separation of signals in space. In M. Posner & O. Marin (Eds.), Attention and performance XI (pp. 337-354). Hillsdale, NJ: Erlbaum.
Hafter, E. R., Buell, T. N., & Richards, V. M. (1988). Onset-coding in lateralization: Its form, site, and function. In G. M. Edelman, W. E. Gall, & W. M. Cowan (Eds.), Auditory function: Neurobiological bases of hearing (pp. 647-676). New York: Wiley.
Hajda, J. M., Kendall, R. A., Carterette, E. C., & Harshberger, M. L. (1997). Methodological issues in timbre research. In I. Deliège & J. Sloboda (Eds.), Perception and cognition of music (pp. 253-306). Hove, England: Psychology Press.
Hall, J. W., Grose, J. H., & Mendoza, L. (1995). Across-channel processes in masking. In B. C. J. Moore (Ed.), Hearing (pp. 243-266). San Diego: Academic Press.
Halpern, A. R., & Darwin, C. J. (1982). Duration discrimination in a series of rhythmic events. Perception & Psychophysics, 31(1), 86-89.
Handel, S. (1989). Listening: An introduction to the perception of auditory events. Cambridge, MA: MIT Press.
Hartmann, W. M. (1988). Pitch perception and the segregation and integration of auditory entities. In G. M. Edelman, W. E. Gall, & W. M. Cowan (Eds.), Auditory function (pp. 623-645). New York: Wiley.
Hartmann, W. M., & Johnson, D. (1991). Stream segregation and peripheral channeling. Music Perception, 9, 155-184.
Hartmann, W. M., McAdams, S., & Smith, B. K. (1990). Hearing a mistuned harmonic in an otherwise periodic complex tone. Journal of the Acoustical Society of America, 88, 1712-1724.
Heise, G. A., & Miller, G. A. (1951). An experimental study of auditory patterns. American Journal of Psychology, 64, 68-77.
Helmholtz, H. L. F. von. (1885).
Hukin, R. W., & Darwin, C. J. (1995). Effects of contralateral presentation and of interaural time differences in segregating a harmonic from a vowel. Journal of the Acoustical Society of America, 98, 1380-1387.
Huron, D., & Fantini, D. (1989). The avoidance of inner-voice entries: Perceptual evidence and musical practice. Music Perception, 7, 43-48.
Iverson, P. (1995). Auditory stream segregation by musical timbre: Effects of static and dynamic acoustic attributes. Journal of Experimental Psychology: Human Perception and Performance, 21, 751-763.
Iverson, P., & Krumhansl, C. L. (1993). Isolating the dynamic attributes of musical timbre. Journal of the Acoustical Society of America, 94, 2595-2603.
Jensen, K. J., & Neff, D. L. (1993). Development of basic auditory discrimination in preschool children. Psychological Science, 4, 104-107.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83(5), 323-355.
Jones, M. R. (1987). Dynamic pattern structure in music: Recent theory and research. Perception & Psychophysics, 41(6), 631-634.
Jones, M. R. (1990). Learning and development of
On the sensations expectancies: An interactionist approach. Psy­ of tone as a physiological basis for the theory chomusicology, 2(9), 193-228. of music. New York, from 1877 trans by A. J. Jones, M. R., & Boltz, M. (1989). Dynamic attend­ Ellis of 4th German ed.: republ.l954, New York: ing and responses to time. PsychologicaL Review, Dover. 96, 459-491. Hill, N. I. , & Darwin, C. J. (1996). Lateralization Jones, M. R, & Yee, W. (1993). Attending to audi­ of a perturbed harmonic: Effects of onset asyn­ tory events: The role of temporal organization. chrony and mistuning. Journal of the Acoustical In S. McAdams & E. Bigand (Eds.), Thinking in Society of America, 100, 2352-2364. sound: The cognitive psychoLogy of human au­ Houtsma, A. J. M., Rossing, T. D., & Wagenaars, dition (pp. 69-112). Oxford: Oxford University W. M. (1987). Auditory demonstrations on Press. compact disc. Melville, NY: Acoustical So­ Jones, M. R., & Yee, W. (1997). Sensitivity to time ciety of America. Compact disc available at change: the role of context and skill. Journal http://asa.aip.org/discs.html. ofExperimentaL Psychology: Human Perception Hukin, R. W., & Darwin, C. J. (1995a). Compari­ and Peiformance, 23, 693-709. son of the effect of onset asynchrony on auditory Kameoka, A., & Kuriyagawa, M. (1969). Conso­ grouping in pitch matching and vowel identifi­ nance theory: Part n. Consonance of complex cation. Perception and Psychophysics, 57, 191- tones and its calculation method. Journal of the 196. Acoustical Society of America, 45, 1460-1469. Hukin, R W., & Darwin, C. J. (1995b). Effects Killeen, P. R., & Weisse, N. A. (1987). Optimal of contralateral presentation and of interaural timing and the Weber function. Psychological time differences in segregating a harmonic from Review, 94(4),445-468. J{et'erences 447

Koehler, W. (1929). Gestalt psychology. New York: Liveright.
Krimphoff, J., McAdams, S., & Winsberg, S. (1994). Caractérisation du timbre des sons complexes: II. Analyses acoustiques et quantification psychophysique [Characterization of the timbre of complex sounds: II. Acoustic analyses and psychophysical quantification]. Journal de Physique, 4(C5), 625-628.
Kristofferson, A. B. (1980). A quantal step function in duration discrimination. Perception & Psychophysics, 27, 300-306.
Krumhansl, C. L. (1989). Why is musical timbre so hard to understand? In S. Nielzén & O. Olsson (Eds.), Structure and perception of electroacoustic sound and music (pp. 43-53). Amsterdam: Excerpta Medica.
Krumhansl, C. L., & Jusczyk, P. W. (1990). Infants' perception of phrase structure in music. Psychological Science, 1, 70-73.
Krumhansl, C. L., Louhivuori, J., Toiviainen, P., Jaervinen, T., & Eerola, T. (1999). Melodic expectation in Finnish spiritual folk hymns: Convergence of statistical, behavioral, and computational approaches. Music Perception, 17(2), 151-195.
Krumhansl, C. L., Toivanen, P., Eerola, T., Toiviainen, P., Jaervinen, T., & Louhivuori, J. (2000). Cross-cultural music cognition: Cognitive methodology applied to North Sami yoiks. Cognition, 76(1), 13-58.
Kubovy, M. (1981). Concurrent-pitch segregation and the theory of indispensable attributes. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 55-98). Hillsdale, NJ: Erlbaum.
Kubovy, M., Cutting, J. E., & McGuire, R. M. (1974). Hearing with the third ear: Dichotic perception of a melody without monaural familiarity cues. Science, 186, 272-274.
Kunkler-Peck, A. J., & Turvey, M. T. (2000). Hearing shape. Journal of Experimental Psychology: Human Perception and Performance, 26, 279-294.
Lakatos, S. (2000). A common perceptual space for harmonic and percussive timbres. Perception & Psychophysics, 62, 1426-1439.
Lakatos, S., McAdams, S., & Caussé, R. (1997). The representation of auditory source characteristics: Simple geometric form. Perception & Psychophysics, 59, 1180-1190.
Lambourg, C., Chaigne, A., & Matignon, D. (2001). Time-domain simulation of damped impacted plates: II. Numerical model and results. Journal of the Acoustical Society of America, 109, 1433-1447.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106(1), 119-159.
Lea, A. (1992). Auditory models of vowel perception. Unpublished doctoral dissertation, University of Nottingham, Nottingham, England.
Lee, C. S. (1991). The perception of metrical structure: Experimental evidence and a model. In P. Howell, R. West, & I. Cross (Eds.), Representing musical structure (pp. 59-127). London: Academic Press.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge: MIT Press.
Little, V. M., Thomas, D. G., & Letterman, M. R. (1999). Single-trial analysis of developmental trends in infant auditory event-related potentials. Developmental Neuropsychology, 16(3), 455-478.
Longuet-Higgins, H. C., & Lee, C. S. (1982). The perception of musical rhythms. Perception, 11, 115-128.
Longuet-Higgins, H. C., & Lee, C. S. (1984). The rhythmic interpretation of monophonic music. Music Perception, 1, 424-440.
Luce, G. G. (1972). Body time. London: Temple Smith.
McAdams, S. (1989). Segregation of concurrent sounds: I. Effects of frequency modulation coherence. Journal of the Acoustical Society of America, 86, 2148-2159.
McAdams, S. (1993). Recognition of sound sources and events. In S. McAdams & E. Bigand (Eds.), Thinking in sound: The cognitive psychology of human audition (pp. 146-198). Oxford: Oxford University Press.
McAdams, S., & Bertoncini, J. (1997). Organization and discrimination of repeating sound sequences by newborn infants. Journal of the Acoustical Society of America, 102, 2945-2953.
McAdams, S., & Bigand, E. (Eds.). (1993). Thinking in sound: The cognitive psychology of human audition. Oxford: Oxford University Press.
McAdams, S., Botte, M.-C., & Drake, C. (1998). Auditory continuity and loudness computation. Journal of the Acoustical Society of America, 103, 1580-1591.
McAdams, S., & Bregman, A. S. (1979). Hearing musical streams. Computer Music Journal, 3(4), 26-43.
McAdams, S., & Cunibile, J. C. (1992). Perception of timbral analogies. Philosophical Transactions of the Royal Society, London, Series B, 336, 383-389.
McAdams, S., & Marin, C. M. H. (1990). Auditory processing of frequency modulation coherence. Paper presented at Fechner Day '90, 6th Annual Meeting of the International Society for Psychophysics, Würzburg, Germany.
McAdams, S., & Winsberg, S. (2000). Psychophysical quantification of individual differences in timbre perception. In A. Schick, M. Meis, & C. Reckhardt (Eds.), Contributions to psychological acoustics: Results of the 8th Oldenburg Symposium on Psychological Acoustics (pp. 165-182). Oldenburg, Germany: Bis.
McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., & Krimphoff, J. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research, 58, 177-192.
McFadden, D., & Wright, B. A. (1987). Comodulation masking release in a forward-masking paradigm. Journal of the Acoustical Society of America, 82, 1615-1630.
Meric, C., & Collet, L. (1994). Attention and otoacoustic emissions: A review. Neuroscience and Biobehavioral Reviews, 18, 215-222.
Michon, J. A. (1975). Time experience and memory processes. In J. T. Fraser & N. Lawrence (Eds.), The study of time (pp. 2-22). Berlin: Springer.
Michon, J. A. (1978). The making of the present: A tutorial review. In J. Requin (Ed.), Attention and performance VII (pp. 89-111). Hillsdale, NJ: Erlbaum.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.
Monahan, C. B., Kendall, R. A., & Carterette, E. C. (1987). The effect of melodic and temporal contour on recognition memory for pitch change. Perception & Psychophysics, 41(6), 576-600.
Moore, B. C. J., Glasberg, B. R., & Peters, R. W. (1985). Relative dominance of individual partials in determining the pitch of complex tones. Journal of the Acoustical Society of America, 77, 1853-1860.
Moore, B. C. J., Peters, R. W., & Glasberg, B. R. (1985). Thresholds for the detection of inharmonicity in complex tones. Journal of the Acoustical Society of America, 77, 1861-1867.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Morrongiello, B. A., & Trehub, S. E. (1987). Age-related changes in auditory temporal perception. Journal of Experimental Child Psychology, 44, 413-426.
Näätänen, R. (1992). Attention and brain function. Hillsdale, NJ: Erlbaum.
Näätänen, R. (1995). The mismatch negativity: A powerful tool for cognitive neuroscience. Ear & Hearing, 16, 6-18.
Nozza, R. J., & Wilson, W. R. (1984). Masked and unmasked pure-tone thresholds of infants and adults: Development of auditory frequency selectivity and sensitivity. Journal of Speech and Hearing Research, 27, 613-622.
Olsho, L. W. (1985). Infant auditory perception: Tonal masking. Infant Behavior and Development, 8, 371-384.
Olsho, L. W., Koch, E. G., Carter, E. A., Halpin, C. F., & Spetner, N. B. (1988). Pure-tone sensitivity of human infants. Journal of the Acoustical Society of America, 84(4), 1316-1324.
Ornstein, R. E. (1969). On the experience of time. Harmondsworth, England: Penguin.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331-346.
Palmer, C., & Drake, C. (1997). Monitoring and planning capacities in the acquisition of music performance skills. Canadian Journal of Experimental Psychology, 51(4), 369-384.
Penel, A., & Drake, C. (1998). Sources of timing variations in music performance: A psychological segmentation model. Psychological Research, 61, 12-32.
Peretz, I., & Morais, J. (1989). Music and modularity. Contemporary Music Review, 4, 279-294.
Plomp, R. (1970). Timbre as a multidimensional attribute of complex tones. In R. Plomp & G. F. Smoorenburg (Eds.), Frequency analysis and periodicity detection in hearing (pp. 397-414). Leiden: Sijthoff.
Plomp, R. (1976). Aspects of tone sensation: A psychophysical study. London: Academic Press.
Plomp, R., & Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America, 38, 548-560.
Plomp, R., & Steeneken, J. M. (1971). Pitch versus timbre. Proceedings of the 7th International Congress of Acoustics, Budapest, 3, 377-380.
Pollard-Gott, L. (1983). Emergence of thematic concepts in repeated listening to music. Cognitive Psychology, 15, 66-94.
Povel, D. J. (1981). Internal representation of simple temporal patterns. Journal of Experimental Psychology: Human Perception and Performance, 7, 3-18.
Povel, D. J. (1985). Perception of temporal patterns. Music Perception, 2(4), 411-440.
Povel, D. J., & Essens, P. (1985). Perception of temporal patterns. Music Perception, 2, 411-440.
Pressnitzer, D., & McAdams, S. (1999a). An effect of the coherence between envelopes across frequency regions on the perception of roughness. In T. Dau, V. Hohmann, & B. Kollmeier (Eds.), Psychophysics, physiology and models of hearing (pp. 105-108). London: World Scientific.
Pressnitzer, D., & McAdams, S. (1999b). Two phase effects in roughness perception. Journal of the Acoustical Society of America, 105, 2773-2782.
Pressnitzer, D., McAdams, S., Winsberg, S., & Fineberg, J. (2000). Perception of musical tension for nontonal orchestral timbres and its relation to psychoacoustic roughness. Perception & Psychophysics, 62, 66-80.
Preusser, D. (1972). The effect of structure and rate on the recognition and description of auditory temporal patterns. Perception & Psychophysics, 11(3), 233-240.
Rand, T. C. (1974). Dichotic release from masking for speech. Journal of the Acoustical Society of America, 55, 678-680 (Letter to the editor).
Rasch, R. (1978). The perception of simultaneous notes such as in polyphonic music. Acustica, 40, 21-33.
Rasch, R. A. (1979). Synchronization in performed ensemble music. Acustica, 43(2), 121-131.
Repp, B. H. (1987). The sound of two hands clapping: An exploratory study. Journal of the Acoustical Society of America, 81, 1100-1109.
Repp, B. H. (1992). Probing the cognitive representation of musical time: Structural constraints on the perception of timing perturbations. Cognition, 44, 241-281.
Risset, J.-C., & Wessel, D. L. (1999). Exploration of timbre by analysis and synthesis. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 113-168). San Diego: Academic Press.
Roberts, B., & Bailey, P. J. (1993). Spectral pattern and the perceptual fusion of harmonics: I. The role of temporal factors. Journal of the Acoustical Society of America, 94, 3153-3164.
Roberts, B., & Bailey, P. J. (1996). Regularity of spectral pattern and its effects on the perceptual fusion of harmonics. Perception & Psychophysics, 58, 289-299.
Roberts, B., & Bregman, A. S. (1991). Effects of the pattern of spectral spacing on the perceptual fusion of harmonics. Journal of the Acoustical Society of America, 90, 3050-3060.
Rogers, W. L., & Bregman, A. S. (1993a). An experimental evaluation of three theories of auditory stream segregation. Perception & Psychophysics, 53, 179-189.

Rogers, W. L., & Bregman, A. S. (1993b). An experimental study of three theories of auditory stream segregation. Perception & Psychophysics, 53, 179-189.
Rogers, W. L., & Bregman, A. S. (1998). Cumulation of the tendency to segregate auditory streams: Resetting by changes in location and loudness. Perception & Psychophysics, 60, 1216-1227.
Roussarie, V. (1999). Analyse perceptive des structures vibrantes [Perceptual analysis of vibrating structures]. Unpublished doctoral dissertation, Université du Maine, Le Mans, France.
Roussarie, V., McAdams, S., & Chaigne, A. (1998). Perceptual analysis of vibrating bars synthesized with a physical model. Proceedings of the 16th International Congress on Acoustics, Seattle, 2227-2228.
Saldanha, E. L., & Corso, J. F. (1964). Timbre cues and the identification of musical instruments. Journal of the Acoustical Society of America, 36, 2021-2026.
Scharf, B. (1998). Auditory attention: The psychoacoustical approach. In H. Pashler (Ed.), Attention (pp. 75-117). Hove, England: Psychology Press.
Scharf, B., Quigley, S., Aoki, C., & Peachey, N. (1987). Focused auditory attention and frequency selectivity. Perception & Psychophysics, 42(3), 215-223.
Scheffers, M. T. M. (1983). Sifting vowels: Auditory pitch analysis and sound integration. Unpublished doctoral dissertation, University of Groningen, Groningen, Netherlands.
Schneider, B. A., & Trehub, S. E. (1985). Behavioral assessment of basic auditory abilities. In S. E. Trehub & B. A. Schneider (Eds.), Auditory development in infancy (pp. 104-114). New York: Plenum.
Schneider, B. A., Trehub, S. E., & Bull, D. (1980). High-frequency sensitivity in infants. Science, 207, 1003-1004.
Schneider, B. A., Trehub, S. E., Morrongiello, B. A., & Thorpe, L. A. (1986). Auditory sensitivity in preschool children. Journal of the Acoustical Society of America, 79(2), 447-452.
Schulze, H. H. (1978). The detectability of local and global displacements in regular rhythmic patterns. Psychological Research, 40(2), 173-181.
Schulze, H. H. (1989). The perception of temporal deviations in isochronic patterns. Perception & Psychophysics, 45, 291-296.
Singh, P. G. (1987). Perceptual organization of complex-tone sequences: A tradeoff between pitch and timbre? Journal of the Acoustical Society of America, 82, 886-899.
Singh, P. G., & Bregman, A. S. (1997). The influence of different timbre attributes on the perceptual segregation of complex-tone sequences. Journal of the Acoustical Society of America, 102, 1943-1952.
Sinnott, J. M., & Aslin, R. N. (1985). Frequency and intensity discrimination in human infants and adults. Journal of the Acoustical Society of America, 78, 1986-1992.
Sinnott, J. M., Pisoni, D. B., & Aslin, R. N. (1983). A comparison of pure tone auditory thresholds in human infants and adults. Infant Behavior and Development, 6, 3-17.
Slawson, A. W. (1968). Vowel quality and musical timbre as functions of spectrum envelope and fundamental frequency. Journal of the Acoustical Society of America, 43, 97-101.
Sloboda, J. A. (1985). The musical mind: The cognitive psychology of music. Oxford: Oxford University Press.
Sokolov, E. N. (1963). Higher nervous functions: The orienting reflex. Annual Review of Physiology, 25, 545-580.
Sorkin, R. D., Boggs, G. J., & Brady, S. L. (1982). Discrimination of temporal jitter in patterned sequences of tones. Journal of Experimental Psychology: Human Perception and Performance, 8, 46-57.
Spiegel, M. F., & Green, D. M. (1982). Signal and masker uncertainty with noise maskers of varying duration, bandwidth, and center frequency. Journal of the Acoustical Society of America, 71, 1204-1211.
Stoffer, T. H. (1985). Representation of phrase structure in the perception of music. Music Perception, 3, 191-220.

Strong, W., & Clark, M. (1967a). Perturbations of synthetic orchestral wind-instrument tones. Journal of the Acoustical Society of America, 41, 277-285.
Strong, W., & Clark, M. (1967b). Synthesis of wind-instrument tones. Journal of the Acoustical Society of America, 41, 39-52.
Summerfield, Q., & Culling, J. F. (1992). Auditory segregation of competing voices: Absence of effects of FM or AM coherence. Philosophical Transactions of the Royal Society, London, Series B, 336, 357-366.
Susini, P., McAdams, S., & Winsberg, S. (1999). A multidimensional technique for sound quality assessment. Acustica/Acta Acustica, 85, 650-656.
Sussman, E., Ritter, W., & Vaughan, H. G., Jr. (1998). Attention affects the organization of auditory input associated with the mismatch negativity system. Brain Research, 789, 130-138.
Sussman, E., Ritter, W., & Vaughan, H. G., Jr. (1999). An investigation of the auditory streaming effect using event-related brain potentials. Psychophysiology, 36(1), 22-34.
Teas, D. C., Klein, A. J., & Kramer, S. J. (1982). An analysis of auditory brainstem responses in infants. Hearing Research, 7, 19-54.
ten Hoopen, G., Boelaarts, L., Gruisen, A., Apon, I., Donders, K., Mul, N., & Akerboon, S. (1994). The detection of anisochrony in monaural and interaural sound sequences. Perception & Psychophysics, 56(1), 110-120.
Terhardt, E. (1974). On the perception of periodic sound fluctuations (roughness). Acustica, 30, 201-213.
Thorpe, L. A., & Trehub, S. E. (1989). Duration illusion and auditory grouping in infancy. Developmental Psychology, 25, 122-127.
Trainor, L. J. (1997). Effect of frequency ratio on infants' and adults' discrimination of simultaneous intervals. Journal of Experimental Psychology: Human Perception and Performance, 23(5), 1427-1438.
Trainor, L. J., & Adams, B. (2000). Infants' and adults' use of duration and intensity cues in the segmentation of tone patterns. Perception & Psychophysics, 62(2), 333-340.
Trainor, L. J., & Trehub, S. E. (1989). Aging and auditory temporal sequencing: Ordering the elements of repeating tone patterns. Perception & Psychophysics, 45(5), 417-426.
Trehub, S. E., Bull, D., & Thorpe, L. (1984). Infants' perception of melodies: The role of melodic contour. Child Development, 55, 821-830.
Trehub, S. E., Endman, M. W., & Thorpe, L. A. (1990). Infants' perception of timbre: Classification of complex tones by spectral structure. Journal of Experimental Child Psychology, 49, 300-313.
Trehub, S. E., Schneider, B. A., & Endman, M. (1980). Developmental changes in infants' sensitivity to octave-band noise. Journal of Experimental Child Psychology, 29, 283-293.
Trehub, S. E., Schneider, B. A., Morrongiello, B. A., & Thorpe, L. A. (1988). Auditory sensitivity in school-age children. Journal of Experimental Child Psychology, 46(2), 273-285.
Trehub, S. E., Thorpe, L., & Morrongiello, B. A. (1987). Organizational processes in infants' perception of auditory patterns. Child Development, 58, 741-749.
Treisman, M. (1963). Temporal discrimination and the indifference interval: Implications for a model of the "internal clock." Psychological Monographs, 77(13, Whole No. 576).
Treisman, M., & Rostran, A. B. (1972). Brief auditory storage: A modification of Sperling's paradigm applied to audition. Acta Psychologica, 36, 161-170.
van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Unpublished doctoral dissertation, Eindhoven University of Technology, Eindhoven, Netherlands.
van Noorden, L. P. A. S. (1977). Minimum differences of level and frequency for perceptual fission of tone sequences ABAB. Journal of the Acoustical Society of America, 61, 1041-1045.
Vliegen, J., Moore, B. C. J., & Oxenham, A. J. (1999). The role of spectral and periodicity cues in auditory stream segregation, measured using a temporal discrimination task. Journal of the Acoustical Society of America, 106, 938-945.
Vliegen, J., & Oxenham, A. J. (1999). Sequential stream segregation in the absence of spectral cues. Journal of the Acoustical Society of America, 105, 339-346.
Vos, J., & Rasch, R. (1981). The perceptual onset of musical tones. Perception & Psychophysics, 29(4), 323-335.
Vos, P. G. (1977). Temporal duration factors in the perception of auditory rhythmic patterns. Scientific Aesthetics, 1, 183-199.
Vos, P. G., Mates, J., & van Kruysbergen, N. W. (1995). The perceptual centre of a stimulus as the cue for synchronization to a metronome: Evidence from asynchronies. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 48A(4), 1024-1040.
Vos, P. G., van Assen, M., & Franek, M. (1997). Perceived tempo change is dependent on base tempo and direction of change: Evidence for a generalized version of Schulze's (1978) internal beat model. Psychological Research, 59(4), 240-247.
Warren, R. M. (1999). Auditory perception: A new analysis and synthesis. Cambridge: Cambridge University Press.
Warren, R. M., Bashford, J. A., Healy, E. W., & Brubaker, B. S. (1994). Auditory induction: Reciprocal changes in alternating sounds. Perception & Psychophysics, 55, 313-322.
Warren, R. M., Hainsworth, K. R., Brubaker, B. S., Bashford, J. A., & Healy, E. W. (1997). Spectral restoration of speech: Intelligibility is increased by inserting noise in spectral gaps. Perception & Psychophysics, 59, 275-283.
Warren, R. M., Obusek, C. J., & Ackroff, J. M. (1972). Auditory induction: Perceptual synthesis of absent sounds. Science, 176, 1149-1151.
Warren, W. H., & Verbrugge, R. R. (1984). Auditory perception of breaking and bouncing events: A case study in ecological acoustics. Journal of Experimental Psychology: Human Perception and Performance, 10, 704-712.
Werner, L. A., Folsom, R. C., & Mancl, L. R. (1993). The relationship between auditory brainstem response and behavioral thresholds in normal hearing infants and adults. Hearing Research, 68, 131-141.
Werner, L. A., & Marean, G. C. (1991). Method for estimating infant thresholds. Journal of the Acoustical Society of America, 90, 1867-1875.
Werner, L. A., & Rubel, E. W. (Eds.). (1992). Developmental psychoacoustics. Washington, DC: American Psychological Association.
Wertheimer, M. (1925). Über Gestalttheorie [On Gestalt theory]. Erlangen: Weltkreis-Verlag.
Wessel, D. L. (1979). Timbre space as a musical control structure. Computer Music Journal, 3(2), 45-52.
Winsberg, S., & Carroll, J. D. (1988). A quasi-nonmetric method for multidimensional scaling via an extended Euclidean model. Psychometrika, 53, 217-229.
Winsberg, S., & De Soete, G. (1993). A latent class approach to fitting the weighted Euclidean model: CLASCAL. Psychometrika, 58, 315-330.
Winsberg, S., & De Soete, G. (1997). Multidimensional scaling with constrained dimensions: CONSCAL. British Journal of Mathematical and Statistical Psychology, 50, 55-72.
Wood, N. L., & Cowan, N. (1995). The cocktail party phenomenon revisited: How frequent are attention shifts to one's name in an irrelevant auditory channel? Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 255-260.
Woods, D. L., Alho, K., & Algazi, A. (1994). Stages of auditory feature conjunction: An event-related brain potential study. Journal of Experimental Psychology: Human Perception and Performance, 20, 81-94.
Zera, J., & Green, D. M. (1993). Detecting temporal onset and offset asynchrony in multicomponent complexes. Journal of the Acoustical Society of America, 93, 1038-1052.