Mid-Level Audition

Habilitation à Diriger des Recherches

presented and publicly defended on 21 December 2009 by

Daniel Pressnitzer

Before the jury composed of:

Bertrand Dubus
Christian Lorenzi
Brian C.J. Moore (Rapporteur)
Israel Nelken (Rapporteur)
Roy D. Patterson
Shihab A. Shamma (Rapporteur)

Equipe Audition: Psychophysique, Modélisation, Neurosciences (APMN)
Laboratoire de Psychologie de la Perception, UMR 8158 CNRS – Université Paris Descartes
& Département d’Etudes Cognitives, Ecole Normale Supérieure
29 rue d’Ulm, 75005 Paris
Tel: 01 44 32 26 73
Email: [email protected]

Summary

Hearing transforms the incredibly complex superposition of acoustic sound waves that reaches our ears into meaningful auditory scenes, inhabited for instance by different talkers or musical melodies. During the last ten years, my research has attempted to link the properties of sound scenes (acoustical, or as represented within the peripheral auditory system) to the behavioral performance of listeners confronted with various auditory tasks. The level of analysis can be described as mid-level: the processes that sit between an acoustical description of sound and the use of auditory information to guide behavior. Starting with auditory features, my contributions have focused on the extraction of temporal structure within sound, over different time scales. In particular, pitch perception has been studied by combining psychophysics, physiology, and modeling, and by comparing normal-hearing and hearing-impaired listeners. Then, the temporal dynamics of perceptual organization over yet longer time scales was explored, by introducing a “bistability” paradigm, in which an unchanging ambiguous stimulus produces spontaneous alternations between different percepts in the mind of the listener. This line of research again combined psychophysics and physiology, and it revealed that correlates of perceptual organization may be found very early in the auditory pathways. Finally, we have investigated memory and context effects on perception. We have shown that listeners are remarkably able to create and memorize features from random signals, and that basic features such as pitch or spatial location can be largely influenced by the preceding context. Taken together, these projects suggest that mid-level processes may be distributed across several levels of the auditory pathways, and that they may interact closely in order to deal with natural auditory scenes.


Table of Contents

Chapter 1: Introduction
Chapter 2: Features from the temporal structure of sound
  2.1 Introduction
  2.2 The perception of pitch
  2.3 The perception of envelope regularity
  2.4 Summary and conclusion
Chapter 3: Features from sound sequences
  3.1 Introduction
  3.2 The ups and downs of pitch sequences
  3.3 Envelope constancy
  3.4 Auditory change detection vs. visual change blindness
  3.5 Summary and conclusion
Chapter 4: The temporal dynamics of auditory scene analysis
  4.1 Introduction
  4.2 Change of scenes: Auditory bistability
  4.3 Subcortical correlates of auditory scene analysis
  4.4 Summary and conclusion
Chapter 5: Memory and context effects
  5.1 Introduction
  5.2 Rapid formation of robust auditory memories
  5.3 Effect of preceding context on auditory features
  5.4 Summary and conclusions
Chapter 6: A distributed proposal for scene analysis
  6.1 Introduction
  6.2 The neural correlates of auditory scene analysis
  6.3 A comparison of different functional cartoons
  6.4 Summary and conclusions
Chapter 7: Research Project
  7.1 Overview
  7.2 Features for sound recognition
  7.3 Bistability as perceptual decision
  7.4 Mechanisms for memory of noise and pitch hysteresis
Chapter 8: Conclusion
References


Chapter 1

Introduction

Sound is a one-dimensional phenomenon. The acoustic pressure wave that impinges on one of our eardrums can only do one of two things: it can push it a little bit, or it can pull it a little bit. This is all it can do. Moreover, the information carried by sound is totally “transparent”: as it propagates through the air, it sums linearly at each point. As a consequence, at any one moment in time, the little push or the little pull effected on the eardrum may be caused by one sound source out there in the world, but it may also be caused by two sound sources, or by many sound sources indeed.

These are trivial observations, but they are worth remembering when we consider how different they feel from our inner auditory world, which, we know, clearly must have many, many dimensions. A typical auditory scene may be inhabited by, for instance, different talkers, and what one of them says can effortlessly be understood even though there is music in the background. Alternatively, we can on a sudden whim ignore the talker and switch our focus to the music, to try to remember who wrote this particular piece. How do we go from the one-dimensional acoustic waveform to such lively auditory scenes? Understanding this has been (and probably will be for the foreseeable future) one of the main goals of the scientific study of hearing.

Each author has, at one point or another, tried to convey the intricacy of the problem with a personal metaphor. Helmholtz (e.g. 1877) evokes the interior of a 19th-century ball-room, complete with “a number of musical instruments in action, speaking men and women, rustling garments, gliding feet, clinking glasses, and so on”. He goes on to describe the resulting sound field as a “tumbled entanglement of the most different kinds of motion, complicated beyond conception”. Closer to home, but to similar effect, Shamma (2008) goes to a “crowded reverberant nightclub, with a hubbub of multiple conversations amidst blaring music”. My own pet example, which, unfortunately, tends to draw more and more puzzled looks when I use it in class, is that of listening to a jazz tune played by a famous trumpet player. In no time at all, or so it felt, I used to be able to tell whether it was Miles Davis or Chet Baker.

How did I do this? It seems I must first have been able to extract a wealth of “features” from the one-dimensional sound waves reaching my ears. These features may be described as pitch, loudness, timbre, tempo, and so on. But then, I must also have been able to parse the incoming flow of features into what we may call “streams”: this pitch goes with the piano line, and not with the trumpet. And finally, all of this ongoing processing must somehow have contacted my long-term memory to produce recognition. That it sometimes works is, frankly, quite baffling.


The rest of this thesis describes past research projects that investigated some of these different levels of processing. The second chapter concerns the extraction of auditory features from sound, and more precisely of features that are related to the temporal structure of sound. The third chapter summarizes a line of work on change-detection mechanisms, or, equivalently, mechanisms that produce second-order features when presented with a temporal sequence of sounds. The fourth chapter addresses the auditory scene analysis issue proper, by summarizing behavioral and electrophysiological work that looked at the temporal dynamics of scene analysis. The fifth chapter then describes recent studies concerned with the effects of context and memory on perception.

The sixth chapter is probably the most controversial one. It puts forward the view that sequencing the problems solved by hearing in terms of feature extraction, followed by scene analysis, followed by context and memory, may not parallel the way hearing is done by the brain. It is certainly a convenient way to separate and tackle the many issues involved, which is why I will still use this framework to organize the thesis. But I will argue that feature extraction, organization, and memorization probably interact in very fundamental ways in order to deal with complex auditory scenes.

My research project, summarized in the seventh chapter, builds on these observations. The issue of auditory features will be re-assessed in the context of natural sound recognition, with the idea that features may adaptively code for the sound set and the task at hand. Scene analysis will be related to the general issue of perceptual decision-making. The memory and context strand will address the possible neural bases of the strong perceptual effects we are now uncovering.


Chapter 2

Features from the temporal structure of sound

2.1 Introduction

As we stressed in the introduction, acoustic information has only one dimension, and this dimension is time. A sound consists of an unfolding series of instantaneous pressure values over time. It also turns out that many interesting sounds contain some form of repetition over time. Vowels in speech, communication sounds in nature, and individual notes in music all contain a form of temporal regularity. This is because of the physics of sound production. An efficient way to produce a sound that lasts for some time is to take advantage of the resonant properties of physical bodies (Patterson et al., 2008). Resonances will favor some vibration modes over others, and may interact with the ongoing source of excitation to further reinforce a limited set of vibration periods. It is therefore tempting to test which aspects of this temporal structure are used by the hearing sense to form “auditory features”.

Following Helmholtz and Ohm, however, many models of auditory perception use spectral representations to represent temporal structure (Ohm, 1843; Helmholtz, 1877). The frequency content of a sound, as defined by Fourier analysis, is of course just another way to represent the one-dimensional waveform. The two representations, time and complex frequency, are equivalent in that it is possible to go from one to the other and back without any loss of information. Why is it, then, that the controversy between spectral and temporal models of hearing is one of the longest-standing ones (de Cheveigné, 2005), with no signs of abating anytime soon?

The problem possibly lies just after the acoustical description of sound. Crucially, we know that both types of information, spectral and temporal, are available in some form at different stages of the auditory pathways. The decomposition into frequency channels imposed by the cochlea is preserved (or even possibly enhanced) from the auditory nerve up to primary auditory cortex (Kiang, 1965; Shamma, 1985; Bitterman et al., 2008). However, within frequency channels, temporal information is also preserved at different time scales (Cariani and Delgutte, 1996; Elhilali et al., 2004; Joris et al., 2004). A more precise description of the controversy is then: is the auditory system using the timing of neural spikes as the cue to temporal structure, or is it rather using frequency selectivity to identify harmonic relationships between components?

To clarify these issues, let us consider a sound that contains a high level of temporal structure, such as the sung vowel /a/. A highly simplified model of the representation of the vowel in the peripheral auditory system, the output of a gammatone filterbank (Patterson et al., 1995), is presented in Figure 2.1.
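To make the peripheral model concrete, here is a minimal sketch of a gammatone filterbank in Python. It is not the implementation used in the studies cited here: the impulse-response formula and the Glasberg and Moore (1990) ERB bandwidths are standard, but the logarithmic channel spacing, the gain normalization, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def erb(fc):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter
    at centre frequency fc, after Glasberg and Moore (1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Impulse response of a gammatone filter at centre frequency fc."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)  # standard bandwidth parameter of the gammatone
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sum(np.abs(ir))  # crude gain normalization (assumption)

def gammatone_filterbank(signal, fs, n_channels=30, fmin=100.0, fmax=8000.0):
    """Filter the signal through log-spaced gammatone channels; returns
    the centre frequencies and one output waveform per channel."""
    fcs = np.geomspace(fmin, fmax, n_channels)
    outputs = np.stack([fftconvolve(signal, gammatone_ir(fc, fs))[:len(signal)]
                        for fc in fcs])
    return fcs, outputs
```

Plotting the rectified channel outputs against time produces a picture qualitatively similar to Figure 2.1, with fine-structure, envelope, and excitation-pattern cues all visible.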


[Figure 2.1 appears here: centre frequency (0.1–10 kHz, log scale) vs. time (0–250 ms), with the envelope, excitation pattern, and fine structure annotated.]

Figure 2.1. The output of a simple model of peripheral auditory filtering, for the sung vowel /a/. Warm colors indicate activity in frequency channels. Several cues are available to infer the periodicity of the input sound: among others, temporal cues in the fine structure and in the envelope within channels, and spectral cues in the excitation pattern across channels.

Several cues are available to infer the periodicity of the vowel. In some channels, the timing between maxima of the waveform, or “temporal fine structure”, corresponds to the period of the vowel. The sum of the activity over time, or “excitation pattern”, has all the telltale signs of a periodic sound: all active frequency channels fall at integer multiples (harmonics) of the lowest one, the fundamental frequency F0. This is only true up to a limit, though, as for higher harmonics the pattern of activation does not distinguish between adjacent harmonics. These harmonics are said to be unresolved – at which harmonic number this actually happens is currently unclear (Shackleton and Carlyon, 1994; Shera et al., 2002; Bitterman et al., 2008). In the unresolved channels, beating between harmonics introduces an additional cue in the modulation of the “envelope”. Yet other cues might be found in the phase pattern between channels (Shamma and Klein, 2000; de Cheveigné and Pressnitzer, 2006).

The most recent incarnations of the controversy make use of the full gamut of available cues. For instance, there is currently a debate on the use of temporal fine structure (e.g. Moore et al., 2009), as opposed to envelope information (e.g. Bernstein and Oxenham, 2003). This is important, as impaired processing of fine structure may be one of the consequences of sensorineural hearing loss, whereas envelope processing is usually preserved (Lorenzi et al., 2006).

It is not possible to decide a priori which cues will be used to form auditory features by human listeners. One possible way forward is to construct artificial stimuli that eliminate some of the cues while preserving others. Then, by measuring how well people can do with the remaining cues, theories can be constructed and evaluated. The experiments described below followed this general method, looking at different time scales of temporal structure.


2.2 The perception of pitch

2.2.1 The lower limit of melodic pitch

A first basic issue is that all natural sounds that produce pitch contain some form of temporal regularity, but not all sounds that contain temporal regularity produce pitch. This can easily be demonstrated by slowing down a sound file that contains a musical note, such as the voice shown in Figure 2.1. What is first heard as a clear pitch progressively transforms into a form of rough amplitude modulation, and finally into a rhythm where the individual periods are heard as distinct beats. What is the lowest repetition rate that is perceived as pitch?

Together with Roy Patterson and Katrin Krumbholz, we approached the issue using two objective measures of performance. The first was a measure of rate-discrimination thresholds (Krumbholz et al., 2000). We used bandpass-filtered harmonic complex tones, in order to vary independently the fundamental frequency, F0, and the spectral region containing energy. We tested F0s ranging from 16 to 256 Hz. The first feature of the results was that rate-discrimination thresholds increased substantially when F0 decreased from 64 Hz to 16 Hz. Spectral region also had an influence: the F0 at which thresholds became poorer was higher for higher spectral regions. Overall, these results were consistent with previous reports using subjective measures (Ritsma, 1962; Moore, 1973).

A companion study introduced an “operational” definition of pitch (Pressnitzer et al., 2001b). We measured the lower limit of melodic pitch (LLMP), defined as the lowest repetition rate for which listeners were able to perform a simple melody task. The melody task was chosen to encourage the use of pitch cues, as we hypothesized that other auditory cues such as roughness or timbre – which covary with repetition rate and thus potentially contribute to discrimination thresholds – should be less useful for melodies (Moore and Rosen, 1979; Semal and Demany, 1991). We used random melodies of 4 notes each, based on the chromatic scale (i.e. all notes equiprobable with a semitone resolution). Two successive melodies were presented that were identical except for a single note, changed by one semitone. The task of the listener was to indicate which note had changed.

Results are illustrated in Figure 2.2. For broadband sounds, the lowest repetition rate that supported the melody task corresponded to a period of about 32 ms. For bandpass-filtered sounds identical to the ones used in the discrimination-threshold experiment (Krumbholz et al., 2000), we found that the LLMP increased steeply with spectral region. Phase had a complex effect on the LLMP. When using alternating phase between spectral components (Patterson, 1987), and thus introducing a higher pseudo-periodicity in the envelope of the waveform, the LLMP decreased. When another phase condition was used that minimized the peaks in the stimulus envelope (Schroeder, 1970), no effect on the LLMP was observed.

A model was used to simulate these results, based on the autocorrelation model of pitch (Licklider, 1951; Meddis and Hewitt, 1991a; b). We introduced a modification to this model by applying a linear weighting function to the autocorrelation functions, which tapered off towards long delays. This single modification was sufficient to account for all of the effects of acoustic parameters on the LLMP: it restricted the range of pitches for broadband sounds, and other properties of the model were sufficient to reproduce the effects of filtering and phase.
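The following sketch illustrates the principle of the weighted autocorrelation model. It is a toy re-implementation, not the published code: the half-wave rectification, the simple summation across channels, and the 33-ms linear taper (matching the broadband LLMP) are simplifying assumptions.

```python
import numpy as np

def weighted_sacf_pitch(channel_outputs, fs, max_lag_s=0.033, fmax=1000.0):
    """Summary autocorrelation pitch estimate with a linear lag weighting
    that tapers to zero at max_lag_s, mimicking a lower limit of melodic
    pitch for broadband sounds. Assumes each channel output is longer
    than max_lag_s."""
    max_lag = int(max_lag_s * fs)
    min_lag = max(1, int(fs / fmax))     # ignore implausibly short periods
    sacf = np.zeros(max_lag)
    for x in channel_outputs:            # e.g. gammatone filterbank outputs
        x = np.maximum(x, 0.0)           # crude half-wave rectification
        acf = np.correlate(x, x, mode='full')
        sacf += acf[len(x) - 1:len(x) - 1 + max_lag]   # lags 0..max_lag-1
    sacf *= 1.0 - np.arange(max_lag) / max_lag         # linear taper
    best_lag = min_lag + int(np.argmax(sacf[min_lag:]))
    return fs / best_lag                  # pitch estimate in Hz
```

Because the taper zeroes out lags beyond about 33 ms, repetition rates below roughly 30 Hz produce no usable peak, which is the intended behavior.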


Figure 2.2. Illustrations of stimuli and results for the lower limit of melodic pitch experiments (Pressnitzer et al., 2001b; Pressnitzer and Patterson, 2001). A. The repetition rate (Rrep) of band-pass filtered harmonic complex tones was lowered until it was impossible for listeners to perform a melody task. The phase relation between harmonics was a parameter. For cosine phase (CPH), clear peaks in the envelope can be seen. For alternating phase (APH), the fine structure still has a period of 1/Rrep but the envelope repeats at twice that rate. For Schroeder phase (SPH), the peaks in the waveform are reduced. B. Average results obtained with masking noise, for the different phase conditions and for different spectral regions (indicated by the lower frequency cutoff of the filter pass-band, Fc). C. Same, without masking noise.

This idea of a limit in the time intervals used for computing pitch bears a strong resemblance to the auditory image model, which had a maximal interval duration as a parameter (Patterson et al., 1995). It is also closely related to a scheme proposed by Moore (1982). In this scheme, sounds are first decomposed into frequency channels and neural spikes are generated. Then, inter-spike interval histograms are computed. Importantly, not all intervals contribute to pitch. Rather, Moore (1982) suggested that the range of intervals should be a function of the characteristic frequency. Our model partially implemented this idea by setting an interval limit; however, the limit was not frequency-dependent, in order to reduce the number of parameters fitted in the simulation. More recent data and models revisited the issue and found evidence in favor of a frequency-dependent limit (Bernstein and Oxenham, 2005; de Cheveigné and Pressnitzer, 2006).

2.2.2 Distortion products for harmonic complex tones

There is a spectral twist to the LLMP story (Pressnitzer and Patterson, 2001). The data of Pressnitzer et al. (2001b) were collected with masking noise, as it is known that non-linear processing in the cochlea may re-introduce quadratic and cubic distortion products in the spectral region that has been filtered out (e.g. Goldstein, 1967). However, during the same experiment, we also collected data without masking noise. These data are reproduced in Figure 2.2C. In strong contrast to the case where masking noise was present, the LLMP was now found to be independent of spectral region for cosine-phase complexes. Phase had a strong effect: the LLMP for alternating phase was also independent of spectral region, but one octave lower. Schroeder-phase complexes produced results resembling the case with masking noise.


We interpreted these findings as reflecting the presence of a “distortion spectrum” when masking noise was omitted. A follow-up experiment measured the amplitude and phase of the distortion products for bandpass-filtered harmonic complex tones. Distortion products are usually measured with much simpler sounds, such as two-tone complexes (Goldstein, 1967). The prediction for harmonic complex tones is that distortion may be found at several harmonics of F0, in a complete distortion spectrum, as different pairs of harmonics have the right frequency separation to produce distortion products at F0, 2F0, 3F0, etc. For cosine phase, the estimated amplitude of the distortion component was found to be surprisingly large: as high as −10 dB relative to the spectrum level of the stimulus, or 15 dB above hearing threshold. Phase had a strong influence on the distortion spectrum. Alternating phase only produced sizeable distortion at even harmonics of the distortion spectrum, in effect producing an octave jump in the distortion spectrum. Schroeder-phase sounds produced only a small amount of distortion, which was not measurable in all subjects. We interpreted the effect of phase as constructive and destructive interference at the site of generation of the distortion products (Pressnitzer and Patterson, 2001).

These results explain the effect of masking noise on the LLMP. For cosine phase without masking noise, a whole distortion spectrum is always produced in the low spectral region, and thus the LLMP is just as good for all spectral regions. For alternating phase, the distortion spectrum jumps up one octave, and so the LLMP improves by one octave. Schroeder-phase sounds produce less distortion, so the results resemble those obtained with masking noise. These observations encourage speculation about a possibly useful side-effect of auditory nonlinearity: when periodicity cues are only present in high spectral regions, nonlinearity re-introduces them in low spectral regions. They may then be used to extract pitch, quite possibly by temporal mechanisms (Pressnitzer et al., 2001b).

2.2.3 Objections to autocorrelation I: first-order vs. all-order analysis

The autocorrelation model is a convenient way to estimate the amount of temporal information within frequency channels, but it is doubtlessly a functional simplification of the neural mechanisms used by the auditory system to extract pitch. At the time of the LLMP study, a controversy was emerging as to the use of first-order vs. all-order periodicity information (Kaernbach and Demany, 1998; Carlyon et al., 2002; Yost et al., 2005). The controversy is as follows: assume that the auditory system measures the time intervals between neural spikes to derive a pitch measure. Is the time-interval measure only performed for successive spikes (first-order), or between all possible pairs of spikes (all-order, akin to autocorrelation)? Kaernbach and Demany (1998) claimed that the pitch of specially designed click trains containing some second-order but no first-order regularity was very weak at best, and thus proposed that only first-order regularity was extracted to estimate pitch.

With Ian Winter and Alain de Cheveigné, we investigated this issue, noting that the predictions derived from the autocorrelation of the waveform may be totally different from predictions derived from simulated or recorded spike trains in the auditory nerve.
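The distinction at stake can be stated in a few lines of code. Below is a sketch (my own illustration, not code from the studies cited) of the two interval statistics computed from a single spike train; all parameter values are arbitrary.

```python
import numpy as np

def interval_histograms(spike_times, max_interval=0.04, bin_width=1e-4):
    """First-order vs. all-order inter-spike interval histograms.
    spike_times: sorted spike times in seconds.
    The first-order histogram uses successive spikes only; the all-order
    histogram uses all ordered pairs of spikes, and is the spike-train
    analogue of autocorrelation."""
    bins = np.arange(0.0, max_interval + bin_width, bin_width)
    first_order = np.diff(spike_times)
    all_order = [t2 - t1
                 for i, t1 in enumerate(spike_times)
                 for t2 in spike_times[i + 1:]
                 if t2 - t1 <= max_interval]
    h_first, _ = np.histogram(first_order, bins)
    h_all, _ = np.histogram(all_order, bins)
    return bins[:-1], h_first, h_all
```

If spike failures delete some events from the neural response, intervals that were second-order in the stimulus become first-order in the spike train, which is why the two statistics are hard to disentangle physiologically.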


In a first behavioral and modeling study (Pressnitzer et al., 2002), we found that listeners were reliably able to match the pitch of click trains without first-order regularity, similar to those of Kaernbach and Demany (1998). Moreover, a substantial pitch shift was observed between click trains that had similar peaks in their waveform autocorrelation. A model suggested an explanation: not all clicks in a click-train stimulus will produce a spike in all auditory nerve fibers. Thus, some second-order waveform regularity will be transformed into first-order neural regularity. In fact, to recover the order of waveform regularity, information has to be pooled across a large number of fibers with heuristic thresholding (Carlyon et al., 2002). It may be more parsimonious to assume that any regularity within spike trains is exploited if available (de Cheveigné, 2005).

These results were confirmed by single-unit recordings in the ventral cochlear nucleus of anaesthetized guinea pigs (Pressnitzer et al., 2004). The cochlear nucleus is the first mandatory synapse after the auditory nerve. It is thus an early processing stage in the auditory pathways, but at the same time it contains a variety of unit types that have been implicated in pitch encoding (Wiegrebe and Winter, 2001; Winter, 2005). All of the units we recorded from exhibited some temporal regularity in their spike trains, irrespective of the order of regularity in the stimuli. A qualitative correlate of the pitch shift observed behaviorally was found in the interspike-interval distributions. However, we could not propose a neurometric scheme that exactly matched the behavioral results from the neural spike trains. Such a scheme should likely take the whole shape of the interspike-interval distributions into account, rather than relying solely on the periodicity peaks. Alternatively, the selectivity to pitch may be refined at later processing stages (e.g. Bendor and Wang, 2005).

2.2.4 Objections to autocorrelation II: long delays

There is another, long-standing criticism of the autocorrelation model. Autocorrelation operates by comparing a signal with a delayed version of itself. The original formulation of Licklider (1951) suggested that the delay operation may be achieved by conduction delays in neuronal axons. No anatomical evidence has been found for such delay lines. An operation equivalent to autocorrelation may rather be achieved by means of membrane conductance properties (Meddis and Hewitt, 1991a; Winter, 2005). We have also suggested that it may be possible to “synthesize” long delays by exploiting the multiple phase shifts available in adjacent frequency channels (de Cheveigné and Pressnitzer, 2006). The phase shifts used by the model may stem from cochlear mechanics itself. An interesting feature of the model, in spite of its computational cost, is that the longest delay available is directly related to the bandwidth of auditory filters, which changes with frequency. This prediction happens to match the behavioral LLMP data, and is also fully compatible with the frequency-dependent processing scheme of Moore (1982).


2.3 The perception of envelope regularity

2.3.1 Roughness

When pitch fades away at low repetition rates, another dimension of auditory perception takes over. In this region, the sensation produced by envelope beats has been described as “roughness” (Helmholtz, 1877). This auditory feature was the cornerstone of the wide-ranging theory of musical consonance proposed by Helmholtz. He demonstrated that all intervals considered consonant by Western music theory (the octave, the fifth) are those that produce little or no roughness, whereas highly dissonant intervals (the tritone, the minor second) produce a large amount of roughness. He thus suggested that there was a sensory component to consonance, in addition to the acknowledged importance of musical acculturation (Pressnitzer et al., 2000a).

During my PhD, with Stephen McAdams, we examined the acoustic cues to roughness. Again, spectral (Plomp and Levelt, 1965) and temporal (Terhardt, 1974) theories of roughness exist. To contrast spectral vs. temporal cues, we manipulated the phase of amplitude-modulated signals (Pressnitzer and McAdams, 1999). Sounds with the same spectrum but different temporal envelopes were compared, as well as sounds with the same temporal envelope but different fine structures. Subjective judgments of roughness increased regularly with increasing modulation of the envelope for the same spectrum. In addition, temporal fine structure had a role: sounds with the same physical envelope produced different amounts of roughness. A simple model suggested that manipulating the fine structure of the waveform could change the shape of the envelope within frequency channels. A second experiment, again using phase manipulations, showed that envelope shape had a strong effect on subjective roughness: envelopes with a sharp rise and a slow decay produced more roughness than envelopes with a slow rise and a sharp decay (Pressnitzer and McAdams, 1999). We concluded that roughness was influenced by both the modulation depth and the temporal asymmetry of the envelope.

2.3.2 Physiological representations of temporal asymmetry

Patterson (1994a; b) had already observed effects of envelope shape on the perception of “damped” and “ramped” tones. Damped tones are produced by applying a decaying exponential modulation function (with a given half-life and repetition period) to a pure-tone carrier. Ramped tones are time-reversed versions of damped tones. The perception of damped and ramped tones has been described as the superposition of a tonal component, corresponding to the carrier, and a drumming component, corresponding to the envelope. Over a range of half-lives, listeners report that ramped tones have a stronger tonal component and a weaker drumming component than damped tones (Patterson, 1994a; b).
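These stimuli are simple to construct. The sketch below is my own illustration of the recipe; the carrier frequency, half-life, and repetition period are illustrative values, not those of any particular experiment.

```python
import numpy as np

def damped_tone(fs=44100, carrier=1000.0, half_life=0.004,
                period=0.025, duration=0.5):
    """Damped tone: a pure-tone carrier multiplied by an exponentially
    decaying envelope that restarts every `period` seconds; the envelope
    halves in amplitude every `half_life` seconds."""
    t = np.arange(int(duration * fs)) / fs
    envelope = 0.5 ** ((t % period) / half_life)
    return envelope * np.sin(2 * np.pi * carrier * t)

def ramped_tone(**kwargs):
    """Ramped tone: simply the time-reversed damped tone."""
    return damped_tone(**kwargs)[::-1]
```

Time reversal preserves the long-term magnitude spectrum, which is what makes the pair useful for isolating the perceptual role of envelope asymmetry.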

With Ian Winter and Roy Patterson, we investigated the neural representation of envelope asymmetry at the level of the ventral cochlear nucleus (Pressnitzer et al., 2000b). Damped and ramped tones were presented at the best frequency of each unit, with different half-lives. We found that temporal asymmetry was reflected in several statistics of the spike trains. First, ramped tones produced more spikes on average than damped tones. Second, spikes were more precisely locked to the envelope period for damped tones than for ramped tones. Interestingly, not all unit types exhibited the same degree of temporal asymmetry, suggesting a hierarchy of asymmetry coding.

These results were extended at a higher level of the auditory pathways, the inferior colliculus (Neuert et al., 2001). The inferior colliculus is a major converging stage of auditory processing before the thalamus and cortex. The asymmetry in neural responses observed in the inferior colliculus was substantial, and a larger proportion of units showed an asymmetry than in primary auditory cortex (Lu et al., 2001). Using a statistical method to correlate firing-rate asymmetry with behavioral data (Lu et al., 2001), we found that neural asymmetry paralleled behavioral data at all stages where it had been recorded (cochlear nucleus, inferior colliculus, and cortex). The main difference between stages was the proportion of units displaying an asymmetry (Neuert et al., 2001).

The degree of temporal asymmetry of a sound is usually not considered an auditory feature per se, but it could modulate several features. We have already seen that roughness is affected by temporal asymmetry (Pressnitzer and McAdams, 1999). It may also influence loudness and duration judgments (Ries et al., 2008), or participate in a possible bias for looming (ramped) compared to receding (damped) sounds (Neuhoff, 1998).

2.4 Summary and conclusion

The one-dimensional sound waveform gives rise to a many-dimensional auditory feature space. It is as if sound were being scrutinized from several angles by the auditory system. Some of my past research projects have focused on features related to the temporal structure of sound, pitch and amplitude modulations, using a combination of behavioral, modeling, and physiological approaches.

Although this may have been quite predictable, these studies confirm that it is essential to consider peripheral processing when thinking about features, if only to define in a meaningful way the cues manipulated experimentally. Phase and waveform fine structure may change internal envelope cues, because of filtering and transduction (Pressnitzer and McAdams, 1999; Pressnitzer et al., 2001b). Non-linear processing may create behaviorally-relevant spectral cues (Pressnitzer and Patterson, 2001). The distinction between first-order and all-order regularities in the stimulus may be blurred by the spike-generation process (Pressnitzer et al., 2002; 2004). Thus, a possible way forward in the time vs. spectrum controversies may be to work towards new definitions of such cues, in which peripheral processing is explicitly taken into account (Gilbert and Lorenzi, 2006; Sheft et al., 2008). Of course, this may only be possible once a consensus is reached about the main aspects of peripheral auditory processing, such as the limits of frequency selectivity and neural phase locking in humans (Heinz et al., 2001; Shera et al., 2002).


Chapter 3

Features from sound sequences

3.1 Introduction

All of the features discussed in the previous chapter were studied for single sounds. However, sounds often come in sequences that extend over time. In such cases, whether for musical melodies (von Ehrenfels, 1937) or speech utterances, it seems that the properties of the whole sequence are something more than the sum of the features of each individual element. For instance, to understand that someone has asked a question, the whole pitch pattern has to be taken into account, and not only the final pitch value (otherwise, it would seem that women always ask questions whereas men always make assertions, which is clearly not the case).

Sequence processing could be qualitatively different from multiple single-element processing. For instance, for pitch sequences, pitch changes could be encoded explicitly. This hypothesis has received support from the recent work of Demany and colleagues. In a series of experiments, they observed that listeners were able to detect the direction of a frequency change between two pure tones, even though one of the tones was embedded in a tone complex and could not be isolated perceptually (Demany and Ramos, 2005; Demany et al., 2008). Such automatic encoding of frequency differences may serve to bind the successive sounds of a sequence (Demany and Ramos, 2005). More generally, it could be an example of a change-detection mechanism, useful for monitoring unfolding auditory scenes (Chait et al., 2007).

This chapter presents experiments that investigated sequence processing for pitch and loudness, and then for the envelope. Finally, the mechanisms of auditory change detection are compared to visual change detection.

3.2 The ups and downs of pitch sequences

In two recent studies (conducted by Marion Cousineau, my PhD student, and in collaboration with Laurent Demany), we compared listeners’ ability to process sequences of pitch vs. sequences of loudness (Cousineau et al., 2009; Cousineau et al., submitted). The main novelty was the use of a psychophysical method aimed at dissociating the discriminability between elements of a sequence, on the one hand, from sequence processing per se, on the other hand. Indeed, the default hypothesis when sequences are presented to listeners should be that each sequence element is processed independently. If performance measures on the sequence can be derived from performance measures on individual elements (by factoring in multiple comparisons), then no additional sequence-processing mechanism is required.

In our task, we first equated the discriminability of individual elements on the auditory features of interest.



Figure 3.1. Method and results for the sequence-processing experiments of Cousineau et al. (2009; submitted). A. Listeners had to perform a same/different task on binary sequences. For pitch sequences, harmonic complex tones were used with two possible pitch values, separated by a fundamental frequency difference of ΔF0. For loudness sequences, pink noises were used, separated by a level difference of ΔSPL. The Δ-values were adjusted for each listener and condition to ensure that the discriminability of single elements (sequences of N=1) was identical. B. Results (d’; large values indicate good performance) for cochlear-implant users (CI), normal-hearing listeners (NH), and NH listeners presented with a cochlear-implant simulation (noise vocoder, NH-voc). For NH listeners, as more elements are added, a sequence-processing advantage is observed for pitch. For CI and NH-voc listeners, this advantage is not observed.

For pitch, we measured the difference in fundamental frequency, ΔF0, required to reach a certain level of performance in a two-sound pitch discrimination task. For loudness, we measured the difference in sound pressure level, ΔSPL, required to reach the same level of performance in a two-sound loudness discrimination task. Signal detection theory was used (Macmillan and Creelman, 2001), and in both cases performance was set to d’ = 2. Then, we constructed binary sequences of sounds in which pitch or loudness could only take one of two values, separated by ΔF0 or ΔSPL, respectively. Such sequences are illustrated in Figure 3.1A. Listeners were finally presented with two successive sequences, which could either be the same or differ on a single element chosen at random. Their task was to perform a same/different judgment. The procedure was largely inspired by McFarland and Cacace (1992), who, however, did not use signal detection theory. Our version ensures that the discriminability between individual elements of the binary sequences is the same, for each listener, whether it is pitch or loudness that is varied.
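The default hypothesis of independent element processing can be made explicit with a simulation. The following sketch is my own illustration, not the published model: it assumes unit-variance Gaussian internal noise on each element, a maximum-difference decision rule, and an arbitrary criterion.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)

def independent_observer_dprime(d_elem, n_elem, criterion=2.5,
                                n_trials=100_000):
    """Monte Carlo prediction for the same/different sequence task when
    each element is processed independently. Element-wise internal
    differences between the two sequences have variance 2 (two noisy
    observations per element); on 'different' trials one element also
    shifts by d_elem. The observer answers 'different' when the largest
    absolute difference exceeds the criterion."""
    noise = rng.normal(0.0, np.sqrt(2.0), size=(n_trials, n_elem))
    fa = (np.abs(noise).max(axis=1) > criterion).mean()    # 'same' trials
    noise[:, 0] += d_elem        # change one element (position irrelevant)
    hit = (np.abs(noise).max(axis=1) > criterion).mean()   # 'different' trials
    return norm.ppf(hit) - norm.ppf(fa)  # performance expressed as d'

# Predicted performance drops as elements are added, unlike the flat
# pattern observed behaviorally for pitch sequences.
for n in (1, 2, 4):
    print(n, round(independent_observer_dprime(2.0, n), 2))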


Thus, any difference in discriminability for sequences cannot be traced back to the discriminability of individual elements. Rather, such differences point to sequence-processing mechanisms.

We found a reliable sequence-processing advantage for pitch compared to loudness (Cousineau et al., 2009). Pitch sequences were discriminated better than loudness sequences (see also Moore and Rosen, 1979). Moreover, we observed that performance for pitch was unchanged for sequences of one, two, and four elements. This is surprising: adding elements to sequences should increase the difficulty of the task. The default hypothesis – independent processing of sequence elements – was tested by using an ideal observer model, again based on signal detection theory. Listeners outperformed the model for pitch, and underperformed the model for loudness. This latter observation is easily explained by assuming that memory limitations prevented optimal performance. For pitch, however, we must assume that an additional feature was used to encode the sequences. In the paper, we suggested that this feature might be frequency shift (or contour), as defined by Demany and Ramos (2005).

Interestingly, the sequence-processing advantage only occurred for stimuli that contained resolved harmonics. This led to the prediction that people who use a cochlear implant may not display any pitch-sequence processing advantage. The cochlear implant (CI) is a surgically-implanted device that bypasses cochlear processing to stimulate the auditory nerve directly. It is used to restore auditory function in individuals with severe deafness. However, because of current limitations of the technique, CI users generally cannot resolve the individual harmonics of complex tones (Laneau et al., 2004). We used the psychophysical method just described to investigate pitch and sequence processing for CI users, normal-hearing listeners (NH), and normal-hearing listeners presented with noise-vocoder simulations of cochlear implant processing (NH-voc). Our paradigm is well suited to compare performance across listener groups, as the discriminability between sequence elements is measured for each listener individually. Thus, the large variability in pitch discriminability expected between (and within) the CI, NH, and NH-voc groups (McDermott, 2004; Moore and Carlyon, 2005) is factored out.

Results of this second study are presented in Figure 3.1B. The advantage for pitch sequence processing was again observed for NH listeners. However, no advantage was found for CI or NH-voc listeners. Loudness sequence processing, in contrast, was just as good for all three groups. This suggests that CI users display a specific impairment in pitch sequence processing, which is likely due to the nature of the cues transmitted by the current generation of implants (see the similar pattern of results for CI and NH-voc listeners). This finding is consistent with the growing number of studies showing that melody perception is especially challenging for CI users (e.g. Cooper et al., 2008).

In summary, these two studies suggest that there may be extra features computed from pitch sequences, in addition to the pitch of each element. Consistent with this hypothesis, brain imaging studies showed that secondary auditory regions in the right hemisphere respond more strongly to melodies with pitch changes than to sequences of tones with a fixed pitch (Patterson et al., 2002).

There is also neuropsychological (Johnsrude et al., 2000) and behavioral (Semal and Demany, 2006) evidence for a possible dissociation between the detection of a pitch difference and the identification of the direction of a pitch change. Finally, it is tempting to relate our findings to the role of pitch as one of the main structural elements in Western music (Dowling and Harwood, 1986). However, our experiments used pitch shifts that were usually much smaller than a musical semitone, so it remains to be seen whether the pitch sequence processing advantage remains for larger steps (McDermott et al., 2008).

3.3 Envelope constancy

Another line of evidence for sequence-specific features comes from work on temporal envelope perception (conducted by Marine Ardoint, a PhD student in the team supervised by Christian Lorenzi). In this study, we measured the ability of listeners to recognize a time-stretched or time-compressed version of a random envelope. Listeners were presented with a broadband noise of about 1 s, modulated by a random envelope. Their task was to compare this reference stimulus with two time-warped comparison stimuli: one that simply involved compression or stretching in time, and another to which a time reversal was additionally applied. Listeners had to indicate which comparison stimulus matched the reference envelope without time reversal. As expected, performance decreased with increasing amounts of time warping. We then compared the results with predictions from a correlation-based ideal-observer model. The model cross-correlated the envelopes of the reference and comparison stimuli, and chose the comparison stimulus that produced the higher correlation. Listeners outperformed the model for most time-warping values. This again suggested that they were able to use extra features of the unfolding temporal envelope, in addition to a simple moment-to-moment comparison of amplitude values between signals (Ardoint et al., 2008).
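A minimal version of such a correlation-based decision rule is sketched below. It is my own illustration, not the published model; it assumes that envelopes are compared after simple linear resampling to a common length, which is only one of many possible choices.

```python
import numpy as np

def resample(env, n):
    """Linearly resample an envelope to n points (undoes uniform
    time-stretching or compression)."""
    return np.interp(np.linspace(0.0, 1.0, n),
                     np.linspace(0.0, 1.0, len(env)), env)

def model_choice(reference, comp_a, comp_b):
    """Correlation-based observer: pick the comparison envelope that
    correlates best with the reference after resampling."""
    n = len(reference)
    r_a = np.corrcoef(reference, resample(comp_a, n))[0, 1]
    r_b = np.corrcoef(reference, resample(comp_b, n))[0, 1]
    return 'a' if r_a >= r_b else 'b'
```

Listeners beating this benchmark is what motivates positing envelope features beyond a moment-to-moment amplitude comparison.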

3.4 Auditory change detection vs. visual change blindness

The data of Demany and Ramos (2005) suggested that the auditory system benefits from sophisticated change-detection mechanisms as far as changes in frequency are concerned. These and further data could be explained by assuming that listeners based their judgments on frequency-shift detectors (FSDs) that automatically encode the direction of small frequency changes, with a maximum sensitivity of about one musical semitone (Demany et al., 2009). Importantly, because one of the tones involved in the frequency change was not perceived, change detection seemed able to operate without attention.

In collaboration with Laurent Demany, we investigated in more detail the importance of attention for frequency-change detection (Demany et al., in revision).


In addition, we compared auditory change detection to visual change detection using closely matched stimuli and tasks in the two modalities. On each trial, participants were presented with a test stimulus consisting of ten elements: pure tones with various frequencies for audition, or dots with various spatial positions for vision. The test stimulus was preceded or followed by a probe stimulus consisting of a single element, and two change-detection tasks were performed. In the “present/absent” task, the probe either matched one randomly selected element of the test stimulus or none of them; participants reported whether it had been present or absent. In the “direction-judgment” task, the probe was always slightly shifted relative to one randomly selected element of the test stimulus; participants reported the direction of the shift.

We observed qualitative differences between performance patterns in the two modalities. The physical difference to be detected is always larger in the present/absent task than in the direction-judgment task. For audition, paradoxically but consistent with the FSD hypothesis, the direction-judgment task with smaller changes produced better performance. Visual performance, in contrast, was systematically better in the present/absent task, with larger changes. Moreover, visual change detection was strongly dependent on selective attention, consistent with the phenomenon of change blindness (O'Regan et al., 1999). This was not the case for auditory performance, which remained good without selective attention. Overall, these results suggest that some auditory changes can be detected automatically by an implicit memory system, with no apparent counterpart in the visual domain.

This makes sense from an ecological point of view. In everyday life, memorizing the details of a complex visual scene for hundreds of milliseconds is superfluous because the visual world constitutes an external memory: the scene will generally remain available for scrutiny (O'Regan and Noe, 2001). Moreover, it is unlikely that an element of the scene will disappear and reappear some time later at a different position; instead, changes in position are most often continuous motions. By contrast, humans typically extract auditory information from fleeting sequences of sounds that differ from each other in frequency content and can be separated by substantial silent pauses. These successive sounds need somehow to be linked perceptually in order to form recognizable sentences or melodies. This may be why features extracted from sequences are especially useful for audition.

3.5 Summary and conclusion

A lot of the information we get from the acoustic world is contained in sound sequences that unfold over time. The studies presented here suggest that specific features may be involved in sequence processing. In other words, performance on an auditory task involving a sequence of sounds may be partially, but not fully, predictable from performance on each individual element. A psychophysical method was designed to quantify these effects. Sequence-processing mechanisms were especially apparent for frequency cues. The encoding of frequency shifts could subserve both effective melody processing and automatic change detection.

We also showed that specific impairments in pitch-sequence processing were observed in people using cochlear implants. However, the psychophysical paradigm that we used to demonstrate the deficit is quite time-consuming and probably not adapted to routine clinical testing.

In future projects, and through consulting contracts with cochlear-implant companies, we propose to develop pitch-sequence tests inspired by this finding but easier to translate to the clinic (Pressnitzer et al., 2005; Jardine and Pressnitzer, 2009).


Chapter 4

The temporal dynamics of auditory scene analysis

4.1 Introduction

Typical auditory scenes tend to have properties that change over time. Take a lively conversation: a given source, a talker, may start or stop producing sound at very short notice. Two conflicting abilities are then required from the listener. It is essential to be able to stabilize the perceptual organization of the scene, so that each new word uttered by a talker is indeed assigned to this talker – or else most conversations would be quite confusing. However, it is also essential to be able to rapidly change our set of inferred sources if, for instance, a new person interrupts or a fire alarm sounds.

Most investigations of auditory scenes have focused on “average” properties of sound scenes. Important questions include which cues are useful for grouping, or how many sources are heard on average when these cues are manipulated (Bregman, 1990). To improve the signal-to-noise ratio of the behavioral data, short sound presentations have often been used, and results presented as stimulus-related averages. In a series of experiments, we have tried to introduce new techniques that focus on the temporal dynamics of auditory scene analysis. We introduced percept-related averages and relatively long presentation times, using techniques inspired from the study of visual bistability (defined below). When applied to hearing, this new focus revealed strong similarities between sensory modalities in the temporal dynamics of perceptual organization. However, audio-visual experiments also confirmed that there has to be a purely auditory component to scene analysis. Physiology experiments then addressed the neural bases of auditory streaming, still focusing on the temporal dynamics of the phenomenon, and pursuing a previous line of research on sub-cortical correlates of perceptual organization.

4.2 Change of scenes: Auditory bistability

4.2.1 Bistability as a tool to study perceptual organization

The study of visual scene analysis has made extensive use of what is called the bistability illusion (for reviews, see Leopold and Logothetis, 1999; Sterzer et al., 2009). In this phenomenon, an unchanging stimulus presented for a certain amount of time evokes spontaneous perceptual alternations in the mind of the observer. There are plenty of examples of bistable stimuli in vision. For instance, reversible figures such as the Necker cube are bistable (Long and Toppino, 2004).


Binocular rivalry, where two incompatible images are presented to the two eyes, also produces alternations between one image and the other (Helmholtz, 1866/1925; Alais and Blake, 2005). Finally, there are bistable motion stimuli such as moving plaids (Hupé and Rubin, 2003). These stimuli are very diverse, but they all have two things in common. First, they present the visual system with ambiguous situations. The information that reaches the retina for a bistable figure such as the Necker cube may well have been caused by a real 3D cube made out of wires oriented towards the observer, but it could equally well have been caused by a cube oriented in the opposite direction. Second, it seems that, faced with such an insoluble dilemma, the perceptual system’s response is to explore the different possible interpretations in turn (and not to settle on an “average” interpretation).

The enduring interest in bistability in vision has at least two reasons. First, all sensory scenes by necessity contain some degree of ambiguity. The problem of “inverse optics” or “inverse acoustics”, that is, determining the physical stimuli that created a given pattern of activity in the eye or in the ear, is by nature ill-posed: the information available is not enough to solve the inverse problem with a unique solution. Decisions have to be taken to resolve this essential ambiguity (“unconscious inferences”, to use the term coined by Helmholtz). Our perceptual systems constantly operate in this regime, but we are generally not aware of it because, fortunately, one highly plausible interpretation usually trumps all the others. That this interpretation mostly corresponds to reality is an impressive sign of the sophistication of perception, and not of the simplicity of the problem (as attempts at artificial vision and audition remind us). With this in mind, bistability is a way to reveal and highlight the general background organization processes by presenting them with a problem that lacks any obvious solution. A second interest of the bistability paradigm is that it dissociates, in some respects, the conscious percept of the observer from the external stimulus. If a neural correlate of perceptual reports can be found, it cannot be traced back to some passive propagation of stimulus statistics throughout the system. Rather, it must inform us about which brain mechanisms were involved in creating the percept (Tong et al., 2006).

4.2.2 Auditory and visual bistability show strong similarities across modalities

In collaboration with Jean-Michel Hupé, an expert on visual bistability, we compared the perception of two ambiguous stimuli in audition and vision (Pressnitzer and Hupé, 2006; see also the online demonstrations). The auditory stimulus was the well-known streaming sequence that has been widely used to study sequential grouping and segregation (Miller and Heise, 1950; Bregman and Campbell, 1971; van Noorden, 1975). In its various forms, the paradigm generally uses pure tones of different frequencies. Depending on the frequency and time differences between the tones, listeners report grouping all tones together – into what is called a single stream – or splitting the sequence into two concurrent streams. Early on, it was noticed that the perception of one or two streams could change across repeats for a range of acoustical parameters, and even within a single presentation (van Noorden, 1975; Bregman, 1978). In our experiments, we re-visited the issue by using long presentation times (4 minutes) and by analyzing the data with statistics usually applied to visual bistability.


We also collected visual bistability judgments using moving plaids (Hupé and Rubin, 2003), in the same group of subjects. A similarity between streaming and plaids is that plaids can be perceptually grouped as a single moving object, or split into two different objects.

We found that the dynamics of percept changes in auditory streaming had all of the characteristics that define visual bistability (Leopold and Logothetis, 1999). Percepts were mutually exclusive: subjects successively reported one or two streams, but very rarely an intermediate percept between the two. Percept durations were random, with statistical independence between successive percepts and a constant average duration after the first percept. Percept durations followed a log-normal distribution, and the same distribution was found for auditory and visual alternations. Finally, the effect of instructions was highly similar between modalities. When instructed to try and maintain one perceptual interpretation in mind, observers were unable to lengthen the average duration of the target interpretation; rather, they were only able to shorten the duration of the unwanted percept.

These findings suggest that there are strong similarities in the way ambiguities are resolved in audition and in vision. Additional data collected independently by other groups have since confirmed this conclusion (Denham and Winkler, 2006; Kondo and Kashino, 2007; Kondo and Kashino, in press). Interestingly, only a few computational principles are required to account for the temporal dynamics of visual bistability (Lankheet, 2006; Shpiro et al., 2009): it can be modeled successfully by combining noise, adaptation, and mutual inhibition between neural populations coding for the competing percepts. As auditory scene analysis can exhibit bistable temporal dynamics, it is likely that the same functional principles are involved in resolving ambiguities in the auditory modality.
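To make these principles concrete, here is a minimal rate-model sketch in the spirit of Shpiro et al. (2009). It is my own illustration, not any specific published model; all parameter values are arbitrary and may need tuning to obtain realistic alternation rates.

```python
import numpy as np

def simulate_bistability(t_max=240.0, dt=0.001, tau=0.01, tau_a=2.0,
                         inhibition=2.0, adaptation=1.0, drive=1.0,
                         noise=0.1, seed=0):
    """Two populations code for the two competing percepts. Each inhibits
    the other, drives its own slow adaptation variable, and receives
    constant input plus white noise. The dominant population flips back
    and forth, yielding stochastic bistable alternations."""
    rng = np.random.default_rng(seed)
    n = int(t_max / dt)
    x = np.array([0.6, 0.4])      # population activities
    a = np.zeros(2)               # slow adaptation variables
    dominant = np.empty(n, dtype=int)
    for i in range(n):
        inp = drive - inhibition * x[::-1] - adaptation * a
        x += (dt / tau) * (-x + np.clip(inp, 0.0, None)) \
             + np.sqrt(dt) * noise * rng.standard_normal(2)
        x = np.clip(x, 0.0, None)
        a += (dt / tau_a) * (x - a)   # adaptation tracks activity slowly
        dominant[i] = int(x[1] > x[0])
    return dominant  # percept label at each time step
```

Histograms of dominance durations from such simulations are typically skewed and unimodal, qualitatively matching the behavioral distributions described above.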
We played the pure tones of the streaming stimulus over loudspeakers that were spatially separated, while each tone was accompanied by a light flash spatially and temporally coincident with the tone. Observers either reported an apparent motion between the two lights, or independent flickering. Because of the spatial and temporal coincidence between tones and lights, a strong cross-modal fusion was reported by subjects. In addition, the percepts associated with apparent motion and streaming bear some introspective resemblance (Bregman, 1990). If a single process were causing both auditory and visual bistability, we would expect a strong and mandatory interference between alternations in the two modalities. In fact, we observed a relative independence. Interference was overall small and variable, and it was modulated by cross-modal congruence. We concluded that bistability was based on at least partially independent processes in the two modalities. These processes may of course interact because of cross-modal convergence; cross-modal biases of bistable alternations have been shown for highly coherent audio-visual stimuli such as speech (Sato et al., 2007; Munhall et al., 2009). However, the observation that two bistable stimuli may coexist relatively independently is sufficient to show that interactions are not mandatory, and thus probably not related to the initiation of the perceptual alternations themselves.
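A minimal simulation can make the modeling principles invoked above concrete. The sketch below is not a model from the cited studies, but a generic rivalry circuit in their spirit: two neural populations coupled by mutual inhibition, each subject to slow adaptation and noise. All parameter values are illustrative assumptions.

import numpy as np

def simulate_bistability(total_s=600.0, dt=1e-3, seed=0):
    """Two populations coupled by mutual inhibition, each with slow
    adaptation and additive noise; the more active population defines
    the current percept."""
    rng = np.random.default_rng(seed)
    n_steps = int(total_s / dt)
    tau, tau_a = 0.010, 2.0     # activity / adaptation time constants (s)
    drive, beta, g, sigma = 1.0, 1.1, 0.5, 0.03
    r = np.array([0.5, 0.4])    # firing rates of the two populations
    a = np.zeros(2)             # adaptation variables
    dominant = np.empty(n_steps, dtype=int)
    for i in range(n_steps):
        noise = sigma * rng.standard_normal(2) / np.sqrt(dt)
        inp = drive - beta * r[::-1] - g * a + noise   # cross-inhibition
        r += (dt / tau) * (-r + np.maximum(inp, 0.0))  # threshold-linear rate
        a += (dt / tau_a) * (-a + r)                   # slow adaptation
        dominant[i] = int(r[1] > r[0])
    edges = np.flatnonzero(np.diff(dominant))          # percept switches
    return np.diff(np.concatenate(([0], edges, [n_steps]))) * dt

durations = simulate_bistability()
print(f"{durations.size} percept phases, mean duration {durations.mean():.2f} s")

With these invented parameters, dominance switches every few seconds, and the distribution of simulated dominance durations can then be compared to the log-normal distributions measured behaviorally.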

4.3 Subcortical correlates of auditory scene analysis

4.3.1 The build-up of streaming

Forming auditory streams is a very basic issue for acoustic communication, and behavioral manifestations of streaming have been suggested in several non-human species including primates, birds, fish, and frogs (for a review see Bee and Micheyl, 2008). As a result, studies have sought correlates of streaming in neural firing patterns recorded from animal models (Fishman et al., 2001; Bee and Klump, 2004; Fishman et al., 2004). These studies used a standard version of the streaming paradigm where two pure tones of different frequencies, A and B, were repeated to form longer sequences (see Figure 4.1). Correlates were observed that can be summarized by a "grouping by co-activation" model. When acoustic parameters favor grouping, the same neural population tends to respond to both A and B tones. When segregation is favored, A and B tones tend to activate separate neural populations. In other words, frequency-selective neurons responded to both A and B for one-stream stimuli, and only to A or to B for two-stream stimuli. This was accounted for by frequency selectivity and forward-suppression of neural activity (Fishman et al., 2001). It is noticeable that most studies focused on auditory cortex and beyond (Micheyl et al., 2007). However, if only frequency selectivity and forward-suppression are required, it is not unreasonable to hypothesize that similar correlates could be observed earlier in the auditory pathways.

A more challenging test for neural correlates of streaming concerns temporal dynamics, such as bistability. Unfortunately, for technical reasons, it is difficult to record simultaneous behavior and neural activity in animal models using a full bistability paradigm (this is possible with functional brain imaging studies, discussed in Chapter 6).


Figure 4.1 Example of a single-unit recording for streaming sequences at the level of the cochlear nucleus (Pressnitzer et al., 2008). A. The tones A and B have a small frequency difference, corresponding to a stimulus favoring one stream. The neuron responds equally well to all tones. In a "grouping by co-activation" model, this would signal one stream. B. The tones now have a large frequency difference, corresponding to a stimulus favoring two streams. The neuron responds less to B tones and, because of adaptation, the activity due to B tones may fall below detection threshold towards the end of the sequence. In the model, this signals two streams, with an increased likelihood of two streams as the sequence progresses.

An average measure of temporal dynamics is nevertheless available with the "build-up" of stream segregation (Bregman, 1978). The build-up refers to the fact that the listeners' average probability of reporting two streams increases over a period of several seconds after the ABA- sequence is turned on. In the bistable framework, this can be interpreted as an initial bias towards the one-stream interpretation, followed by the random alternations regime (Pressnitzer and Hupé, 2006). Correlates of the build-up have been found in single-unit recordings from the primary auditory cortex of awake monkeys (Micheyl et al., 2005). Multi-second adaptation was observed in the firing rate of cortical neurons. Some neurons that responded to both A and B tones at the beginning of the sequence (signaling one stream) only responded reliably to the A tones at the end of the sequence (signaling two streams).

With Ian Winter, Mark Sayles, and Christophe Micheyl, we recorded responses to relatively long streaming sequences in the ventral cochlear nucleus of anaesthetized guinea pigs (Pressnitzer et al., 2008). As mentioned before, the cochlear nucleus is a peripheral sub-cortical processing stage in the auditory pathways, the first synapse after the auditory nerve. Responses were collected with 10-s-long ABA- sequences, with different frequency separations between A and B tones (the steepness of the behavioral build-up varies with frequency separation). One of those recordings is illustrated in Figure 4.1. We observed multi-second adaptation in all unit types we recorded from. In addition, using a neurometric model similar to the one used by Micheyl et al. (2005), we showed that the firing rate of the population of units we recorded from accurately predicted the build-up of streaming observed behaviorally in human listeners.
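To illustrate the logic of such neurometric readouts, here is a deliberately simplified toy version (not the actual analysis of Micheyl et al., 2005, or of our study; all rates, time constants, and criteria are invented for the example). A unit centered on the A frequency responds to B tones at a rate that adapts over seconds, and a triplet is classified as "two streams" whenever the B-tone spike count is indistinguishable from spontaneous activity.

import numpy as np

def buildup_curve(b_drive, n_triplets=25, n_trials=1000, spont=8.0,
                  tau_s=4.0, floor=0.4, win_s=0.05, seed=1):
    """B-tone response of an A-centered unit, adapting over seconds.
    A triplet is labelled 'two streams' when its B spike count falls at
    or below a criterion set just above spontaneous activity.
    b_drive stands in for the A-B separation (weak drive = large df)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_triplets) * 0.4                  # one triplet per 400 ms
    rate = spont + b_drive * (floor + (1.0 - floor) * np.exp(-t / tau_s))
    counts = rng.poisson(rate * win_s, size=(n_trials, n_triplets))
    criterion = spont * win_s + 2.0 * np.sqrt(spont * win_s)
    return (counts <= criterion).mean(axis=0)        # P("two streams")

for b_drive, label in [(50.0, "small frequency separation"),
                       (15.0, "large frequency separation")]:
    p = buildup_curve(b_drive)
    print(f"{label}: P(two streams) {p[0]:.2f} -> {p[-1]:.2f}")

In this toy version, the probability of reporting two streams increases over several seconds purely through the slow adaptation, and it is higher throughout for large frequency separations, in qualitative agreement with the behavioral build-up.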


These results show that a neural correlate of the build-up is already present in the auditory periphery. Frequency selectivity and forward-suppression had been observed in the cochlear nucleus (Bleeck et al., 2006). Our study showed that multi-second adaptation was also prevalent at this processing stage. A possible source of slow adaptation could be cortical feedback projections targeting the cochlear nucleus (Schofield and Coomes, 2006) or, more plausibly, indirect feedback through the olivocochlear efferent system (Sridhar et al., 1995). The point remains that the temporal dynamics of auditory scene analysis is reflected in the earliest stages of the auditory pathways. Intriguingly, we observed multi-second adaptation in primary-like units, which are thought to reflect auditory nerve activity. Recent computational models suggest that adaptation in the auditory nerve may be more sophisticated than previously thought, and in particular may operate over longer time scales (Zilany et al., 2009). It is thus not impossible that a similar correlate of the build-up of streaming may be found in the auditory nerve.

Our findings suggest that streaming, including some aspects of its temporal dynamics, involves computations which are available in early stages of the auditory pathways. The argument is not, however, that streaming is fully resolved in the cochlear nucleus. A key aspect of our study was that the streaming cue we manipulated was frequency, and the cochlear nucleus is known to contain neurons selective to frequency. Many other cues can influence streaming in addition to frequency, such as waveform envelope or spatial location (Vliegen and Oxenham, 1999; Moore and Gockel, 2002 for a review). For such cases, the correlates of streaming need to be sought in processing stages that exhibit selectivity to the cue of interest (Gutschalk et al., 2007; Itatani and Klump, 2009).

Some recent data on streaming are hard to explain within the framework of grouping by co-activation (which has been used in most physiology studies to date). Elhilali et al. (2009a) investigated the effect of temporal synchrony on streaming, both behaviorally and in single-unit recordings from primary auditory cortex. Synchrony was found to have a large effect on behavioral streaming: synchronous tones were always grouped together, irrespective of their frequency difference. This was not predicted by grouping by co-activation in frequency channels, nor was it reflected in the cortical recordings. A model based on correlation between neural populations was derived to account for the effect of synchrony (Elhilali et al., 2009a). There are two ways to interpret these findings. As suggested by the model, synchrony might be an overarching organization principle that is preeminent over all other cues (frequency, spatial location, envelope, etc.) and need not be explicitly encoded in neural firing patterns. Alternatively, synchrony could be considered as yet another cue, albeit a potent one, which can interact with others (Darwin and Sutherland, 1984) and be encoded in neural populations yet to be discovered, upon which the principle of co-activation may still operate. Awaiting further evidence, this remains an important open issue.

4.3.2 Comodulation masking release


There had been previous reports of correlates of scene analysis in the subcortical auditory pathways. With Ian Winter and Ray Meddis, we investigated the responses of single units in the ventral cochlear nucleus in a paradigm known as comodulation masking release (CMR; Pressnitzer et al., 2001a). When a signal has to be detected in noise, adding more noise usually impairs signal detection or, at best, leaves it unaffected if the noise energy is remote enough in frequency. In CMR, adding noise remote from the signal frequency may improve signal detection. The necessary condition is that both the on-signal and off-signal noise have the same temporal envelope (Hall et al., 1984). CMR is interesting because it shows that, in addition to the initial separation into frequency bands evident in the peripheral auditory system, there are across-channel processes that pool information over frequency regions (for a discussion of across- vs. within-channel interpretations, see Verhey et al., 2003). CMR is also useful for extracting signals from noisy backgrounds in realistic situations, because signals and background do not usually share the same pattern of amplitude modulations (Nelken et al., 1999).

We used a protocol that aimed at maximizing the contribution of across-channel processing, as estimated by human psychophysics (Schooneveldt and Moore, 1987). Single-unit recordings were collected in the ventral cochlear nucleus of the anaesthetized guinea-pig. The firing rates of the units were analysed in terms of signal detection theory, by comparing mean rates and variability when the signal was present to when it was absent. In some unit types, but not all, better neural detection thresholds were predicted when human psychophysics showed the presence of CMR (Pressnitzer et al., 2001a). The units that never showed CMR were of the onset type. These are known to have broad receptive fields in terms of frequency, and as a consequence they responded to the envelope of the broad-band noise rather than to the narrow-band signal. Based on these observations and a computational model (see also Meddis et al., 2001), we suggested a simple circuit to account for the physiological CMR. Fast-acting wideband inhibition targeted on narrow-band neurons was shown to be able to enhance the contrast between a signal and background noise. Thus, the signal representation is enhanced in the auditory periphery with CMR stimuli. An even stronger effect, but with a similar pattern and interpretation, has since been observed in the dorsal cochlear nucleus (Neuert et al., 2004).

Neural correlates of CMR, but with a different form, have been found in the cortex and thalamus (Nelken et al., 1999; Las et al., 2005). In these studies, the neural cue to the presence of a signal was a dramatic disruption of the representation of the noise envelope. Interestingly, the effect was interpreted as a major difference between cortical and early processing. Early processing is more closely related to the acoustic input, so enhanced behavioral detection of a faint signal within a loud noise is likely to take the form of an increase in the neural response to the signal. Later processing usually displays more complex selectivities, sometimes only remotely related to acoustical cues, and responses there may be dominated by the faintest sounds in a mixture (Bar-Yosef et al., 2002). This could be how scene analysis both amplifies a faint signal in early processing stages and disrupts the representation of a loud noise in late processing stages. The two types of representations may be used to predict behavioral performance.
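In sketch form, the signal-detection analysis on firing rates amounts to the following. The spike counts below are hypothetical Poisson samples rather than recorded data, chosen only to show how a lower masker-driven baseline (as produced by wideband inhibition) improves the neural detection index for a fixed signal-driven increment.

import numpy as np

def neural_dprime(counts_signal, counts_reference):
    """Spike-count separation index: difference of the means divided by
    the root mean variance of the two trial distributions."""
    num = counts_signal.mean() - counts_reference.mean()
    var = 0.5 * (counts_signal.var(ddof=1) + counts_reference.var(ddof=1))
    return num / np.sqrt(var)

rng = np.random.default_rng(2)
# Masker-alone counts: comodulated flankers (through wideband inhibition)
# lower the masker-driven rate, here from 30 to 18 spikes per window.
ref_uncorr = rng.poisson(30, 200)   # flankers with uncorrelated envelopes
ref_comod = rng.poisson(18, 200)    # comodulated flankers
# The same signal-driven increment (+8 spikes) stands out more clearly on
# the quieter comodulated baseline: a neural analogue of masking release.
print(neural_dprime(rng.poisson(38, 200), ref_uncorr))
print(neural_dprime(rng.poisson(26, 200), ref_comod))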


4.4 Summary and conclusion

The temporal dynamics of auditory scene analysis, or the way perceptual organization adapts to the changing (and sometimes unchanging) acoustic input, was the main focus of the studies described above. A useful paradigm to study perceptual organization is that of bistability. We have shown that there are strong similarities between auditory and visual bistability, demonstrating functional parallels between the two (Pressnitzer and Hupé, 2006; Hupé et al., 2008). However, auditory scene analysis recruits specific neural circuits, some of which can be found in the peripheral auditory pathways (Pressnitzer et al., 2001a; Pressnitzer et al., 2008). The argument is not that scene analysis is resolved in the periphery. Rather, early stages of processing participate in scene analysis thanks to the physiological properties of neurons and circuits at these levels. This makes sense: animals lacking a neocortex have been shown to possess scene analysis capabilities, so at least some primitive grouping principles can be implemented without the flexibility inherent to cortical processing.


Chapter 5

Memory and context effects

5.1 Introduction

As sound necessarily develops over time, it is not obvious where to draw the line between perception and memory in audition (Demany and Semal, 2007). For instance, pitch perception requires that two or more sound periods be compared, so it involves a form of memory. However, it is easier to think of pitch as a perceptual feature. At the other extreme, recognizing the voice of a distant relative we have not seen for several years is probably better described as memory. In between these two time scales, however, perception and memory may be closely related. The difficulty of defining a clear memory/perception boundary is similar when one considers the possible neural correlates of memory. Adaptation implements a form of memory, as it controls the activity of a neuron depending on previous inputs and activity. In the cortex, stimulus-specific adaptation has been found across several time scales, from milliseconds to minutes (Ulanovsky et al., 2003; Ulanovsky et al., 2004). Memory-like effects of adaptation are not restricted to cortex, either. In what has been termed adaptive coding, neurons in early processing stages change their response properties depending on recent stimulation, both in audition (Dean et al., 2005; Pressnitzer et al., 2008) and in vision (Fairhall et al., 2001).

In the behavioral studies presented in this chapter, the terms memory and context refer to changes in perception, for a given sound, that are caused by preceding sounds and percepts. The first study investigated the formation of new auditory memories for arbitrarily complex sounds and uncovered a surprisingly fast learning mechanism, which we likened to "auditory insight". The last two studies investigated the role of context on basic auditory features, spatial location and pitch. These studies represent newer lines of research, so some results are still in the process of being published.

5.2 Rapid formation of robust auditory memories

To start making sense of their acoustic world, humans and other animals must learn to associate auditory features with sound sources. However, how memories for non-verbal complex sounds are formed is essentially unknown. Together with Trevor Agus, who is doing a post-doc with me, and Simon Thorpe, we have devised a psychophysical paradigm that aims at observing the time course of the formation of auditory memories (Agus et al., 2009; Agus et al., submitted). The paradigm is based on the learning of noise. Noise is well suited to the investigation of memory, as any given sample will be new to the listener. Noise is also acoustically complex, but it lacks any obvious features that could support verbalization. Finally, by collecting behavioral responses after each exposure to the noise, the temporal dynamics of memory formation can be characterized.

In an influential study, Guttman and Julesz (1963) introduced the use of noise to study auditory memory. They presented listeners with repetitions of "frozen" noise segments. Listeners were able to discriminate repeated noise from white noise over a wide range of segment durations, from milliseconds up to tens of seconds (Guttman and Julesz, 1963; Warren and Bashford, 1981; Kaernbach, 2004). Variants of the paradigm have used the same noise samples for a whole experiment to investigate longer-term memory traces (Hanna, 1984; Goossens et al., 2008). The repeated-noise paradigm has thus become a useful test of auditory memory capacity (Kaernbach, 2004). However, some essential characteristics of learning are not captured by the current repeated-noise paradigms. First, real-world learning should happen in an unsupervised fashion, as it is not always obvious which segments of the ongoing sounds should be memorized and which can be safely ignored. Second, interfering sounds are likely to intervene between memorization and recall. Third, the memories formed should be long-lasting.

We introduced a modification to the repeated-noise paradigm to investigate these issues. Listeners had to detect a repetition in a 1-s-long noise sample (Figure 5.1A). However, unbeknownst to them, a reference noise sample re-occurred in several trials, randomly interspersed throughout an experimental block. Any evolution in performance in the repetition-detection task for reference samples would indicate the formation of a memory trace. Note that learning would have to be unsupervised: listeners were not told that memorizing trials might be beneficial, and, in any case, they could not have identified which trials to memorize without prior learning (no feedback was given). In addition, reference samples were never presented on two consecutive trials, so there were intervening trials which had to be actively processed. Finally, in some experimental conditions, the same noise samples were used in sessions separated by several days, which tested for long-term memorization.

Results showed a sizeable improvement for the reference samples as they re-occurred throughout the block (Figure 5.1B, C). Moreover, the time course of memory formation was surprisingly short. When learning occurred, almost perfect performance was achieved within less than ten presentations of the reference sample. Learning did not occur on all blocks. However, this variability did not seem to be related to idiosyncratic features of particular noise samples. Noise statistics, as estimated by auditory models, failed to correlate with performance (Agus et al., 2009). Also, in subsequent experiments, we selected the reference noise samples based on initial performance (Agus et al., submitted). Five of the worst-learnt samples were selected, together with five of the best-learnt samples, and the experiments were run again. Results showed that all noise samples could eventually be learnt. These new experiments also revealed that the memories for noise were long-lasting. Each reference sample was used in two separate blocks for each listener, with an average of two weeks between blocks. On the second blocks, listeners displayed perfect performance right from the first presentation of the reference samples (which they had last heard two weeks earlier). Thus, listeners retained over several days a form of memory for as many as ten 0.5-s-long random waveforms.


Figure 5.1 Illustration of methods and results for the memory-of-noise experiments (Agus et al., 2009; Agus et al., submitted). A. Listeners had to discriminate trials containing 1-s noise samples (N) from trials where two identical 0.5-s samples were repeated (repeated noise, RN). Unbeknownst to them, the same reference repeated noise (RefRN) was used on several trials interspersed throughout the experimental block. B. Performance differed between RN and RefRN trials, indicating that the reference samples were learnt during the block. C. The correct detection of RefRN samples is shown as a function of trial number in the block. The blocks have been sorted into two groups, with and without learning. When learning occurred, it occurred remarkably fast and performance became almost perfect. Some features of the noise which were initially not heard must have become highly salient, a phenomenon similar to "insight".

Such unsupervised, fast-acting, robust, and long-lasting learning presents challenges for current models. We hypothesized that the learning of noise could be supported by rapid plasticity of sensory feature maps interacting with top-down selection. It has now been demonstrated that the selectivity of cortical neurons is highly plastic and susceptible to change rapidly with the task at hand (e.g. Fritz et al., 2003). Importantly, it was recently found that such changes were accompanied by an overall gain reduction in receptive fields (Atiani et al., 2009). This may be interpreted as a filtering-out of task-irrelevant features. Our current hypothesis is that, guided by stimulus-specific adaptation produced by the repeats (Ulanovsky et al., 2003), or by top-down selection (Ahissar et al., 2009), only a subset of the noise features were enhanced by sensory plasticity and then committed to long-term memory.

The time course of the learning we observed was also reminiscent of what has been termed "insight" (Rubin et al., 2002 for a review). Insight designates an abrupt

and long-lasting change in performance, and it has been mostly associated with cognitive tasks such as problem-solving. However, in vision, insight has also been observed for low-level cues in perceptual learning tasks (Rubin et al., 1997). In our experiments, features in the noise that were initially not detected must have become quite salient at some point, as indicated by the near-perfect performance achieved by listeners after a few exposures. This could demonstrate the first instance of auditory insight based on low-level acoustic cues.
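In sketch form, the three trial types of the paradigm can be constructed as follows. This is a schematic reconstruction, not the experimental code; the sampling rate and the Gaussian white noise are assumptions made for the example.

import numpy as np

def make_trial(kind, ref, fs=44100, rng=None):
    """One 1-s stimulus for the repetition-detection task.
    'N': fresh 1-s noise (no repetition); 'RN': a fresh 0.5-s noise
    played twice; 'RefRN': the block's frozen reference played twice."""
    rng = rng or np.random.default_rng()
    if kind == "N":
        return rng.standard_normal(fs)
    half = ref if kind == "RefRN" else rng.standard_normal(fs // 2)
    return np.concatenate([half, half])

rng = np.random.default_rng(3)
reference = rng.standard_normal(22050)   # 0.5-s sample, frozen for the block
kinds = rng.permutation(["N", "RN", "RefRN"] * 10)
# (The actual experiments also prevented RefRN from occurring on two
# consecutive trials; that constraint is omitted here for brevity.)
block = [make_trial(k, reference, rng=rng) for k in kinds]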

5.3 Effect of preceding context on auditory features

5.3.1 Auditory localization

We now turn to the effect of acoustic context on the perception of basic auditory features such as spatial location and pitch. The subjective localization of a given sound can be affected by a preceding sound, in what has been termed an auditory spatial aftereffect (Thurlow and Jack, 1973). Performance improvements have also been reported with context, but either with long presentation times of the adapting context (Kashino, 1998) or when the context was indicative of the location of the task (Getzmann, 2004). With Julia Maier (who did a short post-doc with me as part of an ongoing collaboration), David McAlpine, and Georg Klump, we have started to investigate how spatial discriminability is affected after a brief, non-informative sound. Such a situation is arguably closer to the physiological investigations of adaptive coding (Dean et al., 2005).

In a first behavioral study (Maier et al., in revision), the spatial context was established by a short sound (between 1 s and 2 s) that was lateralized, over headphones, by means of interaural time differences (ITDs). This context sound was immediately followed by a pair of target sounds, at the same subjective location or at a different one. For the target pair, subjective location was achieved by means of ITD cues or interaural level difference (ILD) cues. Both types of cues and subjective locations were randomly interleaved within experimental blocks. Prior to the main experiment, discrimination without any context was equalized for each cue and location, using a procedure similar to that of Cousineau et al. (2009). We found improved spatial discrimination performance when context and targets were at the same spatial location, compared to when they were mismatched in location. The effect was the same whether target location was achieved by means of ITDs or ILDs. Finally, it was only observed for target locations around the midline. We interpreted these findings as showing that the encoding of spatial location may be affected by context in a mandatory manner as, in our procedure, the context was not informative of the location of the discrimination task. Future studies will compare physiological recordings in the inferior colliculus of the guinea-pig with the detailed features of these behavioral findings.

5.3.2 Hysteresis in the perception of pitch

Context effects have also been demonstrated on pitch perception, but they have mainly been studied in relation to music. For instance, the tonality of a melody creates expectations for subsequent notes (Krumhansl and Kessler, 1982). With Claire Chambers, who is now starting a PhD with me, we are investigating whether the actual pitch value of a note can be manipulated with context.


Figure 5.2 Illustration of hysteresis in pitch perception (Chambers et al., 2009). A. Schematic time-frequency representation of a pair of Shepard tones. The interval between them is 6 semitones, so the direction of pitch change is ambiguous. B. Experimental results. Shepard tones were presented in orderly series, starting from a non-ambiguous interval (1 or 11 semitones). Ascending series are shown in red, descending series in black. The percept at the start of the series biases subsequent percepts, for most intervals. C. As in B., except that responses were only collected for a 6-semitone interval at the end of a biasing sequence. The percept is fully determined by the biasing sequence (see also online audio demonstration).

The basic idea is to use stimuli which contain ambiguous pitch cues and to explore perceptual hysteresis effects. Hysteresis is a memory-like phenomenon that occurs in systems whose present state depends on recent history. It has previously been associated with the perception of ambiguous stimuli, visual (Hock et al., 2005) or auditory (Giangrande et al., 2003; Snyder et al., 2009). Shepard tones are used as ambiguous pitch stimuli (Shepard, 1964). These are complexes of sinusoidal components with an octave relationship, filtered by a fixed Gaussian envelope. When two such tones with different F0s are successively presented, the dominant cue for judging pitch direction is the log-frequency proximity between components. At an F0-interval of a half-octave (6 semitones, or a tritone), the proximity cue is removed: the frequency components of the first Shepard tone are exactly halfway between those of the second Shepard tone (Figure 5.2A). Accordingly, it is predicted that pitch direction becomes ambiguous (Shepard, 1964; Deutsch, 1987).

A first series of experiments already suggested a large effect of context on pitch (Chambers et al., 2009). Shepard tone pairs were first presented with a fixed F0-interval of 6 semitones. The tone with the lowest nominal F0 (the "standard tone") was kept constant. The order between standard and test tones within a trial was random. Subjects had to indicate whether they heard an upward or a downward pitch change. Perception was ambiguous across subjects, but remarkably stable within a single subject: the same tone, standard or test, was always reported as higher in pitch, regardless of the order of presentation. Then, different F0-intervals (from 1 to 11 semitones) were randomly presented within the same experimental block. Hysteresis was observed: the percept in a given trial was biased, in an assimilative manner, by the percept in the previous trial.
Hysteresis was confirmed in additional conditions where F0-intervals were presented in ascending or descending series (Figure 5.2B, C). There, the pitch direction reported for the tritone interval depended strongly on the preceding context. Importantly, and contrary to previous investigations (Giangrande et al., 2003; Hock et al., 2005), in our procedure hysteresis was observed with an equivalent number of upward and downward responses. Thus, the biases observed cannot be attributed to response persistence and are likely to be of perceptual origin. These first results indicate that it is possible to produce strong hysteresis in pitch perception by using ambiguous stimuli. This method creates new possibilities for future investigation of the neural bases of pitch, as activity may be recorded for different subjective percepts evoked by the same stimulus (similar to the bistable technique presented in Chapter 4).
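The ambiguous stimuli themselves are straightforward to synthesize. The sketch below implements generic Shepard tones (octave-spaced partials under a fixed Gaussian log-frequency envelope, after Shepard, 1964); the envelope center and width are illustrative values, not necessarily those used in our experiments.

import numpy as np

def shepard_tone(f0, dur_s=0.25, fs=44100, f_center=960.0, sigma_oct=1.0):
    """Sum of octave-spaced partials on f0, weighted by a fixed Gaussian
    envelope over log-frequency: pitch class is well defined, but pitch
    height is ambiguous (Shepard, 1964)."""
    t = np.arange(int(fs * dur_s)) / fs
    partials = f0 * 2.0 ** np.arange(-6, 7)          # spread across octaves
    partials = partials[(partials > 20.0) & (partials < fs / 2.0)]
    weights = np.exp(-0.5 * (np.log2(partials / f_center) / sigma_oct) ** 2)
    x = sum(w * np.sin(2 * np.pi * f * t) for f, w in zip(partials, weights))
    return x / np.max(np.abs(x))

# A tritone pair: the partials of the second tone fall exactly halfway, in
# log-frequency, between those of the first, so "up" and "down" are equally
# plausible -- the ambiguity exploited in the hysteresis experiments.
pair = np.concatenate([shepard_tone(261.63), shepard_tone(261.63 * 2 ** 0.5)])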

5.4 Summary and conclusions

The ongoing research projects described in this chapter found strong interactions between memory (in a broad sense) and perception. In the case of the learning of noise, features of random waveforms that listeners were initially unaware of became highly salient after a few repeated exposures, to the extent that they were remembered perfectly for weeks (Agus et al., 2009; Agus et al., submitted). In the case of spatial location cues, the accuracy of behavioral localization was affected by a task-irrelevant preceding sound, suggesting a mandatory influence of context (Maier et al., in revision). In the case of pitch, the perceived direction of a 6-semitone change could be totally reversed by context, a not-too-subtle influence on the very basic feature of pitch itself (Chambers et al., 2009). Future projects will attempt to clarify the possible neural bases of such large behavioral effects, which are not easily accounted for in current models.


Chapter 6

A distributed proposal for scene analysis

6.1 Introduction

Edwin Boring, one of the historic figures of experimental psychology, once wrote (Boring, 1937): "Everything is interrelated in this way. The universe is a system, and to partition it into part-systems is a falsification. If a man is ever to utter the whole truth about a natural event, he must not shut his mouth until he has expressed all nature." There is no arguing with that. However, in the next sentence, Boring went on to defend the scientific approach based on divide and conquer: "Nevertheless man has perforce to be content with much less than that mouthful. He describes events within limits, and he succeeds again and again in distorting them only by an amount that is negligible." The crucial point, of course, is to choose the limits within which to frame experimentation: large enough to minimize the inevitable distortions, but narrow enough to be propitious to scientific inquiry.

There has been a sharp increase in studies looking for neural correlates of scene analysis, and streaming in particular. The tacit assumption is often that, in some complex wiring diagram of the auditory system, there is a module implementing scene analysis that can be isolated and studied without too much distortion. In this chapter, which is much more speculative than the previous ones, I will suggest that this may not be the most promising approach to the problem. Scene analysis clearly is a well-defined function that deserves its own terminology and methods of study. However, this does not mean that its neural implementation necessarily requires a well-circumscribed set of brain regions. Rather, it may be a general property of the way mid-level audition operates.

6.2 The neural correlates of auditory scene analysis

6.2.1 Streaming

Several recent studies have used repeating tone sequences and looked for neural correlates of the percepts of one or two streams. A first set of studies varied the streaming cues (frequency and timing) and sought patterns of neural activity that paralleled average measures of behavioral streaming. Correlates were found in multi-unit activity of primary auditory cortex in awake macaques (Fishman et al., 2001; Fishman et al., 2004) and in the avian equivalent of primary auditory cortex (Bee and Klump, 2004). Using functional magnetic resonance imaging (fMRI) in humans, Wilson et al. (2007) confirmed a correlate in primary auditory cortex, but also found one in secondary areas (planum temporale).
The involvement of planum temporale was also suggested by another brain imaging technique, magnetoencephalography (Gutschalk et al., 2005).

Other studies have looked for neural correlates of the temporal dynamics of streaming, starting with the initial build-up phase (see Chapter 4). Using single-unit recordings, correlates were observed in the primary auditory cortex of awake macaques (Micheyl et al., 2005), but also subcortically, in the cochlear nucleus of anaesthetized guinea-pigs (Pressnitzer et al., 2008). Recordings of evoked potentials with a similar paradigm showed a cortical correlate in humans, possibly modulated by attention (Snyder et al., 2006).

Finally, a third set of studies simultaneously recorded behavioral judgments and correlates of neural activity, based on a bistability paradigm (Pressnitzer and Hupé, 2006). Gutschalk et al. (2005) found a cortical correlate of the streaming percept for identical stimuli, which was less reliable than, but similar to, the one observed for physically different stimuli. Cusack (2005) used bistability and fMRI but did not observe any correlate in primary or secondary areas of auditory cortex. However, a correlate was found in the intra-parietal sulcus, a locus outside of the main auditory pathways, associated with cross-modal processing. Finally, another recent fMRI study used an event-related design to focus on the moment of the perceptual switches (Kondo and Kashino, in press). Switch-related activations were found in auditory cortex, but also in the auditory thalamus. Moreover, the time course of the fMRI signal suggested that the thalamic activation happened before the cortical one when listeners switched from the non-dominant percept to the dominant percept.

This brief review (see also Micheyl et al., 2007; Snyder and Alain, 2007) shows that, even for such a seemingly simple paradigm as the streaming of tone sequences, there seems to be a bewildering array of neural correlates, from cochlear nucleus to supra-modal cortex. It could be that the divergence between studies is partly due to differences in experimental techniques. With animal electrophysiology, responses from a single site are collected. Human brain imaging potentially surveys the whole brain, but the cortex is more easily seen and the recordings are indirect correlates of neural activity. Technical subtleties could thus explain some of the negative results, i.e., why some regions were observed in some studies and not in others. However, the positive results, regardless of the technique, do suggest that correlates exist at many levels of the auditory pathways.

6.2.2 Comodulation masking release

Even though there is less physiological evidence available, a very similar argument can be made for neural correlates of another aspect of auditory scene analysis, comodulation masking release (CMR, see Chapter 4). With some differences in the CMR paradigms used, correlates were found in auditory cortex (Nelken et al., 1999; Las et al., 2005) or its functional equivalent in birds (Hofer and Klump, 2003); in the thalamus (Las et al., 2005); in the dorsal cochlear nucleus (Neuert et al., 2004); and in the ventral cochlear nucleus (Pressnitzer et al., 2001a). As was mentioned previously, an interesting aspect of these data was that the neural correlate of the signal took different forms at different levels of analysis.


6.3 A comparison of different functional cartoons

6.3.1 Boxology

At this point, it may be useful to engage in the highly risky exercise of boxology, i.e., speculating over a set of labeled boxes related by arrows. A first cartoon of a possible implementation of scene analysis, which could be termed the strictly hierarchical view, is presented in Figure 6.1A. Sound impinges on the sensory organ and is transformed into action potentials in primary sensory afferents. From this neural activity, feature maps are constructed for various aspects of the sound (e.g. Schreiner, 1995). Then, the feature maps serve as an input to scene analysis. Scene analysis is achieved by segmenting the maps into auditory streams or objects, where the features that belong to a same sound source are bound together (e.g. Griffiths and Warren, 2002). All of this might be modulated by attention and top-down processes, at the scene analysis stage. Such a presentation of the strictly hierarchical processing scheme is certainly over-simplified and may not reflect the view of most investigators today. Nevertheless, it is noticeable that the vast majority of engineering attempts at artificial hearing are built more or less along those lines. Sensor data from a microphone are decomposed into time frames. Within these frames, features are extracted (such as MFCCs for speech). After noise-reduction algorithms (scene analysis), the features are passed on to statistical classifiers that try to match feature patterns with an internal dictionary acquired through training.

The physiological data presented so far are sufficient to require amendments to this first cartoon. At the very least, a massive amount of feedforward and feedback connections is needed to account for the correlates of scene analysis at different levels of the auditory pathways. Anatomical evidence exists for intricate connections between many stations of cortical, thalamo-cortical, and sub-cortical processing (Kaas and Hackett, 2000; Schofield and Coomes, 2006; Winer, 2006). Attention can modulate neural processing at most levels, suggesting a possible functional role for such anatomical connections (Fritz et al., 2007 for a review). Attention may not even be required to recruit the corticofugal pathways, as suggested by recent results obtained under anesthesia (Nakamoto et al., 2008). Thus, the hierarchical view needs to be expanded by positing major feedback routes from the scene analysis module to the feature maps. The "central hub" for scene analysis is still there, however (Figure 6.1B).

It is also possible to entertain another, more radical hypothesis: that there is no need for a central hub (Cusack and Pressnitzer, 2008). In this "distributed" view (Figure 6.1C), neural selectivity to sound features exists at all levels of computation and represents different but possibly overlapping properties of the acoustic signals. Selectivity to cues such as frequency may be found at different stages, starting from early on in the system (Kiang, 1965), whereas selectivity to more complex features such as pitch and pitch sequences (Patterson et al., 2002; Winter, 2005), or even the vocal quality of sounds (Belin et al., 2000; Uppenkamp et al., 2006), could be progressively refined at the various levels of analysis.



Figure 6.1 Cartoons of possible functional implementations of auditory scene analysis. A. In the strictly hierarchical view, features are first computed, and then scene analysis is performed on the output of the feature maps. Top-down effects may modulate processing in the scene analysis module. B. In an amended view of hierarchical processing, there is a large amount of feedback between scene analysis and feature maps. C. In a distributed view, there is no central hub for scene analysis: scene analysis is performed at the locus of each feature map. Interconnections between the various stages of processing are used to integrate the analysis.

The idea, then, is that perceptual organization based on a specific feature is performed at the locus where selectivity to this feature is best expressed. This view may seem rather counter-intuitive, and it is certainly nothing more than speculation at this point. But it may be related to popular models in the visual domain for bistability (Sterzer et al., 2009 for a review) or selective attention (Desimone and Duncan, 1995; Duncan et al., 1997). In this latter framework, the dominance of a perceptual object through attention is achieved by competition at several levels of representation. It is possible that scene analysis is also a form of competition between alternative interpretations, where the features that are grouped as foreground are enhanced compared to the features of the background. Such a competition could take place at several levels of processing.

6.3.2 Functional consequences of a distributed model

A few additional considerations are worth making to highlight the differences between the cartoons of Figure 6.1. An issue for all models of scene analysis is that many features can influence behavioral streaming (Moore and Gockel, 2002 for a review). Frequency differences induce stream segregation (Miller and Heise, 1950), but so do differences in pitch without frequency cues (Vliegen and Oxenham, 1999), spatial cues such as ITD (Sach and Bailey, 2004), or even speaker-size cues (Tsuzaki et al., 2006). It is unlikely that accurate selectivity to all of these cues would be observed at any single processing stage. A central scene analysis module should thus get input from all stages where the features relevant to streaming are coded.
Conversely, it is also the case that scene analysis can influence many features in return. The pitch of a complex tone may be shifted because of a mistuned harmonic (Moore et al., 1986). The nature of a vowel may change if one harmonic is captured in a concurrent stream (Darwin and Sutherland, 1984). So, the scene analysis module should also be able to exert feedback modulation on all feature maps, and change its own input in the process. In the distributed hypothesis, the computation of features and scene analysis overlap, so strong interactions between the two are indeed expected.

The arrangement could have a functional benefit. To take a metaphor suggested by David Poeppel (pers. comm.), consider the mathematical function of addition. If this function were useful for a computational system, it might not be the most efficient architecture to have an "addition" module that is recruited every time an addition is required. Rather, the function of addition may be implemented several times, possibly following a canonical architecture, wherever it is needed. Scene analysis may be viewed in this light as a computational primitive (like addition or, perhaps more appropriately, subtraction). It is a well-defined behavioral function, but it does not have to be based on a single scene analysis module.

Several of the findings presented in the earlier chapters would fit with a distributed hypothesis (with some extra assumptions, they would also fit the amended hierarchical view). Early processing stages, where frequency selectivity is already established, could participate in scene analysis for this particular cue (Pressnitzer et al., 2008). It is also possible to reconcile the low-level correlates suggested for pitch perception (Pressnitzer et al., 2004; Winter, 2005) with the fact that pitch can be changed by scene analysis (Moore et al., 1986) or context (Chambers et al., 2009). Finally, if scene analysis is a computational primitive implemented as, for instance, adaptation and inhibition, then similar principles could apply to more than one sensory modality. This could explain why auditory and visual bistability are highly similar, but still distinct (Pressnitzer and Hupé, 2006; Hupé et al., 2008).

6.3.3 Cue integration

One important issue remains that, for now, seems more easily resolved in a hierarchical view. At any moment in time, cues to scene analysis may all be consistent with each other and clearly favor one perceptual organization. But, because of the fundamental ambiguity of sensory information, it is more likely that some of the cues will sometimes conflict. As our everyday experience indicates, and as studies of bistability have confirmed, the outcome of scene analysis is nevertheless a single organization at any moment in time. So, there must be a mechanism available to pool all available evidence and take a unique decision. In a hierarchical view, such a mechanism could take a form akin to "voting", where the scene analysis module operates as a blackboard (Godsmark and Brown, 1999). In the distributed view, the integration of all cues would require that a dominant organization at one stage biases the other stages through reciprocal connections (Desimone and Duncan, 1995; Duncan et al., 1997). Conflict experiments involving cue combinations should be an interesting test of these possibilities (see Chapter 7).


6.4 Summary and conclusions

Physiological correlates of scene analysis have been claimed at many levels of processing, from brainstem (Pressnitzer et al., 2008) to thalamus and cortex (Kondo and Kashino, in press). In addition, many features can influence scene analysis (Moore and Gockel, 2002), but the features themselves can also be changed by scene analysis (Bregman, 1990). These two observations represent a challenge for functional cartoons of auditory processing. They refute the simplest possibility, where scene analysis would sit between feature extraction and top-down processes. Amendments to this scheme are required, involving strong links between feature analysis and scene analysis. We have outlined two possibilities: a hierarchical view with massive feedback, and a distributed view. Several current accounts already depart significantly from a simple hierarchical view. Based on physiological and behavioral observations, they all emphasize the complex interplay between low and high levels of analysis needed to account for scene analysis (Nelken et al., 2003; Elhilali et al., 2009b), attention (Fritz et al., 2007), or perceptual learning (Ahissar et al., 2009). Here we have sketched a related but possibly more radical possibility, i.e., that there is no need for a neural hub dedicated to scene analysis. Rather, scene analysis could emerge as a computational primitive from the interaction between feature selectivity and competitive neural processes, such as adaptation and inhibition. It now remains to be seen whether this extra step can be distinguished from the amended hierarchical view, whether it is supported by further evidence, and, even more importantly, whether it can generate worthwhile research questions.


Chapter 7

Research Project

7.1 Overview

The proposed research project combines the research strands outlined in the previous chapters. The first topic aims at revisiting the idea of features, by relating them to the issue of sound recognition. To apply psychophysical techniques to this vast question, we will use reaction-time, gating, and feature-reduction paradigms. We hypothesize that relevant features may depend on the task and sound set. The second topic is centered on the streaming and bistability paradigms. The aim will be to understand how cues to streaming, low-level and high-level, interact to produce a single perceptual decision. This will be done with behavioral experiments where cues are combined and sometimes put in conflict. The third topic will pursue the investigation of memory and context, with behavioral experiments refining the hypotheses concerning the neural bases of the novel perceptual effects described in Chapter 5. Hopefully, this should pave the way for future physiological and brain imaging studies.

7.2 Features for sound recognition

7.2.1 The time it takes to recognize a sound

Perhaps surprisingly, there are not many psychophysical investigations of how we recognize natural sounds. Timbre should be useful in such cases, but it is unclear whether a small set of generic timbre dimensions is involved (McAdams et al., 1995), or whether the cues are more complex and specific to classes of sound. With Trevor Agus, Clara Suied, and Simon Thorpe, and thanks to a grant from the French Agence Nationale de la Recherche, we are developing a set of behavioral experiments related to sound identification. Several psychophysical measures are used on a common corpus of natural sounds. The corpus is selected from recorded samples of musical instruments and the singing voice (Goto et al., 2003), to balance the natural character of the sounds with the control of acoustic cues.

A first psychophysical measure is reaction time. Listeners are asked to identify as fast as possible various sounds drawn from a target category (for instance, the human voice) and to ignore all other types of sound (for instance, musical instruments). Preliminary data indicate that voice recognition may be remarkably fast, and faster than recognition of other natural sounds (Agus et al., in press). An auditory morphing technique is also used to decide whether the acoustic cues that can support fast voice recognition are based on temporal fine structure, excitation pattern, or joint spectro-temporal cues.


A second technique is gating (Robinson and Patterson, 1995). We are applying temporal windows to the sounds, with various lengths and starting points. The increase in behavioral performance that is expected with longer windows is compared to a multiple-looks model, testing the hypothesis that longer windows only provide more opportunities to accumulate information from features available at short time scales; under this hypothesis, sensitivity should grow as the square root of the number of independent looks, d'(N) = √N × d'(1) (Viemeister and Wakefield, 1991). Preliminary data indicate that very short samples may be sufficient to recognize sounds above chance, but that features at several time scales are involved in the recognition process (Agus et al., in press). Interestingly, recent algorithms for sound classification have advocated the use of a multi-scale approach (Mesgarani et al., 2006; Mesgarani et al., 2008).

7.2.2 Auditory sketches

A follow-up to this project will investigate the possibility of auditory sketches, that is, severely impoverished acoustic signals that nevertheless support recognition. The aim is to find the minimal set of cues involved in a recognition task. The approach is to design an original signal analysis-synthesis technique based on auditory models. Two types of models are considered: a standard auditory filterbank (Patterson et al., 1995), and a spectro-temporal receptive field representation (Chi et al., 1999; Mesgarani et al., 2006). After the analysis of a natural sound, only a small subset of features will be selected. Then, the impoverished stimulus (the "auditory sketch") will be synthesized by inverting the model, for use in behavioral experiments. Two methods can be imagined for the feature selection. A model-driven approach could use the optimal feature set for classification, based on model predictions. A data-driven approach could use techniques inspired by reverse correlation, where features are initially chosen randomly and then selected based on behavioral performance (Gosselin and Schyns, 2001). If the sketches are successfully recognized, then we will be able to suggest a sparse representation of the relevant features for recognition.
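As an illustration of the intended analysis-synthesis logic, the sketch below keeps only the largest time-frequency coefficients of a spectrogram and inverts the result. A plain short-time Fourier transform stands in for the auditory models named above, and the 2% retention fraction is an arbitrary illustrative choice; the planned work would instead select features with the model-driven or data-driven procedures just described.

import numpy as np
from scipy.signal import stft, istft

def auditory_sketch(x, fs, keep_frac=0.02):
    """Keep only the largest time-frequency coefficients of a spectrogram
    and resynthesize, as a crude stand-in for auditory-model-based
    analysis-synthesis."""
    _, _, Z = stft(x, fs, nperseg=512)
    mags = np.abs(Z)
    threshold = np.quantile(mags, 1.0 - keep_frac)
    Z_sparse = np.where(mags >= threshold, Z, 0.0)   # discard weak features
    _, x_sketch = istft(Z_sparse, fs, nperseg=512)
    return x_sketch[:len(x)]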

7.3 Bistability as perceptual decision

7.3.1 Cue combination in auditory streaming

To address the issue of how cues are integrated to achieve perceptual organization, we are planning a series of experiments where low-level and high-level cues are combined in bistable streaming paradigms. This research strand has received the support of a new ANR grant, in collaboration with Jean-Luc Schwartz in Grenoble. Some preliminary data were collected during the MSc of Wendy de Heer, who is now doing her PhD in Berkeley.

A first series of experiments will investigate the combination of spatial localization cues (ITD and ILD). Streaming is affected by each of these cues (Hartmann and Johnson, 1991). Both determine subjective spatial localization, but ITD is a purely binaural cue whereas ILD also contains monaural level cues. If only the high-level percept of subjective localization determines streaming, then tones with ITD and ILD cues matched in subjective localization should have exactly the same effect on streaming. However, if monaural level cues also contribute to streaming, there should be an excess of streaming produced by ILD compared to ITD. Of particular interest is the case where ITD and ILD cues are put in conflict. It should then be possible to "cancel out" the localization cues, at least to some extent (Hafter and Carrier, 1972).
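For reference, the two cues can be imposed on a tone over headphones as sketched below. This is a schematic implementation, not the planned experimental code; the tone frequency and the cue magnitudes are illustrative values.

import numpy as np

def lateralize(tone, fs, itd_us=0.0, ild_db=0.0):
    """Return a (left, right) pair carrying the requested interaural cues:
    the ITD as a whole-waveform delay of the far ear, the ILD as a level
    difference split symmetrically across ears (positive = right)."""
    shift = int(round(abs(itd_us) * 1e-6 * fs))
    delayed = np.concatenate([np.zeros(shift), tone])[:len(tone)]
    left, right = (delayed, tone) if itd_us > 0 else (tone, delayed)
    gain = 10.0 ** (ild_db / 40.0)          # half the ILD applied per ear
    return left / gain, right * gain

fs = 44100
t = np.arange(int(0.1 * fs)) / fs
tone = np.sin(2 * np.pi * 500.0 * t)
left_itd, right_itd = lateralize(tone, fs, itd_us=300.0)  # ITD-only image
left_ild, right_ild = lateralize(tone, fs, ild_db=6.0)    # ILD-only image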


A second series of experiments is inspired by an observation related to the build-up of streaming (Cusack et al., 2004; Micheyl et al., 2005; Denham and Winkler, 2006; Pressnitzer and Hupé, 2006). Different sequence durations were used in these studies, and it appears that the duration of the build-up period roughly scaled with sequence duration: long sequences produced a slower build-up. We will systematically manipulate the duration of the sequences. Top-down expectations will also be manipulated, by cueing or not cueing the duration of the upcoming stimulus. The hypothesis is that context effects will be visible in the build-up period, depending on the duration of the preceding sequence (Snyder et al., 2009). Moreover, top-down knowledge of sequence duration may be sufficient to change the duration of the build-up. This could highlight two different aspects of the decision-making process during scene analysis: the accumulation of sensory evidence, and the response criterion.

7.3.2 Sensory-motor influences on streaming

Another project aims at combining high-level and low-level cues to streaming, but this time by using a sensory-motor paradigm. The project is a collaboration with Makio Kashino, Hirohito Kondo, and Iwaki Toshima, supported by a research grant from NTT, Japan. It has been shown that the build-up of streaming can be reset by abruptly changing the spatial location of the tones composing the streaming sequence (Rogers and Bregman, 1998). After the location change, listeners reacted as if the sequence had just started again, and they displayed the usual initial bias towards one stream. In this experiment, several cues signaled a change in the scene and may have contributed to the resetting: discontinuities in subjective location, in binaural cues, or in monaural level cues.

We will investigate the effects of voluntary head movements on the build-up of streaming. Listeners will provide streaming judgments while maintaining visual fixation on a light spot. In some trials, the light spot will abruptly change location and listeners will be instructed to make head movements to follow it. During a voluntary head movement, all cues at the ears change, exactly as is the case when a source moves (Rogers and Bregman, 1998). However, listeners should know that they, and not the source, initiated the changes. Will a resetting of the build-up of streaming still be observed in such cases? The answer should be no if changes at the ears can be ignored when they are accounted for by self-movement. But it could be yes if low-level feature changes have a mandatory effect on streaming. Using a telepresence robot (Toshima et al., 2008), we will add several control conditions to contrast the two hypotheses. Listeners will wear headphones connected to microphones located on the telepresence robot, and head-tracking will be used. By having congruent and incongruent conditions between the listeners' voluntary head movements and the robot's head movements, we will be able to systematically cross changes in acoustic cues and the presence or absence of voluntary movement.
This will disentangle the contributions of monaural cues, subjective location cues, and attentional factors.

7.4 Mechanisms for memory of noise and pitch hysteresis

7.4.1 Is attention required for learning noises?

Different hypotheses have been put forward about how the learning of noise may be achieved (see Chapter 5; Agus et al., submitted). Here, we are planning to use a technique based on "informational masking" to distinguish between the different levels of processing that may be involved. Gutschalk et al. (2008) presented a sequence of target tones, at a given frequency, embedded in a background of random tones. In some conditions, listeners were not able to detect the targets even though targets and background were likely to be processed in independent peripheral auditory channels. This is an instance of informational masking, as opposed to energetic masking in peripheral channels (Kidd et al., 2008). Importantly, using brain imaging, Gutschalk et al. (2008) showed that the sequences that were not detected because of informational masking were nevertheless represented up to primary auditory cortex.

We are planning to use a similar stimulus arrangement, but replacing the tones by band-pass noises. Frozen-noise samples will be used as targets and the learning of noise will be measured (with a repeated-noise task on target sequences). The novelty concerns trials where sequences will not be detected by listeners. We will test whether a frozen sample, presented but not detected, still changes performance on subsequent presentations. If so, this would strongly suggest that correlates of noise learning are to be found in primary auditory cortex or before. In addition, this would indicate that attention is not required for the effect, pointing to contributions of stimulus-specific adaptation (Ulanovsky et al., 2003). Whatever the result, this should guide the choice of the future physiological investigations that would be most appropriate for the noise-learning paradigm.

7.4.2 The conditions for pitch hysteresis

During the PhD thesis of Claire Chambers, we will conduct further experiments with the basic pitch hysteresis paradigm described in Chapter 5. First, we aim to map the parameters of the context and test stimulus that control the strength of the hysteresis. We will, for instance, vary the number of context items and estimate whether hysteresis is cumulative or simply depends on the immediately preceding sound. The effect of the time gap between context and test will also be systematically investigated. Second, in a related set of experiments, we will optimize the hysteresis effect for future physiology or brain imaging experiments. The objective will be to find the range of parameters that guarantees a reliable bias in most listeners, with minimal overall stimulus duration, so that the pitch of the same physical stimulus can be controlled a priori by the experimenter over several repeats during an experiment. If such a set of parameters can be found, this would remove the need for co-registration of a behavioral pitch judgment when looking for the neural correlates of the pitch percept after contextual biasing.
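The planned stimulus arrangement can be sketched as follows, in the spirit of the multitone paradigm of Gutschalk et al. (2008). All parameters (burst duration, protected-region width, masker density) are illustrative assumptions, and the experiments would substitute frozen band-pass noise samples for the pure-tone bursts used here.

import numpy as np

def masked_target_sequence(fs=44100, dur_s=3.0, f_target=1000.0,
                           protect_oct=0.5, n_maskers=60, seed=4):
    """Regular 50-ms target tones embedded in a random multitone cloud.
    Masker frequencies are kept out of a 'protected region' around the
    target, sparing it from energetic masking while it can still be
    missed perceptually (informational masking)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(int(fs * dur_s))
    n_tone = int(0.05 * fs)
    env = np.hanning(n_tone)

    def add_tone(freq, onset_s):
        i0 = int(onset_s * fs)
        seg_t = np.arange(min(n_tone, len(x) - i0)) / fs
        x[i0:i0 + len(seg_t)] += env[:len(seg_t)] * np.sin(2 * np.pi * freq * seg_t)

    for onset in np.arange(0.2, dur_s - 0.1, 0.4):   # regular target stream
        add_tone(f_target, onset)
    for _ in range(n_maskers):                       # random masker cloud
        freq = 2.0 ** rng.uniform(np.log2(200.0), np.log2(8000.0))
        while abs(np.log2(freq / f_target)) < protect_oct:
            freq = 2.0 ** rng.uniform(np.log2(200.0), np.log2(8000.0))
        add_tone(freq, rng.uniform(0.0, dur_s - 0.06))
    return x / np.max(np.abs(x))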

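For the parametric mapping planned in 7.4.2, a hypothetical trial generator could take the following form. The actual ambiguous stimuli are those described in Chapter 5; here, purely for illustration, the context is a series of pure tones stepping towards a fixed test tone, with the number of context items and the silent context-test gap as the parameters to be crossed:

import numpy as np

fs = 44100

def tone(freq, dur=0.2):
    """Pure tone with 10-ms linear onset/offset ramps."""
    t = np.arange(int(fs * dur)) / fs
    n_ramp = int(0.01 * fs)
    env = np.ones_like(t)
    env[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)
    env[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)
    return np.sin(2 * np.pi * freq * t) * env

def hysteresis_trial(n_context, gap, direction=+1,
                     f_test=440.0, step=2 ** (1 / 12)):
    """Context tones approach f_test from below (direction=+1) or above (-1)."""
    freqs = [f_test * step ** (-direction * k) for k in range(n_context, 0, -1)]
    parts = [tone(f) for f in freqs]
    parts.append(np.zeros(int(fs * gap)))     # variable context-test gap
    parts.append(tone(f_test))                # same physical test tone on all trials
    return np.concatenate(parts)

# Cross the number of context items with the gap duration
conditions = {(n, g): hysteresis_trial(n, g)
              for n in (1, 2, 4, 8) for g in (0.1, 0.5, 2.0)}

Whether the bias grows with the number of context items (cumulative hysteresis) or is determined by the last item alone, and how it decays with the gap, are precisely the questions that the behavioral mapping is designed to answer.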

Chapter 8

Conclusion

This thesis surveyed experimental studies concerned with different aspects of hearing: feature extraction, auditory scene analysis, and memory and context effects. For several of these projects, I tried to combine a psychophysical approach with computational models and physiological recordings. I hope the work has contributed to a better understanding of how important features such as pitch, or essential functions such as perceptual organization, may emerge from the interaction between neural auditory processing and the acoustics of sounds and scenes.

Some of these advances have practical consequences. As far as possible, I have tried to translate them into applications, often in the direction of clinical research. Some of the students I supervised are now employed by cochlear implant companies, so these links will hopefully continue in the future.

For the more fundamental aspects of the research, a couple of threads have emerged and will continue to be explored in the research project. First, the focus on time, at several time scales, remains an important aspect of my interests. Temporal structure conveys information about the physics of a source, and changes over time are in the very nature of auditory scenes. Much remains to be done to understand how a sound's temporal structure is registered as auditory features. A simple acoustic cue such as periodicity is reflected by several complementary neural cues within the auditory pathways. This one-to-many mapping is also obvious in perception: repetition can be detected as pitch or as roughness, and shifts in frequency may even generate a feature of their own. It is as if the auditory system had a bag of tricks to make full use of the important physical properties of sound sources. We are now beginning to understand some of them, but there is no guarantee that the overall picture will be nicely organized and isomorphic to the acoustical description of sound. However, these tricks do seem remarkably efficient, so they could provide inspiration for designing better artificial systems.

Second, there are profound interactions between feature extraction, scene analysis, and memory, to the extent that we have suggested that these distinct functions may well share common neural substrates. This means that a model of how a feature is encoded has to account for the feature's transformations by scene analysis and context. It also means that investigations of scene analysis and context must be grounded in how features are extracted, starting from the earliest stages of processing. I would argue that considering the subtle interplay between levels of processing is a fruitful way forward. We do not yet understand how we recognize Chet Baker or Miles Davis in a fraction of a second. However, the tantalizing glimpses we get of how the brain does it make the music even more enjoyable.


References

Agus, T. R., Beauvais, M., Thorpe, S. J., and Pressnitzer, D. (2009). "The implicit learning of noise: Behavioral data and computational models," in 15th International Symposium on Hearing, edited by E. A. Lopez-Poveda, A. R. Palmer, and R. Meddis (Salamanca).
Agus, T. R., Suied, C., Thorpe, S. J., and Pressnitzer, D. (in press). "Characteristics of human voice processing," in IEEE International Symposium on Circuits and Systems (ISCAS) (Paris).
Agus, T. R., Thorpe, S. J., and Pressnitzer, D. (submitted). "Rapid formation of robust auditory memories: Insights from noise."
Ahissar, M., Nahum, M., Nelken, I., and Hochstein, S. (2009). "Reverse hierarchies and sensory learning," Phil. Trans. R. Soc. B 364, 285-299.
Alais, D., and Blake, R. (eds). (2005). Binocular rivalry (MIT Press).
Ardoint, M., Lorenzi, C., Pressnitzer, D., and Gorea, A. (2008). "Investigation of perceptual constancy in the temporal-envelope domain," J Acoust Soc Am 123, 1591-1601.
Atiani, S., Elhilali, M., David, S. V., Fritz, J., and Shamma, S. (2009). "Task difficulty and performance induce diverse adaptive patterns in gain and shape of primary auditory cortical receptive fields," Neuron 61, 467-480.
Bar-Yosef, O., Rotman, Y., and Nelken, I. (2002). "Responses of neurons in cat primary auditory cortex to bird chirps: Effects of temporal and spectral context," J Neurosci 22, 8619-8632.
Bee, M. A., and Klump, G. M. (2004). "Primitive auditory stream segregation: A neurophysiological study in the songbird forebrain," J Neurophysiol 92, 1088-1104.
Bee, M. A., and Micheyl, C. (2008). "The cocktail party problem: What is it? How can it be solved? And why should animal behaviorists study it?," Journal of Comparative Psychology 122, 235-251.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., and Pike, B. (2000). "Voice-selective areas in human auditory cortex," Nature 403, 309-312.
Bendor, D., and Wang, X. Q. (2005). "The neuronal representation of pitch in primate auditory cortex," Nature 436, 1161-1165.
Bernstein, J. G., and Oxenham, A. J. (2003). "Pitch discrimination of diotic and dichotic tone complexes: Harmonic resolvability or harmonic number?," J Acoust Soc Am 113, 3323-3334.
Bernstein, J. G., and Oxenham, A. J. (2005). "An autocorrelation model with place dependence to account for the effect of harmonic number on fundamental frequency discrimination," J Acoust Soc Am 117, 3816-3831.
Bitterman, Y., Mukamel, R., Malach, R., Fried, I., and Nelken, I. (2008). "Ultra-fine frequency tuning revealed in single neurons of human auditory cortex," Nature 451, 197-201.
Bleeck, S., Sayles, M., Ingham, N. J., and Winter, I. M. (2006). "The time course of recovery from suppression and facilitation from single units in the mammalian cochlear nucleus," Hear Res 212, 176-184.
Boring, E. G. (1937). "A psychological function is the relation of successive differentiations of events in the organism," Psychological Review 44, 445-461.
Bregman, A. (1990). Auditory scene analysis (MIT Press, Cambridge, MA).


Bregman, A. S. (1978). "Auditory streaming is cumulative," J Exp Psychol Hum Percept Perform 4, 380-387.
Bregman, A. S., and Campbell, J. (1971). "Primary auditory stream segregation and perception of order in rapid sequences of tones," J Exp Psychol 89, 244-249.
Cariani, P. A., and Delgutte, B. (1996). "Neural correlates of the pitch of complex tones. I. Pitch and pitch salience," J Neurophysiol 76, 1698-1716.
Carlyon, R. P., van Wieringen, A., Long, C. J., Deeks, J. M., and Wouters, J. (2002). "Temporal pitch mechanisms in acoustic and electric hearing," J Acoust Soc Am 112, 621-633.
Chait, M., Poeppel, D., de Cheveigné, A., and Simon, J. Z. (2007). "Processing asymmetry of transitions between order and disorder in human auditory cortex," J Neurosci 27, 5207-5214.
Chambers, C., Park-Thompson, V., and Pressnitzer, D. (2009). "Biasing perception of ambiguous pitch stimuli," International Journal of Audiology, abstract, in press.
Chi, T., Gao, Y., Guyton, M. C., Ru, P., and Shamma, S. (1999). "Spectro-temporal modulation transfer functions and speech intelligibility," J Acoust Soc Am 106, 2719-2732.
Cooper, W. B., Tobey, E., and Loizou, P. C. (2008). "Music perception by cochlear implant and normal hearing listeners as measured by the Montreal Battery for Evaluation of Amusia," Ear and Hearing 29, 618-626.
Cousineau, M., Demany, L., Meyer, B., and Pressnitzer, D. (submitted). "What breaks a melody: Perceiving pitch and loudness sequences with a cochlear implant."
Cousineau, M., Demany, L., and Pressnitzer, D. (2009). "What makes a melody: The perceptual singularity of pitch sequences," J Acoust Soc Am 126, 3179-3187.
Cusack, R. (2005). "The intraparietal sulcus and perceptual organization," J Cogn Neurosci 17, 641-651.
Cusack, R., Deeks, J., Aikman, G., and Carlyon, R. P. (2004). "Effects of location, frequency region, and time course of selective attention on auditory scene analysis," J Exp Psychol Hum Percept Perform 30, 643-656.
Cusack, R., and Pressnitzer, D. (2008). "Auditory scene analysis emerges from a distributed yet integrated network," J Acoust Soc Am 123, 3049.
Darwin, C. J., and Sutherland, N. S. (1984). "Grouping frequency components of vowels - when is a harmonic not a harmonic?," Quarterly Journal of Experimental Psychology Section A - Human Experimental Psychology 36, 193-208.
de Cheveigné, A. (2005). "Pitch perception models," in Pitch - neural coding and perception, edited by C. J. Plack, A. Oxenham, R. R. Fay, and A. N. Popper (Springer, New York), pp. 169-233.
de Cheveigné, A., and Pressnitzer, D. (2006). "The case of the missing delay lines: Synthetic delays obtained by cross-channel phase interaction," J Acoust Soc Am 119, 3908-3918.
Dean, I., Harper, N. S., and McAlpine, D. (2005). "Neural population coding of sound level adapts to stimulus statistics," Nat Neurosci 8, 1684-1689.
Demany, L., Pressnitzer, D., and Semal, C. (2009). "Tuning properties of the auditory frequency-shift detectors," J Acoust Soc Am 126, 1342-1348.
Demany, L., and Ramos, C. (2005). "On the binding of successive sounds: Perceiving shifts in nonperceived pitches," J Acoust Soc Am 117, 833-841.


Demany, L., and Semal, C. (2007). "The role of memory in auditory perception," in Auditory perception of sound sources, edited by W. A. Yost, A. N. Popper, and R. R. Fay (Springer Verlag, New York), pp. 77-113.
Demany, L., Semal, C., Cazalets, J. R., and Pressnitzer, D. (in revision). "Fundamental differences in change detection between vision and audition," Exp Brain Res.
Demany, L., Trost, W., Serman, M., and Semal, C. (2008). "Auditory change detection - simple sounds are not memorized better than complex sounds," Psychol Sci 19, 85-91.
Denham, S. L., and Winkler, I. (2006). "The role of predictive models in the formation of auditory streams," Journal of Physiology-Paris 100, 154-170.
Desimone, R., and Duncan, J. (1995). "Neural mechanisms of selective visual attention," Annual Review of Neuroscience 18, 193-222.
Deutsch, D. (1987). "The tritone paradox - effects of spectral variables," Percept Psychophys 41, 563-575.
Dowling, W. J., and Harwood, D. L. (1986). Music cognition (Academic, Orlando, FL).
Duncan, J., Humphreys, G., and Ward, R. (1997). "Competitive brain activity in visual attention," Curr Opin Neurobiol 7, 255-261.
Elhilali, M., Fritz, J. B., Klein, D. J., Simon, J. Z., and Shamma, S. A. (2004). "Dynamics of precise spike timing in primary auditory cortex," J Neurosci 24, 1159-1172.
Elhilali, M., Ma, L., Micheyl, C., Oxenham, A. J., and Shamma, S. A. (2009a). "Temporal coherence in the perceptual organization and cortical representation of auditory scenes," Neuron 61, 317-329.
Elhilali, M., Xiang, J. J., Shamma, S. A., and Simon, J. Z. (2009b). "Interaction between attention and bottom-up saliency mediates the representation of foreground and background in an auditory scene," PLoS Biol 7.
Fairhall, A. L., Lewen, G. D., Bialek, W., and van Steveninck, R. R. D. (2001). "Efficiency and ambiguity in an adaptive neural code," Nature 412, 787-792.
Fishman, Y. I., Arezzo, J. C., and Steinschneider, M. (2004). "Auditory stream segregation in monkey auditory cortex: Effects of frequency separation, presentation rate, and tone duration," J Acoust Soc Am 116, 1656-1670.
Fishman, Y. I., Reser, D. H., Arezzo, J. C., and Steinschneider, M. (2001). "Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey," Hear Res 151, 167-187.
Fritz, J., Shamma, S., Elhilali, M., and Klein, D. (2003). "Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex," Nat Neurosci 6, 1216-1223.
Fritz, J. B., Elhilali, M., David, S. V., and Shamma, S. A. (2007). "Auditory attention - focusing the searchlight on sound," Curr Opin Neurobiol 17, 437-455.
Getzmann, S. (2004). "Spatial discrimination of sound sources in the horizontal plane following an adapter sound," Hear Res 191, 14-20.
Giangrande, J., Tuller, B., and Kelso, J. A. S. (2003). "Perceptual dynamics of circular pitch," Music Percept 20, 241-262.
Gilbert, G., and Lorenzi, C. (2006). "The ability of listeners to use recovered envelope cues from speech fine structure," J Acoust Soc Am 119, 2438-2444.
Godsmark, D., and Brown, G. J. (1999). "A blackboard architecture for computational auditory scene analysis," Speech Communication 27, 351-366.
Goldstein, J. L. (1967). "Auditory nonlinearity," J Acoust Soc Am 41, 676-689.


Goossens, T., van de Par, S., and Kohlrausch, A. (2008). "On the ability to discriminate Gaussian-noise tokens or random tone-burst complexes," J Acoust Soc Am 124, 2251-2262.
Gosselin, F., and Schyns, P. G. (2001). "Bubbles: A technique to reveal the use of information in recognition tasks," Vision Res 41, 2261-2271.
Griffiths, T. D., and Warren, J. D. (2002). "The planum temporale as a computational hub," Trends Neurosci 25, 348-353.
Gutschalk, A., Micheyl, C., Melcher, J. R., Rupp, A., Scherg, M., and Oxenham, A. J. (2005). "Neuromagnetic correlates of streaming in human auditory cortex," J Neurosci 25, 5382-5388.
Gutschalk, A., Micheyl, C., and Oxenham, A. J. (2008). "Neural correlates of auditory perceptual awareness under informational masking," PLoS Biol 6, 1156-1165.
Gutschalk, A., Oxenham, A. J., Micheyl, C., Wilson, E. C., and Melcher, J. R. (2007). "Human cortical activity during streaming without spectral cues suggests a general neural substrate for auditory stream segregation," J Neurosci 27, 13074-13081.
Guttman, N., and Julesz, B. (1963). "Lower limits of auditory analysis," J Acoust Soc Am 35, 610.
Hafter, E. R., and Carrier, S. C. (1972). "Binaural interaction in low-frequency stimuli - inability to trade time and intensity completely," J Acoust Soc Am 51, 1852-1862.
Hall, J. W., Haggard, M. P., and Fernandes, M. A. (1984). "Detection in noise by spectro-temporal pattern-analysis," J Acoust Soc Am 76, 50-56.
Hanna, T. E. (1984). "Discrimination of reproducible noise as a function of bandwidth and duration," Percept Psychophys 36, 409-416.
Hartmann, W. M., and Johnson, D. (1991). "Stream segregation and peripheral channeling," Music Percept 9, 155-184.
Heinz, M. G., Colburn, H. S., and Carney, L. H. (2001). "Evaluating auditory performance limits: I. One-parameter discrimination using a computational model for the auditory nerve," Neural Computation 13, 2273-2316.
Helmholtz, H. (1866/1925). Treatise on physiological optics (Southall, J.P., Dover, New York).
Helmholtz, H. (1877). On the sensations of tone (Dover, New York).
Hock, H. S., Bukowski, L., Nichols, D. F., Huisman, A., and Rivera, M. (2005). "Dynamical vs judgmental comparison: Hysteresis effects in motion perception," Spatial Vision 18, 317-335.
Hofer, S. B., and Klump, G. M. (2003). "Within- and across-channel processing in auditory masking: A physiological study in the songbird forebrain," J Neurosci 23, 5732-5739.
Hupé, J. M., Joffo, L. M., and Pressnitzer, D. (2008). "Bistability for audiovisual stimuli: Perceptual decision is modality specific," J Vis 8, 1-15.
Hupé, J. M., and Rubin, N. (2003). "The dynamics of bi-stable alternation in ambiguous motion displays: A fresh look at plaids," Vision Res 43, 531-548.
Itatani, N., and Klump, G. M. (2009). "Auditory streaming of amplitude-modulated sounds in the songbird forebrain," J Neurophysiol 101, 3212-3225.
Jardine, G., and Pressnitzer, D. (2009). "Acoustic cues to disambiguate questions and statements in noise-vocoded speech," International Journal of Audiology, abstract, in press.


Johnsrude, I. S., Penhune, V. B., and Zatorre, R. J. (2000). "Functional specificity in the right human auditory cortex for perceiving pitch direction," Brain 123, 155-163.
Joris, P. X., Schreiner, C. E., and Rees, A. (2004). "Neural processing of amplitude-modulated sounds," Physiol Rev 84, 541-577.
Kaas, J. H., and Hackett, T. A. (2000). "Subdivisions of auditory cortex and processing streams in primates," Proc Natl Acad Sci U S A 97, 11793-11799.
Kaernbach, C. (2004). "The memory of noise," Experimental Psychology 51, 240-248.
Kaernbach, C., and Demany, L. (1998). "Psychophysical evidence against the autocorrelation theory of auditory temporal processing," J Acoust Soc Am 104, 2298-2306.
Kashino, M. (1998). "Adaptation in sound localization revealed by auditory after-effects," in Psychophysical and physiological advances in hearing, edited by A. R. Palmer, A. Rees, A. Q. Summerfield, and R. Meddis (Whurr, London), pp. 322-328.
Kiang, N. Y. S. (1965). Discharge patterns of single fibers in the cat's auditory nerve (MIT Press, Cambridge, MA).
Kidd, G., Mason, C. R., Richards, V. M., Gallun, F. J., and Durlach, N. I. (2008). "Informational masking," in Auditory perception of sound sources, edited by W. A. Yost, A. N. Popper, and R. R. Fay (Springer, New York), pp. 143-189.
Kondo, H. M., and Kashino, M. (2007). "Neural mechanisms of auditory awareness underlying verbal transformations," NeuroImage 36, 123-130.
Kondo, H. M., and Kashino, M. (in press). "Involvement of the thalamocortical loop in the spontaneous switching of percepts in auditory streaming," J Neurosci.
Krumbholz, K., Patterson, R. D., and Pressnitzer, D. (2000). "The lower limit of pitch as determined by rate discrimination," J Acoust Soc Am 108, 1170-1180.
Krumhansl, C. L., and Kessler, E. J. (1982). "Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys," Psychological Review 89, 334-368.
Laneau, J., Wouters, J., and Moonen, M. (2004). "Relative contributions of temporal and place pitch cues to fundamental frequency discrimination in cochlear implantees," J Acoust Soc Am 116, 3606-3619.
Lankheet, M. J. (2006). "Unraveling adaptation and mutual inhibition in perceptual rivalry," J Vis 6, 304-310.
Las, L., Stern, E. A., and Nelken, I. (2005). "Representation of tone in fluctuating maskers in the ascending auditory system," J Neurosci 25, 1503-1513.
Leopold, D. A., and Logothetis, N. K. (1999). "Multistable phenomena: Changing views in perception," Trends Cogn Sci 3, 254-264.
Licklider, J. C. R. (1951). "A duplex theory of pitch perception," Experientia 7, 128-134.
Long, G. M., and Toppino, T. C. (2004). "Enduring interest in perceptual ambiguity: Alternating views of reversible figures," Psychol Bull 130, 748-768.
Lorenzi, C., Gilbert, G., Carn, H., Garnier, S., and Moore, B. C. J. (2006). "Speech perception problems of the hearing impaired reflect inability to use temporal fine structure," Proc Natl Acad Sci U S A 103, 18866-18869.
Lu, T., Liang, L., and Wang, X. Q. (2001). "Neural representations of temporally asymmetric stimuli in the auditory cortex of awake primates," J Neurophysiol 85, 2364-2380.
Macmillan, N. A., and Creelman, C. D. (2001). Detection theory: A user's guide (Lawrence Erlbaum Associates, Inc, Mahwah, NJ).


Maier, J. K., McAlpine, D., Klump, G. M., and Pressnitzer, D. (in revision). "Context effects in the discriminability of spatial cues," JARO - Journal of the Association for Research in Otolaryngology.
McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., and Krimphoff, J. (1995). "Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes," Psychol Res 58, 177-192.
McDermott, H. J. (2004). "Music perception with cochlear implants: A review," Trends in Amplification 8, 49-92.
McDermott, J. H., Lehr, A. J., and Oxenham, A. J. (2008). "Is relative pitch specific to pitch?," Psychol Sci 19, 1263-1271.
McFarland, D. J., and Cacace, A. T. (1992). "Aspects of short-term acoustic recognition memory - modality and serial position effects," Audiology 31, 342-352.
Meddis, R., Delahaye, R., O'Mard, L., Sumner, C., Fantini, D. A., Winter, I., and Pressnitzer, D. (2001). "A model of signal processing in the cochlear nucleus: Comodulation masking release," Acta Acustica United with Acustica 88, 387-398.
Meddis, R., and Hewitt, M. J. (1991a). "Virtual pitch and phase sensitivity of a computer-model of the auditory periphery. I. Pitch identification," J Acoust Soc Am 89, 2866-2882.
Meddis, R., and Hewitt, M. J. (1991b). "Virtual pitch and phase sensitivity of a computer-model of the auditory periphery. II. Phase sensitivity," J Acoust Soc Am 89, 2883-2894.
Mesgarani, N., David, S. V., Fritz, J. B., and Shamma, S. A. (2008). "Phoneme representation and classification in primary auditory cortex," J Acoust Soc Am 123, 899-909.
Mesgarani, N., Slaney, M., and Shamma, S. A. (2006). "Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations," IEEE Transactions on Audio, Speech, and Language Processing 14, 920-930.
Micheyl, C., Carlyon, R. P., Gutschalk, A., Melcher, J. R., Oxenham, A. J., Rauschecker, J. P., Tian, B., and Wilson, E. C. (2007). "The role of auditory cortex in the formation of auditory streams," Hear Res 229, 116-131.
Micheyl, C., Tian, B., Carlyon, R. P., and Rauschecker, J. P. (2005). "Perceptual organization of tone sequences in the auditory cortex of awake macaques," Neuron 48, 139-148.
Miller, G. A., and Heise, G. A. (1950). "The trill threshold," J Acoust Soc Am 22, 637-638.
Moore, B. C. J. (1973). "Some experiments relating to the perception of complex tones," Quarterly Journal of Experimental Psychology 25, 451-475.
Moore, B. C. J. (1982). An introduction to the psychology of hearing (Academic, London).
Moore, B. C. J., and Carlyon, R. P. (2005). "Perception of pitch by people with cochlear hearing loss and by cochlear implant users," in Pitch - neural coding and perception, edited by C. J. Plack, A. Oxenham, R. R. Fay, and A. N. Popper (Springer, New York), pp. 234-277.
Moore, B. C. J., Glasberg, B. R., and Peters, R. W. (1986). "Thresholds for hearing mistuned partials as separate tones in harmonic complexes," J Acoust Soc Am 80, 479-483.
Moore, B. C. J., and Gockel, H. (2002). "Factors influencing sequential stream segregation," Acta Acustica United with Acustica 88, 320-332.


Moore, B. C. J., Hopkins, K., and Cuthbertson, S. (2009). "Discrimination of complex tones with unresolved components using temporal fine structure information," J Acoust Soc Am 125, 3214-3222.
Moore, B. C. J., and Rosen, S. M. (1979). "Tune recognition with reduced pitch and interval information," Quarterly Journal of Experimental Psychology 31, 229-240.
Munhall, K. G., ten Hove, M. W., Brammer, M., and Pare, M. (2009). "Audiovisual integration of speech in a bistable illusion," Curr Biol 19, 735-739.
Nakamoto, K. T., Jones, S. J., and Palmer, A. R. (2008). "Descending projections from auditory cortex modulate sensitivity in the midbrain to cues for spatial position," J Neurophysiol 99, 2347-2356.
Nelken, I., Fishbach, A., Las, L., Ulanovsky, N., and Farkas, D. (2003). "Primary auditory cortex of cats: Feature detection or something else?," Biological Cybernetics 89, 397-406.
Nelken, I., Rotman, Y., and Bar-Yosef, O. (1999). "Responses of auditory-cortex neurons to structural features of natural sounds," Nature 397, 154-157.
Neuert, V., Pressnitzer, D., Patterson, R. D., and Winter, I. M. (2001). "The responses of single units in the inferior colliculus of the guinea pig to damped and ramped sinusoids," Hear Res 159, 36-52.
Neuert, V., Verhey, J. L., and Winter, I. M. (2004). "Responses of dorsal cochlear nucleus neurons to signals in the presence of modulated maskers," J Neurosci 24, 5789-5797.
Neuhoff, J. G. (1998). "Perceptual bias for rising tones," Nature 395, 123-124.
O'Regan, J. K., and Noe, A. (2001). "A sensorimotor account of vision and visual consciousness," Behavioral and Brain Sciences 24, 939-1011.
O'Regan, J. K., Rensink, R. A., and Clark, J. J. (1999). "Change-blindness as a result of 'mudsplashes'," Nature 398, 34-34.
Ohm, G. S. (1843). "On the definition of a tone with the associated theory of the siren and similar sound producing devices," Poggendorf's Annalen der Physik und Chemie 59, 497ff.
Patterson, R. D. (1987). "A pulse ribbon model of monaural phase perception," J Acoust Soc Am 82, 1560-1586.
Patterson, R. D. (1994a). "The sound of a sinusoid - Spectral models," J Acoust Soc Am 96, 1409-1418.
Patterson, R. D. (1994b). "The sound of a sinusoid - Time-interval models," J Acoust Soc Am 96, 1419-1428.
Patterson, R. D., Allerhand, M. H., and Giguere, C. (1995). "Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform," J Acoust Soc Am 98, 1890-1894.
Patterson, R. D., Smith, D. R. R., van Dinther, R., and Walters, T. C. (2008). "Size information in the production and perception of communication sounds," in Auditory perception of sound sources, edited by W. A. Yost, A. N. Popper, and R. R. Fay (Springer, New York), pp. 43-75.
Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., and Griffiths, T. D. (2002). "The processing of temporal pitch and melody information in auditory cortex," Neuron 36, 767-776.
Plomp, R., and Levelt, W. J. M. (1965). "Tonal consonance and critical bandwidth," J Acoust Soc Am 38, 548-560.


Pressnitzer, D., Bestel, J., and Fraysse, B. (2005). "Music to electric ears: Pitch and timbre perception by cochlear implant patients," Ann N Y Acad Sci 1060, 343-345.
Pressnitzer, D., de Cheveigné, A., and Winter, I. M. (2002). "Perceptual pitch shift for sounds with similar waveform autocorrelation," Acoustics Research Letters Online-ARLO 3, 1-6.
Pressnitzer, D., de Cheveigné, A., and Winter, I. M. (2004). "Physiological correlates of the perceptual pitch shift for sounds with similar waveform autocorrelation," Acoustics Research Letters Online-ARLO 5, 1-6.
Pressnitzer, D., and Hupé, J. M. (2006). "Temporal dynamics of auditory and visual bistability reveal common principles of perceptual organization," Curr Biol 16, 1351-1357.
Pressnitzer, D., and McAdams, S. (1999). "Two phase effects in roughness perception," J Acoust Soc Am 105, 2773-2782.
Pressnitzer, D., McAdams, S., Winsberg, S., and Fineberg, J. (2000a). "Perception of musical tension for nontonal orchestral timbres and its relation to psychoacoustic roughness," Percept Psychophys 62, 66-80.
Pressnitzer, D., Meddis, R., Delahaye, R., and Winter, I. M. (2001a). "Physiological correlates of comodulation masking release in the mammalian ventral cochlear nucleus," J Neurosci 21, 6377-6386.
Pressnitzer, D., and Patterson, R. D. (2001). "Distortion products and the perceived pitch of harmonic complex tones," in Physiological and psychophysical bases of auditory function, edited by D. J. Breebart, A. J. M. Houtsma, A. Kohlrausch, V. F. Prijs, and R. Schoonoven (Shaker Publishing BV, Maastricht), pp. 97-104.
Pressnitzer, D., Patterson, R. D., and Krumbholz, K. (2001b). "The lower limit of melodic pitch," J Acoust Soc Am 109, 2074-2084.
Pressnitzer, D., Sayles, M., Micheyl, C., and Winter, I. M. (2008). "Perceptual organization of sound begins in the auditory periphery," Curr Biol 18, 1124-1128.
Pressnitzer, D., Winter, I. M., and Patterson, R. D. (2000b). "The responses of single units in the ventral cochlear nucleus of the guinea pig to damped and ramped sinusoids," Hear Res 149, 155-166.
Ries, D. T., Schlauch, R. S., and DiGiovanni, J. J. (2008). "The role of temporal-masking patterns in the determination of subjective duration and loudness for ramped and damped sounds," J Acoust Soc Am 124, 3772-3783.
Ritsma, R. J. (1962). "Existence region of tonal residue. I," J Acoust Soc Am 34, 1224-1229.
Robinson, K., and Patterson, R. D. (1995). "The stimulus duration required to identify vowels, their octave, and their pitch chroma," J Acoust Soc Am 98, 1858-1865.
Rogers, W. L., and Bregman, A. S. (1998). "Cumulation of the tendency to segregate auditory streams: Resetting by changes in location and loudness," Percept Psychophys 60, 1216-1227.
Rubin, N., Nakayama, K., and Shapley, R. (1997). "Abrupt learning and retinal size specificity in illusory-contour perception," Curr Biol 7, 461-467.
Rubin, N., Nakayama, K., and Shapley, R. (2002). "The role of insight in perceptual learning: Evidence from illusory contour perception," in Perceptual learning, edited by M. Fahle, and T. Poggio (MIT Press, Cambridge, MA).


Sach, A. J., and Bailey, P. J. (2004). "Some characteristics of auditory spatial attention revealed using rhythmic masking release," Percept Psychophys 66, 1379-1387.
Sato, M., Basirat, A., and Schwartz, J. L. (2007). "Visual contribution to the multistable perception of speech," Percept Psychophys 69, 1360-1372.
Schofield, B. R., and Coomes, D. L. (2006). "Pathways from auditory cortex to the cochlear nucleus in guinea pigs," Hear Res 216-217, 81-89.
Schooneveldt, G. P., and Moore, B. C. J. (1987). "Comodulation masking release (CMR) - effects of signal frequency, flanking-band frequency, masker bandwidth, flanking-band level, and monotic versus dichotic presentation of the flanking band," J Acoust Soc Am 82, 1944-1956.
Schreiner, C. E. (1995). "Order and disorder in auditory cortical maps," Curr Opin Neurobiol 5, 489-496.
Schroeder, M. R. (1970). "Synthesis of low-peak-factor signals and binary sequences with low autocorrelation," IEEE Transactions on Information Theory 16, 85-89.
Semal, C., and Demany, L. (1991). "Dissociation of pitch from timbre in auditory short-term memory," J Acoust Soc Am 89, 2404-2410.
Semal, C., and Demany, L. (2006). "Individual differences in the sensitivity to pitch direction," J Acoust Soc Am 120, 3907-3915.
Shackleton, T. M., and Carlyon, R. P. (1994). "The role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination," J Acoust Soc Am 95, 3529-3540.
Shamma, S. (2008). "On the emergence and awareness of auditory objects," PLoS Biol 6, e155.
Shamma, S., and Klein, D. (2000). "The case of the missing pitch templates: How harmonic templates emerge in the early auditory system," J Acoust Soc Am 107, 2631-2644.
Shamma, S. A. (1985). "Speech processing in the auditory system. II: Lateral inhibition and the central processing of speech evoked activity in the auditory nerve," J Acoust Soc Am 78, 1622-1632.
Sheft, S., Ardoint, M., and Lorenzi, C. (2008). "Speech identification based on temporal fine structure cues," J Acoust Soc Am 124, 562-575.
Shepard, R. N. (1964). "Circularity in judgments of relative pitch," J Acoust Soc Am 36, 2346-2353.
Shera, C. A., Guinan, J. J., and Oxenham, A. J. (2002). "Revised estimates of human cochlear tuning from otoacoustic and behavioral measurements," Proc Natl Acad Sci U S A 99, 3318-3323.
Shpiro, A., Moreno-Bote, R., Rubin, N., and Rinzel, J. (2009). "Balance between noise and adaptation in competition models of perceptual bistability," J Comput Neurosci 27, 37-54.
Snyder, J. S., and Alain, C. (2007). "Toward a neurophysiological theory of auditory stream segregation," Psychol Bull 133, 780-799.
Snyder, J. S., Alain, C., and Picton, T. W. (2006). "Effects of attention on neuroelectric correlates of auditory stream segregation," J Cogn Neurosci 18, 1-13.
Snyder, J. S., Carter, O. L., Hannon, E. E., and Alain, C. (2009). "Adaptation reveals multiple levels of representation in auditory stream segregation," J Exp Psychol Hum Percept Perform 35, 1232-1244.


Sridhar, T. S., Liberman, M. C., Brown, M. C., and Sewell, W. F. (1995). "A novel cholinergic 'slow effect' of efferent stimulation on cochlear potentials in the guinea pig," J Neurosci 15, 3667-3678.
Sterzer, P., and Kleinschmidt, A. (2007). "A neural basis for inference in perceptual ambiguity," Proc Natl Acad Sci U S A 104, 323-328.
Sterzer, P., Kleinschmidt, A., and Rees, G. (2009). "The neural bases of multistable perception," Trends Cogn Sci 13, 310-318.
Terhardt, E. (1974). "Perception of periodic sound fluctuations (roughness)," Acustica 30, 201-213.
Thurlow, W. R., and Jack, C. E. (1973). "Some determinants of localization-adaptation effects for successive auditory stimuli," J Acoust Soc Am 53, 1573-1577.
Tong, F., Meng, M., and Blake, R. (2006). "Neural bases of binocular rivalry," Trends Cogn Sci 10, 502-511.
Toshima, I., Aoki, S., and Hirahara, T. (2008). "Sound localization using an acoustical telepresence robot: Telehead II," Presence-Teleoperators and Virtual Environments 17, 392-404.
Tsuzaki, M., Takeshima, C., Irino, T., and Patterson, R. D. (2006). "Auditory stream segregation based on speaker size, and identification of size-modulated vowel sequences," in International Symposium on Hearing, edited by B. Kollmeier, V. Hohmann, M. Mauermann, J. Verhey, G. Klump, U. Langemann, and S. Uppenkamp (Cloppenburg, Germany), pp. 285-294.
Ulanovsky, N., Las, L., Farkas, D., and Nelken, I. (2004). "Multiple time scales of adaptation in auditory cortex neurons," J Neurosci 24, 10440-10453.
Ulanovsky, N., Las, L., and Nelken, I. (2003). "Processing of low-probability sounds by cortical neurons," Nat Neurosci 6, 391-398.
Uppenkamp, S., Johnsrude, I. S., Norris, D., Marslen-Wilson, W., and Patterson, R. D. (2006). "Locating the initial stages of speech-sound processing in human temporal cortex," NeuroImage 31, 1284-1296.
van Noorden, L. P. A. S. (1975). "Temporal coherence in the perception of tone sequences," PhD thesis, University of Technology, Eindhoven.
Verhey, J. L., Pressnitzer, D., and Winter, I. M. (2003). "The psychophysics and physiology of comodulation masking release," Exp Brain Res 153, 405-417.
Viemeister, N. F., and Wakefield, G. H. (1991). "Temporal integration and multiple looks," J Acoust Soc Am 90, 858-865.
Vliegen, J., and Oxenham, A. J. (1999). "Sequential stream segregation in the absence of spectral cues," J Acoust Soc Am 105, 339-346.
von Ehrenfels, C. (1937). "On Gestalt-qualities," Psychological Review 44, 521-524.
Warren, R. M., and Bashford, J. A., Jr. (1981). "Perception of acoustic iterance: Pitch and infrapitch," Percept Psychophys 29, 395-402.
Wiegrebe, L., and Winter, I. M. (2001). "Temporal representation of iterated rippled noise as a function of delay and sound level in the ventral cochlear nucleus," J Neurophysiol 85, 1206-1219.
Wilson, E. C., Melcher, J., Micheyl, C., Gutschalk, A., and Oxenham, A. J. (2007). "Cortical fMRI activation to sequences of tones alternating in frequency: Relationship to perceived rate and streaming," J Neurophysiol 97, 2230-2238.
Winer, J. A. (2006). "Decoding the auditory corticofugal systems," Hear Res 212, 1-8.
Winter, I. M. (2005). "The neurophysiology of pitch," in Pitch - neural coding and perception, edited by C. J. Plack, A. Oxenham, R. R. Fay, and A. N. Popper (Springer, New York), pp. 99-146.


Yost, W. A., Mapes-Riordan, D., Dye, R., Sheft, S., and Shofner, W. (2005). "Discrimination of first- and second-order regular intervals from random intervals as a function of high-pass filter cutoff frequency," J Acoust Soc Am 117, 59-62.
Zilany, M. S. A., Bruce, I. C., Nelson, P. C., and Carney, L. H. (2009). "A phenomenological model of the synapse between the inner hair cell and auditory nerve: Long-term adaptation with power-law dynamics," J Acoust Soc Am 126, 2390-2412.


Annex: CV and list of publications

Daniel Pressnitzer
Born 25 August 1971, Toulouse, France
[email protected]
http://audition.ens.fr/dp/

Current position
CNRS Research scientist (CR1)
Team Leader, Equipe Audition: Psychophysique, Modélisation, Neuroscience
CNRS & Université Paris Descartes & Ecole Normale Supérieure
29 rue d'Ulm, 75005 Paris, France

Previous positions
2000-2004: Research scientist, CNRS, Institut de Recherche et Coordination Acoustique Musique (Ircam), Paris, France.
1999-2000: Post-doctoral research associate, Wellcome Trust, The Physiological Laboratory, Cambridge University, UK. With I.M. Winter.
1998-1999: Post-doctorate, Fyssen Foundation, Centre for the Neural Basis of Hearing, Cambridge University, UK. With R.D. Patterson.

Education
1994-1998: PhD from Université Pierre et Marie Curie, Paris, in Acoustics, Signal Processing and Computer Science applied to Music. With S. McAdams. Summa cum laude.
1997-1998: Masters from Université Pierre et Marie Curie. Honours.
1990-1993: Engineering degree from the Ecole Nationale Supérieure d'Ingénieurs en Constructions Aéronautiques (Ensica), Toulouse. Honours.

Grants and awards
2009-2010: Royal Society travel grant ENS/University College London. Co-PI.
2008-2011: ANR "Multistability in speech and audition", white program, special distinction for interdisciplinary project. Collaboration with J.L. Schwartz, Grenoble, France.
2007-2009: NTT research labs collaboration grant. Collaboration with M. Kashino, Atsugi, Japan. Co-PI.
2006-2009: ANR "Hearing in Time". Collaboration with S. Thorpe, Toulouse. Principal Investigator.
2005: Grant from the Association Franco-Israëlienne pour la Recherche en Neurosciences (AFIRNe). Co-PI.
2004-2007: European project "From Sense to Sound, From Sound to Sense", 6th Framework Program, FET Open, contract for Coordination Action.
2001-2004: CNRS ACI grant from the interdisciplinary program "Cognition and information processing". Principal Investigator.
1999: Yves Rocard Young Researcher Award from the French Society for Acoustics.

Teaching
Teaching of auditory perception at Master's level (approx. 50h/year):
- Coordinator of the teaching unit "Auditory Perception and Cognition" for the Master de Sciences Cognitives (ENS/EHESS/Université Paris Descartes).
- Coordinator of the teaching unit "Music Perception and Cognition" for the Master Acoustics, Signal Processing and Computer Science applied to Music (Université Pierre et Marie Curie & Ircam).
- Teaching in the Master Acoustics, Université du Maine, Le Mans, France.
- Teaching in the Institut Supérieur du Numérique (ISEN), Lille, France.
Invited lectures:
2009: Collège de France, Paris. Invited within the lesson series of C. Petit.
2009: Lecturer for the Telluride Workshop, Institute of Neuromorphic Engineering, USA.
2007: Lecturer for the Spring School of the Hanse Institute of Advanced Studies, Neuroscience, Germany.

Peer reviewing
Reviewing of research articles for:
- Journal of the Acoustical Society of America
- Hearing Research
- Acta Acustica united with Acustica
- Attention, Perception, and Psychophysics
- Music Perception
- Brain Research
- Neuroscience Letters
- Journal of Neurophysiology
- Journal of Neuroscience
- Journal of Computational Neuroscience
- PLoS Computational Biology
- Current Biology
- Neuron
- Trends in Cognitive Sciences
Project evaluations for:
- French Ministry of Research
- National Science Foundation (NSF, USA)
- Fonds Québécois de la Recherche Nature et Technologies (Canada)

Organisation and co-organisation of scientific conferences
2003: 13th International Symposium on Hearing (ISH), Dourdan.
2006: International workshop on "New Ideas in Hearing", ENS, Paris.
2008: Special session on Integrated Approaches to Auditory Scene Analysis, 155th Meeting of the Acoustical Society of America - Acoustics'08, Paris.
2008: International workshop on "Perceptual Bistability in Audition and Vision", ENS, Paris.
2008: French-Israeli workshop on "Hierarchies in Hearing", ENS, Paris.

Scientific supervision
Post-doctoral research associates
2008-2010: Trevor Agus (PhD from the Institute of Hearing Research, Nottingham, UK). Financed by the ANR "Hearing in Time".
2009: Julia Maier (PhD from the University of Oldenburg, Germany). Financed by the ANR "Hearing in Time".
PhD students
2007-2010: Marion Cousineau (MSc in Cognitive Sciences, ENS). Financed by the French Ministry of Research, grant allocated by Université Paris Descartes.
2009-2012: Claire Chambers (Trinity College Dublin and MSc in Cognitive Sciences, ENS). Now financed by the ANR "Multistability in perception". Follow-up partnership planned with the distribution network "Entendre".
MSc students or equivalent, 100% supervision except when indicated
2009: Claire Chambers. Conditions for bistability in an ambiguous stimulus. Master de Sciences Cognitives, ENS-EHESS-Paris Descartes. Now PhD in the team.
2009: Vanessa Park-Thompson. Summer internship from McGill University.
2008: Marion Beauvais. Implicit learning of noise. Summer internship from Sciences Po, Paris. Now MSc in Neuroscience in Toronto, Canada.
2008: Sabine Caminade. Frequency selectivity. L3 Physics project, ENS. Now PhD at the Collège de France.
2008: Wendy de Heer. Top-down and bottom-up contributions to auditory streaming. Master de Sciences Cognitives, ENS-EHESS-Paris Descartes. Now PhD in Berkeley, USA, with F. Theunissen.
2008: Gaëlle Jardine. Perception of speech intonation with cochlear implant simulations. Master Phonétique Expérimentale, Université Paris 7 Denis Diderot (supervision: 70%).
2008: Anne Marchand. Timbre perception by hearing-impaired listeners. Mémoire d'audioprothèse, Ecole d'audioprothèse de Fougères (50%).
2007: Marion Cousineau. Pitch sequence processing. Master de Sciences Cognitives, ENS-EHESS-Université Paris Descartes. Now PhD in the team.
2006: Lu-Ming Joffo. Audio-visual bistability. Master de Sciences Cognitives, ENS-EHESS-Université Paris Descartes. Now employed by Advanced Bionics, LA, USA (cochlear implants company).
2005: Dan Gnansia. Real-time auditory models. Master Acoustique, Traitement de signal, Informatique Appliqués à la Musique, Université Paris 6, Paris. Did a PhD in the team with C. Lorenzi. Now employed by MXM-Neurelec, Sophia Antipolis, France (cochlear implants company).
2005: Isabelle Barba. Processing time for complex auditory stimuli. MSc Neurosciences, Université Paul Sabatier, Toulouse (30%).
2004: Joan Llobera. The auditory continuity illusion, an MEG study. Master de Sciences Cognitives, ENS-EHESS-Paris Descartes. Now with STARLab, Barcelona, Spain.
2003: Stéphane Loiselle. Spiking neural networks for non-linear information processing. Did a PhD at Sherbrooke University, Canada (30%).
2002: Jean-Pierre Arz. A psychoacoustical criterion for engine noise. Applied acoustics MSc, Université du Mans (30%).
2002: Julien Tardieu. Psychophysical study of the auditory continuity illusion. MSc Acoustique, Traitement de signal, Informatique Appliqués à la Musique, Université Paris 6, Paris. Did a PhD thesis at Ircam.
2001: Thomas Wulfrank. Temporal aspects of auditory masking. MSc Acoustique, Traitement de signal, Informatique Appliqués à la Musique, Université Paris 6, Paris. Now employed as an acoustics consultant, Cambridge, UK.

Administrative responsibilities
2008-: Team leader, Equipe Audition.
2008-: Elected member, Acoustical Society of America, Psychological & Physiological Acoustics Technical Committee.
2008-: Elected member, Groupe Perception Sonore, Société Française d'Acoustique.
2006-: Member of two selection committees (7-16 ENS, Comité de sélection MC Université Paris Descartes).
2004-: Co-founder of the Equipe Audition; Member of the board of the Département d'Etudes Cognitives, Ecole Normale Supérieure.

Other
2008: Consultant for Arkamys (audio technologies).
2007: Exhibition on Perceptual Illusions, Palais de la Découverte, Paris, France.
2005-2006: Consultant for Advanced Bionics (cochlear implants).
1999: Development of a real-time auditory neurophysiology platform.

Publications list

The * symbols indicate supervised students, PhD or post-doctoral associates.

Submitted or in revision
Agus*, T.R., Thorpe, S.J., & Pressnitzer, D. Rapid formation of robust auditory memories: Insights from noise. Submitted.
Cousineau*, M., Demany, L., Meyer, B., & Pressnitzer, D. Pitch and loudness sequence perception with a cochlear implant. Submitted.
Demany, L., Semal, C., Cazalets, J.R., & Pressnitzer, D. Fundamental differences in change detection between audition and vision. Experimental Brain Research. In revision.
Demany, L., Semal, C., & Pressnitzer, D. Implicit versus explicit frequency comparisons: two mechanisms of auditory change detection. Journal of Experimental Psychology: Human Perception & Performance. In revision.
Gnansia, D., Pressnitzer, D., Péan, V., Meyer, B., & Lorenzi, C. Intelligibility of interrupted and interleaved speech for normal-hearing listeners and cochlear implantees. Hearing Research. Accepted with minor revisions.
Joly, O., Ramus, F., Pressnitzer, D., Pallier, C., Vanduffel, W., & Orban, G.A. Interhemispheric differences in early auditory processing revealed by fMRI in awake rhesus monkeys. Submitted.

Peer-reviewed journals
Maier*, J.K., McAlpine, D., Klump, G., & Pressnitzer, D. (in press). Context effects in the discriminability of spatial cues. JARO - Journal of the Association for Research in Otolaryngology.
Cousineau*, M., Demany, L., & Pressnitzer, D. (2009). What makes a melody: The perceptual singularity of pitch sequences. Journal of the Acoustical Society of America, 126(6), 3179-3187.
Demany, L., Pressnitzer, D., & Semal, C. (2009). Tuning properties of the auditory frequency-shift detectors. Journal of the Acoustical Society of America, 126(3), 1342-1348.
Ardoint, M., Lorenzi, C., Pressnitzer, D., & Gorea, A. (2008). Investigation of perceptual constancy in the temporal-envelope domain. Journal of the Acoustical Society of America, 123(3), 1591-1601.
Hupé, J. M., Joffo*, L. M., & Pressnitzer, D. (2008). Bistability for audiovisual stimuli: Perceptual decision is modality specific. Journal of Vision, 8(7), Special issue on Perceptual organization and neural computation, 1-15.
Pressnitzer, D., Sayles, M., Micheyl, C., & Winter, I. M. (2008). Perceptual organization of sound begins in the auditory periphery. Current Biology, 18, 1124-1128.
Widmer, G., Rocchesso, D., Valimaki, V., Erkut, C., Gouyon, F., Pressnitzer, D., et al. (2007). Sound and music computing: Research trends and some key issues. Journal of New Music Research, 36(3), 169-184.
de Cheveigné, A., & Pressnitzer, D. (2006). The case of the missing delay lines: Synthetic delays obtained by cross-channel phase interaction. Journal of the Acoustical Society of America, 119(6), 3908-3918.
Pressnitzer, D., & Hupé, J. M. (2006). Temporal dynamics of auditory and visual bistability reveal common principles of perceptual organization. Current Biology, 16(13), 1351-1357.
Pressnitzer, D., Bestel, J., & Fraysse, B. (2005). Music to electric ears: Pitch and timbre perception by cochlear implant patients. Annals of the New York Academy of Sciences, 1060, 343-345.
Pressnitzer, D., de Cheveigné, A., & Winter, I. M. (2004). Physiological correlates of the perceptual pitch shift for sounds with similar waveform autocorrelation. Acoustics Research Letters Online, 5(1), 1-6.
Verhey, J. L., Pressnitzer, D., & Winter, I. M. (2003). The psychophysics and physiology of comodulation masking release. Experimental Brain Research, 153(4), 405-417.
Meddis, R., Delahaye, R., O'Mard, L., Sumner, C., Fantini, D. A., Winter, I. M., et al. (2002). A model of signal processing in the cochlear nucleus: Comodulation masking release. Acta Acustica United with Acustica, 88(3), 387-398.
Pressnitzer, D., de Cheveigné, A., & Winter, I. M. (2002). Perceptual pitch shift for sounds with similar waveform autocorrelation. Acoustics Research Letters Online, 3(1), 1-6.
Pressnitzer, D., Meddis, R., Delahaye, R., & Winter, I. M. (2001). Physiological correlates of comodulation masking release in the mammalian ventral cochlear nucleus. Journal of Neuroscience, 21(16), 6377-6386.
Pressnitzer, D., Patterson, R. D., & Krumbholz, K. (2001). The lower limit of melodic pitch. Journal of the Acoustical Society of America, 109, 2074-2084.
Neuert, V., Pressnitzer, D., Patterson, R. D., & Winter, I. M. (2001). The responses of single units in the inferior colliculus of the guinea pig to damped and ramped sinusoids. Hearing Research, 159(1-2), 36-52.
Krumbholz, K., Patterson, R. D., & Pressnitzer, D. (2000). The lower limit of pitch as determined by rate discrimination. Journal of the Acoustical Society of America, 108(3 Pt 1), 1170-1180.
Pressnitzer, D., Winter, I. M., & Patterson, R. D. (2000). The responses of single units in the ventral cochlear nucleus of the guinea pig to damped and ramped sinusoids. Hearing Research, 149(1-2), 155-166.
Pressnitzer, D., McAdams, S., Winsberg, S., & Fineberg, J. (2000). Perception of musical tension for nontonal orchestral timbres and its relation to psychoacoustic roughness. Perception & Psychophysics, 62(1), 66-80.
Pressnitzer, D., & McAdams, S. (1999). Acoustics, psychoacoustics, and spectral music. Contemporary Music Review, 19, 33-60.
Pressnitzer, D., & McAdams, S. (1999). Two phase effects in roughness perception. Journal of the Acoustical Society of America, 105(5), 2773-2782.

Abstracts or short papers in peer-reviewed journals
Agus*, T.R., & Pressnitzer, D. (2010). Deep frozen noise: Long-term learning in adverse conditions. International Journal of Audiology. In press.
Chambers*, C., Park-Thompson*, V., & Pressnitzer, D. (2010). Biasing perception of ambiguous pitch stimuli. International Journal of Audiology. In press.
Cousineau*, M., Demany, L., Meyer, B., & Pressnitzer, D. (2010). Pitch-sequence processing for normal-hearing listeners, cochlear implant users, and noise-vocoder simulations. International Journal of Audiology. In press.
Jardine*, G., & Pressnitzer, D. (2010). Acoustic cues to disambiguate questions and statements in noise-vocoded speech. International Journal of Audiology. In press.
Agus*, T.R., & Pressnitzer, D. (2009). Implicit learning of noise. International Journal of Audiology. In press.
Pressnitzer, D. (2009). Subcortical contributions to the temporal dynamics of auditory streaming. International Journal of Audiology. In press.
Cousineau*, M., Pressnitzer, D., & Demany, L. (2008). From sounds to melodies: Memory for sequences of pitch and loudness. Journal of the Acoustical Society of America, 123(5), 3562. [Best student paper award].
Cusack, R., & Pressnitzer, D. (2008). Auditory scene analysis emerges from a distributed yet integrated network. Journal of the Acoustical Society of America, 123(5), 3052.
Demany, L., Pressnitzer, D., & Semal, C. (2008). On the binding of successive tones: Implicit versus explicit pitch comparisons. Journal of the Acoustical Society of America, 123(5), 3049.
Pressnitzer, D., Joffo*, L. M., & Hupé, J. M. (2007). Bistability for auditory, visual, and audio-visual stimuli: Evidence for distributed neural mechanisms of perceptual organization. Hearing Research, 229, 246.
Elhilali, M., Pressnitzer, D., & Shamma, S. (2006). Models of musical timbre using cortical spectro-temporal receptive fields and temporal codes. Journal of the Acoustical Society of America, 120, 3085.
Pressnitzer, D., & Hupé, J. M. (2005). Is auditory streaming a bistable percept? Acta Acustica united with Acustica, 91(S1), S102.
Pressnitzer, D., Tardieu*, J., Ragot, R., & Baillet, S. (2004). Mechanisms underlying the continuity illusion. Journal of the Acoustical Society of America, 115(5), 2460.
Kult, A., Rupp, A., Pressnitzer, D., Scherg, M., & Supek, S. (2003). MEG study on temporal asymmetry processing in the human auditory cortex. NeuroImage, 19(2).
Patterson, R. D., Krumbholz, K., & Pressnitzer, D. (2002). The existence region for melodic pitch and computational models. Journal of the Acoustical Society of America, 111(5), 2416.
Pressnitzer, D., Winter, I. M., & Patterson, R. D. (2000). A hierarchy of sensitivity to temporal asymmetry: Cochlear nucleus responses to damped and ramped sinusoids. British Journal of Audiology, 34(2), 88-89.
Winter, I. M., Pressnitzer, D., & Meddis, R. (2000). Across frequency processing in the ventral cochlear nucleus: Searching for a physiological substrate of comodulation masking release. British Journal of Audiology, 34(2), 89-90.
Pressnitzer, D., Patterson, R. D., & Krumbholz, K. (1999). The lower limit of melodic pitch with filtered harmonic complexes. Journal of the Acoustical Society of America, 105, 1052.
Misdariis, N., Smith, B., Pressnitzer, D., Susini, P., & McAdams, S. (1998). Validation of a multidimensional distance model for perceptual dissimilarities among musical timbres. Journal of the Acoustical Society of America, 103, 3005.
McAdams, S., & Pressnitzer, D. (1996). Psychoacoustic factors to musical tension in Western nontonal music. International Journal of Psychology, 3(3-4), 148.

Non peer-reviewed journals
Canévet, G., Demany, L., Grimault, N., McAdams, S., & Pressnitzer, D. (2005). La psychoacoustique: Science de l'audition, science du son. Acoustique et Techniques, 42-43, 28-34.

Peer-reviewed proceedings
Agus*, T.R., Suied, C., Thorpe, S.J., & Pressnitzer, D. (2010). Characteristics of human voice processing. IEEE International Symposium on Circuits and Systems. In press.
Joly, O., Ramus, F., Pallier, C., Pressnitzer, D., Dupoux, E., Hauser, M. D., et al. (2008). Functional lateralization in monkey auditory cortex? 37th annual meeting of the Society for Neuroscience, Washington, USA.
Elhilali, M., Shamma, S., Thorpe, S. J., & Pressnitzer, D. (2007). Models of timbre using spectro-temporal receptive fields: Investigation of coding strategies. 19th International Congress on Acoustics, Madrid, Spain.
Loiselle*, S., Rouat, J., Pressnitzer, D., & Thorpe, S. (2005). Exploration of rank order coding with spiking neural networks for speech recognition. International Joint Conference on Neural Networks, pp. 2076-2080. Montreal, Canada.
Pressnitzer, D., & Hupé, J. M. (2005). Is auditory streaming a bistable percept? Proceedings of Forum Acusticum, Budapest, Hungary.
Pressnitzer, D., & Gnansia*, D. (2005). Real-time auditory models. Proceedings of the International Computer Music Conference, pp. 295-298. Barcelona, Spain.
Pressnitzer, D., Ragot, R., Ducorps, A., Schwartz, D., & Baillet, S. (2004). Is the continuity illusion based on a change-detection mechanism? A MEG study. Proceedings of the Joint Congress on Acoustics CFA/DAGA'04, pp. 589-590. Strasbourg, France.
Rivenez, M., Gorea, A., Pressnitzer, D., & Drake, C. (2002). The tolerance window for sequences of musical, environmental and artificial sounds. Proceedings of the 7th International Conference on Music Perception and Cognition, Stevens, C., Burnham, D., McPherson, G., Schubert, E., & Renwick, J. (Eds.). Causal Productions: Adelaide, Australia.
Pressnitzer, D., & McAdams, S. (1997). Influence of phase effects on roughness modelling. Proceedings of the International Computer Music Conference, pp. 31-34. Thessaloniki, Greece. [Best student paper award].
Pressnitzer, D., & McAdams, S. (1997). Influence de la phase sur la perception de rugosité de sons complexes. Actes du 4ème Congrès Français d'Acoustique, pp. 535-538. Marseille, France.

Book chapters
Agus*, T., Beauvais*, M., Thorpe, S.J., & Pressnitzer, D. (2009). The implicit learning of noise: Behavioural data and computational models. In E. A. Lopez-Poveda, R. Meddis & A. Palmer (Eds.), Advances in auditory physiology, psychophysics and models. New York: Springer-Verlag.
Elhilali, M., Chi, T. S., Pressnitzer, D., & Shamma, S. (2009). Neural basis of timbre of musical instruments. In T. Klouche (Ed.), Mathematical and computational musicology (in press). Berlin.
Krumbholz, K., Patterson, R. D., & Pressnitzer, D. (2001). The perception of periodicity near the lower limit of pitch. In D. J. Breebart, A. J. M. Houtsma, A. Kohlrausch, V. Prijs & R. Schoonoven (Eds.), Physiological and psychophysical bases of auditory function (pp. 75-82). Maastricht: Shaker Publishing BV.
Krumbholz, K., Patterson, R. D., & Pressnitzer, D. (1999). Period difference limens for harmonic complex tones in and below the pitch region. In T. Dau, V. Hohmann & B. Kollmeier (Eds.), Psychophysics, physiology and models of hearing (pp. 85-88). Singapore: World Scientific Publishing.
Meddis, R., Delahaye, R., Fantini, D., Winter, I. M., & Pressnitzer, D. (2001). A model of a brainstem circuit that might be involved in comodulation masking release. In D. J. Breebart, A. J. M. Houtsma, A. Kohlrausch, V. Prijs & R. Schoonoven (Eds.), Physiological and psychophysical bases of auditory function (pp. 252-257). Maastricht: Shaker Publishing BV.
Pressnitzer, D., & Patterson, R. D. (2001). Distortion products and the perceived pitch of harmonic complex tones. In D. J. Breebart, A. J. M. Houtsma, A. Kohlrausch, V. Prijs & R. Schoonoven (Eds.), Physiological and psychophysical bases of auditory function (pp. 97-107). Maastricht: Shaker Publishing BV.
Pressnitzer, D., & McAdams, S. (1999). Summation of roughness across frequency regions. In T. Dau, V. Hohmann & B. Kollmeier (Eds.), Psychophysics, physiology and models of hearing (pp. 105-108). Singapore: World Scientific Publishing.
Pressnitzer, D., & McAdams, S. (1998). Phase effects in roughness perception. In A. Palmer, A. Rees, Q. Summerfield & R. Meddis (Eds.), Psychophysical and physiological advances in hearing (pp. 286-292). London: Whurr Publishers.

Edited book
Pressnitzer, D., de Cheveigné, A., McAdams, S., & Collet, L. (Eds.). (2005). Auditory signal processing: Physiology, psychoacoustics and models. New York: Springer.

PhD Thesis
Pressnitzer, D. (1998). Perception of auditory roughness: from a basic perceptual attribute to the perception of music. Université Paris 6, Paris, supervised by S. McAdams. Awarded with "félicitations du jury" (the jury's highest distinction).

Invited conferences
Pressnitzer, D. (2010). The perception of pitch sequences by normal-hearing listeners and people using a cochlear implant. Advanced Bionics Music Perception conference, Budapest, Hungary. [Keynote].
Pressnitzer, D. (2009). Models for auditory perception. Neuromorphic Cognition Engineering Workshop, Telluride, USA.
Pressnitzer, D. (2009). Auditory scene analysis: using illusions to probe perception. Wellcome Trust Symposium: Signalling Sound, Warwick, UK. [Keynote].
Pressnitzer, D., Sayles, M., Micheyl, C., & Winter, I. M. (2009). Neural correlates of the temporal dynamics of auditory scene analysis. 9ème colloque de la Société des Neurosciences, Bordeaux, France.
Pressnitzer, D. (2008). Perception auditive non-verbale chez les personnes normo- et malentendantes [Non-verbal auditory perception in normal-hearing and hearing-impaired people]. Deuxième conférence virtuelle Audiologie/Audioprothèse Phonak.
Pressnitzer, D. (2008). L'organisation des scènes auditives: des illusions pour mieux comprendre la perception [The organisation of auditory scenes: illusions as a way to better understand perception]. Collège National d'Audioprothèse, Paris, France. [Plenary].
Pressnitzer, D. (2008). Universals in Music Perception. 2nd Japanese-French Frontiers of Science. [Plenary].
Pressnitzer, D. (2007). Temporal dynamics of auditory scene analysis. Spring School of the Hanse Institute of Advanced Studies, Neuroscience, Delmenhorst, Germany.
Pressnitzer, D. (2007). Temporal dynamics of auditory scene analysis. 2nd France-Israel Neuroscience Binational Conference, Bordeaux, France.
Pressnitzer, D. (2007). Méthodes d'évaluation de la perception de la musique [Methods for assessing music perception]. Première conférence virtuelle Audiologie/Audioprothèse Phonak.
Pressnitzer, D. (2006). The perception of pitch and timbre by normally hearing listeners and cochlear implant users. Bionics Investigators Meeting, Venice, Italy. [Keynote].
Pressnitzer, D., & Hupé, J. M. (2006). Bistable perception in audition: can it tell us anything about auditory scene analysis? Computational and Systems Neuroscience (Cosyne), Salt Lake City, USA.
Pressnitzer, D. (2006). Ecoute musicale et perception de hauteur [Music listening and pitch perception]. Congrès Français de Phoniatrie, Paris, France. [Plenary].
Pressnitzer, D., & Bestel, J. (2005). CI-Music, a set of objective tasks to evaluate pitch and timbre perception in cochlear implant patients. Bionics European Investigators Conference, Istanbul, Turkey.
Pressnitzer, D. (2005). Ecoute musicale et perception de hauteur [Music listening and pitch perception]. 9ème Symposium Entendre, Cagliari, Italy. [Plenary].
Pressnitzer, D., & Hupé, J. M. (2005). Is auditory streaming a bistable percept? Forum Acusticum, Budapest, Hungary.
Pressnitzer, D. (2004). Perception et cognition auditive [Auditory perception and cognition]. Ecole d'été Acoustique et Musique, Institut Scientifique de Cargèse, Corsica, France.
Pressnitzer, D., Ragot, R., Ducorps, A., Schwartz, D., & Baillet, S. (2004). Is the continuity illusion based on a change-detection mechanism? Joint Congress on Acoustics CFA/DAGA'04, Strasbourg, France.
Pressnitzer, D., & Meddis, R. (2002). Modèles fonctionnels du système auditif périphérique [Functional models of the peripheral auditory system]. VIème congrès de la Société Française d'Audiologie, Paris, France. [Plenary].
Pressnitzer, D., Demany, L., & Rupp, A. (2002). The perception of frequency peaks and troughs: psychophysical data and functional brain imaging data. Forum Acusticum, Sevilla, Spain.
Pressnitzer, D., McKinney, M., de Cheveigné, A., & Winter, I. M. (2002). Pitch perception and the encoding of click trains in the mammalian ventral cochlear nucleus. Forum Acusticum, Sevilla, Spain.
Pressnitzer, D. (2000). Modèles psychoacoustiques et perception de hauteur [Psychoacoustic models and pitch perception]. Journées d'Informatique Musicale, Bordeaux, France.

Talks at international conferences and symposia
Agus*, T., Beauvais*, M., Thorpe, S. J., & Pressnitzer, D. (2009). The implicit learning of noise: Behavioural data and computational models. 15th International Symposium on Hearing, Salamanca, Spain.
Cousineau*, M., Demany, L., Meyer, B., & Pressnitzer, D. (2009). The perception of sound sequences by normal-hearing and cochlear-implant listeners. 32nd MidWinter Meeting of the Association for Research in Otolaryngology, Baltimore, USA.
Demany, L., Pressnitzer, D., & Semal, C. (2009). Tuning properties of the auditory frequency-shift detectors. 32nd MidWinter Meeting of the Association for Research in Otolaryngology, Baltimore, USA.
Goodman*, D., Pressnitzer, D., & Brette, R. (2009). Sound localization with spiking neural networks. 18th Annual Computational Neuroscience Meeting, San Francisco, USA.
Pressnitzer, D., & Agus*, T. (2009). Reaction times for natural sound identification. 32nd MidWinter Meeting of the Association for Research in Otolaryngology, Baltimore, USA.
Fraysse, B., Bestel, J., Pressnitzer, D., Sterkers, O., Frachet, B., Mondain, M., et al. (2008). Frequency alignment and music perception: Results of a multicenter study. 10th International Conference on Cochlear Implants and other Implantable Auditory Technologies, San Diego, USA.
Maier*, J. K., McAlpine, D., Klump, G., & Pressnitzer, D. (2008). Coding of interaural time and level differences in the human brain: Adaptation and interactions? 31st MidWinter Meeting of the Association for Research in Otolaryngology, Phoenix, USA.
Pressnitzer, D. (2008). The build-up of streaming adapts to sequence duration. 31st MidWinter Meeting of the Association for Research in Otolaryngology, Phoenix, USA.
Kirchner, H., Thorpe, S. J., & Pressnitzer, D. (2007). Ultra-rapid communication of natural sounds: Assessing auditory processing speed with saccadic eye movements. 14th European Conference on Eye Movements, Potsdam, Germany.
Ardoint, M., Gorea, A., Debruille, X., Pressnitzer, D., & Lorenzi, C. (2007). Recognition of complex temporal envelopes in normal hearing listeners and cochlear implantees. 8th EFAS Conference, Heidelberg, Germany.
Maier*, J. K., McAlpine, D., Klump, G., & Pressnitzer, D. (2007). Adaptive coding of interaural time and level differences in the human brain: jnds and interactions. British Society of Audiology Short Papers Meeting on Experimental Studies of Hearing and Deafness, University College London, UK.
Pressnitzer, D., Micheyl, C., Sayles, M., & Winter, I. M. (2007). Responses to long-duration tone sequences in the cochlear nucleus. 30th MidWinter Meeting of the Association for Research in Otolaryngology, Denver, USA.
De Cheveigné, A., Pressnitzer, D., Parmentier*, F., & Gandon*, C. (2006). Temporal integration in pitch perception. 29th MidWinter Meeting of the Association for Research in Otolaryngology, Baltimore, USA.
Pressnitzer, D., & Winter, I. M. (2000). Encoding first- and second-order periodicity in the ventral cochlear nucleus. 23rd MidWinter Meeting of the Association for Research in Otolaryngology, St Petersburg, USA.
Winter, I. M., Pressnitzer, D., & Meddis, R. (2000). Physiological correlates of comodulation masking release in the ventral cochlear nucleus. 23rd MidWinter Meeting of the Association for Research in Otolaryngology, St Petersburg, USA.
Pressnitzer, D., Patterson, R. D., & Krumbholz, K. (1999). The lower limit of melodic pitch with filtered harmonic complexes. Joint Meeting: 137th ASA, 2nd EAA Forum Acusticum 99, 25th DAGA, Berlin, Germany.
Pressnitzer, D., & McAdams, S. (1997). Influence de la phase sur la perception de rugosité de sons complexes [Influence of phase on the perceived roughness of complex sounds]. 4ème Congrès Français d'Acoustique, Marseille, France.
Pressnitzer, D., & McAdams, S. (1997). Influence of phase effects on roughness modelling. International Computer Music Conference, Thessaloniki, Greece.
Pressnitzer, D., McAdams, S., Winsberg, S., & Fineberg, J. (1996). Roughness and tension of orchestral timbres. 4th International Conference on Music Perception and Cognition, Montreal, Canada.

Lab seminars, workshops
11/2008 Co-organiser of the international workshop "Hierarchies in Hearing". [16 communications, programme at http://audition.ens.fr/ws2/ ].
10/2008 Scientific days Collège de France – Ecole Normale Supérieure, Paris, France.
06/2008 Co-organiser of the session "Integrated approaches to auditory scene analysis". Acoustics'08, EAA & 156th Acoustical Society of America meeting & SFA. [18 talks, 8 posters, > 300 participants].
06/2008 Organiser of the international workshop "Perceptual Bistability in Audition and Vision", ENS, Paris. [14 talks, programme available online at http://audition.ens.fr/ws2/news/bistable_ws.html ].
05/2007 Département Parole et Cognition, GIPSA lab, Grenoble, France.
04/2007 NTT Human and Information Science Laboratory, Atsugi, Japan.
02/2007 Institute of Hearing Research, Nottingham, UK.
01/2007 UPR CNRS 640, Laboratoire de Neurosciences Cognitives & d'Imagerie Cérébrale, Paris, France.
11/2006 UMR CNRS 5020, Neurosciences & Systèmes sensoriels, & unité INSERM 280, Lyon, France.
05/2006 Co-organiser of the international workshop "New Ideas in Hearing", ENS, Paris, France. [14 talks, programme at http://audition.ens.fr/ws/ ].
2006 Organiser of the "Séminaires Audition", 17 talks by invited professors to the Paris lab. [Programme at http://audition.ens.fr/news/seminaires.html ].
06/2005 Journée d'Étude du GSAM/SFA, Description automatique et perception de la musique [Automatic description and perception of music], Paris, France.
06/2005 LMA, UPR CNRS 7051, Marseille, France.
01/2005 Graduiertenkolleg Psychoakustik, Oldenburg, Germany.
08/2003 Co-organiser of the XIIIth International Symposium on Hearing. [70 talks, proceedings and book].
08/2002 Workshop on Pitch: Neural coding and perception, Delmenhorst, Germany.
03/2001 Organiser of the Journées Magnétoencéphalographie et Audition, Paris, France.
11/1999 Graduiertenkolleg Psychoakustik, University of Oldenburg, Germany.
06/1999 Hörobjekte, Zoologisches Institut, University of Munich, Germany.

Software
Pressnitzer, D. (2005-). CI-Music. Assessment software for cochlear implant users, designed and implemented as part of a consulting contract with Advanced Bionics. The software has been used by several French and European centres, including in two multi-centre studies.

Science outreach, media
Pressnitzer, D. (2008). L'organisation des sons, de l'illusion à la perception [The organisation of sounds, from illusion to perception]. Pour la Science, 373, 116-123. [French edition of Scientific American, special issue on auditory perception and music].
Pressnitzer, D. (2006). La perception auditive. Entendre et comprendre [Auditory perception. Hearing and understanding]. Découvertes, revue du Palais de la découverte, 341, 33-40.
Palais de la découverte (2006-2007). Scientific advisor for the exhibit "Illusions" at the Palais de la Découverte, Paris (French science museum). Design of demonstrations related to auditory and visual bistability.
Gruhier, F. (2006). Les mirages de l'oreille [The ear's mirages]. Le Nouvel Observateur, 2177, 53. [Interview].
Mangin, L. (2006). L'oreille piégée [The ear deceived]. Pour la Science, 347, 21. [Interview].
Charvet, P. (2003). Simple comme musique [As simple as music]. TV show (France 5) & DVD. [Interview].
Fletcher, K., Smith, B. K., & Pressnitzer, D. (2000). Perception, cerveau, musique [Perception, brain, music]. L'oeil électrique, 15, 18-23. [Interview].
Pressnitzer, D. (1997). Sons rugueux, sons tendus [Rough sounds, tense sounds]. Pour la Science, 240, 34. [Interview].