Rapport D'activité 2009
Mid-Level Audition

Habilitation à Diriger des Recherches, presented and publicly defended on 21 December 2009 by Daniel Pressnitzer.

Before a jury composed of:
Bertrand Dubus
Christian Lorenzi
Brian C.J. Moore (Rapporteur)
Israel Nelken (Rapporteur)
Roy D. Patterson
Shihab A. Shamma (Rapporteur)

Equipe Audition: Psychophysique, Modélisation, Neurosciences (APMN)
Laboratoire de Psychologie de la Perception, UMR 8158 CNRS – Université Paris Descartes
& Département d'Etudes Cognitives, Ecole Normale Supérieure
29 rue d'Ulm, 75005 Paris
Tel: 01 44 32 26 73
Email: [email protected]

Summary

Hearing transforms the incredibly complex superposition of acoustic soundwaves that reaches our ears into meaningful auditory scenes, inhabited, for instance, by different talkers or musical melodies. During the last ten years, my research has attempted to link the properties of sound scenes (acoustical, or within the peripheral auditory system) to the behavioral performance of listeners confronted with various auditory tasks. The level of analysis can be described as mid-level: the processes that sit between an acoustical description of sound and the use of auditory information to guide behavior. Starting with auditory features, my contributions have focused on the extraction of temporal structure within sound, over different time scales. In particular, pitch perception has been studied by combining psychophysics, physiology, and modeling, and by comparing normal-hearing and hearing-impaired listeners. The temporal dynamics of perceptual organization over yet longer time scales has then been explored, by introducing a "bistability" paradigm, in which an unchanging ambiguous stimulus produces spontaneous alternations between different percepts in the mind of the listener. This line of research again combined psychophysics and physiology, and it revealed that correlates of perceptual organization may be found very early in the auditory pathways.
Finally, we have investigated memory and context effects on perception. We have shown that listeners are remarkably able to create and memorize features from random signals, and that basic features such as pitch or spatial location can be largely influenced by the preceding context. Taken together, these projects suggest that mid-level processes may be distributed throughout several levels of the auditory pathways. They may also interact closely in order to deal with natural auditory scenes.

Table of Contents

Chapter 1: Introduction
Chapter 2: Features from the temporal structure of sound
2.1 Introduction
2.2 The perception of pitch
2.3 The perception of envelope regularity
2.4 Summary and conclusion
Chapter 3: Features from sound sequences
3.1 Introduction
3.2 The ups and downs of pitch sequences
3.3 Envelope constancy
3.4 Auditory change detection vs. visual change blindness
3.5 Summary and conclusion
Chapter 4: The temporal dynamics of auditory scene analysis
4.1 Introduction
4.2 Change of scenes: Auditory bistability
4.3 Subcortical correlates of auditory scene analysis
4.4 Summary and conclusion
Chapter 5: Memory and context effects
5.1 Introduction
5.2 Rapid formation of robust auditory memories
5.3 Effect of preceding context on auditory features
5.4 Summary and conclusions
Chapter 6: A distributed proposal for scene analysis
6.1 Introduction
6.2 The neural correlates of auditory scene analysis
6.3 A comparison of different functional cartoons
6.4 Summary and conclusions
Chapter 7: Research Project
7.1 Overview
7.2 Features for sound recognition
7.3 Bistability as perceptual decision
7.4 Mechanisms for memory of noise and pitch hysteresis
Chapter 8: Conclusion
References

Chapter 1: Introduction

Sound is a one-dimensional phenomenon.
The acoustic pressure wave that impinges on one of our eardrums can only do one of two things: push it a little, or pull it a little. This is all it can do. Moreover, the information carried by sound is totally "transparent": as it propagates through the air, it sums linearly at each point. As a consequence, at any one moment in time, the little push or pull effected on the eardrum may be caused by one sound source out there in the world, but it may equally be caused by two sound sources, or by many sound sources indeed.

These are trivial observations, but they are worth remembering when we consider how different they feel from our inner auditory world, which, we know, clearly must have many, many dimensions. A typical auditory scene may be inhabited, for instance, by different talkers, and what one of them says can be effortlessly understood even though there is music in the background. Alternatively, we can on a sudden whim ignore the talker and switch our focus to the music, to try to remember who wrote this particular piece. How do we go from the one-dimensional acoustic waveform to such lively auditory scenes? Understanding this has been (and will probably remain, for the foreseeable future) one of the main goals of the scientific study of hearing.

Each author has, at one point or another, tried to convey the intricacy of the problem through a personal metaphor. Helmholtz (e.g. 1877) evokes the interior of a 19th-century ballroom, complete with "a number of musical instruments in action, speaking men and women, rustling garments, gliding feet, clinking glasses, and so on". He goes on to describe the resulting sound field as a "tumbled entanglement of the most different kinds of motion, complicated beyond conception". Closer to home, but to similar effect, Shamma (2008) goes to a "crowded reverberant nightclub, with a hubbub of multiple conversations amidst blaring music".
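The "transparency" of sound described above, where each source's pressure simply adds at the eardrum, can be illustrated with a minimal numerical sketch. The signals below (a talker and background music reduced to single sine components, with hypothetical frequencies and amplitudes) are purely illustrative and not taken from the thesis:

```python
import numpy as np

# Two hypothetical sound sources, modeled as pure tones.
sample_rate = 16000  # samples per second
t = np.arange(0, 0.01, 1.0 / sample_rate)  # 10 ms of signal

talker = 0.5 * np.sin(2 * np.pi * 220 * t)  # a 220 Hz component
music = 0.3 * np.sin(2 * np.pi * 330 * t)   # a 330 Hz component

# Air propagation is linear: the pressure at the eardrum at each
# instant is simply the sum of the contributions of every source.
eardrum = talker + music

# The result has the same shape as either source alone: at any one
# moment the eardrum receives a single scalar push or pull,
# regardless of how many sources contributed to it.
assert eardrum.shape == t.shape
```

The point of the sketch is that `eardrum` carries no explicit record of how many sources produced it; recovering the separate sources from this one-dimensional mixture is exactly the scene-analysis problem the thesis addresses.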
My own pet example, which, unfortunately, tends to draw more and more puzzled looks when I use it in class, is that of listening to a jazz tune played by a famous trumpet player. In no time at all, or so it felt, I used to be able to tell whether it was Miles Davis or Chet Baker. How did I do this? It seems I must first have been able to extract a wealth of "features" from the one-dimensional sound waves reaching my ears. These features may be described as pitch, loudness, timbre, tempo, and so on. But then, I must also have been able to parse the incoming flow of features into what we may call "streams": this pitch goes with the piano line, not with the trumpet. And finally, all of this ongoing processing must somehow have contacted my long-term memory to produce recognition. That it sometimes works is, frankly, quite baffling.

The rest of this thesis describes past research projects that investigated some of these different levels of processing. The second chapter concerns the extraction of auditory features from sound, and more precisely of features related to the temporal structure of sound. The third chapter summarizes a line of work on change-detection mechanisms, or, equivalently, mechanisms that produce second-order features when presented with a temporal sequence of sounds. The fourth chapter addresses the auditory scene analysis issue proper, by summarizing behavioral and electrophysiological work that looked at the temporal dynamics of scene analysis. The fifth chapter then describes recent studies concerned with the effects of context and memory on perception. The sixth chapter is probably the most controversial one. It puts forward the view that sequencing the problems solved by hearing in terms of feature extraction, followed by scene analysis,