Pitch Perception as Probabilistic Inference

Phillipp Hehrmann

Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom

THESIS

Submitted for the degree of Doctor of Philosophy, University of London

Declaration

I, Phillipp Hehrmann, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Hannover, 2nd December 2011

Abstract

Pitch is a fundamental and salient perceptual attribute of many behaviourally important sounds, including animal calls, human speech and music. Human listeners perceive pitch without conscious effort or attention. These and similar observations have prompted a search for mappings from acoustic stimulus to percept that can be easily computed from peripheral neural responses at early stages of the central auditory pathway. This tenet, however, is not supported by physiological evidence: how the percept of pitch is encoded in neural firing patterns across the brain, and where – if at all – such a representation may be localised, remain as yet unsolved questions.

Here, instead of seeking an explanation guided by putative mechanisms, we take a more abstract stance in developing a model, asking what computational goal the auditory system is set up to achieve during pitch perception. Many natural pitch-evoking sounds are approximately periodic within short observation time windows. We posit that pitch reflects a near-optimal estimate of the underlying periodicity of sounds from noisy evoked responses in the auditory nerve, exploiting statistical knowledge about the regularities and irregularities occurring during generation and transduction. We compute (or approximate) the statistically optimal estimate using a Bayesian probabilistic framework.

Model predictions match the pitch reported by human listeners for a wide range of well-documented, pitch-evoking stimuli, both periodic and aperiodic. We then present new psychophysical data on biases and pitch-timbre interactions in human perception which further demonstrate the validity of our approach, while posing difficulties for alternative models based on autocorrelation analysis or simple spectral pattern matching.

Our model embodies the concept of perception as unconscious inference, originally proposed by von Helmholtz as an interface bridging optics and vision. Our results support the view that even apparently primitive acoustic percepts may derive from subtle statistical inference, suggesting that such inferential processes operate at all levels across our sensory systems.

Acknowledgements

Above all, my thanks and gratitude go to my supervisor Maneesh Sahani, whose insightful guidance, impeccable intuitions and never-resting analytic mind have saved my academic endeavours from stagnation time and time again.

I thank Peter Dayan for shaping the Gatsby Unit into the exceptionally stimulating environment that it is, and Peter Latham for re-hiring me so many times. My studies and travels were generously supported by the Gatsby Charitable Foundation. I have made many acquaintances with fellow students, postdocs and visitors, and I am deeply grateful to all of them: to David Barrett and Gabi Teodoru for their dedication to keeping the artistic spirit alive and hosting so many wonderful, memorable evenings filled with musical joy and pizza; to Ross Williamson above all for being Ross, but also for his helpful comments on parts of this thesis; to Vinayak and Charles for generously sharing their machine-learning expertise in hours of need; to Andriy for bringing art-house cinema to the seminar room; to Biljana for her caring advice and support; to Ulrik, Amy, Adam, Jakob, David, Loic and Ritwik for unforgettable travel memories; and to all the others, who together have made spending almost five years at the Gatsby Unit so very valuable and enjoyable.

I thank David McAlpine for sharing his inspiring passion for science with me, as well as Jennifer Linden, Björn, Lucy, Maria, Jannis and other members of the Ear Institute with whom I have had the privilege to interact. I have had many insightful discussions with Rich Turner that influenced the work presented in this thesis; similarly with Alain de Cheveigné, to whom, alongside Mark Plumbley, I owe special thanks for kindly agreeing to act as my examiners.

Outside the walls of academia, Georgia has been a truly wonderful friend and source of moral support. I also want to thank Race and Gerri for their generosity and kindness.

My parents, Kathi and Rainer, have always given me nothing but support and encouragement, and I cannot thank them enough for granting me so many exceptional opportunities in life. It has been an absolute delight to watch the young families of my brothers Johannes and Matthias grow, albeit from a distance, and I look forward to being able to spend more time with them in the near future.

Contents

Front matter
  Abstract
  Acknowledgements
  Contents
  List of figures
  List of tables
  List of algorithms

1 Introduction
  1.1 Motivation, methodology and aims
    1.1.1 The relevance of pitch
    1.1.2 The puzzle about pitch
    1.1.3 Modelling methodologies
    1.1.4 Pitch as inference
  1.2 Thesis overview

2 Background
  2.1 Fundamentals of pitch perception
    2.1.1 The pitch of periodic sounds
      2.1.1.1 Pure tones
      2.1.1.2 Harmonic complex tones
    2.1.2 Non-periodic sounds
    2.1.3 Binaural pitch
  2.2 The peripheral auditory system
    2.2.1 The external ear
    2.2.2 The middle ear
    2.2.3 The inner ear
      2.2.3.1 Anatomy
      2.2.3.2 Function
      2.2.3.3 Non-linearities in the basilar membrane
  2.3 Processing of pitch in the central auditory system
    2.3.1 Brainstem and midbrain
    2.3.2 Cortex
  2.4 Theories and models of pitch perception
    2.4.1 Examples of spectral pattern-matching models
      2.4.1.1 Wightman's pattern transformation model
      2.4.1.2 Terhardt's theory of virtual pitch
      2.4.1.3 Goldstein's optimum processor model
    2.4.2 Temporal models and summary autocorrelation

3 Generative model
  3.1 Sound generation
    3.1.1 Uncoupled model
    3.1.2 Coupled model
  3.2 Transduction
    3.2.1 Basilar membrane response
    3.2.2 Auditory nerve response
  3.3 Inference
    3.3.1 Laplace approximation
    3.3.2 Hamiltonian Annealed Importance Sampling
  3.4 Summary

4 Basic model evaluation
  4.1 Pure tones
  4.2 Harmonic complex tones
  4.3 Iterated rippled noise
  4.4 Pitch shift of amplitude-modulated tones
  4.5 Amplitude-modulated noise
  4.6 Transposed complex tones
  4.7 Discussion

5 Octave biases and timbral effects in the perception of non-uniform periodic pulse trains
  5.1 Motivation
    5.1.1 Timbre, brightness and spectral centroid
    5.1.2 Relationship between fundamental frequency and spectral centroid in natural sounds
    5.1.3 Psychophysical effects of timbre on pitch
  5.2 Incorporating f0-dependent timbral characteristics into the Bayesian model
    5.2.1 General parametric form
    5.2.2 Fitting the timbral f0-dependence to natural pitched sounds
    5.2.3 Qualitative predictions
  5.3 Octave biases in the perception of non-uniform periodic pulse trains
    5.3.1 Experimental methods
      5.3.1.1 Participants
      5.3.1.2 Stimuli
      5.3.1.3 Task
    5.3.2 Psychophysical results
      5.3.2.1 Timbral effects on the pitch of alternating click trains and harmonic complex tones
  5.4 Pitch-timbre interactions in models of pitch
    5.4.1 Bayesian model: coupled and uncoupled
    5.4.2 Pattern matching: Terhardt
    5.4.3 Pattern transformation: Wightman
    5.4.4 Summary autocorrelation
  5.5 Harmonic complex tones revisited: the strength of missing-f0 pitch

6 Conclusions
  6.1 Summary
  6.2 Outlook

A Gradient and Hessian of $\ln P(A, x \mid \Omega)$
  A.1 Gradient $\nabla_x \ln P(A, x \mid \Omega)$
    A.1.1 $\nabla_x \ln \sum_s P(x, s \mid \Omega)$
    A.1.2 $\nabla_x \ln P(A \mid x)$
  A.2 Hessian $\nabla_x^2 \ln P(A, x \mid \Omega)$
    A.2.1 $\nabla_x^2 \ln \sum_s P(x, s \mid \Omega)$
    A.2.2 $\nabla_x^2 \ln P(A \mid x)$

B Method for estimating f0 and fc of musical instrument and vocal sounds

References

List of figures

2.1 The mel scale of pitch
2.2 Helical representation of pitch height and chroma
2.3 Pure-tone frequency difference limens
2.4 Amplitude spectrum of a SAM tone
2.5 Iterated rippled noise
2.6 Binaural pitch-evoking stimuli
2.7 Outer, middle and inner ear
2.8 Cochlear anatomy
2.9 Cochlear travelling wave
2.10 Cochlear frequency tuning
2.11 Outer hair cell electromotility
2.12 Envelope demodulation in inner hair cells
2.13 Active amplification in the basilar membrane
2.14 Compression in the basilar membrane
2.15 Two-tone suppression in the basilar membrane
2.16 Intermodulation distortions in the basilar membrane
2.17 The ascending central auditory pathway
2.18 Modulation frequency tuning in the ICc
2.19 Neural responses to pitch-evoking sounds in marmoset field R
2.20 SAM tone and Schouten's explanation of the residue
2.21 Spectral and temporal cues in models of pitch
2.22 Wightman's pattern transformation model: missing-f0 HCT
2.23 Terhardt's virtual pitch theory: missing-f0 HCT
2.24 Licklider's duplex theory
3.1 Generative model
3.2 Human vocal production
3.3 Covariance structure of a single mixture component and sample
3.4 Effect of the envelope time constant on spectral smoothness
3.5 Coupling of Ω and f through a linear filter
3.6 Gammatone filter bank
3.7 Demodulation in the auditory nerve model
4.1 Laplace approximation and Hamiltonian AIS
4.2 Pure tones
4.3 Missing-f0 complex tones
4.4 Phase-dependence of unresolved missing-f0 pitch
4.5 IRN4+ and IRN4- stimuli
4.6 IRN results
4.7 Amplitude-modulated tones
4.8 Amplitude-modulated noise
4.9 Transposed complex tone: stimulus and auditory nerve response
4.10 Transposed complex tone: results
5.1 Spectra of several musical instruments
5.2 Effect of smoothing the model impulse response
5.3 Fundamental frequency vs. spectral centroid
5.4 Pitch of alternating-amplitude pulse trains (Flanagan et al., 1962)
5.5 Stimuli used in the psychophysical task
5.6 Task design
5.7 Excluded subjects
5.8 High-low pitch judgements, averaged across 5 subjects
5.9 Modelling results: coupled Bayesian model
5.10 Modelling results: uncoupled Bayesian model
5.11 Modelling results: pattern matching (Terhardt)
5.12 Modelling results: pattern transformation (Wightman)
5.13 Modelling results: simple summary autocorrelation
5.14 Modelling results: summary autocorrelation with physiological front-end and (channel × lag)-dependent summation weights
5.15 Model comparison: amplitude-modulated tone
5.16 Effect of lowest harmonic number on pitch strength
5.17 Pitch strength of complex tones filtered into different frequency bands
B.1 Examples of the f0 and fc determination algorithm

List of tables

5.1 Stimulus conditions used in the experiment

List of algorithms

3.1 Laplace approximation for $\ln P(A \mid \Omega)$
3.2 Hamiltonian AIS for $\ln P(A \mid \Omega)$

Chapter 1

Introduction

1.1 Motivation, methodology and aims

1.1.1 The relevance of pitch

A common property of many behaviourally-significant sounds in our natural acoustic environment is the percept of pitch which they evoke in human listeners and, presumably, several animal species. Animals, even invertebrates, use pitched sounds as an integral part of their territorial and mating behaviour. The sophisticated social behaviour found in some mammalian species would be impossible without the use of shared systems of acoustic communication, human speech being the most richly structured amongst them. For most communication sounds, pitch — rather than being purely epiphenomenal — conveys information of interest to the listener, ranging from cues regarding the species, gender and size of the sound source to the intended semantic content. In all spoken languages, pitch carries prosodic information, i.e. additional semantic connotations beyond the written words (Bolinger, 1978). Tonal languages (such as Mandarin Chinese) use pitch also to distinguish lexical items and grammatical categories. In music, the definitions of musical scales, melody and harmony are unthinkable without reference to our percept of pitch. There is also evidence for a role of pitch in solving the "cocktail party problem", the ubiquitous challenge of separating sound sources in cluttered acoustic environments: differences in pitch can induce stream segregation between otherwise spectrally similar sounds (Vliegen and Oxenham, 1999), and they have been shown to substantially improve identification of simultaneously-presented vowels (de Cheveigné et al., 1997) and speech utterances (Brokx and Nooteboom, 1982). The fundamental nature of pitch perception is further highlighted by the fact that human listeners are seemingly able to hear the pitch of complex sounds without need for conscious effort or attention.

1.1.2 The puzzle about pitch

The phenomenon of pitch, fundamental as it may appear, nevertheless continues to evade a definitive explanation. What makes pitch difficult to study and grasp is the somewhat tautological observation that it does not constitute an objective, physical attribute of a sound, but a subjective aspect of the listener's perceptual experience. While this rules out a physical definition of pitch, one may nevertheless investigate the relationship between the physical attributes of sounds and the percept evoked in listeners. In order to do so, we first need to define an objective behavioural measure to quantify this elusive percept. Roughly speaking, the pitch of a target sound is commensurate with the frequency of a pure tone which is judged as equal in pitch by a listener. Owing to its subjective nature, this judgement need not be the same across different listeners, or even across multiple stimulus repetitions for a single listener. The beginnings of the modern era in the psychophysical and physiological study of pitch can be traced back to the siren experiments of August Seebeck and the following debate with Georg Ohm in the mid-nineteenth century (Turner, 1977). Since then, considerable effort has been spent to uncover the relationship between acoustic stimuli and perceptual experience, as well as the underlying physiological processes in the ear and the brain. On the one hand, the resulting body of experimental results is almost overwhelming in its breadth and detail. On the other hand, scientific consensus regarding the interpretation of these results seems almost as far out of reach today as it did in the days of Ohm's and Seebeck's initial debate.

Even long before the nineteenth century, it had been known that the pitch of a periodic sound¹ typically corresponds to the inverse of its period. A continued source of both puzzlement and insight are artificially-designed sounds that evoke a pitch percept despite being highly aperiodic. Consider for example a white-noise signal — entirely aperiodic and unpitched — that is multiplied with a sinusoidal envelope: the resultant sound has a pitch, albeit weak, equal to the envelope frequency, even though no segment of the waveform ever repeats. Its envelope, however, is perfectly periodic by construction, just like the envelope of a truly periodic sound. Might pitch therefore be related to the periodicity of the waveform envelope, rather than the full waveform with all its fine structure? Firstly, this alone would not explain the faintness of the percept in the latter example compared to that of a pure tone or any other truly periodic sound. Secondly, if we multiply the same sinusoidal envelope with a high-frequency sinusoidal carrier instead of noise, the resultant sound again has a pitch, but it equals neither the frequency of the envelope nor that of the carrier. We will postpone further discussion until later — suffice it to say that there appears to be no single physical feature of sounds to which the percept of pitch is simply and consistently related. Accepting that a purely signal-based, "obvious" explanation is unlikely to exist, how else can or should one approach this difficult modelling problem? How does the central auditory system integrate signals arriving from a great number of peripheral sound receptors, driven by periodic or aperiodic sounds, over time into our unified percept of pitch?

¹ i.e. a sound consisting of exact repetitions of a single, short waveform segment
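For concreteness, both stimuli just described are easy to synthesise digitally. The following NumPy sketch is illustrative only; the sampling rate, envelope frequency and carrier frequency are arbitrary choices, not values taken from the text:

```python
import numpy as np

fs = 48000                      # sampling rate in Hz (illustrative)
t = np.arange(0, 0.5, 1 / fs)   # 500 ms of signal
g, fc = 100.0, 2000.0           # envelope and carrier frequency (illustrative)

# Sinusoidally amplitude-modulated noise: an aperiodic carrier under a
# perfectly periodic envelope. Evokes a weak pitch at the envelope rate g.
envelope = 1 + np.sin(2 * np.pi * g * t)
sam_noise = envelope * np.random.randn(t.size)

# The same envelope on a sinusoidal carrier: the resulting SAM tone has a
# pitch that in general matches neither g nor fc (see section 2.1.2).
sam_tone = envelope * np.sin(2 * np.pi * fc * t)
```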

1.1.3 Modelling methodologies

David Marr, in his seminal book on human vision, distinguishes three complementary levels at which information processing systems (such as the visual or auditory system) can be described and studied (Marr, 1982, chap. 1):

1. computational theory, primarily concerned with identifying the goal or purpose of the system under study, and the strategy employed to achieve it;

2. algorithm and representation, studying what algorithms underlie the system’s input-output transformation in order to achieve its goal, and the nature of their internal representation;

3. hardware implementation, the mechanisms by which these algorithms and representations are realised in the actual, physical system under study.

Considering the variety of proposed models of the human "pitch processor", there seems to be a bias for solutions that can be readily computed from peripheral neural responses at early stages of the central auditory pathway. This bias may stem from the apparent automaticity and ease with which human listeners can perceive pitch, or perhaps from the common use of pitched sounds for the purpose of acoustic communication even by much simpler, non-mammalian species lacking the advanced computational resources of the mammalian or human brain. However, this bias is not well supported by physiological evidence. Even today, investigations into the neural basis of pitch perception using electrophysiology, EEG/MEG or functional imaging techniques have not been able to establish conclusively what biophysical mechanisms give rise to pitch, how it is encoded by neural firing patterns across the brain, and where — if at all — such a representation may be localised. Implementational and algorithmic considerations, in the sense of Marr's hierarchy, may therefore be of limited value as a starting point for developing models of human pitch perception. In this thesis, we present a model derived from computational principles instead: we define pitch as the optimal solution to a putative computational goal of the auditory system during listening.

1.1.4 Pitch as inference

What then is the goal of the auditory system? One of the most influential computational theories of human perception, even today, predates Marr's analysis by more than a century². Hermann von Helmholtz proposed that perception reflects a process of unconscious inference about physical quantities of interest in the environment from imperfect and incomplete incoming sensory signals (von Helmholtz, 1867). Most natural, pitch-evoking sounds are approximately, though not perfectly, periodic within short observation time windows. Building on previous work by Goldstein (1973), we hypothesise that the auditory system is trying to estimate their periodicity, based only on indirect observations through the noisy, evoked neural response in the auditory nerve. Since the physical process of sound generation, transmission and sensorineural transduction is inherently stochastic, optimal inference requires knowledge about the underlying statistical regularities and irregularities. We formulate our model within the framework of Bayesian probabilistic inference (e.g. MacKay, 2003), which provides both the formal language to define this inference problem rigorously, and the algorithmic tools to compute (or approximate) its optimal solution. It is worth pointing out that our approach is not "blindly" computational in the sense that it disregards the algorithmic and physical levels in Marr's hierarchy entirely. By incorporating a statistical description of the peripheral neural response, on which inference in the model is based, our computational account does in fact explicitly obey fundamental, biophysically-warranted representational constraints. We do, however, remain agnostic with regard to the algorithmic and biophysical implementation of the process of inference itself.

² Like Marr's theory, it was originally formulated in the context of vision.
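Anticipating the notation of chapter 3 and appendix A (A for the evoked auditory nerve response, x for the latent intermediate variables, Ω for the periodicity), the structure of this inference problem can be summarised schematically. This is only a sketch of the general Bayesian form; the specific priors and likelihoods are defined in chapter 3:

$$P(\Omega \mid A) = \frac{P(A \mid \Omega)\, P(\Omega)}{P(A)}\,, \qquad P(A \mid \Omega) = \int P(A \mid x)\, P(x \mid \Omega)\, \mathrm{d}x\,.$$

The perceived pitch then corresponds to a point estimate derived from the posterior P(Ω | A), for instance its mode.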

Pitch, as a correlate of periodicity estimation, is sometimes conceived of as a "primitive", purely data-driven bottom-up cue that serves to support the more interesting and challenging "schema-driven" tasks of auditory scene analysis and beyond³, which are influenced by learnt knowledge, prior expectations and other sources of top-down modulation (Bregman, 1990). Conversely, we will argue in this thesis that even a seemingly primitive auditory percept like pitch already reflects the outcome of a sophisticated inferential process, efficiently combining bottom-up sensory evidence with top-down expectations derived from long-term natural scene statistics.

³ such as segregation and identification of sound sources from mixtures, or semantic parsing

1.2 Thesis overview

Chapter 2 establishes the background knowledge required for subsequent chapters. In particular, we will review and discuss key aspects of:

• the psychophysics of pitch perception,
• the physiology of the peripheral and central auditory system as it pertains to the processing and representation of pitch, and
• existing theories and models of pitch perception.

In chapter 3, we present the formal definition of a generative, probabilistic model of near-periodic sounds and evoked responses in the auditory nerve. Two variants will be discussed that differ in their treatment of the relationship between periodicity and spectral envelope features of sounds. We will introduce the concept of pitch perception as Bayesian inference and present two algorithms for the approximate computation of optimal periodicity estimates based on the statistical assumptions embodied in our generative model.

Chapter 4 demonstrates the basic consistency of our model estimates with human psychophysics for a representative range of well-documented pitch-evoking sounds. We will show that the model accounts not only for the pitch of naturalistic, periodic sounds but also for that of unnatural, aperiodic sounds, despite the latter being poorly described by the assumed generative process. We also discuss several phenomena which the model fails to capture. In particular, human pitch perception of harmonic sounds appears to be sensitive to spectral features other than their harmonicity, whereas no corresponding effects are evident in the model behaviour.

In chapter 5, we will review the evidence for a dependency between pitch and timbre, both in the statistics of natural, pitch-evoking sounds and in their perception by human listeners. We go on to extend the acoustic component of our generative model to account for this dependency and use a database of natural sounds to fit the parameters of this newly-introduced coupling. We then design and conduct a psychophysical experiment to test the predictions of the extended model regarding the influence of timbral features on the pitch of non-uniform pulse trains, a class of periodic sounds with a controllable degree of octave ambiguity. Behavioural effects observed in human listeners are well-predicted by our Bayesian model, while posing a difficult challenge to alternative, non-inferential models as well as our earlier, uncoupled model. Finally, we demonstrate that the extended model also provides a parsimonious solution to some of the issues raised in chapter 4.

Chapter 6 concludes this thesis with a discussion of the contributions achieved by our work so far and points out several promising directions for future research.

Chapter 2

Background

2.1 Fundamentals of pitch perception

Pitch is a prominent subjective attribute of our perceptual experience of certain types of sounds, for example the voiced parts of human speech or those produced by many musical instruments. Despite this subjectivity, the relationship between physical stimulus and evoked percept is nevertheless far from arbitrary. Otherwise, the core constructs of classical music theory¹ could not feasibly exist. The definitions of musical scales and intervals are based on the concept of pitch. Rules regarding the sequential and simultaneous arrangement of notes with different pitches (i.e. melody, harmony and counterpoint), meant to differentiate between what sounds pleasant or unpleasant to a listener, could hardly be effective if the pitches perceived during a musical performance varied arbitrarily from one listener to the next. But what are the fundamental, preserved features of pitch? First and foremost, pitch is most strongly evoked by periodic sounds and is largely determined by their periodicity. Second, and as a corollary, pitch is invariant to a variety of substantial changes in the acoustic signal: two instruments can produce distinctly different sounds that nevertheless give rise to the same pitch despite gross differences in loudness and "tone colour", or timbre² (for example the mellow, husky quality of a flute in its low register or the sharp, piercing sound of a distorted electric guitar). Not only can listeners tell if a sound has a pitch or not, and whether the pitches evoked by two sounds are the same or different: there also appears to be the general notion of an ordering amongst different pitches along some internal dimension. If we label, somewhat arbitrarily, the two extremes of this dimension as "low" and "high" (see e.g. Ashley (2004) for a discussion of alternative metaphors used in other cultural communities), then we can ask whether one pitch is higher than another. This is reflected in the definition of pitch according to the American National Standards Institute (ANSI, 1994):

"Pitch is that attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high."

¹ Western, Indian, Chinese, Arab-Persian and others
² Timbre is often treated as that perceptual attribute of sounds which allows us to distinguish sounds of equal loudness, duration and pitch (ANSI, 1994). See also our discussion of timbre in section 5.1.1.

In this section, we will examine and review (rather descriptively) some key findings regarding the dependence of our subjective percept on physical parameters of the stimulus. Despite its strong dependence on periodicity, we will find that pitch is not simply related to a single, well-defined physical stimulus parameter. We will discuss past and present attempts to synthesise these diverse psychophysical findings into a coherent theory of the mapping between stimulus and percept later on in section 2.4.

The ANSI definition above makes no reference to the fundamental role of pitch in music. This was, in fact, made explicit in an earlier definition by the American Standards Association (ASA, 1960), whereby pitch "is that attribute of auditory sensation in terms of which sounds may be ordered on a musical scale". Of course, this is a problematic definition, considering that musical scales in turn are defined in terms of the pitches from which they are constructed. However, it points towards an important additional requirement for a sound-evoked sensation to classify as pitch, which is (tacitly or overtly) made by many psychoacousticians: that the sensation be able to support the recognition of musical intervals and melodies (e.g. Attneave and Olson, 1971; Burns and Viemeister, 1976; Semal and Demany, 1990; Pressnitzer et al., 2001). This requirement is of practical value as it sets an approximate lower limit for the accuracy of the perceptual ordering of sounds: the smallest musical interval in Western classical music is a semitone, corresponding to a frequency ratio of approximately $\sqrt[12]{2} \approx 1.059$. If the accuracy of the ordering drops below this limit, interval and melody recognition are bound to be impaired.

Let us assume then, in agreement with both the ASA and ANSI definitions above, that the pitch of a sound can be assigned a position, or magnitude, along an internal scale. A major obstacle in the psychophysical study of pitch is that most listeners are unable to assign consistent verbal labels to the magnitude of their percept without external reference. This rare ability, called "absolute pitch", is prevalent in as little as 0.01% of the general population³ (Levitin and Rogers, 2005). The remainder of listeners may still be able to remember, recognise and produce absolute pitch magnitudes to some degree even without explicit labels (e.g. Bachem, 1937; Lockhead and Byrd, 1981), but their precision falls far short of that of possessors of genuine absolute pitch. Instead, they can reliably report only the relative change in pitch from one note to the other: whether the first pitch was higher or lower than the second. In addition, musically-trained listeners will typically be able to determine and label the magnitude of the change in terms of a musical interval, but we will not consider this ability of interval recognition any further for the remainder of this thesis.

The overwhelming majority of pitch-evoking sounds encountered naturally are periodic up to small deviations. Any periodic sound x(t) with a period duration of Ω can be uniquely decomposed into a sum of constituent sinusoidal vibrations,

$$x(t) = \sum_{k=0}^{\infty} a_k \sin\!\left(\frac{2\pi k}{\Omega}\, t + \theta_k\right),$$

according to Fourier's famous theorem (e.g. Hartmann, 1997). Here, $a_k$ is the amplitude of the k-th Fourier component, $\theta_k$ is its phase at t = 0 and $\frac{k}{\Omega}$ its frequency, which is the k-th integer multiple of the repetition rate $\frac{1}{\Omega} =: f_0$ of the original sound x(t). In this Fourier representation, the simplest possible periodic sound is itself a sinusoid, as it cannot be further decomposed via Fourier analysis. If we fix its amplitude and phase to some arbitrary values, it is effectively parameterised only by its frequency. If one asks listeners to judge the relative pitches of two sinusoids with different frequencies, one will find their pitches are monotonically related to their frequencies: the sinusoid with the higher frequency will be judged higher in pitch. In this way, we can construct a frequency-labelled pitch scale, which is related to the listener's internal scale via some unknown, monotonically increasing mapping function. With this frequency-labelled scale at hand⁴, one can now determine the pitch of (almost) any other pitch-evoking target sound, again up to this unknown monotonic mapping, simply by matching the adjustable frequency of a pure tone to the target. The outcome of this matching is the "pure-tone equivalent" pitch of the target, measured in Hz. Variants of this procedure are amongst the most commonly used methods to quantify a listener's pitch percept. The choice of using pure tones as the initial reference point is admittedly arbitrary. However, as we will soon discuss, the pitch of the large majority of periodic sounds with period Ω is in fact equal to 1/Ω. Thus, many periodic sounds can serve as a reference in the matching procedure almost equivalently. This can be extremely useful in cases where a gross difference in timbre between the target sound and a pure tone could otherwise impair the listener's matching performance. Furthermore, the use of complex stimuli helps to avoid covariation of other salient perceptual stimulus features with pitch that is unavoidable when using pure tones (see also de Cheveigné, 2010 for a more detailed discussion of this topic).

³ but seemingly higher amongst Asians
⁴ as for example noted in an addendum to the ASA definition of pitch (ASA, 1960)
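To make the Fourier description above concrete, here is a minimal NumPy sketch that synthesises a periodic sound by summing sinusoids at integer multiples of f0 = 1/Ω and checks that the waveform indeed repeats every Ω seconds; all parameter values are illustrative:

```python
import numpy as np

fs = 48000                        # sampling rate in Hz (illustrative)
t = np.arange(0, 0.5, 1 / fs)     # 500 ms of signal
omega = 0.005                     # period duration: 5 ms, i.e. f0 = 200 Hz
f0 = 1 / omega

# Sum of K sinusoids at integer multiples of f0; the amplitudes a_k and
# phases theta_k below are arbitrary illustrative choices.
K = 10
a = 1.0 / np.arange(1, K + 1)     # 1/k amplitude roll-off
theta = np.zeros(K)
x = sum(a[k] * np.sin(2 * np.pi * (k + 1) * f0 * t + theta[k])
        for k in range(K))

# The waveform repeats exactly once per period Omega:
p = int(round(fs * omega))        # samples per period (240 here)
assert np.allclose(x[:p], x[p:2 * p], atol=1e-9)
```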

For all our intents and purposes, this matching-based definition of pitch will be sufficient. Nevertheless, several researchers have attempted to further quantify the implicit mapping between the listener's internal pitch scale and the pure-tone frequency scale in Hz. The results of these studies are not entirely consistent. Assuming that listeners can numerically compare values on their internal pitch scale, Stevens et al. (1937) had their subjects adjust the frequency of a pure tone such that its subjective pitch was half that of a reference tone with a certain fixed frequency. This procedure was repeated for a range of reference frequencies and the resultant "fractionations" were averaged across subjects. From these average fractionations, a perceptual pitch scale in units of "mel" (as in "melody") was constructed: first, 1000 mel was simply defined as the pitch value of a 1000 Hz tone. Next, the frequency corresponding to a pitch of 500 mel was determined as the frequency that was deemed half as high in pitch as 1000 mel, and similarly for 250 mel and so on. The resultant curve of pitch in mel plotted against frequency in Hz is shown in Figure 2.1. Similar relationships between frequency and subjective pitch magnitude have been found using other methods, such as equisection (Stevens and Volkmann, 1940) and magnitude estimation (Beck and Shaw, 1961; see Stevens, 1971 for a methods overview). The derived scales, however, seem to be at odds with the commonly held notion that musical intervals correspond to certain, fixed distances between pitches. In music, the melodic distance or interval between two notes is determined by the ratio between their frequencies: a ratio of 1.5, for example, will almost always be heard as a "fifth", independent of the absolute frequencies. Two melodies will be recognised as the same, as long as the frequency ratios between successive notes remain preserved — independent of the starting note⁵. This suggests that a certain difference in subjective pitch corresponds to a fixed frequency ratio, or equivalently: a logarithmic relationship between subjective pitch and frequency. Indeed, Attneave and Olson (1971) obtained essentially this result in an experiment where subjects were required to transpose pairs of pure tones, separated by a musical interval smaller than an octave, into a different frequency region by (continuous) adjustment of a frequency dial⁶. The reasons for this discrepancy between the mel-scale and the logarithmic scale of musical pitch are not entirely clear (see e.g. van Norden, 1982). The subjects in the latter experiment could have deliberately made their judgements so as to preserve the musical interval, instead of taking equidistant steps on their internal pitch scale. Alternatively, the subjects of Stevens et al. (1937), given no "melodic" context whatsoever, may have made their judgement not based on pitch but on the unavoidably covarying spectral centre of mass, which is regarded as a salient aspect of timbre. Whatever the resolution may be, we will content ourselves with the operational definition of pitch as the pitch-matched frequency of a pure tone, measured in units of Hz, throughout the remainder of this thesis unless explicitly stated otherwise.

Figure 2.1: The mel scale of pitch (from Stevens et al., 1937).

⁵ Possessors of absolute pitch will notice that it has been transposed, but can nevertheless identify it.
⁶ Logarithmic scaling broke down for frequencies higher than approximately 5 kHz.
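The mel scale itself is defined only empirically, through the averaged fractionation data. In later literature its overall shape is often summarised by an analytic curve fit of the form below; this closed form (commonly attributed to O'Shaughnessy, 1987) is not part of Stevens et al. (1937) and is quoted here merely as a convenient approximation:

$$\mathrm{mel}(f) \approx 2595\, \log_{10}\!\left(1 + \frac{f}{700\,\mathrm{Hz}}\right),$$

which assigns approximately 1000 mel to a 1000 Hz tone, consistent with the anchor point of the original scale.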

In considering pitch as a percept along a single, monotonic dimension, we (and the ANSI) may have disregarded a further important aspect of pitch perception. Seemingly, pitch similarity is not simply determined by the magnitude of the pitch difference (be it on a linear, logarithmic or mel-scale). Instead, we perceive two pitches as highly similar if their frequencies are related by one or even several octaves, i.e. when the frequency ratio is an integer power of 2. In pitch matching paradigms, subjects are often observed to make octave mistakes (e.g. Riker, 1946; cf. chapter 5), and even possessors of absolute pitch are prone to this kind of confusion (e.g. Bachem, 1937). Octave equivalence is a cross-culturally shared feature in many musical systems. The scales in Western classical music are based on subdivisions of the octave into smaller intervals down to approximately one twelfth of an octave (on a logarithmic scale, corresponding to a frequency ratio of $\sqrt[12]{2}$), and notes separated by one or several octaves are denoted by the same letter. Similarly, the scales of Indian, Arab-Persian and Chinese classical music (amongst others) are based on subdivisions of the octave, albeit different from those in classical Western scales (Burns, 1998). This has led to the concept of the position within an octave as a second, circular dimension called tone chroma, in addition to our first dimension, which scales monotonically with frequency and which is called tone height in contexts where this distinction is made (Bachem, 1950). Shepard (1982) proposed a simple spatial representation of pitch in these two dimensions in the shape of a helix, where the chroma dimension winds around a tone-height axis (see Figure 2.2).

Figure 2.2: Helical representation of pitch height and chroma (from Shepard, 1982).

There are doubts as to whether perceived octave similarity is truly attributable to the pitch relation per se, or whether it might rather reflect our sense of musical consonance, i.e. the pleasantness of the sensation of two or more musical notes played simultaneously. The spectra of octave-related natural pitch-evoking sounds, composed of many near-harmonic frequency components, overlap to a higher degree than in any other interval relationship without creating perceptual "beats". Such beats — for lack of a better description: a "wha-wha"-like sensation of slow amplitude modulations due to the physical interaction of nearby spectral components in the peripheral auditory system (cf. section 2.2.3.2) — are held to be a major determinant of our sense of consonance (von Helmholtz, 1863; Plomp and Levelt, 1965; McDermott et al., 2010). Thus, we may perceive harmonic sounds in octave intervals as similar because they are as consonant as sounds in unison. At face value, experiments by Riker (1946) seem to argue against this hypothesis: Riker observed that octave matching-mistakes were in fact more common for pure tones than for notes played on a piano, even though there is nothing supremely consonant about pure-tone spectra at octave intervals. However, the discussion is complicated by the fact that non-linear distortions in the middle or inner ear (see 2.2.3.2) can artificially introduce spectral components at multiples of the pure-tone frequencies, effectively rendering them into harmonic complex tones. Furthermore, our sense of octave equivalence could be acquired through repeated exposure to natural stimuli but subsequently influence our judgement of other pitch-evoking stimuli such as pure tones. In any case, even if tone chroma is a second dimension of pitch, it would appear that it is always uniquely determined by tone height (while the reverse is not true). We will consider pitch only as varying along a single monotonic dimension, as suggested by the ANSI definition and following the majority of psychophysical studies of pitch to date.

It should be pointed out that a stimulus, strictly speaking, does not "have" a pitch. Nevertheless, we will often use this imprecise but convenient terminology. The pitch evoked by one and the same physical stimulus may be different from one presentation to the next. What we typically mean when we say that a stimulus "has a certain pitch" is that the stimulus reliably evokes a near-identical percept over many trials and across many listeners. A sound which leads to a broad distribution of pitch estimates in each listener can be said to have a weak pitch. Sounds which lead to a distinctly multimodal distribution of pitches can be said to have an ambiguous pitch. We will consider such a class of stimuli, namely non-uniform periodic click trains (Flanagan et al., 1962), later on in chapter 5. Shepard tones (Shepard, 1964) are another (much-studied) example of ambiguous pitch stimuli: complex tones with octave-spaced harmonics and a bell-shaped spectral envelope centred around some spectral centroid frequency, which have a clearly identifiable chroma and a highly ambiguous tone height. The source of perceptual inter-trial variability in these cases is not simply the stochastic nature of sensorineural transduction or differences in the peripheral gain due to either random fluctuations or loudness changes (cf. section 2.2). Instead, there is strong evidence that the pitch of a single such stimulus is substantially influenced perceptually by the sequence of preceding stimuli, presumably on the basis of their pitch (Dawe et al., 1998; Giangrand et al., 2003; Repp and Thompson, 2010; Chambers and Pressnitzer, 2011). Context effects are not limited solely to biases in octave judgements, but can also result in more accurate interval matching (Attneave and Olson, 1971) or pitch discrimination (Warrier and Zatorre, 2002) in the presence of an extended melodic context.

2.1.1 The pitch of periodic sounds

2.1.1.1 Pure tones

Given our matching-based procedure of measuring pitch above, this section may seem superfluous at first sight. Is not the pitch of a pure tone its own frequency by definition? Not quite. Firstly, there are limits to the range of frequencies over which pure tones can be frequency-matched. Guttman and Pruzansky (1962) found that listeners were able to do so down to frequencies around 20 Hz, i.e. the lower limit of the human hearing range. Musical pitch, however, did not extend to such low frequencies. When asking listeners to transpose a reference tone up or down by an octave, Guttman and Pruzansky (1962) discovered that listeners' average deviation from the mathematical octave frequency increased to over a semitone for reference frequencies between approximately 39 and 46 Hz. More importantly, their response variability increased dramatically from around 60 Hz downwards. The upper limit of musical pure-tone pitch has been investigated in several studies (Ward, 1954; Attneave and Olson, 1971; Semal and Demany, 1990), all of them revealing a break-down of the percept in the range between 4–5 kHz. This coincides also with the limit of note-naming ability in possessors of absolute pitch found by Bachem (1948). Within these outer limits, the ability to discriminate small differences in frequency depends non-monotonically on frequency. Moore (1973) found that frequency difference limens — the smallest difference in frequency for which subjects can reliably determine (e.g. with 75% correct) which of two tones is higher — were smallest at around 2 kHz and increased monotonically for frequencies above and below⁷ (see Figure 2.3). Discriminability also improved with tone duration.

⁷ Stimuli were presented via loudness-calibrated headphones.

Figure 2.3: Pure-tone frequency difference limens for different stimulus frequencies and durations (different curves are labelled by duration in ms; data from Moore, 1973; figure adapted from Moore, 2003).

The pitch of a pure tone depends to some degree on its loudness. When pitch-matching an 80 dB pure tone with an adjustable 40 dB tone, one would find that the two frequencies are not identical. While the level-dependence of pure-tone pitch is generally found to be negligible (≤ 1%) in the range between 1–2 kHz, some studies (e.g. Stevens, 1935; Morgan et al., 1951), but not all, have found more substantial effects outside this range (see e.g. Moore, 2003 for a discussion). The direction of the pitch-shift is frequency-dependent. As a general rule, pitch decreases with increasing level below 2000 Hz and increases with level above 4000 Hz (ibid.). This effect calls for a specification of the reference level in matching-based measurement of pitch. To further complicate matters, the pitch evoked by the same stimulus can differ between the left and right ear of a single listener by several percent, a phenomenon known as diplacusis binauralis (Burns, 1982).

2.1.1.2 Harmonic complex tones

The vast majority of natural pitch-evoking sounds are approximately periodic, i.e. they can be (approximately) decomposed via Fourier analysis into a sum of sinusoids with frequencies equal to integer multiples of the sound periodicity rate, or fundamental frequency $f_0 = \frac{1}{\Omega}$, where Ω is the period duration (see above). Sounds of this kind are produced by the vocal tracts of humans and animals and by many musical instruments. By and large, the pitch of such a harmonic complex tone (HCT) with many harmonics is equal to its f0.

The lower limit of the f0-range over which HCTs evoke a pitch percept has been found to coincide reasonably well with the limits of pure-tone pitch. In a task where listeners were required to detect a semitone difference between two four-tone melodies, Pressnitzer et al. (2001) determined the lower limit of melodic HCT pitch as somewhere within the range of 32–40 Hz. Krumbholz et al. (2000) found a similar limit in terms of f0-discriminability, suggesting that the limits of matching-based and musical pitch are very similar for HCTs. The upper limit of melodic HCT pitch is commonly assumed to coincide with the limit of pure-tone pitch, but this is based on indirect evidence. The highest pitches on the piano and piccolo flute (the highest orchestral instrument) fall into the range of 4–5 kHz, and due to the physiological limitations of the peripheral auditory system, auditory nerve fibres can presumably maintain periodic, synchronised firing activity only up to comparable rates⁸.

⁸ Note, though, that the exact physiological limit in humans is not known (cf. sections 2.2.3.2 and 3.2.2).

A particular type of HCT which has attracted much attention and controversy in the related literature (cf. section 2.4) is the HCT with a missing fundamental, i.e. one in which the amplitude of the Fourier component with frequency f0 is 0. Even in this case, a pitch equal to the missing fundamental is heard — at least within certain limits depending on the total number of Fourier components in the stimulus, their respective ranks (i.e. their factors with respect to f0) and to some degree their phase relationships (see e.g. Ritsma, 1962, 1963; Houtsma and Goldstein, 1971; Houtsma and Smurzynski, 1990; Shackleton and Carlyon, 1994; Renken et al., 2004). Various terms have been used to describe the pitch of the missing fundamental: tonal residue (Schouten, 1940; Schouten et al., 1962), periodicity pitch (Plomp, 1967), low pitch (Smoorenburg, 1970) or virtual pitch (Terhardt, 1974). As a general rule of thumb, missing-f0 pitch is strongest when the HCT contains many low-rank harmonics. Nevertheless, Houtsma and Goldstein (1971, 1972) showed that even a HCT with only two components can elicit missing-f0 pitch. The pitch of a HCT notably weakens when the lowest harmonic has a rank above approximately 8 to 10. When only few and high harmonics are present in the spectrum, the pitch corresponding to the f0 vanishes and instead one or several pitches corresponding to the individual component frequencies can be heard. The pitch of HCTs with only high harmonics is susceptible to manipulations of the component phases, whereas phase manipulations amongst low-rank harmonics leave the pitch (both pitch frequency and strength) largely unaffected. This can be observed, for example, in the doubling of the perceived pitch of a high-harmonic HCT when the phases are adjusted so as to double the periodicity of the waveform envelope while leaving the periodicity of the actual, fine-structured waveform and its f0 unchanged (Shackleton and Carlyon, 1994). It has previously been suggested (ibid.) that both the weakening of missing-f0 pitch with increasing lowest harmonic number and its phase dependence are caused by the limited frequency resolution of the peripheral auditory system (cf. sections 2.2.3.2 and 3.2.1). The frequency separation between high harmonics (starting approximately around the 10th, somewhat dependent on f0; see e.g. Moore and Gockel, 2011) is insufficient to evoke discernible peaks in the average firing-rate profile of the auditory nerve, eliminating a potentially important cue regarding the harmonicity (or periodicity) of the stimulus. At the same time, high harmonics exciting the same peripheral filter cause amplitude modulations in the filter output, the depth and (in extreme cases) the rate of which depend on the relative component phases. However, more recent evidence suggests that two different mechanisms underlie the weakening and the increased phase-dependence of high-harmonic missing-f0 pitch respectively. While peripheral resolution is indeed the likely cause of phase-related effects, it seems that the weakening of the pitch has a more central underlying mechanism and is more closely related to harmonic number per se, rather than peripheral resolvability: pitch discriminability of high-harmonic HCTs remains poor even when the harmonics are made resolvable by dichotic presentation of odd and even harmonics to opposite ears (Bernstein and Oxenham, 2003). The theoretical implications of these findings regarding the role of resolved, unresolved, low and high harmonics will be further discussed in section 2.4.

Experiments by Ritsma (1962) give us an indication regarding the limits of missing-f0 pitch perception, albeit only for a restricted class of stimuli. Using sinusoidally amplitude-modulated (SAM) tones, i.e. complex tones with only three adjacent spectral components⁹, he found that the pitch sensation generally deteriorated for f0s of 800 Hz and above, independent of the frequency range of the partials. Furthermore, no pitch of f0 could be heard if all component frequencies were above 5 to 6 kHz, independent of the fundamental. Recently, however, Oxenham et al. (2011) showed that genuine, even musical missing-f0 pitch can be evoked for f0s up to 2 kHz and partial frequencies wholly outside the traditionally-assumed existence region of up to 6 kHz (using HCTs with up to 12 harmonics, considerably more than Ritsma (1962)).

⁹ Note that SAM tones are not harmonic in general. Two spectral side-band components are equally spaced in linear frequency around a central carrier frequency, the component spacing being determined by the modulation rate. A SAM tone is only harmonic if the carrier frequency is an integer multiple of the modulation rate. The energy in the side bands is controlled by the modulation depth.

If one adds a sufficient number N of harmonically-related sinusoids of equal amplitude in cosine phase, i.e. such that their peaks all coincide at the same point in time, the resultant waveform $x(t) = \sum_i a_i \cos(2\pi f_0 i t)$ takes the shape of a periodic train of narrow, unipolar pulses or "clicks", spaced at intervals of Ω = 1/f0. As N → ∞, x(t) turns into a sum of infinitely narrow Dirac pulses:

$$x(t) \to \sum_i \delta(t - i\Omega)\,, \qquad (2.1)$$

where

$$\delta(t) = \lim_{\Delta t \to 0} \begin{cases} \dfrac{1}{2\Delta t} & \text{if } t \in [-\Delta t;\, \Delta t] \\ 0 & \text{otherwise} \end{cases} \qquad (2.2)$$

is the Dirac δ-function¹⁰ (e.g. Hartmann, 1997). Any periodic sound with period duration Ω can be written as the convolution of an Ω-spaced Dirac pulse train with a finite impulse response that equals the shape of a single waveform-period between pulses. This is the basis for the source-filter model of human vocal production (Fant, 1960), and will similarly form the basis of our own statistical model of naturalistic sounds in chapter 3. The pitch of periodic pulse trains with finite pulse durations, produced by mechanical sirens, was the subject matter of a scientific debate between Ohm and Seebeck (Ohm, 1843; Seebeck, 1843) that shaped theories of pitch perception for almost a century (see section 2.4). Seebeck doubled the periodicity of an isochronous pulse train by shifting the time of every other pulse by a small, constant amount. The perceived pitch halved, even though the sound contained only very little energy at the new, lower f0, explicitly suggesting that the physical presence of the fundamental may not be necessary in order to perceive a pitch at its frequency. Flanagan and Guttman (1960), Flanagan et al. (1962) and Guttman and Flanagan (1964) conducted related experiments using electronically generated pulse trains and investigated the dependence of pitch on stimulus parameters such as pulse timing, polarity and amplitude. In chapter 5, we will use trains of bipolar pulses with periodically alternating amplitudes in a psychophysical experiment regarding the dependence of pitch not only on periodicity but also on other spectral (or timbral) properties of sounds.

¹⁰ or rather: one of its many representations
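The convergence of the cosine-phase harmonic sum towards a click train (equation 2.1) can be checked numerically. A minimal sketch with illustrative parameter values:

```python
import numpy as np

fs = 48000                       # sampling rate in Hz (illustrative)
t = np.arange(0, 0.05, 1 / fs)   # 50 ms window
f0 = 200.0                       # fundamental; Omega = 1/f0 = 5 ms

def cosine_phase_train(n_harmonics):
    # Equal-amplitude harmonics in cosine phase: all components peak
    # together at t = 0, Omega, 2*Omega, ..., so their sum approaches a
    # train of narrow unipolar pulses as n_harmonics grows.
    k = np.arange(1, n_harmonics + 1)[:, None]
    return np.cos(2 * np.pi * f0 * k * t).sum(axis=0)

x8, x64 = cosine_phase_train(8), cosine_phase_train(64)
# The peak-to-spread ratio grows with N, reflecting the narrowing pulses.
print(x8.max() / x8.std(), x64.max() / x64.std())
```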

When listening to a complex sound, some listeners are more likely to "hear out" individual harmonics, each with a pitch equal to its frequency, rather than perceiving a single, compound sound with a pitch equal to the fundamental. von Helmholtz (1863) called these two modes of listening "analytic" and "synthetic". Many listeners, especially those without musical training, find it difficult to listen analytically at all and listen synthetically by default. As Helmholtz points out, "a certain amount of undisturbed concentration is always necessary for analysing musical tones by ear alone" (ibid.). Aside from analytic listening being amenable to practice, listeners' preference for analytic or synthetic listening can to some degree be influenced by the experimenter through careful priming¹¹. Furthermore, the presentation of stimuli in a noisy background appears to bias listeners to adopt a synthetic listening mode (see Moore, 2003 for a more detailed discussion). In our model (cf. chapter 3), only synthetic listening will be considered explicitly.

¹¹ see e.g. Houtsma and Goldstein (1971) for a discussion of a study by D. Cross and H. Lane, Attention to Single Stimulus Properties in the Identification of Complex Tones, in Experimental Analysis of the Control of Speech Production and Perception, ORA Report No. 05613-1-P, University of Michigan, Ann Arbor, 1963

2.1.2 Non-periodic sounds

Sounds do not need to be periodic in order to elicit a pitch. We can, for example, add random noise to a periodic sound with period Ω — a pitch corresponding to Ω will still be heard, typically as soon as the HCT can be reliably detected within the noise background (Gockel et al., 2006). In this case, every successive stimulus segment of duration Ω still bears some resemblance to one and the same underlying, average period "template". But even sounds where this is no longer the case can evoke a sensation of pitch, strong enough to allow for both rate discrimination and recognition of musical intervals or melodies.

fp = g + ∆f / n ,        (2.3)

i.e. its deviation from g is approximately proportional to the amount of shift of the components with respect to the harmonic stack {g, 2g, . . .}. Interestingly, the pitch of SAM tones is ambiguous:

Figure 2.4: Amplitude spectrum of a sinusoidally amplitude-modulated tone. Dotted lines indicate multiples of the modulator frequency g (not present in the spectrum).

as if listeners “misjudged” the value of n, i.e. the rank of the harmonic of g that is closest to fc, they sometimes perceive the pitch as either higher or lower by one or

several discrete steps. For example, the pitch of a SAM tone with fc = 2040 Hz and g = 200 Hz is typically heard as 204 Hz (n = 10), but sometimes also as 227 Hz (n = 9) or 185 Hz (n = 11) (Schouten, 1940). We will discuss the historical and theoretical importance of these findings in section 2.4. Quantitatively similar pitch shifts have also been observed for complex sounds with more than three (de Boer, 1956b; Patterson, 1973, 1976) or only two adjacent frequency components (Smoorenburg, 1970; Houtsma and Goldstein, 1972).
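For illustration, the ambiguity predicted by equation (2.3) is easy to reproduce numerically. The following sketch is our own illustration (function and variable names are ours, not part of any cited study):

```python
# Candidate pitches of a SAM tone according to equation (2.3):
# fp = g + delta_f / n, where n is an assumed harmonic rank of the
# modulator g and delta_f = fc - n*g is the carrier's shift relative
# to that harmonic. Illustrative sketch only.

def sam_pitch_candidates(fc, g, ranks):
    """Return (n, fp) pairs for the assumed harmonic ranks in `ranks`."""
    return [(n, g + (fc - n * g) / n) for n in ranks]

fc, g = 2040.0, 200.0        # carrier and modulation frequency [Hz]
n0 = round(fc / g)           # rank of the harmonic of g closest to fc
for n, fp in sam_pitch_candidates(fc, g, [n0 - 1, n0, n0 + 1]):
    print(f"n = {n:2d}:  fp = {fp:5.1f} Hz")
# n =  9: 226.7 Hz, n = 10: 204.0 Hz, n = 11: 185.5 Hz -- matching the
# ambiguous matches reported by Schouten (1940).
```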

Rather than shifting all components of an HCT by the same amount simultaneously, an HCT can also be rendered inharmonic by mistuning a single frequency component. In this case, the “synthetic” pitch of the entire sound shifts to a small degree (up to about 1%) in the direction of the component mistuning (Moore et al., 1985). However, as the mistuning of the component exceeds approximately 3%, the pitch shift gradually vanishes and instead the percept segregates: the mistuned component can be heard as a separate, second pitch in addition to the fundamental pitch of the remaining harmonic components. Moore et al. (1985) found that harmonics between the second and fifth yielded the greatest overall effect, but inter-subject variability was high. Other studies, in which multiple harmonics were mistuned at a time, either congruently (Plomp, 1967; Ritsma, 1967) or incongruently (Dai, 2000), have generally come to the conclusion that mistunings of low-rank harmonics up to maximally the sixth have the greatest effect on pitch. This has led to the notion of dominance of some low-rank components over the pitch of the entire complex — qualitatively in agreement with our earlier observation that the pitch of missing-f0 HCTs is strongest when they contain at least some low-rank partials (see section 2.1.1.2).

Amongst the most surprising examples of pitch-evoking, non-periodic sounds are those that are obtained not by simple manipulations of an originally periodic sound (such as the examples above), but instead by imposing weak spectral or temporal structure on otherwise perfectly aperiodic, uncorrelated noise. A much-studied example of this type of sound is rippled noise (Bilsen, 1966). Rippled noise is generated by delaying a white noise signal by a time delay d, scaling it by some multiplicative gain g, and adding it back to itself. If this process is repeated more than once with identical delay and gain, iterated rippled noise (IRN) is obtained (Yost et al., 1996; Yost, 1996)12. We will denote IRN

12 See Hartmann (1997) for an account of the discovery of an IRN-like pitch phenomenon by Dutch physicist Christiaan Huygens in 1693, generated by the sound of a water fountain being reflected off a flight of stairs in a courtyard.

with a positive gain of g = 1 as “IRN n+”, where n is the number of iterations, and IRN with a negative gain of g = −1 as “IRN n−”. For IRN n+, a pitch is heard corresponding to the inverse delay 1/d. This pitch is weak for n = 1 but becomes stronger as the number of delay-and-add iterations increases. The pitch of IRN n− depends on n: for a single iteration, pitch matches are broadly distributed around two peaks at frequencies of approximately (1/d) ± 10%, with a distinct lack of matches at 1/d. For higher numbers of iterations, the pitch drops to a frequency of 1/(2d), i.e. half the pitch of IRN n+, with a unimodal distribution around this mean (Yost, 1996). For n > 1, the pitch of IRN stimuli, as well as its increasing strength with n, is intuitively explicable by considering the delay-and-add/subtract procedure as a filtering operation. A single delay-and-add iteration can be expressed as a convolution of the original signal with an impulse response h+(t) = δ(t) + δ(t − d), the power spectrum of which is periodically modulated at a (spectral) rate of 1/d and peaks at frequencies i/d, i = 0, 1, . . . (Figure 2.5 B, top). Assuming that the spectral envelope of the source signal is flat, the spectral envelope of the filter output will have the same shape as the filter itself. Iterating the delay-and-add process as in Figure 2.5 A introduces additional, shifted δ-peaks to h(t), which leads to a progressive sharpening of the spectral modulations in the filter output. For n → ∞, the spectrum is a perfect comb-spectrum, just like that of a Dirac pulse train (see above). The delay-and-subtract operation can be implemented as a filter with impulse response h−(t) = δ(t) − δ(t − d), the power spectrum of which is also modulated at a rate of 1/d, but with a first peak at 1/(2d) (Figure 2.5 B, bottom). Iterating this procedure also sharpens the spectrum, but it will more and more resemble that of an HCT with f0 = 1/(2d) and odd-numbered harmonics only13. As the spectrum of IRN with many iterations becomes effectively harmonic, the pitch of IRN is most “interesting” in the low-n regime. In these cases, IRN provides only weak spectral cues (particularly after filtering by the peripheral auditory system). Furthermore, the dichotomy between the bimodal pitch of IRN 1− and the unimodal pitch of IRN 1+ is not obvious from our simple spectral intuition above (cf. Figure 2.5).
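The delay-and-add construction lends itself to a direct implementation. The sketch below is our own illustration of one common variant, in which the same delay-and-add filter is applied n times; parameter names and default values are ours:

```python
import numpy as np

def iterated_rippled_noise(n_iter, delay_s, gain, fs=44100, dur_s=1.0,
                           rng=np.random.default_rng(0)):
    """Generate IRN by n_iter delay-and-add iterations.

    gain = +1 gives 'IRN n+' (pitch near 1/delay_s),
    gain = -1 gives 'IRN n-' (pitch near 1/(2*delay_s) for n_iter > 1).
    """
    x = rng.standard_normal(int(fs * dur_s))
    d = int(round(delay_s * fs))          # delay in samples
    for _ in range(n_iter):
        y = x.copy()
        y[d:] += gain * x[:-d]            # y(t) = x(t) + g * x(t - d)
        x = y
    return x / np.max(np.abs(x))          # normalise to unit peak

# IRN 4+ with d = 5 ms: a pitch is heard near 1/d = 200 Hz
sig = iterated_rippled_noise(n_iter=4, delay_s=0.005, gain=+1.0)
```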

Sinusoidally amplitude-modulated (SAM) noise, i.e. a white noise carrier signal multiplied with a sinusoidal envelope, is a pitch-evoking stimulus that lacks spectral features altogether. Even though the signal envelope is clearly periodic, the power spectrum of the signal as a whole is perfectly flat in expectation, just as that of the noise carrier itself.

13 The fundamental frequency is counted as the first harmonic.

Figure 2.5: A: Schematic circuit for the generation of IRN up to n = 2. Uncorrelated Gaussian noise ξ(t) is fed into two adders that receive progressively delayed and gain-modulated copies of ξ(t) (after Yost, 1996). B: Impulse response (left) and power spectrum (right) of the filters corresponding to a single iteration in the generation of IRN 1+ (top) and IRN 1− (bottom).

Nevertheless, a faint pitch corresponding to the envelope modulation rate can be heard for modulation rates up to 850–1000 Hz, sufficient to allow the recognition of rhythm-less musical melodies (closed-set and open-set), melody dictation, and interval recognition and production (Burns and Viemeister, 1976, 1981). The accuracy of the pitch of SAM noise is on the order of one semitone (even with practice), i.e. barely sufficient for use in a musical context.
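SAM noise itself is equally simple to construct. In the following illustrative sketch (parameters chosen by us), the envelope is periodic while the expected power spectrum of the product remains flat:

```python
import numpy as np

def sam_noise(mod_rate_hz, fs=44100, dur_s=1.0, depth=1.0,
              rng=np.random.default_rng(0)):
    """White-noise carrier multiplied by a raised sinusoidal envelope."""
    t = np.arange(int(fs * dur_s)) / fs
    carrier = rng.standard_normal(t.size)
    envelope = 1.0 + depth * np.sin(2 * np.pi * mod_rate_hz * t)
    return carrier * envelope

# a faint pitch near 200 Hz would be expected for this stimulus
sig = sam_noise(mod_rate_hz=200.0)
```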

High-pass or low-pass filtered white noise also gives rise to a weak pitch (Small and Daniloff, 1967; Fastl, 1971). When the filter slopes are steep, a pitch close to the cut-off frequency can be heard for low-pass cut-offs below approximately 5–10 kHz (depending on the study) and high-pass cut-offs above 500–600 Hz. When noise is band-pass filtered with a passband wider than about one fifth of an octave, two pitches corresponding to the two spectral edges can be heard. For narrower passbands, a single pitch is perceived around the centre frequency (Fastl, 1971; see also Fastl and Zwicker, 2007). Small and Daniloff (1967) speculated that the pitch in these cases may be caused by lateral, neural inhibition between adjacent peripheral frequency channels (cf. sections 2.2.3.2 and 2.3.1), creating an excitation peak around the noise edge frequency, where inhibition can act only from one side rather than from both sides along the tonotopic axis (see also von Békésy, 1963 and Houtgast, 1972).

2.1.3 Binaural pitch

Pitch can even be evoked by dichotic stimuli, where the waveform at either ear alone has the statistics of pure white noise. Cramer and Huggins (1958) were the first to describe a binaural stimulus that is obtained from diotic white noise by introducing a 360° phase rotation between the left- and right-ear signals within a narrow frequency band (Figure 2.6, left). A pitch is evoked corresponding to the centre frequency of this phase-rotation band up to frequencies of about 1600 Hz (as determined by pitch discriminability). Similarly, a pitch can be evoked by the introduction of a binaural “phase edge” between two otherwise identical tokens of white noise, such that all phases above a certain edge frequency are rotated by 180° (Klein and Hartmann, 1981). The authors reported a bimodal distribution of pitch matches around the edge frequency, but some subsequent studies have found a unimodal distribution instead (Culling et al., 1998; see also Akeroyd et al., 2001). Hartmann and McMillon (2001) found that even a binaural phase-coherence edge is sufficient to generate a pitch (Figure 2.6, middle and right), slightly above the transition frequency at which the interaural phases switch from coherent to random. Akeroyd et al. (2001) showed that all three of these binaural stimuli can convey musical melodies, with Huggins pitch yielding the highest identification scores and binaural coherence-edge pitch the lowest. Astonishingly,

Bilsen (1977) demonstrated that even missing-f0 pitch can be evoked by a Huggins-like stimulus: when listening to dichotic noise with two 360° phase-rotation frequency bands with harmonically related centre frequencies (e.g. 600 and 800 Hz), a pitch equal to their f0 (i.e. 200 Hz) can be heard. As all these (and more) phenomena evidently require the combination of signals from both ears, the earliest possible stage in the ascending auditory pathway at which the formation of the percept may begin is the

Figure 2.6: Binaural pitch-evoking stimuli. Interaural phase difference between two otherwise identical tokens of noise, presented to opposite ears, as a function of frequency: Huggins pitch (left), binaural edge pitch (middle) and binaural coherence-edge pitch (right). From Akeroyd et al. (2001).

superior olivary complex in the brainstem (see section 2.3.1). The same is also true for the pitch of dichotically presented two-component HCTs (Houtsma and Goldstein, 1972) and dichotic variants of rippled noise (Bilsen, 1977). Fascinating and puzzling as these phenomena may be, we will only consider monaural pitch effects when we develop our model in chapter 3.
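Although only monaural effects will be modelled, the Huggins stimulus is straightforward to construct in the frequency domain. The sketch below is our own illustration under simplifying assumptions: a linear 0 to 360° phase transition across a band whose relative width we chose arbitrarily.

```python
import numpy as np

def huggins_pitch(centre_hz, fs=44100, dur_s=1.0, rel_bw=0.16,
                  rng=np.random.default_rng(0)):
    """Dichotic Huggins-pitch stimulus (after Cramer and Huggins, 1958).

    Left ear: white noise. Right ear: identical noise with the phase
    rotated progressively through 360 degrees across a narrow band
    around centre_hz. Returns (left, right).
    """
    n = int(fs * dur_s)
    left = rng.standard_normal(n)
    spec = np.fft.rfft(left)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    lo, hi = centre_hz * (1 - rel_bw / 2), centre_hz * (1 + rel_bw / 2)
    band = (freqs >= lo) & (freqs <= hi)
    # linear 0 -> 2*pi phase rotation across the transition band
    phase = 2 * np.pi * (freqs[band] - lo) / (hi - lo)
    spec[band] *= np.exp(1j * phase)
    right = np.fft.irfft(spec, n)
    return left, right

left, right = huggins_pitch(600.0)   # pitch heard near 600 Hz
```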

2.2 The peripheral auditory system

2.2.1 The external ear

The external ear comprises the pinna and the external ear canal, or meatus, which is delimited medially by the ear drum, or tympanum, a flexible membrane that defines the boundary between external and middle ear (Figure 2.7). The pinna is made of soft, skin-covered cartilage tissue that is irregularly folded into a pattern of ridges and valleys. Owing to its large surface area in comparison to the diameter of the ear canal, it is well suited to amplify incoming air pressure waves prior to their mechanical transduction onto the middle and inner ear, improving our overall sensitivity to acoustic stimulation. The folds on the pinna surface cause air pressure oscillations of different frequency and direction to be differentially deflected, delayed and absorbed before entering the ear canal. Combined with the acoustic head-shadow, these effects give rise to an individually characteristic filtering pattern, referred to as the head-related transfer function (HRTF). Sounds, filtered in this manner, enter the ear canal through the concha, a cone-shaped depression of the pinna that acts as an acoustic funnel. In humans, the ear canal itself is approximately 2.5 cm long. Similar to an air-filled cylindrical tube with one closed end (such as a clarinet or stopped organ pipe), it is preferentially set into resonance by sound frequencies whose wavelengths are close to four times its own length (Gough, 2007), i.e. 10 cm in the case of the (human) ear canal. Since sound propagates through air at a speed of about 340 m s−1, we can estimate the fundamental resonance frequency of the ear canal as approximately 3400 Hz. This enhanced resonance for frequencies around 3–4 kHz further shapes the overall frequency-response characteristics of the external ear, and presumably contributes to our enhanced perceptual sensitivity to frequencies in this range (e.g. Robinson and Dadson, 1956).
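The quarter-wavelength estimate amounts to a one-line calculation (a sketch using the rounded values quoted above):

```python
speed_of_sound = 340.0   # m/s, in air
canal_length = 0.025     # m, approximate human ear canal length

# a tube closed at one end resonates at wavelengths of 4x its length
f_resonance = speed_of_sound / (4 * canal_length)
print(f"{f_resonance:.0f} Hz")   # -> 3400 Hz
```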

Figure 2.7: Anatomy of the human outer, middle and inner ear (from Flanagan, 1972).

2.2.2 The middle ear

The middle ear is an air-filled cavity, extending from the ear drum on the lateral side to the bony surface of the inner ear on the medial side. Its medial wall contains two membrane-covered openings, the oval and round window. Three minute ossicles – malleus (hammer), incus (anvil) and stapes (stirrup) – are delicately arranged as a lever system that mechanically transfers motions of the ear drum onto the oval window (and vice versa). Their purpose is to overcome the impedance mismatch between the air-filled outer and middle ear and the fluid-filled inner ear, i.e. to convert high-amplitude, low-pressure air vibrations into low-amplitude, high-pressure fluid vibrations. Akin to airborne sounds being largely reflected by a water surface, air vibrations in the middle ear would fail to transfer their energy onto the inner ear fluid if the two compartments were simply separated by a flexible membrane. Two factors contribute to the impedance-matching process performed by the middle ear ossicles. Firstly, their lever action reduces the motion amplitude by a factor of 1.3 in humans (Hemilä et al., 1995). Secondly, and more importantly, the area of the human ear drum is about 23-fold larger than that of the oval window (ibid.). Thus, the expected effective pressure gain at the oval window for an ideal transformer is close to 30-fold (approximately 30 dB). In reality, however, the gain is generally lower and frequency-dependent, reaching a maximum of 20 dB for frequencies around 1 kHz (as measured in human cadavers). The ossicles are held in place by ligaments, and the effectiveness of the entire lever mechanism is influenced by two muscles, the stapedius and tensor tympani, which act to dynamically regulate the middle ear gain. During vocal production, jaw movement and in reaction to loud sounds (exceeding 80 dB SPL), the stapedius reflex causes an involuntary muscle contraction that reduces the middle ear gain by up to 20 dB (Zakrisson, 1979). Effective mechanical transmission also requires the air pressure on both sides of the ear drum to be identical in the absence of sound. The Eustachian tube, a canal linking the middle ear to the pharynx, serves this purpose by equalising the barometric pressure in the middle ear cavity and the outside atmosphere (see Figure 2.7).
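The ideal-transformer estimate can likewise be spelled out explicitly (a sketch using the figures cited above):

```python
import math

area_ratio = 23.0    # eardrum area / oval-window area (Hemilä et al., 1995)
lever_ratio = 1.3    # ossicular lever gain (ibid.)

pressure_gain = area_ratio * lever_ratio       # ~30-fold
gain_db = 20 * math.log10(pressure_gain)       # ~29.5 dB
print(f"{pressure_gain:.1f}x  ({gain_db:.1f} dB)")
```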

2.2.3 The inner ear

2.2.3.1 Anatomy

The inner ear is a complex bony structure containing both the vestibular system and the cochlea – the actual site of mechano-neural transduction of sound. The cochlea (lat. “snail”) is a fluid-filled, helically coiled tube with an unrolled length of approximately 3.5 cm in humans. Encased in a hard bone shell, the membranes of the oval and round window are the only flexible parts of the cochlear surface. Inside, the tube is tripartite along its entire length, containing three ducts separated by flexible membranes: the central scala media (or cochlear duct), which includes the actual cochlear sensory epithelium, surrounded by the scala vestibuli and scala tympani (see Figure 2.8). The scala media is separated from the scala vestibuli by Reissner’s membrane, and from the scala tympani by the basilar membrane. Towards the tip (apex) of the cochlea, there is a direct connection, the helicotrema, between the scalae vestibuli and tympani, so that the two effectively form one long compartment that folds back onto itself, with the scala media enclosed in-between (Figure 2.8A). An important property of the basilar membrane with respect to its passive, mechanical response to stapes motion is its stiffness: it decreases, roughly exponentially, from the base to the apex by several orders of magnitude (von Békésy, 1960). While the scalae vestibuli and tympani are filled with perilymph, a fluid with a low concentration of potassium (K+) ions, the scala media is filled with high-potassium endolymph. The ionic concentration gradient between these two fluids is a crucial driving force for the active response of the cochlea under physiological conditions, as will be further detailed in section 2.2.3.2.

The sensory epithelium (organ of Corti) of the inner ear is located on the surface of the basilar membrane, containing some 15000 mechanoreceptor cells: 3000 inner hair cells (IHC) and 12000 outer hair cells (OHC). Attached to their cell bodies are stereocilia, fine bundles of filament to which the hair cells owe their name. Above their tips rests a glutinous sheet, the tectorial membrane. Displacement of the basilar membrane leads to a shearing between the organ of Corti and the tectorial membrane that deflects the hair bundles from their resting position. This causes either a de- or hyperpolarisation of the hair cells, depending on the direction of motion: deflections towards the lateral wall of the cochlear coil cause depolarisation, deflections towards the centre result in hyperpolarisation. Inner and outer hair cells are distinguished by their anatomical location, their physiological properties and their innervation patterns. A single row of IHCs extends along the central side (with respect to the coil) of the organ of Corti. Despite being fewer in number, IHCs are the major source of afferent signals to the central auditory system. They receive afferent innervation from so-called type 1 fibres – dendritic neural processes of up to 30 cells situated in the spiral ganglion (Moser et al., 2006). The spiral ganglion is an aggregation of nerve cells that extends along the central axis of the cochlear helix. Axons of the spiral ganglion cells (SGCs) form the auditory nerve, which projects onto neurons in the cochlear nucleus of the auditory brainstem. Depolarisation of an IHC following deflection of the stereocilia causes the initiation of action potentials in SGC dendrites. IHC afferent signals are modulated by the efferent fibres of the lateral olivocochlear bundle (LOC), which originate from the lateral superior olivary complex (SOC) in the ipsilateral brainstem and terminate directly onto SGC type 1 dendrites.

The outer hair cells are arranged in three rows along the lateral side of the organ of Corti. A key physiological property of the OHCs is their electromotility, owing to the


Figure 2.8: Schematic anatomy of an unrolled cochlea (after von Békésy, 1960). A: Longitudinal section. B: Radial section.

presence of the motor protein prestin in the plasma membrane (Brownell et al., 1985; Zheng et al., 2000): de- and hyperpolarisation of the OHC soma causes the cell body to rapidly contract and stretch, respectively, thereby interfering actively with the fluid-driven motions of the basilar membrane. OHCs receive afferent innervation from type 2 SGC fibres. Despite the large number of OHCs, type 2 fibres constitute only 5% of all SGC afferents, and their functional role within the ascending auditory system is not well understood. OHCs receive strong efferent innervation from fibres of the medial olivocochlear bundle (MOC), originating largely from brainstem nuclei surrounding the contralateral medial SOC and terminating onto the OHC bodies.

2.2.3.2 Function

As the footplate of the stapes, covering the oval window, moves inwards and outwards, the cochlear fluids themselves are set into motion. Since the fluids are almost incompressible and the walls of the labyrinth are rigid, an inward motion of the stapes must ultimately be compensated by an outward movement of the flexible membrane covering the round window at the base of the scala tympani, and vice versa. As part of this process, the membranes of the cochlear duct are deflected up and down in a way that has led to the common notion of the basilar membrane as a spatial frequency analyser or “acoustic prism” (Zweig, 1976). When the stapes movement is sinusoidal, a travelling wave forms on the basilar membrane, starting at the base and propagating towards the apex. Initially, the propagation speed and wavelength are high, but as the membrane stiffness decreases away from the base, wavelength and speed drop while the wave amplitude builds up, until the wave reaches a critical point along the membrane. Beyond this point, frictional losses grow quickly as the wavelength decreases further, causing the wave to subside (see Figure 2.9). The point of maximal amplitude is determined by the frequency of the mechanical stimulus at the oval window, which sets the initial wavelength of the travelling wave. High-frequency stimulation generates a wave that peaks close to the base, whereas waves elicited by increasingly lower frequencies peak at increasingly apical positions on the basilar membrane (Figure 2.10). Conversely, each site along the BM has its own characteristic frequency (CF) for which its sensitivity is greatest. CFs decrease from the base to the apex.

Due to a shearing motion between the basilar and tectorial membranes, the stereocilia of the hair cells are alternately deflected towards and away from the centre of the cochlear

Figure 2.9: Cochlear travelling wave in response to a 200 Hz sinusoidal displacement of the stapes, depicted at four different stages of the phase cycle. From von Békésy (1960, chap. 12).


Figure 2.10: Mechanical frequency tuning of the cochlea. A: Vibration amplitude measured at six points on the basilar membrane (different curves), as a function of mechanical stimulation frequency. B: Vibration amplitude (top) and phase (bottom), measured along the length of the basilar membrane in response to stapes vibration at rates of 50, 100, 200 and 300 Hz. From von Békésy (1960, chap. 11).

Figure 2.11: Electromotility of an outer hair cell: change in OHC length in response to voltage steps from a holding potential of −68.4 mV (from Santos-Sacchi, 1992).

coil, either by way of direct contact between their tips and the tectorial membrane or, for shorter bundles, through the viscosity of the surrounding medium. Deflections away from the centre open mechanically gated ion channels that enable the influx of potassium (K+) ions from the K+-rich endolymph of the scala media into the hair cell soma. During opposite deflection, these channels close, while at the same time K+ ions are excreted from the hair cell into the low-K+ perilymphatic fluid of the scala tympani through voltage-gated K+ channels. This results in an alternating de- and repolarisation of the hair cell.

IHCs and OHCs react differently to somatic depolarisation. In OHCs, the somatic potential induces a mechanical response of prestin molecules in the cell membrane: depolarisation results in a longitudinal contraction, hyperpolarisation in an elongation of the OHC body (see Figure 2.11). The consequences of this electromechanical feedback in shaping the overall basilar membrane response will be discussed in further detail in section 2.2.3.3. In IHCs, the initial depolarisation causes the further, voltage-gated influx of calcium (Ca2+) ions from the scala media endolymph. Increased Ca2+ triggers the calcium-dependent release of glutamate (and possibly other neurotransmitters) into the cleft between the IHC and the afferent synapses of type 1 SGC fibres. Excitatory postsynaptic currents (EPSCs) in the SGC dendrites are mediated by AMPA receptors and may result in the generation of action potentials that propagate along the SGC axons to the cochlear nucleus of the brainstem.

The influence of the efferent system (comprising the MOC and LOC bundles, cf. section

2.2.3.1) on the mechanical and neural response to sounds is far from fully understood, but experimental evidence points towards an inhibitory role. BM displacement amplitudes were found to be reduced around and above CF during electrical stimulation of efferent MOC fibres synapsing onto OHCs (Russell and Murugasu, 1997). Reductions were strongest for medium-intensity tones, qualitatively consistent with earlier observations made in the auditory nerve (Guinan and Stankovic, 1996). These effects are presumably mediated by a hyperpolarisation of the OHCs, thereby inhibiting their electromotility. The workings of the LOC efferent system are barely known at all, owing in large part to the difficulty of selectively stimulating or recording from the fibres. The functional significance of the efferent system as a whole similarly remains speculative. Proposed benefits include dynamic range control, reduction of masking in noisy acoustic backgrounds and the protection of hair cells from cytotoxic acoustic overstimulation (Guinan, 2006).

On the whole, mechanical and neural tuning properties at any given point along the BM are closely matched. An important functional property that distinguishes mechanical from neural responses is the half-wave rectification that the IHCs effectively perform on the BM response. The IHC potential is depolarised only during one half-phase of a BM oscillation; during the other half, it remains close to the cell’s resting potential. Due to the slowness of the ion-channel dynamics, the IHC potential gradually ceases to follow every single peak of the BM oscillation as oscillation rates increase above 1 kHz. Instead, for high oscillation rates, the potential fluctuates only slightly around an elevated baseline. Thus, the IHC response approximately follows the (half-wave rectified) BM displacement waveform for slow oscillation rates, and its amplitude envelope for fast rates, performing a simple form of envelope demodulation (see Figure 2.12).
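This rectify-then-smooth behaviour is commonly approximated in peripheral models by half-wave rectification followed by low-pass filtering. The sketch below is a generic, textbook-style caricature (the cut-off frequency and filter order are our own choices), not the specific transduction model used elsewhere in this thesis:

```python
import numpy as np
from scipy.signal import butter, lfilter

def ihc_envelope(bm_displacement, fs, cutoff_hz=1000.0):
    """Crude IHC model: half-wave rectification + first-order low-pass.

    For slow oscillations the output follows the rectified waveform;
    well above the cut-off it increasingly follows only the envelope.
    """
    rectified = np.maximum(bm_displacement, 0.0)
    b, a = butter(1, cutoff_hz / (fs / 2))   # first-order low-pass
    return lfilter(b, a, rectified)

fs = 44100
t = np.arange(fs) / fs
low = ihc_envelope(np.sin(2 * np.pi * 100 * t), fs)    # follows waveform
high = ihc_envelope(np.sin(2 * np.pi * 4000 * t), fs)  # ripples less,
                                                       # settles near baseline
```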

2.2.3.3 Non-linearities in the basilar membrane

Prior to the groundbreaking studies of von Békésy during the 1930s to 50s (von Békésy, 1960), the leading theory of cochlear mechanics had assumed that the stereocilia themselves acted as independent resonators tuned to different frequencies, similar to the strings of a harp or undamped piano (von Helmholtz, 1863). While a similar mechanism may indeed be found in the cochleae of some reptiles (e.g. Holton and Weiss, 1983), von Békésy experimentally established the now widely accepted travelling-wave mechanism as a major determinant of cochlear mechanics in mammals.

Figure 2.12: IHC receptor potentials in response to tones of different frequencies presented at 80 dB SPL, measured at the basal turn of a guinea pig cochlea (from Russell and Sellick, 1983).

However, working solely on cadaver ears, he did not realise the significant role of the OHCs in further shaping the cochlear response to acoustic stimulation under physiological conditions (Rhode, 1971; Brownell et al., 1985). As the OHC somata are de- and hyperpolarised following stereocilia movement, prestin molecules in the cell membrane contract and expand, rapidly enough to follow frequencies far beyond the human range of hearing when electrically stimulated in vitro (Frank et al., 1999). This electromotile response acts to locally amplify the passive, fluid-driven vibration of the basilar membrane and thereby provides a positive feedback loop that can greatly enhance the amplitude of basilar membrane motion, particularly in the low-amplitude regime. As a consequence, the mechanical response of the active cochlea to sounds is highly non-linear. This can be seen, for example, in the intensity-dependence of cochlear tuning characteristics, when the intensity of acoustic stimuli is lowered from a moderately high level (e.g. 80 dB SPL) down to the response threshold while measuring the response amplitude at a fixed point on the basilar membrane across a range of stimulus frequencies. As the stimulus level decreases, the frequency eliciting the maximal response shifts upwards, while the frequency selectivity (or sharpness of tuning) around this frequency increases. Furthermore, the response amplitude for low-intensity stimuli at and near the characteristic frequency of a given site is amplified by as much as 50 dB above a linear prediction from high-intensity stimuli (Johnstone et al., 1986). Comparisons have been made between physiologically intact preparations and those that were either dead (Rhode, 1973), pharmacologically lesioned (Ruggero and Rich, 1991) or lesioned by acoustic overstimulation (Ruggero et al., 1993). From these, it is clear that amplification effects are indeed due to active processes in the cochlea involving OHC electromotility. Further non-linear effects beyond the level-dependence of cochlear frequency selectivity have been linked to the active electromechanical OHC feedback, most prominently compression of responses to near-CF stimuli, two-tone suppression and intermodulation distortions.

Compression is a phenomenon observed predominantly at the basal (i.e. high-frequency) end of the cochlea in many mammalian species, whereby the mechanical response to a tone near the CF of a given site increases sub-linearly (i.e. with a slope of less than 1 dB response amplitude per dB stimulus amplitude) as the stimulus level increases above 20 to 25 dB SPL. Figure 2.14 shows the velocity-intensity functions of BM responses to tones in the basal cochlea of a chinchilla. As can be seen, compression is confined to frequencies near CF (10 kHz). In several cases, response growth has been found to return to linear at high stimulus levels of 100 dB SPL and above, as seen in the responses to 9 kHz and 11 kHz tones. Since responses to off-CF tones (e.g. 4–7 kHz in Figure 2.14) continue to grow linearly at amplitudes where near-CF responses of equal amplitude are highly compressive, compression does not seem to reflect a mere ceiling effect of BM responses at high amplitudes. Compression is more pronounced at the base of the cochlea than near the apex, and is greatly diminished following trauma or death (Ruggero et al., 1996).
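A broken-stick caricature of the velocity-intensity functions in Figure 2.14 makes these numbers concrete (an illustrative sketch; the knee point and the 0.2 dB/dB slope are merely representative values cited in this section, not a fitted model):

```python
def bm_response_db(level_db, knee_db=25.0, slope=0.2):
    """Toy near-CF velocity-intensity function: linear growth (1 dB/dB)
    below the knee, compressive growth (slope dB/dB) above it."""
    if level_db <= knee_db:
        return level_db
    return knee_db + slope * (level_db - knee_db)

# in this caricature the response grows by only ~36 dB while the
# stimulus level increases by 96 dB
print(bm_response_db(100.0) - bm_response_db(4.0))
```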

In two-tone suppression, the BM response to a tone (the probe) is reduced in the presence of another (the suppressor). Similar to amplification and compression, two-tone suppression is most pronounced for probe tones near the CF of a site, presented at low to medium intensities (Figure 2.15), and is reduced by hair cell trauma or death of the animal.

Intermodulation distortions are frequency components in the BM response to two (or more) tones which are not themselves contained in the stimulus and occur as a natural consequence of non-linearities in any system. In general, the response of a non-linear system to a signal consisting of two sinusoids with frequencies f1 and f2, f2 > f1 (primaries) may contain frequencies corresponding to arbitrary integer combinations {mf1 + nf2 : m, n ∈ Z} of the primaries (Figure 2.16). These distortions can sometimes be heard by the listener, most prominently so the quadratic difference tone (QDT) f2 − f1 and the cubic difference tone (CDT) 2f1 − f2: their occurrence was described in


Figure 2.13: Active amplification of BM responses. A & B: Amplitude of the basal BM response in a guinea pig as a function of tone level and frequency on an absolute scale (A) and normalised by stimulus level (B), demonstrating super-linear response magnitude for low-intensity tones near CF. C & D: Level-specific effects of furosemide injection on BM responses at the basal turn of a chinchilla cochlea for stimuli presented at 75 dB (C) and 95 dB (D). Adapted from Ruggero and Rich (1991).

Figure 2.14: Velocity-intensity functions of BM responses to tones in the basal portion of a chinchilla cochlea, demonstrating compression near CF (10 kHz). A: Responses to tones at and below CF. B: Responses to tones at and above CF. From Ruggero et al. (1997).


Figure 2.15: Examples of two-tone suppression. A: Velocity-intensity functions for probes and suppressors at different levels recorded at a basal site in a chinchilla, demonstrating highest suppression for low probe intensities (symbols denote different suppressor levels; from Ruggero et al., 1992a). B: Effect of a 12 kHz suppressor at 63 dB on the response to probes at different frequencies, recorded at a basal site in a chinchilla. Open circles represent iso-velocity curves for the probe alone, filled circles for probe and suppressor. Suppression magnitude is indicated by the thin solid line, demonstrating CF specificity (from Ruggero et al., 1992b).

the early 18th century by Italian composer and violinist Giuseppe Tartini, and has long been used as a test for correct intonation by music practitioners (Mozart, 1756). The amplitude of the CDT on the BM depends on the absolute and relative amplitudes of the primaries as well as their frequency separation, with larger separations giving rise to lower distortion amplitudes (Cooper and Rhode, 1997). CDT amplitude, like the amplitudes of other intermodulation distortions, is highest at sites tuned to the distortion-product frequency and can reach levels of up to −15 dB relative to the primaries (Robles et al., 1997). Furthermore, it is vulnerable to localised acoustic trauma at both the target and primary sites. This suggests that the CDT is generated at a site close to the primaries and subsequently propagated along the BM, where it is locally amplified at an appropriately tuned site (ibid.). Of particular relevance in the context of pitch perception is the quadratic difference tone (QDT), occurring at a frequency of f2 − f1. It is audible only at high stimulus levels and more challenging to measure on the BM, owing to the difficulty of maintaining a physiologically intact preparation while recording from apical (i.e. low-frequency) sites of the cochlea. Nevertheless, its existence has been clearly demonstrated (Figure 2.16B), with amplitudes as high as 23 dB below the primaries (Cooper and Rhode, 1997). Its relevance to pitch perception derives from the simple observation that in a harmonic complex sound with fundamental frequency f0, any neighbouring pair of harmonics nf0 and (n + 1)f0 gives rise to a QDT at f0. Since the perceived pitch of such a sound coincides with the fundamental, the question arises whether the pitch of these and other sounds may reflect the immediate sensation of mechanical distortions on the BM rather than the result of higher-level signal processing. This issue will be further discussed in section 2.4.
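The link between distortion products and the fundamental can be checked mechanically (an illustrative sketch):

```python
def qdt(f1, f2):
    """Quadratic difference tone f2 - f1 (assuming f2 > f1)."""
    return f2 - f1

def cdt(f1, f2):
    """Cubic difference tone 2*f1 - f2."""
    return 2 * f1 - f2

f0 = 150.0
for n in range(3, 6):                 # neighbouring harmonic pairs n, n+1
    f1, f2 = n * f0, (n + 1) * f0
    print(n, qdt(f1, f2))             # always 150.0, i.e. f0
print(cdt(3 * f0, 4 * f0))            # 300.0, i.e. (n - 1) * f0 for n = 3
```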

Peripheral non-linearities, as described above, may also play an important role in psychophysical effects such as simultaneous masking (the elevation of tone detection thresholds in the presence of masking tones or noises, see e.g. Delgutte, 1990) or perceptual measures of frequency selectivity (Rosen et al., 1998). Such links, however, are very difficult to establish and remain tentative for the time being, largely due to a lack of human physiological data. As pointed out, the most important behavioural analogues from the perspective of pitch perception are combination tones perceived by listeners, which effectively augment the spectrum of the acoustic stimulus.


Figure 2.16: Intermodulation distortions. A: Frequency spectrum of the BM response to equal-amplitude tone pairs at the basal turn of a chinchilla cochlea (from Robles et al., 1991). B: Frequency spectrum of the BM response to equal-amplitude tone pairs at the apical turn of a chinchilla cochlea (from Cooper and Rhode, 1997).

Figure 2.17: Major structures of the ascending mammalian central auditory pathway (from Gacek, 1972).

2.3 Processing of pitch in the central auditory system

In this section, we will give a brief overview of the neural representation of stimulus periodicity and — possibly — pitch along the ascending auditory pathway, covering the brainstem, midbrain and auditory cortex (see Figure 2.17). It is generally believed that neurons in the brainstem and midbrain represent selected physical features of the sound, all of them likely to contribute to our percept of pitch, but none of them by itself a general representation across the entire range of pitch-evoking sounds. Physiological data at these levels are necessarily restricted to non-human species, making direct comparisons with human perception impossible. In the cortex, non-invasive physiological measurements are available from humans. In our review of the cortical substrate of pitch, we will limit ourselves to these, in addition to electrophysiological data from primates, whose auditory cortical architecture appears to be closely homologous to that of humans. A recent, more general overview, including a wealth of recent findings made in ferrets (e.g. Nelken et al., 2008; Bizley et al., 2009, 2010), can be found in Walker et al. (2011).

2.3.1 Brainstem and midbrain

In the auditory nerve, two types of representations of sound periodicity have been observed and studied: one is based on a tonotopic rate-place representation of the stimulus spectrum, the other on the timing of neural events in individual peripheral frequency channels. Spectral peaks in the stimulus give rise to peaks in the response-amplitude profile of the basilar membrane (BM) along its tonotopic axis, provided their frequencies are sufficiently separated. As the BM amplitude is a major determinant of the inner-hair-cell (IHC) potential and the auditory nerve (AN) firing response, peaks in the BM amplitude profile are preserved in the average firing-rate profile of the auditory nerve fibres (if one orders them by their characteristic frequencies). For a harmonic sound, the separation between peaks is equal to its fundamental frequency, which may serve as the basis for its determination. However, as the frequency bandwidth of the BM increases roughly linearly with characteristic frequency (CF), such peaks are only discernible for low-rank harmonics, up to approximately the 10th14. In addition to this coarse rate-place representation — limited to resolved, approximately harmonic spectra — periodicity information is also contained in the temporal pattern of activity in single auditory nerve fibres (or groups of them). Activity in a single auditory nerve fibre is approximately phase-locked to peaks in the stimulus waveform, band-passed within a frequency range determined by the place of the innervating IHCs along the BM axis. Hence, stimulus periodicity within this frequency range is preserved in the periodicity of the evoked AN response. Cariani and Delgutte (1996a,b) measured auditory nerve responses in cats to a large variety of (human-)pitch-evoking sounds: artificial vowels, sinusoidally amplitude-modulated (SAM) tones, SAM noise and click trains. They computed first-order and all-order inter-spike-interval (ISI) histograms, pooled across many fibres (the latter being equivalent to the fibres’ summed autocorrelation function), and found that the peak of the all-order ISI histogram provided an excellent match to the “typical” human pitch percept in all cases, independent of the stimulus level15. Furthermore, sounds typically associated with salient pitch percepts produced larger peaks in the ISI histogram than weakly-pitched sounds. This form of representation is expected to deteriorate as periodicity rates approach the phase-locking limit

14 The exact boundary between resolved and unresolved harmonics is a matter of definition and debate, and depends to some degree on f0 (see e.g. Moore and Gockel, 2011 for a recent summary).

15 Histogram peaks based on the first-order ISI were less robust in comparison, e.g. jumping between fundamental and formant frequencies depending on sound level.

of the AN fibres. Cedolin and Delgutte (2010) investigated the periodicity limit of the representation, again in cats, using complex harmonic sounds as stimuli. They found a sharp increase in estimation errors (> 10% deviation) as the fundamental frequencies exceeded approximately 1300 Hz (despite the fact that AN fibres in cats have been shown to phase-lock almost perfectly up to 2 kHz, and to some degree up to 4.5 kHz; Johnson, 1980). Conversely, the estimation of the fundamental from average-rate profiles was possible from 400–500 Hz upwards (up to 3.5 kHz, the highest f0 used). These results demonstrate that neither all-order ISIs nor rate-place profiles alone seem to carry sufficient information to support human pitch perception across its entire pitch-height range (if one accepts the cat AN as a legitimate proxy).
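The pooled all-order ISI analysis can be summarised in a few lines of code. The sketch below is our simplified reconstruction of the idea, run on synthetic, phase-locked spike trains; it is not the original analysis code of Cariani and Delgutte, and all parameter values are our own:

```python
import numpy as np

def pooled_all_order_isi(spike_trains, max_interval_s=0.02, bin_s=1e-4):
    """Pool all-order inter-spike intervals (ISIs) across fibres.

    spike_trains: list of 1-D arrays of spike times in seconds.
    Returns (bin_centres, counts); the pooled all-order ISI histogram
    is equivalent to a summed spike-train autocorrelation.
    """
    edges = np.arange(0.0, max_interval_s + bin_s, bin_s)
    counts = np.zeros(edges.size - 1)
    for spikes in spike_trains:
        diffs = spikes[None, :] - spikes[:, None]    # all pairwise intervals
        diffs = diffs[(diffs > 0) & (diffs <= max_interval_s)]
        counts += np.histogram(diffs, bins=edges)[0]
    return 0.5 * (edges[:-1] + edges[1:]), counts

# toy demo: 20 fibres phase-locked (with jitter and failures) to 200 Hz
rng = np.random.default_rng(0)
period = 0.005
trains = []
for _ in range(20):
    t = np.arange(200) * period + rng.normal(0.0, 1e-4, 200)  # jitter
    trains.append(np.sort(t[rng.random(200) < 0.3]))          # failures
centres, counts = pooled_all_order_isi(trains)

# the histogram peaks at the 5 ms period and its integer multiples;
# read out the shortest strongly-represented interval as the period
candidate = np.flatnonzero(counts > 0.7 * counts.max())[0]
f0_estimate = 1.0 / centres[candidate]    # approximately 200 Hz
```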

Auditory nerve fibres synapse onto neurons in the cochlear nucleus (CN). The CN can be divided into three sub-nuclei (dorsal, anteroventral and posteroventral), each of them tonotopically innervated and organised (e.g. Malmierca and Hacket, 2010). Of the many different cell types in the cochlear nucleus, some have gathered considerable interest with regard to their role in periodicity processing (see e.g. Winter, 2005). Bushy cells, or “primary-like” neurons, in the ventral CN have response characteristics much like their AN inputs: they phase-lock to stimulus peaks up to high frequencies, so that information about the fine-structure periodicity is preserved for potential read-out by their postsynaptic targets (Winter et al., 2001), for example in the form of an all-order ISI histogram as discussed above. So-called “chopper neurons” in the ventral CN show a sharp modulatory pattern in their firing rates, at cell-intrinsic modulation rates between about 100 and 500 Hz in cats (Kim et al., 1990). Within this range, modulation depth in the response is enhanced when the stimulus periodicity coincides with the intrinsic chopping rate. This has led to the hypothesis that chopper neurons may help to transform the all-order ISI representation of periodicity in the AN into a “temporal place representation”, which would be more convenient to read out subsequently (Kim et al., 1990; Frisina et al., 1990; Winter et al., 2001; Wiegrebe and Meddis, 2004): given a linear array of chopper neurons, spanning the entire range of natural chopping rates for each peripheral frequency channel, stimulus periodicity would manifest itself as a peak in the modulation-depth profile along this array that is consistent across channels. The existence of such an arrangement of cells remains speculative, however, and the range of periodicities that can be represented in this fashion would be strictly limited by the range of intrinsic chopping frequencies. Aside from their hypothesised role in the transformation of temporal periodicity cues, Blackburn and Sachs (1990) observed that the rate-place profile of chopper neurons along the tonotopic gradient was considerably less dependent on sound level than that of primary-like cells (and their AN inputs), thus providing an easy-to-read spectral representation of the stimulus. Lateral suppression further acts to sharpen the rate-place profile (Rhode and Greenberg, 1994). “Octopus cells” in the posteroventral CN receive convergent input across a wide range of frequencies. While their frequency tuning is therefore less specific than in other CN cell types, their temporal precision and tendency to phase-lock to amplitude modulations in the stimulus are conversely much higher (Rhode, 1994; Oertel et al., 2000). Overall, the CN preserves both spectral and temporal fine-structure information in its varied response patterns. More than a passive relay, however, some cell types seem to facilitate the further use of one or the other of these cues by downstream processing stages.

Following the ascending pathway leads us to the superior olivary complex (SOC), the first stage of binaural convergence. Its primary functional role in hearing is typically considered to be sound localisation (see e.g. Grothe et al., 2010). Two major nuclei, the lateral and medial superior olive (LSO and MSO), are highly sensitive to interaural differences in sound intensity and timing, respectively. Within the ascending pathway, the SOC has gathered only little attention with regard to its role in periodicity processing and pitch perception. However, due to the specialised function of the MSO in the processing of interaural time and phase, it may be involved in the generation of binaural pitch percepts such as the Huggins pitch (however weak and rare the phenomenon may be; see section 2.1.3). As discussed in section 2.2.3.1, the SOC is the origin of efferent connections to the inner and outer hair cells.

Neurons in the central nucleus of the inferior colliculus (ICc) receive innervation from both the SOC (directly and indirectly via the lateral lemniscus) and the CN. Again, neurons are ordered along a tonotopic gradient: from low frequencies dorsally to high frequencies ventrally. Interestingly from our point of view, IC neurons respond in a band-pass manner not only to pure-tone frequency: Langner and Schreiner (1988) demonstrated band-pass tuning also to the modulation frequency of SAM tones for modulation rates up to 1 kHz in anaesthetised cats (where a neuron’s CF was used as the carrier frequency). Unlike in CN chopper neurons, this manifests itself not only in the modulation depth of the IC responses, but also in their mean firing rates. Furthermore, Schreiner and Langner (1988) reported that the modulation-tuning characteristics followed a gradient orthogonal to the dorsal-ventral tonotopic axis, which they called a “periodotopic map”. Similar results were also obtained from awake chinchillas (Langner et al., 2002). The physiological mechanisms that underlie periodicity tuning in the IC are not well understood. McAlpine (2004) showed that quadratic distortions (cf. section 2.2.3.3) on the basilar membrane can evoke neural responses in the IC when the modulation rate of an amplitude-modulated tone matches the CF of a neuron (while the carrier frequency is consequently much higher). As cells in Langner and Schreiner (1988) and Langner et al. (2002) had best modulation frequencies (BMFs) typically far below their CFs (which were used as carrier frequencies), the results cited above are not readily explained as tonotopic responses to quadratic cochlear distortions16. Hewitt and Meddis (1994) proposed a mechanism by which IC neurons act as coincidence detectors of synchronised spikes in their input, originating from a population of CN chopper units with identical intrinsic chopping rates. High modulation gain in the output of the chopper neurons (see above) is accompanied by increased spike synchronicity between neurons, which drives the IC coincidence detector to higher firing rates when the stimulus modulation rate matches the CN chopping rate. However appealing, there is little direct evidence for such a mechanism, and the model cannot readily explain a discrepancy between the range of chopping rates in the CN and BMFs in the IC found in vivo (see Krishna and Semple, 2000 for a thorough discussion). Independent of the underlying mechanism, it remains an open question to what degree (or whether at all) the periodicity map in the IC is related to pitch perception. The simplicity of a read-out based on average firing rate, without the need for high temporal resolution, makes it an attractive candidate basis for periodicity estimation in the thalamus or cortex, where neurons become increasingly insensitive to fast-rate fluctuations in their inputs (Wallace et al., 2002, 2007). Furthermore, given the degree of convergence in the IC, the observed periodicity-tuned responses themselves could reflect a sophisticated, combined estimate based on the entire variety of spectral, temporal and binaural cues available. However, BMFs in the IC extend well below the lower limit of pitch (30–40 Hz) on the one hand, while falling well short of its upper limit (4–5 kHz) on the other. In addition, modulation frequency, periodicity and pitch are not one and the same thing.
Thus, the generality and robustness of this putative periodotopic representation across a much larger variety of sounds remains to be tested.

16 Reports of modulation tuning to AM tones with high carrier frequencies, far outside the neuron’s pure-tone response range (Biebel and Langner, 2002), however, must be treated with caution for this reason.

Figure 2.18: Modulation frequency tuning in the central nucleus of the inferior colliculus in cats (open circles: single units, filled circles: multi-unit clusters). Peak firing rate and unit CF are indicated above the tip of each curve (from Langner and Schreiner, 1988).

The medial geniculate body (MGB) in the thalamus is the target of afferent IC fibres. Of its three subdivisions (the ventral, medial and dorsal MGB), only the ventral is tonotopically organised (see e.g. Anderson et al., 2007). Preuss and Müller-Preuss (1990) demonstrated the existence of band-pass tuning to the modulation rates of amplitude-modulated tones. As with modulation-rate tuning in the IC, it is currently unknown to what degree these responses in the MGB encode a particular physical feature of the sound waveform and to what degree they reflect the subjective perceptual experience of the subject, even in cases where the two may be at odds. Bartlett and Wang (2007) recently studied the responses of MGB neurons in awake marmoset monkeys using regular click trains. Their main finding, which qualitatively differentiated MGB from IC responses, was that a substantial fraction of MGB neurons with band-pass tuning for click rate responded with wholly stimulus-desynchronised firing patterns. The functional significance of these findings, however, remains to be determined (see e.g. Wang et al., 2008 for a review and discussion).

2.3.2 Cortex

The auditory cortex is a collection of morphologically and physiologically heterogeneous subfields, located on the temporal lobe in either brain hemisphere. While the architecture of the subcortical ascending pathway appears to be largely preserved across many mammalian species, the organisation of auditory cortex is more heavily species-dependent (see e.g. Malmierca and Hacket, 2010). One widely shared architectural feature is the distinction between primary and primary-like “core” areas, which are surrounded by several secondary “belt” areas. The core areas receive ascending, thalamic innervation primarily from the tonotopically organised ventral MGB and typically show some form of tonotopy themselves. The non-primary belt areas also receive inputs from the tonotopically unstructured dorsal and medial MGB, as well as strong intracortical input from the core areas. Multiple core regions can be distinguished physiologically in many species by a reversal of the tonotopic gradient at their mutual boundaries. In humans, the surface of Heschl’s gyrus (HG) is the anatomical site of the core areas of the auditory cortex (e.g. Hackett et al., 2001). Aside from approximate tonotopic gradients of preferred spectral tuning observed across species, the functional characterisation of auditory cortical neurons remains a largely unsolved problem. A major obstacle in this endeavour is the seeming lability of cortical response properties in the face of changes in the long-term acoustic or even behavioural context in which stimuli are presented (e.g. Ulanovsky et al., 2003; Fritz et al., 2003, 2005). Hence, it would appear that auditory cortex, rather than representing physical stimulus features itself, might be a key stage in constructing the mental representation of the subjectively experienced “auditory scene” from the physical-like features represented in the midbrain. If that were the case, auditory cortex (primary or secondary) would be a promising candidate area to contain the neural substrate of a perceptual auditory attribute such as pitch. Lesion studies in humans and animals also suggest that an intact auditory cortex is necessary to perform pitch-related behavioural tasks. Whitfield (1980), for example, showed that cats lost their ability to perform and re-learn a missing-fundamental discrimination task following bilateral ablation of the auditory cortex. In humans, Zatorre (1988) found the discrimination performance of patients with unilateral temporal-lobe lesions — particularly those with damage to the right HG — to be significantly impaired when the task required discrimination of missing-fundamental tones, but not when the fundamental was present.

Experimentally, one can approach the search for a possible representation of pitch from many different angles. One could, for example, commit to some specific assumptions regarding the semantics of a neuronal representation of pitch and search for neurons (or groups and networks of neurons) in auditory cortex that meet these assumptions. One particular such form of representation, which experimenters have been keen to

find, is an explicit rate-place code for pitch along a frequency-labelled axis. In this putative code, pitch (specifically: pitch height) would be determined by the place of maximum activity along the axis, and the height of the peak (relative or absolute) would reflect the perceptual saliency of the percept. All pitch-evoking sounds, and only those, should elicit a discriminable peak of activity corresponding to their pitch height. Finally — and critically, in order to classify as a representation of perceived pitch — the peak should reflect the subjective pitch height on each single trial, rather than one of several, possibly conflicting, physical cues, for example17. One hallmark of such a representation would be the existence of neurons with band-pass tuning for pitch height. Several studies in the primary auditory cortex (A1) of monkeys have failed to identify such neurons. Schwarz and Tomlinson (1990), for example, recorded responses to pure tones and harmonic complex tones (with and without fundamental) in A1 of awake macaques, which had previously been trained to perform a missing-f0 pitch discrimination task. However, all neurons essentially responded to the spectral content of the stimulus within their respective excitatory and inhibitory receptive fields, rather than to the missing fundamental. Fishman et al. (1998) came to similar conclusions based on multi-unit activity (MUA) recorded in a different macaque species. In a study of A1 in marmoset monkeys, Kadia and Wang (2003) discovered that about 20% of their units showed multi-peaked frequency responses to pure tones, with peaks often occurring at “harmonically-related” frequencies (which they defined as related by simple integer ratios). Whilst one can imagine that such neurons might be very useful for determining the f0 of a complex sound via some form of spectral pattern matching (cf. section 2.4), they clearly do not constitute an explicit representation of pitch by themselves. To date, the strongest evidence for the existence of an explicit rate-place code for pitch is based on responses recorded in the core region of auditory cortex in marmoset monkeys (Bendor and Wang, 2005). A subset of neurons near the boundary between A1 and primary-like field R were found to be bandpass-tuned not only to pure tones but also to the missing fundamental of harmonic complex tones with partial frequencies entirely outside the neurons’ pure-tone response range (see Figure 2.19 A & B). The neurons also fulfilled several other “necessary conditions” for an explicit representation. They responded to IRN and click-train stimuli with time delays

and click intervals matched to their preferred frequency. Furthermore, response firing rates were higher for complex tones with low harmonics (IRN with many iterations; regular click trains) than for high-harmonic complex tones (IRN with few iterations; highly jittered click trains), in agreement with expected psychophysical changes in pitch salience (Figure 2.19 C). Doubts remain, however, whether the control for harmonic distortion products was effective. Thus, the data is suggestive but not conclusive as evidence for an explicit representation of pitch in field R. Another reason for concern is the fact that characteristic frequencies of pitch-selective neurons did not exceed 800 Hz, indicating that an additional representation is necessary to account for the perception of pitch above this range.

17 Note that the existence of such an explicit representation is by no means a necessity, and a priori no more likely than an “implicit” representation, whereby pitch is encoded by the joint activity of neurons in a widely distributed network, not a single one of which exhibits any obvious kind of tuning to experimenter-defined stimulus parameters (see e.g. Bizley et al., 2009).

Figure 2.19: Neural responses to different pitch-evoking sounds in marmoset field R. A: Tuning of a single neuron to pure-tone frequency and missing f0. B: Firing rate of a single neuron during presentation of missing-f0 complex tones. Harmonic numbers range from 1–3 (top) to 12–14 (bottom). Dotted line indicates +2 SEM from spontaneous rate. C: Population-averaged, normalised discharge rates for click trains (left), IRN (middle) and complex-tone stimuli (right). Stimulus parameters (f0 or ∆t) were chosen for each cell to match its CF. Adapted from Bendor and Wang (2005).

Regardless of the exact semantics of the representation of pitch, one can ask whether pitch-related processing in cortex is spatially confined to specific brain regions at all. A number of studies have attempted to answer this question using functional magnetic resonance imaging (fMRI). fMRI provides a measure of the blood-oxygenation-level-dependent (BOLD) magnetic spin density in the tissue and is thought to provide a reasonable proxy for local metabolic demand (Logothetis, 2003). Its spatial resolution is on the order of millimetres but its temporal resolution is low — on the order of several seconds — due to the slow reaction time of the brain vasculature to changes in oxygen demand. A locally confined “pitch processor” might be expected to cause a stronger BOLD signal when listening to a pitch-evoking sound than in the presence of a non-pitched sound. Patterson et al. (2002) contrasted fMRI responses to band-pass-filtered iterated rippled noise (IRN) and white noise, the former evoking a pitch percept despite the absence of prominent, synchronised envelope modulations and sharp rate-place peaks in the AN response. A selective increase in the fMRI signal for the pitched sounds over noise was found bilaterally in lateral HG, most likely outside A1, but still within the core region (homologous to fields R and RT in primates). Penagos et al. (2004) provided additional support for the role of lateral HG as a pitch centre by showing that the strength of the fMRI signal in response to missing-fundamental sounds covaried with their expected pitch strength (but not with fundamental frequency or spectral band-pass region). Experiments by Hall and Plack (2009), however, question the generality of a pitch-related activity increase in HG, using a pure tone, different harmonic complex tones and a binaural Huggins-pitch stimulus in addition to IRN. They showed that only IRN evoked strong responses in HG, while also activating other regions, including the planum temporale (PT). The other stimuli did not reliably activate HG, but like IRN, they also activated the PT (which Patterson et al. (2002) had identified as a region sensitive to time-varying as opposed to static pitch). Activation of HG was also not found to covary with differences in pitch discriminability, measured in the subjects during the experiment. In conflict with some of the findings by Hall and Plack (2009), Puschmann et al. (2010) recently demonstrated the activation of HG in response to two types of binaural pitch stimuli, including Huggins pitch. For the time being, it seems, the debate surrounding the roles of HG and the PT in pitch perception — as measurable in fMRI experiments — remains unresolved.

Other studies have investigated the cortical substrates of pitch using magnetoencephalography (MEG), measuring changes in the magnetic field above the skull surface effected by the flow of electrical charges during neural activity18. Pantev et al. (1989) recorded MEG responses to pure tones and harmonic complex tones in humans and found that pure tones of a certain frequency would cause activation in the same region as harmonic complex tones with matched f0. They concluded that the known tonotopy of auditory cortex is in fact a periodotopy. Langner et al. (1997) took similar measurements but, in conflict with the previous results, reported a periodotopic gradient orthogonal to the tonotopic axis, i.e. similar to the orthogonal layout of frequency and periodicity previously found in the IC (Schreiner and Langner, 1988; see section 2.3.1). Both studies, however, suffer from different potential confounds due to peripheral distortion products (McAlpine, 2004; see also Walker et al., 2011 for a more thorough discussion), in addition to the very limited range of stimuli used — leaving us once again in the dark with regards to the cortical representation of pitch.

18 MEG provides much better temporal resolution than fMRI (milliseconds), but the locations of the current-generating sources underlying the MEG response are not uniquely determined by the measurements, making definitive source localisation difficult.

To summarise, we know very little about the cortical substrate of pitch perception in spite of long-lasting and intense experimental efforts. Furthermore, no matter what the ultimate representation may be, there is virtually no empirically-backed indication of the types of processing steps and computations in cortex that might give rise to our percept. We have some reason to believe that neurons in the lateral Heschl’s gyrus may play an important role in these computations, possibly in addition to a wider range of non-primary areas located in the temporal lobe, including for example the planum temporale. This may give us a vague sense of where to place pitch perception within the hierarchy of auditory computational tasks — ranging from the mere detection of a sound at the one end to complex tasks such as auditory scene analysis, object recognition and semantic processing at the other (e.g. Nelken, 2004). The putative involvement of “higher” cortical areas suggests, if anything, that pitch should perhaps be thought of as the outcome of a rather high-level computation, reflecting not only instantaneous sensory evidence but also subjective prior knowledge and top-down expectations. At the level of the brainstem and midbrain, we have a better understanding of the relationship between stimulus and evoked neural activity than we do in cortex. However, it appears that these are preliminary processing steps, operating on different periodicity-related stimulus features in parallel, and that these features are only combined into a unified percept further upstream. In terms of their value in constraining models of pitch perception, neural response properties in the early auditory pathway may be suggestive of the kind of “interim results” we may want to arrive at at some point during our computation.

By themselves, however, they do not tell us much about the computations that build upon them.

2.4 Theories and models of pitch perception

As we have seen in section 2.1, pitch is a percept that can be evoked by a large variety of different sounds, from elementary sinusoidal vibrations, through complex-shaped periodic vibrations, all the way to highly irregular, aperiodic sounds. In this section, we will explore whether, or to what degree, we can bring order to and explain these diverse findings in terms of sound properties and our current understanding of the physiological processes involved in the transformation of the physical signal in the peripheral auditory system. What is the uniting factor amongst all those various sounds that evoke the same pitch? As we will see, a satisfactory answer to this question is currently not at hand. At the same time, however, we will discover a number of promising leads, each of which can account for a substantial fraction of the observed phenomena. We will then attempt to formulate a model that is capable of incorporating these various leads in a concise, principled fashion over the course of the remaining chapters.

We are exposed to a great variety of approximately periodic sounds in our environment, and their periodicity often carries behaviourally relevant information. Furthermore, the majority of periodic sounds evoke a pitch equal to their period. It therefore seems only reasonable to assume that pitch perception reflects a process of periodicity estimation, developed to extract this information. One way to approach the problem of modelling pitch might therefore be to find a reliable method of periodicity estimation and test whether its predictions also hold for the special cases of periodic sounds that do not evoke a pitch equal to their period (or perhaps none at all), and for those non-periodic sounds that nevertheless evoke a pitch, as discussed above.

A seemingly straightforward method for determining the period of a perfectly periodic sound $x(t)$ with $x(t) = x(t + \Omega)\ \forall t$ is to determine its autocorrelation function

$$R(\tau) = \lim_{T\to\infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t) \cdot x(t - \tau)\, dt \,, \qquad (2.4)$$

which gives a measure of the temporal self-similarity of the signal with itself at all possible time-lags $\tau$ (assuming that we can somehow determine the integral in the expression above). One of the fundamental properties of $R$ is that it takes its maximum at $\tau = 0$ (e.g. Hartmann, 1997), where its value equals the signal power $\frac{1}{T}\int_{-T/2}^{T/2} x(t)^2\, dt$. Since our sound is periodic, i.e. $x(t) = x(t + \Omega)$, we also know that $R(\Omega), R(2\Omega), \ldots$ must take the same, maximum value. All we need to do to determine $\Omega$ in this idealised case is to find the smallest non-zero value of $\tau$ at which $R(\tau) = R(0)$.

Alternatively, we could compute the Fourier transform $\hat{x}(f)$ of $x(t)$:

$$\hat{x}(f) = \int_{-\infty}^{\infty} x(t) \cdot \exp(-2\pi i f t)\, dt \,. \qquad (2.5)$$

According to Fourier’s theorem, the Fourier amplitudes $|\hat{x}(f)|$ can only be non-zero at frequencies $f_k = \frac{k}{\Omega}$, $k = 1, \ldots, \infty$, i.e. integer multiples of the inverse sound period. Therefore, in order to find $\Omega$ we have to determine the greatest common divisor of all non-zero Fourier components. In reality, of course, our signals are only finite in duration, and sampled at discrete points in time in the case of digital systems. We may still compute discretised variants of equations (2.4) and (2.5) and compute $R$ and $\hat{x}$ numerically for any given signal. However, as soon as the signal is no longer perfectly periodic, $R$ does not necessarily peak equally and maximally at $\tau = \Omega$ and its multiples, and the non-zero Fourier components of $|\hat{x}|$ may no longer have a greatest common divisor (greater than the sampling interval) at all. Nevertheless, heuristics can be used to determine $\Omega$ based on either the autocorrelation function or the Fourier amplitude- or power-spectrum (see e.g. de Cheveigné, 2005). For autocorrelation-based estimates of $\Omega$, we need a rule to pick one of possibly many local peaks in $R(\tau)$ of approximately equal height. For spectral-based estimates of $\Omega$, an appropriate metric is required that allows us to determine the fundamental frequency of the harmonic frequency stack that best fits the observed spectrum — a process commonly referred to as “spectral pattern matching” or “template matching”.
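To make these heuristics concrete, the sketch below (ours, purely illustrative: the function names, the 10% component threshold and the crude peak- and spacing-rules are our assumptions, not part of any published model) estimates the period of a discretely sampled signal both from the discretised autocorrelation function and from the spacing of its spectral components:

```python
import numpy as np

def period_from_acf(x, fs, min_lag_s=0.001):
    """Estimate the period as the non-zero lag with maximal autocorrelation,
    a crude stand-in for the idealised rule R(tau) = R(0)."""
    n = len(x)
    spec = np.fft.rfft(x, n=2 * n)                       # zero-padded FFT
    acf = np.fft.irfft(spec * np.conj(spec))[:n] / n     # discrete eq. (2.4)
    min_lag = int(min_lag_s * fs)                        # skip the tau = 0 peak
    return (min_lag + np.argmax(acf[min_lag:])) / fs

def f0_from_spectrum(x, fs, thresh=0.1):
    """Estimate f0 as an approximate greatest common divisor of the prominent
    component frequencies, here crudely via their smallest adjacent spacing."""
    amps = np.abs(np.fft.rfft(x))                        # discrete eq. (2.5)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    comps = freqs[amps > thresh * amps.max()]
    comps = comps[comps > 0]
    return np.min(np.diff(comps)) if len(comps) > 1 else float(comps[0])

fs = 16000
t = np.arange(0, 0.1, 1.0 / fs)
x = sum(np.sin(2 * np.pi * k * 250 * t) for k in range(4, 16))  # missing-f0 HCT
print(1.0 / period_from_acf(x, fs), f0_from_spectrum(x, fs))    # both 250.0 Hz
```

Both heuristics succeed here only because the toy signal is perfectly periodic and noise-free; for the degraded and aperiodic signals discussed next, simple rules of this kind break down.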

While these two general approaches will reappear in many theories of pitch perception in modified form, we can also see why they are by themselves not suitable as models of pitch perception when applied directly to the unprocessed waveform of the acoustic stimulus. One point of concern is the marked degradation of the pitch of HCTs as the rank of the lowest harmonic increases above 8 to 10. It is not evident a priori why it should be much harder to match harmonics 12 to 15 against a harmonic template than it is to match harmonics 5 to 8. Of even greater concern, however, is the pitch of aperiodic sounds such as SAM noise or interrupted noise. The spectra of these sounds are flat in expectation, and the sound waveform is uncorrelated in time: an impossible task for pattern matching or autocorrelation. One might be tempted to perform periodicity analysis not on the raw waveform, but on its periodic envelope, which can be readily extracted from the signal using a variety of methods (e.g. Turner and Sahani, 2011; see also section 3.2.2). Unfortunately, this brings an entire host of new problems with it (aside from not solving the issue of high-harmonic HCTs). Firstly, the pitch of SAM tones is in general not equal to their envelope periodicity, but shifted away from it. Secondly, the envelope periodicity and modulation depth of HCTs depend greatly on their phase relationships, whereas listeners are largely insensitive to phase relationships between low-rank harmonics.

In order to explain these subtle effects, it may be necessary to take the physiological properties and constraints of the peripheral auditory system into account. Von Helmholtz (1863) developed a comprehensive theory of “physiological acoustics”, including, amongst other aspects, a theory of pitch now commonly referred to as the “place theory”. Its central tenet is that the pitch of a sound is determined by the place of maximum excitation of the basilar membrane in the inner ear, which Helmholtz thought of as a mechanical Fourier analyser (cf. section 2.2.3.3). Helmholtz essentially adopted an earlier theory by Ohm (1843), known as Ohm’s acoustic law. Ohm had postulated that in order to hear a pitch, the Fourier spectrum must contain a component at this frequency, and that the pitch of a harmonic complex tone is determined by the lowest of these19. An intense debate ensued between Ohm and Seebeck (e.g. Seebeck, 1843; see also Turner, 1977 for a detailed historic account), who in turn had found contradictory evidence that sounds could evoke a pitch equal to their periodicity rate, even when the corresponding Fourier component was missing or extremely weak20 (see section 2.1.1.2). Helmholtz, in embedding Ohm’s acoustic law into a physiological context, provided what seemed like a resolution to this debate (Helmholtz, 1856): he realised that non-linearities in the peripheral transduction process (cf. section 2.2.3.3) can introduce harmonic distortion products (DPs) at the fundamental frequency of a missing-f0 HCT, such that the effective stimulus exciting the auditory nerve (AN) fibres no longer lacks energy at the fundamental21.

19 Its key sentence, translated as literally as possible, reads as follows: “The pulses required to produce a tone of frequency $m$ must follow each other in intervals of length $\frac{1}{m}$, and in each of these intervals they must continuously contain the shape $a \sin 2\pi(mt + p)$, either purely or such that this form can at least be segregated as a real, constituent part.”

20 In other words: Seebeck had discovered the pitch of the missing fundamental.

Helmholtz’ place theory became the dominant theory of pitch perception for many decades. Experiments by Schouten (1938, 1940), however, proved it wrong. Schouten, using equipment which provided a new degree of control over the acoustic stimulus, devised a scheme for eliminating the DP at the fundamental of a missing-f0 HCT by adding a pure tone at f0, carefully calibrated in amplitude and phase so as to cancel the BM excitation caused by the DP itself. The pitch of the missing fundamental remained unchanged. Furthermore, he generated SAM tones as described in section 2.1.2. The pitch of a SAM tone is generally not equal to its envelope modulation frequency. A quadratic DP, however, invoked by Helmholtz as the explanation for missing-f0 pitch, is expected to occur precisely at the modulation frequency, not the perceived pitch. Hence, Schouten’s experiments effectively ruled out harmonic distortions, and thus Helmholtz’ place theory, as the sole (or dominant) mechanism in determining the pitch of a sound that lacks a tonal component at the pitch frequency. Schouten proposed a different mechanism (see Figure 2.20). Consider a peripheral bandpass filter that is centred around the carrier frequency $f_c$ of the SAM tone and wider than the modulation frequency $g$ (i.e. the spectral components are unresolved). Its output will resemble the waveform of the SAM tone itself. Action potentials in the associated AN fibre (or bundles thereof) will occur phase-locked to peaks in the filter output (cf. also section 2.3.1). Since the peaks in the temporal fine structure (TFS) of the filter output shift progressively relative to the phase of the envelope modulation, the time-span between the two highest peaks of two successive envelope periods is somewhat shorter than the envelope period itself ($\tau_1$ in Figure 2.20). In fact, it matches the perceived pitch very closely (e.g. 204 Hz for $f_c$ = 2040 Hz and $g$ = 200 Hz). The ambiguity of the percept, and the occasional matching to frequencies around 185 and 227 Hz, can similarly be explained by the timing of spikes that occur not at the two highest TFS peaks but at peaks slightly before or after (e.g. $\tau_2$ and $\tau_3$ in Figure 2.20). Thus, the pitch of SAM tones according to Schouten’s explanation is based on the analysis of peak times in the output of AN fibres that are stimulated by unresolved harmonics (for example by computing a histogram of inter-spike intervals). As such, it cannot constitute a general theory of pitch by itself: missing-f0 pitch exists, and is strongest, for resolved HCTs (Shackleton and Carlyon, 1994, cf. section 2.1.1.2), even though the peripheral channel outputs are effectively unmodulated pure tones of different frequencies in this case. The same argument holds for the pitch shift of resolved SAM tones (or shifted HCTs with a greater number of components, e.g. de Boer, 1956b; Patterson, 1973, 1976).

21 Helmholtz speculated that the source of the non-linearity was the asymmetry of the middle-ear ossicular chain. He was wrong, in that the active mechanical response of the basilar membrane has now been identified as the predominant source of non-linearity in the transduction process. Nevertheless, his observation regarding the generation of distortion products remains valid in principle.


Figure 2.20: Sinusoidally amplitude-modulated tone with $f_c$ = 2040 Hz and $g$ = 200 Hz (blue; grey line indicates the 200 Hz envelope). The timing of the peaks in the fine structure underneath the envelope shifts in phase from one envelope peak to the next. $\tau_1 \approx 4.9$ ms indicates the time-span between the two largest fine-structure peaks within two consecutive peaks of the envelope, corresponding closely to the dominant pitch of 204 Hz. $\tau_2$ and $\tau_3$ match the alternative pitches of approximately 185 and 227 Hz. After Schouten et al. (1962).

Modern theories of pitch perception can be broadly divided into those that are based on refinements of Helmholtz’ place theory, and those that are based on a temporal analysis of peripheral filter outputs similar to Schouten’s idea (see Figure 2.21). The former class of “pattern matching” theories extends Helmholtz’s in that pitch is derived from the joint pattern of excitation along the entire BM (or AN), rather than just its peak. The latter class of “temporal” models extends Schouten’s idea by combining the outcome of the channel-by-channel periodicity analysis into a joint estimate across channels. Periodicity analysis in these models is often based on computing the autocorrelation function of each channel (see above). We will discuss several of these models along with their respective merits and shortcomings in the following.

2.4.1 Examples of spectral pattern-matching models

Pattern matching models work on the assumption that the pitch of a sound is determined by comparing the peripheral representation of the stimulus spectrum against harmonic stacks with different fundamental frequencies until the best match is found.


Figure 2.21: Spectral and temporal cues in models of pitch. Centre: A sound (200 Hz HCT) evokes a time-varying neural firing pattern in the auditory nerve. Left: The average-rate profile along a tonotopic axis gives a coarse representation of the stimulus spectrum. Local firing-rate maxima in the profile occur in fibres with CFs close to multiples of 200 Hz. Right: Autocorrelation analysis in each peripheral channel yields a measure of temporal stimulus self-similarity. High harmonics generate envelope modulations at 200 Hz in high-frequency channels due to the widening of filters with increasing CF. Summation is a commonly-used strategy to obtain an aggregate measure across channels.

Put in a different way, they try to find an approximate greatest common divisor of the observed spectral components above the lower limit of pitch perception (> 30–40 Hz). Initially suggested by de Boer (1956b), three influential models of this kind were published in close succession by Goldstein (1973), Wightman (1973) and Terhardt (1974, 1979), and more related work has been published since: Duifhuis et al. (1982) proposed a model in the spirit of Goldstein that contains (in the authors’ words) “elements that are virtually identical” to elements in Terhardt (1979). Hermes (1988) developed a method that similarly resembles that of Terhardt. Cohen et al. (1995) provided a neural-network implementation of spectral template matching, which Grossberg et al. (2004) incorporated into a hierarchical model of auditory scene analysis. All these models are fundamentally limited in their scope by the assumption that pitch is based on aurally resolved spectral peaks: as such, they cannot explain the pitch of unresolved HCTs and SAM tones, or that of spectrally white sounds such as SAM noise. Several points regarding these limitations deserve mention at this point.

Firstly, Wiegrebe and Patterson (1999) have shown that the amplitude modulations in band-limited SAM noise create a distortion product on the basilar membrane at the modulation frequency (see also Strickland and Viemeister, 1997). Furthermore, when the distortions were cancelled by adding a pure tone in anti-phase (as introduced by Schouten (1938); cf. section 2.4 above), listeners could no longer discriminate between different modulation rates. This indicates that the pitch of narrow-band SAM noise may indeed critically depend on peripheral distortions, and thus spectral models could be sensitive to it provided that they are based on peripheral front-ends with sufficient physiological realism. However, Wiegrebe and Patterson (1999) also found that AM-rate discrimination of SAM noise was impaired but still possible despite the cancellation tone, when the noise bandwidth was wider than twice the modulation frequency. Thus, the pitch of SAM noise in general does not appear to be explicable in terms of distortion products alone22.

Secondly, with regard to the limitations of spectral pattern-matching models in explaining the pitch of unresolved HCTs, it is perfectly possible in principle to extract spectral information about unresolved components of the stimulus from a spectral analysis of the response of high-frequency AN fibres (limited of course to their respective response bandwidths). Thus, a spectral-based, central pitch processor performing pattern matching on this “peripherally-derived” spectrum could also take unresolved harmonics into account during periodicity estimation. However, this possibility is typically not exploited in pattern-matching models. A notable exception is the central spectrum model by Srulovicz and Goldstein (1983), which implements precisely this idea, generating an internal representation of the stimulus spectrum from spike-interval histograms in the peripheral channels. However, to the best of our knowledge, no attempt was subsequently made to combine this potentially powerful spectral representation with a corresponding, pattern-matching-based periodicity estimator.

22 Note also that the original experiments by Burns and Viemeister (1976, 1981) were performed using both wide-band SAM noise and band-limited SAM noise with added band-reject masking noise.

2.4.1.1 Wightman’s pattern transformation model

In terms of its categorisation, Wightman’s pattern transformation model of pitch (Wightman, 1973) is an interesting case: pitch predictions are based on the profile of average firing rates of auditory nerve fibres ordered by their CFs, and the model is hence legitimately classified as an example of spectral pattern-matching. On the other hand, what Wightman tries to achieve with his model is to compute an approximate stimulus autocorrelation function and estimate the stimulus periodicity by finding its largest peak at non-zero lag (cf. equation (2.4)). Wightman himself therefore describes the model as “essentially an autocorrelation model” (ibid.). The simple, appealing logic behind the model is the following: according to the Wiener–Khinchin theorem, the autocorrelation function $R(\tau)$ is related to the Fourier power spectrum $|\hat{x}(f)|^2$ via the Fourier transform (Hartmann, 1997). Since the central auditory system has access to a degraded but nevertheless recognisable representation of the stimulus spectrum in the form of time-averaged AN firing rates, it might attempt to apply the Fourier transform to this surrogate spectrum to obtain a surrogate autocorrelation function and perform periodicity estimation based on the latter.

In our implementation of Wightman’s model for use in chapter 5 (cf. section 5.4.3), we emulate the peripheral transduction process as a linear gammatone filter bank followed by half-wave rectification and low-pass filtering in each channel. A more detailed description of this peripheral front-end is given in section 3.2. For a given acoustic stimulus, we perform the following steps to obtain a prediction for its pitch:

1. Compute the average firing rate $a_i$ of each peripheral channel with CF $f_i$ ($i = 1 \ldots C$).

2. Interpolate between the ERB-spaced CFs $f_i$ using cubic Hermite splines to obtain a firing-rate profile $a(f)$ that is linearly sampled in frequency.

3. Compute the discrete inverse Fourier transform of $a(f)$ to obtain the surrogate autocorrelation function $\tilde{R}(\tau)$.

4. Choose the highest peak of $\tilde{R}(\tau)$ (above 1 ms) as the pitch period.

Figure 2.22: Missing-f0 pitch in Wightman’s pattern transformation model. Top: Power spectrum of a 250 Hz missing-f0 HCT with harmonics 4 to 15 in a background of low-intensity white noise. Middle: Average firing rate of auditory nerve fibres as a function of their CF. Crosses indicate CFs in the model; the curve is a cubic Hermite interpolation. Bottom: Inverse Fourier transform of the firing-rate profile. Note the peak at τ = 4 ms, corresponding to the f0 of 250 Hz.

Figure 2.22 demonstrates that the model is capable of finding the missing fundamental of a resolved HCT. As harmonics above approximately the 10th do not produce discernible peaks in $a(f)$, the model is fundamentally limited to pitches evoked by resolved frequency components.
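A compact sketch of steps 2 to 4 follows (ours, for illustration only). The toy rate profile stands in for the peripheral front-end of step 1, and all parameter choices, including the 1 Hz resampling grid and the bump widths, are our assumptions:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def wightman_pitch_period(cfs, rates, min_period_s=0.001):
    # Step 2: resample the ERB-spaced rate-place profile on a linear
    # frequency grid (1 Hz spacing) with monotone cubic (PCHIP) interpolation.
    df = 1.0
    f_lin = np.arange(0.0, cfs[-1] + df, df)
    a_lin = np.zeros_like(f_lin)
    inside = (f_lin >= cfs[0]) & (f_lin <= cfs[-1])
    a_lin[inside] = PchipInterpolator(cfs, rates)(f_lin[inside])
    # Step 3: inverse Fourier transform gives the surrogate ACF; with a
    # frequency spacing df, lag k corresponds to tau = k / (M * df) seconds.
    acf = np.fft.irfft(a_lin)
    M = len(acf)
    dt = 1.0 / (M * df)
    # Step 4: pick the highest peak above the minimum period (1 ms).
    k0 = int(np.ceil(min_period_s / dt))
    k = k0 + np.argmax(acf[k0:M // 2])
    return k * dt

# Toy rate profile with bumps at harmonics 4-15 of 250 Hz (step 1 stand-in).
cfs = np.geomspace(100.0, 8000.0, 200)
rates = sum(np.exp(-0.5 * ((cfs - k * 250.0) / (0.03 * k * 250.0)) ** 2)
            for k in range(4, 16))
print(1.0 / wightman_pitch_period(cfs, rates))   # close to 250 Hz
```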

2.4.1.2 Terhardt’s theory of virtual pitch

Terhardt (1974, 1979) proposed a different approach to pattern matching. In his model of virtual pitch, a spectral analyser determines the frequency and amplitude of significant local peaks in the Fourier power spectrum of the stimulus. In the next step, the perceptually-relevant component amplitudes (and to a small degree even the frequencies) are determined based on an overall spectral gain function, masking effects between nearby spectral peaks as well as the local level of background noise. In the final stage, the adjusted and thresholded spectral components enter a process of “subharmonic summation”23. Each component adds to the evidence for the presence of a harmonic stack with fundamental frequency equal to either the component frequency itself, or its integer submultiples (or subharmonics) up to some maximum rank (e.g. 12). A virtual pitch profile is obtained, the peaks of which are considered possible pitch matches to the stimulus, in order of their magnitude. Subharmonic summation had been used for f0 estimation already prior to Terhardt (e.g. Schroeder, 1968; see also Hermes, 1988 for a subsequent example), but not in combination with a front-end that modifies the raw stimulus spectrum in order to take masking and resolvability into account. Terhardt distinguishes the candidate virtual pitches from “spectral pitches”, which simply correspond to individual, audible frequency components, thus differentiating between synthetic and analytic listening modes (cf. section 2.1.1.2). Terhardt further posited that the association of component frequencies and their subharmonics be learnt through the extensive exposure to (near-harmonic) speech sounds during development. Shamma and Klein (2000) proposed an alternative mechanism, by which harmonic templates emerge from cross-correlated firing of AN fibres with harmonically-related CFs even when stimulated with noise or other broad-band stimuli such as clicks. In any case, the origin of the knowledge about harmonic relationships is largely irrelevant when using Terhardt’s model to predict pitch once the learning phase is completed. Terhardt et al. (1982) published a detailed algorithmic description of the model, where the outcome of the learning phase, i.e. the knowledge about harmonic frequency ratios, is fully hard-coded into the algorithm.
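The voting scheme at the heart of the model, subharmonic summation, is easy to state in code. The sketch below is ours and deliberately stripped down: it omits the entire front-end (peak extraction, masking, spectral weighting), and the rank weighting $n^{-1/2}$ is an illustrative choice rather than Terhardt’s published rule:

```python
import numpy as np

def subharmonic_summation(comp_freqs, comp_weights, cand_f0s, max_rank=12):
    """Each audible component votes for every candidate f0 of which it is
    (approximately) an integer multiple, up to `max_rank`."""
    profile = np.zeros_like(cand_f0s)
    for f, w in zip(comp_freqs, comp_weights):
        for n in range(1, max_rank + 1):
            # Gaussian tolerance (2 Hz) around the exact subharmonic f / n;
            # votes are down-weighted with rank (our illustrative choice).
            profile += (w / np.sqrt(n)) * np.exp(-0.5 * ((cand_f0s - f / n) / 2.0) ** 2)
    return profile

cand = np.arange(30.0, 1000.0, 1.0)
comps = [1000.0, 1250.0, 1500.0]           # harmonics 4-6 of a missing 250 Hz f0
profile = subharmonic_summation(comps, [1.0, 1.0, 1.0], cand)
print(cand[np.argmax(profile)])             # 250.0
```

The down-weighting matters: with equal votes per rank, the candidate an octave below (125 Hz, hit by ranks 8, 10 and 12) would collect exactly as many coincidences as 250 Hz in this example.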

In our implementation of the virtual pitch model (cf. section 5.4.2), we precisely followed the specifications of Terhardt et al. (1982) with two small modifications. Firstly, we did not take into account interaction effects between spectral components that modify their effective frequencies by small amounts. According to the authors, these can be safely ignored in many cases, except for very specific, minute pitch-shift phenomena. Secondly, we convert the final set of candidate virtual pitches from a collection of discrete pairs of frequencies and magnitudes (i.e. a sum of delta-functions) into a continuous virtual pitch profile by applying a narrow (2 Hz) Gaussian smoothing filter. This was found to be useful to avoid multiple virtual pitch candidates at near-identical frequencies in the case of noisy stimuli (a similar effect could have been achieved by a discrete binning of virtual pitches).

23 We have borrowed this term from a related, subsequent model by Hermes (1988), not from Terhardt himself.

Figure 2.23: Missing-f0 pitch in Terhardt’s virtual pitch theory. Top: Power spectrum (up to 5 kHz) of a missing-f0 HCT with harmonics 4–15 (dark red). Steep local peaks are extracted (red circles) and their effective amplitudes adjusted according to a spectral weighting, masking between nearby frequencies and local masking due to noise (green circles). Components with effective amplitudes above 0 dB (green crosses) are entered into a process of subharmonic summation. Bottom: Extracted virtual pitch candidates (light blue) and smoothed virtual pitch profile (dark blue). The model correctly identifies the missing fundamental of 250 Hz.

2.4.1.3 Goldstein’s optimum processor model

Goldstein developed a model to explain the pitch-matching behaviour for two-component complex tones reported by Houtsma and Goldstein (1972) (cf. section 2.1.2). Out of the three pattern-matching models discussed in this section, Goldstein’s “optimum processor” model is the most abstract in its treatment of the peripheral auditory system. It is assumed that a peripheral front-end simply returns the frequencies of the two stimulus components. Therefore, not only is information about component phases lost (as in all models based on the stimulus power spectrum), but also their respective amplitudes. Given these highly reduced inputs, however, Goldstein develops a statistical framework whereby the listener assumes that the two observed spectral components are two successive harmonics of the same, missing f0 — as was the case in the experiments by Houtsma and Goldstein (1972) — and forms an optimal estimate of its frequency, exploiting statistical knowledge about the imperfections of the peripheral spectral pre-processor.

In particular, Goldstein assumed that the two frequencies $f_1$ and $f_2$ available to the central pitch processor are independent, noisy samples drawn from two Gaussian distributions with means equal to the true, harmonic stimulus frequencies $n_1 f_0$ and $n_2 f_0$ and frequency-dependent variances $\sigma^2(n_i f_0)$:

$$P(f_i \mid n_i, f_0) = \mathcal{N}\!\left(f_i \mid n_i f_0,\, \sigma^2(n_i f_0)\right), \quad i = 1, 2 \qquad (2.6)$$

$$= \frac{1}{\sqrt{2\pi\,\sigma^2(n_i f_0)}} \exp\!\left(-\frac{(f_i - n_i f_0)^2}{2\,\sigma^2(n_i f_0)}\right). \qquad (2.7)$$

The rank of the lowest component $n_1$ is unknown, but it is assumed that $n_2 = n_1 + 1$ (or, when extended to cases with more than two components, $n_i = n_1 + i - 1$). Furthermore, the central processor has full knowledge about the frequency dependence of the noise variance $\sigma^2(f)$. According to Goldstein’s theory, pitch reflects the maximum likelihood estimate (e.g. MacKay, 2003) of $f_0$, i.e. the central processor determines those values of the two unknown variables $f_0$ and $n_1$ that make the observations $f_1$ and $f_2$ seem most likely24:

$$(f_0^*, n_1^*) = \operatorname*{argmax}_{(f_0,\, n_1)} \prod_i P(f_i \mid n_i, f_0)\,. \qquad (2.8)$$

Note that Goldstein’s model, for all its lack of physiological detail, is not a purely signal-based pitch-determination algorithm: the frequency-dependent noise variance of the spectral preprocessor reflects an inherent property of the peripheral auditory system.
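The estimate in equation (2.8) can be computed by brute force over a grid of $(f_0, n_1)$ pairs. The sketch below is ours; in particular, the noise function $\sigma(f) = c\,f$ is an illustrative assumption standing in for the psychophysically derived $\sigma(f)$ of Goldstein’s paper:

```python
import numpy as np

def goldstein_f0(f_obs, c=0.01, f0_grid=None, max_rank=20):
    """Grid-search maximum-likelihood estimate of (f0, n1), cf. eq. (2.8).
    Components are assumed to be consecutive harmonics n1, n1 + 1, ..."""
    f_obs = np.asarray(f_obs, dtype=float)
    if f0_grid is None:
        f0_grid = np.arange(50.0, 500.0, 0.25)
    best = (-np.inf, None, None)
    for n1 in range(1, max_rank):
        ranks = n1 + np.arange(len(f_obs))       # n_i = n_1 + i - 1
        means = np.outer(f0_grid, ranks)         # n_i * f0 for every candidate
        sigma = c * means                        # our toy sigma(n_i f0)
        # Gaussian log-likelihood of eqs. (2.6)-(2.7), summed over components.
        ll = -0.5 * np.sum(((f_obs - means) / sigma) ** 2
                           + np.log(2 * np.pi * sigma ** 2), axis=1)
        k = int(np.argmax(ll))
        if ll[k] > best[0]:
            best = (ll[k], f0_grid[k], n1)
    return best[1], best[2]

print(goldstein_f0([1800.0, 2000.0]))            # (200.0, 9): harmonics 9 and 10
```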

The restrictive assumptions regarding the input representation limit the scope of Goldstein’s model rather severely. It is not readily applicable to stimuli with noisy and continuous spectra, and where applicable its predictions do not generalise to situations where amplitude or phase differences between different spectral components become important in determining the pitch (see e.g. chapter 5). Nevertheless, the general framework of statistical estimation that underlies Goldstein’s model can be extended to more complex scenarios. Our own model, which we will develop in chapter 3, can be seen as one possible such extension. In particular, we will replace Goldstein’s highly abstract spectral preprocessor with a more realistic (albeit still schematic) auditory-nerve model. This will allow our model to process sounds with arbitrary spectra, exploiting whatever information regarding the stimulus periodicity is preserved in the AN response, be it harmonic peaks in the rate-place profile or its more fine-grained temporal properties.

24 Note that the inferred rank $n_1^*$ is irrelevant.

De Boer (1977) noted an interesting relationship between the models of Wightman, Terhardt and Goldstein: he showed that Goldstein’s model is approximately equivalent to Terhardt’s in the limit $\sigma^2(f) \to 0$, i.e. in the zero-noise case of Goldstein’s spectral preprocessor. Wightman’s model, conversely, can be approximately obtained from Goldstein’s when $\sigma^2(f)$ is assumed to grow large. As a result, the three models are sometimes regarded as near-equivalent. As we will see in chapter 5 (section 5.4), the behaviour of Terhardt’s and Wightman’s models can nevertheless be markedly different, despite their constituting different limiting cases of the same underlying model.

2.4.2 Temporal models and summary autocorrelation

Licklider (1951) proposed the “duplex theory of pitch perception”, the direct ancestor of many modern, timing-based pitch theories. Licklider thought of pitch as a percept along two intrinsic dimensions: chroma and height (cf. section 2.1). He hypothesised that the mechanism for determining pitch along these two dimensions was similarly “duplex” in nature. Pitch height, according to his theory, is determined by the place of maximum excitation along the BM, while chroma is determined by a time-domain analysis of the peripheral frequency-channel outputs. Modern descendants of his theory typically do not make a categorical distinction between pitch height and chroma, and regard the temporal component of Licklider’s duplex model as a way to determine pitch along a single dimension that implicitly determines both these attributes. For this temporal component, Licklider imagined that the auditory system continuously computes a running, time-windowed autocorrelation function (ACF) $h_i(t, \tau)$ in each frequency channel $i$, where $t$ denotes the time of evaluation, and $\tau$ the lag at which the ACF is evaluated. If we assume that the time-window over which the peripheral

channel output $a_i(t)$ is integrated in the computation of the ACF is exponentially shaped with time constant $\lambda$, we can write $h_i$ as

$$h_i(t, \tau) = \int_0^t a_i(t') \cdot a_i(t' - \tau) \cdot \exp\!\left((t' - t)/\lambda\right) dt' \,, \qquad (2.9)$$

i.e. as a leaky integration of the product $a_i(t')\, a_i(t' - \tau)$. This functional form seemed particularly appealing to Licklider, as he considered it to map directly onto a possible neural substrate. For every value of $\tau$ in each channel $i$, he speculated that a neuron received two copies of $a_i$ as its input, one of them delayed by $\tau$ with respect to the other by means of a neural delay line. The recipient neuron fires when spikes in its two inputs coincide, thereby effectively calculating their point-wise product. A further neuron integrates the output of the coincidence detector, but its excitation “dissipates itself spontaneously, perhaps at a rate proportional to the amount accumulated” (Licklider, 1951), giving rise to a time-varying membrane potential exactly of the form of equation (2.9). If one imagines these neurons to be arranged on a two-dimensional lattice (extending in dimensions of peripheral CF and $\tau$), a HCT would give rise to a ridge along the CF-dimension at the value of $\tau$ that corresponds to the inverse of its f0. For fibres stimulated by a single, resolved harmonic $n f_0$, $h_i$ peaks at all integer multiples of $\frac{1}{n f_0}$, including $\frac{1}{f_0}$ itself. For high-frequency fibres stimulated by a number of unresolved harmonics, the AN activity $a_i$ follows essentially the envelope of the BM vibration (cf. sections 2.2.3.2 and 3.2.2), which is periodic at rate $f_0$ and therefore causes a peak in $h_i$ at $\tau = \frac{1}{f_0}$. Critically, this autocorrelation ridge is preserved in high-CF fibres independent of the presence of low-rank harmonics, providing a basis for unresolved missing-f0 pitch that is missing from virtually all spectral models. Such a ridge is similarly generated by non-periodic stimuli such as SAM noise, which also evokes periodic envelope modulations in high-frequency fibres (see Figure 2.24).
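In discrete time, the leaky integral of equation (2.9) becomes a per-sample recursion, which makes Licklider’s proposal easy to simulate. The sketch below is ours; the half-wave rectified sinusoid is a toy stand-in for a real peripheral channel output:

```python
import numpy as np

def running_acf(a, fs, lags, lam=0.0025):
    """Discrete-time leaky running autocorrelation (cf. eq. 2.9).
    a: one channel's output, sampled at fs; lags: integer lags in samples;
    lam: integration time constant in seconds."""
    decay = np.exp(-1.0 / (lam * fs))          # per-sample exponential leak
    h = np.zeros((len(a), len(lags)))
    for t in range(1, len(a)):
        prods = np.array([a[t] * a[t - k] if t >= k else 0.0 for k in lags])
        h[t] = decay * h[t - 1] + prods / fs   # integrate a(t') a(t' - tau)
    return h

fs = 16000
t = np.arange(0, 0.05, 1.0 / fs)
a = np.maximum(np.sin(2 * np.pi * 250 * t), 0.0)   # toy rectified channel output
lags = np.arange(1, int(0.006 * fs))               # lags up to 6 ms
h = running_acf(a, fs, lags)
print(lags[np.argmax(h[-1])] / fs)                 # 0.004 s, i.e. the 250 Hz period
```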

Figure 2.24: Licklider’s duplex theory of pitch. A: Putative delay-line mechanism underlying the autocorrelation computation. Synaptic relays at neurons Bk cause increasing delays relative to the output of neuron A. Neurons Ck act as coincidence detectors and neurons Dk perform a leaky integration of their outputs (from Licklider, 1951). B: Spatial array of neurons along dimensions of peripheral CF (“x”) and delay τ (ibid.). C: AN response (centre) to a 250 Hz SAM noise stimulus (top left: waveform, bottom left: power spectrum) and output of the autocorrelators at the end of the stimulus (right), showing peaks in high-CF autocorrelators at multiples of the modulator periodicity of 4 ms.

Licklider’s original duplex theory does not specify how the ridge in $h_i(t, \tau)$ is to be identified by a subsequent pitch processor. Slaney and Lyon (1990) presented an implementation of Licklider’s theory in which $h_i(t, \tau)$ (after an additional step of edge-sharpening) was summed across channels. Summation as a means for combining information across channels had already been proposed by van Norden (1982), with the slight difference that $h_i$ in his model was based on first-order inter-spike intervals (ISIs) rather than autocorrelation (which for a neural spike train is equivalent to computing the all-order ISI histogram; see also Cariani and Delgutte, 1996a). The summation approach was adopted and popularised by Meddis and Hewitt (1991), who coined the term “summary autocorrelation function” (SACF) for the function

$$s(t, \tau) = \sum_i h_i(t, \tau) \,. \qquad (2.10)$$

There are numerous possible ways to “read out” the SACF, and defining an appropriate strategy is not a trivial task. One difficulty arises from the fact that $s$ varies over time. To simplify the problem, one might decide to base the decision regarding the pitch of the sound solely on its shape at the time $T$ of stimulus offset. Needless to say, information regarding the periodicity of the stimulus is lost by disregarding the time-course of $s$. Assuming that pitch is determined by the evaluation of $s$ at a single point in time, we still need to decide which lag $\tau^*$ to report. A straightforward strategy is to pick the peak

$\tau^* = \operatorname*{argmax}_\tau s(T, \tau)$ (constrained within certain bounds in order to avoid reporting a pitch close to $\tau = 0$ ms). Peak-picking disregards potentially valuable additional cues, such as the recurrence of peaks at intervals of $\frac{1}{f_0}$ in the case of HCTs. For situations where pitch is measured by direct matching against a reference stimulus (rather than interval or melody recognition, for example), Meddis and Hewitt (1991) suggested using the squared Euclidean distance

$$D^2 = \int \left(s_t(T_t, \tau) - s_r(T_r, \tau)\right)^2 d\tau \qquad (2.11)$$

between the SACFs $s_t$ and $s_r$ of the target sound and the reference as a measure of their pitch similarity. Pitch matching is then achieved by finding the reference sound that minimises $D^2$. This method is not without pitfalls, however, as $D^2$ is highly sensitive not only to variations in the shape of $s$ due to differences in pitch, but also to those effected by differences in level or spectral envelope between the two sounds.
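Given the channel functions $h_i(T, \tau)$ at stimulus offset, the read-out described above takes only a few lines. This sketch is ours: `channel_acfs` stands for whatever front-end produced the $h_i$ (for instance the recursion sketched earlier), the lag bounds are illustrative, and the level- and timbre-sensitivity of $D^2$ just noted carries over unchanged to this discrete version:

```python
import numpy as np

def sacf(channel_acfs):
    """Summary autocorrelation (eq. 2.10): sum over channels."""
    return np.sum(channel_acfs, axis=0)

def pick_pitch_period(s, fs, min_lag_s=0.001, max_lag_s=0.033):
    """tau* = argmax of s within bounds that exclude the trivial peak near
    tau = 0 and lags beyond the lower limit of pitch (illustrative bounds)."""
    lo, hi = int(min_lag_s * fs), int(max_lag_s * fs)
    return (lo + np.argmax(s[lo:hi])) / fs

def sacf_distance(s_target, s_ref, dtau):
    """Discrete version of the squared distance D^2 of eq. (2.11)."""
    return np.sum((s_target - s_ref) ** 2) * dtau
```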

Despite these concerns, Meddis and Hewitt (1991) demonstrated that the SACF model can, in principle, account for a substantial range of pitch-related phenomena. We have already discussed how missing-f0 HCTs (including unresolved ones) and SAM noise can create ridges in Licklider’s 2D lattice of autocorrelators, and these ridges are preserved when summing across different fibres. In the case of SAM tones and shifted HCTs, unresolved components below the phase-locking limit in the auditory nerve give rise to multiple peaks in $s$ in close correspondence to the pitch matches measured psychophysically (e.g. Schouten et al., 1962; Patterson and Wightman, 1976; see also section 2.1.2). SACF also explains the pitch of IRN, including the dependence of the pitch of IRN−n on the number of iterations n (cf. section 2.1.2), which is not trivially explained by its power spectrum. For IRN−1, the delay-and-subtract process creates an anti-correlation between waveform amplitudes separated by the delay interval d. This anti-correlation is preserved in the temporal fine-structure of the lower-frequency AN fibres, causing a local minimum to occur in $s(T, \tau)$ at $\tau = d$. Meddis and O’Mard (1997) further showed that the SACF model is to some degree phase-sensitive. While the phases of resolved harmonics do not influence the position of the peak in $s$, phase relationships between unresolved harmonics that double the envelope modulation rate of their respective filter outputs can cause a doubling of the reported pitch (cf. Shackleton and Carlyon (1994) and section 2.1.1.2).

Based on these results alone, SACF explains a wider range of phenomena than any of the pattern-matching models discussed in the previous section. Yet, the SACF model is not without issues either. As Houtsma and Smurzynski (1990) have shown (and others such as Shackleton and Carlyon (1994) and Bernstein and Oxenham (2003) have subsequently confirmed), the pitch of missing-f0 HCTs is markedly stronger (subjectively and in terms of f0-discriminability) when low-rank harmonics are present in the stimulus. Carlyon (1998) demonstrated that no such systematic effect is found in the SACF model (in its implementation by Meddis and O’Mard (1997)). The model showed very little degradation even when all harmonics were unresolved. The authors interpreted these results as evidence for the existence of two separate pitch processors: one that performs pattern matching on the resolved spectral components of the stimulus and gives rise to a strong percept of pitch, and one based on periodicity analysis in the output of fibres stimulated by unresolved harmonics that provides a much weaker sense of pitch. We have already discussed one weakness of this argument (cf. section 2.1.1.2), namely that peripheral resolvability does not appear to be the primary cause for the weakening of the pitch percept. Furthermore, Bernstein and Oxenham (2005) presented a modification of the SACF model which addresses the criticism by Carlyon (1998) at least qualitatively. Instead of simply summing the individual channel ACFs, a channel- and lag-dependent weight matrix $W = \{w_{i,\tau}\}$ is introduced, and equation (2.10) is replaced by the weighted sum

$$\tilde{s}(t, \tau) = \sum_i w_{i,\tau}\, h_i(t, \tau) \,. \qquad (2.12)$$

Bernstein and Oxenham (2005) chose $w_{i,\tau}$ such that high-CF autocorrelators $h_i$ would contribute little or nothing to $\tilde{s}$ at long time-lags, thereby lowering or abolishing the SACF peak at the fundamental of an unresolved missing-f0 HCT.
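In code, equation (2.12) is a one-line change to the plain SACF; the substance lies in the choice of $W$. The threshold rule below is our toy stand-in, chosen only to mimic the qualitative effect just described, not the published weight matrix:

```python
import numpy as np

def weighted_sacf(channel_acfs, cfs, lags_s, K=30.0):
    """Weighted sum of eq. (2.12). Toy weight rule: channel i contributes at
    lag tau only while tau <= K / CF_i, i.e. high-CF channels are silenced
    at long lags (our illustrative choice)."""
    cfs = np.asarray(cfs)[:, None]                 # shape (channels, 1)
    W = (np.asarray(lags_s)[None, :] <= K / cfs)   # shape (channels, lags)
    return np.sum(W * channel_acfs, axis=0)
```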

Numerous other modifications have been proposed regarding details of the original SACF model by Meddis and Hewitt (1991). One recurrent issue concerns the choice of $\lambda$, the integration time constant of the autocorrelators. Licklider proposed values of $\lambda \approx 2$ ms, which would make the model insensitive to periodicity rates substantially lower than 500 Hz and allow for rapid fluctuation of the periodicity estimate. Meddis and O’Mard (1997) used a value of $\lambda = 10$ ms, which is still somewhat short of the lower limit of pitch around 30 Hz (Pressnitzer et al., 2001). Wiegrebe (2001) proposed the use of CF-dependent integration time constants $\lambda_i$, such that low-frequency autocorrelators integrate over longer time windows. Balaguer-Ballester et al. (2008) recently presented a substantial extension, whereby the short-term SACF $s$, computed with $\lambda \approx 10$ ms, is itself integrated over time. Integration is leaky, as in the autocorrelators, but with a considerably longer time constant $\lambda_l$ (e.g. hundreds of milliseconds). The outcome of this integration is the low-pass filtered SACF (LP-SACF)

$$l(t, \tau) = \int_0^t s(t', \tau) \cdot \exp\!\left((t' - t)/\lambda_l\right) dt' \,. \qquad (2.13)$$

Its benefit over the unfiltered SACF $s$ is that spurious, short-term peaks in $s$ due to random fluctuations are evened out in $l$ over longer durations, while at the same time maintaining the lower limit of period-sensitivity imposed by $\lambda$ (i.e. the time constant of the autocorrelators). In addition to changes to the “central” component of the model, refinements have also been made to the peripheral front-end that feeds into the autocorrelators. The peripheral front-end used by Balaguer-Ballester et al. (2008), for example, is based on a biophysically detailed model of inner hair cell (IHC) and AN responses (Sumner et al., 2002), including a non-linear, compressive cochlear filter bank, calcium dynamics in the IHC soma and stochastic, calcium-dependent neurotransmitter trafficking at the IHC synapse.
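Equation (2.13) has the same discrete-time form as the per-channel recursion sketched earlier, now applied to whole SACF frames (the discretisation is ours):

```python
import numpy as np

def lp_sacf(s_frames, frame_rate, lam_l=0.4):
    """Leaky temporal integration of short-term SACF frames (eq. 2.13).
    s_frames: array of shape (time, lags); frame_rate in Hz; lam_l in seconds
    (a few hundred milliseconds, as in the text)."""
    decay = np.exp(-1.0 / (lam_l * frame_rate))
    l = np.zeros_like(s_frames)
    for t in range(1, s_frames.shape[0]):
        l[t] = decay * l[t - 1] + s_frames[t] / frame_rate
    return l
```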

Licklider thought of autocorrelation as a physiological mechanism. While coincidence detection and integration are indeed two fundamental modes of neural computation, direct anatomical or physiological evidence for the type of autocorrelators required for a mechanistic interpretation of his theory has not been found. In particular, the existence of precisely-timed neural delay lines with delays up to 30 ms (required to account for the lower limit of pitch) appears doubtful. De Cheveigné and Pressnitzer (2006) recently proposed an alternative potential mechanism whereby “synthetic” delays are generated through cross-channel phase interactions. This abolishes the need for neural delay lines, at least for resolved harmonics. Positive evidence for the use of such a mechanism by the auditory system, however, is also lacking.

Various models based on the temporal analysis of peripheral filters by means other than summary autocorrelation have been developed. Patterson et al. (1992) proposed the “auditory image model” (AIM) as an alternative central stimulus representation (see also Patterson, 2000). Like Licklider’s lattice of correlators, the auditory image extends in dimensions of peripheral CF and time. Rather than tracking the short-term ACF, however, running averages of “snapshots” of the AN activity within a certain time window are computed. This running average is not computed continuously (in which case the model would simply compute the peripherally resolved spectrum), but in a “strobed” fashion: snapshots are taken at discrete points in time, triggered independently in each channel by prominent peaks in the channel output. When the AN activity pattern across several channels is periodic, and strobed integration is repeatedly triggered in an approximately phase-locked manner, ridges along the CF dimension appear in the auditory image at times corresponding to multiples of the common periodicity. Just like Licklider’s lattice by itself, however, the AIM is not a model of pitch perception but instead a particular representation of the stimulus and its recent past that may lend itself well to subsequent periodicity detection. A model that is formally closely related to autocorrelation is the “cancellation model” of pitch (de Cheveigné, 1993, 1998). Instead of the point-wise product $a_i(t) \cdot a_i(t - \tau)$, the squared difference $(a_i(t) - a_i(t - \tau))^2$ is integrated over time, similar to equation (2.4). While the resultant integral is equivalent to $h_i$ up to sign-reversal and a constant, the model also generates an output in which the stimulus periodicity is suppressed. Thus, it provides not only a means for estimating the stimulus periodicity, but also implements a harmonic cancellation filter which may provide a basis for segregating concurrent, periodic sounds. Cariani and Delgutte (1996a,b) used real data recorded from cat AN in place of a simulated peripheral front-end as the basis for computing an aggregate all-order ISI histogram, closely related to the SACF (cf. also section 2.3.1). The authors found that the ISI histograms allowed for the reliable prediction of the pitch of missing-f0 HCTs, SAM tones (ambiguity and shifts), SAM noise and others.

None of the models discussed in this section make use of the frequency label associated with each peripheral channel during periodicity estimation: the channel identity is entirely irrelevant during summation, except to a limited degree in the extensions by Wiegrebe (2001) and Bernstein and Oxenham (2005). Oxenham et al. (2004) designed a stimulus to specifically test whether the same might be the case in human listeners. They imposed three harmonically-related low-frequency modulators of 300, 400 and 500 Hz (i.e. with a missing modulator f0 of 100 Hz) on high-frequency sinusoidal carriers, thereby generating what they called a “transposed complex tone” (cf. section 4.6). In the SACF model, these modulators give rise to a 100 Hz peak in the SACF during summation of the single-channel ACFs, similar to a missing-f0 complex tone composed of the modulator frequencies. In contrast, human listeners did not perceive a pitch at 100 Hz, suggesting that the human auditory system may indeed be sensitive to channel identity when combining periodicity information across peripheral channels. Recently, Balaguer-Ballester et al. (2008) showed that the LP-SACF model (see above) correctly fails to predict a pitch at 100 Hz at sound intensities comparable to those used in the psychophysical experiments. For low stimulus intensities, however, the model still predicts a pitch of 100 Hz, suggesting that the absence of a pitch in the former case is likely due to their elaborate peripheral model (which may have been driven to saturation by the louder stimuli) rather than changes to the central pitch processor. So far, the prediction of level-dependence has not been tested psychophysically, and it is therefore not yet clear whether the effect reported by Oxenham et al. (2004) is indicative of a fundamental failure mode of the SACF and related models.

A generative model of near-periodic sounds and auditory nerve responses

We have argued in the previous chapters that near-periodic sounds are ubiquitous in our environment, and that their periodicity carries behaviourally relevant information: it aids in distinguishing predator from prey and in establishing the identity of objects and agents in our environment. During acoustic communication, periodicity is purposefully used as a means to convey information from sender to listener. Since periodicity and pitch are so closely related in the case of most natural, near-periodic sounds, it seems only appropriate to think of pitch perception as a process of periodicity estimation. Periodic stimuli evoke physiological responses in the peripheral auditory system which are themselves highly regular. These regularities manifest themselves in a number of different statistics commonly used to describe the peripheral response. Measuring the average firing rates of fibres in the auditory nerve, one hallmark of many periodic sounds is the occurrence of firing rate maxima in AN fibres with CFs harmonically related to the periodicity rate of the stimulus. Temporal fluctuations around these mean rates in each channel are also highly periodic, at rates indicative of the stimulus periodicity. Models of pitch perception typically operate on one or the other of these two cues, but not both. Spectral pattern matching models are clearly too limited in their scope to provide a general account of pitch perception on their own. Temporal models, such as summary autocorrelation, explain a wider range of phenomena, but have difficulty explaining why, on the whole, pitch is stronger for stimuli that pattern matching can account for. Some authors have therefore called for a “dual mechanism” explanation, whereby the output of two pitch processors operating on different periodicity cues is combined at a final stage that determines the overall percept (e.g. Carlyon and Shackleton, 1994; Carlyon, 1998; see de Cheveigné, 2005 for a discussion of potential pitfalls).

From a normative standpoint, we agree that disregarding one or the other of two seemingly non-redundant cues is wasteful and suboptimal, if accurate estimation is to be achieved. Our aim here is to develop a model which uses information about the stimulus periodicity, contained in the evoked peripheral response, as efficiently as possible. The problem of determining the stimulus periodicity based only on indirect observations seems fundamentally ill-posed: as sound waves travel through air from their source to the ear of a listener, they are perturbed by noise, mixed with other sounds, filtered and reflected in ways that can only appear as stochastic to an observer. A further source of variability is the process of sensorineural transduction itself. Hence, an observer cannot establish the exact identity of the stimulus or its periodicity with absolute certainty. Von Helmholtz (1867), considering similarly ill-posed problems in human vision, developed an influential theory of perception based around this premise. Evolving a school of thought that dates back to the 11th-century astronomer and mathematician Alhazen, he posited that our percepts arise from a process of unconscious inference, whereby incoming nervous sensations are combined with prior knowledge and memories in order to form an estimate of their most likely underlying cause in the external world: “The general rule according to which visual representations determine themselves is that we always find present in the visual field such objects as would have to exist in order for them to produce the same impression on the neural apparatus under the usual normal conditions of the use of our eyes” (von Helmholtz, 1867, in a translation by Westheimer, 2008).

Bayesian probability theory (e.g. Jaynes, 2003; MacKay, 2003) provides a formal framework that allows for optimal statistical inference about the hidden causes underlying a set of stochastic observations, provided that the observer is given knowledge about their statistical regularities and interdependencies. “Ideal observer” models of this kind have become valuable tools in the study of human perception (see e.g. Knill and Richards, 1996; Kersten et al., 2004; Shams and Beierholm, 2010) and have been successful in explaining psychophysical performance limits and perceptual illusions not only in the visual domain (e.g. Weiss et al., 2002), but also in cross-modal perception (Ernst and Banks, 2002; Wozny et al., 2010) and the perception of time (Ahrens and Sahani, 2011). In this chapter, we will present an ideal observer model of pitch as an optimal estimate of the unobserved periodicity of a noise-corrupted periodic sound observed indirectly through noisy evoked responses in the auditory nerve.

The statistical dependency between the unobserved periodicity and the observed AN response is expressed in terms of a generative, statistical model (Figure 3.1). The generative process is divided into two phases, sound generation (section 3.1) and sensorineural transduction (section 3.2). The outcome of the first phase is a waveform x, drawn from a distribution over all possible waveforms with period duration Ω and corrupted by noise. In the second phase, the sound evokes a time-varying neural response A = {ai(t)} in peripheral frequency channels i = 1 … C. Combined, the two stages of the model define a probability distribution P(A | Ω) over auditory nerve responses to sounds with periodicity Ω. Using approximate Bayesian inference techniques (section 3.3), we can evaluate the posterior distribution P(Ω | A*) over periodicities given a particular, observed auditory nerve response pattern A*, on the basis of which an estimate of Ω is formed.

3.1 Sound generation

Our generative model of acoustic waveforms takes inspiration from the true generative mechanism underlying a ubiquitous type of pitch-evoking sound in our natural environment: voiced speech. According to the source-filter theory of speech production (Fant, 1960), vowel sounds are generated acoustically through the excitation of resonances in the vocal tract (comprising the laryngeal, pharyngeal, nasal and oral cavities) by sharp, periodic air puffs emitted from the vocal source, the larynx. During vocal production, muscles inside the larynx contract, causing the vocal folds to obstruct the tracheal air flow. Air pressure builds up at the vocal folds until reaching a threshold, upon which the folds open and the pressure is released eruptively, following which the cycle repeats (see Figure 3.2).

Figure 3.1: A generative model of naturalistic, approximately periodic sounds and evoked auditory nerve responses. A pulse train with period Ω is convolved with a randomly-generated acoustic impulse response f and corrupted by additive noise to obtain an acoustic waveform x. Two variants of the model are distinguished by the presence or absence of a dependency between Ω and f. x evokes responses in auditory nerve fibres i = 1 … C as follows: in each channel, the waveform is filtered by a linear bandpass filter with impulse response bi. Its output is half-wave rectified and low-pass filtered before further noise is added, resulting in a demodulation of the filter outputs for oscillation rates above the low-pass cutoff frequency.

To a first approximation, the air pressure waveform emitted by the vocal source can be characterised as a regular pulse train δΩ with an inter-pulse interval Ω, while the vocal tract acts on this pulse train as a linear filter, the impulse response of which we will denote as f. Different vowel sounds are produced by changing the shape of the vocal tract, thereby affecting its preferred resonance frequencies and hence the shape of f, while the stereotypical pulse shape of the vocal source remains unaffected. Comparable mechanisms underlie vocal production not only in other mammals and birds, but even in certain amphibians and fish (Patterson et al., 2007), even though the organs used as sound sources and resonance chambers will of course differ. The sounds of many musical instruments are similarly produced following the pulse-resonance principle (van Dinther and Patterson, 2006; Gough, 2007). In brass instruments, for example, the pulses are generated through the regularly intermittent obstruction of the air flow into the instrument by the player’s tensed lips. In many wind instruments, wooden reeds fulfil the exact same purpose, whereas in string instruments, the hairs of the bow pull a string away from its straight resting position until the tension becomes so great that it snaps back in an abrupt, saw-tooth-like motion (Gough, 2007).

Figure 3.2: Schematic of human vocal production. A: Glottal air-pressure waveform generated by the periodic opening and closing of the vocal folds. B: The spectrum of the emitted waveform (top) is the product of the glottal source spectrum (bottom) and the vocal tract resonance spectrum (middle). Adapted from Lindblom and Sundberg (2007).

Thus, the pulse-resonance principle of sound generation underlies a wide range of natural, pitched sounds. But more than that: any periodic waveform can be decomposed into a series of pulses and a stereotypical impulse response, regardless of whether it was physically generated by a pulse-resonance-type mechanism or not.

In our generative model, we treat δΩ as a sum of δ-pulses, regularly spaced at a time interval Ω:

$$\delta_\Omega(t) = \sum_{k=0}^{\infty} \delta(t - k \cdot \Omega) \qquad (3.1)$$

The Fourier transform of this equation is a flat, evenly-spaced comb spectrum with a lowest spectral peak at f0 = 1/Ω, the fundamental frequency. Any truly periodic sound x can thus be formed by the convolution of δΩ with an appropriately chosen impulse response f. As natural sounds are hardly ever perfectly periodic, we include an uncorrelated, isotropic Gaussian noise term in our generative equation to allow for random perturbations, resulting in an approximately periodic waveform

$$x = f * \delta_\Omega + \eta \,, \qquad \eta \sim \mathcal{N}(0, \sigma_x^2 I^{T \times T}) \qquad (3.2)$$

of length T. Here and throughout the following, $\mathcal{N}(\mu, \Sigma)$ denotes a Gaussian distribution with mean μ, covariance matrix Σ and probability density function $\mathcal{N}(x \mid \mu, \Sigma) = \sqrt{\det(2\pi\Sigma)^{-1}} \exp\!\left(-\tfrac{1}{2}(x - \mu)\Sigma^{-1}(x - \mu)^{\mathsf{T}}\right)$. $I^{T \times T}$ denotes the T-by-T identity matrix.

Equation (3.2) defines a probability distribution over sound waveforms, given an underlying pulse rate Ω and acoustic impulse response f. In order to obtain a full model of the marginal distribution over sounds, considering all possible values of Ω and f, we will need to specify prior distributions over these two variables.

In our model, Ω is drawn uniformly from some range [Ωmin; Ωmax]. This is arguably an unrealistic simplification, when compared to the distribution of fundamental frequencies f0 encountered in our environment. For example, the natural distribution of f0s in human speech is clustered broadly around approximately 100 Hz and 200 Hz, corresponding to the natural speaking ranges of adult male and female speakers (Simpson, 2009). As we will see later, our prior distribution in the model would have to vary over many orders of magnitude in order to substantially affect the outcome of its pitch estimation. Hence, while it would be conceptually simple to replace the uniform prior with a more naturalistic distribution, we think our choice is not crucial to the model’s success or failure.

Regarding the prior distribution over impulse responses f, we will discuss two alternatives in the following. First, we will present a basic formulation of the model, referred to as the “uncoupled model” in paragraphs and chapters to come, in which the spectro-temporal characteristics of f are independent of the periodicity Ω (section 3.1.1). In section 3.1.2, we will present a general framework for introducing a statistical dependency between Ω and f into the model. We will investigate the nature of this dependency in the “coupled model” later on in chapter 5.

3.1.1 Uncoupled model

The intuition behind our prior distribution over f in the uncoupled model is simple: we require the amplitude envelope of a typical draw of f to decay monotonically after initial excitation with some arbitrary, unknown rate, while making no explicit assumption about the temporal fine-structure of f underneath this envelope. In the model, this is implemented as a multivariate mixture of Gaussian distributions (MoG), in which each mixture component (labelled by s) is associated with a particular time scale τs over which the impulse envelope decays. The mean of each mixture component is a vector of zeros: we do not expect the average impulse response — across all possible sounds — to have a particular positive or negative amplitude at any point in time. What remains yet to be specified is a covariance matrix Ψs for each component. Its diagonal elements Ψs(t, t) control the average squared amplitude of f at any time point t. Hence, we can enforce the decay of the impulse envelope by letting the diagonal elements of Ψs drop off — in our specific case with a squared-exponential time course. The non-diagonal elements Ψs(t, t′) control the mutual dependency of pairs of the elements of f at different points in time, such as for example the degree of smoothness in f. At this point, we will limit ourselves to diagonal covariance matrices Ψs: the elements of f are entirely uncorrelated in the generative process, allowing for arbitrarily irregular shapes of the temporal fine structure within the decaying envelope. We will, however, consider an extended, more realistic model that allows for flexible control over the covariance structure of f in section 3.1.2. As a final detail, we allow for a temporal delay φs in the onset of each impulse response. This detail may not seem particularly relevant for sound generation itself. It will turn out to be rather important when we invert the model to perform inference about the periodicity Ω of a sound (i.e. when we estimate its pitch), the waveform of which does not necessarily peak at t = 0: up to the resolution set by the modeller in choosing the range of φs in the mixture distribution, this allows for an optimal temporal alignment of the generated waveform δΩ ∗ f to the observed sound. The entries of the mixture covariance matrices are thus given by

$$\Psi^s(t, t') = 0 \quad \text{if } t \neq t' \qquad (3.3)$$

$$\Psi^s(t, t) = \begin{cases} 0 & \text{if } t < \phi_s \\ \exp\!\left(-\frac{(t - \phi_s)^2}{2\tau_s^2}\right) & \text{otherwise.} \end{cases} \qquad (3.4)$$

By setting the non-diagonal elements of Ψs to zero, the elements of f are temporally uncorrelated. Hence, the average power spectrum of repeated draws of f is white, i.e. broadband and flat (cf. Figure 3.4 for examples).

Following our description above, the full set of generative equations for the uncoupled model reads as follows:

$$\Omega \sim \text{uniform}([\Omega_{\min}; \Omega_{\max}]) \qquad (3.5)$$

$$f \sim \sum_{s=1}^{S} \pi_s \cdot \mathcal{N}(0, \Psi^s) \qquad (3.6)$$

Figure 3.3: Diagonal elements of the covariance matrix Ψs of a single mixture component (black) and a representative draw f ∼ N(0, Ψs) (blue).

$$\eta \sim \mathcal{N}(0, \sigma_x^2 I^{T \times T}) \qquad (3.7)$$

$$x = \delta_\Omega * f + \eta \qquad (3.8)$$

with f ∈ ℝ^{1×M}, Ψs ∈ ℝ^{M×M} and x ∈ ℝ^{1×T}. For the sake of simplicity, we let πs = 1/S, i.e. all combinations of time scales τs and delays φs are a priori equally likely.
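To make the sampling process concrete, the following is a minimal numpy sketch of equations (3.3)–(3.8). All numerical settings (sampling rate, segment lengths, the grid of time scales τs and delays φs, and the noise level) are illustrative choices for this sketch only, not the parameter values used in the thesis.

import numpy as np

rng = np.random.default_rng(0)
fs = 20000                          # sampling rate [Hz] (illustrative)
T, M = 1000, 400                    # lengths of x and f [samples]
t = np.arange(M) / fs               # time axis for the impulse response f [s]

def psi_diag(tau, phi):
    """Diagonal of Psi^s, equations (3.3)-(3.4): zero before the onset phi,
    squared-exponential decay with time scale tau afterwards."""
    return np.where(t < phi, 0.0, np.exp(-(t - phi) ** 2 / (2.0 * tau ** 2)))

# mixture components: all (tau, phi) combinations equally likely, pi_s = 1/S
taus = [1e-4, 5e-4, 2e-3, 1e-2]     # envelope time constants [s]
phis = np.arange(5) / fs            # onset delays [s]
components = [(tau, phi) for tau in taus for phi in phis]

def sample_x(omega, sigma_x=0.1):
    """Draw one waveform x with period omega [s], equations (3.5)-(3.8)."""
    tau, phi = components[rng.integers(len(components))]       # s ~ uniform
    f = rng.standard_normal(M) * np.sqrt(psi_diag(tau, phi))   # f ~ N(0, Psi^s)
    delta = np.zeros(T)
    delta[::int(round(omega * fs))] = 1.0                      # pulse train delta_Omega
    x = np.convolve(delta, f)[:T]                              # delta_Omega * f
    return x + sigma_x * rng.standard_normal(T)                # + eta, eq. (3.7)

x = sample_x(omega=0.01)            # a 10 ms period, i.e. f0 = 100 Hz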

We can write down an analytical expression for P(x | Ω), the distribution over waveforms with periodicity Ω, marginalised across all possible draws of f. In order to arrive at this expression, we observe that we can rewrite the convolution of f with δΩ equivalently as a matrix multiplication,

$$\delta_\Omega * f = f \cdot \Delta_\Omega \,, \qquad (3.9)$$

where ∆Ω is the convolution matrix of δΩ, i.e. an M × T Toeplitz matrix with progressively time-shifted copies of δΩ as its rows. Since P(f | Ω) is a MoG distribution, and since the Gaussian family of distributions is closed under linear transformation and addition, P(x | Ω) is similarly a MoG distribution. The covariance matrices Σs of its mixture components are obtained from P(f | Ω) by linear transformation with ∆Ω and subsequent addition of isotropic Gaussian noise:

$$x \sim \sum_{s=1}^{S} \pi_s \cdot \mathcal{N}(0, \Sigma^s) \qquad (3.10)$$

$$\Sigma^s = \Delta_\Omega^{\mathsf{T}} \Psi^s \Delta_\Omega + \sigma_x^2 \cdot I \qquad (3.11)$$
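As a sketch of equations (3.9)–(3.11), the convolution matrix ∆Ω and a mixture covariance Σs can be assembled directly. Dimensions here are illustrative; under the row-vector convention above, x = f · ∆Ω, so Cov(x) = ∆Ωᵀ Ψs ∆Ω.

import numpy as np

def conv_matrix(delta, M):
    """Delta_Omega, equation (3.9): an M x T Toeplitz matrix whose rows are
    progressively time-shifted copies of the pulse train delta (length T)."""
    T = len(delta)
    D = np.zeros((M, T))
    for m in range(M):
        D[m, m:] = delta[:T - m]
    return D

def marginal_cov(delta, psi_diag_vec, sigma_x):
    """Sigma^s, equation (3.11): Delta^T Psi^s Delta + sigma_x^2 I."""
    D = conv_matrix(delta, len(psi_diag_vec))
    T = len(delta)
    return D.T @ np.diag(psi_diag_vec) @ D + sigma_x ** 2 * np.eye(T)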

Figure 3.4: Influence of the envelope time constant τs on draws from the model. Left: sound waveforms drawn from the model for different values of τs (Ω = 10 ms, identical random seed, low noise); black curves depict the diagonal entries of Ψs. Right: corresponding amplitude spectra on a logarithmic scale with arbitrary reference. Note the effect of τs on the spectral scale of smoothness.

Figure 3.4 shows draws from the model for different settings of τs. The temporal extent of the impulse response increases from a sharp pulse for τs = 0.1 ms (top left) to almost flat within a single pulse interval for τs = 10 ms (bottom left). Its amplitude envelope is jittered around the diagonal entries of Ψs (black curves). The spectral envelope of δΩ is flat, as is the expected spectrum of f (owing to the mutual statistical independence of the elements of f). Therefore, the spectral envelope of x, being the product of the two, must also be flat in expectation. Nevertheless, the choice of τs has an interesting effect on the spectra of individual draws of x (Figure 3.4, right): as τs increases, the spectral scale of smoothness decreases. This phenomenon is easily understood for the extreme corner cases τs → 0 and τs → ∞ (cf. top and bottom panels). As τs → 0, the shape of the temporal envelope of f approaches a perfect δ pulse, the spectrum of which is both deterministic and perfectly flat. Hence, x will similarly have a perfectly flat, i.e. smooth, envelope. At the other extreme, as τs → ∞, draws of f approach the statistics of stationary, white Gaussian noise. The Fourier amplitudes of such noise are individually Rayleigh-distributed and mutually independent (e.g. Hartmann, 1997) — in other terms: maximally unsmooth. For intermediate values of τs, the smoothness of the power spectrum of f is likewise intermediate between these two extremes (cf. panels for τs = 0.5 and 2 ms in Figure 3.4). We can also think of this as a reflection of the duality of the Fourier transform: multiplication with a fast-decaying envelope in the time domain corresponds to a convolution of the spectrum with a broad (i.e. low-pass) kernel. As our model contains mixture components across a wide range of τs, it is equally likely to produce sounds of any degree of spectral smoothness during sound generation, and indifferent towards this spectral feature during inference.

3.1.2 Coupled model

The model presented in the previous section features independent priors over periodicity and impulse response, the latter of which effectively governs the perceived timbre of a sound. As we will discuss at length in chapter 5, it is reasonable to assume that these two variables should in fact be linked, based on both acoustic and psychophysical evidence. We will demonstrate at this point how our present formalism can be easily extended to capture such statistical dependencies while retaining the structure of the generative equations (3.5)–(3.8).

The key step is to replace the unconditional prior (3.6) over f by the conditional distribution

$$f \mid \Omega \sim \sum_{s=1}^{S} \pi_s \cdot \mathcal{N}(0, \Psi_\Omega^s) \,, \qquad (3.12)$$

where the covariance matrices Ψ^s_Ω now depend on the periodicity Ω, rather than using a fixed set of matrices Ψs for all periodicities as was the case in the previous section. While these covariance matrices Ψ^s_Ω could depend on Ω in any arbitrary way, we will consider here a special case, albeit a powerful and flexible one. Specifically, we restrict Ψ^s_Ω to the class of covariance matrices that correspond to a linear filter hΩ being applied to a draw from N(0, Ψs), where Ψs is a diagonal matrix as defined in (3.4). In terms of the source-filter model of speech production, this corresponds to allowing for a systematic variation of the vocal tract shape (and thus its filtering properties) with fundamental frequency. If we let f̃ denote the initially drawn impulse response prior to filtering, hΩ the filter kernel, and f the final impulse response, we can describe the generative process in two steps as

$$\tilde{f} \sim \mathcal{N}(0, \Psi^s) \qquad (3.13)$$

$$f = \tilde{f} * h_\Omega \qquad (3.14)$$

Alternatively, we do not need to introduce the auxiliary variable f̃ at all. We can rewrite the convolution f̃ ∗ hΩ as a multiplication of f̃ with a Toeplitz matrix HΩ which implements the exact same linear operation (cf. equation (3.9), where we used the same trick):

$$f = \tilde{f} \cdot H_\Omega \qquad (3.15)$$

As the family of Gaussian distributions is closed under linear transformation, f is also Gaussian-distributed and its covariance matrix Ψ^s_Ω is given by

$$\Psi_\Omega^s = H_\Omega^{\mathsf{T}} \Psi^s H_\Omega \qquad (3.16)$$

More convenient still, as the two-sided multiplication with HΩ corresponds to sequentially filtering the rows and columns of Ψs with hΩ, equation (3.16) is equivalent to a convolution of Ψs with a two-dimensional kernel, which is given by the outer product of hΩ with itself: $\Psi_\Omega^s = \Psi^s \ast_{2\mathrm{d}} (h_\Omega^{\mathsf{T}} \cdot h_\Omega)$.
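A small numpy check of this equivalence, with an arbitrary diagonal Ψs and a Gaussian-shaped kernel standing in for hΩ (both illustrative choices for this sketch):

import numpy as np
from scipy.signal import convolve2d

def filter_matrix(h, M):
    """H_Omega, equation (3.15): M x M Toeplitz matrix implementing (row-vector)
    convolution with the kernel h, truncated at length M."""
    H = np.zeros((M, M))
    for m in range(M):
        n = min(len(h), M - m)
        H[m, m:m + n] = h[:n]
    return H

M = 64
psi = np.diag(np.exp(-np.arange(M) / 10.0))          # a diagonal Psi^s (illustrative)
h = np.exp(-0.5 * (np.arange(7) - 3.0) ** 2)         # Gaussian-shaped h_Omega
h /= h.sum()

H = filter_matrix(h, M)
psi_coupled = H.T @ psi @ H                          # equation (3.16)
psi_2d = convolve2d(psi, np.outer(h, h))[:M, :M]     # 2-d convolution form
assert np.allclose(psi_coupled, psi_2d)              # the two routes agree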

Figure 3.5: Coupling of Ω and f through a linear filter hΩ. A: f̃ is drawn from a Gaussian distribution with diagonal covariance Ψs and convolved with hΩ to obtain f. B: The effect of this convolution on the covariance structure of f, compared to that of f̃, is equivalent to convolving Ψs with the outer product of hΩ with itself.

In Figure 3.5, we have chosen a Gaussian-shaped kernel for illustrative purposes, but note that hΩ could in principle have any arbitrary shape. At this point we will leave the specific shape of hΩ and its dependence on Ω unspecified. We will revisit the issue of defining a suitable coupling between Ω and f in chapter 5.

Whatever the exact dependence of Ψ^s_Ω on Ω may be, we can still write down the marginal distribution of the acoustic waveform x given a particular value Ω in general terms (where we marginalise with respect to f), according to our derivation of equation (3.10):

$$x \mid \Omega \sim \sum_{s=1}^{S} \pi_s \cdot \mathcal{N}(0, \Sigma_\Omega^s) \qquad (3.17)$$

$$\Sigma_\Omega^s = \Delta_\Omega^{\mathsf{T}} \Psi_\Omega^s \Delta_\Omega + \sigma_x^2 \cdot I \qquad (3.18)$$

3.2 Transduction

The process of sensorineural transduction was described in considerable detail earlier (cf. section 2.2), and will be summarised here only briefly. During acoustic stimulation, the inner-ear fluids are set into motion by the middle-ear ossicles, causing the basilar membrane (BM) to vibrate. To a first approximation, the basilar membrane performs a spatial frequency analysis of the sound by means of a gradient of preferred resonance frequencies along its axis. The passive, approximately linear response of the BM is considerably modulated by an active component, largely due to the electromotile feedback of the outer hair cells (OHCs). Motion of the basilar membrane at a certain point along its axis results in an alternating de- and repolarisation of the inner hair cells (IHCs), which in turn initiate the generation of action potentials in the fibres of the auditory nerve (AN) during phases of depolarisation. For high oscillation rates of the BM, the IHC potential ceases to follow individual phases of the BM response. Instead, it fluctuates only slightly around an elevated baseline. As a result, the IHC potential (and consequently the firing rates in the auditory nerve) reflects the amplitude envelope of the basilar membrane response for high frequencies (progressively above 1–2 kHz), rather than the raw amplitude waveform (cf. Figure 2.12). In the model, we restrict ourselves to a rather schematic, functional characterisation of these highly complex biophysical processes. The basilar membrane will be modelled as

In the model, we restrict ourself to a rather schematic, functional characterisation of these highly complex biophysical processes. The basilar membrane will be modelled as Transduction 95 a linear system, much like envisioned by von Helmholtz(1863), von B´ek´esy(1960) and others before the discovery of the active contribution of the OHCs to the BM response. We will also not attempt to model the progressive loss of temporal fine structure in the IHCs in biophysical detail: instead, amplitude demodulation of the BM response will be achieved using a simple but effective signal-processing technique for envelope extraction.

3.2.1 Basilar membrane response

We want to model the BM response as the outputs ci of a bank of linear band-pass filters bi, where each filter corresponds to a different place along its axis:

$$c_i = x * b_i \qquad \forall i = 1 \ldots C. \qquad (3.19)$$

A widely applied type of filter for linear characterisations of the basilar membrane is the gammatone filter (de Boer and de Jongh, 1978). The impulse response of a gammatone filter is the product of a sine-wave carrier with an envelope shaped like the integrand in the definition of the gamma function. Several parameters govern the shape of a gammatone filter: the carrier has an amplitude ai, frequency fi and phase θi, while the gamma envelope is characterised by a filter order n (integer and typically constant across all filters) and its bandwidth βi. Its impulse response is given by

$$b_i(t) = a_i \, t^{n-1} e^{-2\pi \beta_i t} \cos(2\pi f_i t + \theta_i) \,. \qquad (3.20)$$

Johannesma (1972) proposed the gammatone impulse response, albeit not under its present-day name, as an analytic description for linear filters he derived from the responses of neurons in cat cochlear nucleus using reverse correlation (de Boer and Kuyper, 1968; see also de Boer and de Jongh, 1978; Aertsen and Johannesma, 1980; de Boer and Kruidenier, 1990 for subsequent uses of the gammatone filter to describe physiological reverse-correlation filters). Importantly, gammatone filters are not only suitable for the characterisation of peripheral neural responses in animals, but have also been found to provide excellent fits to psychophysical estimates of human peripheral auditory filters (Patterson and Moore, 1986; Glasberg and Moore, 1990), given an appropriate choice of the filter order n and the relationship between frequency fi and bandwidth βi (Patterson et al., 1992). The “equivalent rectangular bandwidth”

Figure 3.6: Gammatone filter bank. A: Spectral magnitude response of 12 gammatone filters as used in the model. CF spacing and filter bandwidths grow in proportion to the human ERB scale (cf. equation (3.21)). B: Impulse response of three filters (colours match A).

(ERB) of human auditory filters approximately depends on their centre frequencies f as summarised by Glasberg and Moore (1990):

$$\mathrm{ERB}(f) = 24.7 \, (0.00437 \, f + 1) \,. \qquad (3.21)$$

Following Patterson et al. (1992) and Slaney (1993) in our implementation, we choose βi = 1.019 ERB(fi), n = 4, θi = 0, and a spacing of neighbouring carrier frequencies fi that is proportional to their bandwidths. The gains are chosen so as to give the same peak attenuation around fi for all filters.
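A compact sketch of this filter bank, following equations (3.20) and (3.21); the segment duration, CF range, and the choice of one-ERB spacing steps are illustrative stand-ins for the settings just described:

import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth, equation (3.21)."""
    return 24.7 * (0.00437 * f + 1.0)

def gammatone_ir(fc, fs, n=4, dur=0.025):
    """Gammatone impulse response, equation (3.20), with beta_i = 1.019 ERB(f_i)
    and theta_i = 0; the gain normalises the peak of the magnitude response."""
    t = np.arange(int(dur * fs)) / fs
    g = t ** (n - 1) * np.exp(-2 * np.pi * 1.019 * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return g / np.abs(np.fft.rfft(g)).max()

fs = 20000
cfs = [100.0]                       # centre frequencies, spaced by one ERB each
while cfs[-1] < 6000.0:
    cfs.append(cfs[-1] + erb(cfs[-1]))
bank = [gammatone_ir(fc, fs) for fc in cfs]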

Gammatone filter banks of this kind are commonly used as the initial peripheral processing stage in models of human auditory perception (e.g. Meddis and Hewitt, 1991; Patterson et al., 1995; Meddis and O’Mard, 1997; Wiegrebe, 2001; Bernstein and Oxenham, 2003). Interestingly, gammatone filter banks seem to be particularly well-suited for the efficient coding of natural sounds. Smith and Lewicki (2006) used a sparse-coding model to learn an optimal basis for the representation of natural sounds as well as speech (Smith and Lewicki, 2005). They reported that the learnt basis vectors bore a striking similarity to the impulse responses of gammatone filters, suggesting that the process of human vocal production on the one hand, and the peripheral neural encoding of speech and other natural sounds on the other, are almost ideally co-adapted to each other.

There is an ongoing debate as to whether the ERB scale according to Glasberg and Moore (1990) (equation (3.21)) provides an adequate estimate of human peripheral filter bandwidths. As physiological data regarding the tuning properties of the active cochlea or single auditory nerve fibres is not available for humans, bandwidth estimates are based on psychophysical measurements instead. In a classical paradigm, the effect of a notched-noise masker on the detectability of a simultaneous target tone at the centre of the spectral notch is measured (Patterson, 1976; Glasberg and Moore, 1990). Central to this kind of bandwidth estimation is the assumption that energy in the noise masker impairs the detection of the target only if it falls within the bandwidth of the same peripheral auditory filter, but not outside. Shifting the masker passbands symmetrically away from the target tone, thereby widening the notch while keeping the masker energy constant, one will find that the detection threshold of the tone improves and then saturates, once the notch exceeds a certain bandwidth. From the progression of these threshold improvements, approximate filter shapes and bandwidths can be estimated. The human ERB scale summarises these bandwidth estimates, which are thought to reflect the frequency selectivity of the cochlea. Recently, Shera et al. (2002) and Oxenham and Shera (2003) have challenged the established ERB scale: using a forward-masking paradigm instead of simultaneous masking, and low sound intensities in order to maximise the occurrence of active, non-linear effects in the basilar membrane response (cf. section 2.2.3.3), Oxenham and Shera (2003) arrived at bandwidth estimates up to two times sharper than previously reported. However, Ruggero and Temchin (2005) showed that a behavioural forward-masking paradigm led to an overestimation of the sharpness of cochlear tuning in several species for which direct physiological measurements of cochlear frequency selectivity are available, suggesting that the same may be the case in humans. Notwithstanding the possibility of future revisions, we chose to adopt the traditional ERB scale in our model. In our own psychophysical experiments (chapter 5), presentation levels around 75 dB SPL were used, at which point active amplification and sharpening of the basilar membrane response can be expected to be substantially reduced compared to the near-threshold levels used in Oxenham and Shera (2003), even if the concerns of Ruggero and Temchin (2005) turn out to be unfounded.

3.2.2 Auditory nerve response

We emulate the demodulation-like behaviour of the IHCs by a simple envelope extraction method, consisting of the half-wave rectification (HWR) of the BM response, followed by low-pass filtering at a frequency where phase-locking of the IHC declines (HWR-LP). If we let r(·) denote the element-wise rectification of a vector, and l the impulse response of our low-pass filter, we obtain the output firing rate ai of a single auditory nerve fibre in the model as:

$$a_i = r(x * b_i) * l + \xi_i \,, \qquad \xi_i \sim \mathcal{N}(0, \sigma_A^2 I) \,. \qquad (3.22)$$

Owing to the addition of the isotropic Gaussian noise term ξi to every channel, the distribution over activity patterns A = {ai}, i = 1 … C, conditioned on the output of the BM filters, is itself Gaussian, with a non-linear dependency of its mean on the BM output according to equation (3.22):

$$P(A \mid x) = \prod_{i=1}^{C} \mathcal{N}\!\left(a_i \mid r(x * b_i) * l \,,\; \sigma_A^2 I\right) \qquad (3.23)$$

Strict half-wave rectification (r(z) = max(0, z)) introduces discontinuities in the derivatives of r at z = 0. In order to enable us to draw on the family of gradient- and Hessian-based methods for inference (cf. section 3.3), we avoid these discontinuities by applying a “soft” rectification function r:

$$r(z) = \frac{\log(1 + \exp(\alpha z))}{\alpha} \,. \qquad (3.24)$$

The strictness of r is controlled by the parameter α: for α → ∞, r(z) → max(0, z). From a technical point of view, several heuristics other than our HWR-LP approach are in common use for the extraction of a slow-moving envelope from an amplitude-modulated signal (see e.g. Turner and Sahani, 2011). For example, full-wave rectification (r(z) = |z|) or squaring can similarly be used prior to low-pass filtering. Another common approach, mathematically more principled, is to compute the amplitude of the analytic signal, obtained via a Hilbert transform of the waveform (e.g. Hartmann, 1997). All these methods have their specific drawbacks, and it is hence a priori difficult to choose one over the others from a signal-processing standpoint alone. Considering the underlying physiological process in the IHCs (cf. section 2.2.3.2) and the HWR-like response behaviour that is visible in their responses to low-frequency BM vibration (Figure 2.12), however, the HWR-LP method appears to be preferable to the alternatives outlined above in the context of our model.
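The sketch below implements the HWR-LP chain of equations (3.22)–(3.24) in numpy/scipy. The FIR design here (a window-method low-pass via scipy.signal.firwin, with an illustrative cutoff and tap count) is only a stand-in for the Kaiser-window filter described in the next paragraph.

import numpy as np
from scipy.signal import firwin

def soft_rectify(z, alpha=10.0):
    """Soft half-wave rectification, equation (3.24); tends to max(0, z) as
    alpha grows. Written with logaddexp for numerical stability."""
    return np.logaddexp(0.0, alpha * z) / alpha

def an_response(c, fs, sigma_a=0.0, rng=None):
    """HWR-LP model of one AN channel, equation (3.22): soft-rectify the
    band-pass output c, low-pass filter, and add isotropic Gaussian noise."""
    lp = firwin(numtaps=257, cutoff=3000.0, fs=fs)      # illustrative low-pass
    a = np.convolve(soft_rectify(c), lp, mode="same")
    if sigma_a > 0.0:
        rng = rng or np.random.default_rng()
        a = a + sigma_a * rng.standard_normal(len(a))
    return a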

Figure 3.7: Progressive amplitude demodulation of high-frequency oscillations by the auditory nerve model (panels: carriers at 1.5, 2.5, 3.5 and 4.5 kHz). Waveforms prior to demodulation (grey) are pure tones, sinusoidally ring-modulated at 50 Hz. After half-wave rectification and low-pass filtering (green), fine structure is progressively lost while the shape of the waveform envelope remains recognisable. The magnitude response of the low-pass filter dropped by 80 dB between 1.5 and 4.5 kHz; no noise was added to the output.

The frequency range over which AN fibres gradually cease to follow fine-structure peaks of the BM response varies from species to species and is not known from physiological measurements in humans. In guinea pigs, phase locking (measured as the ratio between modulation depth and average excitation level in the evoked IHC potential) starts to decline at around 600 Hz and is no longer detectable at around 3.5 kHz (Russell and Sellick, 1983). Other mammalian species, such as cats and squirrel monkeys, show near-constant phase locking for frequencies up to 2 kHz, which then declines over a range of 1 to 1.5 octaves (Rose et al., 1967; Johnson, 1980). In humans, the marked loss of pitch discriminability and musical interval and melody identification for tones with fundamental frequencies above 4 to 5 kHz has been interpreted as a reflection of the physiological upper limits of phase locking, but direct evidence is lacking. For the purpose of demodulation in our model, we truncated the impulse response of an ideal low-pass filter using a Kaiser window (see Oppenheim et al., 1999, chapt. 7) to obtain a FIR low-pass filter with a magnitude drop-off of 80 dB between 1.5 and 4.5 kHz, without specific fine tuning of the filter characteristics.

In summary, the sensorineural transduction stage of our model is highly schematic compared to the complexity of the true biophysical mechanisms, but nevertheless captures two essential aspects of the human peripheral auditory system. Mimicking cochlear frequency analysis, sounds are filtered into different frequency bands, the bandwidths of which grow approximately linearly with their centre frequencies. The ability of model auditory nerve fibres to follow the output of these filters phase-by-phase gradually declines between 1.5 kHz and 4.5 kHz, giving rise to envelope demodulation in frequency channels within and above this range. In using a linear filter bank, we ignore the active, OHC-driven component of the physiological BM response, which may be an acceptable simplification when considering its response to moderately high sound levels (cf. section 2.2.3.3), where OHC electromotility is greatly reduced compared to low levels. We also assume that the magnitude of the IHC potential is in essence linearly related to the BM response magnitude. We could, in principle, include some form of instantaneous compression into the model, for example by letting r saturate for high input values, but did not explore this possibility (since our algorithms for performing inference in the generative model (see section 3.3) require the computation of the first and second derivatives of r, these derivatives would need to remain well-behaved, unless new inference schemes were to be developed in addition).

3.3 Inference

We have, so far, defined a generative statistical model of approximately periodic sounds and the resultant evoked firing rate patterns in the auditory nerve. At its core, the model comprises equations (3.10) and (3.22) in the uncoupled case, where sound periodicity and acoustic impulse response are treated as independent variables, and equations (3.17) and (3.22) in the coupled case, where a statistical dependence between the two variables is introduced by means of a periodicity-dependent filter hΩ, modifying the shape of the acoustic impulse response f prior to its convolution with the pulse train δΩ. Acoustic waveforms generated in this way are passed through a simple model of the peripheral auditory system up to the auditory nerve (AN), the response of which is again stochastic.

For every value of Ω our generative model implicitly defines a marginal distribution

over A:

$$P(A \mid \Omega) = \int P(A, x \mid \Omega) \, dx \qquad (3.25)$$

$$= \int P(A \mid x) \cdot P(x \mid \Omega) \, dx \qquad (3.26)$$

Computing P(A | Ω) for an observed AN activity pattern A is the basis for performing inference about Ω in our model. In this section, we will discuss ways of performing this non-trivial operation of Bayesian model inversion.

The essential obstacle in computing P(A | Ω) according to equation (3.25) is the high dimensionality of the unobserved latent variable x, the acoustic waveform, in conjunction with its non-linear transformation during the transduction stage of the model (equation (3.22)). Under our prior, x itself is distributed according to a mixture of high-dimensional Gaussian distributions (equations (3.10) and (3.17)). If our peripheral model were fully linear, P(A | Ω) would similarly be MoG distributed. Albeit computationally challenging, a wide range of practical inference techniques has been developed for this case, based on the framework of Gaussian Processes (GPs; Rasmussen and Williams, 2006; Rasmussen and Nickisch, 2010). Non-linear generalisations of GPs exist, for which inference is also tractable. In a Warped Gaussian Process (Snelson et al., 2004) for example, observations are assumed to be non-linearly transformed draws from a GP. However, while Warped GPs allow for an arbitrarily high-dimensional latent variable space, they are restricted to one-dimensional observation spaces and furthermore require the warping function to map onto the entire real line, which is at odds with our requirement for r(·) to perform some form of rectification on its inputs (in order for amplitude demodulation to occur).

As we could not readily apply existing inference procedures “off the shelf” for use with our model, we developed two different algorithms of our own. One is based on the Laplace approximation (e.g. MacKay, 2003), a deterministic Gaussian approximation to the integral ∫ P(A, x | Ω) dx around its mode with respect to x (section 3.3.1). The second uses a combination of Markov Chain Monte Carlo (MCMC) sampling techniques (Neal, 1993, 1998, 2010) that allow us to approximate the same integral stochastically, albeit at a much higher computational cost (section 3.3.2). Both approaches rely on the gradient ∇_x ln P(A, x | Ω) of the log-joint distribution over observed and latent variables, and the Laplace approximation furthermore requires the computation of its Hessian matrix, ∇²_x ln P(A, x | Ω). The — somewhat tedious — formal derivation of the gradient and Hessian is given in Appendix A.

Assuming that we have a practicable method of computing P(A | Ω), how can we use it to perform inference about Ω; in other words, how can we predict pitch? Suppose we are given an observation A: the firing rate pattern across the entire auditory nerve in response to an arbitrary, unknown stimulus. We assume that the auditory system, and in particular the “pitch processor”, treats A as if evoked by an approximately periodic sound as described by our generative equations, even though the true, physical generative process may have been different.

Ideally then, we would like to compute the full posterior distribution

$$P(\Omega \mid A) = \frac{P(A \mid \Omega) \, P(\Omega)}{\int P(A \mid \Omega') \, P(\Omega') \, d\Omega'} \,, \qquad (3.27)$$

where P(A | Ω) =: L(Ω) is called the likelihood of Ω and P(Ω) is the prior distribution over periodicities in our generative model. Given the posterior, a reasonable and intuitive estimate of the periodicity underlying A is the MAP (maximum a posteriori) estimate

$$\Omega^* = \operatorname*{argmax}_{\Omega} P(\Omega \mid A) \,, \qquad (3.28)$$

i.e. the most likely periodicity to have caused the observed AN activity pattern A by means of some unobserved acoustic waveform.

It is worth noting at this point that what constitutes the optimal, or “rational”, choice in the more general context of Bayesian decision theory (rather than that of pure perceptual inference) depends on the subjective cost function of the listener. This cost function may vary across individuals and depend on the task at hand. The MAP estimate is the optimal choice when the cost function is “all-or-none”, i.e. when any failure to infer the true underlying periodicity is deemed equally unfavourable. For an arbitrary cost function C(Ω̃; Ω) that specifies the subjective cost to the observer of reporting a periodicity Ω̃ when the true periodicity is Ω, the rational estimate (in the formal sense) is obtained by minimising the expected cost under the posterior distribution: Ω* = argmin_Ω̃ ∫ C(Ω̃; Ω) P(Ω | A) dΩ (see e.g. Mamassian et al., 2002; Shams and Beierholm, 2010). Another widely-used cost function is the squared error

C(Ω̃; Ω) = (Ω̃ − Ω)², which is minimised in expectation by the posterior mean ∫ Ω · P(Ω | A) dΩ. A different strategy, which seems to provide a good description of human behaviour in some perceptual decision-making tasks such as audio-visual target localisation (Pick et al., 1969; Wozny et al., 2010), is to draw a random sample from the posterior distribution on each trial rather than making a deterministic choice on the basis of the posterior distribution (such as the MAP or posterior mean). Following the sampling strategy, a subject’s average decision (i.e. the mean response over many repeats of the same stimulus) will minimise the squared-error cost. Nevertheless, the sampled decisions will in general be suboptimal on any given single trial and vary across multiple repeats of the same stimulus. For our purpose of predicting pitch in the absence of an experimenter-controlled cost function, we will use the MAP estimate (equation (3.28)) throughout the remainder of this thesis, i.e. the single most probable period duration given all sensory evidence and prior expectations. Owing to the uniform prior over Ω, this is equivalent to the maximum likelihood (ML) estimate in our particular case. In principle, however, our Bayesian framework allows not only for optimal perceptual inference about Ω, but also for the prediction of optimal behaviour in more complicated situations, where for example the cost function may be shaped by the experimenter by rewarding or punishing listeners (human and animal) depending on their response to the stimulus. In this case, one may expect to find behavioural biases in the listeners’ decisions which are not inherent in the posterior distribution itself.

In practice, we will evaluate the posterior only at a set of selected candidate periodicities {Ω1, …, ΩN}, the range and sampling density of which will depend on the stimulus at hand and the question we want to ask. Rather than covering the entire range of human hearing, for example, we may choose {Ω1, …, ΩN} to lie within a 1- to 2-octave range around the true fundamental of a harmonic sound, with quarter-tone intervals between neighbouring periodicities (i.e. 24 values of Ωn per octave in that case). As we have chosen a flat prior P(Ω) in the model (cf. equation (3.5)), we can thus simply return the ML (maximum likelihood) estimate Ω_{n∗} amongst the candidates, where

$$n^* = \operatorname*{argmax}_{n} \mathcal{L}(\Omega_n) \,. \qquad (3.29)$$

We will show examples of the log-likelihood profiles {ln L(Ω1), …, ln L(ΩN)} and ML periodicity estimates for a variety of pitch-evoking sounds in chapter 4. For the remainder of the current chapter, we will investigate two alternative methods for approximating the likelihood L(Ω) = P(A | Ω).
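For concreteness, a quarter-tone candidate grid and the ML read-out of equation (3.29) might look as follows; candidate_periods and log_liks are hypothetical names, the latter standing for the profile {ln L(Ω1), …, ln L(ΩN)} returned by either inference scheme of the following sections.

import numpy as np

def candidate_periods(omega_centre, octaves=1.0, steps_per_octave=24):
    """Log-spaced grid of candidate periodicities around a presumed fundamental,
    at quarter-tone intervals by default."""
    k = np.arange(-octaves * steps_per_octave, octaves * steps_per_octave + 1)
    return omega_centre * 2.0 ** (k / steps_per_octave)

candidates = candidate_periods(omega_centre=0.01)      # around a 10 ms period
# given log_liks[n] = ln L(candidates[n]) from the Laplace or HAIS scheme:
# omega_ml = candidates[np.argmax(log_liks)]           # equation (3.29)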

3.3.1 Laplace approximation

As laid out in the section above, we need to compute the integral

$$P(A \mid \Omega) = \int P(A, x \mid \Omega) \, dx \qquad (3.30)$$

where the dimensionality T of the latent variable x ∈ ℝ^T is high (for example, T = 1000 for a 50 ms waveform segment, sampled at a resolution of 20 kHz). Following Laplace’s method (MacKay, 2003), the intractable integral ∫ P(A, x | Ω) dx is approximated by a Gaussian integral instead. To find the (unnormalised) approximating distribution Q(x), we Taylor-expand the log-joint distribution ln P(A, x | Ω) up to second order around its mode x* = argmax_x ln P(A, x | Ω), yielding

$$\ln Q(x) = \ln P(A, x^* \mid \Omega) - \frac{1}{2} (x - x^*) \, H_{x^*} \, (x - x^*)^{\mathsf{T}} \,. \qquad (3.32)$$

Here, H_{x*} denotes the Hessian matrix of −ln P(A, x | Ω), evaluated at x*:

$$H_{x^*} = -\nabla_x^2 \ln P(A, x \mid \Omega) \Big|_{x = x^*} \,. \qquad (3.33)$$

In other words, the inverse covariance matrix of the approximating Gaussian Q is given by the negative Hessian of the log-joint distribution at its mode. Finally, our estimate of L(Ω) is obtained as the integral of Q over x:

$$\int Q(x) \, dx = P(A, x^* \mid \Omega) \, \sqrt{\frac{(2\pi)^T}{\det H_{x^*}}} \,. \qquad (3.34)$$

Note that in practice, we will compute the log-likelihood ln L(Ω) instead, due to numerical scaling issues.

In order to compute the Laplace approximation, we need to find the modal waveform x* = argmax_x ln P(A, x | Ω), which we achieve by performing gradient ascent in x using the L-BFGS method (Limited-memory Broyden-Fletcher-Goldfarb-Shanno; see Nocedal and Wright, 1999; we used minFunc.m, a Matlab implementation by Mark Schmidt, available from http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html). A derivation of the required gradient, as well as the Hessian, of ln P(A, x | Ω) is given in Appendix A.

Algorithm 3.1 Laplace approximation for ln P(A | Ω)
Input: observed AN activity A; periodicity Ω; initial condition x0

  x* ← LBFGS(A, Ω, x0)
  H_{x*} ← −∇²_x ln P(A, x | Ω) |_{x = x*}
  return ln P(A, x* | Ω) + (T/2) ln(2π) − (1/2) logdet(H_{x*})
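In Python, Algorithm 3.1 might be sketched as below, with scipy's L-BFGS-B optimiser standing in for minFunc; neg_log_joint, grad and hess are assumed callables for −ln P(A, x | Ω) and its derivatives (derived in Appendix A).

import numpy as np
from scipy.optimize import minimize

def laplace_log_lik(neg_log_joint, grad, hess, x0):
    """Algorithm 3.1: Laplace estimate of ln P(A | Omega).

    neg_log_joint(x) = -ln P(A, x | Omega); grad and hess return its gradient
    and Hessian, so hess(x_star) equals H_{x*} of equation (3.33)."""
    res = minimize(neg_log_joint, x0, jac=grad, method="L-BFGS-B")
    x_star, T = res.x, res.x.size
    sign, logdet = np.linalg.slogdet(hess(x_star))
    # equation (3.34) in log form:
    return -res.fun + 0.5 * T * np.log(2.0 * np.pi) - 0.5 * logdet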

3.3.2 Hamiltonian Annealed Importance Sampling

Naively, one might attempt to evaluate L(Ω) by simple Monte Carlo sampling (e.g. MacKay, 2003), drawing samples of x from the model prior P(x | Ω) and evaluating them under P(A | x):

$$\mathcal{L}(\Omega) = P(A \mid \Omega) \qquad (3.35)$$

$$= \int P(A, x \mid \Omega) \, dx \qquad (3.36)$$

$$= \int P(A \mid x) \, P(x \mid \Omega) \, dx \qquad (3.37)$$

$$\approx \frac{1}{N} \sum_{i=1}^{N} P(A \mid x_i) \,, \qquad x_i \sim P(x \mid \Omega) \qquad (3.38)$$

Despite being unbiased, however, this estimator is not useful in practice. We can view equation (3.38) above as a particular case of Importance Sampling. In order to simplify notation, we will denote p(x) := P(A, x | Ω) in the following, and view p as an unnormalised distribution over x: the “target distribution”. Its normalising constant Z_p = ∫ p(x) dx is thus equal to L(Ω), i.e. the quantity we want to estimate. Importance Sampling allows us, in principle, to estimate Z_p by sampling from some tractable “proposal distribution” q(x):

$$Z_p = \int p(x) \, dx \qquad (3.39)$$

$$= \int \frac{p(x)}{q(x)} \, q(x) \, dx \qquad (3.40)$$

$$\approx \frac{1}{N} \sum_{i=1}^{N} \frac{p(x_i)}{q(x_i)} \,, \qquad x_i \sim q(x) \qquad (3.41)$$

Equation (3.38) is the special case where the proposal distribution equals our model prior: q(x) = P(x | Ω). Unfortunately, Importance Sampling suffers from a well-known deficiency when applied to high-dimensional variable spaces, despite being asymptotically unbiased (e.g. Neal, 1993, 1998; MacKay, 2003). Most of the mass of the integral ∫ p(x) dx is associated with values of x that are typical under the target distribution p. If the overlapping volume between the proposal and target distribution is small, the time required to obtain even a few representative samples (if any) is prohibitively large, unless the dimensionality of x is low and p and q concentrate their mass in the same region of the variable space (see also Minka, 2005). As a consequence, Importance Sampling estimates of Z_p (i.e. L(Ω)) would suffer from unacceptably large variance in our case. Other methods (e.g. Rejection Sampling) based on drawing independent samples from q fail for the same reasons.
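In code, the estimator of equations (3.39)–(3.41) is only a few lines (a generic sketch; log_p, sample_q and log_q are assumed callables for the unnormalised log-target, the proposal sampler and the proposal log-density). Its simplicity is exactly what makes the variance problem easy to demonstrate: in high dimensions the log-weights span many orders of magnitude and a handful of samples dominate the sum.

import numpy as np

def importance_log_z(log_p, sample_q, log_q, n=1000, rng=None):
    """Log of the importance-sampling estimate of Z_p = int p(x) dx,
    equations (3.39)-(3.41), computed stably in the log domain."""
    rng = rng or np.random.default_rng()
    log_w = np.array([log_p(x) - log_q(x)
                      for x in (sample_q(rng) for _ in range(n))])
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))   # log( (1/N) sum_i w_i )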

Markov Chain Monte Carlo methods (Neal, 1993; Murray, 2007) — such as Metropolis-Hastings, Gibbs sampling, or Hamiltonian Monte Carlo — provide an alternative way of obtaining samples from p (albeit not independent ones). A stochastic transition operator T(x → x′) is repeatedly applied to an initial sample x1 ∼ q, yielding a sequence of further samples x2, …, xN. T is carefully designed such that it has p as its stationary equilibrium distribution:

$$\int T(x \to x') \, p(x) \, dx = p(x') \,. \qquad (3.42)$$

After an initial “burn-in” phase, this procedure will generate dependent samples from the target distribution: in effect, T gradually bridges the distributions q and p, despite the fact that their divergence may be large (assuming that they share at least some support). Much current effort is being spent on finding efficient transition operators T that require only short burn-in periods and produce sequential samples as independently as possible. Frustratingly however, even estimators of Z_p based on true samples from p (e.g. Newton and Raftery, 1994), obtained for example via MCMC sampling, are often badly behaved (see Murray, 2007 for discussion).

Annealed Importance Sampling (AIS; Neal, 1998) is one of only a small number of techniques that have been shown to allow the estimation of Z_p in high-dimensional variable spaces for intractable distributions p, while keeping the variance of the estimates under control at least asymptotically. In AIS, the transition operator T is not fixed during a sampling run, but instead itself forms a chain T1, …, TN. By convention, the xn in AIS are sampled “backwards”, starting with xN ∼ qN and then according to

$$x_i \mid x_{i+1} \sim T_i(x_{i+1} \to \cdot\,) \qquad i = N-1, \ldots, 0 \qquad (3.43)$$

Each Ti has a different equilibrium distribution qi, which morphs between q = qN and p = q0 to a steadily increasing degree:

$$q_i = q^{1 - \beta_i} \, p^{\beta_i} \qquad 1 = \beta_0 > \beta_1 > \ldots > \beta_N = 0 \qquad (3.44)$$

At the end of a single AIS run comprising samples xN, …, x0, we obtain an “importance weight”

$$w = \prod_{i=1}^{N} \frac{q_{i-1}(x_{i-1})}{q_i(x_{i-1})} \,. \qquad (3.45)$$

If we repeat this procedure multiple times, each iteration j yielding an importance weight w_j, we finally obtain an estimate of Z_p:

$$\tilde{Z}_p = \frac{1}{S} \sum_{j=1}^{S} w_j \,, \qquad (3.46)$$

which will provably converge to the true Z_p as S → ∞ (see Neal, 1998). Aside from requiring a potentially large number of intermediate steps N in each run, the success or failure of AIS (as with any other MCMC method) depends largely on the ability of the transition operators Ti to mix quickly, i.e. to generate near-independent samples after only a short time. We implemented each Ti as a Hamiltonian Monte Carlo

(HMC) sampler, an MCMC method that uses gradient information about qi in order to explore the variable space more efficiently than simple, random-walk-like procedures (Neal, 1993, 2010). Hamiltonian Monte Carlo generates samples from a target distribution (in our case, qi(x) as in equation (3.44)) by simulating a Hamiltonian dynamical system with random initial conditions: a particle with initial position x0 and initial momentum u0 moves through a landscape defined by the potential-energy function

E(x) = −ln(qi(x)) while maintaining a constant total energy H(x, u) = E(x) + K(u), where K(u) = ½ u uᵀ is its kinetic energy (for x ∈ ℝ², for example, this corresponds to the idealised path of a ball on a frictionless surface, the height profile of which is defined by −ln qi). At any point in (continuous) time, the instantaneous change in position of the particle is given by ∇_u K = u, while the momentum u changes as −∇_x E, i.e. the gradient of the log-target distribution. In the computer simulation of this system, the dynamics are discretised using the “leap-frogging” procedure, whereby the variables are updated in each step as follows (note the half-steps in u):

$$u^{t+1/2} = u^t - \frac{\epsilon}{2} \nabla_x E(x^t) \qquad (3.47)$$

$$x^{t+1} = x^t + \epsilon \, u^{t+1/2} \qquad (3.48)$$

$$u^{t+1} = u^{t+1/2} - \frac{\epsilon}{2} \nabla_x E(x^{t+1}) \qquad (3.49)$$

Leap-frogging yields smaller discretisation errors than the simple Euler method, resulting in more stable dynamics and ultimately more efficient sampling (see below). After letting the dynamical system evolve for some pre-determined number of steps L, the final position x^L is stochastically accepted or rejected as a new sample in the Markov chain, depending on a draw of ξ ∼ uniform([0; 1]):

$$x^* = \begin{cases} x^L & \text{if } \ln \xi < H(x^0) - H(x^L) \text{ (accept)} \\ x^0 & \text{otherwise (reject)} \end{cases} \qquad (3.50)$$

The final acceptance/rejection step, called a Metropolis-Hastings update (Hastings, 1970), ensures that qi is indeed the stationary distribution of the HMC sampler, and is necessary to compensate for numerical errors due to the discretised dynamics. Observing that the position variable x tends to get repeatedly drawn towards low points in the potential-energy landscape of this dynamical system (though the particle will never rest there, owing to its kinetic energy, which increases by the same amount by which its potential energy decreases), we get an intuition for why the imaginary particle preferentially explores high-probability regions of our target distribution qi(x) = exp(−E(x)), and hence why HMC is likely to return representative samples of qi.

In order to use this approach then, we need to be able to evaluate xE = x log qi. ∇ −∇ 3 2 For x ∈ R for example, this corresponds to the idealised path of a ball on a frictionless surface, the height profile of which is defined by − ln qi 4Note though, that the particle will never rest there, owing to its kinetic energy that increases by the same amount by which its potential energy decreases. Inference 109

Recalling equation (3.44), we see that

$$\ln q_i(x) = (1 - \beta_i) \ln q(x) + \beta_i \ln p(x) \qquad (3.51)$$

$$= (1 - \beta_i) \ln P(x \mid \Omega) + \beta_i \ln P(A, x \mid \Omega) \qquad (3.52)$$

$$= \ln P(x \mid \Omega) + \beta_i \ln P(A \mid x) \,, \qquad (3.53)$$

the gradients of which we have already derived for use in our Laplace-based inference scheme (cf. section 3.3.1 and Appendix A). The entire combined HMC/AIS sampling algorithm (HAIS) is given below (Algorithm 3.2). Following a recommendation by Neal (1998), we chose the annealing schedule β1 > … > βN = 0 to be essentially log-linearly spaced, with a short linearly ramping segment from βN = 0 to β_{0.9N} = 0.01, i.e. for the first 10% of AIS samples (see also Berkes et al., 2008).
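A sketch of such a schedule (the knee value and ramp fraction follow the description above; the remaining construction details are free choices of this sketch):

import numpy as np

def annealing_schedule(n_steps, beta_knee=0.01, ramp_frac=0.1):
    """Descending schedule beta_0 = 1 > ... > beta_N = 0 (N = n_steps):
    log-linear from 1 down to beta_knee, then a linear ramp to 0 over the
    final ramp_frac of the schedule (the first 10% of AIS samples, since
    the samples are drawn backwards)."""
    n_ramp = int(ramp_frac * n_steps)
    loglin = np.logspace(0.0, np.log10(beta_knee), n_steps - n_ramp + 1)
    ramp = np.linspace(beta_knee, 0.0, n_ramp + 1)[1:]
    return np.concatenate([loglin, ramp])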

Algorithm 3.2 Hamiltonian AIS for ln P(A | Ω)
Input: observation A; periodicity Ω; annealing schedule βN, …, β0
let q := P(x | Ω), p := P(A, x | Ω), q_i := q^{1−β_i} p^{β_i}
for j = 1 to S do
  draw xN ∼ qN
  for k = N−1 down to 0 do
    draw u⁰ ∼ N(0, I)
    x⁰ ← x_{k+1}
    for l = 1 to L do
      u^l ← u^{l−1} + (ε/2) ∇ ln q_k(x^{l−1})
      x^l ← x^{l−1} + ε u^l
      u^l ← u^l + (ε/2) ∇ ln q_k(x^l)
    end for {l}
    draw ξ ∼ uniform([0; 1])
    if ln(ξ) < ln q_k(x^L) − ln q_k(x⁰) − ½ (⟨u^L, u^L⟩ − ⟨u⁰, u⁰⟩) then
      x_k ← x^L {accept update}
    else
      x_k ← x_{k+1} {reject update}
    end if
  end for {k}
  ŵ_j ← Σ_{i=1}^{N} (β_{i−1} − β_i) ln P(A | x_{i−1})
end for {j}
ln Ẑ_p ← ln Σ_j exp(ŵ_j) − ln S
return ln Ẑ_p
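A self-contained miniature of Algorithm 3.2 is sketched below for a toy target, with the weight accumulated before each transition (an equivalent ordering of the updates). The proposal q must be normalised; all step sizes, run counts and the toy densities are illustrative choices, not the thesis's model.

import numpy as np

def hmc_step(x, log_f, grad_log_f, eps, n_leap, rng):
    """One HMC transition (leapfrog plus Metropolis test) leaving exp(log_f) invariant."""
    u = rng.standard_normal(x.shape)
    x_new, u_new = x.copy(), u + 0.5 * eps * grad_log_f(x)   # initial half-kick
    for _ in range(n_leap):
        x_new = x_new + eps * u_new
        u_new = u_new + eps * grad_log_f(x_new)
    u_new = u_new - 0.5 * eps * grad_log_f(x_new)            # undo extra half-kick
    log_accept = log_f(x_new) - log_f(x) - 0.5 * (u_new @ u_new - u @ u)
    return x_new if np.log(rng.uniform()) < log_accept else x

def hais_log_z(log_q, grad_q, log_p, grad_p, sample_q, betas,
               n_runs=20, eps=0.05, n_leap=10, seed=0):
    """Estimate ln Z_p for unnormalised target p and normalised proposal q,
    with a descending schedule betas = (beta_0 = 1, ..., beta_N = 0)."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(n_runs)
    for j in range(n_runs):
        x = sample_q(rng)                               # x_N ~ q_N = q
        for i in range(len(betas) - 1, 0, -1):          # anneal from q_N to q_0
            b = betas[i - 1]
            log_w[j] += (betas[i - 1] - betas[i]) * (log_p(x) - log_q(x))
            log_f = lambda y, b=b: (1 - b) * log_q(y) + b * log_p(y)
            grad_f = lambda y, b=b: (1 - b) * grad_q(y) + b * grad_p(y)
            x = hmc_step(x, log_f, grad_f, eps, n_leap, rng)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))       # equation (3.46), log form

# toy check in 2-d: p(x) = exp(-x.x) is an unnormalised Gaussian, ln Z_p = ln(pi)
d = 2
log_q = lambda x: -0.5 * x @ x - 0.5 * d * np.log(2 * np.pi)
grad_q = lambda x: -x
log_p = lambda x: -x @ x
grad_p = lambda x: -2.0 * x
betas = np.linspace(1.0, 0.0, 101)
print(hais_log_z(log_q, grad_q, log_p, grad_p,
                 lambda rng: rng.standard_normal(d), betas))   # approx. 1.14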

3.4 Summary

In this chapter, we have presented a generative, statistical model of approximately periodic sounds and the subsequently evoked firing rates in the auditory nerve. The acoustic component of our model is a probabilistic extension of the source-filter model of speech production (Fant, 1960). A key variable is the unobserved periodicity Ω of the sound prior to it being corrupted by noise (and further transformed into neural activity). In short, our model defines probability distributions over observed auditory nerve responses for each possible such periodicity. The second, essential variable for this purpose is the acoustic impulse response f of the sound source, which determines the shape of the sound waveform within a period of Ω (note though that the impulse response can last longer than the period duration). We presented two alternatives for characterising the distribution over impulse responses f. Both assume that the envelope shape of f can be described as an initial excitation which subsequently decays with an unknown, arbitrary time constant. This assumption is implemented by drawing f from a mixture distribution, where each mixture component corresponds to one of many different time constants. The two alternative models differ in their assumptions regarding the fine structure of f underneath the decaying envelope. In the uncoupled model (section 3.1.1), f is temporally uncorrelated and independent of the sound periodicity. In the more complex coupled model (section 3.1.2), temporal correlations are introduced by means of specifying the covariance structure of f. Furthermore, the covariance structure, which also governs the expected overall spectral envelope, is allowed to depend on the periodicity Ω.

Having specified probability distributions over auditory nerve responses for each possible periodicity Ω, we can use the framework of Bayesian probabilistic inference to infer the most likely sound periodicity underlying some given, observed pattern of auditory nerve activity. Due to the complexity of the generative model, exact inference is computationally intractable. Two approximate inference algorithms were therefore presented. The Laplace algorithm (section 3.3.1) is based on a Gaussian approximation to the posterior distribution over the unobserved waveform around its mode. Hamiltonian Annealed Importance Sampling (HAIS; section 3.3.2), in contrast, is a Markov chain Monte Carlo method that estimates the intractable integral over waveforms stochastically by drawing samples iteratively.

Chapter 4

Basic model evaluation

In this chapter, we will perform a first evaluation of the basic, uncoupled model as defined by the generative equations (3.5)–(3.8) and (3.23). Two questions are of immediate interest here. The acoustic component of our generative model is a model of periodic sounds corrupted by additive Gaussian noise. A first test should therefore be whether our approximate inference schemes are capable, at the very least, of estimating the true periodicity of sounds of this type. Beyond this, we want to investigate to what degree our model is suitable as an explanation for psychophysical phenomena in human pitch perception, i.e. to what degree it can explain the pitch of non-periodic sounds, or the pitch of periodic sounds in cases where it does not coincide with the periodicity of the stimulus.

In order to estimate the periodicity of an arbitrary sound, we first generate an AN response A according to our forward model (cf. section 3.2), and use this self-generated response as input for one of our inference algorithms. We then evaluate the log-likelihood function ln L(Ω) = ln P(A | Ω) for an appropriate set of possible candidate periodicities. A number of choices have to be made regarding the setting of stimulus and model parameters. For all following examples, we arbitrarily scale the acoustic stimulus to have mean squared amplitude (i.e. power) of 20. The covariance matrices of the generative model are similarly scaled to yield the same expected power, i.e. the model knows the energy of the stimulus. Likewise, when performing inference about periodic sounds presented in a background of noise at a particular signal-to-noise ratio (SNR), we typically scale the signal and noise components in the generative model (i.e. (1/T) trace(Σ^s)

and σ_x² in equation (3.6)) to obey the same ratio. This may seem like a potentially unfair advantage for our model, since a human listener does not know the true acoustic SNR a priori when listening to a stimulus. However, we found this potential advantage to be of little practical concern: when comparing likelihood values across a range of models with SNRs set to different values, we obtain the highest likelihood values for assumed SNRs in the model close to the true SNR in the stimulus. Thus, the outcome of the periodicity estimation would be essentially the same if the model were instead to infer (or integrate out) the stimulus SNR as a further unobserved variable, albeit at substantially increased computational cost. This was verified for a range of stimuli. Furthermore, we would argue that it is not unreasonable to assume that a human listener can form a relatively reliable estimate of the stimulus SNR over the course of a prolonged psychophysical experiment.
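On the stimulus side, this scaling convention amounts to the following (a sketch, ours):

    import numpy as np

    def scale_and_add_noise(signal, snr_db, target_power=20.0):
        """Scale `signal` to a fixed mean-squared amplitude and add white
        Gaussian noise at the given SNR (conventions as in the text)."""
        signal = signal * np.sqrt(target_power / np.mean(signal ** 2))
        noise_power = target_power / (10.0 ** (snr_db / 10.0))
        noise = np.sqrt(noise_power) * np.random.randn(len(signal))
        return signal + noise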

Our peripheral gammatone filter bank has 80 channels, with CFs extending from 40 Hz up to at most 16 kHz (or lower, when we reduce the sampling rate below 32 kHz for the sake of computational efficiency). The level of AN noise (σ_A² in equation (3.22)) was chosen so as to yield high SNRs (on the order of 10 dB) in channels with high evoked activity levels, both during the generation of simulated AN responses and during inference.
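The ERB-rate spacing of the channel CFs can be sketched as follows (we assume the common Glasberg and Moore parameterisation of the ERB-rate scale; whether the thesis uses exactly these constants is not stated here):

    import numpy as np

    def erb_spaced_cfs(n_channels=80, f_lo=40.0, f_hi=16000.0):
        """Centre frequencies equally spaced on the ERB-rate scale."""
        def hz_to_erb_rate(f):
            return 21.4 * np.log10(4.37e-3 * f + 1.0)
        def erb_rate_to_hz(e):
            return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
        erbs = np.linspace(hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi),
                           n_channels)
        return erb_rate_to_hz(erbs)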

A final consideration concerns the choice of envelope time constants τ_s and pulse-train offsets φ_s that define the mixture covariance matrices Ψ^s according to equation (3.4), and which we introduced to the model partially as a means to integrate out these unknown parameters of the waveform during inference. For the sake of computational efficiency, we want to avoid integrating over values of τ_s which contribute only negligible probability mass to L(Ω). Conversely, it is important to ensure that the range of timescales τ_s considered is wide enough that a reasonable explanation of the waveform envelope can be provided for every setting of Ω. Hence, when evaluating ln L(Ω) for a range of Ω spanning several octaves, we balance these two constraints by choosing the timescales τ_s relative to (and differently for) each Ω, from a range between 0.1Ω and Ω. As Figure 3.4 demonstrates, this allows for sharp pulses as well as for waveforms that remain approximately flat within the duration of one period Ω.
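For instance, a log-spaced grid of envelope time constants per candidate period (a sketch; the number of mixture components is illustrative):

    import numpy as np

    def envelope_timescales(omega, n_components=8):
        """Candidate envelope time constants between 0.1*Omega and Omega,
        log-spaced, for a given candidate period `omega` (in seconds)."""
        return omega * np.logspace(np.log10(0.1), 0.0, n_components)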

In section 3.3, we presented two alternative schemes – Laplace and Hamiltonian Annealed Importance Sampling (HAIS) – to estimate the unobserved stimulus periodicity

Ω from an observed pattern A = {a_i} of time-varying auditory nerve (AN) activity in peripheral frequency channels i = 1 … C. In either case, our goal is to compute the log-likelihood function ln L(Ω) = ln P(A | Ω) = ln ∫ P(A, x | Ω) dx. In the Laplace scheme, we approximate L(Ω) by a Laplace (Gaussian) integral of P(A, x | Ω) around the most likely unobserved waveform x*. In the second scheme (HAIS), we use Annealed Importance Sampling (with the Markov transition operator implemented as a Hamiltonian Monte Carlo sampler) to approximate the integral over unobserved waveforms stochastically. Figure 4.1 shows the estimated log-likelihood functions for three different sounds (pure tone, HCT and SAM tone) under the two approximation schemes around their pitch frequency (ln L_Lap and ln L_HAIS). While there is an overall offset between ln L_Lap and ln L_HAIS in each of the three examples (i.e. a multiplicative scaling difference between L_Lap and L_HAIS), the shape of the estimated log-likelihood functions is otherwise similar for the two approximations. We do not have a solid analytic understanding of this seemingly consistent discrepancy, whereby the Laplace approximation yields higher likelihoods than our sampling-based method. On the one hand, it is clear that non-Gaussianity of the distribution P(A, x | Ω) can in principle give rise to biases in the Laplace integral, as can local optima during the initial gradient-ascent phase of the algorithm. On the other hand, convergence of MCMC methods in high-dimensional spaces may be slow and is generally hard to establish. Even though this was not considered further for the purposes of this thesis, one might attempt to identify the true cause of the discrepancy by reducing the peripheral processing step to a fully linear operation, in which case P(A | Ω) would become a mixture of Gaussian distributions (cf. discussion in section 3.3) with fully known parameters. Linearising the peripheral model, however, has significant functional consequences by making envelope demodulation impossible, and it is unclear whether the insights gained by this analysis would even hold for the actual non-linear model. In the absence of an established ground truth for ln L(Ω), all likelihood estimates in the following sections and chapters will be computed using the Laplace approximation, owing to its much lower computational demand¹.

¹ To give a rough sense of scale: in the Laplace approximation, a few hundred gradient steps in x are typically needed to find a maximum of ln P(A, x | Ω), plus a single evaluation of its Hessian. If we perform 500 annealing steps for each HAIS sample, where each annealing step itself involves 5 steps of simulating the Hamiltonian dynamics, we need 2500 gradient calculations for a single sample. 400 samples were used to estimate the likelihoods shown in Figure 4.1, requiring in total more than 1000 times the number of gradient steps of the Laplace-based algorithm. A general, more precise comparison of the computational cost is problematic, since both the number of gradient steps in the Laplace method and the number of MCMC samples required for convergence of the HAIS method depend on the stimulus. From our rough comparison above, it is nevertheless clear that the increased effort of HAIS over Laplace can easily become prohibitively large.

Figure 4.1: Inference: examples of periodicity estimates obtained using either a Laplace approximation or Hamiltonian Annealed Importance Sampling (HAIS), for a pure tone, an HCT and a SAM tone (panels A–C). Stimulus waveforms (top) and estimated log-likelihood profiles ln L(Ω) (bottom).

4.1 Pure tones

As a first example, we will show model estimates of the periodicity of pure tones. We generated 80 ms pure tones at a sampling rate of 32 kHz, with frequencies ranging from 50 Hz to 6.4 kHz. Gaussian white noise was added at an SNR of 6 dB. We computed ln L(Ω) over a three-octave range around the true stimulus frequency, in up to 48 steps per octave (limited by the sampling rate for higher frequencies), as shown in Figure 4.2 (top). The true stimulus frequency was inferred in every case. Notably, there is a strong octave ambiguity between the true pure-tone frequency and the frequency one octave below, but not above². This is consistent with features of the SACF, where peaks occur at integer multiples of Ω, and with spectral pattern-matching models, in which the presence of a spectral component is regarded as evidence for an f0 at subharmonic frequencies (not, however, with a simple maximum-excitation place model). A noteworthy feature of the estimated profiles is a broadening of the peaks (on a logarithmic scale) at the low-frequency end, which is likely due to a combination of factors. Firstly, the bandwidth of low-CF gammatone filters relative to their centre frequency, as well as their spacing, is wider than that of high-frequency fibres, owing to their linear relation to the ERB scale

² Similar, still lower peaks also occur at higher-order submultiples of the tone frequency (not shown).

(cf. equation (3.21) in section 3.2.1). Hence, the same small difference in log-frequency may be harder to discriminate at low frequencies than at medium-to-high frequencies. Peaks also broaden at the highest frequencies: discriminability in the model is limited for technical reasons as the period Ω approaches the sampling interval. For a tone at 6.4 kHz and a sampling rate of 32 kHz, the smallest detectable difference is approximately 20%. Another likely reason for the broadening of ln L(Ω) at low frequencies is the small number of cycles that can be observed over the 80 ms duration of the stimulus. This is further amplified by the fact that we evaluate ln L(Ω) based on the AN activity only up to 80 ms, i.e. before the ringing of the peripheral filters has subsided. As low-CF filters are slower to build up their response, more information is lost in this way for low-frequency stimuli. In the bottom panel of Figure 4.2, we compare the shape of ln L(Ω) for a 400 Hz pure tone and different stimulus durations ranging from 20 to 80 ms, evaluated at a rate of 24 steps per octave (the overall difference in scale is due to the lower acoustic SNR of 0 dB in this example). As one would expect, the model accumulates evidence over time, and the likelihood function becomes increasingly peaked for longer stimulus durations. We did not systematically explore longer durations, but there is no reason to believe that this trend should subside. Data on pure-tone frequency difference limens by Moore (1973) (see Figure 2.3), for example, suggest an upper limit to the integration time window in human perception between 100 and 200 ms³.
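As a concrete illustration of the stimulus set and evaluation grid described above, a short sketch (our own Python, not the thesis code; names are ours):

    import numpy as np

    fs = 32000                        # sampling rate [Hz]
    dur = 0.080                       # stimulus duration [s]
    t = np.arange(int(fs * dur)) / fs

    def pure_tone_stimulus(freq, snr_db=6.0):
        """80 ms pure tone in white noise at the given SNR."""
        tone = np.sin(2 * np.pi * freq * t)    # tone power is 0.5
        noise_std = np.sqrt(0.5 / 10 ** (snr_db / 10))
        return tone + noise_std * np.random.randn(len(t))

    def omega_grid(freq, octaves=3, steps_per_octave=48):
        """Candidate periodicities spanning `octaves` around `freq`."""
        f = freq * 2.0 ** np.linspace(-octaves / 2, octaves / 2,
                                      octaves * steps_per_octave + 1)
        return 1.0 / f                # periods Omega = 1/f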

4.2 Harmonic complex tones

Next, we will consider the pitch of harmonic complex tones (cf. section 2.1.1.2). We generated missing-f0 HCTs, each comprising 11 consecutive harmonics of 250 Hz. The rank n of the lowest harmonic was varied from 4 to 22 in steps of three, i.e. ranging from HCTs with many resolved harmonics to entirely unresolved sounds. Stimuli were 80 ms long, sampled at 32 kHz and presented in white noise at 0 dB SNR. Houtsma and Smurzynski (1990) used a very similar set of stimuli to measure f0 discriminability as a function of n. While subjects were in principle able to perform the discrimination task even for the entirely unresolved HCTs, the authors reported a steep decrease in discriminability between n = 7 and n = 13, which then flattened off for values of n = 16 and higher.

³ At the same time, however, human listeners are sensitive to changes in pitch at a much faster rate, indicating that the time scale of integration may depend on the stimulus, stimulus context or task (see e.g. Balaguer-Ballester et al., 2009).


Figure 4.2: Top: log-likelihood profiles for pure tones in the frequency range of 50 to 6400 Hz. Stars indicate the ML estimate for each curve. Bottom: dependence of ln L(Ω) on stimulus duration.

These are important results for two reasons. Firstly, they demonstrate (along with many other sets of experiments) that pitch perception based entirely on unresolved spectral components is possible. This rules out models based solely on pattern matching of the peripherally resolved spectrum (but not pattern matching of spectra derived from the time course of neural activity, as proposed by Srulovicz and Goldstein (1983); cf. section 2.4.1). Secondly, the data of Houtsma and Smurzynski (1990) are also challenging for models based on the summary autocorrelation function (SACF; cf. section 2.4.2), since the SACF does not naturally account for the marked drop in pitch strength around n = 10 (even though this challenge has been addressed in a recent modification by Bernstein and Oxenham (2005)). Therefore, this set of stimuli is an interesting test case for the Bayesian model.
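The stimuli themselves are simple to generate; a sketch (ours):

    import numpy as np

    def missing_f0_hct(f0=250.0, lowest_rank=4, n_harmonics=11,
                       fs=32000, dur=0.080, snr_db=0.0):
        """Missing-f0 HCT: n_harmonics consecutive harmonics of f0,
        starting at harmonic `lowest_rank`, in white noise at snr_db."""
        t = np.arange(int(fs * dur)) / fs
        ranks = np.arange(lowest_rank, lowest_rank + n_harmonics)
        x = sum(np.sin(2 * np.pi * n * f0 * t) for n in ranks)
        x /= np.sqrt(np.mean(x ** 2))               # normalise to unit power
        noise = np.random.randn(len(t)) / np.sqrt(10 ** (snr_db / 10))
        return x + noise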

As Figure 4.3 shows, the Bayesian model behaves qualitatively much like the SACF model. On the one hand, the stimulus periodicity is correctly inferred for both resolved and unresolved HCTs. On the other hand, we find no indication of a “weakened” reliability of the model estimates for stimuli with n > 10. We did not evaluate ln L(Ω) densely enough

around 250 Hz, or with a sufficient number of stimulus samples, to compute f0 difference limens that would allow us to compare our model estimates directly to the discriminability results of Houtsma and Smurzynski (1990). We can, however, interpret the height of the local peak in ln L(Ω) at 250 Hz as an indicator of the subjective single-trial certainty of the model about its inference (within a narrow region). Using this as a measure of “pitch strength”, we must conclude that pitch strength in the Bayesian model is unaffected by the rank of the lowest harmonic. We will not consider ways of addressing this undesirable property of the model at this point, but instead revisit the issue in section 5.5, after introducing an extension of our generative model targeted at capturing human perceptual pitch-timbre interactions.


Figure 4.3: Missing-f0 complex tones. A: waveforms and spectra of two 250 Hz missing-f0 HCTs with lowest harmonic rank n = 4 (top) and n = 22 (bottom). B: ln L(Ω) for HCTs with n varying from 4 to 22.

A different aspect of missing-f0 pitch perception is the differential effect of phase manipulations on the pitch of HCTs with resolved and unresolved harmonics. Shackleton and Carlyon (1994) showed that phase relationships between spectral components which cause a doubling of the envelope periodicity of the sound from f0 to 2f0 result in a doubling of pitch for unresolved, but not resolved, HCTs (cf. section 2.1.1.2). There is a plausible mechanism for this effect: high-frequency AN fibres, stimulated by two or more unresolved harmonics, follow the envelope rather than the temporal fine structure of the BM vibrations. Therefore, they cannot distinguish whether a doubling of envelope periodicity is caused by a doubling of f0 or by component phase shifts. Given our choice of peripheral front-end, which explicitly comprises a stage of high-frequency envelope demodulation (cf. section 3.2.2), we expected to observe similar effects in our Bayesian model.

Following Shackleton and Carlyon (1994), we generated four different 250 Hz missing-f0 HCT stimuli. Wide-band HCTs with equal-amplitude partials in either “sine phase” or “alternating phase” were filtered into two different frequency bands: a “low” one extending up to 625 Hz, and a “high” one extending from 3.9 to 5.4 kHz. Figure 4.4A shows the two high-filtered stimuli, with a clearly visible doubling of the envelope modulation rate for the alternating-phase HCT relative to the sine-phase HCT (a similar doubling occurs for the low-filtered stimuli). Estimation results (evaluating ln L(Ω) only at 250 and 500 Hz in this instance) are shown in Figure 4.4B: while the inferred pitch of the two low-filtered HCTs is 250 Hz irrespective of the phase relationship, the pitch of the high-filtered HCTs is phase-dependent: the pitch of the alternating-phase HCT is estimated as 500 Hz, while that of the sine-phase HCT remains at the true f0 of 250 Hz. Note that, by the exact same mechanism, the SACF model achieves an identical effect (Meddis and O'Mard, 1997).
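A sketch of the phase-manipulated stimuli (ours; SciPy's Butterworth filter stands in for the band-limiting step, and the filter order is our assumption):

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def phase_hct(f0=250.0, phase="sine", band=(3900.0, 5400.0),
                  fs=32000, dur=0.080):
        """Wide-band HCT in sine or alternating phase, band-pass filtered.

        In alternating phase every other component is a cosine, which
        doubles the envelope modulation rate without changing f0."""
        t = np.arange(int(fs * dur)) / fs
        x = np.zeros_like(t)
        for n in range(1, int(0.5 * fs / f0)):   # harmonics below Nyquist
            phi = 0.0 if (phase == "sine" or n % 2) else np.pi / 2
            x += np.sin(2 * np.pi * n * f0 * t + phi)
        sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, x)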

4.3 Iterated rippled noise

Iterated rippled noise (IRN) is generated by a process of repeatedly delaying a noise token by an interval d and adding it back to itself after multiplication with a gain factor g (cf. section 2.1.2, Figure 2.5). The mean reported pitch for IRN with g = 1 is equal to 1/d (250 Hz for d = 4 ms), independent of the number of iterations. For g = −1, the percept is different. Pitch matches to IRN 1− (i.e. IRN with a single delay-and-subtract iteration) are bimodally distributed around approximately 1/d ± 10%. For higher numbers of iterations n, the percept drops to 1/(2d) (Yost et al., 1978; Yost, 1996; see Figure 4.6C). The pitch of IRN n+, as well as that of IRN n− for n ≳ 2, is explicable both by the (broad) peaks in their resolved spectra and in terms of the temporal correlations induced by the delay-and-add process.


Figure 4.4: Phase dependence of unresolved missing-f0 pitch. A: waveforms and spectra of two 250 Hz HCTs with components in sine (top) or alternating phase (bottom), filtered into a high, unresolved frequency region. B: ln L(Ω) evaluated at 250 and 500 Hz for low- and high-filtered 250 Hz HCTs in sine and alternating phase. Note the octave increase in pitch for the high-filtered sound in alternating phase (light green).

We generated IRN with a delay of 4 ms, g = ±1, and one or four delay-and-add iterations, denoted IRN 1+, IRN 1−, IRN 4+ and IRN 4− (Figure 4.5). The stimulus duration was 100 ms at a sampling rate of 20 kHz. Following Yost (1996), stimuli were low-pass filtered at 4 kHz (6th-order Butterworth filter).
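A sketch of the delay-and-add generation (ours; this implements the common “add-same” network, in which the running waveform is delayed and added back at each iteration):

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def iterated_rippled_noise(delay=0.004, gain=1.0, n_iter=4,
                               fs=20000, dur=0.100):
        """IRN: repeatedly delay the running waveform by `delay` seconds,
        scale it by `gain` and add it back; then low-pass at 4 kHz."""
        x = np.random.randn(int(fs * dur))
        shift = int(round(delay * fs))
        for _ in range(n_iter):
            delayed = np.concatenate([np.zeros(shift), x[:-shift]])
            x = x + gain * delayed
        sos = butter(6, 4000.0, btype="lowpass", fs=fs, output="sos")
        return sosfiltfilt(sos, x)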


Figure 4.5: Waveforms and spectra of IRN 4+ and IRN 4−.

Figure 4.6A shows 10 log-likelihood profiles for each stimulus condition and their respective means. The mean profiles for IRN 1+ and IRN 4+ (orange and red) both peak at 1/Ω = 250 Hz, as expected from the psychophysics literature. The mean profiles for IRN 1− and IRN 4− both have a local minimum around that same value. The profile of IRN 4−, but not that of IRN 1−, peaks visibly at 1/Ω = 125 Hz. We computed histograms of the ML estimates of Ω in semitone bins (Figure 4.6B). All 10 estimates for IRN 1+ and IRN 4+ fell within one semitone of 250 Hz (ln L(Ω) itself was evaluated at 48 values per octave). The distribution of ML estimates for IRN 1− was very broad, and we generated an additional 15 samples to obtain a clearer picture. Peaks appear in the histogram around both 250 Hz and 125 Hz, but no estimate fell within one semitone of either of these two values. The clustering of pitch matches slightly above and below 250 Hz, with an absence of matches at 250 Hz itself, is in good agreement with psychophysical results reported by Yost (1996) (Figure 4.6C, top left and right). However, the model has an overall greater tendency to infer periodicities around 125 Hz than is apparent in subjects' matching behaviour. For IRN 4−, model results and human psychophysics are in close agreement, with pitch matches clustered unimodally around 125 Hz in both cases (Figures 4.6B and 4.6C, bottom left).

Meddis and Hewitt (1991) argued that the SACF model accounts for the pitch of IRN 1+ and IRN 1−, but the periodicity estimates for IRN 1− were restricted to within ±20% of the inverse delay 1/d in order to avoid spurious pitches caused by random peaks in the SACF. Owing to the noisy, broadly modulated spectrum of IRN with a low number of iterations, the pitch of such stimuli is difficult (if not impossible) to account for with spectral pattern-matching models that require a line-spectrum representation of the stimulus, e.g. the models by Goldstein (1973) and Terhardt (1974). Even assuming that some spectral pre-processor were able to identify the peaks in the spectrum, it remains unclear how those models would account for the drop in pitch between IRN 1− and IRN 4−, as the peak frequencies do not change when the number of iterations is increased. The pattern transformation model by Wightman (1973) is capable of processing continuous spectra such as those of IRN. Yost and Hill (1979) demonstrated that this model accounts for the bimodal distribution of pitches around 1/d in the case of IRN 1−, at least qualitatively.


Figure 4.6: Model estimates and psychophysical data for IRN stimuli with a delay of 4 ms, positive and negative gain, 1 and 4 iterations. A: ln L(Ω) for 10 samples of each of the four stimulus conditions (thin lines) and their respective means (thick lines). B: histograms of ML estimates of Ω in semitone bins (25 samples for IRN 1−, 10 each for IRN 1+, IRN 4+ and IRN 4−). C: psychophysical data. Left: histograms of pitch matches for IRN 1− and IRN 4− (adapted from Yost, 1996). Right: modes of the distribution of pitch matches for IRN 1− and IRN 1+ as a function of delay, showing a lack of matches at f0 = 1/d in the case of IRN 1− (adapted from Yost et al., 1978).

4.4 Pitch shift of amplitude-modulated tones

Sinusoidally amplitude-modulated (SAM) tones with carrier frequency fc and modulation rate g have the line spectrum of a frequency-shifted HCT with frequencies fc − g, fc and fc + g. Schouten (1940) approximated the dominant perceived pitch as f_p = g + Δf/n, where n is the rank of the harmonic of g closest to fc, and Δf = fc − ng is the amount of shift (cf. equation (2.3), section 2.1.2). The pitch of SAM tones is explicable both by their SACF and by spectral pattern matching, provided of course that the components are peripherally resolved in the latter case. Needless to say, the pitch of unresolved SAM tones is not explained by pattern-matching models.
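Schouten's approximation is easy to evaluate directly; a small helper function (ours, for illustration):

    def schouten_pitch(fc, g):
        """Dominant pitch of a SAM tone: f_p = g + delta_f / n, where n is
        the rank of the harmonic of g closest to the carrier fc."""
        n = round(fc / g)            # nearest harmonic rank
        delta_f = fc - n * g         # carrier shift from exact harmonicity
        return g + delta_f / n

    # e.g. schouten_pitch(2040, 200) -> 200 + 40/10 = 204 Hz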

We tested the model behaviour for shifted, three-component complex sounds much like

SAM tones. We varied the centre frequency fc between 1900 Hz and 2100 Hz, with a constant component spacing of g = 200 Hz around fc (see Figure 4.7). We chose a high sampling rate of 44 kHz in order to be able to evaluate ln L(Ω) with high resolution around g. The stimulus duration was 60 ms. We observed that the peak of ln L(Ω) shifted away from 200 Hz for fc ≠ 2000 Hz (Figure 4.7B shows three examples), exactly as described by Schouten's approximation (Figure 4.7C). ln L(Ω) in the model has several clearly distinguishable modes other than the ML estimate, the two next-highest of which are also shown in Figure 4.7C. Their positions are well described by Schouten's approximation with n chosen higher or lower than the true rank of the harmonic of g closest to fc. Schouten made the same observation psychophysically: subjects, when encouraged or cued, reported additional pitch matches at corresponding frequencies

(cf. section 2.1.2). In the model, a discontinuity occurs around fc = 2100 Hz, at which point the spectral components of the stimulus are better explained as harmonics 10 –

12 of a harmonic sound with f0 ≈ 190 Hz than as harmonics 9 – 11 of an f0 ≈ 210 Hz. Similar discontinuities around the point where fc is half-way between two harmonics of g (e.g. 2000 and 2200 Hz) have been observed psychophysically for shifted harmonic complex tones (e.g. de Boer, 1956a).

4.5 Amplitude-modulated noise

Sinusoidally amplitude-modulated (SAM) noise gives rise to a weak pitch corresponding to the modulation frequency, for modulation frequencies up to approximately 1 kHz


Figure 4.7: 200 Hz amplitude-modulated tones. A: waveform and power spectrum of a shifted HCT with frequency components 1840, 2040 and 2240 Hz, similar to a 2040 Hz pure tone sinusoidally modulated at 200 Hz. B: ln L(Ω) for three SAM tones with carrier frequencies fc of 1920, 2000 and 2080 Hz, and a constant modulation rate of g = 200 Hz. Stars indicate the ML estimate of Ω, showing a pitch shift for the tones with fc ≠ 2 kHz. C: ML estimates (circles) and the two highest side-peaks in ln L(Ω) (crosses) for SAM tones with fc ranging from 1900 to 2100 Hz. Note the discontinuity at fc = 2100 Hz. The dotted line indicates a linear approximation to subjects' matching behaviour derived by Schouten (1940) from psychophysical data (cf. section 2.1.2).

(Burns and Viemeister, 1976, 1981). The variability of this pitch is on the order of one semitone. Even though the pitch of narrow-band SAM noise may be mediated by mechanical distortions in the inner ear, modulation rates of SAM noise remain discriminable even when distortions at the modulation frequency are actively cancelled (Wiegrebe and Patterson, 1999; cf. section 2.1.2). Since SAM noise has a flat, featureless power spectrum, pattern matching fails to predict its pitch, while the SACF and related temporal models are sensitive to the amplitude modulations generated in high-CF fibres.

We tested the sensitivity of our model to SAM white noise with a modulation rate of 200 Hz (Figure 4.8A). The stimulus duration was 80 ms at a sampling rate of 24 kHz; no unmodulated noise was added. Figure 4.8B shows the estimated log-likelihood profiles for 20 different draws of white noise (blue curves) as well as the averaged profile (black). While ln L(Ω) tends to peak around 200 Hz on average, the true modulator frequency was not inferred as the most likely stimulus periodicity for every individual sample. The histogram of the ML estimates (24 bins per octave) is shown in Figure 4.8C: 19 out of 20 estimates fell within approximately one semitone of 200 Hz, indicating that the inferred periodicity is more labile than that of most other pitch-evoking sounds discussed in this chapter (with the exception of IRN 1−). There is another qualitative difference between the pitch of SAM noise and that of other pitch-evoking stimuli in the model. For the purpose of inferring Ω, we initially set the assumed acoustic SNR in the model to an arbitrary low value (−9 dB in the data shown). While the setting of this value was found to have little influence on the outcome of the periodicity estimation and its reliability, we observed that the likelihoods were overall higher for lower settings of the SNR. In fact, we found that the single most likely explanation of the observed AN response was obtained when the model assumed the stimulus to be pure noise (in which case, of course, all periodicities Ω are equally likely). An example for one SAM noise stimulus is shown in Figure 4.8D: the dotted line represents the (flat) likelihood profile under the “noise only” model. It is persistently higher than the likelihood profile under the model assuming −9 dB SNR. This means in turn that the model, when forced to decide whether the stimulus contains a periodic element at all, would prefer to regard the stimulus as entirely aperiodic. In order to explain the pitch of SAM noise, we could assume that the model integrates over SNRs in the stimulus (effectively treating the SNR as another unobserved nuisance variable), in which case a peak is expected to persist at 200 Hz, albeit a weak one. Alternatively, we could accept that a listener may be able to wilfully override his own point estimate of the SNR and listen to the sound under the prior assumption that a periodic component must be present in the stimulus. The ability to do so might, for example, depend on listening practice and training with SAM noise, or on musical expertise in general.
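A sketch of the stimulus (ours; we assume full, m = 1, modulation):

    import numpy as np

    def sam_noise(fm=200.0, fs=24000, dur=0.080):
        """Sinusoidally amplitude-modulated white noise; no unmodulated
        noise floor is added."""
        t = np.arange(int(fs * dur)) / fs
        modulator = 1.0 + np.sin(2 * np.pi * fm * t)   # non-negative envelope
        return modulator * np.random.randn(len(t))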


Figure 4.8: The pitch of 200 Hz sinusoidally amplitude-modulated noise. A: example stimulus waveform and spectrum. B: log-likelihood profiles for 20 samples of SAM noise (blue) and their mean (black). Individual profiles were shifted to have zero mean for ease of visualisation. C: histogram of the ML estimates of Ω for the samples shown in B (24 bins per octave). D: ln L(Ω) for a SAM noise stimulus, evaluated under two different models. Assuming pure noise in the stimulus (dotted line) yields overall higher likelihoods than assuming a low but finite SNR of −9 dB (solid curve).

4.6 Transposed complex tones

Transposed complex tones (TCTs) are the summed products of half-wave rectified sinusoidal modulators with high-frequency sinusoidal carriers, where the modulators share a common fundamental frequency f0^m (Figure 4.9A). The carrier frequencies are chosen such that the different modulation bands do not overlap in their resolved, peripheral excitation pattern. The stimulus modulation rates are reflected by periodic modulations of the peripheral filter outputs in the respective carrier frequency bands (Figure 4.9B, right and centre). Oxenham et al. (2004) designed this type of stimulus to test whether periodicity information is combined across different peripheral frequency channels irrespective of their spectral identity in terms of CF, as suggested by the summary autocorrelation model (see section 2.4.2). The stimulus frequencies used in their study were 300, 400 and 500 Hz for the modulators, and 4, 6.35 and 10.08 kHz for the carriers. As shown in Figure 4.9B (right), the periodic modulation of AN activity in the carrier bands gives rise to peaks in the autocorrelation function (ACF) of the corresponding channels, which coincide at a lag of τ = 10 ms. Following summation, the SACF (bottom) peaks at τ = 10 ms, resulting in a pitch prediction of 100 Hz. With TCTs embedded in a background of pink noise, human listeners were found to be unable to match f0^m, the fundamental frequency of the modulators. Furthermore, three out of four listeners were unable to perform f0^m discrimination at all, while the fourth subject had an f0^m difference limen of approximately 8%. At the same time, f0 discrimination and matching was possible for three-component missing-f0 HCTs (with harmonic component frequencies equal to the TCT modulator frequencies) under the same noise conditions. These results contradict the predictions of the SACF model. Not surprisingly, however, there is no basis for reporting a low pitch of 100 Hz in spectral pattern-matching models either.

We generated TCTs as described in Oxenham et al. (2004). Stimuli were 80 ms long, sampled at a rate of 32 kHz (the same rate that was used by the authors of the study) and embedded in pink noise at a total SNR of 0 dB. Figure 4.10 (top) shows the log-likelihood profiles for three different draws of background noise. In each instance, the model infers a stimulus periodicity corresponding to the high-frequency carrier bands, rather than the low fundamental of 100 Hz. At the same time, missing-f0 HCTs with component frequencies 300, 400 and 500 Hz in the same noise background give rise to an ML estimate at the fundamental (bottom). Thus, the (correct) failure to report a pitch of 100 Hz in the case of the TCT is not simply due to an overwhelming noise masker. Instead, we can understand the differential behaviour of the Bayesian model (which is lacking in the SACF model) intuitively by considering the statistics of naturally evoked AN responses to periodic sounds.
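A sketch of the TCT synthesis (ours, in the spirit of Oxenham et al. (2004); their band-limiting of the rectified envelopes, which keeps the modulation bands from overlapping, is omitted here for brevity):

    import numpy as np

    def transposed_complex_tone(fs=32000, dur=0.080,
                                modulators=(300.0, 400.0, 500.0),
                                carriers=(4000.0, 6350.0, 10080.0)):
        """Sum of high-frequency carriers, each amplitude-modulated by a
        half-wave rectified sinusoid; the modulators share f0 = 100 Hz."""
        t = np.arange(int(fs * dur)) / fs
        x = np.zeros_like(t)
        for fm, fc in zip(modulators, carriers):
            envelope = np.maximum(np.sin(2 * np.pi * fm * t), 0.0)
            x += envelope * np.sin(2 * np.pi * fc * t)
        return x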


Figure 4.9: A: generation of a transposed complex tone. Rectified sinusoidal envelopes with a common fundamental frequency are used to modulate high-frequency carriers, and the modulated carriers are summed. B: simulated auditory nerve response to a transposed complex tone (centre; modulator and carrier frequencies as in A). The carriers generate peaks in the average AN firing rates, independent of the modulator frequencies (left). The modulators cause regular peaks in each of the three carrier frequency bands, which coincide at lag τ = 10 ms and cause a peak in the SACF (right).

The observation of amplitude modulations of 300, 400 and 500 Hz in frequency channels with matching CFs is perfectly reasonable for an approximately harmonic sound with f0 = 100 Hz, as these are indicative of harmonics 3 to 5 in the stimulus. Contrarily, the observation of the same set of modulation rates transposed into high-frequency channels is not as typical for harmonic sounds with f0 = 100 Hz. Since the harmonics of f0 are unresolved in this high frequency range, they should evoke modulations of 100 Hz in each channel, unless the phase relationships between harmonics happen to be set very specifically (differently in each carrier band) so as to generate envelope modulation rates of 300, 400 and 500 Hz in the filter outputs. To the degree that our generative model captures this simple intuition, it should therefore down-weight the evidence for a low, common fundamental provided by the high-CF channels in the case of the TCT, compared to the correctly matched low-CF channels in the case of a missing-f0 HCT.


Figure 4.10: Top: ln L(Ω) for a transposed complex tone with f0^m = 100 Hz in three different backgrounds of pink noise. Carriers of 4, 6.35 and 10.08 kHz were modulated at rates of 300, 400 and 500 Hz, respectively. Bottom: ln L(Ω) for three missing-f0 HCTs with component frequencies equal to the modulator frequencies.

As discussed in section 2.4.2, a recent extension of the SACF model by Balaguer-Ballester et al. (2008) leads to the (desirable) failure of the model to identify f0^m as the pitch of the TCT, except when the stimulus intensity in the simulation is lowered considerably. As the central, low-pass SACF (LP-SACF) pitch processor still does not take peripheral CF into account during the SACF computation, the change in model behaviour was most likely effected by the use of a more complex model of the peripheral response (Sumner et al., 2002) compared to earlier versions of the SACF model (e.g. Meddis and O'Mard, 1997). Without further psychophysical evidence regarding level dependence on the one hand, and a demonstration of the phenomenological validity of the peripheral model response to TCT stimuli in noise on the other, it is therefore impossible to draw strong conclusions regarding the value of the results by Oxenham et al. (2004) as evidence for or against SACF-like models, or against pitch-processing strategies that disregard the coherence or incoherence of filter CFs and their output firing patterns in general. In any case, the Bayesian model responds differentially to HCTs and TCTs despite its simple peripheral model, which yields no distinction between these stimuli when used with a SACF-like read-out.

4.7 Discussion

The evaluations in this chapter have shown that the Bayesian model is able to infer the true periodicity of periodic sounds such as pure tones and HCTs. These sounds are well described by the acoustic component of the generative model. In addition, the AN activity pattern used as input to our inference algorithm was generated according to the same peripheral model that we then assumed during inference. Thus, these stimuli provide in many ways ideal conditions for the model, and any failure to infer the correct periodicity would have been indicative of a severe deficiency of our approximate inference schemes. While the model is able to infer the true period duration, and thus to predict the pitch frequency, we have observed that it fails to account for the degradation of pitch strength as a function of the rank of the lowest harmonic of an HCT. A similar failure was previously observed for the SACF model and interpreted as evidence against its suitability as a unified model of pitch.

The Bayesian model produces periodicity estimates for a variety of non-periodic sounds that are in close accordance with human pitch-matching behaviour, even though the generative model assumptions are substantially violated. The model predicts the pitch shift of SAM tones, the pitch of IRN+ and IRN− (including the dependence of the latter on the number of iterations), and the pitch of SAM noise. We have seen that the pitch of SAM noise is particularly weak in the model, in that the highest likelihoods overall are achieved by assuming an SNR of zero, at which point periodicity is no longer detectable. Integration over SNRs, or a strong subjective prior in favour of the presence of a periodic sound, could explain the existence of the percept despite pure noise being the single most likely interpretation of the evoked AN response. For all these stimuli, the SACF model makes virtually identical predictions. This demonstrates that periodicity estimates based on AN autocorrelation can be optimal for a considerable range of pitch-evoking stimuli.

Transposed complex tones (TCTs) were presented as an example where the predictions of the SACF and the Bayesian model differ. While the SACF predicts a low pitch at the missing fundamental f0^m = 100 Hz of the modulators, the Bayesian model predicts a high pitch around one of the carrier frequencies (4 kHz), explaining why subjects are unable to discriminate the pitch of TCTs with different f0^m (but equal carriers).

Several of the phenomena discussed in section 2.1 are not captured by the model. We found no tendency of the model to infer periodicities close to the edges of steeply filtered low- or high-pass noise, contrary to the pitch reported by human listeners (Small and Daniloff, 1967; Fastl, 1971). Lateral inhibition has been suggested as a potential explanation, as it could give rise to a peak in the peripheral (or central) representation of the stimulus spectrum around the edge frequency, and physiological evidence for lateral inhibition has been found in chopper neurons of the cochlear nucleus (Rhode and Greenberg, 1994; cf. section 2.3.1). To the best of our knowledge, neither SACF-like nor spectral models have been reported to predict spectral edge pitch, and our own implementation of the SACF model (cf. section 5.4) did not show such behaviour either. Note, though, that the spectral model by Cohen et al. (1995) does include a stage of centre-surround lateral interaction, which could in principle give rise to the desired effect. A further phenomenon which the Bayesian model fails to predict is the small pitch shift (≈ 1%) of HCTs with a mistuned low-rank partial (Moore et al., 1985). In our model, the estimated periodicity simply remained at the fundamental of the harmonic stimulus components. Meddis and O'Mard (1997) reported pitch shifts in the SACF model when pitch was determined by matching the Euclidean distance D² between target and reference SACFs (cf. section 2.4.2), but even then the amount of shift was underestimated by a factor of approximately two.

Out of the unexplained phenomena above, we consider the failure to account for the weakened pitch of high-rank missing-f0 HCTs the most concerning, as it subjects the Bayesian model to the same fundamental criticism regarding its suitability as a unified model of pitch as the SACF model. As we will see in section 5.5, an extension of the generative model aimed at explaining pitch-timbre interactions in human perception turns out to provide a promising solution to the problem of pitch strength. Notably, the proposed model extension is rooted in the statistical properties of natural, pitch-evoking sounds, providing functional insight in addition to an improved phenomenological description of human pitch perception.

Chapter 5

Octave biases and timbral effects in the perception of non-uniform periodic pulse trains

5.1 Motivation

All results presented thus far were obtained from the basic model described in section 3.1.1, in which the stimulus period Ω and the acoustic impulse response f are treated as independent in the generative process. As f determines the shape of the spectral envelope of the sound, and Ω the spacing of spectral peaks underneath this envelope (the lowest of which is the fundamental frequency f0 = 1/Ω), sounds generated according to this model will contain, on average, the same amount of energy in any given range of frequencies, independent of the fundamental being low or high. Even more: since the generative distribution over impulse responses is temporally uncorrelated, the average spectrum of f is essentially white (cf. Figure 3.4). Hence, sounds of all periodicities are a priori assumed to have flat, broadband spectral envelopes under the model. Perceptually, the spectral envelope of a sound is a major determinant of its timbre, while the periodicity determines – by and large – its pitch. Thus, one might say that our basic, uncoupled model assumes pitch and timbre to be independent.

For natural pitch-evoking sounds, however, this assumption does not seem to hold, either acoustically or perceptually. In this chapter, we review evidence for a systematic relationship between the fundamental frequency and the spectral envelope in the acoustic properties of natural pitched sounds (such as those produced by human voices and musical instruments) on the one hand, and for interactions between the perceptual attributes of pitch and timbre on the other. Based on this evidence, we propose an extended model that incorporates a coupling between these two quantities in the generative process, and fit the coupling parameters to a collection of instrumental and vocal sounds. We then present psychophysical data that demonstrate the ability of our extended model to accurately predict effects of timbre on octave biases in the perception of acoustic stimuli with inherent octave ambiguities. While these effects arise naturally in our model as a consequence of Bayesian cue combination, they prove a difficult challenge for existing, more heuristic models based on either spectral pattern matching or autocorrelation analysis of the auditory nerve firing pattern. Finally, we go on to show that our extended model captures a perceptual phenomenon which was previously unexplained by the simpler, uncoupled model: a dependence of the pitch strength of missing-f0 harmonic complex tones on the rank (i.e. harmonic number) of the lowest harmonic in the spectrum arises through the coupling, matching a similar dependence observed in human interval identification performance and pitch discriminability.

5.1.1 Timbre, brightness and spectral centroid

Four properties — duration, loudness, pitch and timbre — are commonly used to describe the perceptual quality of a sound. Of those four, timbre is arguably the least well defined. In fact, a typical approach is to define timbre as the ensemble of all those qualities which distinguish sounds of equal perceived pitch, loudness and duration¹ (Plomp, 1970; ANSI, 1994; see Houtsma, 1997 for a review): for example, the same note played on a violin and an oboe. Owing to its vague definition, solely in terms of negatives, timbre has somewhat disrespectfully been described as the “psychoacoustician's multidimensional waste-basket” (McAdams and Bregman, 1979). Several studies have attempted to identify and disentangle the underlying dimensions of this complex perceptual space (e.g. Lichte, 1941; Grey, 1977; McAdams et al., 1995), based on similarity ratings made by listeners between pairs of sounds. One dimension which has been identified consistently across studies correlates physically with the

¹ Spatial location appears to be an often-neglected aspect in this list of attributes.

spectral centre of mass (centroid), and is commonly described verbally as “brightness”. While other physical parameters (such as rise and decay rates of the waveform envelope, noisiness, or the presence of amplitude and frequency modulations) also contribute to the timbre of a sound, we will focus on brightness in the following. Brightness has not only been studied more carefully than any other aspect of timbre in its effect on pitch perception in the past (see section 5.1.3); it is also the aspect which is most readily incorporated into our existing statistical framework without the need for severe structural changes to the model, as we will show in section 5.2.

5.1.2 Relationship between fundamental frequency and spectral centroid in natural sounds

Over the broad range of pitch-evoking sounds that listeners (animal or human) are likely to encounter in their environment, there is a priori good reason to assume that their fundamental frequencies and spectral centroids should in fact depend on each other. Vocalisations, arguably the most common and relevant type of pitched sounds, are generated by the excitation of a resonator – the vocal tract, encompassing the laryngeal, pharyngeal, oral and nasal cavities in mammals – by the periodically modulated air flow originating from the primary sound source, namely the vocal folds (mammals) or syrinx (birds). It is an almost trivial observation that small animals not only tend to modulate their air flow at higher rates than larger animals, but also have shorter vocal tracts with correspondingly higher resonance frequencies²: compare, for example, the roar of a lion to the squeak of a rat. The same holds true not only across, but also within species: the vocal folds of women and children open and close at higher rates than those of men when speaking at their natural pitch, while at the same time the speech formants, i.e. dominant peaks in the spectral envelope, are shifted towards higher frequencies (Peterson and Barney, 1952), owing to their shorter vocal tracts (Patterson et al., 2008). Measuring the formant frequencies in the vocal tracts of soprano singers as they sang scales on different vowels, Joliveau et al. (2004) found that a similar positive correlation between fundamental and centroid appears to exist even within a single individual, as the pitch moves from the bottom to the top of the

² Even though the resonance frequencies of the vocal tract are difficult to determine exactly due to its complex shape, an inverse relationship between size (in any dimension) and resonance frequency exists for analytically tractable resonator shapes, such as cylindrical tubes, spheres or cuboids.

vocal range³. Another common source of pitched sounds are musical instruments. As with vocalisations, it is similarly true that low-pitched sounds are typically produced by large instruments with low resonance frequencies, while high-pitched instruments are small and resonate best at high frequencies (van Dinther and Patterson, 2006).

The variation of the spectral centroid with f0, however, depends on the instrument type: while fundamental and centroid appear to co-vary strongly in some instruments, the spectral envelope remains more or less fixed in others (see for example Figure 5.1). We will quantify this relationship more carefully for a large ensemble of instrumental and vocal sounds in section 5.2.2.

5.1.3 Psychophysical effects of timbre on pitch

The previous section was concerned with the relationship between fundamental frequency and spectral centroid in the statistics of natural sounds; but what about their (approximate) perceptual counterparts, pitch and timbral brightness? In a study by Plomp and Steeneken (1971), subjects rated the perceptual dissimilarity between pairs of tones that differed in their fundamental frequency, their spectral envelope, or both at the same time. From the structure of these dissimilarity ratings, Plomp concluded that pitch and timbre are effectively processed as independent dimensions: the dissimilarity of tone pairs that differed in both fundamental and spectral envelope was found to be close to the summed dissimilarities along either dimension alone. Such a “city block” metric, as opposed to a Euclidean metric, in the judgement of dissimilarities along multiple dimensions is often regarded as evidence for perceptual separability (Garner, 1974). Several recent studies (Marozeau et al., 2001, 2003; Schubert and Wolfe, 2006; Marozeau and de Cheveigné, 2007) have indeed confirmed that differences in pitch have only rather limited influence on the perception of simultaneous differences in timbre (brightness in particular), and furthermore suggest that overall pitch-dependent shifts of sounds in timbre space (leaving timbre differences invariant) are also small.

While the effect of pitch on timbre may be small, there is a substantial body of evidence suggesting that timbre can affect the perception of pitch more severely. According to Hesse (1982), compositional practice since the nineteenth century has been aware of the need to compensate for gross timbral differences of certain instruments, alternating

³ Note, though, that the effect is strongest for fundamental frequencies above the natural speaking range.

Figure 5.1: Spectra of a violin, a trumpet and a piano, each playing two notes separated by two octaves (lower-pitched note in blue, higher-pitched note in red). Samples taken from the University of Iowa Musical Instrument Samples database.

in the execution of a continuous melody, by octave transpositions, in order to avoid perceived breaks in the melodic contour⁴. Similarly, von Helmholtz (1863) stated that “although the pitch of a compound tone is, for musical purposes, determined by that of its prime, the influence of the upper partial tones is by no means unfelt. They give the compound tone a brighter and higher effect”. Hesse (1982) himself conducted experiments in which he had music students transcribe melodies that were played with synthesised sounds of various degrees of spectral brightness. When two notes of the same fundamental frequency were played in sequence while the timbre was considerably brightened at the same time, students tended to transcribe the interval as an octave rather than a unison. Similarly, a downward leap (in fundamental frequency) of a fourth would often be transcribed as an upward leap of a fifth, again differing from the true interval between the two fundamentals by an octave. Hesse concluded not only that brightness and pitch are dependent upon one another, but also that brightness is “a component or dimension of pitch”. This interpretation was shared by van Noorden (1982), who suggested that the perception of pitch height (i.e. that aspect of pitch which distinguishes between different notes separated by one or several octaves) and brightness are mediated by the same underlying process. Robinson (1993) contested this view: in his experiments, subjects had to judge either whether the pitch interval between two notes was a unison or an octave, or whether the instruments playing the two notes were the same or different⁵. Subjects, musically trained and untrained alike, were able to discriminate instrument identity correctly (presumably based on brightness) in the presence of differences in pitch height. Like Hesse (1982), however, he found that pitch-height judgements were strongly impaired when the two notes were played on different instruments, i.e. with different brightness (more so, and for longer durations, in musically untrained subjects than in musicians). From this asymmetry, Robinson concluded that the perception of pitch height and brightness are based on separate processes, but with a unidirectional, partial dependency of pitch height on timbre.

Aside from the rather large-scale effect of brightness on the perception of octave position within the same pitch class (chroma), interactions between pitch and timbre have also been documented in the discrimination of nearby fundamental frequencies. Singh and Hirsh (1992) showed that an increase in brightness can perceptually compensate for a

⁴ A mention of this rule is seemingly found in F. A. Gevaert's handbook on instrumentation, “Traité général d'instrumentation”, from 1863 (Hesse, 1982).
⁵ Two pitches, 130 and 260 Hz, and two synthetic instruments with different spectral brightness were used.

decrease in the fundamental frequency of up to 2% (i.e. less than half a semitone), such that the pitch is perceived as identical. Vurma et al. (2010) recently confirmed these results using natural vocal and instrumental sounds, indicating that timbre-induced perceptual pitch shifts are likely to affect the subjective quality of intonation in everyday music performance. Measuring reaction times for pitch and timbre discrimination, Melara and Marks (1990) found an interference effect between timbre and pitch in a Garner speeded classification task: both attributes were varied simultaneously in order to assess the influence of changes in the unattended dimension on the processing speed of the attended one (Garner, 1974). The pitch differences were of a similar relative magnitude as in the study by Singh and Hirsh (1992) (close to 2%). This finding was recently confirmed by Silbert et al. (2009), who speculated that “the locus of interactive effects between f0 and spectral shape is, in some respect, postsensory”, occurring after f0 and spectral shape have been extracted, more or less independently, from the spacing between spectral components (f0) and their relative amplitudes (shape), respectively.

What about interval sizes other than octaves and microtones, which would be indicative of an effect of timbre not only on pitch height, but also on pitch class? Here, the evidence is scarcer. Krumhansl and Iverson (1992) demonstrated that reaction-time interactions, akin to those found by Singh and Hirsh (1992), occur also for fundamentals separated by a tritone (i.e. half an octave). Pitt (1994) extended these results, again for a tritone interval, comparing the performance of musicians and non-musicians. While reaction-time effects were observed in both groups, non-musicians (but not musicians) were also prone to pitch misclassifications (same/different) when the timbre changed simultaneously. Somewhat worryingly, though, subjects not only reported changes in pitch when the fundamental remained unchanged; they also failed to report a true difference in fundamental frequency in 40% of all cases. Considering that the tritone is typically regarded as the most strongly contrasting interval in Western tonal music, this raises the issue of whether the non-musical listeners were simply overburdened with the experimental task, which required them to detect and report changes in both timbre and pitch at the same time on each trial. For example, a strong difference in timbre may have diverted their attention from changes in pitch, rather than exactly compensating for the tritone difference in f0. This explanation seems particularly likely, as there is no indication that the two different timbres used in the experiment were chosen deliberately to produce an effect on pitch of approximately a tritone.

There is the well-documented phenomenon of a transition from missing-fundamental pitch to a purely spectral pitch as the rank of the lowest harmonic of a missing-fundamental complex tone is increased, especially when the total number of harmonics is low (e.g. Smoorenburg, 1970; Houtsma and Fleuren, 1991; Renken et al., 2004). For natural sounds, however, the fundamental frequency is typically present in the spectrum, and brightness is determined by the degree to which a continuous spectral envelope extends into high-frequency regions, rather than by small isolated groups of high-rank harmonics.

In summary, there is solid evidence for an effect of timbre, and in particular brightness, on the perception of pitch height within the same pitch class under naturalistic stimulus and listening conditions, as well as for small timbre-induced biases within approximately a quarter-tone range around the true fundamental. Evidence for systematic effects of timbre on pitch chroma is less extensive and less convincing in comparison. From a methodological point of view, octave-scale pitch differences would appear to be more readily measurable than microtonal differences close to the limit of discriminability, both experimentally and in a model. We therefore choose to focus on the influence of brightness on the perception of within-chroma octave position throughout the remainder of this chapter.

5.2 Incorporating f0-dependent timbral characteristics into the Bayesian model

In section 5.1.2, we argued for a dependence between fundamental frequency and spectral centroid in the statistics of natural sounds. Even though we have described an abstract formalism capable, in principle, of linking these two quantities in our generative model (cf. section 3.1.2), all results in chapter 4 were obtained using a model in which they were a priori uncoupled. The primary goal of this section is to describe, in more practical terms, how to link fundamental and centroid frequency in the model so as to better capture their dependency as measured in a database of natural pitched sounds. Several qualitative predictions regarding the influence of brightness on pitch arise from the extended model. In section 5.3, we present new psychophysical data from human listeners in a task specifically designed to test these predictions. We will go on to demonstrate that these data are quantitatively well described by our extended Bayesian model, and compare its performance to several established models of pitch perception based on more heuristic computations, such as pattern matching of the peripherally-resolved spectrum (Terhardt, 1974; Wightman, 1973) or autocorrelation analysis of evoked auditory nerve responses (Meddis and Hewitt, 1991; Bernstein and Oxenham, 2005; Balaguer-Ballester et al., 2008).

5.2.1 General parametric form

Our initial specification of the Bayesian model (section 3.1.2) allows for a generative coupling between the periodicity Ω and spectral properties of the impulse response f by means of the following two-step procedure:

1. First, an impulse response f is drawn from the distribution $P(f) = \sum_s \pi_s \, \mathcal{N}(0, \Psi^s)$, a mixture of Gaussians with diagonal covariance matrices $\Psi^s$ (cf. equation (3.6)).

2. Next, f is convolved with a filter kernel hΩ, the shape of which can depend on Ω in an arbitrary way. In expectation, f is spectrally white prior to filtering, and hence hΩ fully determines its spectral envelope.

We also showed that this procedure is formally equivalent to drawing f from a different distribution $P_\Omega(f) = \sum_s \pi_s \, \mathcal{N}(0, \Psi_\Omega^s)$, where each covariance matrix $\Psi_\Omega^s$ is obtained by multiplying the diagonal matrix $\Psi^s$ on both sides with $H_\Omega$, the convolution matrix implementing the filter $h_\Omega$: $\Psi_\Omega^s = H_\Omega \Psi^s H_\Omega^T$ (cf. equation (3.14)).
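This equivalence is straightforward to verify numerically. Below is a minimal sketch (our illustration, not the thesis code; the dimensions, decay rate and kernel width are arbitrary choices) comparing the analytic covariance $H \Psi H^T$ against the empirical covariance of smoothed draws:

```python
import numpy as np

# Minimal numerical check of the equivalence stated above (our illustration,
# not the thesis code; dimensions, decay rate and kernel width are arbitrary).
T = 64
psi = np.diag(np.exp(-np.arange(T) / 20.0))   # diagonal prior covariance Psi

t = np.arange(-8, 9)
h = np.exp(-0.5 * (t / 2.0) ** 2)             # Gaussian smoothing kernel

# Convolution matrix H: column j carries the kernel centred on sample j.
H = np.zeros((T, T))
for j in range(T):
    for k, tk in enumerate(t):
        if 0 <= j + tk < T:
            H[j + tk, j] = h[k]

cov_analytic = H @ psi @ H.T                  # covariance after filtering, cf. eq. (3.14)

# Monte-Carlo check: smooth draws from N(0, Psi), estimate their covariance.
rng = np.random.default_rng(0)
draws = rng.standard_normal((100_000, T)) * np.sqrt(np.diag(psi))
cov_empirical = np.cov(draws @ H.T, rowvar=False)
print(np.max(np.abs(cov_analytic - cov_empirical)))   # small, up to sampling noise
```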

In order to apply this general framework in our model, we need to choose a workable parametrisation of hΩ: one that is simple enough to fit yet sufficiently powerful to capture those aspects of the statistical dependency of Ω and f in natural sounds that matter the most from a perceptual point of view. We will approach this issue in two stages. First, we choose a family of kernels that is parametrised, for each value of Ω, by only a single shape parameter λ(Ω). Following that, we will show how to fit the scalar function λ(Ω) to a database of instrumental and vocal sounds in section 5.2.2.

Figure 5.2: Effect of smoothing the model impulse response with a Gaussian kernel. A: Waveform representation of a draw from P(f) before (left) and after smoothing (right) with a Gaussian kernel (centre). B: Spectral representation of the same draw and kernel. C: Effect of smoothing on the covariance matrix of P(f).

We can use the spectra depicted in Figure 5.1 as guiding examples in order to define a set of desiderata for a suitable parametrisation of hΩ. Firstly, like the spectra of many natural pitched sounds, they are skewed: from a peak in the frequency region around the fundamental, or perhaps a low-ranking harmonic, the spectral envelope decays rather smoothly as frequency increases. Secondly, as the pitch increases (Figure 5.1, red versus blue curves), the envelopes appear to stretch, rather than shift, while largely retaining their characteristic, skewed shapes. Based on these two observations, we chose to model hΩ simply as Gaussian kernels, each with a single shape parameter λ(Ω) corresponding to its width:

\[ h_\Omega(t) \;\propto\; \exp\left( -\frac{1}{2}\,\frac{t^2}{\lambda(\Omega)^2} \right) \tag{5.1} \]

The Fourier amplitude spectrum of such a kernel is given by the positive half of a Gaussian function, centred at 0 Hz, with its spectral width λ̂(Ω) inversely proportional to λ(Ω):

\[ \mathcal{F}\{h_\Omega\}(f) \;\propto\; \exp\left( -\frac{1}{2}\,\frac{f^2}{\hat{\lambda}(\Omega)^2} \right) \tag{5.2} \]

with

\[ \hat{\lambda}(\Omega) = \frac{1}{2\pi\,\lambda(\Omega)} \,. \tag{5.3} \]
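As a quick numerical sanity check of equations (5.2) and (5.3), a sketch such as the following (our illustration; the width and sampling rate are arbitrary choices) recovers the predicted spectral width from the discrete Fourier transform of a sampled kernel:

```python
import numpy as np

# Quick numerical sanity check of eqs. (5.2)-(5.3): the magnitude spectrum of
# a Gaussian kernel of temporal width lambda is Gaussian with spectral width
# 1 / (2*pi*lambda). Width and sampling rate here are arbitrary choices.
lam = 0.5e-3                                  # temporal width: 0.5 ms
fs = 44000.0
t = np.arange(-0.02, 0.02, 1.0 / fs)          # +-20 ms support
h = np.exp(-0.5 * (t / lam) ** 2)

spec = np.abs(np.fft.rfft(h))
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)

# frequency at which the magnitude falls to exp(-1/2) of its peak
f_width = freqs[np.argmin(np.abs(spec - spec[0] * np.exp(-0.5)))]
print(f_width, 1.0 / (2 * np.pi * lam))       # both close to ~318 Hz
```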

The effect of such a Gaussian-shaped kernel hΩ (with an arbitrarily chosen width for now) on a draw from P(f) (i.e. a mixture of Gaussians with diagonal covariances) can be seen in Figure 5.2 (panels A and B). Figure 5.2C demonstrates how the covariance structure of f is changed by this filtering operation. Prior to filtering, the covariance matrix is diagonal (with variances decaying along the diagonal according to equation (3.4)). After filtering with hΩ, there is a band of positive covariances around the diagonal, dropping off with increasing distance from it, that enforces local smoothness in filtered draws of f. Our choice of kernel, as Figure 5.2 shows, already meets two of our desiderata. Firstly, its spectral envelope peaks at 0 Hz. When multiplied with a harmonic comb spectrum, such as that of the pulse train δΩ in our model, the resultant sound will on average have its highest spectral peak at the fundamental and decay with a smooth, Gaussian-shaped envelope towards higher frequencies. Our final requirement, that the spectra of higher-pitched sounds should essentially be stretched versions of the spectra of lower-pitched sounds, imposes constraints not on the general kernel shape as such, but on the dependence of its width λ(Ω) on the period duration of the pulse train. In order to meet this requirement, we need to specify λ(Ω) such that it increases with Ω. Since f0 = 1/Ω and λ̂ ∝ 1/λ, this will cause the spectral width λ̂ to increase with f0 and result in the desired stretching effect. From visual inspection of the spectra in Figure 5.1 we can perhaps make the educated guess that λ(Ω) should grow sub-linearly, as a two-octave increase in f0 appears to result in a stretching of less than two octaves in the envelope in all three cases. The exact relationship between Ω and λ will be determined quantitatively in the following section.

5.2.2 Fitting the timbral f0-dependence to natural pitched sounds

We used the sounds of 20 orchestral instruments over their entire playing range, provided by the University of Iowa Musical Instrument Samples database6, as well as recordings of two-octave scales sung on different vowels7. For each note, the fundamental frequency (f0) and the centroid of the spectral envelope (fc) were estimated automatically (see Appendix B for the algorithm). We restricted the range of pitches to below 2500 Hz (approximately an E♭7), for two reasons: firstly, the f0-estimates became increasingly prone to octave mistakes at higher frequencies; secondly, the few samples in this highest f0 region all originated from only a single instrument (piano).

6 The instruments contained in this data set are: piano, flute (concert, alto, bass), oboe, clarinet (E♭, B♭, bass), bassoon, saxophone (soprano, alto), french horn, trumpet (B♭), trombone (tenor, bass), tuba, violin, viola, cello, double bass. Out of three loudness levels available for each instrument, only the medium-loudness samples, without vibrato where applicable, were used. All samples were recorded in anechoic conditions using high-fidelity recording equipment. The samples are freely available at http://theremin.music.uiowa.edu/MIS.html.
7 Scales on the vowels [e], [A], and [O] (closed “eh”, open “ah” and open “oh”), sung by the author and one professional female singer, were recorded in an anechoic room.

Figure 5.3 shows a scatter-plot of the f0 and fc estimates across our entire ensemble of sounds. A positive correlation between f0 and fc is evident from the raw data. In addition, we binned the f0s into successive whole-tone intervals and computed ⟨fc⟩(f0), the mean centroid frequency for each bin (green line in Figure 5.3). We found that the resulting f0–fc relationship was well fit by a scaled square-root function,

\[ \langle f_c \rangle (f_0) = a \cdot \sqrt{f_0} \,. \tag{5.4} \]

We fitted the scaling parameter a to our data set, using generalised linear regression to minimise $\sum_i (f_{c,i} - a \sqrt{f_{0,i}})^2$, the summed squared error between predicted and observed fc across all data points (Figure 5.3, solid red line). This approach yielded an estimate of a = 56.8. Since this estimate might be unduly dominated by the great number of samples with f0 below 500 Hz, we also fit a to the f0-binned means (green line) directly, thereby compensating for the varying density of samples along the f0-axis. This second estimate agreed with the first to within 2%, i.e. almost perfectly. Even though the space of possible functional relationships was not explored systematically any further, we found that a linear relationship, fc = m·f0 + c (Figure 5.3, dashed red line), provided a slightly worse fit to the data despite having an additional free parameter (for both the raw and the binned estimates). Considering the visually convincing agreement with our empirical binned centroids, we committed to the square-root relationship of equation (5.4).
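For the one-parameter model of equation (5.4), the least-squares estimate has a closed form, so the fit can be reproduced in a few lines. The sketch below (our illustration) uses synthetic placeholder data rather than the actual instrument and vocal measurements:

```python
import numpy as np

# Sketch of the one-parameter square-root fit of eq. (5.4). The closed-form
# least-squares solution for fc = a*sqrt(f0) is a = sum(fc*sqrt(f0)) / sum(f0).
# Synthetic placeholder data below, not the Iowa/vocal measurements.
rng = np.random.default_rng(1)
f0 = rng.uniform(50.0, 2500.0, size=1000)
fc = 56.8 * np.sqrt(f0) + 100.0 * rng.standard_normal(f0.size)

x = np.sqrt(f0)
a_hat = np.sum(fc * x) / np.sum(x * x)        # minimiser of sum_i (fc_i - a*x_i)^2
print(a_hat)                                  # close to 56.8 for these synthetic data
```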

Having described the relationship between f0 and fc in our ensemble of musical instrument sounds, how should we choose the smoothing time scale λ(Ω) in our model so as to match these natural statistics? A scaled half-normal distribution of width λ̂(Ω), such as the average spectral envelope of the impulse response f in our model after smoothing, has a mean (i.e. centre of mass) of

\[ f_c(\Omega) = \sqrt{2/\pi} \; \hat{\lambda}(\Omega) \,. \tag{5.5} \]

If we re-express fc in equation (5.4) as a function of Ω = 1/f0, substitute it into equation (5.5) and solve for λ̂(Ω), the spectral width, we obtain

\[ \hat{\lambda}(\Omega) = a \cdot \sqrt{\frac{\pi}{2\,\Omega}} \,. \tag{5.6} \]

Using equation (5.3), we can finally solve for λ(Ω), the temporal width of the Gaussian smoothing filter as a function of the period Ω:

\[ \lambda(\Omega) = \frac{\sqrt{\Omega}}{a \cdot \pi^{3/2} \sqrt{2}} \,. \tag{5.7} \]

Using the previously determined value of a = 56.8, draws from our generative model will show the same average relationship between period (or fundamental frequency) and spectral centroid as observed in our natural sound ensemble. In the following, Ω is typically expressed in milliseconds, rather than seconds: an additional factor of √1000 is to be applied to the right-hand side of equation (5.7) in this case, yielding a combined factor of approximately 0.07069. Thus, within 1% of our true estimate of the scaling relationship, we obtain the model fit used throughout the remaining experiments:

\[ \lambda(\Omega\,[\mathrm{ms}]) = 0.07 \, \sqrt{\Omega\,[\mathrm{ms}]} \,. \tag{5.8} \]
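Putting equations (5.1) and (5.7)–(5.8) together, the Ω-dependent smoothing kernel can be sketched as follows (our illustration, not the thesis code; the sampling step matches the 44 kHz rate used later, everything else is an arbitrary choice):

```python
import numpy as np

def lambda_of_omega(omega_ms, a=56.8):
    """Temporal kernel width of eq. (5.7), with Omega given in milliseconds."""
    return np.sqrt(1000.0 * omega_ms) / (a * np.pi ** 1.5 * np.sqrt(2.0))

def smoothing_kernel(omega_ms, dt_ms=1.0 / 44.0):
    """Discretised Gaussian kernel h_Omega of eq. (5.1), sampled every dt_ms."""
    lam = lambda_of_omega(omega_ms)
    t = np.arange(-4.0 * lam, 4.0 * lam + dt_ms, dt_ms)   # +-4 sigma support
    h = np.exp(-0.5 * (t / lam) ** 2)
    return h / h.sum()                                    # normalise to unit area

print(lambda_of_omega(2.0))   # ~0.100 = 0.0707 * sqrt(2), consistent with eq. (5.8)
```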

5.2.3 Qualitative predictions

Based on the nature of the coupling between periodicity and spectral envelope in our generative model, we can make some qualitative predictions regarding the dependence of pitch perception on timbral stimulus features that we would expect to find in the behaviour of a Bayesian ideal observer. Owing to the periodicity-dependence of spectral brightness during sound generation, brightness becomes informative about periodicity during inference. Since the expected spectral centroid increases monotonically with fundamental frequency, the presence of strong high-frequency components in the spectrum is broadly indicative of high pitches, whereas a concentration of spectral energy only at low frequencies points towards low pitches. During inference, these timbral cues need to be weighted against evidence arising from the temporal periodicity of the auditory nerve responses.

Figure 5.3: Fundamental frequency (f0) and spectral centroid (fc) for sounds from a collection of 20 musical instruments and two voices. Black circles represent f0 and fc for each tone from the ensemble. The green line shows the mean fc for each whole-tone bin along the abscissa; the shaded area represents ±1 standard deviation around the mean. Shown in red are the best linear (dotted) and square-root (solid) fits to the raw data set.

It is one of the fundamental properties of Bayesian cue combination that different cues are weighted according to their reliability (e.g. Ernst and Banks, 2002). For periodic sounds that evoke strongly peaked, periodic envelope modulations across many peripheral frequency channels, the temporal cue alone will determine the sound periodicity almost unambiguously and with much greater certainty than the spectral envelope by itself. In cases like this we would expect the combined periodicity estimate to be dominated by the temporal cue and largely independent of timbre. However, we do expect to see a stronger influence of timbre on pitch in cases where a periodicity estimate based on the temporal cue alone is ambiguous: in this case, the centroid of the spectral envelope will be useful in distinguishing between alternative interpretations of the observed data.
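As a standard illustration of such reliability weighting (a textbook Gaussian cue-combination identity, not part of our model's actual inference machinery), suppose the temporal and spectral cues provided independent Gaussian likelihoods for the log-period, with means μt, μs and variances σt², σs². The combined estimate is then the precision-weighted average

\[ \hat{\mu} \;=\; \frac{\sigma_s^2\,\mu_t + \sigma_t^2\,\mu_s}{\sigma_t^2 + \sigma_s^2} \,, \]

so that an unambiguous temporal cue (σt² → 0) leaves no room for spectral influence, whereas an ambiguous one (large σt²) hands the estimate over to the spectral centroid.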

Therefore, we should be able to bias a listener’s pitch percept of temporally ambiguous sounds (but not that of temporally unambiguous sounds) by manipulating their spectral brightness, provided that the listener judges pitch like an ideal observer that assumes a dependency between periodicity and spectral envelope similar to the one embodied in our coupled model. In the next section, we will present an experiment designed specifically to test this prediction.

5.3 Octave biases in the perception of non-uniform periodic pulse trains

The most common type of error in pitch judgements, aside from small deviations around the true pitch due to limited discriminability, is the octave mistake. As we discussed in section 5.1.3, these errors are strongly influenced by the brightness of the stimulus timbre. Hence, in order to test our hypothesis and model, we sought a stimulus with a clearly defined pitch chroma on the one hand, but with high octave ambiguity on the other. One simple class of stimuli with exactly this property are periodic pulse trains with amplitudes alternating between two values a1 and a2. For an inter-pulse interval of duration ∆t and even pulse amplitudes a1 = a2, the pulse train is clearly periodic at a rate of 1/∆t. For a2 = 0, the train is obviously periodic at half that rate, 1/(2∆t). For any other combination of amplitudes a1 ≠ a2, the pulse train is also periodic at this slower rate, but it might be mistaken for the high rate perceptually, especially if a1 and a2 are almost identical, and in the presence of added noise. Thus, by varying the relative amplitude r = a1/a2 from 1 towards ∞, one would expect subjects to change their pitch judgements from 1/∆t (high) initially to 1/(2∆t) (low) once r becomes sufficiently large, moving through a region of high ambiguity in between. An experiment just like this was performed by Flanagan et al. (1962), who found that the pitch percept dropped by an octave for values of a1/a2 in excess of 6 to 10 dB (see Figure 5.4). Importantly, this initial plateau of the psychometric curves is unlikely to result from a basic failure to detect the amplitude alternations: Pollack (1971) measured jitter detection thresholds for square waves with amplitude jitter added to each half-wave, showing that relative jitter amplitudes of a few percent were detectable for stimuli of comparable duration and period length.

We adapted Flanagan’s stimulus paradigm in order to test our hypothesis that stimulus timbre should influence the perceptual octave boundary between high and low pitch, and that it should do so specifically in cases where ambiguity regarding the stimulus period is high. Listeners had to judge the pitch of alternating-amplitude pulse trains and harmonic complex tones with different timbral characteristics as either high or low compared to flanker sounds whose pitch lay halfway in between.

Figure 5.4: Results of matching the pitch of a regular, uniform pulse train (pulse rate B) to that of a train of pulses with alternating amplitude (pulse rate AL). Different curves represent different pulse rates AL. For most pulse rates used in the experiment, the matched pulse rate B drops by an octave from AL to AL/2 as the amplitude ratio reaches approximately 8 dB. Figure from Flanagan (1972); results originally published in abstract form in Flanagan et al. (1962).

5.3.1 Experimental methods

5.3.1.1 Participants

Seven subjects, including the author, took part in the experiment. All subjects were graduate students at University College London, between 23 and 30 years old, and male. Participation was voluntary and without monetary reward. No subject had previously been diagnosed with a hearing disorder or reported any subjective hearing impairments. Three subjects played a musical instrument around the time of the experiment; a fourth subject had had musical training in the past. Excluding the author, subjects had not previously participated in psychoacoustic experiments and were fully naive with regard to the purpose and expected outcome of the task.

5.3.1.2 Stimuli

In each trial, the stimulus consisted of a target sound, preceded and followed by two identical flankers. Target sounds were either alternating click trains (ACTs) or harmonic complex tones (HCTs). The ACT targets had an inter-click interval of 2 ms. Individual clicks were bipolar with a duration of 0.3 ms. The click amplitudes alternated periodically between two values a1 and a2. Thus, for values a1 = a2, the ACTs were periodic at a rate of 500 Hz. For values a1 ≠ a2, they were periodic at a rate of 250 Hz, i.e. one octave lower. Independent of the click amplitude ratio r = a1/a2, the click trains had a similar, broadband spectral envelope, which we refer to as a “broad” timbre (Figure 5.5A). In order to manipulate their timbral brightness, some ACT stimuli were low-pass filtered at 3 kHz (36 dB/octave, 6th-order Butterworth filter), in the following referred to as having a “dark” timbre (Figure 5.5B). Harmonic complex tones had fundamental frequencies of either 250 or 500 Hz with equal-amplitude partials in sine phase. In order to match the two different timbres of the ACT stimuli, the partials extended from the fundamental up to either 20 kHz (broad) or 3.5 kHz (dark).
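For concreteness, a minimal sketch of an ACT generator follows (our illustration, not the experiment code; it places idealised unit impulses rather than the bipolar 0.3 ms clicks used in the experiment, and omits the masking noise described below):

```python
import numpy as np

def alternating_click_train(r, fs=44100, duration=0.06, interval_ms=2.0):
    """Idealised ACT: unit impulses with amplitudes alternating between a1 = 1
    and a2 = 1/r (the experiment used bipolar 0.3 ms clicks plus masking noise,
    both omitted here). r = 1 gives a 500 Hz periodicity, r != 1 gives 250 Hz."""
    n = int(duration * fs)
    step = int(round(interval_ms * 1e-3 * fs))   # 2 ms inter-click interval
    x = np.zeros(n)
    for i, k in enumerate(range(0, n, step)):
        x[k] = 1.0 if i % 2 == 0 else 1.0 / r
    return x
```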

Flankers were always missing-fundamental HCTs with a fixed f0 of 353 Hz, exactly half an octave in between the two possible target fundamental frequencies of 250 and 500 Hz. Flankers comprised the partials 2 to 20, in sine phase and with equal amplitude. Containing many peripherally resolved partials, the flankers were expected to evoke a strong, unambiguous pitch of 353 Hz.

Both targets and flankers were presented in a background of low-pass filtered noise with a cutoff frequency near 1 kHz, in order to mask possible mechanical distortion products at either 250, 353 or 500 Hz (see section 2.2.3.3). The level of the masking noise in relation to the target and flanker level was chosen such that the overall signal-to-noise ratio (SNR) was either 0 or -6 dB (Figure 5.5C). In any given trial, the SNR and absolute levels were identical for the target and flankers.

Each subject was presented with 30 samples from each of 19 different target stimulus conditions, as summarised in Table 5.1. 15 different ACT conditions were used, comprising five different amplitude ratios (1, 1.69, 2.25, 2.89 and 4) for each of three combinations of noise level and timbre (0 dB SNR, broad; -6 dB SNR, broad; 0 dB SNR, dark). The amplitude ratios were based on previous pilot data (from the author and a different group of subjects), which indicated a transition of the perceived pitch for the ACT stimuli from 500 to 250 Hz within this range. The remaining four stimulus conditions comprised HCTs with a pitch of either 250 or 500 Hz and broad (harmonics up to 20 kHz) or dark timbre (harmonics up to 3.5 kHz). Overall presentation level was kept constant across trials. Targets and flankers had a duration of 60 ms, separated by 100 ms of silence.

2 ms alternating click train                        harmonic complex tone
SNR     timbre   amplitude ratio                    SNR     timbre   f0
0 dB    broad    1, 1.69, 2.25, 2.89, 4             0 dB    broad    250 Hz, 500 Hz
-6 dB   broad    1, 1.69, 2.25, 2.89, 4             0 dB    dark     250 Hz, 500 Hz
0 dB    dark     1, 1.69, 2.25, 2.89, 4

Table 5.1: Stimulus conditions used in the experiment. The 15 ACT conditions comprised five different amplitude ratios for each of three different combinations of SNR and timbre. Four further conditions comprised harmonic complex tones with either low or high f0 and broad or dark timbre.

Stimuli were generated digitally in Matlab8 with a sampling rate of 44.1 kHz and presented, using the Psychophysics Toolbox extension (Brainard, 1997; Kleiner et al., 2007), in a quiet room via circumaural headphones with high passive noise attenuation (Sennheiser HDA 200) at a comfortable listening level. Listening levels chosen by the subjects ranged from 70 to 78 dB SPL.

5.3.1.3 Task

In each trial, subjects were asked to judge the shape of the melodic contour of an A-B-A sound triplet as either rising-falling or falling-rising, where A is a flanker and B a target stimulus as described above (see Figure 5.6). Trials for all 19 target conditions were intermixed randomly. Subjects listened to the sounds via headphones and responded either by clicking one of two buttons depicting the two melodic contours on a computer screen with a mouse, or by pressing one of two corresponding keys on the keyboard. Reaction times were unconstrained; the intertrial interval between response and onset of the next stimulus triplet was 1 s. No feedback was given during the experiment. Subjects were told that the purpose of this experiment was to probe their subjective perceptual experience and that there was no strictly right or wrong response in any trial. They were further informed that they were likely to experience ambiguous stimuli, in which case they should make a quick decision according to their best judgement. Furthermore, subjects were informed that the stimuli were independent from trial to trial and not influenced by their previous choices in any way, and that long sequences of identically shaped triplets might occur simply by chance. Before the start of the main task, subjects were given the opportunity to familiarise themselves with the user interface in two short test runs. The first of these contained only broadband ACT targets (0 dB SNR) with amplitude ratios of either 1 or 9. These were expected to give rise to relatively unambiguous pitches of 500 and 250 Hz, respectively. The second test run contained broadband ACTs with (more ambiguous) amplitude ratios 1, 2.25 and 4, again presented at 0 dB SNR. Both test runs together lasted approximately 10 minutes. Before the start of the main task, subjects were informed that they would experience a greater variety of different sound colours and qualities than during the test runs.

8 MATLAB® version 7.10.0 (R2010a), The MathWorks Inc., Natick, Massachusetts.

Figure 5.5: Effect of manipulating amplitude ratios (A), spectral envelope (B) and noise level (C) on the stimulus waveform (left column) and power spectral density (right column). A: Pulse trains with the five amplitude ratios used in the experiment, shown without noise (the spectra are shown only for the range 0 – 4 kHz). B: Even-amplitude pulse train (r = 1) with (top) and without (bottom) low-pass filtering at 3 kHz, shown without noise. C: Unfiltered, even-amplitude pulse train (r = 1) with noise added at SNRs of 0 dB (top) and -6 dB (bottom).

Figure 5.6: Task design: after an inter-trial interval of 1 s, a sound triplet consisting of a target surrounded by two identical flankers is presented, following which subjects have to judge the melodic contour as either “rising-falling” or “falling-rising”.

5.3.2 Psychophysical results

For each target condition and subject, we computed the fraction of trials in which the contour was perceived as rising-falling. A “rising-falling” response implies that the pitch of the target was perceived as higher than the flankers; a “falling-rising” response implies that it was perceived as lower. Furthermore, the likely pitches for any target, predicted from virtually any theory or model, are either 250 or 500 Hz. Hence, we assume throughout the following that we can equate a “rising-falling” response with a pitch percept of 500 Hz (“high”), and a “falling-rising” response with a pitch of 250 Hz (“low”).

For the ACT targets, we generated a set of three psychometric curves for each subject, showing the fraction of high-pitched targets as a function of the click amplitude ratio r for each of the three timbre conditions (0 dB broad, -6 dB broad, 0 dB dark). These curves were averaged across subjects.

Two subjects, one with previous musical training and one without, were excluded from this average, based on their responses to the four HCT target conditions: responses were close to chance in each case (as well as for the ACT stimuli), even though both targets and flankers were expected to evoke a clear pitch percept (Figure 5.7). This indicates that they were generally unable to reliably distinguish between pitch octaves in our task setting, even under favourable stimulus conditions. The five remaining subjects (three with musical training, two without) judged the pitch of the HCTs consistently (within and across individuals).

Figure 5.7: Two subjects (AG and CB) were excluded from further analysis based on their low reliability in identifying the octave of the complex harmonic tone stimuli (fundamentals of 250 and 500 Hz, broad and dark timbre). Bars represent the fraction of targets judged as high-pitched (chance level 0.5).

5.3.2.1 Timbral effects on the pitch of alternating click trains and harmonic complex tones

Psychometric curves, averaged across the five included subjects, are shown in Figure 5.8A. Starting with the broadband ACT stimuli at 0 dB SNR (blue curve), we observe that subjects judge the pitch of the target as predominantly high for click amplitude ratios up to 2.25 (cf. Figure 5.5A), reaching the 50% level at around 2.89 and finally dropping to below 10% for our maximum ratio of 4. Comparing our psychometric curves with the results reported by Flanagan et al. (1962) and Flanagan (1972), we find that the two datasets are in good agreement, despite several differences in the details of the stimuli employed. While our click trains were bipolar and presented in a background of low-pass noise, Flanagan used unipolar click trains, presumably without added noise9.

9 To the best of our knowledge, the most complete published description of Flanagan et al.’s results is a figure in Flanagan (1972) (reproduced in Figure 5.4), the other source being a conference abstract (Flanagan et al., 1962). Both sources lack methodological detail, but we can assume the absence of noise based on other publications by the authors from around the same time (e.g. Flanagan and Guttman, 1960).

Figure 5.8: High-low pitch judgements, averaged across five subjects: mean fraction of targets judged as “high” (melodic contour “rising-falling”) ±1 SEM. A: Responses to ACT targets with varying click amplitude ratios (abscissa). Stimulus conditions were: broadband timbre with low noise level (blue), broadband timbre with high noise level (green) and dark timbre with low noise level (red). B: Responses to complex harmonic tone target stimuli with matched fundamental frequencies (250/500 Hz) and timbres (broadband/dark).

In Flanagan’s study, subjects reported their percept by matching an even-amplitude pulse train to the target sound. The median matched pitch, depicted in Figure 5.4, dropped by a factor of two for amplitude ratios of 8 dB (approximately 2.5) and greater at comparable pulse rates. Note that, while the fraction of “high” responses in our data (individuals and average) drops off more gradually than Flanagan’s median matched pulse rates, we would obtain a similarly sharp transition for median reported pitch in our data.

In comparison to the broadband stimulus, the reported pitch for low-pass filtered click trains drops from high to low at considerably lower amplitude ratios (Figure 5.8A, red curve). While the pitch of the even-amplitude, 2 ms pulse train (r = 1) consistently remains at 500 Hz despite the change in timbre, the fraction of “high” responses has dropped by almost 30% at the next-highest value tested (r = 1.69), and the percept is predominantly low for all remaining r ≥ 2.25. Decreasing the power of the broadband click stimulus relative to the low-pass masking noise from 0 dB to -6 dB (green curve) also has a biasing effect towards the lower pitch (compared to the 0 dB broadband stimulus), but the effect is overall less pronounced than that of darkening the timbre by low-pass filtering the click train.

Figure 5.8B shows the averaged pitch judgements for the complex harmonic tone stimuli. Presented in identical 0 dB low-pass noise, the same timbral manipulation that induced a significant change in the perceptual octave bias for click train stimuli had virtually no effect on subjects’ pitch judgements: the perceived pitch equals the fundamental frequency (250 and 500 Hz) irrespective of the frequency of the highest harmonic present in the spectrum (20 kHz or 3.5 kHz).

Our psychophysical results are in general agreement with previous studies that reported timbre-induced octave biases in the perception of pitch (Hesse, 1982; Robinson, 1993; cf. also von Helmholtz, 1863). They are also in good qualitative agreement with our more specific predictions based on the idea of an ideal observer that optimally combines periodicity- and spectral-envelope-based pitch cues, assuming a statistical dependency between the two of the form described in section 5.1.2. Not only is there an overall greater tendency for stimuli with dark timbre to be perceived as lower in pitch: crucially from the standpoint of an ideal observer, this timbral biasing becomes apparent especially when the periodicity of the stimulus itself is inherently ambiguous. In our data, the effect of timbre is strongest for click train stimuli with intermediate click amplitude ratios, and wholly absent for both the even-amplitude click trains (r = 1) and the complex harmonic tones. The consistent, timbre-independent pitch percept of the harmonic tones argues against a general perceptual confusion between periodicity and timbral characteristics in our subjects. While we cannot strictly rule out the interpretation that a confusion occurred selectively for some stimuli but not others, we would argue instead that subjects were still trying to infer periodicity, but were doing so using information contained in the spectral envelope shape (owing to its correlation with periodicity in the statistics of natural sounds) where periodicity judgements based on stimulus self-similarity alone were indeterminate. This latter interpretation of the results, based until now merely on a plausible but somewhat vague intuition, will be strengthened by the good quantitative agreement between our psychophysical data and the periodicity estimates of our Bayesian model, which instantiates these intuitions in a formal way.

5.4 Pitch-timbre interactions in models of pitch

5.4.1 Bayesian model: coupled and uncoupled

Figure 5.9: Periodicity estimates of the Bayesian model with prior dependency between periodicity and spectral envelope. A: Click train stimuli; dotted lines represent model predictions, shaded areas indicate subjects’ mean psychometric curves ±1 SEM (cf. Figure 5.8). B: Harmonic complex tones.

We used the same set of stimuli as in the psychophysical experiments to compute periodicity estimates under the Bayesian model. Periodicity Ω and the spectral extent of the acoustic impulse response f were linked as described in section 5.2: the time-domain width λ(Ω) of the low-pass filter applied to the initial draw of f increased in proportion to the square root of Ω, λ(Ω) = 0.07√Ω (Ω and λ in ms). The possible time-scales of the envelope of f were chosen in 8 steps between 0.1 ms (shorter than a single impulse) and 4 ms (the period length for a 250 Hz fundamental). The peripheral filter bank had 80 frequency channels, spaced between 40 Hz and 16 kHz. We used a sampling rate of 44 kHz, rather than 44.1 kHz, to ensure that periodicities of both 250 and 500 Hz would align exactly with the sampling grid while changing the temporal and spectral characteristics of the stimulus as little as possible. The stimulus duration was 60 ms, as in the psychophysical experiment. For every stimulus, we computed the log-likelihoods ln L(Ω = 2 ms) and ln L(Ω = 4 ms) (using our Laplace-approximation based inference scheme, cf. section 3.3.1), and classified the inferred periodicity as “high” when ln L(Ω = 2 ms) > ln L(Ω = 4 ms). 50 repetitions of each stimulus condition were used to compute the fraction of “high” responses.
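Schematically, the decision stage reduces to a two-hypothesis comparison; in the sketch below, `loglik` merely stands in for the Laplace-approximation likelihood computation of section 3.3.1 (a hypothetical signature, our illustration):

```python
# Schematic decision stage (our illustration): `loglik` stands in for the
# Laplace-approximation marginal likelihood of section 3.3.1 (hypothetical
# signature), `make_stimulus` draws one noisy stimulus presentation.
def fraction_high(make_stimulus, loglik, n_reps=50):
    n_high = 0
    for _ in range(n_reps):
        x = make_stimulus()                     # fresh noise sample per repetition
        if loglik(x, omega_ms=2.0) > loglik(x, omega_ms=4.0):
            n_high += 1                         # 2 ms (500 Hz) beats 4 ms (250 Hz)
    return n_high / n_reps
```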

Figure 5.9A shows the outcome of our estimation for the click train stimuli. We observe a close agreement between psychophysical data (shaded areas) and model estimates for all three timbral conditions. Like human listeners, the model judges the pitch of the 0 dB broadband stimuli (blue) to be predominantly high for click amplitude ratios of r ≤ 2.25. For r = 2.89, the model reports high and low pitch with almost equal probability. For r = 4, where human subjects report the low pitch with ≈90% probability, the model reports the pitch as exclusively low. Low-pass filtering of the click trains (red) has a strong effect on octave preference in the model for intermediate amplitude ratios (1.69 ≤ r ≤ 2.89), but none for the even-amplitude stimulus (r = 1) or the highest amplitude ratio, r = 4 (where a small effect remains for human subjects). The greatest deviation between model and human behaviour occurs for r = 2.25, where the model’s preference for the high pitch drops to 0, whereas humans still report approximately one in five stimuli as high-pitched. As in human subjects, an increase of the noise level (green) results in a noticeable bias towards hearing the lower pitch, though less pronounced than the bias due to low-pass filtering the click trains.

The Bayesian model is also consistent with human pitch judgements of the complex harmonic tone stimuli: the true stimulus periodicity is inferred correctly irrespective of changes in timbre (Figure 5.9B). In summary, the modelling results match our qualitative expectations regarding the effects of timbre on pitch in an ideal observer, as well as providing an accurate account of human psychophysical performance.

A natural question to ask is to what degree the success of the model depends on the coupling of pitch and timbre in the generative process, which we introduced in this chapter. Figure 5.10 shows the results of the Bayesian model with pitch and timbre uncoupled (i.e. λ(Ω) ≡ 0). We can immediately see that the fit between model and psychophysical data deteriorates in the absence of this coupling. The model is now noticeably biased towards reporting the low pitch for intermediate values of r, while the additional biasing effect of low-pass filtering the click stimulus is small compared to human behaviour and the coupled model (Figure 5.10A, blue and red curves). Furthermore, increasing the low-pass noise while keeping the timbre broad has a qualitatively different effect in the uncoupled model than in subjects and the coupled model: pitch judgements are more biased towards the high pitch than in the low-noise broadband condition (green and blue curves). Pitch judgements of the model for the complex tone stimuli are unaffected by the change of the model prior: pitch ratings are independent of our timbral manipulations, as one would expect given their prior independence in the model. One more aspect of the model behaviour is worth pointing out: even though there no longer is a spectrally induced bias for reporting the high pitch in the case of the alternating click trains, the model still favours the high pitch for values of r up to 1.69, rather than reporting the true periodicity (i.e. the low pitch). Since we have been arguing that our model infers the stimulus periodicity optimally, does this not constitute a worrisome failure? The explanation for this somewhat paradoxical behaviour lies in another feature of our prior distribution over the acoustic impulse response f. In our model, f is expected to decay at some unknown, but finite rate τ. In order to explain a waveform with intermediate amplitude ratio, e.g. r = 1.69, with a periodicity of 4 ms (low), the model needs to assume a biphasic envelope for f (i.e. one with two distinct local maxima), whereby each of the two pulses corresponds to a peak in the envelope. This conflicts with the prior, which favours impulse responses with monotonically decaying, and hence monophasic, envelopes. Conversely, there is no such conflict when we assume an underlying periodicity of 2 ms. This bias for monophasic responses, inherent in the model, gives rise to a slight preference for reporting the high pitch despite stronger evidence for the low periodicity in the self-similarity pattern of the auditory nerve response itself. Of course, this holds only up to some value of r, beyond which this evidence becomes overpowering (while at the same time the cost under the prior for the second envelope peak decreases, as that peak becomes lower for increasing values of r). The exact point at which this change in preference occurs depends to some degree on the amount of noise, acoustic and neural, assumed in the generative process. As in the coupled model before, we set the acoustic SNR in the model to the mean SNR of our stimulus ensemble, while the noise variance in the auditory nerve was low compared to the average firing rate of the active fibres. The results of neither the coupled nor the uncoupled model depended critically on the fine-tuning of these parameters.

5.4.2 Pattern matching: Terhardt

We implemented Terhardt’s spectral pattern matching model of virtual pitch (Terhardt, 1974; Terhardt et al., 1982) as described in section 2.4.1.2. In short, the model extracts peaks in the Fourier spectrum, adjusting their relative heights by an overall spectral weighting function as well as local competition between nearby peaks. Each peak then contributes evidence for a pitch not only at its own frequency but also at its subharmonics. For harmonic spectra, this process results in the greatest overall accumulation of evidence at the fundamental frequency of the harmonic series.

Figure 5.10: Periodicity estimates of the Bayesian model without prior dependency between periodicity and spectral envelope. A: Click train stimuli; dotted lines represent model predictions, shaded areas indicate subjects’ mean psychometric curves ±1 SEM (cf. Figure 5.8). B: Harmonic complex tones.

Applied to our stimulus set, Terhardt’s model succeeds in predicting the pitch of the complex tone stimuli and its timbre independence (Figure 5.11B). We found, however, that it does not match human behaviour for the ACT stimuli well: for amplitude ratios r > 1, the model immediately predicts the pitch to drop to 250 Hz, irrespective of changes in timbre or noise level (Figure 5.11A). Thus, the model appears to reflect the true periodicity of the acoustic stimulus, which drops from 2 ms to 4 ms as soon as r deviates from 1, rather than human perception. Since model predictions already drop to the lower pitch for all r ≠ 1 in the case of the 0 dB broadband stimulus (blue), we would not be able to measure a timbral biasing effect in our stimulus ensemble even if it did exist in the model. In principle, one might expect such effects to occur in Terhardt’s model: each spectral peak contributes only to a limited range of subharmonics during the process of subharmonic summation, and thus a range of high-frequency peaks should contribute evidence only to the high-pitch interpretation of the spectral profile at hand, but not to the low-pitch one. However, we did not perform a more fine-grained analysis of the model predictions in the range 1 < r < 1.69 to test for this. Furthermore, as Terhardt limited the initial spectral analysis in the model to frequencies up to 5 kHz, the model is arguably effectively blind towards our low-pass filtering of the ACTs at 3 kHz. We therefore repeated the predictions with a frequency range extending up to 10 kHz, but found that the model behaviour remained exactly the same.

Figure 5.11: Virtual pitch estimates from Terhardt’s pattern matching model. A: Click train stimuli; dotted lines represent model predictions, shaded areas indicate subjects’ mean psychometric curves ±1 SEM (cf. Figure 5.8). B: Harmonic complex tones.

5.4.3 Pattern transformation: Wightman

Wightman’s pattern-transformation model of pitch (Wightman, 1973) uses a peripheral, spectral representation of the stimulus as the basis for its pitch estimates (cf. section 2.4.1.1). It interprets the time-averaged peripheral firing rate profile along a tonotopic axis as a substitute for the true Fourier power spectrum of the acoustic stimulus. Applying a Fourier transform to this peripheral spectrum, a degraded estimate of the stimulus autocorrelation function is obtained10. The estimated stimulus periodicity, or pitch, is determined by the highest peak in this surrogate autocorrelation function.

When we applied the pattern-transformation model to the output of our simple peripheral model, we were surprised to find that the model reported exclusively the high pitch for all click train stimulus conditions (data not shown). Closer inspection revealed the source of this mode of failure. Due to the tuning width of the peripheral filters, the spectral peaks that correspond to the odd harmonics of the 250 Hz fundamental are too low in amplitude, compared to the even harmonics (i.e. multiples of 500 Hz), to evoke discernible peaks in the average firing-rate profile. They are effectively unresolved by the peripheral filter bank, even in the low-frequency range, hence the aberrant model behaviour. It has been argued that peripheral frequency resolution in humans may be considerably higher than classical psychophysical estimates, derived from notched-noise masking experiments (e.g. Glasberg and Moore, 1990), would suggest. Bandwidth estimates based on otoacoustic emissions as well as forward masking effects at low sound intensities (Shera et al., 2002; Oxenham and Shera, 2003) indicate that human frequency resolution may in fact be up to twice as sharp as typically assumed, even though these claims have subsequently been disputed (Ruggero and Temchin 2005; see also Moore and Gockel 2011 for a recent review on peripheral resolvability). Since our peripheral model has been based on the wider bandwidth estimates of Glasberg and Moore (1990) throughout, we repeated our modelling attempts with peripheral bandwidths reduced by a factor of two, while increasing the number of channels to 200, in order to test whether Wightman’s pattern transformation approach could explain our psychophysical findings when applied to a more sharply defined peripheral spectral profile. Figure 5.12 shows that even in this setting, the model retains its strong bias to judge the pitch of the click train stimuli as high across all stimulus conditions. We can observe an effect of timbre on pitch that follows the same ordering as that observed in human listeners for r = 4, but overall the quantitative discrepancy between model and human data remains high.

10 The estimate would be exact if it were based on the true power spectrum, according to the Wiener-Khinchin theorem.

Figure 5.12: Pitch estimates from Wightman’s pattern transformation model. A: Click train stimuli; dotted lines represent model predictions, shaded areas indicate subjects’ mean psychometric curves ±1 SEM (cf. Figure 5.8). B: Harmonic complex tones.

Comparing the models of Terhardt (1974) and Wightman (1973), we observe that they mark almost opposite poles of a spectrum, with a strong bias towards the low pitch in the former and an equally extreme bias towards the high pitch in the latter. This may seem surprising at first sight, as both models are often treated as closely related. This notion seems to be largely due to de Boer (1977), who compared the three spectral pitch models of Terhardt, Wightman and Goldstein (1973). He showed that the former two can be seen as two different limiting cases of the same approximation to Goldstein’s less tractable “optimum processor” model (see section 2.4.1.3). Despite this unifying theoretical framework, the different limiting assumptions inherent in Terhardt’s and Wightman’s models can apparently lead to markedly different predictions: this is evidenced by our modelling results and seems to confirm Terhardt’s own reservations against de Boer’s unifying treatment of these models (Terhardt, 1977).

5.4.4 Summary autocorrelation

“Summary autocorrelation” is a general principle underlying a whole family of related models (e.g. Meddis and Hewitt, 1991; Meddis and O’Mard, 1997; Bernstein and Oxenham, 2005; Balaguer-Ballester et al., 2008). They all involve the computation of some form of temporal autocorrelation function (ACF) of the neural firing pattern in each channel of a peripheral band-pass filter bank. A “summary autocorrelation function” (SACF) is obtained, essentially by summing the individual ACFs across channels, and the final periodicity judgement is based on this (see section 2.4.2). Instantiations of this principle can vary in the implementation of the peripheral neural response to the acoustic stimulus, the ACF computation and summation stages, as well as the read-out mechanism operating on the final SACF. It is therefore difficult to prove or disprove the principle of summary autocorrelation in general. We chose two variants of this model, one at the simplistic and one at the complex end of the spectrum of possible implementations, to investigate whether our psychophysical data might be in accord with the principles of summary autocorrelation.

Our first implementation is based on the same peripheral model that we assumed in the generative process underlying our Bayesian estimation (cf. section 3.2): a linear gamma-tone filter bank (80 channels from 40 Hz to 16 kHz) followed by envelope demodulation. For each channel, we compute the autocorrelation function across the entire duration of the stimulus (λ = ∞ in equation (2.9)) after removing the channel mean. Individual ACFs are summed with equal summation weight for each frequency channel and time-lag of the SACF. We determine the model response by comparing the maximum heights of the SACF in two narrow ranges around 2 ms (“high”) and 4 ms (“low”). This may seem like an overly simplistic use of the SACF at first sight: in the context of pitch matching paradigms, the periodicity decision is sometimes made by finding the sound whose SACF is most similar to that of the target stimulus11. In our psychophysical paradigm, however, there is no comparison stimulus against which the target SACF could be matched, and thus this mode of read-out is not readily applicable. Instead, we essentially ask whether the SACF peak occurring at a lag of 1/353 s during the flanker stimulus (owing to its periodicity lying half an octave between 250 and 500 Hz) shifts towards a shorter or longer lag during target presentation. This form of read-out is in agreement with Meddis and Hewitt (1991) and in our opinion reflects a reasonable choice, given our paradigm based on judging the pitch contour of a three-tone “melody” in the absence of direct reference stimuli (250 and 500 Hz) for comparison of the second tone.
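The read-out just described can be summarised in a short sketch (our illustration, not the thesis code; `envelopes` stands in for the demodulated gamma-tone filter bank output, which we do not reproduce here):

```python
import numpy as np

def sacf_decision(envelopes, fs=44000.0):
    """Simple SACF read-out sketch: `envelopes` stands in for the demodulated
    gamma-tone filter bank output (channels x samples), not reproduced here."""
    env = envelopes - envelopes.mean(axis=1, keepdims=True)   # remove channel means
    n = env.shape[1]
    # exact autocorrelation per channel over the whole stimulus, positive lags
    acfs = np.array([np.correlate(e, e, mode="full")[n - 1:] for e in env])
    sacf = acfs.sum(axis=0)                                   # unweighted channel sum

    def peak(lag_ms, tol_ms=0.2):                             # max SACF near a lag
        lo = int((lag_ms - tol_ms) * 1e-3 * fs)
        hi = int((lag_ms + tol_ms) * 1e-3 * fs)
        return sacf[lo:hi].max()

    return "high" if peak(2.0) > peak(4.0) else "low"
```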

Figure 5.13: Pitch estimates from a simple summary autocorrelation model; peripheral front-end as in the Bayesian model, unweighted summation across all channels for all time lags (SACF). A: Click train stimuli; dotted lines represent model predictions, shaded areas indicate subjects’ mean psychometric curves ±1 SEM (cf. Figure 5.8). B: Harmonic complex tones.

The outcome of this procedure is shown in Figure 5.13. Not unlike Terhardt’s model, our first implementation of summary autocorrelation is unsuitably biased towards reporting the low pitch for all click train stimuli with r > 1 (Figure 5.13A), reflecting true stimulus periodicity rather than listeners’ subjective percept of pitch. Darkening the stimulus spectrum by removing high-frequency components leads to a small further bias for click train stimuli (blue and red curves) but not for the complex harmonic tones (Figure 5.13B), qualitatively in agreement with human behaviour. Contrary to our human data however, an increase in the masking noise results in an increase of the fraction of high pitch judgements (blue and green curves), rather than a decrease. Leaving the peripheral model and read-out mechanism unchanged, we verified that this model response pattern was robust towards several changes in the ACF and SACF computation stages. Instead of computing the exact ACF across the entire stimulus, we used a short-term ACF (without prior mean-removal) with either a single integration time constant of 10 ms for all channels (Meddis and O’Mard, 1997) or a different CF-dependent time constant for each channel (Wiegrebe 2001; cf. section 2.4.2). The fit between model and psychophysical data was also not notably improved by applying channel- and lag-dependent summation weights, gradually limiting the influence of high-frequency channels on long SACF lags (Bernstein and Oxenham, 2005). The behaviour of our simple autocorrelation model is in many ways comparable to that of the Bayesian model without pitch-timbre dependence: note in particular the effect of increasing the masking noise in both cases (cf. Figure 5.10). The greatest difference between the two is an overall greater tendency of the uncoupled Bayesian model to report the high pitch for r = 1.69. This difference is consistent with our previous explanation (cf. section 5.4.1): at low values of r > 1, the Bayesian model prior favours monophasic impulse response shapes over biphasic ones, overruling periodicity violations that should favour the low-pitch interpretation. The summary autocorrelation model has no such biasing mechanism: a bias not for high periodicities per se, but for certain waveform shapes over others given an assumed periodicity.

11 Note though, that this form of read-out is also problematic by itself, as the SACF is sensitive to changes in the stimulus other than its periodicity, and the matching might thus in theory reflect an entirely different feature of the stimulus (cf. section 2.4.2).

A serious and possibly unfair limitation to the performance of our first implementation may have been imposed by our choice of peripheral model. Starting with Meddis and Hewitt (1991), proponents of the SACF model have typically applied the summary autocorrelation principle to the output of a far more biophysically detailed peripheral processing stage than we have done so far. In order to make a fair comparison, we used a recent extension of the original summary autocorrelation model (Balaguer-Ballester et al., 2008)12: an initial, time-varying SACF is computed based on a physiologically detailed peripheral model (Meddis, 2006), including a human-like middle-ear transfer function, calcium and neurotransmitter trafficking in the hair cells and firing-rate adaptation in the auditory nerve. Finally, a low-pass filtered SACF (LP-SACF) is generated by leaky integration of the initial SACF over time. Even with this considerably more complex (and tunable) model, we did not succeed in capturing the essential characteristics of pitch-timbre interactions in our human listeners. Our best attempt is shown in Figure 5.14. The pitch estimates are representative of low- to medium-spontaneous-rate auditory nerve fibres, with the stimulus presented at an intensity level around 45 dB SPL (±5 dB tolerance, approximately), using lag- and channel-dependent summation weights (Bernstein and Oxenham, 2005). Our relative success depended crucially on this particular combination of settings, while the choice of integration time constants, both for within-channel ACFs and during computation of the LP-SACF, had little influence. Note though, that we are allowing for a 20 to 30 dB level difference between experiment and model: the stimulus presentation level in our psychophysical experiment was around 75 dB SPL, at which point the LP-SACF model fails completely.

12 A Matlab code package encompassing all stages of the model is available from Ray Meddis’ website, http://www.essex.ac.uk/psychology/department/people/Meddis.html

Figure 5.14: Pitch estimates from an extended summary autocorrelation model, using a physiologically plausible peripheral model, channel- and lag-dependent summation weights and low-pass filtering of the summary autocorrelation function over time (LP-SACF). 45 dB SPL stimulus presentation level. A: Click train stimuli; dotted lines represent model predictions, shaded areas indicate subjects' mean psychometric curves ±1 SEM (cf. Figure 5.8). B: Harmonic complex tones.

Accepting the aforementioned restrictions and caveats, we see that the LP-SACF model comes much closer to capturing human pitch judgements of the ACT stimuli, in particular their dependency on timbre, than our simplistic first implementation. The 0 dB broadband stimuli are consistently classified as high-pitched up until an amplitude ratio of r = 2.25, beyond which the percept drops to the low pitch (Figure 5.14A, blue curve). In comparison, low-pass filtered click trains (red curve) are judged as low-pitched more often, matching human behaviour well. The predicted psychometric curve for noisy broadband stimuli (green) falls in between, even though the effect of this manipulation is smaller than observed psychophysically and the only notable difference occurs at r = 2.89. However, this relative success in modelling the pitch of alternating click trains is accompanied by a significant, previously unseen failure of the model to capture the independence of pitch and timbre of the harmonic complex tones (Figure 5.14B). While human pitch judgements are virtually unaffected by our timbral manipulation in this case, the model judges 62% of the 250 Hz tones with broadband timbre to be high in pitch (500 Hz). Thus, it seems that the model (in this already unrealistically favourable regime) is now too unspecifically biased towards high pitches.

Figure 5.15: Comparing pitch estimates for a 200 Hz sinusoidally amplitude-modulated tone across three models. Dotted lines indicate typical human pitch matches. A: Bayesian model with prior pitch-timbre coupling. B: Simple summary autocorrelation; peripheral front-end as in the Bayesian model, unweighted summation across all channels for all time lags (SACF). C: Summary autocorrelation with physiologically plausible front-end, channel- and lag-dependent summation weights and low-pass filtering of the summary autocorrelation function over time (LP-SACF).

We confirmed this diagnosis with a further test of the model. As was previously discussed in sections 2.1.2 and 4.4, the pitch of a 200 Hz sinusoidally amplitude-modulated tone with a carrier frequency of 2040 Hz is typically reported by listeners as 204 Hz, while additional pitches around approximately 185 Hz and 227 Hz can be heard out when listeners are appropriately cued or encouraged to do so (Schouten et al., 1962). We compared the predictions of the Bayesian model with pitch-timbre dependence in the prior, and of our two implementations of the summary autocorrelation model (SACF and LP-SACF; LP-SACF in the same intensity range around 45 dB SPL) for this particular stimulus. While the dominant pitch of 204 Hz, as well as the two minor modes, are well predicted by the Bayesian model and the simple SACF model, LP-SACF is again unduly biased towards the higher of these two alternative pitches and instead predicts a pitch of 227 Hz (this was robust to changes in the acoustic noise level). Hence, only the extended Bayesian model is able to capture the timbre-dependence of the pitch of alternating click trains without simultaneously incurring an unspecific high-pitch bias affecting its judgements of two types of control stimuli: amplitude-modulated and harmonic complex tones.
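As a quick arithmetical check (ours, not part of the original report), these three modes coincide with the subharmonics of the 2040 Hz carrier nearest the 200 Hz modulation rate:

$$\frac{2040\,\text{Hz}}{10} = 204\,\text{Hz}, \qquad \frac{2040\,\text{Hz}}{9} \approx 226.7\,\text{Hz}, \qquad \frac{2040\,\text{Hz}}{11} \approx 185.5\,\text{Hz}.$$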

5.5 Harmonic complex tones revisited: the strength of missing-f0 pitch

In chapter 4, we investigated the behaviour of our simple, uncoupled model across a range of pitch-evoking stimuli. While the model succeeded in predicting the pitch height of many periodic and aperiodic sounds, we found that the model lacked signs of the qualitative perceptual difference in pitch strength between HCTs with and without low-rank harmonics present in the spectrum. Pitch strength and f0-discriminability of a HCT decline markedly if its spectrum contains only high-rank harmonics above approximately the tenth (Houtsma and Smurzynski, 1990; Bernstein and Oxenham, 2003). In our uncoupled model, we found no sign of such a transition: the likelihood ratio between the true periodicity and a nearby value (e.g. a quarter tone above or below), which we treat as an indicator for the subjective certainty of the model about its own estimate on a single trial, was unaffected by the rank of the lowest harmonic.

Following our extension of the model by a prior that couples periodicity and spectral envelope in the generative process, we decided to revisit the issue of pitch strength of the missing fundamental. Based on the overall nature of this coupling, whereby low-pitched sounds are expected to have less high-frequency content than high-pitched sounds (cf. section 5.2.2), we might now expect to find a dependence of pitch strength on the rank of the lowest harmonic. When the model encounters evidence for high-frequency spectral content in the AN activity pattern while evaluating $\mathcal{L}(\Omega)$ for long periods $\Omega$, it should now be inclined to attribute these high-frequency components to the acoustic background noise, rather than to the periodic signal embedded in it. By attributing high-frequency spectral components to noise, however, they are effectively discarded as evidence in favour of $\Omega$ or against it. Since the estimation of $\Omega$ (for low $\Omega$ and spectrally bright sounds) is based on less evidence, we would reasonably expect the certainty of the estimate to be reduced.

In order to test this intuition, we replicated a simple version of the experiments by Houtsma and Smurzynski (1990) in the model. We generated a series of harmonic complex tones with a fundamental of 250 Hz and 11 successive harmonics each, varying the harmonic number (or rank) n of the lowest component from 4 to 25 in steps of three. We computed the log-likelihood function $\ln \mathcal{L}(\Omega)$ for a narrow range of periodicities around 250 Hz and determined the height of its peak at 250 Hz compared to values of $\ln \mathcal{L}(\Omega)$ one quarter-tone above and below.
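The stimulus construction and the peak-height measure are simple enough to state compactly. In the sketch below, `log_lik` stands in for the model's (expensive) evaluation of $\ln \mathcal{L}$ at a given period and is assumed, not implemented; the waveforms are schematic equal-amplitude, sine-phase complexes rather than the exact thesis stimuli.

```python
import numpy as np

def hct(f0, lowest, n_harm=11, fs=20000, dur=0.1):
    """Harmonic complex tone with n_harm successive harmonics,
    starting at harmonic rank `lowest`."""
    t = np.arange(int(fs * dur)) / fs
    return sum(np.sin(2 * np.pi * k * f0 * t)
               for k in range(lowest, lowest + n_harm))

def local_peak_height(log_lik, f0=250.0):
    """ln L at the true period minus the mean of ln L a quarter-tone
    above and below (the certainty proxy used in the text)."""
    q = 2 ** (1 / 24)                      # quarter-tone frequency ratio
    flanks = 0.5 * (log_lik(1 / (f0 * q)) + log_lik(1 / (f0 / q)))
    return log_lik(1 / f0) - flanks

# stimuli: [hct(250.0, n) for n in range(4, 26, 3)]
```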

Figure 5.16: Effect of lowest harmonic number n on the pitch strength of 11-component HCTs. A: Human interval identification performance as a function of n; chance level was 14% (from Houtsma and Smurzynski, 1990). B: Local peak height of the log-likelihood function as a function of n in the model. C: Log-likelihood functions for three different stimuli with n ∈ {4, 13, 22}.

Using the same kind of stimuli, Houtsma and Smurzynski (1990) had found that both musical interval identification and f0-discriminability declined rather sharply as n increased above 7, with no significant further deterioration at and above 16 (see Figure 5.16A). We found a very similar transition in the coupled model. As shown in Figure 5.16B, the height of the local peak in $\ln \mathcal{L}(\Omega)$ around 250 Hz drops steeply at first as n > 7, but the decline tails off (though does not flatten entirely) as n ≥ 16. The uncoupled model, in comparison, shows no deterioration of peak height across the entire range of values tested (as previously observed; cf. section 4.2).

A similar effect was demonstrated by Shackleton and Carlyon (1994). Complex tones with fundamental frequencies of either 88 or 250 Hz were filtered into either a "LOW", "MID" or "HIGH" frequency region (125–625 Hz, 1375–1875 Hz and 3900–5400 Hz respectively). In the LOW region, both HCTs contained resolved, low-rank partials, while both complexes contained only unresolved, high-rank partials in the HIGH region. In the MID region, however, the low-pitched sound contained only peripherally unresolved harmonics, while the high-pitched sound was still resolved. Measuring f0 difference limens (F0DLs) for all six stimuli, it was found that F0DLs were low for the two LOW-filtered stimuli, and high for the two HIGH-filtered stimuli. In the MID region, there was a discrepancy between the low-pitched stimulus, which was poorly discriminated, and the high-pitched stimulus, which was well discriminated by listeners (Figure 5.17A). Shackleton and Carlyon (1994) explained their results as a consequence of resolvability and argued that two different pitch mechanisms might be at work for resolved and unresolved complex sounds, an argument that Carlyon (1998) later used to refute the autocorrelation model of Meddis and O'Mard (1997), which showed no such dependence on resolvability.

We tested the behaviour of our coupled model on stimuli much like the ones used by Shackleton and Carlyon (1994). We used a low f0 of 100 Hz (instead of 88 Hz), in order to ensure that both low and high stimulus periodicities matched our sampling rate of 20 kHz. The frequency range of the MID region was adjusted accordingly. We measured the local peak height of the log-likelihood function around the true fundamental frequencies as a proxy for pitch strength in the model. The results are shown in Figure 5.17B. Despite minor discrepancies, our model captures the main psychophysical effect: pitch strength appears to be determined by resolvability. In line with our previous results for the stimuli of Houtsma and Smurzynski (1990), no such effect was found in the uncoupled model (results not shown). This means, however, that the discrepancy in the MID frequency region cannot be due to resolvability per se in the model, as our change in the model prior does not affect the peripheral transduction stage at all. Instead, the effect can be explained by the changing spectral centre of mass in relation to the fundamental frequency, resulting in a tendency to regard the high-frequency content of low-pitched sounds as noise, and to thereby disregard it as evidence for the presence of the fundamental, as discussed above.
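A sketch of this stimulus generation, assuming a standard Butterworth band-pass filter (our choice for illustration; the exact filtering used in the original study may differ):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# passbands as quoted above; the MID band shown is the published one,
# which we adjusted in our own simulations for f0 = 100 Hz
REGIONS = {"LOW": (125, 625), "MID": (1375, 1875), "HIGH": (3900, 5400)}

def band_filtered_complex(f0, region, fs=20000, dur=0.2, order=4):
    """Equal-amplitude harmonic complex filtered into one spectral region."""
    t = np.arange(int(fs * dur)) / fs
    n_harm = int(0.5 * fs / f0)            # harmonics strictly below Nyquist
    x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, n_harm))
    sos = butter(order, REGIONS[region], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)
```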

The interpretation of the results of Shackleton and Carlyon (1994) as an effect of harmonic number, rather than resolvability, is strengthened by an experiment by Bernstein and Oxenham (2003). Here, a monaurally unresolved complex sound was played to subjects dichotically, with even and odd harmonics presented to opposite ears. The harmonics were perceptually fused across ears, and the pitch height of the sound remained unaffected by this manipulation. However, as the spacing of the components in each ear doubled, the sound became peripherally resolved.

Figure 5.17: Pitch strength of two complex tones filtered into different frequency bands. A: F0DLs for two sounds with f0s of 88 and 250 Hz, filtered into LOW, MID and HIGH frequency regions. B: Negative peak height of $\ln \mathcal{L}(\Omega)$ in the model (f0s 100 and 250 Hz; high negative values indicate high (local) certainty about $\Omega$). C: Log-likelihood functions for the six stimuli.

If resolvability were the major determinant of pitch strength, one might expect an improvement in f0-discriminability for the dichotic stimulus compared to the diotic one. No such effect on discriminability was found, arguing for harmonic number, rather than resolvability, as the key determining factor (see also Moore and Gockel, 2011, for a recent review and discussion).

Chapter 6

Conclusions

6.1 Summary

In this thesis, we set out to develop a Bayesian probabilistic model of human pitch perception, based on Helmholtz' notion of perception as unconscious inference and harnessing the power of modern statistical estimation techniques. In chapter 3, we described a generative model of naturalistic pitch-evoking sounds and the evoked responses in the peripheral auditory system. We discussed how the model can be inverted to perform optimal perceptual inference about the periodicity of an arbitrary waveform, indirectly observed by the central auditory system through time-varying neural firing rates in the auditory nerve. Due to the inherent non-linearity of the sensorineural transduction process and the high dimensionality of the latent variable space, exact inference in the model is intractable. We adapted two established approximate inference techniques for use with our model.

Two variants of the model were presented, which differ in their prior assumptions regarding the relationship between the periodicity of a sound and the characteristics of its spectral envelope. In the uncoupled model, these two acoustic properties are treated as independent. Our evaluation of the uncoupled model in chapter 4 revealed that we can account for the pitch frequency of a large variety of periodic and non-periodic pitch-evoking sounds under this assumption. However, we also saw indications of sensitivity in human listeners to spectral characteristics of sounds other than their harmonicity, which the uncoupled model is unable to explain.

In chapter 5, we introduced a coupling between periodicity and spectral envelope into the assumed generative process. The coupling was chosen such that the low-pass characteristic of the spectral envelope, and concomitantly its spectral centre of mass, varies with periodicity: high-pitched sounds are expected to contain more high-frequency content than low-pitched sounds. This coupling was motivated in part by qualitative, psychophysical evidence for a perceptual dependency of pitch on timbre. The coupling parameters, however, were quantitatively fit to a database of natural pitch-evoking sounds. As a result of this dependency, the spectral envelope of natural sounds becomes to some degree informative about their periodicity. In an ideal observer, we therefore expect to observe a biasing effect of timbre on pitch that is most pronounced when the uncertainty regarding the sound periodicity is high.

We designed a psychophysical experiment to test these predictions, using stimuli that allow for precise control over the degree of octave-uncertainty in human listeners. Our results indicate a timbre-dependent bias of human pitch perception that is quantitatively well described by our coupled Bayesian model. The uncoupled Bayesian model fails to account for these effects, as does a variety of other, non-probabilistic models lacking the power to express this kind of dependency. Thus, human psychophysical behaviour is non-trivially explained by our coupled model, indicating that human listeners exploit statistical regularities of natural sounds during pitch perception in order to improve the accuracy of their periodicity estimates.

Finally, we investigated the effect of coupling pitch and timbre in the model on periodicity estimation for harmonic complex tones with a missing fundamental. Whereas pitch strength in the uncoupled model was insensitive to the rank of the lowest harmonic present in the stimulus, pitch strength in the coupled model decreases as this rank is increased, similar to the percept in human listeners. We therefore suggest that the decreased pitch strength of HCTs with high-rank harmonics does not primarily constitute a potentially disadvantageous performance limit. Instead, it may reflect an optimal or near-optimal adaptation of the auditory system to natural listening conditions.

As envisioned by Helmholtz, human pitch perception bears specific hallmarks of an optimal inferential process, in which ambiguities inherent in the immediate, incoming sensory evidence are resolved by recourse to prior knowledge (learnt or innate) regarding the occurrence of their physical causes in the external world and the processes that map causes onto sensations. The act of inference itself is highly non-trivial in the case of our model. We have derived approximate inference schemes without consideration of the computational and mechanistic constraints of their potential physiological substrate. We justify this in two ways. Firstly, in order to establish the behaviour of a true ideal observer with minimal bias, we should prioritise accuracy of the approximation over implementational constraints regarding time and memory demand or its likely mechanistic building-blocks (within the limits of overall tractability). Secondly, there is very little hard evidence for the exact locus, size or structure of the neuronal network involved in the determination and representation of our percept. Whatever evidence is available seems insufficient to delineate even a rough boundary between viable and inviable estimation algorithms.

Nevertheless, implementational considerations are important. With our current arsenal of inference techniques, we are severely limited technically in terms of the maximal stimulus duration, number of trials and density with which we can feasibly evaluate the likelihood function. We have seen that in many instances, heuristic models such as summary autocorrelation and its close relatives make predictions similar to our Bayesian model at a fraction of the computational cost. Rather than further improving the validity of these heuristics purely by phenomenological adjustments, the functional insights gained from our ideal observer approach may serve as valuable guidance in the development of more accurate heuristics. Chief amongst these insights are the consideration of natural sounds and listening conditions, which are expected to shape the behaviour of an ideal observer, as well as the efficient combination of information from multiple sources where they may be available, weighted according to their reliability.

6.2 Outlook

In this thesis we have taken the first steps towards a systematic Bayesian characterisation of human pitch perception. Needless to say, many possible routes have been left unexplored along the way. Some easily attainable extensions that would not require structural changes to the model have already been discussed in chapter 3. We could include a more realistic prior distribution over periodicities that reflects, for example, the prevalence of speech amongst the pitched sounds likely to be encountered in our environment, as well as the rarity of extremely high- or low-pitched sounds. In our current model, the fundamental sensitivity of AN fibres to acoustic stimulation is uniform across all CFs. Human sensitivity, in contrast, varies non-uniformly with frequency, owing to diverse factors such as the outer- and middle-ear transfer functions or a varying density of hair cells along the length of the basilar membrane. We have performed preliminary studies with filter gains adjusted according to absolute threshold hearing levels, and it stands to reason that a modification such as this may be required to capture subtle phenomena regarding the dominance of some spectral regions over others in determining the pitch of a sound. At the level of the auditory nerve, we could include static non-linear compression in the demodulation stage, requiring only minor changes to the inference algorithms to account for the different derivatives of the rectification function.

By assumption, we have restricted ourselves to short, monaural stimuli of constant periodicity, presented in an otherwise unstructured background of noise. This immediately suggests possible next steps. Amongst these, an extension of the model towards binaural processing of pitch is perhaps the least interesting by itself. On the one hand, binaural pitch phenomena are largely limited to artificial listening conditions and will hardly ever be encountered in natural situations. On the other hand, those phenomena encountered under natural conditions persist even for monaural stimulation.

Extending the generative model to mixtures of periodic sounds seems conceptually straightforward. We can simply imagine the waveform to be a sum of component sounds, each of which is distributed according to our acoustic model. However, the dimensionality of the latent variable space that we need to integrate over in order to evaluate the likelihood $\mathcal{L}(\Omega_1, \dots, \Omega_K)$ grows linearly with the number of components, in addition to the exponential growth in the number of combinations of periodicities $(\Omega_1, \dots, \Omega_K)$ itself. Needless to say, the computational demand of joint estimation along the lines of our current inference algorithms seems absolutely prohibitive. Interestingly, binaural processing may play a more important role in the perception of mixtures of pitched sounds: interaural time and level differences provide salient cues for the grouping of frequency components in the stimulus according to their spatial origin, which would be expected to interact with harmonicity-based grouping cues.

More immediately attainable, perhaps, is an extension towards time-varying pitch. We could, for example, consider a Hidden Markov model in which short observation segments, each some tens of milliseconds long and distributed according to our generative model, are linked by a prior over consecutive periodicities but independent otherwise.

Even though the number of likelihood evaluations grows linearly with the number of segments, the dynamic programming algorithm involved in computing the trajectory of periodicities through time is easily parallelised, and the use of shorter individual time windows could in fact benefit performance. With a model of this kind (crude as the Markov assumption may be) we could start to investigate sequential pitch effects such as the tritone paradox (Deutsch, 1986) in the perception of Shepard tones and its dependence on stimulus context and priming (discussed in section 2.1).
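A minimal sketch of such a dynamic-programming (Viterbi) decoder over a discretised periodicity grid; the per-segment log-likelihoods $\ln \mathcal{L}(\Omega)$ and the Markov transition prior are assumed to be supplied by the model:

```python
import numpy as np

def viterbi_pitch_track(log_lik, log_trans, log_init):
    """Most probable periodicity trajectory under a segment-wise HMM.

    log_lik   : (n_segments, n_states) per-segment log-likelihoods
    log_trans : (n_states, n_states) log transition prior (from, to)
    log_init  : (n_states,) log prior over the initial periodicity
    """
    n_seg, n_states = log_lik.shape
    delta = log_init + log_lik[0]                  # best log-score per state
    back = np.zeros((n_seg, n_states), dtype=int)  # back-pointers
    for t in range(1, n_seg):
        scores = delta[:, None] + log_trans        # (from, to)
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(n_states)] + log_lik[t]
    path = np.empty(n_seg, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(n_seg - 1, 0, -1):              # trace the best path back
        path[t - 1] = back[t, path[t]]
    return path  # indices into the periodicity grid, one per segment
```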

Pitch perception is no end in itself. If, as we have argued, pitch is used to identify sound sources or even to infer the semantic content of a speech utterance, then pitch perception should ideally be closely intermeshed with these "higher-level" auditory perceptual tasks. This view is supported by the limited available evidence for the involvement of higher auditory cortical areas in the processing and representation of pitch. Assuming, for example, that identifying the gender (or alternatively, size or age) of a speaker from their voice is behaviourally useful, we could express the joint distribution over pitch and timbre as a mixture distribution over male and female speakers, where each gender has its own characteristic range of pitches and timbres. Knowing the pitch of a speaker's voice (in addition, for example, to the content of the speech utterance) is obviously useful in establishing his or her gender. Knowing the speaker's gender conversely restricts the pitch range we can expect to hear. This may in turn help us in segregating the speaker's voice from a noisy background, which is another common perceptual task the auditory system faces. It is one of the great appeals of the Bayesian probabilistic framework that it allows for the consistent and optimal integration of evidence, knowledge, expectations and inferences across multiple levels of a hierarchically structured task model such as the one we have just sketched. Thus, by expressing pitch perception as Bayesian inference, we pave the way for its seamless integration with probabilistic models of auditory scene analysis in general.
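As a toy numerical illustration of such a hierarchical coupling (all parameter values below are invented for illustration, merely chosen to be roughly plausible for adult speech):

```python
from scipy.stats import norm

# each gender is one mixture component with its own characteristic f0 range
GENDERS = {"male": (120.0, 25.0), "female": (210.0, 40.0)}  # mean, sd [Hz]

def gender_posterior(f0, prior=None):
    """P(gender | f0) by Bayes' rule, with Gaussian pitch likelihoods."""
    prior = prior or {"male": 0.5, "female": 0.5}
    joint = {g: prior[g] * norm.pdf(f0, mu, sd)
             for g, (mu, sd) in GENDERS.items()}
    z = sum(joint.values())
    return {g: p / z for g, p in joint.items()}

# gender_posterior(190.0) favours "female"; such a posterior would in turn
# sharpen the prior over f0 when the periodicity estimate is revisited.
```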

Appendix A

Gradient and Hessian of $\ln P(A, x \mid \Omega)$

We want to find expressions for the gradient and Hessian with respect to x of

\begin{align}
\ln P(A, x \mid \Omega) &= \ln P(A \mid x) + \ln P(x \mid \Omega) \nonumber\\
&= \ln P(A \mid x) + \ln \sum_s P(x, s \mid \Omega) \tag{A.1}\\
&= \sum_i \ln P(a_i \mid x) + \ln \sum_s P(x, s \mid \Omega) \nonumber
\end{align}

where $s = 1 \dots S$ indexes the components in our (Gaussian) mixture model of $x$.

A.1 Gradient $\nabla_x \ln P(A, x \mid \Omega)$

We will treat the two gradients, $\nabla_x \ln P(A \mid x)$ and $\nabla_x \ln \sum_s P(x, s \mid \Omega)$, separately in the following two sections.

A.1.1 $\nabla_x \ln \sum_s P(x, s \mid \Omega)$

$P(x \mid \Omega)$ is a mixture distribution: $P(x \mid \Omega) = \sum_s P(x, s \mid \Omega)$. As we will prove below, we can relate the gradient of $\ln P(x \mid \Omega)$ (note the logarithm) to the gradients of the log-component distributions $\ln P(x, s \mid \Omega)$ by a simple weighted average as follows:

$$\nabla_x \ln P(x \mid \Omega) = \sum_s P(s \mid x, \Omega) \cdot \nabla_x \ln P(x, s \mid \Omega) \tag{A.2}$$

In words: $\nabla_x \ln P(x \mid \Omega)$ is the sum over the individual log-component gradients, weighted by their linear, posterior responsibilities $P(s \mid x, \Omega)$, given via Bayes' rule by

$$P(s \mid x, \Omega) = \frac{P(x \mid s, \Omega)\, P(s \mid \Omega)}{\sum_{s'} P(x \mid s', \Omega)\, P(s' \mid \Omega)} \tag{A.3}$$

Note that, typically for our purposes, the prior mixture weights are uniform and independent of $\Omega$: $P(s \mid \Omega) = \frac{1}{S}$, simplifying the above expression further.

With this convenient result at hand, we can compute $\nabla_x \ln P(A, x \mid \Omega)$ easily, as the component distributions are all Gaussian in $x$ and their individual gradients are:

$$\nabla_x \ln P(x, s \mid \Omega) = -x\, (\Sigma^s_\Omega)^{-1}, \tag{A.4}$$

where $\Sigma^s_\Omega$ is the generative covariance matrix as defined in equation (3.17) (in the case of the less general, uncoupled model, we can simply drop the $\Omega$-index; cf. equation (3.10)).

Proof of equation (A.2):

Since $\sum_s P(s \mid x, \Omega) = 1$, we can trivially rewrite $\ln P(x \mid \Omega)$:

$$\ln P(x \mid \Omega) = \sum_s P(s \mid x, \Omega)\, \ln P(x \mid \Omega) \tag{A.5}$$

Exploiting the identities

$$P(x \mid \Omega) = \frac{P(x, s \mid \Omega)}{P(s \mid x, \Omega)} \tag{A.6}$$

and

$$\sum_s \nabla_x P(s \mid x, \Omega) = \nabla_x \sum_s P(s \mid x, \Omega) = \nabla_x 1 = 0, \tag{A.7}$$

we obtain the gradient $\nabla_x \ln P(x \mid \Omega)$ as follows:

\begin{align}
\nabla_x \ln P(x \mid \Omega) &= \sum_s P(s \mid x, \Omega) \cdot \nabla_x \ln P(x, s \mid \Omega) \nonumber\\
&\quad - \sum_s P(s \mid x, \Omega) \cdot \nabla_x \ln P(s \mid x, \Omega) \tag{A.8a}\\
&\quad + \sum_s \nabla_x P(s \mid x, \Omega) \cdot \ln P(x \mid \Omega) \nonumber\\
&= \sum_s P(s \mid x, \Omega) \cdot \nabla_x \ln P(x, s \mid \Omega) \nonumber\\
&\quad - \sum_s P(s \mid x, \Omega) \cdot \frac{\nabla_x P(s \mid x, \Omega)}{P(s \mid x, \Omega)} \tag{A.8b}\\
&\quad + \ln P(x \mid \Omega) \cdot \nabla_x \sum_s P(s \mid x, \Omega) \nonumber\\
&= \sum_s P(s \mid x, \Omega) \cdot \nabla_x \ln P(x, s \mid \Omega) \qquad \text{(q.e.d.)} \tag{A.8c}
\end{align}
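Identity (A.2) is easy to verify numerically. The following sketch (entirely ours, added for checking purposes) compares the responsibility-weighted component gradients against a finite-difference gradient of the log-mixture, for zero-mean Gaussian components with uniform weights:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
D, S = 4, 3
covs = [np.diag(rng.uniform(0.5, 2.0, D)) for _ in range(S)]  # component covariances
x = rng.normal(size=D)

def log_p(y):
    # log of the equally-weighted, zero-mean Gaussian mixture P(x)
    return np.logaddexp.reduce(
        [np.log(1.0 / S) + multivariate_normal.logpdf(y, cov=C) for C in covs])

# responsibilities P(s | x) via Bayes' rule, equation (A.3)
log_joint = np.array([np.log(1.0 / S) + multivariate_normal.logpdf(x, cov=C)
                      for C in covs])
resp = np.exp(log_joint - log_p(x))

grads = np.array([-np.linalg.solve(C, x) for C in covs])  # (A.4), column form
rhs = resp @ grads                                        # right-hand side of (A.2)

eps = 1e-6                              # finite-difference gradient of ln P(x)
lhs = np.zeros(D)
for i in range(D):
    e = np.zeros(D); e[i] = eps
    lhs[i] = (log_p(x + e) - log_p(x - e)) / (2 * eps)

assert np.allclose(lhs, rhs, atol=1e-5)
```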

A.1.2 $\nabla_x \ln P(A \mid x)$

We recall from equation (3.23) that

$$\ln P(A \mid x) = -\frac{1}{2\sigma_A^2} \sum_i \big( a_i - r(x * b_i) * l \big)^2 + Z, \tag{A.9}$$

where $(\cdot)^2$ denotes the inner product of a vector with itself. For a single channel $i$, we obtain the gradient $\nabla \ln P(a_i \mid x)$:

$$\nabla \ln P(a_i \mid x) = \frac{1}{\sigma_A^2}\, \nabla\, a_i^T \big( r(x * b_i) * l \big) - \frac{1}{2\sigma_A^2}\, \nabla \big( r(x * b_i) * l \big)^2 \tag{A.10}$$

We will consider the two terms on the right-hand side of equation (A.10) separately, starting with

$\nabla\, a_i^T (r(b_i * x) * l)$. Writing out both convolutions explicitly, we get

$$a_i^T \big( r(b_i * x) * l \big) = \sum_t a_i(t) \sum_{t'} l(t - t' + 1)\, r\!\left( \sum_{t''} b_i(t' - t'' + 1)\, x(t'') \right) \tag{A.11}$$

T 0 0 00 00 ai( r(bi x) l) = ai(t) l(t t + 1) r bi(t t + 1)x(t ) (A.11) ∗ ∗ − − t t0 t00 ! X X X with indices

$$t' = \max(t - N + 1, 1), \dots, t - 1 \quad \text{and} \quad t'' = \max(t' - M + 1, 1), \dots, t' - 1, \tag{A.12}$$

where $N$ and $M$ are the lengths of $l$ and $b$, respectively. Hence,

\begin{align}
\frac{\partial}{\partial x_k}\, a_i^T \big( r(b_i * x) * l \big)
&= \sum_t a_i(t) \sum_{t'} l(t - t')\, \frac{\partial}{\partial x_k}\, r\!\left( \sum_{t''} b_i(t' - t'' + 1)\, x(t'') \right) \tag{A.13}\\
&= \sum_t a_i(t) \sum_{t'} l(t - t')\, r'\big( (x * b_i)(t') \big) \cdot
\begin{cases} b_i(t' - k + 1) & \text{if } k \le t' \le k + M - 1 \\ 0 & \text{otherwise} \end{cases} \tag{A.14}\\
&= \sum_{t'} \left( \sum_t a_i(t)\, l(t - t') \right) r'\big( (x * b_i)(t') \big)\, b_i(t' - k + 1) \tag{A.15}\\
&\qquad \text{with } t' = k, \dots, \min(k + M - 1, T) \text{ and } t = t', \dots, \min(t' + N - 1, T) \nonumber\\
&= \sum_{t'} B_i(t')\, r'\big( (x * b_i)(t') \big)\, b_i(t' - k + 1) \tag{A.16}\\
&= \Big( \big( B_i \circ r'(x * b_i) \big) * \overleftarrow{b_i} \Big)(k), \tag{A.17}
\end{align}

where

$$B_i(t') = \sum_t a_i(t)\, l(t - t') = \big( a_i * \overleftarrow{l}\, \big)(t') \tag{A.18}$$

and $\overleftarrow{b_i}$ and $\overleftarrow{l}$ denote the time-reversed filter kernels $b_i$ and $l$.

Similarly, for $\nabla \big( r(x * b_i) * l \big)^2$ we get

\begin{align}
\frac{\partial}{\partial x_k}\, \big( r(x * b_i) * l \big)^T \big( r(x * b_i) * l \big)
&= \frac{\partial}{\partial x_k} \sum_t \left( \sum_{t'} l(t - t')\, r\big( (x * b_i)(t') \big) \right)^2 \tag{A.19}\\
&= 2 \sum_t \big( r(x * b_i) * l \big)(t)\, \frac{\partial}{\partial x_k} \sum_{t'} l(t - t')\, r\big( (x * b_i)(t') \big) \tag{A.20}\\
&= 2 \sum_t \big( r(x * b_i) * l \big)(t) \sum_{t'} l(t - t')\, r'\big( (x * b_i)(t') \big) \cdot
\begin{cases} b_i(t' - k + 1) & \text{if } k \le t' \le k + M - 1 \\ 0 & \text{otherwise} \end{cases} \tag{A.21}\\
&= 2\, \Big( \big( C_i \circ r'(x * b_i) \big) * \overleftarrow{b_i} \Big)(k), \tag{A.22}
\end{align}

where

$$C_i(t') = \Big( \big( r(x * b_i) * l \big) * \overleftarrow{l}\, \Big)(t'). \tag{A.23}$$

Thus,

$$\nabla \ln P(A \mid x) = \frac{1}{\sigma_A^2} \sum_i \big( (B_i - C_i) \circ r'(x * b_i) \big) * \overleftarrow{b_i} \tag{A.24}$$

A.2 Hessian $\nabla_x^2 \ln P(A, x \mid \Omega)$

As with the gradient, we will consider $\nabla_x^2 \ln P(A \mid x)$ and $\nabla_x^2 \ln \sum_s P(x, s \mid \Omega)$, the two constituent parts of the Hessian, separately.

A.2.1 $\nabla_x^2 \ln \sum_s P(x, s \mid \Omega)$

In section A.1.1, we showed that the gradient of $\ln \sum_s P(x, s \mid \Omega)$ is simply related to the gradients of its log-component distributions $\ln P(x, s \mid \Omega)$. As we will prove below, a similar relationship also holds for its Hessian matrix:

$$\nabla_x^2 \ln P(x \mid \Omega) = \sum_s P(s \mid x, \Omega) \cdot \Big( \nabla_x^2 \ln P(x, s \mid \Omega) + \big[ \nabla_x \ln P(x, s \mid \Omega) \big]^2 \Big) - \big[ \nabla_x \ln P(x \mid \Omega) \big]^2, \tag{A.25}$$

where $[v]^2$ denotes the outer product matrix of a vector $v$ with itself, and the mixture responsibilities $P(s \mid x, \Omega)$ can be computed as in section A.1.1.

For the Hessian, let us consider the partial derivatives $\frac{\partial^2}{\partial x_i \partial x_j} \ln P(x \mid \Omega)$ (for ease of notation, we will drop any dependency on $\Omega$ in the following derivation):

\begin{align}
\frac{\partial^2 \ln P(x)}{\partial x_i \partial x_j}
&= \frac{\partial}{\partial x_j} \sum_s P(s \mid x) \cdot \frac{\partial}{\partial x_i} \ln P(x, s) \tag{A.26a}\\
&= \sum_s P(s \mid x) \cdot \frac{\partial^2}{\partial x_i \partial x_j} \ln P(x, s)
 + \sum_s \frac{\partial}{\partial x_j} P(s \mid x) \cdot \frac{\partial}{\partial x_i} \ln P(x, s) \tag{A.26b}\\
&= \sum_s P(s \mid x) \cdot \frac{\partial^2}{\partial x_i \partial x_j} \ln P(x, s)
 + \sum_s P(s \mid x) \cdot \frac{\partial}{\partial x_j} \ln P(s \mid x) \cdot \frac{\partial}{\partial x_i} \ln P(x, s) \tag{A.26c}\\
&= \sum_s P(s \mid x) \cdot \frac{\partial^2}{\partial x_i \partial x_j} \ln P(x, s)
 + \sum_s P(s \mid x) \cdot \frac{\partial}{\partial x_j} \ln P(x, s) \cdot \frac{\partial}{\partial x_i} \ln P(x, s) \nonumber\\
&\qquad - \sum_s P(s \mid x) \cdot \frac{\partial}{\partial x_j} \ln P(x) \cdot \frac{\partial}{\partial x_i} \ln P(x, s) \tag{A.26d}\\
&= \sum_s P(s \mid x) \cdot \left( \frac{\partial^2 \ln P(x, s)}{\partial x_i \partial x_j} + \frac{\partial \ln P(x, s)}{\partial x_j} \cdot \frac{\partial \ln P(x, s)}{\partial x_i} \right)
 - \frac{\partial \ln P(x)}{\partial x_j} \cdot \frac{1}{P(x)} \sum_s P(x, s)\, \frac{\partial \ln P(x, s)}{\partial x_i} \tag{A.26e}\\
&= \sum_s P(s \mid x) \cdot \left( \frac{\partial^2 \ln P(x, s)}{\partial x_i \partial x_j} + \frac{\partial \ln P(x, s)}{\partial x_j} \cdot \frac{\partial \ln P(x, s)}{\partial x_i} \right)
 - \frac{\partial \ln P(x)}{\partial x_j} \cdot \frac{1}{P(x)}\, \frac{\partial}{\partial x_i} \sum_s P(x, s) \tag{A.26f}\\
&= \sum_s P(s \mid x) \cdot \left( \frac{\partial^2 \ln P(x, s)}{\partial x_i \partial x_j} + \frac{\partial \ln P(x, s)}{\partial x_j} \cdot \frac{\partial \ln P(x, s)}{\partial x_i} \right)
 - \frac{\partial \ln P(x)}{\partial x_j} \cdot \frac{\partial \ln P(x)}{\partial x_i} \tag{A.26g}
\end{align}

q.e.d. (A.26h)
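Identity (A.25) can be verified in the same fashion; again a self-contained numerical check (ours) with zero-mean Gaussian components and uniform weights:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
D, S = 3, 2
covs = [np.diag(rng.uniform(0.5, 2.0, D)) for _ in range(S)]
x = rng.normal(size=D)

def log_p(y):  # equally-weighted, zero-mean Gaussian mixture
    return np.logaddexp.reduce(
        [np.log(1.0 / S) + multivariate_normal.logpdf(y, cov=C) for C in covs])

log_joint = np.array([np.log(1.0 / S) + multivariate_normal.logpdf(x, cov=C)
                      for C in covs])
resp = np.exp(log_joint - log_p(x))                   # P(s | x)
grads = [-np.linalg.solve(C, x) for C in covs]        # component gradients (A.4)
hessians = [-np.linalg.inv(C) for C in covs]          # component Hessians
outer = lambda v: np.outer(v, v)                      # [v]^2 in the text

grad_mix = sum(r * g for r, g in zip(resp, grads))    # gradient of ln P(x), via (A.2)
rhs = sum(r * (H + outer(g)) for r, H, g in zip(resp, hessians, grads)) \
      - outer(grad_mix)                               # right-hand side of (A.25)

eps = 1e-4                                            # finite-difference Hessian
fd = np.zeros((D, D))
for i in range(D):
    for j in range(D):
        ei, ej = np.zeros(D), np.zeros(D)
        ei[i], ej[j] = eps, eps
        fd[i, j] = (log_p(x + ei + ej) - log_p(x + ei - ej)
                    - log_p(x - ei + ej) + log_p(x - ei - ej)) / (4 * eps**2)

assert np.allclose(fd, rhs, atol=1e-4)
```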

A.2.2 $\nabla_x^2 \ln P(A \mid x)$

\begin{align}
\nabla^2 \ln P(a_i \mid x) &= \frac{1}{\sigma_A^2}\, \nabla^2\, a_i^T \big( r(x * b_i) * l \big) - \frac{1}{2\sigma_A^2}\, \nabla^2 \big( r(x * b_i) * l \big)^2 \tag{A.27}\\
&= \frac{1}{\sigma_A^2} \Big( \mathrm{diag}(B_i \circ r_i'') - \mathrm{diag}(C_i \circ r_i'') - G \circ [r_i']^2 \Big) *_{2d} [b_i]^2_{180^\circ} \tag{A.28}
\end{align}

Here, $\mathrm{diag}(v)$ denotes a diagonal matrix with diagonal elements $v$, $r_i$ is shorthand notation for $r(x * b_i)$ (similarly for the derivatives of $r$), $[\,\cdot\,]^2$ denotes the outer product of a vector with itself, and $[\,\cdot\,]_{180^\circ}$ stands for the 180-degree rotation of a matrix. Note that, as $b_i$ is a filter kernel, the convolution of a matrix with $[b_i]^2_{180^\circ}$ results in the sequential row- and column-wise filtering of that matrix with the reversed kernel, $\overleftarrow{b_i}$.

From section A.1.2, we already know the gradients $\nabla\, a_i^T (r(x * b_i) * l)$ and $\nabla (r(x * b_i) * l)^2$. Using those, we obtain (from similarly elementary algebra) the second partial derivatives for the first term (cf. equations (A.17) and (A.18)):

\begin{align}
&\frac{\partial^2}{\partial x_k \partial x_l}\, a_i^T \big( r(x * b_i) * l \big) \tag{A.29}\\
&\quad = \sum_{t'=1}^{T} B_i(t')\, r_i''(t') \cdot
\begin{cases} b_i(t' - k + 1)\, b_i(t' - l + 1) & \text{if } 1 \le t' - k + 1 \le M \,\wedge\, 1 \le t' - l + 1 \le M \\ 0 & \text{otherwise} \end{cases} \tag{A.30}
\end{align}

And for the second term (cf. equations (A.22) and (A.23)):

\begin{align}
&\frac{\partial^2}{\partial x_k \partial x_l}\, \big( r(x * b_i) * l \big)^2 \tag{A.31}\\
&\quad = 2 \sum_{t'=1}^{T} C_i(t')\, r_i''(t') \cdot
\begin{cases} b_i(t' - k + 1)\, b_i(t' - l + 1) & \text{if } \dots \\ 0 & \text{otherwise} \end{cases} \tag{A.32}\\
&\qquad + 2 \sum_{t'=1}^{T} \sum_s G(t', s)\, r_i'(t')\, r_i'(s) \cdot
\begin{cases} b_i(t' - k + 1)\, b_i(s - l + 1) & \text{if } \dots \\ 0 & \text{otherwise} \end{cases} \tag{A.33}
\end{align}

with

$$G(t', s) = \sum_{t = \max(t' + n + 1,\, 1)}^{\min(t' + N - 1,\, T)} l(t - t' + 1)\, l(t - s + 1) \tag{A.35}$$

Appendix B

Method for estimating f0 and fc of musical instrument and vocal sounds

For each recording in the collection of instrumental and vocal sounds used in chapter 5 to fit the dependency between pitch and timbre in the coupled model (cf. section 5.2.2), the fundamental frequency (f0) and spectral envelope width (fc) of all notes were estimated as follows:

As each recording is a scale segment containing several different musical notes, individual notes are first identified based on the minima and maxima of the amplitude envelope. Each note is then further analysed in overlapping windows of 1 s duration (75% overlap). The Discrete Fourier Transform (DFT) of each windowed sound segment is computed and peaks in the amplitude spectrum are identified, with a minimum required difference of 40 Hz between neighbouring peaks and a minimum amplitude of 1% of the maximum in that particular segment.

In order to determine the fundamental frequency of each segment, the waveform is demodulated at 4 kHz and its autocorrelation function (ACF) R(τ) is computed. We then determine the shortest lag τ* > 0.2 ms at which R(τ) reaches a peak within 5% of its maximum. In choosing τ* this way, rather than simply picking argmax R(τ), errors can be avoided that would otherwise occur due to the effect of noise on harmonically related peaks in R(τ) of near-equal height. For the same reason, the initial estimate

τ* of the sound periodicity is iteratively lowered further, one octave at a time, if the amplitude spectrum contains a significant peak one octave below 1/τ*. f0 is estimated as the inverse of the final periodicity estimate τ*.
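The peak-picking rule (though neither the demodulation nor the spectral octave check) can be sketched as follows; the function name and interface are ours:

```python
import numpy as np

def estimate_period(x, fs, min_lag_s=0.2e-3, peak_tol=0.05):
    """Shortest ACF peak lag > 0.2 ms within 5% of the global maximum."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]  # R(tau) for tau >= 0
    acf /= acf[0]
    lo = int(min_lag_s * fs)
    r = acf[lo:]
    near_max = np.flatnonzero(r >= (1 - peak_tol) * r.max())
    peaks = [c for c in near_max                        # keep local maxima only
             if 0 < c < len(r) - 1 and r[c - 1] <= r[c] >= r[c + 1]]
    best = peaks[0] if peaks else near_max[0]
    return (lo + best) / fs   # candidate tau*, before any octave lowering
```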

Following the estimation of f0, the width of the spectral envelope is determined. All spectral peaks $\{f_1, \dots, f_K\}$ that fall within 10% of a harmonic of the estimated fundamental are selected. The amplitude $A$ and width $f_\sigma$ of a scaled half-normal function

$$e(f) = A \exp\!\left( -\frac{f^2}{f_\sigma^2} \right) \tag{B.1}$$

is fitted to the amplitudes $\{a_1, \dots, a_K\}$ of the selected spectral peaks by minimising the summed squared error,

$$E = \sum_{k=1}^{K} \big( e(f_k) - a_k \big)^2. \tag{B.2}$$

Thus, $f_\sigma$ is the width of a half-normal approximation to the spectral envelope of the sound segment. The mean of a half-normal distribution of width $f_\sigma$ equals $f_c = \sqrt{2/\pi}\, f_\sigma$. This value fc is returned as a measure of the spectral envelope width.
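The fit of equation (B.1) is a two-parameter least-squares problem; below is a sketch using generic non-linear least squares (the optimiser choice is ours, as the text does not specify one):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_spectral_centroid(peak_freqs, peak_amps):
    """Fit e(f) = A * exp(-f^2 / f_sigma^2) to the selected harmonic peaks
    and return f_c = sqrt(2/pi) * f_sigma, as defined in the text."""
    e = lambda f, A, f_sigma: A * np.exp(-(f / f_sigma) ** 2)
    (A, f_sigma), _ = curve_fit(e, np.asarray(peak_freqs), np.asarray(peak_amps),
                                p0=(max(peak_amps), np.mean(peak_freqs)))
    return np.sqrt(2.0 / np.pi) * f_sigma
```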

We will, somewhat imprecisely, call fc the "spectral centroid" of a sound on occasion, when in fact it is the centroid of an approximation to its spectral envelope. For sounds with an approximately half-normal spectral envelope, fc will closely approximate the harmonic spectral centroid as defined by the MPEG-7 ISO standard, which is simply the amplitude-weighted mean of the harmonic spectral peaks on linear scales of both frequency and amplitude (Kim et al., 2005). Figure B.1A shows the outcome of the f0 and fc determination algorithm for one example note from the music instrument database. Figure B.1B shows the estimates of f0 and fc across the entire collection of sounds, and demonstrates a good correspondence between the instrumental and vocal samples concerning the dependency of fc on f0. There is, however, also considerable inter-instrumental variability, which is not taken into account by the coupled generative model, as it only attempts to fit the overall dependency across all instruments (cf. section 5.2.2).


Figure B.1: A: Estimated f0 (pink) and best-fit half-normal approximation (blue) to the spectral envelope of a sample note from the music instrument database (Viola A♭5). The linear amplitude spectrum is shown in black, significant peaks as determined by the algorithm in red. B: Estimates of f0 and fc for all sounds from the collection of samples (instrumental and vocal).

Bibliography

A. M. H. J. Aertsen and P. I. M. Johannesma. Spectro-temporal receptive fields of auditory neurons in the grassfrog. Biological Cybernetics, 38:223–234, 1980.

M. B. Ahrens and M. Sahani. Observers exploit stochastic models of sensory change to help judge the passage of time. Current Biology, 21(3):200–206, 2011.

M. A. Akeroyd, B. C. J. Moore, and G. A. Moore. Melody recognition using three types of dichotic-pitch stimulus. Journal of the Acoustical Society of America, 110 (3):1498–1504, 2001.

L. A. Anderson, M. N. Wallace, and A. R. Palmer. Identification of subdivisions in the medial geniculate body of the guinea pig. Hearing Research, 228(1-2):156–167, 2007.

ANSI. American National Standard Acoustical Terminology. ANSI S1.1 1994. American National Standards Association, New York, 1994.

ASA. Acoustical Terminology SI, 1-1960. American Standards Association, New York, 1960.

R. Ashley. Musical pitch space across modalities: spatial and other mappings through language and culture. In R. G. P. W. S. Lipscomb, R. Ashley, editor, Proceedings of the International Conference on Music Perception and Cognition, pages 64–71. Causal Productions, 2004.

F. Attneave and R. K. Olson. Pitch as a medium: A new approach to psychophysical scaling. The American Journal of Psychology, 84(2):147–166, 1971.

A. Bachem. Various types of absolute pitch. Journal of the Acoustical Society of America, 9(2):146–151, 1937.

A. Bachem. Chroma fixation at the ends of the musical frequency scale. Journal of the Acoustical Society of America, 20(5):704–705, 1948.

A. Bachem. Tone height and tone chroma as two different pitch qualities. Acta Psychologica, 7:80–88, 1950.

E. Balaguer-Ballester, S. L. Denham, and R. Meddis. A cascade autocorrelation model of pitch perception. Journal of the Acoustical Society of America, 124(4):2186–2195, 2008.

E. Balaguer-Ballester, N. R. Clark, M. Coath, K. Krumbholz, and S. L. Denham. Understanding pitch perception as a hierarchical process with top-down modulation. PLoS Computational Biology, 5(3):e1000301, 2009.

E. L. Bartlett and X. Wang. Neural representations of temporally modulated signals in the auditory thalamus of awake primates. Journal of Neurophysiology, 97(2): 1005–1017, 2007.

J. Beck and W. A. Shaw. The scaling of pitch by the method of magnitude-estimation. The American Journal of Psychology, 74(2):242–251, 1961.

D. Bendor and X. Wang. The neuronal representation of pitch in primate auditory cortex. Nature, 436(7054):1161–1165, 2005.

P. Berkes, R. Turner, and M. Sahani. On sparsity and overcompleteness in image models. In Advances in Neural Information Processing Systems 20 (NIPS 2007). MIT Press, 2008.

J. G. Bernstein and A. J. Oxenham. Pitch discrimination of diotic and dichotic tone complexes: Harmonic resolvability or harmonic number? Journal of the Acoustical Society of America, 113(6):3323–3334, 2003.

J. G. W. Bernstein and A. J. Oxenham. An autocorrelation model with place dependence to account for the effect of harmonic number on fundamental frequency discrimination. Journal of the Acoustical Society of America, 117(6):3816–3831, 2005.

U. W. Biebel and G. Langner. Evidence for interactions across frequency channels in the inferior colliculus of awake chinchilla. Hearing Research, 169(1-2):151–168, 2002.

F. A. Bilsen. Repetition pitch: monaural interaction of a sound with repetition of the same but phase shifted sound. Acustica, 17(5):295–300, 1966.

F. A. Bilsen. Pitch of noise signals: Evidence for a "central spectrum". Journal of the Acoustical Society of America, 61(1):150–161, 1977.

J. K. Bizley, K. M. M. Walker, B. W. Silverman, A. J. King, and J. W. H. Schnupp. Interdependent encoding of pitch, timbre, and spatial location in auditory cortex. Journal of Neuroscience, 29(7):2064–2075, 2009.

J. K. Bizley, K. M. M. Walker, A. J. King, and J. W. H. Schnupp. Neural ensemble codes for stimulus periodicity in auditory cortex. Journal of Neuroscience, 30(14): 5078–5091, 2010.

C. C. Blackburn and M. B. Sachs. The representations of the steady-state vowel sound /e/ in the discharge patterns of cat anteroventral cochlear nucleus neurons. Journal of Neurophysiology, 63(5):1191–1212, 1990.

D. Bolinger. Intonation across languages. In Universals of human language, Vol.2: Phonology, pages 471–524. Stanford University Press, Stanford, 1978.

D. H. Brainard. The Psychophysics Toolbox. Spatial Vision, 10:433–436, 1997.

A. S. Bregman. Auditory Scene Analysis: The Perceptual Organization of Sound. The MIT Press, Cambridge, MA, 1990.

J. P. Brokx and S. G. Nooteboom. Intonation and the perceptual separation of simultaneous voices. Journal of Phonetics, 10(1):23–36, 1982.

W. E. Brownell, C. R. Bader, D. Bertrand, and Y. de Ribaupierre. Evoked mechanical responses of isolated cochlear outer hair cells. Science, 227(4683):194–196, 1985.

E. M. Burns. Pure-tone pitch anomalies. I. Pitch-intensity effects and diplacusis in normal ears. Journal of the Acoustical Society of America, 72(5):1394–1402, 1982.

E. M. Burns. Intervals, scales and tuning. In D. Deutsch, editor, The Psychology of Music, pages 215–264. Academic Press, New York, 2nd edition, 1998.

E. M. Burns and N. F. Viemeister. Nonspectral pitch. Journal of the Acoustical Society of America, 60(4):863–869, 1976.

E. M. Burns and N. F. Viemeister. Played-again sam: Further observations on the pitch of amplitude-modulated noise. Journal of the Acoustical Society of America, 70(6):1655–1660, 1981.

P. A. Cariani and B. Delgutte. Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. Journal of Neurophysiology, 76(3):1698–1716, 1996a.

P. A. Cariani and B. Delgutte. Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region for pitch. Journal of Neurophysiology, 76(3):1717–1734, 1996b.

R. P. Carlyon. Comments on “A unitary model of pitch perception”. Journal of the Acoustical Society of America, 104(2):1118–1121, 1998.

R. P. Carlyon and T. M. Shackleton. Comparing the fundamental frequencies of resolved and unresolved harmonics: Evidence for two pitch mechanisms? Journal of the Acoustical Society of America, 95(6):3541–3554, 1994.

L. Cedolin and B. Delgutte. Spatiotemporal representation of the pitch of harmonic complex tones in the auditory nerve. Journal of Neuroscience, 30(38):12712–12724, 2010.

C. Chambers and D. Pressnitzer. The effect of context in the perception of an ambiguous pitch stimulus. Association for Research in Otolaryngology Abstract, pages 346–347, 2011.

M. Clynes, editor. Music, Mind and Brain. Plenum Press, New York, 1982.

M. A. Cohen, S. Grossberg, and L. L. Wyse. A spectral network model of pitch perception. Journal of the Acoustical Society of America, 98(2):862–879, 1995.

N. P. Cooper and W. S. Rhode. Mechanical responses to two-tone distortion products in the apical and basal turns of the mammalian cochlea. Journal of Neurophysiology, 78(1):261–270, 1997.

E. M. Cramer and W. H. Huggins. Creation of pitch through binaural interaction. Journal of the Acoustical Society of America, 30(5):413–417, 1958.

J. F. Culling, A. Q. Summerfield, and D. H. Marshall. Dichotic pitches as illusions of binaural unmasking. I. Huggins’ pitch and the “binaural edge pitch”. Journal of the Acoustical Society of America, 103(6):3509–3526, 1998.

H. Dai. On the relative influence of individual harmonics on pitch judgment. Journal of the Acoustical Society of America, 107(2):953–959, 2000.

L. Dawe, J. Platt, and E. Welsh. Spectral-motion aftereffects and the tritone paradox among canadian subjects. Attention, Perception & Psychophysics, 60:209–220, 1998.

E. de Boer. Pitch of inharmonic signals. Nature, 178(4532):535–536, 1956a.

E. de Boer. On the ”Residue” in Hearing. PhD thesis, University of Amsterdam, Faculty of Mathematics and Physics, 1956b.

E. de Boer. Pitch theories unified. In E. F. Evans and J. P. Wilson, editors, Psychophysics and Physiology of Hearing: An International Symposium. Academic Press, London, 1977.

E. de Boer and H. R. de Jongh. On cochlear encoding: Potentialities and limitations of the reverse-correlation technique. Journal of the Acoustical Society of America, 63(1):115–135, 1978.

E. de Boer and C. Kruidenier. On ringing limits of the auditory periphery. Biological Cybernetics, 63:433–442, 1990.

E. de Boer and P. Kuyper. Triggered correlation. IEEE Transactions on Biomedical Engineering, BME-15(3):169 –179, 1968.

A. de Cheveigné. Separation of concurrent harmonic sounds: Fundamental frequency estimation and a time-domain cancellation model of auditory processing. Journal of the Acoustical Society of America, 93(6):3271–3290, 1993.

A. de Cheveigné. Cancellation model of pitch perception. Journal of the Acoustical Society of America, 103(3):1261–1271, 1998.

A. de Cheveigné. Pitch perception models. In C. Plack, R. Fay, A. Oxenham, and A. Popper, editors, Pitch: Neural Coding and Perception, volume 24 of Springer Handbook of Auditory Research, pages 169–233. Springer New York, 2005.

A. de Cheveigné. Pitch perception. In C. Plack, editor, The Oxford Handbook of Auditory Science: Hearing, volume 3, pages 71–104. Oxford University Press, 2010.

A. de Cheveigné and D. Pressnitzer. The case of the missing delay lines: Synthetic delays obtained by cross-channel phase interaction. Journal of the Acoustical Society of America, 119(6):3908–3918, 2006.

A. de Cheveigné, H. Kawahara, M. Tsuzaki, and K. Aikawa. Concurrent vowel identification. I. Effects of relative amplitude and F0 difference. Journal of the Acoustical Society of America, 101(5):2839–2847, 1997.

B. Delgutte. Physiological mechanisms of psychophysical masking: Observations from auditory-nerve fibers. Journal of the Acoustical Society of America, 87(2):791–809, 1990.

D. Deutsch. A musical paradox. Music Perception: An Interdisciplinary Journal, 3(3): 275–280, 1986.

H. Duifhuis, L. F. Willems, and R. J. Sluyter. Measurement of pitch in speech: An implementation of Goldstein's theory of pitch perception. Journal of the Acoustical Society of America, 71(6):1568–1580, 1982.

M. O. Ernst and M. S. Banks. Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870):429–433, 2002.

G. Fant. Acoustic Theory of Speech Production. Mouton & Co, The Hague, 1960.

H. Fastl. Über Tonhöhenempfindungen bei Rauschen. Acustica, pages 350–354, 1971.

H. Fastl and E. Zwicker. Psychoacoustics: Facts and Models. Springer series in information sciences. Springer, 3rd edition, 2007.

Y. I. Fishman, D. H. Reser, J. C. Arezzo, and M. Steinschneider. Pitch vs. spectral encoding of harmonic complex tones in primary auditory cortex of the awake monkey. Brain Research, 786(1-2):18 – 30, 1998.

J. L. Flanagan. Speech Analysis, Synthesis and Perception. Springer, Berlin - New York, 2nd edition, 1972.

J. L. Flanagan and N. Guttman. On the pitch of periodic pulses. Journal of the Acoustical Society of America, 32(10):1308–1319, 1960.

J. L. Flanagan, N. Guttman, and B. J. Watson. Pitch of periodic pulses with nonuniform amplitudes. Journal of the Acoustical Society of America, 34(5):738–739, 1962.

G. Frank, W. Hemmert, and A. W. Gummer. Limiting dynamics of high-frequency electromechanical transduction of outer hair cells. Proceedings of the National Academy of Sciences of the United States of America, 96(8):4420–4425, 1999.

R. D. Frisina, R. L. Smith, and S. C. Chamberlain. Encoding of amplitude modulation in the gerbil cochlear nucleus: I. A hierarchy of enhancement. Hearing Research, 44 (2-3):99–122, 1990.

J. Fritz, S. Shamma, M. Elhilali, and D. Klein. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nature Neuroscience, 6(11):1216–1223, 2003.

J. B. Fritz, M. Elhilali, and S. A. Shamma. Differential dynamic plasticity of A1 receptive fields during multiple spectral tasks. Journal of Neuroscience, 25(33):7623– 7635, 2005.

R. R. Gacek. Neuroanatomy of the auditory system, volume 2. Academic Press, New York, 1972.

W. R. Garner. The Processing of Information and Structure. Lawrence Erlbaum, Oxford, 1974.

J. Giangrand, B. Tuller, and J. A. S. Kelso. Perceptual dynamics of circular pitch. Music Perception: An Interdisciplinary Journal, 20(3):241–262, 2003.

B. R. Glasberg and B. C. J. Moore. Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1-2):103–138, 1990.

H. Gockel, B. C. J. Moore, C. J. Plack, and R. P. Carlyon. Effect of noise on the detectability and fundamental frequency discrimination of complex tones. Journal of the Acoustical Society of America, 120(2):957–965, 2006.

J. L. Goldstein. An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of America, 54(1):317–317, 1973.

C. Gough. Musical acoustics. In T. D. Rossing, editor, Springer Handbook of Acoustics, chapter 15, pages 531–668. Springer, New York, 2007.

J. M. Grey. Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61(5):1270–1277, 1977.

S. Grossberg, K. K. Govindarajan, L. L. Wyse, and M. A. Cohen. ARTSTREAM: a neural network model of auditory scene analysis and source segregation. Neural Networks, 17(4):511 – 536, 2004.

B. Grothe, M. Pecka, and D. McAlpine. Mechanisms of sound localization in mammals. Physiological Reviews, 90(3):983–1012, 2010.

J. J. Guinan, Jr. Olivocochlear efferents: Anatomy, physiology, function, and the measurement of efferent effects in humans. Ear and Hearing, 27(6):589–607, 2006.

J. J. Guinan, Jr. and K. M. Stankovic. Medial efferent inhibition produces the largest equivalent attenuations at moderate to high sound levels in cat auditory-nerve fibers. Journal of the Acoustical Society of America, 100(3):1680–1690, 1996.

N. Guttman and J. L. Flanagan. Pitch of high-pass-filtered pulse trains. Journal of the Acoustical Society of America, 36(4):757–765, 1964.

N. Guttman and S. Pruzansky. Lower limits of pitch and musical pitch. Journal of Speech and Hearing Research, 5(3):207–214, 1962.

T. A. Hackett, T. M. Preuss, and J. H. Kaas. Architectonic identification of the core region in auditory cortex of macaques, chimpanzees, and humans. The Journal of Comparative Neurology, 441(3):197–222, 2001.

D. A. Hall and C. J. Plack. Pitch processing sites in the human auditory brain. Cerebral Cortex, 19(3):576–585, 2009.

W. M. Hartmann. Signals, Sound, and Sensation. AIP series in modern acoustics and signal processing. American Institute of Physics, 1997.

W. M. Hartmann and C. D. McMillon. Binaural coherence edge pitch. Journal of the Acoustical Society of America, 109(1):294–305, 2001.

W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.

H. Helmholtz. Ueber Combinationstöne. Annalen der Physik, 175(12):497–540, 1856. ISSN 1521-3889.

S. Hemilä, S. Nummela, and T. Reuter. What middle ear parameters tell about impedance matching and high frequency hearing. Hearing Research, 85(1-2):31–44, 1995.

D. J. Hermes. Measurement of pitch by subharmonic summation. Journal of the Acoustical Society of America, 83(1):257–264, 1988.

H.-P. Hesse. The judgement of musical intervals. In Clynes (1982), pages 217–225.

M. J. Hewitt and R. Meddis. A computer model of amplitude-modulation sensitivity of single units in the inferior colliculus. Journal of the Acoustical Society of America, 95(4):2145–2159, 1994.

T. Holton and T. F. Weiss. Frequency selectivity of hair cells and nerve fibres in the alligator lizard cochlea. Journal of Physiology, 345(1):241–260, 1983.

T. Houtgast. Psychophysical evidence for lateral inhibition in hearing. Journal of the Acoustical Society of America, 51(6B):1885–1894, 1972.

A. J. M. Houtsma. Pitch and timbre: Definition, meaning and use. Journal of New Music Research, 26(2), 1997.

A. J. M. Houtsma and J. F. M. Fleuren. Analytic and synthetic pitch of two-tone complexes. Journal of the Acoustical Society of America, 90(3):1674–1676, 1991.

A. J. M. Houtsma and J. L. Goldstein. Perception of musical intervals : evidence for the central origin of the pitch of complex tones. Technical Report TK7855.M41 R43 no.484, Massachusetts Institute of Technology, Research Laboratory of Electronics, 1971. (based on a Ph.D. thesis in the Dept. of Electrical Engineering, 1971, by A.J.M. Houtsma).

A. J. M. Houtsma and J. L. Goldstein. The central origin of the pitch of complex tones: Evidence from musical interval recognition. Journal of the Acoustical Society of America, 51(2B):520–529, 1972.

A. J. M. Houtsma and J. Smurzynski. Pitch identification and discrimination for complex tones with many harmonics. Journal of the Acoustical Society of America, 87(1):304–310, 1990.

E. Jaynes. Probability theory: the logic of science. Cambridge University Press, 2003.

P. I. Johannesma. The pre-response stimulus ensemble of neurons in the cochlear nucleus. In Symposium on Hearing Theory, pages 58–69, Eindhoven, Holland, 1972.

D. H. Johnson. The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones. Journal of the Acoustical Society of America, 68(4):1115–1122, 1980.

B. M. Johnstone, R. Patuzzi, and G. K. Yates. Basilar membrane measurements and the travelling wave. Hearing Research, 22(1-3):147 – 153, 1986.

E. Joliveau, J. Smith, and J. Wolfe. Vocal tract resonances in singing: The soprano voice. Journal of the Acoustical Society of America, 116(4):2434–2439, 2004.

S. C. Kadia and X. Wang. Spectral integration in A1 of awake primates: Neurons with single- and multipeaked tuning characteristics. Journal of Neurophysiology, 89(3):1603–1622, 2003.

D. Kersten, P. Mamassian, and A. Yuille. Object perception as Bayesian inference. Annual Review of Psychology, 55(1):271–304, 2004.

D. O. Kim, J. G. Sirianni, and S. O. Chang. Responses of DCN-PVCN neurons and auditory nerve fibers in unanesthetized decerebrate cats to AM and pure tones: Analysis with autocorrelation/power-spectrum. Hearing Research, 45(1-2):95–113, 1990.

H.-G. Kim, N. Moreau, and T. Sikora. MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval. John Wiley & Sons, 2005.

M. A. Klein and W. M. Hartmann. Binaural edge pitch. Journal of the Acoustical Society of America, 70(1):51–61, 1981.

M. Kleiner, D. H. Brainard, and D. G. Pelli. What's new in Psychtoolbox-3? Perception, 36, 2007. ECVP Abstract Supplement.

D. C. Knill and W. Richards, editors. Perception as Bayesian inference. Cambridge University Press, New York, 1996.

B. S. Krishna and M. N. Semple. Auditory temporal processing: Responses to sinusoidally amplitude-modulated tones in the inferior colliculus. Journal of Neurophysiology, 84(1):255–273, 2000.

K. Krumbholz, R. D. Patterson, and D. Pressnitzer. The lower limit of pitch as determined by rate discrimination. Journal of the Acoustical Society of America, 108(3):1170–1180, 2000.

C. L. Krumhansl and P. Iverson. Perceptual interactions between musical pitch and timbre. Journal of Experimental Psychology: Human Perception and Performance, 18(3):739–751, 1992.

G. Langner and C. E. Schreiner. Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms. Journal of Neurophysiology, 60(6):1799–1822, 1988.

G. Langner, M. Sams, P. Heil, and H. Schulze. Frequency and periodicity are represented in orthogonal maps in the human auditory cortex: evidence from magnetoencephalography. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 181:665–676, 1997.

G. Langner, M. Albert, and T. Briede. Temporal and spatial coding of periodicity information in the inferior colliculus of awake chinchilla (Chinchilla laniger). Hearing Research, 168(1-2):110–130, 2002.

D. J. Levitin and S. E. Rogers. Absolute pitch: perception, coding, and controversies. Trends in Cognitive Sciences, 9(1):26 – 33, 2005.

W. H. Lichte. Attributes of complex tones. Journal of Experimental Psychology, 28(6): 455–480, 1941.

J. C. R. Licklider. A duplex theory of pitch perception. Experientia, 7:128–134, 1951.

B. Lindblom and J. Sundberg. The Human Voice in Speech and Singing, pages 669–712. Springer, New York, 2007.

G. R. Lockhead and R. Byrd. Practically perfect pitch. The Journal of the Acoustical Society of America, 70(2):387–389, 1981.

N. K. Logothetis. The underpinnings of the BOLD functional magnetic resonance imaging signal. The Journal of Neuroscience, 23(10):3963–3971, 2003.

D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, 2003.

M. S. Malmierca and T. S. Hacket. Structural organization of the ascending auditory pathway. In A. Rees and A. R. Palmer, editors, The Oxford Handbook of Auditory Science: The Auditory Brain, volume 2. Oxford University Press, 2010.

P. Mamassian, M. S. Landy, and L. T. Maloney. Bayesian modelling of visual percep- tion. In R. Rao, B. Olshausen, and M. Lewicki, editors, Probabilistic Models of the Brain: Perception and Neural Function, pages 13–36. MIT Press, Cambridge, MA, 2002.

J. Marozeau and A. de Cheveign´e.The effect of fundamental frequency on the bright- ness dimension of timbre. Journal of the Acoustical Society of America, 121(1): 383–387, 2007.

J. Marozeau, A. de Cheveigné, S. McAdams, and S. Winsberg. The dependency of timbre on fundamental frequency. Journal of the Acoustical Society of America, 114(5):2946–2957, 2003.

J. P. Marozeau, A. de Cheveigné, S. McAdams, and S. Winsberg. The perceptual interaction between the pitch and timbre of musical sound. Journal of the Acoustical Society of America, 109(5):2288–2288, 2001.

D. Marr. Vision: A computational investigation into the human representation and processing of visual information. W. H. Freeman, San Francisco, 1982.

S. McAdams and A. Bregman. Hearing musical streams. Computer Music Journal, 3 (4):26–60, 1979.

S. McAdams, S. Winsberg, S. Donnadieu, G. De Soete, and J. Krimphoff. Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research, 58:177–192, 1995.

D. McAlpine. Neural sensitivity to periodicity in the inferior colliculus: Evidence for the role of cochlear distortions. Journal of Neurophysiology, 92(3):1295–1311, 2004.

J. H. McDermott, A. J. Lehr, and A. J. Oxenham. Individual differences reveal the basis of consonance. Current Biology, 20(11):1035–1041, 2010.

R. Meddis. Auditory-nerve first-spike latency and auditory absolute threshold: A computer model. Journal of the Acoustical Society of America, 119(1):406–417, 2006.

R. Meddis and M. J. Hewitt. Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification. Journal of the Acoustical Society of America, 89(6):2866–2882, 1991.

R. Meddis and L. O’Mard. A unitary model of pitch perception. Journal of the Acoustical Society of America, 102(3):1811–1820, 1997.

R. D. Melara and L. E. Marks. Interaction among auditory dimensions: timbre, pitch, and loudness. Perception & Psychophysics, 48(2):169–178, 1990.

T. Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research, 2005.

B. Moore. An Introduction to the Psychology of Hearing. Academic Press, 5th edition, 2003.

B. C. Moore and H. E. Gockel. Resolvability of components in complex tones and implications for theories of pitch perception. Hearing Research, 276(1-2):88–97, 2011.

B. C. J. Moore. Frequency difference limens for short-duration tones. Journal of the Acoustical Society of America, 54(3):610–619, 1973.

B. C. J. Moore, B. R. Glasberg, and R. W. Peters. Relative dominance of individual partials in determining the pitch of complex tones. Journal of the Acoustical Society of America, 77(5), 1985.

C. T. Morgan, W. R. Garner, and R. Galambos. Pitch and intensity. Journal of the Acoustical Society of America, 23(6):658–663, 1951.

T. Moser, A. Brandt, and A. Lysakowski. Hair cell ribbon synapses. Cell and Tissue Research, 326:347–359, 2006.

L. Mozart. Versuch einer gründlichen Violinschule. Augsburg, 1756. English translation by E. Knocker, A Treatise on the Fundamental Principles of Violin Playing, Oxford University Press, London, 1947.

I. Murray. Advances in Markov chain Monte Carlo methods. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2007.

R. M. Neal. Statistical inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993.

R. M. Neal. Annealed importance sampling. Technical Report 9805, Department of Statistics, University of Toronto, 1998.

R. M. Neal. MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, editors, Handbook of Markov Chain Monte Carlo: Methods and Applications. CRC Press, 2010.

I. Nelken. Processing of complex stimuli and natural scenes in the auditory cortex. Current Opinion in Neurobiology, 14(4):474–480, 2004.

I. Nelken, J. K. Bizley, F. R. Nodal, B. Ahmed, A. J. King, and J. W. H. Schnupp. Responses of auditory cortex to complex stimuli: Functional organization revealed using intrinsic optical signals. Journal of Neurophysiology, 99(4):1928–1941, 2008.

M. A. Newton and A. E. Raftery. Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society. Series B (Methodological), 56(1):3–48, 1994.

J. Nocedal and S. J. Wright. Numerical optimization. Springer, New York, 2nd edition, 1999.

D. Oertel, R. Bal, S. M. Gardner, P. H. Smith, and P. X. Joris. Detection of synchrony in the activity of auditory nerve fibers by octopus cells of the mammalian cochlear nucleus. Proceedings of the National Academy of Sciences of the United States of America, 97(22):11773–11779, 2000.

G. S. Ohm. Ueber die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen. Annalen der Physik, 135(8):513–565, 1843.

A. V. Oppenheim, R. W. Schafer, and J. R. Buck. Discrete-Time Signal Processing, chapter 7. Prentice Hall, 2nd edition, 1999.

A. J. Oxenham and C. A. Shera. Estimates of human cochlear tuning at low levels using forward and simultaneous masking. Journal of the Association for Research in Otolaryngology, 4:541–554, 2003.

A. J. Oxenham, J. G. W. Bernstein, and H. Penagos. Correct tonotopic representation is necessary for complex pitch perception. Proceedings of the National Academy of Sciences of the United States of America, 101(5):1421–1425, 2004.

A. J. Oxenham, C. Micheyl, M. V. Keebler, A. Loper, and S. Santurette. Pitch perception beyond the traditional existence region of pitch. Proceedings of the National Academy of Sciences of the United States of America, 108(18):7629–7634, 2011.

C. Pantev, M. Hoke, B. Lütkenhöner, and K. Lehnertz. Tonotopic organization of the auditory cortex: Pitch versus frequency representation. Science, 246(4929):486–488, 1989.

R. D. Patterson. The effects of relative phase and the number of components on residue pitch. Journal of the Acoustical Society of America, 53(6):1565–1572, 1973.

R. D. Patterson. Auditory filter shapes derived with noise stimuli. Journal of the Acoustical Society of America, 59(3):640–654, 1976.

R. D. Patterson. Auditory images: How complex sounds are represented in the auditory system. Journal of the Acoustical Society of Japan E, 21(4):183–190, 2000.

R. D. Patterson and B. C. J. Moore. Auditory filters and excitation patterns as representations of frequency resolution. In B. C. J. Moore, editor, Frequency Selectivity in Hearing. Academic Press, London, 1986.

R. D. Patterson and F. L. Wightman. Residue pitch as a function of component spacing. Journal of the Acoustical Society of America, 59(6):1450–1459, 1976.

R. D. Patterson, K. Robinson, J. Holdsworth, D. McKeown, C. Zhang, and M. Allerhand. Complex sounds and auditory images. In Y. Cazals, L. Demany, and K. Horner, editors, Auditory Physiology and Perception, pages 429–443. Pergamon, Oxford, 1992.

R. D. Patterson, M. H. Allerhand, and C. Giguère. Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform. Journal of the Acoustical Society of America, 98(4):1890–1894, 1995.

R. D. Patterson, S. Uppenkamp, I. S. Johnsrude, and T. D. Griffiths. The processing of temporal pitch and melody information in auditory cortex. Neuron, 36(4):767–776, 2002.

R. D. Patterson, D. R. R. Smith, R. van Dinther, and T. C. Walters. Size information in the production and perception of communication sounds. In W. A. Yost, A. N. Popper, and R. R. Fay, editors, Auditory Perception of Sound Sources, volume 29 of Springer Handbook of Auditory Research. Springer, New York, 2008.

H. Penagos, J. R. Melcher, and A. J. Oxenham. A neural representation of pitch salience in nonprimary human auditory cortex revealed with functional magnetic resonance imaging. Journal of Neuroscience, 24(30):6810–6815, 2004.

G. E. Peterson and H. L. Barney. Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24(2):175–184, 1952.

H. Pick, D. Warren, and J. Hay. Sensory conflict in judgments of spatial direction. Attention, Perception & Psychophysics, 6:203–205, 1969.

M. A. Pitt. Perception of pitch and timbre by musically trained and untrained listeners. Journal of Experimental Psychology: Human Perception and Performance, 20(5):976–986, 1994.

R. Plomp. Pitch of complex tones. Journal of the Acoustical Society of America, 41(6), 1967.

R. Plomp. Timbre as a multidimensional attribute of complex tones. In R. Plomp and G. F. Smoorenburg, editors, Frequency Analysis and Periodicity Detection in Hearing, pages 397–414. Sijthoff, Leiden, 1970.

R. Plomp and W. J. M. Levelt. Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America, 38(4):548–560, 1965.

R. Plomp and H. J. M. Steeneken. Pitch versus timbre. In Proceedings of the 7th International Congress of Acoustics, pages 377–380, Budapest, 1971.

I. Pollack. Amplitude and time jitter thresholds for rectangular-wave trains. Journal of the Acoustical Society of America, 50(4B):1133–1142, 1971.

D. Pressnitzer, R. D. Patterson, and K. Krumbholz. The lower limit of melodic pitch. Journal of the Acoustical Society of America, 109(5):2074–2084, 2001.

A. Preuss and P. Müller-Preuss. Processing of amplitude modulated sounds in the medial geniculate body of squirrel monkeys. Experimental Brain Research, 79(1), 1990.

S. Puschmann, S. Uppenkamp, B. Kollmeier, and C. M. Thiel. Dichotic pitch activates pitch processing centre in Heschl’s gyrus. NeuroImage, 49(2):1641–1649, 2010.

C. E. Rasmussen and H. Nickisch. Gaussian processes for machine learning (GPML) toolbox. Journal of Machine Learning Research, 11:3011–3015, 2010.

C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

R. Renken, J. E. C. Wiersinga-Post, S. Tomaskovic, and H. Duifhuis. Dominance of missing fundamental versus spectrally cued pitch: Individual differences for complex tones with unresolved harmonics. Journal of the Acoustical Society of America, 115(5):2257–2263, 2004.

B. Repp and J. Thompson. Context sensitivity and invariance in perception of octave-ambiguous tones. Psychological Research, 74:437–456, 2010.

W. S. Rhode. Observations of the vibration of the basilar membrane in squirrel monkeys using the Mössbauer technique. Journal of the Acoustical Society of America, 49(4B):1218–1231, 1971.

W. S. Rhode. An investigation of postmortem cochlear mechanics using the Mössbauer effect. In A. Møller, editor, Basic Mechanisms of Hearing, pages 49–63. Academic Press, New York, 1973.

W. S. Rhode. Temporal coding of 200% amplitude modulated signals in the ventral cochlear nucleus of cat. Hearing Research, 77(1-2):43–68, 1994.

W. S. Rhode and S. Greenberg. Lateral suppression and inhibition in the cochlear nucleus of the cat. Journal of Neurophysiology, 71(2):493–514, 1994.

B. L. Riker. The ability to judge pitch. Journal of Experimental Psychology, 36(4): 331–346, 1946.

R. J. Ritsma. Existence region of the tonal residue I. Journal of the Acoustical Society of America, 34(9A):1224–1229, 1962.

R. J. Ritsma. Existence region of the tonal residue II. Journal of the Acoustical Society of America, 35(8):1241–1245, 1963.

R. J. Ritsma. Frequencies dominant in the perception of the pitch of complex sounds. Journal of the Acoustical Society of America, 42(1):191–198, 1967.

D. W. Robinson and R. S. Dadson. A re-determination of the equal-loudness relations for pure tones. British Journal of Applied Physics, 7(5):166–181, 1956.

K. Robinson. Brightness and octave position: are changes in spectral envelope and in tone height perceptually equivalent? Contemporary Music Review, 9(1):83–95, 1993.

L. Robles, M. A. Ruggero, and N. C. Rich. Two-tone distortion in the basilar membrane of the cochlea. Nature, 349:413–414, 1991.

L. Robles, M. A. Ruggero, and N. C. Rich. Two-tone distortion on the basilar membrane of the chinchilla cochlea. Journal of Neurophysiology, 77(5):2385–2399, 1997.

J. E. Rose, J. F. Brugge, D. J. Anderson, and J. E. Hind. Phase-locked response to low-frequency tones in single auditory nerve fibers of the squirrel monkey. Journal of Neurophysiology, 30(4):769–793, 1967.

S. Rosen, R. J. Baker, and A. Darling. Auditory filter nonlinearity at 2 kHz in normal hearing listeners. Journal of the Acoustical Society of America, 103(5):2539–2550, 1998.

M. A. Ruggero and N. C. Rich. Furosemide alters organ of Corti mechanics: evidence for feedback of outer hair cells upon the basilar membrane. Journal of Neuroscience, 11(4):1057–1067, 1991.

M. A. Ruggero and A. N. Temchin. Unexceptional sharpness of frequency tuning in the human cochlea. Proceedings of the National Academy of Sciences of the United States of America, 102(51):18614–18619, 2005.

M. A. Ruggero, L. Robles, and N. C. Rich. Two-tone suppression in the basilar membrane of the cochlea: mechanical basis of auditory-nerve rate suppression. Journal of Neurophysiology, 68(4):1087–1099, 1992a.

M. A. Ruggero, L. Robles, N. C. Rich, A. Recio, A. M. Brown, and E. F. Evans. Basilar membrane responses to two-tone and broadband stimuli [and discussion]. Philosophical Transactions: Biological Sciences, 336(1278):307–315, 1992b.

M. A. Ruggero, N. C. Rich, and A. Recio. Alteration of basilar membrane responses to sound by acoustic overstimulation. In H. Duifhuis, J. W. Horst, P. van Dijk, and S. M. van Netten, editors, Biophysics of Hair Cell Sensory Systems, pages 258–264. World Scientific, Singapore, 1993.

M. A. Ruggero, N. C. Rich, L. Robles, and A. Recio. The effects of acoustic trauma, other cochlear injury, and death on basilar membrane responses to sound. In A. Axelsson, H. M. Borchgrevink, R. P. Hamernik, P.-A. Hellstrom, D. Henderson, and R. J. Salvi, editors, Scientific Basis of Noise-Induced Hearing Loss, pages 23–35. Thieme, Stuttgart, 1996.

M. A. Ruggero, N. C. Rich, A. Recio, S. S. Narayan, and L. Robles. Basilar-membrane responses to tones at the base of the chinchilla cochlea. Journal of the Acoustical Society of America, 101(4):2151–2163, 1997.

I. J. Russell and E. Murugasu. Medial efferent inhibition suppresses basilar membrane responses to near characteristic frequency tones of moderate to high intensities. Journal of the Acoustical Society of America, 102(3):1734–1738, 1997.

I. J. Russell and P. M. Sellick. Low-frequency characteristics of intracellularly recorded receptor potentials in guinea-pig cochlear hair cells. Journal of Physiology, 338(1):179–206, 1983.

J. Santos-Sacchi. On the frequency limit and phase of outer hair cell motility: effects of the membrane filter. Journal of Neuroscience, 12(5):1906–1916, 1992.

J. F. Schouten. The perception of subjective tones. In Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, volume 41, pages 1086–1093, 1938.

J. F. Schouten. The perception of pitch. Philips Technical Review, 5(10), 1940.

J. F. Schouten, R. J. Ritsma, and B. L. Cardozo. Pitch of the residue. Journal of the Acoustical Society of America, 34(9B):1418–1424, 1962.

C. E. Schreiner and G. Langner. Periodicity coding in the inferior colliculus of the cat. II. Topographical organization. Journal of Neurophysiology, 60(6):1823–1840, 1988.

M. R. Schroeder. Period histogram and product spectrum: New methods for fundamental-frequency measurement. Journal of the Acoustical Society of America, 43(4):829–834, 1968.

E. Schubert and J. Wolfe. Does timbral brightness scale with frequency and spectral centroid? Acta Acustica united with Acustica, 92:820–825, 2006.

D. W. Schwarz and R. W. Tomlinson. Spectral response patterns of auditory cortex neurons to harmonic complex tones in alert monkey (Macaca mulatta). Journal of Neurophysiology, 64(1):282–298, 1990.

A. Seebeck. Ueber die Sirene. Annalen der Physik, 136(12):449–481, 1843.

C. Semal and L. Demany. The upper limit of “musical” pitch. Music Perception, 8(2):165–175, 1990.

T. M. Shackleton and R. P. Carlyon. The role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination. Journal of the Acoustical Society of America, 95(6):3529–3540, 1994.

S. Shamma and D. Klein. The case of the missing pitch templates: How harmonic templates emerge in the early auditory system. Journal of the Acoustical Society of America, 107(5):2631–2644, 2000.

L. Shams and U. R. Beierholm. Causal inference in perception. Trends in Cognitive Sciences, 14(9):425–432, 2010.

R. N. Shepard. Circularity in judgments of relative pitch. Journal of the Acoustical Society of America, 36(12):2346–2353, 1964.

R. N. Shepard. Geometrical approximations to the structure of musical pitch. Psychological Review, 89(4):305–333, 1982.

C. A. Shera, J. J. Guinan, and A. J. Oxenham. Revised estimates of human cochlear tuning from otoacoustic and behavioral measurements. Proceedings of the National Academy of Sciences of the United States of America, 99(5):3318–3323, 2002.

N. H. Silbert, J. T. Townsend, and J. J. Lentz. Independence and separability in the perception of complex nonspeech sounds. Attention, Perception & Psychophysics, 71(8):1900–1915, 2009.

A. P. Simpson. Phonetic differences between male and female speech. Language and Linguistics Compass, 3(2):621–640, 2009.

P. G. Singh and I. J. Hirsh. Influence of spectral locus and F0 changes on the pitch and timbre of complex tones. Journal of the Acoustical Society of America, 92(5): 2650–2661, 1992.

M. Slaney. An efficient implementation of the Patterson-Holdsworth auditory filter bank. Technical report, Apple Computer, Inc., 1993.

M. Slaney and R. Lyon. A perceptual pitch detector. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP-90), volume 1, pages 357–360, 1990.

A. M. Small, Jr. and R. G. Daniloff. Pitch of noise bands. Journal of the Acoustical Society of America, 41(2):506–512, 1967.

E. C. Smith and M. S. Lewicki. Efficient coding of time-relative structure using spikes. Neural Computation, 17(1):19–45, 2005.

E. C. Smith and M. S. Lewicki. Efficient auditory coding. Nature, 439(7079):978–982, 2006.

G. F. Smoorenburg. Pitch perception of two-frequency stimuli. Journal of the Acoustical Society of America, 48(4B):924–942, 1970.

E. Snelson, C. E. Rasmussen, and Z. Ghahramani. Warped Gaussian processes. In Advances in Neural Information Processing Systems 16 (NIPS 2003). MIT Press, 2004.

P. Srulovicz and J. L. Goldstein. A central spectrum model: A synthesis of auditory-nerve timing and place cues in monaural communication of frequency spectrum. Journal of the Acoustical Society of America, 73(4):1266–1276, 1983.

S. S. Stevens. The relation of pitch to intensity. Journal of the Acoustical Society of America, 6(3):150–154, 1935.

S. S. Stevens. Issues in psychophysical measurement. Psychological Review, 78(5):426–450, 1971.

S. S. Stevens and J. Volkmann. The relation of pitch to frequency: A revised scale. The American Journal of Psychology, 53(3):329–353, 1940.

S. S. Stevens, J. Volkmann, and E. B. Newman. A scale for the measurement of the psychological magnitude pitch. Journal of the Acoustical Society of America, 8(3): 185–190, 1937.

E. A. Strickland and N. F. Viemeister. The effects of frequency region and bandwidth on the temporal modulation transfer function. Journal of the Acoustical Society of America, 102(3):1799–1810, 1997.

C. J. Sumner, E. A. Lopez-Poveda, L. P. O’Mard, and R. Meddis. A revised model of the inner-hair cell and auditory-nerve complex. Journal of the Acoustical Society of America, 111(5):2178–2188, 2002.

E. Terhardt. Pitch, consonance, and harmony. Journal of the Acoustical Society of America, 55(5):1061–1069, 1974.

E. Terhardt. Response to E. de Boer: Pitch theories unified. In E. F. Evans and J. P. Wilson, editors, Psychophysics and Physiology of Hearing: An International Symposium. Academic Press, London, 1977.

E. Terhardt. Calculating virtual pitch. Hearing Research, 1(2):155–182, 1979.

E. Terhardt, G. Stoll, and M. Seewann. Algorithm for extraction of pitch and pitch salience from complex tonal signals. Journal of the Acoustical Society of America, 71(3):679–688, 1982.

R. E. Turner and M. Sahani. Demodulation as probabilistic inference. IEEE Transactions on Audio, Speech, and Language Processing, PP(99), 2011.

R. S. Turner. The Ohm-Seebeck dispute, Hermann von Helmholtz, and the origins of Physiological Acoustics. The British Journal for the History of Science, 10(1):1–24, 1977.

N. Ulanovsky, L. Las, and I. Nelken. Processing of low-probability sounds by cortical neurons. Nature Neuroscience, 6(4):391–398, 2003.

R. van Dinther and R. D. Patterson. Perception of acoustic scale and size in musical instrument sounds. Journal of the Acoustical Society of America, 120(4):2158–2176, 2006.

L. van Noorden. Two-channel pitch perception. In Clynes (1982), pages 251–269.

J. Vliegen and A. J. Oxenham. Sequential stream segregation in the absence of spectral cues. Journal of the Acoustical Society of America, 105(1):339–346, 1999.

G. von Békésy. Experiments in Hearing. McGraw-Hill, New York, 1960.

G. von Békésy. Hearing theories and complex sounds. Journal of the Acoustical Society of America, 35(4):588–601, 1963.

H. L. F. von Helmholtz. Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik. Vieweg, Braunschweig, 1863. English translation by A. J. Ellis, On the Sensations of Tone, Dover, New York, 1954.

H. L. F. von Helmholtz. Handbuch der physiologischen Optik. Leopold Voss, Leipzig, 1867. English translation by J. P. C. Southall, Helmholtz’s treatise on physiological optics, Dover, New York, 1962.

A. Vurma, M. Raju, and A. Kuuda. Does timbre affect pitch? Estimations by musicians and non-musicians. Psychology of Music, 2010.

K. M. Walker, J. K. Bizley, A. J. King, and J. W. Schnupp. Cortical encoding of pitch: Recent results and open questions. Hearing Research, 271(1-2):74–87, 2011.

M. N. Wallace, T. M. Shackleton, and A. R. Palmer. Phase-locked responses to pure tones in the primary auditory cortex. Hearing Research, 172(1-2):160–171, 2002.

M. N. Wallace, L. A. Anderson, and A. R. Palmer. Phase-locked responses to pure tones in the auditory thalamus. Journal of Neurophysiology, 98(4):1941–1952, 2007.

X. Wang, T. Lu, D. Bendor, and E. Bartlett. Neural coding of temporal information in auditory thalamus and cortex. Neuroscience, 154(1):294–303, 2008.

W. D. Ward. Subjective musical pitch. Journal of the Acoustical Society of America, 26(3):369–380, 1954.

C. Warrier and R. Zatorre. Influence of tonal context and timbral variation on perception of pitch. Attention, Perception & Psychophysics, 64:198–207, 2002.

Y. Weiss, E. P. Simoncelli, and E. H. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, 2002.

G. Westheimer. Was Helmholtz a Bayesian? Perception, 37(5):642–650, 2008.

I. C. Whitfield. Auditory cortex and the pitch of complex tones. Journal of the Acoustical Society of America, 67(2):644–647, 1980.

L. Wiegrebe. Searching for the time constant of neural pitch extraction. Journal of the Acoustical Society of America, 109(3):1082–1091, 2001.

L. Wiegrebe and R. Meddis. The representation of periodic sounds in simulated sustained chopper units of the ventral cochlear nucleus. Journal of the Acoustical Society of America, 115(3):1207, 2004.

L. Wiegrebe and R. D. Patterson. Quantifying the distortion products generated by amplitude-modulated noise. Journal of the Acoustical Society of America, 106(5):2709–2718, 1999.

F. L. Wightman. The pattern-transformation model of pitch. Journal of the Acoustical Society of America, 54(2):407–416, 1973.

I. M. Winter. The neurophysiology of pitch. In C. Plack, R. Fay, A. Oxenham, and A. Popper, editors, Pitch: Neural Coding and Perception, volume 24 of Springer Handbook of Auditory Research, pages 99–146. Springer New York, 2005.

I. M. Winter, L. Wiegrebe, and R. D. Patterson. The temporal representation of the delay of iterated rippled noise in the ventral cochlear nucleus of the guinea-pig. Journal of Physiology, 537(Pt 2):553–566, 2001.

D. R. Wozny, U. R. Beierholm, and L. Shams. Probability matching as a computational strategy used in perception. PLoS Computational Biology, 6(8):e1000871, 2010.

W. A. Yost. Pitch of iterated rippled noise. Journal of the Acoustical Society of America, 100(1):511–518, 1996.

W. A. Yost and R. Hill. Models of the pitch and pitch strength of ripple noise. Journal of the Acoustical Society of America, 66(2):400–410, 1979.

W. A. Yost, R. Hill, and T. Perez-Falcon. Pitch and pitch discrimination of broadband signals with rippled power spectra. Journal of the Acoustical Society of America, 63(4):1166–1173, 1978.

W. A. Yost, R. Patterson, and S. Sheft. A time domain description for the pitch strength of iterated rippled noise. Journal of the Acoustical Society of America, 99(2):1066–1078, 1996.

J. E. Zakrisson. The effect of the stapedius reflex on attenuation and poststimulatory auditory fatigue at different frequencies. Acta Oto-Laryngologica, Supplement 360:118–121, 1979.

R. J. Zatorre. Pitch perception of complex tones and human temporal-lobe function. Journal of the Acoustical Society of America, 84(2):566–572, 1988.

J. Zheng, W. Shen, D. Z. Z. He, K. B. Long, L. D. Madison, and P. Dallos. Prestin is the motor protein of cochlear outer hair cells. Nature, 405:149–155, 2000.

G. Zweig. Basilar membrane motion. Cold Spring Harbor Symposia on Quantitative Biology, 40:619–633, 1976.