Lecture 4: Auditory Perception

EE E6820: Speech & Audio Processing & Recognition Lecture 4: Auditory Perception Mike Mandel <[email protected]> Dan Ellis <[email protected]> Columbia University Dept. of Electrical Engineering http://www.ee.columbia.edu/∼dpwe/e6820 February 10, 2009 1 Motivation: Why & how 2 Auditory physiology 3 Psychophysics: Detection & discrimination 4 Pitch perception 5 Speech perception 6 Auditory organization & Scene analysis E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 1 / 44 Outline 1 Motivation: Why & how 2 Auditory physiology 3 Psychophysics: Detection & discrimination 4 Pitch perception 5 Speech perception 6 Auditory organization & Scene analysis E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 2 / 44 Why study perception? Perception is messy: can we avoid it? No! Audition provides the `ground truth' in audio I what is relevant and irrelevant I subjective importance of distortion (coding etc.) I (there could be other information in sound. ) Some sounds are `designed' for audition I co-evolution of speech and hearing The auditory system is very successful I we would do extremely well to duplicate it We are now able to model complex systems I faster computers, bigger memories E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 3 / 44 How to study perception? Three different approaches: Analyze the example: physiology I dissection & nerve recordings Black box input/output: psychophysics I fit simple models of simple functions Information processing models I investigate and model complex functions e.g. scene analysis, speech perception E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 4 / 44 Outline 1 Motivation: Why & how 2 Auditory physiology 3 Psychophysics: Detection & discrimination 4 Pitch perception 5 Speech perception 6 Auditory organization & Scene analysis E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 5 / 44 Physiology Processing chain from air to brain: Middle ear Auditory nerve Cortex Outer Midbrain ear Inner ear (cochlea) Study via: I anatomy I nerve recordings Signals flow in both directions E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 6 / 44 Outer & middle ear Ear canal Middle ear Pinna bones Cochlea (inner ear) Eardrum (tympanum) Pinna `horn' I complex reflections give spatial (elevation) cues Ear canal I acoustic tube Middle ear I bones provide impedance matching E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 7 / 44 Inner ear: Cochlea Oval window Basilar Membrane (from ME bones) (BM) Travelling wave Cochlea 16 kHz Resonant frequency 50 Hz 0Position 35mm Mechanical input from middle ear starts traveling wave moving down Basilar membrane Varying stiffness and mass of BM results in continuous variation of resonant frequency At resonance, traveling wave energy is dissipated inBM vibration I Frequency (Fourier) analysis E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 8 / 44 Cochlea hair cells Ear converts sound to BM motion I each point on BM corresponds to a frequency Cochlea Tectorial membrane Basilar membrane Auditory nerve Inner Hair Cell (IHC) Outer Hair Cell (OHC) Hair cells on BM convert motion into nerve impulses (firings) Inner Hair Cells detect motion Outer Hair Cells? Variable damping? E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 9 / 44 Inner Hair Cells IHCs convert BM vibration into nerve firings Human ear has ∼3500 IHCs I each IHC has ∼7 connections to Auditory Nerve Each nerve fires (sometimes) near peak displacement Local BM displacement 50 time / ms Typical nerve signal (mV) Histogram to get firing probability Firing count Cycle angle E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 10 / 44 Auditory nerve (AN) signals Single nerve measurements Tone burst histogram Frequency threshold Spike dB SPL count 80 100 60 40 Time 20 100 ms 100 Hz 1 kHz 10 kHz Tone burst (log) frequency Rate vs intensity One fiber: ~ 25 dB dynamic range 300 200 Spikes/sec 100 Intensity / dB SPL 0 0 20 40 60 80 100 Hearing dynamic range > 100 dB Hard to measure: probe living ANs? E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 11 / 44 AN population response All the information the brain has about sound average rate & spike timings on 30,000 fibers Not unlike a (constant-Q) spectrogram PatSla rectsmoo on bbctmp2 (2001-02-18) 5 4 3 2 1 freq / 8ve re 100 Hz 0 0 10 20 30 40 50 60 time / ms E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 12 / 44 Beyond the auditory nerve Ascending and descending Tonotopic × ? I modulation, position, source?? E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 13 / 44 Periphery models IHC IHC Outer/middle Cochlea Sound ear filterbank filtering SlaneyPatterson 12 chans/oct from 180 Hz, BBC1tmp (20010218) 60 Modeled aspects 50 40 I outer / middle ear 30 I hair cell channel 20 transduction 10 0 0.1 0.2 0.3 0.4 0.5 I cochlea filtering time / s I efferent feedback? Results: `neurogram' / `cochleagram' E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 14 / 44 Outline 1 Motivation: Why & how 2 Auditory physiology 3 Psychophysics: Detection & discrimination 4 Pitch perception 5 Speech perception 6 Auditory organization & Scene analysis E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 15 / 44 Psychophysics Physiology looks at the implementation Psychology looks at the function/behavior Analyze audition as signal detection: p(θ j x) I psychological tests reflect internal decisions I assume optimal decision process I infer nature of internal representations, noise, . ! lower bounds on more complex functions Different aspects to measure I time, frequency, intensity I tones, complexes, noise I binaural I pitch, detuning E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 16 / 44 Basic psychophysics Relate physical and perceptual variables e.g. intensity ! loudness frequency ! pitch Methodology: subject tests I just noticeable difference (JND) I magnitude scaling e.g. \adjust to twice as loud" Results for Intensity vs Loudness: Weber's law∆ I / I ) log(L) = k log(I ) Hartmann(1993) Classroom loudness scaling data 2.6 log2(L) = 0:3 log2(I ) Textbook figure: 2.4 α 0.3 log I L I = 0:3 10 2.2 log10 2 2.0 Power law fit: 0:3 dB α 0.22 = 1.8 L I log 2 10 Log(loudness rating) 10 1.6 = dB=10 1.4 -20 -10 0 10 Sound level / dB E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 17 / 44 Loudness as a function of frequency Fletcher-Munsen equal-loudness curves 120 100 80 60 40 Intensity / dB SPL 20 0 1001000 10,000 freq / Hz 100 100 100 Hz 1 kHz 80 80 60 60 40 rapid 40 @ 1kHz loudness @ 1kHz 20 growth 20 Equivalent loudness 0 Equivalent loudness 0 0 2040 6080 Intensity / dB 0 2040 60 80 Intensity / dB E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 18 / 44 Loudness as a function of bandwidth Same total energy, different distribution e.g. 2 channels at −6 dB (not −10 dB) freq time Same I0 total mag mag energy I1 I·B freq freq B0 B1 ... but wider perceived as louder Loudness ‘Critical’ Bandwidth B bandwidth Critical bands: independent frequency channels I ∼25 total (4-6 / octave) E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 19 / 44 Simultaneous masking A louder tone can `mask' the perception of a second tone nearby in frequency: masking absolute tone threshold masked threshold Intensity / dB log freq Suggests an ìnternal noise' model: p(x | I) p(x | I+∆I) p(x | I) internal noise σn decision variable I x E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 20 / 44 Sequential masking Backward/forward in time: masker envelope simultaneous masking ~10 dB masked threshold Intensity / dB time backward masking forward masking ~5 ms ~100 ms ! Time-frequency masking `skirt': Masking tone freq Masked threshold intensity time E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 21 / 44 What we do and don't hear “two-interval forced-choice”: A B X X = A or B? time Timing: 2 ms attack resolution, 20 ms discrimination I but: spectral splatter Tuning: ∼1% discrimination I but: beats Spectrum: profile changes, formants I variables time-frequency resolution Harmonic phase? Noisy signals & texture (Trace vs categorical memory) E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 22 / 44 Outline 1 Motivation: Why & how 2 Auditory physiology 3 Psychophysics: Detection & discrimination 4 Pitch perception 5 Speech perception 6 Auditory organization & Scene analysis E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 23 / 44 Pitch perception: a classic argument in psychophysics Harmonic complexes are a pattern on AN 70 60 50 40 freq. chan. 30 20 10 0 0.05 0.1 time/s I but give a fused percept (ecological) What determines the pitch percept? I not the fundamental How is it computed? Two competing models: place and time E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 24 / 44 Place model of pitch AN excitation pattern shows individual peaks `Pattern matching' method to find pitch broader HF channels cannot resolve harmonics AN excitation frequency channel resolved harmonics Correlate with harmonic ‘sieve’: Pitch strength frequency channel Support: Low harmonics are very important But: Flat-spectrum noise can carry pitch E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 25 / 44 Time model of pitch Timing information is preserved in AN down to ∼1 ms scale Extract periodicity by e.g. autocorrelation and combine across frequency channels per-channel freq autocorrelation autocorrelation time Summary autocorrelation 0 10 20 30 common period lag / ms (pitch) But: HF gives weak pitch (in practice) E6820 (Ellis & Mandel) L4: Auditory Perception February 10, 2009 26 / 44 Alternate & competing cues Pitch perception could rely on various cues I average excitation pattern I summary autocorrelation I more complex pattern matching Relying on just one cue is brittle I e.g.

Load more