Psycho-Acoustics
Richard Heusdens
April 28, 2020
1
Signal and Information Processing Lab, Dept. of Mediamatics Outline
• Introduction
• Anatomy and physiology of the auditory system
• Physical versus subjective scales
• Spatial perception
• Masking (spectral and temporal)
• Perceptual audio coding
• ISO MPEG perceptual model
April 28, 2020 2 Psychoacoustics = The scientific study of the perception of sound.
April 28, 2020 3 Digital Signal Processing
Psycho-acoustics !
1000101011 1000000000 1111010101 101101 1100000000 0111011101 0100000000
Audio Audio Encoder Decoder Storage or Microphone + AD Transmission DA + Loudspeaker
April 28, 2020 4 What you see is not what you hear
!
Time ®
April 28, 2020 5 Anatomy and Physiology of the Human Auditory System
Literature:
James O. Pickles, “An Introduction to the Physiology of Hearing”, Academic Press, London, 1982
April 28, 2020
6 The Peripheral Hearing System
April 28, 2020 7 Outer Ear
Acoustic energy is conducted to the eardrum • Conversion of acoustic energy into mechanical energy • Resonance in the range of 2 – 5 kHz (± 10 dB) • Acoustic filtering is in part dependent on source direction
April 28, 2020 8 Middle Ear
Impedance adjustment for effective energy transmission: • Eardrum: Large volume displacement, little pressure • Oval window: Little volume displacement, large pressure • Band-pass filter (200 Hz - 8 kHz)
April 28, 2020 9 Inner Ear
Cochlea: • Mechanical energy (oval window) is converted into a neural signal (auditory nerve) • Performs a time-frequency analysis
April 28, 2020 10 Cochlea
1. cochlear duct 2. scala vestibuli 3. scala tympani 4. spiral ganglion 5. auditory nerve fibres
2 mm
• The red arrow is from the oval window • The blue arrow points to the round window • The cochlea is about 2 mm in diameter
April 28, 2020 11 Basilar Membrane
April 28, 2020 12 Basilar Membrane
Frequency-to-place transformation:
April 28, 2020 13 Cochlea
1. Cochlear duct 2. Scala vestibuli 3. Scala tympani 4. Reissner's membrane 5. Basilar membrane 6. Tectorial membrane 7. Stria vascularis 8. Nerve fibres 9. Bony spiral lamina
The organ of Corti is on top of the basilar membrane under the tectorial membrane
April 28, 2020 14 Organ of Corti
• Tectorial membrane (TM) • Deiters’ cells (DC) • Basilar membrane (BM) • Reticular lamina (RL) • Inner haicells (IHC) • Pillar cells (PC) • Outer haircells (OHC) • Stereocilia (St)
April 28, 2020 15 Auditory nerve
• Auditory nerve fiber responses are spiky
• Auditory nerve responses can follow the phase at low frequencies
• Auditory nerve responses need to recover at high frequencies
• Beyond about 2 kHz phase locking response is lost
March 18, 2010 16 Auditory Transduction
April 28, 2020 17 Dependence of Subjective Parameters on Physical Parameters
Literature:
Brian C.J. Moore, “An Introduction to the Psychology of Hearing”, Fourth edition, Academic Press, London, 1997
April 28, 2020
18 Physical vs. Subjective Scales
Physical Subjective
Fundamental frequency (Hz) « Pitch Sound pressure (Pa, dB SPL) « Loudness
April 28, 2020 19 Fundamental Frequency and Pitch
• Linear frequency increase over time
• Exponential frequency increase over time • Mapping of frequency to pitch is approximately a logarithmic transformation • Empirical finding from several experiments measuring frequency just noticable differences (JNDs):
log Df = 0.0264 f - 0.52
f f +1 JND f +2 JND f +3 JND
f = 1 kHz → Df = 2 Hz f = 6 kHz → Df = 33 Hz
April 28, 2020 20 Sound Pressure and Loudness
• Linear sound pressure increase over time • Expansive sound pressure increase over time
® Mapping of sound pressure to loudness resembles a compressive transformation
Stevens Law (sones): S = kI 0.3 ® Thus each 10 dB increase in intensity leads to a doubling in loudness (= doubling in sones)
April 28, 2020 21 Sone - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/Sone
Sone - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/Sone sound pressure Source of sound sound pressure loudness level sound pressure Source of soundpascal sounddB re pressure 20 µPa sone loudness level threshold of pain 100 134 ~ 676 pascal dB re 20 µPa sone hearing damage during short-term threshold of pain 20 approx.100 120 ~ 256 134 ~ 676 effect hearing damage during short-term 20 ~approx. 128 ... 120 ~ 256 jet, 100 m away effect 6 ... 200 110 ... 140 1024 ~ 128 ... jack hammer, 1 m awayjet, 100 / nightclub m away 2 approx.6 ... 200 100 110~ 64 ... 140 1024 hearing damage during long-term effect 6×10−1 approx. 90 ~ 32 jack hammer, 1 m away / nightclub 2 approx. 100 ~ 64 2×10−1 ... major road, 10 m awayhearing damage during long-term effect 6×8010 −...1 90 ~ 16approx. ... 32 90 ~ 32 6×10−1 2×10−1 ... major road, 10 m away 2×10−2 ... 80 ... 90 ~ 16 ... 32 passenger car, 10 m away 6×6010 −...1 80 ~ 4 ... 16 2×10−1 2×10−2 ... TV set at home level,passenger 1 m away car, 10 m away 2×10−2 ca. 60 60~ 4 ... 80 ~ 4 ... 16 2×10−1 2×10−3 ... normal talking, 1 m TVaway set at home level, 1 m away 2×4010 −...2 60 ~ 1 ... ca.4 60 ~ 4 2×10−2 2×10−3 ... normal talking, 1 m away 2×10−4 ... 40 ... 60 ~ 1 ... 4 very calm room 2×2010 −...2 30 ~ 0.15 ... 0.4 6×10−4 2×10−4 ... leaves' noise, calm breathingvery calm room 6×10−5 10 ~ 0.0220 ... 30 ~ 0.15 ... 0.4 6×10−4 auditory threshold at 1 kHz 2×10−5 0 0 leaves' noise, calm breathing 6×10−5 10 ~ 0.02
−5 April 28, 2020 sone 1 2 4 auditory 8 16 threshold32 64 128at 1 kHz256 512 1024 2×10 0 022
phon 40 50 60 70 80 90 100 110 120 130 140 sone 1 2 4 8 16 32 64 128 256 512 1024 Formulae phon 40 50 60 70 80 90 100 110 120 130 140
According to Stevens'Formulae definition,[1] a loudness of 1 sone is equivalent to the loudness of a signal at 40 phons, the loudness level of a 1 kHz tone at 40 dB SPL. But phons scale with level in dB, not with loudness, so the soneAccording and phon to scales Stevens' are notdefinition, proportional.[1] a loudness Rather, ofthe 1 loudnesssone is equivalent in sones is, to atthe least loudness very of a signal at 40 nearly, a power law phonsfunction, the of loudness the signal level intensity, of a 1 kHzwith tonean exponent at 40 dB of SPL 0.3.. But[2][3 ]phons With scalethis exponent, with level in dB, not with each 10 phon increaseloudness, (or 10 dBso theat 1 sone kHz) and produces phon scales almost are exactly not proportional. a doubling ofRather, the loudness the loudness in in sones is, at least very sones.[4] nearly, a power law function of the signal intensity, with an exponent of 0.3.[2][3] With this exponent, each 10 phon increase (or 10 dB at 1 kHz) produces almost exactly a doubling of the loudness in At frequencies othersones. than [14 ]kHz, the loudness level in phons is calibrated according to the frequency response of human hearing, via a set of equal-loudness contours, and then the loudness level in phons is mapped to loudnessAt in frequenciessones via the other same than power 1 kHz, law. the loudness level in phons is calibrated according to the frequency response of human hearing, via a set of equal-loudness contours, and then the loudness level in phons is mapped to loudness in sones via the same power law. 2 of 4 27/04/15 15:36
2 of 4 27/04/15 15:36 Equal Loudness Contours
Alternative scale for loudness is the phon:
• The loudness in phon of a specific tone is defined as the level in dB SPL of a 1 kHz reference tone that sounds equaly loud as the specific tone.
• MAF = minimum audible field (absolute threshold of hearing)
April 28, 2020 23 Sound Pressure and Loudness
April 28, 2020 24 Weber’s Law
• Weber’s Law states that the just noticeable difference in level is a constant percetage of level: DI = constant I • For pure tones an increase of about 0.5 - 1 dB is just noticable
April 28, 2020 25 Spatial perception
April 28, 2020
26 Binaural Hearing
We have a remarkable ability to compare acoustic signals across the two ears. On headphones: • Identical signals lead to a narrow sound image centered in our head (correlation is 1) • An interaural level difference of 1 dB leads to a just noticeable shift in position towards the louder ear • An interaural time difference of 15 µs leads to a just noticeable shift in position towards the leading ear • When two signals are not identical (correlation <1), the sound image increases in width • Changes in correlation are best audibile around values of 1. • A reduction from 1 to 0.98 is audible, a reduction from 0.1 to 0 is not audible. ò L(t)R(t +t )dt Correlation: CLR (t ) = ò L(t)2 dt ò R(t +t )2 dt April 28, 2020 27 Spatial Hearing
Freefield listening: binaural cues help to localize sound source
• Low frequencies: Sound bends around the head (path length difference): Interaural time differences
• High frequencies: Sound is obstructed by the head: Interaural level differences
• Interaural time differences are not perceived above 2 kHz. • Room reflections lead to reduction of coherence
April 28, 2020 28 Spectral and Temporal Masking
Literature:
Brian C.J. Moore, “An Introduction to the Psychology of Hearing”, Fourth edition, Academic Press, London, 1997
T. Painter and A. Spanias, “Perceptual Coding of Digital Audio”. Proceedings of the IEEE, Vol. 88, No. 4, pp. 451-513, April 2000
April 28, 2020
29 Masking
The phenomenon where a sound (maskee/test signal) that is perfectly audible in isolation is not audible due to the presence of a masking sound (masker)
Relevance for audio coding: • Quantisation noise (maskee) that is introduced by the coding algorithm is masked by the signal which is coded (masker) • By shaping the spectro-temporal shape of the distortion a very efficient coding of a signal is possible (1-2 bits/sample)
Encoded signal: (= Original + quantisation noise) Quantisation noise:
April 28, 2020 30 Masking, Fletcher (1940)
• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone
• Measure thresholds as a function of masker bandwidth
test signal SPL
masker masked threshold *
frequency bandwidth
April 28, 2020 31 Masking, Fletcher (1940)
• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone
• Measure thresholds as a function of masker bandwidth
test signal SPL
masker * masked threshold *
frequency bandwidth
April 28, 2020 32 Masking, Fletcher (1940)
• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone
• Measure thresholds as a function of masker bandwidth
test signal SPL *
masker * masked threshold *
frequency bandwidth
April 28, 2020 33 Masking, Fletcher (1940)
• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone
• Measure thresholds as a function of masker bandwidth
test signal SPL * *
masker * masked threshold *
frequency bandwidth
April 28, 2020 34 Masking, Fletcher (1940)
• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone
• Measure thresholds as a function of masker bandwidth
test signal SPL * * *
masker * masked threshold *
frequency critical bandwidth bandwidth
April 28, 2020 35 Masking, Fletcher (1940)
Assumption: signal is detected when the signal-to-noise ratio at the output of the auditory filter exceeds a certain criterion value
test signal
SPL auditory filter * * *
masker * masked threshold *
frequency critical bandwidth bandwidth
April 28, 2020 36 Demo: Critical Bands
Masking of a single 2000 Hz tone (decreasing in 10 steps of 5 dB) by spectrally flat (white) noise of different bandwidths:
• broadband noise • bandwidth 1000 Hz • bandwidth 250 Hz • bandwidth 10 Hz
April 28, 2020 37 Bark Scale
• A scale that converts frequency (Hz) into units of critical
bandwidth 2 éùæöf zf( )=+ 13arctan(0.00076 f ) 3.5arctanêúç÷ (Bark) ëûêúèø7500 named after Heinrich Barkhausen (who proposed the first subjective measurements of loudness)
• Critical bandwidth: ¶f CBW = ¶z
April 28, 2020 38 Bark Scale and Critical Bandwidth
4 x 10 2.5 2 1.5 1 Bark scale 0.5 Frequency (Hz) Frequency 0 5 10 15 20 25 z (Bark)
3 10 Critical bandwidth CBW (Hz)CBW 2 10
2 3 4 10 10 10 f0(Hz) April 28, 2020 39 Spectral Spread of Masking
SPL (dB) Threshold (dB)
Masker Masker
Test signal
Frequency (bark) Test signal frequency (bark)
Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k
April 28, 2020 40 Spectral Spread of Masking
SPL (dB) Threshold (dB)
Masker Masker
Auditory filter shape
Frequency (bark) Test signal frequency (bark)
Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k
April 28, 2020 41 Spectral Spread of Masking
SPL (dB) Threshold (dB)
Masker Masker
Frequency (bark) Test signal frequency (bark)
Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k
April 28, 2020 42 Spectral Spread of Masking
SPL (dB) Threshold (dB)
Masker Masker
Frequency (bark) Test signal frequency (bark)
Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k
April 28, 2020 43 Spectral Spread of Masking
SPL (dB) Threshold (dB)
Masker Masker
Frequency (bark) Test signal frequency (bark)
Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k
April 28, 2020 44 Spectral Spread of Masking
SPL (dB) Threshold (dB)
Masker Masker
Frequency (bark) Test signal frequency (bark)
Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k
April 28, 2020 45 Spectral Spread of Masking
M+S S M+S 300 Hz 500 Hz 700 Hz
Threshold (dB)
Masker
25 dB/bark -10 dB/bark
500 Hz Test signal frequency (bark)
Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k
April 28, 2020 46 Tonal versus Noise Maskers
Tone-on-tone masking Noise-on-tone masking
Threshold (dB) Threshold (dB) Masker Masker 4 dB 24 dB
Test signal frequency (bark) Test signal frequency (bark)
Threshold is about 24 dB Threshold is about 4 dB below masker (k = -24 dB) below masker (k = -4 dB)
April 28, 2020 47 Perceptual Audio Coding
April 28, 2020
48 Perceptual Audio Coder
Encoder:
Audio in Bitstream Analysis Lossless Quantization Filter bank Coding
Perceptual Model
Decoder:
Bitstream Audio out Synthesis Decoding Reconstruction Filter bank
April 28, 2020 49 Irrelevancy Removal
Quantize every spectral component such that the quantization error stays below the masking threshold
80
signal spectrum 70 masking threshold
60
50
40
30
20 Sound pressure level (dB SPL)
10
0
-10 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 Frequency (Hz)
April 28, 2020 50 Spreading Function Based Masking Models (ISO MPEG Model)
x Segmentation (15-30 ms) y
Derive power spectrum
Scaling to SPL (dB)
Transform Hz to Bark
Tonality detector
Global masking curve
April 28, 2020 51 Spreading Function Based Masking Models (ISO MPEG Model)
x wt() Segmentation (15-30 ms) y
Derive power spectrum t tstart tend Scaling to SPL (dB)
yt()== wtxt () (), t tstart ... t end Transform Hz to Bark
Tonality detector
Global masking curve
April 28, 2020 52 Spreading Function Based Masking Models (ISO MPEG Model)
x Segmentation (15-30 ms) y 2p 2 1 N -1 jkn kf Derive power spectrum N s psd(fynek )= å ( ) fk = N n=0 N Scaling to SPL (dB)
Transform Hz to Bark
Tonality detector
Global masking curve
April 28, 2020 53 Spreading Function Based Masking Models (ISO MPEG Model)
x Segmentation (15-30 ms) y
Derive power spectrum
æöpsd(fk ) Scaling to SPL (dB) SPL(fk )= 10log10 ç÷ èøpsd0dB Transform Hz to Bark
Tonality detector
Global masking curve
April 28, 2020 54 Spreading Function Based Masking Models (ISO MPEG Model)
x Segmentation (15-30 ms) y
Derive power spectrum
Scaling to SPL (dB)
2 éùæöf Transform Hz to Bark zf( )=+ 13arctan(0.00076 f ) 3.5arctanêúç÷ (Bark) ëûêúèø7500
Tonality detector
Global masking curve
April 28, 2020 55 Spreading Function Based Masking Models (ISO MPEG Model)
x • Because of the difference in masking strength Segmentation (15-30 ms) we need a classification of tonal and noisy y components in a signal spectrum Derive power spectrum • Spectral flatness measure: Tonal if the power Scaling to SPL (dB) spectral density of a component exceeds nearby components by more than e.g. 7 dB
Transform Hz to Bark Tone on tone masking Noise on tone masking
Tonality detector Masker(k) Masker(k) TNM(z,k) TTM(z,k) Global masking curve Test signal frequency (bark) Test signal frequency (bark)
April 28, 2020 56 Spreading Function Based Masking Models (ISO MPEG Model)
x
Segmentation (15-30 ms) • Tonal maskers (k): Threshold for each z: TTM(z,k) y • Noise maskers (k): Threshold for each z: TNM(z,k) Derive power spectrum • Threshold in quiet:
-0.8 -0.6*( f -3.3)2 -3 4 Scaling to SPL (dB) Tq (f) = 3.64 f -6.5e +10 f • Combining masking components by power Transform Hz to Bark addition:
æ 0.1T (z) ö T (z) =1010 logç10 q + 100.1TTM (z,k) + 100.1TNM (z,k) ÷ Tonality detector g å å è k k ø
Global masking curve
April 28, 2020 57 Spreading Function Based Masking Models (ISO MPEG Model)
x Segmentation (15-30 ms) y
Derive power spectrum
Scaling to SPL (dB)
Transform Hz to Bark
Tonality detector
æ 0.1T (z) ö Global masking curve 10 q 0.1TTM (z,k) 0.1TNM (z,k) Tg (z) =10 logç10 + å10 + å10 ÷ è k k ø April 28, 2020 58 Meaning of Global Masking Curve
• The (quantisation) noise-to-masker ratio (NMR) specifies the audibility of the quantisation noise • Quantisation noise inaudible when: NMR< 0 dB
• The NMR needs to be considered within each critical band
April 28, 2020 59 Forward and Backward Masking
Forward masking has a much stronger effect than backward masking
• Forward masking: Effect is observed until 100-200 ms after masker
• Backward masking: 10 ms, but sometimes not even present
• Simultaneous masking has the strongest effect
Backward Forward masking masking Inaudible Audible Masker Inaudible Masker
Time Time April 28, 2020 60 Forward and Backward Masking
Audible More realistic situation for audio coding: Masker Inaudible • Quantisation noise is distributed Time uniformly across a segment • Position of masker in segment is very relevant
Masker Inaudible
Time
April 28, 2020 61 Pre-Echoes
Pre-echoes 1
0
-1 0 500 1000 1500 pre-echoes 1 0
-1 0 500 1000 1500
0.5
0
-0.5 0 500 1000 1500
April 28, 2020 62 Other factors that influence masking
Detection of complex signals Buus et al. 1986
Multiband energy detector model (Green and Swets, 1966): • Spectral integration of “detectability” information • Each doubling of the number of components leads to a 1.5 dB reduction in threshold • Trading detectability across frequency
April 28, 2020 63 Concluding Remarks
• Although the auditory system is complicated, usefull quantitative perceptual models can be made
• It is impossible to include all psycho-acoustical effects in a perceptual model
• Often heuristic modifications to the model or encoder are used
• The quality of an audio coder can depend strongly on the quality of the perceptual model
April 28, 2020 64