<<

Psycho-

Richard Heusdens

April 28, 2020

1

Signal and Information Processing Lab, Dept. of Mediamatics Outline

• Introduction

• Anatomy and physiology of the auditory system

• Physical versus subjective scales

• Spatial perception

• Masking (spectral and temporal)

• Perceptual audio coding

• ISO MPEG perceptual model

April 28, 2020 2 = The scientific study of the perception of .

April 28, 2020 3 Digital Signal Processing

Psycho-acoustics !

1000101011 1000000000 1111010101 101101 1100000000 0111011101 0100000000

Audio Audio Encoder Decoder Storage or Microphone + AD Transmission DA + Loudspeaker

April 28, 2020 4 What you see is not what you hear

!

Time ®

April 28, 2020 5 Anatomy and Physiology of the Human Auditory System

Literature:

James O. Pickles, “An Introduction to the Physiology of ”, Academic Press, London, 1982

April 28, 2020

6 The Peripheral Hearing System

April 28, 2020 7 Outer Ear

Acoustic energy is conducted to the eardrum • Conversion of acoustic energy into mechanical energy • Resonance in the range of 2 – 5 kHz (± 10 dB) • Acoustic filtering is in part dependent on source direction

April 28, 2020 8 Middle Ear

Impedance adjustment for effective energy transmission: • Eardrum: Large volume displacement, little pressure • Oval window: Little volume displacement, large pressure • Band-pass filter (200 Hz - 8 kHz)

April 28, 2020 9 Inner Ear

Cochlea: • Mechanical energy (oval window) is converted into a neural signal (auditory nerve) • Performs a time- analysis

April 28, 2020 10 Cochlea

1. cochlear duct 2. scala vestibuli 3. scala tympani 4. spiral ganglion 5. auditory nerve fibres

2 mm

• The red arrow is from the oval window • The blue arrow points to the round window • The cochlea is about 2 mm in diameter

April 28, 2020 11 Basilar Membrane

April 28, 2020 12 Basilar Membrane

Frequency-to-place transformation:

April 28, 2020 13 Cochlea

1. Cochlear duct 2. Scala vestibuli 3. Scala tympani 4. Reissner's membrane 5. Basilar membrane 6. Tectorial membrane 7. Stria vascularis 8. Nerve fibres 9. Bony spiral lamina

The organ of Corti is on top of the basilar membrane under the tectorial membrane

April 28, 2020 14 Organ of Corti

• Tectorial membrane (TM) • Deiters’ cells (DC) • Basilar membrane (BM) • Reticular lamina (RL) • Inner haicells (IHC) • Pillar cells (PC) • Outer haircells (OHC) • Stereocilia (St)

April 28, 2020 15 Auditory nerve

• Auditory nerve fiber responses are spiky

• Auditory nerve responses can follow the phase at low

• Auditory nerve responses need to recover at high frequencies

• Beyond about 2 kHz phase locking response is lost

March 18, 2010 16 Auditory Transduction

April 28, 2020 17 Dependence of Subjective Parameters on Physical Parameters

Literature:

Brian C.J. Moore, “An Introduction to the Psychology of Hearing”, Fourth edition, Academic Press, London, 1997

April 28, 2020

18 Physical vs. Subjective Scales

Physical Subjective

Fundamental frequency (Hz) « Pitch (Pa, dB SPL) «

April 28, 2020 19 and Pitch

• Linear frequency increase over time

• Exponential frequency increase over time • Mapping of frequency to pitch is approximately a logarithmic transformation • Empirical finding from several experiments measuring frequency just noticable differences (JNDs):

log Df = 0.0264 f - 0.52

f f +1 JND f +2 JND f +3 JND

f = 1 kHz → Df = 2 Hz f = 6 kHz → Df = 33 Hz

April 28, 2020 20 Sound Pressure and Loudness

• Linear sound pressure increase over time • Expansive sound pressure increase over time

® Mapping of sound pressure to loudness resembles a compressive transformation

Stevens Law (): S = kI 0.3 ® Thus each 10 dB increase in intensity leads to a doubling in loudness (= doubling in sones)

April 28, 2020 21 - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/Sone

Sone - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/Sone sound pressure Source of sound sound pressure loudness level sound pressure Source of soundpascal sounddB re pressure 20 µPa sone loudness level 100 134 ~ 676 pascal dB re 20 µPa sone hearing damage during short-term threshold of pain 20 approx.100 120 ~ 256 134 ~ 676 effect hearing damage during short-term 20 ~approx. 128 ... 120 ~ 256 jet, 100 m away effect 6 ... 200 110 ... 140 1024 ~ 128 ... jack hammer, 1 m awayjet, 100 / nightclub m away 2 approx.6 ... 200 100 110~ 64 ... 140 1024 hearing damage during long-term effect 6×10−1 approx. 90 ~ 32 jack hammer, 1 m away / nightclub 2 approx. 100 ~ 64 2×10−1 ... major road, 10 m awayhearing damage during long-term effect 6×8010 −...1 90 ~ 16approx. ... 32 90 ~ 32 6×10−1 2×10−1 ... major road, 10 m away 2×10−2 ... 80 ... 90 ~ 16 ... 32 passenger car, 10 m away 6×6010 −...1 80 ~ 4 ... 16 2×10−1 2×10−2 ... TV set at home level,passenger 1 m away car, 10 m away 2×10−2 ca. 60 60~ 4 ... 80 ~ 4 ... 16 2×10−1 2×10−3 ... normal talking, 1 m TVaway set at home level, 1 m away 2×4010 −...2 60 ~ 1 ... ca.4 60 ~ 4 2×10−2 2×10−3 ... normal talking, 1 m away 2×10−4 ... 40 ... 60 ~ 1 ... 4 very calm room 2×2010 −...2 30 ~ 0.15 ... 0.4 6×10−4 2×10−4 ... leaves' noise, calm breathingvery calm room 6×10−5 10 ~ 0.0220 ... 30 ~ 0.15 ... 0.4 6×10−4 auditory threshold at 1 kHz 2×10−5 0 0 leaves' noise, calm breathing 6×10−5 10 ~ 0.02

−5 April 28, 2020 sone 1 2 4 auditory 8 16 threshold32 64 128at 1 kHz256 512 1024 2×10 0 022

40 50 60 70 80 90 100 110 120 130 140 sone 1 2 4 8 16 32 64 128 256 512 1024 Formulae phon 40 50 60 70 80 90 100 110 120 130 140

According to Stevens'Formulae definition,[1] a loudness of 1 sone is equivalent to the loudness of a signal at 40 , the loudness level of a 1 kHz tone at 40 dB SPL. But phons scale with level in dB, not with loudness, so the soneAccording and phon to scales Stevens' are notdefinition, proportional.[1] a loudness Rather, ofthe 1 loudnesssone is equivalent in sones is, to atthe least loudness very of a signal at 40 nearly, a power law phonsfunction, the of loudness the signal level intensity, of a 1 kHzwith tonean exponent at 40 dB of SPL 0.3.. But[2][3 ]phons With scalethis exponent, with level in dB, not with each 10 phon increaseloudness, (or 10 dBso theat 1 sone kHz) and produces phon scales almost are exactly not proportional. a doubling ofRather, the loudness the loudness in in sones is, at least very sones.[4] nearly, a power law function of the signal intensity, with an exponent of 0.3.[2][3] With this exponent, each 10 phon increase (or 10 dB at 1 kHz) produces almost exactly a doubling of the loudness in At frequencies othersones. than [14 ]kHz, the loudness level in phons is calibrated according to the frequency response of human hearing, via a set of equal-loudness contours, and then the loudness level in phons is mapped to loudnessAt in frequenciessones via the other same than power 1 kHz, law. the loudness level in phons is calibrated according to the frequency response of human hearing, via a set of equal-loudness contours, and then the loudness level in phons is mapped to loudness in sones via the same power law. 2 of 4 27/04/15 15:36

2 of 4 27/04/15 15:36 Equal Loudness Contours

Alternative scale for loudness is the phon:

• The loudness in phon of a specific tone is defined as the level in dB SPL of a 1 kHz reference tone that equaly loud as the specific tone.

• MAF = minimum audible field (absolute threshold of hearing)

April 28, 2020 23 Sound Pressure and Loudness

April 28, 2020 24 Weber’s Law

• Weber’s Law states that the just noticeable difference in level is a constant percetage of level: DI = constant I • For pure tones an increase of about 0.5 - 1 dB is just noticable

April 28, 2020 25 Spatial perception

April 28, 2020

26 Binaural Hearing

We have a remarkable ability to compare acoustic signals across the two ears. On headphones: • Identical signals lead to a narrow sound image centered in our head (correlation is 1) • An interaural level difference of 1 dB leads to a just noticeable shift in position towards the louder ear • An interaural time difference of 15 µs leads to a just noticeable shift in position towards the leading ear • When two signals are not identical (correlation <1), the sound image increases in width • Changes in correlation are best audibile around values of 1. • A reduction from 1 to 0.98 is audible, a reduction from 0.1 to 0 is not audible. ò L(t)R(t +t )dt Correlation: CLR (t ) = ò L(t)2 dt ò R(t +t )2 dt April 28, 2020 27 Spatial Hearing

Freefield listening: binaural cues help to localize sound source

• Low frequencies: Sound bends around the head (path length difference): Interaural time differences

• High frequencies: Sound is obstructed by the head: Interaural level differences

• Interaural time differences are not perceived above 2 kHz. • Room reflections lead to reduction of coherence

April 28, 2020 28 Spectral and Temporal Masking

Literature:

Brian C.J. Moore, “An Introduction to the Psychology of Hearing”, Fourth edition, Academic Press, London, 1997

T. Painter and A. Spanias, “Perceptual Coding of Digital Audio”. Proceedings of the IEEE, Vol. 88, No. 4, pp. 451-513, April 2000

April 28, 2020

29 Masking

The phenomenon where a sound (maskee/test signal) that is perfectly audible in isolation is not audible due to the presence of a masking sound (masker)

Relevance for audio coding: • Quantisation noise (maskee) that is introduced by the coding algorithm is masked by the signal which is coded (masker) • By shaping the spectro-temporal shape of the distortion a very efficient coding of a signal is possible (1-2 bits/sample)

Encoded signal: (= Original + quantisation noise) Quantisation noise:

April 28, 2020 30 Masking, Fletcher (1940)

• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone

• Measure thresholds as a function of masker bandwidth

test signal SPL

masker masked threshold *

frequency bandwidth

April 28, 2020 31 Masking, Fletcher (1940)

• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone

• Measure thresholds as a function of masker bandwidth

test signal SPL

masker * masked threshold *

frequency bandwidth

April 28, 2020 32 Masking, Fletcher (1940)

• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone

• Measure thresholds as a function of masker bandwidth

test signal SPL *

masker * masked threshold *

frequency bandwidth

April 28, 2020 33 Masking, Fletcher (1940)

• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone

• Measure thresholds as a function of masker bandwidth

test signal SPL * *

masker * masked threshold *

frequency bandwidth

April 28, 2020 34 Masking, Fletcher (1940)

• Measure threshold of detectability of a tone masked by bandpass noise centred spectrally around the tone

• Measure thresholds as a function of masker bandwidth

test signal SPL * * *

masker * masked threshold *

frequency critical bandwidth bandwidth

April 28, 2020 35 Masking, Fletcher (1940)

Assumption: signal is detected when the signal-to-noise ratio at the output of the auditory filter exceeds a certain criterion value

test signal

SPL auditory filter * * *

masker * masked threshold *

frequency critical bandwidth bandwidth

April 28, 2020 36 Demo: Critical Bands

Masking of a single 2000 Hz tone (decreasing in 10 steps of 5 dB) by spectrally flat (white) noise of different bandwidths:

• broadband noise • bandwidth 1000 Hz • bandwidth 250 Hz • bandwidth 10 Hz

April 28, 2020 37 Bark Scale

• A scale that converts frequency (Hz) into units of critical

bandwidth 2 éùæöf zf( )=+ 13arctan(0.00076 f ) 3.5arctanêúç÷ (Bark) ëûêúèø7500 named after Heinrich Barkhausen (who proposed the first subjective measurements of loudness)

• Critical bandwidth: ¶f CBW = ¶z

April 28, 2020 38 Bark Scale and Critical Bandwidth

4 x 10 2.5 2 1.5 1 Bark scale 0.5 Frequency (Hz) Frequency 0 5 10 15 20 25 z (Bark)

3 10 Critical bandwidth CBW (Hz)CBW 2 10

2 3 4 10 10 10 f0(Hz) April 28, 2020 39 Spectral Spread of Masking

SPL (dB) Threshold (dB)

Masker Masker

Test signal

Frequency (bark) Test signal frequency (bark)

Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k

April 28, 2020 40 Spectral Spread of Masking

SPL (dB) Threshold (dB)

Masker Masker

Auditory filter shape

Frequency (bark) Test signal frequency (bark)

Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k

April 28, 2020 41 Spectral Spread of Masking

SPL (dB) Threshold (dB)

Masker Masker

Frequency (bark) Test signal frequency (bark)

Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k

April 28, 2020 42 Spectral Spread of Masking

SPL (dB) Threshold (dB)

Masker Masker

Frequency (bark) Test signal frequency (bark)

Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k

April 28, 2020 43 Spectral Spread of Masking

SPL (dB) Threshold (dB)

Masker Masker

Frequency (bark) Test signal frequency (bark)

Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k

April 28, 2020 44 Spectral Spread of Masking

SPL (dB) Threshold (dB)

Masker Masker

Frequency (bark) Test signal frequency (bark)

Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k

April 28, 2020 45 Spectral Spread of Masking

M+S S M+S 300 Hz 500 Hz 700 Hz

Threshold (dB)

Masker

25 dB/bark -10 dB/bark

500 Hz Test signal frequency (bark)

Test signal is detected when the test-signal-to-masker ratio at the output of the auditory filter exceeds a certain criterion value k

April 28, 2020 46 Tonal versus Noise Maskers

Tone-on-tone masking Noise-on-tone masking

Threshold (dB) Threshold (dB) Masker Masker 4 dB 24 dB

Test signal frequency (bark) Test signal frequency (bark)

Threshold is about 24 dB Threshold is about 4 dB below masker (k = -24 dB) below masker (k = -4 dB)

April 28, 2020 47 Perceptual Audio Coding

April 28, 2020

48 Perceptual Audio Coder

Encoder:

Audio in Bitstream Analysis Lossless Quantization Filter bank Coding

Perceptual Model

Decoder:

Bitstream Audio out Synthesis Decoding Reconstruction Filter bank

April 28, 2020 49 Irrelevancy Removal

Quantize every spectral component such that the quantization error stays below the masking threshold

80

signal spectrum 70 masking threshold

60

50

40

30

20 Sound pressure level (dB SPL)

10

0

-10 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 Frequency (Hz)

April 28, 2020 50 Spreading Function Based Masking Models (ISO MPEG Model)

x Segmentation (15-30 ms) y

Derive power spectrum

Scaling to SPL (dB)

Transform Hz to Bark

Tonality detector

Global masking curve

April 28, 2020 51 Spreading Function Based Masking Models (ISO MPEG Model)

x wt() Segmentation (15-30 ms) y

Derive power spectrum t tstart tend Scaling to SPL (dB)

yt()== wtxt () (), t tstart ... t end Transform Hz to Bark

Tonality detector

Global masking curve

April 28, 2020 52 Spreading Function Based Masking Models (ISO MPEG Model)

x Segmentation (15-30 ms) y 2p 2 1 N -1 jkn kf Derive power spectrum N s psd(fynek )= å ( ) fk = N n=0 N Scaling to SPL (dB)

Transform Hz to Bark

Tonality detector

Global masking curve

April 28, 2020 53 Spreading Function Based Masking Models (ISO MPEG Model)

x Segmentation (15-30 ms) y

Derive power spectrum

æöpsd(fk ) Scaling to SPL (dB) SPL(fk )= 10log10 ç÷ èøpsd0dB Transform Hz to Bark

Tonality detector

Global masking curve

April 28, 2020 54 Spreading Function Based Masking Models (ISO MPEG Model)

x Segmentation (15-30 ms) y

Derive power spectrum

Scaling to SPL (dB)

2 éùæöf Transform Hz to Bark zf( )=+ 13arctan(0.00076 f ) 3.5arctanêúç÷ (Bark) ëûêúèø7500

Tonality detector

Global masking curve

April 28, 2020 55 Spreading Function Based Masking Models (ISO MPEG Model)

x • Because of the difference in masking strength Segmentation (15-30 ms) we need a classification of tonal and noisy y components in a signal spectrum Derive power spectrum • Spectral flatness measure: Tonal if the power Scaling to SPL (dB) of a component exceeds nearby components by more than e.g. 7 dB

Transform Hz to Bark Tone on tone masking Noise on tone masking

Tonality detector Masker(k) Masker(k) TNM(z,k) TTM(z,k) Global masking curve Test signal frequency (bark) Test signal frequency (bark)

April 28, 2020 56 Spreading Function Based Masking Models (ISO MPEG Model)

x

Segmentation (15-30 ms) • Tonal maskers (k): Threshold for each z: TTM(z,k) y • Noise maskers (k): Threshold for each z: TNM(z,k) Derive power spectrum • Threshold in quiet:

-0.8 -0.6*( f -3.3)2 -3 4 Scaling to SPL (dB) Tq (f) = 3.64 f -6.5e +10 f • Combining masking components by power Transform Hz to Bark addition:

æ 0.1T (z) ö T (z) =1010 logç10 q + 100.1TTM (z,k) + 100.1TNM (z,k) ÷ Tonality detector g å å è k k ø

Global masking curve

April 28, 2020 57 Spreading Function Based Masking Models (ISO MPEG Model)

x Segmentation (15-30 ms) y

Derive power spectrum

Scaling to SPL (dB)

Transform Hz to Bark

Tonality detector

æ 0.1T (z) ö Global masking curve 10 q 0.1TTM (z,k) 0.1TNM (z,k) Tg (z) =10 logç10 + å10 + å10 ÷ è k k ø April 28, 2020 58 Meaning of Global Masking Curve

• The (quantisation) noise-to-masker ratio (NMR) specifies the audibility of the quantisation noise • Quantisation noise inaudible when: NMR< 0 dB

• The NMR needs to be considered within each critical band

April 28, 2020 59 Forward and Backward Masking

Forward masking has a much stronger effect than backward masking

• Forward masking: Effect is observed until 100-200 ms after masker

• Backward masking: 10 ms, but sometimes not even present

• Simultaneous masking has the strongest effect

Backward Forward masking masking Inaudible Audible Masker Inaudible Masker

Time Time April 28, 2020 60 Forward and Backward Masking

Audible More realistic situation for audio coding: Masker Inaudible • Quantisation noise is distributed Time uniformly across a segment • Position of masker in segment is very relevant

Masker Inaudible

Time

April 28, 2020 61 Pre-Echoes

Pre-echoes 1

0

-1 0 500 1000 1500 pre-echoes 1 0

-1 0 500 1000 1500

0.5

0

-0.5 0 500 1000 1500

April 28, 2020 62 Other factors that influence masking

Detection of complex signals Buus et al. 1986

Multiband energy detector model (Green and Swets, 1966): • Spectral integration of “detectability” information • Each doubling of the number of components leads to a 1.5 dB reduction in threshold • Trading detectability across frequency

April 28, 2020 63 Concluding Remarks

• Although the auditory system is complicated, usefull quantitative perceptual models can be made

• It is impossible to include all psycho-acoustical effects in a perceptual model

• Often heuristic modifications to the model or encoder are used

• The quality of an audio coder can depend strongly on the quality of the perceptual model

April 28, 2020 64