
Tutorial 26 Applications of Binaural in Audio — Designing Spatial Audio Techniques for Human Listeners

Ville Pulkki, Department of Signal Processing and Acoustics, Aalto University, Finland

June 6, 2016

Background literature

"Communication acoustics – Introduction to Speech, Audio and Psychoacoustics" Textbook. Ch 12: Spatial hearing, Pulkki & Karjalainen 2015, Wiley "Parametric time--domain spatial audio" Contributed book with 15 chapters. eds Pulkki, Delikaris-Manias and Politis, Wiley, early 2017

Where and what?

Localization of sources
Beam-forming towards different directions

[Figure: cartoon of an everyday sound scene with many simultaneous sources (bird calls, quacking, wind, hoofbeats, rustling).]

Human eye

The cells in the eye are a priori sensitive to the direction of light
Response to a quite limited range of wavelengths (380–740 nm)

Human spatial hearing

Response to a very large range of wavelengths (2 cm – 30 m)
Ear canal diameter < 1 cm; sound just bends into the canal
One ear alone knows quite little of the spatial position of the source

Human spatial hearing

[Figure © J. Blauert]

The sources are localized by signal analysis in the brain
Signal characteristics in one ear / signal differences between the two ears
Hearing mechanisms estimate the most probable position for the source
Hearing can be fooled easily by audio techniques!

This chapter

Basic concepts
Head-related acoustics
Localization cues
Accuracy of localization of direction
Localization in the presence of echoes
Ability to listen selectively to a specific direction
Distance perception

Coordinate system

Azimuth ϕ, elevation δ, distance r

[Figure: head-centered coordinate system with the horizontal, median, and frontal planes; the x axis points to the front.]

Coordinate system 2

[Figure: cone of confusion around the interaural axis, parameterized by the angles ϕcc and δcc.]

Basic concepts

Listening conditions
  Monaural hearing (hearing processes and cues that need the signal from only one ear)
  Binaural hearing (differences between the ear signals also have an effect)
Spatial sound reproduction methods
  Monotic (signal fed to only one ear)
  Diotic (same signal fed to both ears)
  Dichotic (different signals fed to the two ears)

Dichotic listening with headphones

[Figure: two headphone listening setups driven by the same signal: (a) interaural delays τph1 and τph2, (b) attenuators a1 and a2.]
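A minimal sketch (not from the slides) of such a dichotic presentation: apply an interaural delay and an interaural level difference to a mono signal to produce a headphone pair. Function and parameter names (make_dichotic, itd_s, ild_db) are illustrative.

```python
import numpy as np

def make_dichotic(x, fs, itd_s=0.5e-3, ild_db=6.0):
    """Return (left, right); the left channel is delayed by itd_s and attenuated by ild_db."""
    n = int(round(itd_s * fs))                  # ITD in samples
    left = np.concatenate([np.zeros(n), x])     # delayed ear
    right = np.concatenate([x, np.zeros(n)])    # reference ear, padded to equal length
    left *= 10.0 ** (-ild_db / 20.0)            # ILD: attenuate the delayed ear
    return left, right

fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 500 * t)              # 1 s, 500 Hz tone
left, right = make_dichotic(tone, fs, itd_s=0.3e-3, ild_db=3.0)
# Played over headphones, the auditory event lateralizes toward the earlier and louder (right) ear.
```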

Head-related acoustics

Let us consider only the free field first
How does the incoming sound change as it hits the listener?
What is the transfer function from the source to the ear canal?

y_left(t) = h_left(t) * x(t),   or in the frequency domain   Y_left = H_left X

Head-related acoustics: measurements

Transfer function from the source to the ear canal
Place a microphone in the ear canal, or use a dummy head
Head-related transfer function (HRTF); head-related impulse response (HRIR)
Depends heavily on the direction of the source
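A hedged sketch of how such measured responses are used: convolving a mono source with a left/right HRIR pair gives the two ear-canal signals, y_left(t) = h_left(t) * x(t). The file name and the array layout below are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import fftconvolve

def binaural_render(x, hrir_left, hrir_right):
    """Convolve a mono signal with the left- and right-ear HRIRs."""
    return fftconvolve(x, hrir_left), fftconvolve(x, hrir_right)

fs = 48000
x = np.random.randn(fs)                      # 1 s of noise as a test source
hrir = np.load('hrir_azi60_ele0.npy')        # hypothetical measured pair, shape (N, 2)
y_left, y_right = binaural_render(x, hrir[:, 0], hrir[:, 1])
```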

Head-related transfer function (HRTF)

Measured with a real subject for three directions; magnitude responses.

[Figure: left- and right-ear HRTF magnitude responses (0 to -40 dB, 100 Hz – 10 kHz) for (a) ϕ = 0°, δ = 0°, (b) ϕ = 60°, δ = 0°, (c) ϕ = 0°, δ = 60°.]

Head-related impulse response (HRIR)

The HRIR is the HRTF in the time domain; quite often HRIRs are also called HRTFs.

[Figure: left- and right-ear HRIRs (amplitude vs. time, 0–2 ms) for (a) ϕ = 0°, δ = 0°, (b) ϕ = 60°, δ = 0°, (c) ϕ = 0°, δ = 60°.]

HRTF in horizontal plane

[Figure: left-ear HRTF magnitude (dB) as a function of azimuth and frequency (0.1–12.8 kHz) in the horizontal plane.]

HRIR in horizontal plane

[Figure: left-ear HRIR amplitude as a function of azimuth and time (0–2 ms) in the horizontal plane.]

HRTF in median plane

[Figure: left-ear HRTF magnitude (dB) as a function of the median-plane elevation angle and frequency (0.4–12.8 kHz).]

HRIR in median plane

[Figure: left-ear HRIR amplitude as a function of the median-plane elevation angle and time (0–3.5 ms).]

Localization cues

HRTFs impose spatial information on the ear-canal signals. What information do we dig out of it?
Binaural localization cues
  Interaural time difference (ITD)
  Interaural level difference (ILD)
Monaural localization cues: spectral cues
Dynamic cues

Interaural time difference (ITD)

Sound arrives a bit earlier at one ear than at the other
ITD varies between -0.7 ms and +0.7 ms
The JND of ITD is of the order of 20 µs

τ = (D / 2c) (ϕ + sin ϕ)

[Figure: ITD (µs) as a function of azimuth angle ϕ (0°–90°): values computed from the equation vs. measured values.]
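A quick numeric check of the formula above (assuming a head diameter of about 0.18 m and c = 343 m/s; these constants are not given on the slide):

```python
import numpy as np

def itd(azimuth_deg, D=0.18, c=343.0):
    """ITD in seconds from tau = (D / 2c) * (phi + sin(phi)), phi in radians."""
    phi = np.deg2rad(azimuth_deg)
    return (D / (2.0 * c)) * (phi + np.sin(phi))

for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d} deg -> ITD {itd(az) * 1e6:4.0f} us")
# Prints roughly 0, 270, 500 and 670 us, consistent with the +/-0.7 ms range above.
```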

Interaural time difference (ITD)

We are sensitive to phase differences between the carriers at low frequencies, and to time differences between the envelopes at higher frequencies

[Figure: left- and right-ear waveforms. At low frequencies (~200–1600 Hz) the ITD is read from the time/phase difference between the carriers; at high frequencies (above ~1600 Hz) from the time delay between the envelopes.]

Interaural time difference

HRTFs measured from a human; the ITD of a noise source in the free field computed with an auditory model.

[Figure: ITD (ms) as a function of azimuth (-90° to +90°) and frequency (0.2–18.2 kHz).]

Interaural level difference (ILD)

The head shadows the incoming sound
The effect depends on
  wavelength: no shadowing at low frequencies
  distance: larger ILD at very short distances (< 1 m)
ILD varies between -20 dB and +20 dB
The JND is of the order of 1 dB for sources near the median plane

Interaural level difference

HRTFs measured from a human; the ILD of a sound source in the free field computed with an auditory model.

[Figure: ILD as a function of azimuth (-90° to +90°) and frequency (0.2–18.2 kHz), computed with an auditory model.]

Basic lateralization results

Subjects report the position of the internalized auditory source on the interaural axis.

[Figure: perceived lateralization for headphone listening, (a) as a function of the interaural delay τph (-1500 to +1500 µs), adapted from Toole and Sayers (1965), and (b) as a function of the interaural level difference (-12 to +12 dB), adapted from Sayers (1964).]

Additional cues

Binaural cues (ITD and ILD) in principle resolve only the cone of confusion on which the source lies
Other cues are used to resolve the direction within the cone
  monaural spectral cues
  dynamic cues (the effect of head movements on the cues)

[Figure: cone of confusion around the interaural axis, with angles θcc and δcc defining the source position on the cone.]

Monaural spectral cues

Effect of pinna diffraction
Effect of reflections/diffraction from the torso

Monaural spectral cues

[Figure: median-plane HRTF magnitudes (dB) of four subjects (012, 017, 018, 019) as functions of elevation and frequency (0.8–12.8 kHz).]

Monaural spectral cues

The source has to have a relatively flat spectrum
A relatively reliable cue for most natural sounds
Narrow-band sounds are hard to localize (a mosquito in a dimmed room)
Humans adapt relatively fast to new HRTFs, especially if visual cues are available
The ears also grow throughout life, so adaptation is constant

Dynamic cues

When the head rotates, the ITD, ILD, and spectral cues change
A powerful though coarse cue
Efficiently resolves front/back/inside-the-head confusions, but not fine details

[Figure: during head rotation the ITD and ILD remain constant for one source direction, while for two other source directions they change a lot, in opposite directions.]

Interaction between spatial hearing and vision

Both senses try to find out "where" the source is
If the visual cue is clear, vision dominates
  Ventriloquism: within an area of about 30°, the visual image captures the auditory image
If the visual cue is blurry or not prominent, the auditory cue wins
In some cases two events occur: a visual event and an auditory event

Accuracy of localization

How well do we localize point-like sources?
What is the accuracy of directional hearing?
  Azimuth / elevation / 3D
Accuracy of the perception of the spatial distribution of wide sound sources

Localization accuracy in horizontal plane

[Figure: directions of auditory events for sound events at azimuths 0°, 90°, 180°, and 270°: about 359° (localization blur ±3.6°), 80.7° (±9.2°), 179.3° (±5.5°), and 281.6° (±10°), respectively. Adapted from Blauert (1996).]

Localization accuracy in median plane

[Figure: directions and localization blur of auditory events for sound events at elevations 0°, 36°, 90°, and 180° in the median plane; the blur grows from about ±9° in front to roughly ±13° to ±22° above. Adapted from Damaske and Wagener (1969).]

Monaural cues are effective only after adaptation with a visual reference; thus the field of vision has better accuracy.

Accuracy of 3D localization

Task: "Point with your nose to the direction of the auditory event"

Adapted from Middlebrooks (1992)

Accuracy of perception of spatial distribution of sources

13 loudspeakers in the free field; black squares denote the loudspeakers that produced sound
Task: "Tell which loudspeakers are on"
Perception of the spatial distribution of 1, 2, or 3 loudspeakers is correct
Incorrect perception of the distribution in complex cases

[Figure: response histograms (percentage of responses per direction, -105° to +105°) for six loudspeaker distributions.]

Adapted from Santala and Pulkki (2011)

Directional hearing in enclosed spaces

The previous results were for the free field. What about rooms?
Humans localize relatively well in rooms. How come?

[Figure: in a room, the direct path from the source to the listener is accompanied by early reflections from the walls.]

Precedence effect

Free-field conditions exist only for a short time, 1–10 ms after the direct sound has arrived, before the reflections arrive
This short time dominates the perception of direction
Reflections (2–30 ms) are perceived to arrive from the direction of the direct sound

[Figure: two loudspeakers S and ST at ϕ = +40° and ϕ = -40° (α = 80°); direction ϕ of the first auditory event as a function of the delay τ of ST (0–2 ms and 20–50 ms ranges shown), with the echo threshold marked.]

Adapted from Blauert (1996)

Precedence effect, experiments with two sounds

Direct sound S0: "lead"

Reflected sound ST: "lag"

[Figure: lead-lag level difference (-40 to +40 dB) as a function of the lag delay (0–100 ms), with regions marked: first auditory event no longer detectable; first auditory event and echo equally loud; echo localized; timbre change audible. Adapted from Blauert (1996).]

Binaural Advantages in Timbre Perception

Trying to listen selectively to a certain direction
Binaural detection
Binaural decolouration

Binaural intelligibility level difference

The minimum understandable level of a speech source in different directions in diffuse noise
Reference condition: binaural listening with the source in the back
Binaural intelligibility level difference (BILD)
Monaural intelligibility level difference (MILD)
MILD: best heard with the source on the same side
BILD gives only a small advantage over the ipsilateral MILD
"Better-ear" listening explains most of the effect
Binaural hearing assists by a few dB

[Figure: intelligibility level difference (dB) as a function of the azimuth of the signal source (-90° to +90°) for BILD and MILD. Adapted from Blauert (1996).]

Perception of the distance of the source

Cues
  Room effect
  Spectral content
  Binaural cues

Distance cue 1: Loudness

The amplitude of the direct sound decreases according to the 1/r law
  The sound energy carried through the surface of a sphere (A = 4πr²) is constant, and pressure is proportional to the square root of the energy
  SPL thus decreases by 6 dB with every doubling of distance: ΔL = 20 log10(2r/r) ≈ 6 dB
If we know the source (human talker, insect, walking sounds), the distance is perceived relatively accurately at short range
In the free field with a real human speaker, a similar effect is found:

[Figure: distance of the auditory event (m) as a function of the sound source distance (0–10 m), average of 5 subjects. Adapted from Békésy (1949).]

Distance cue 2: Effect of room

The direct-to-reverberant ratio (DRR) is often mentioned as a "perceptual cue", though it is rather a physical measure
If the listener has heard a source in a specific room, a change in the DRR leads to the perception of an approaching or receding source
The effect depends on the signal; it is easily perceivable with tone complexes

Distance cue 2: Effect of room

The characteristics of the ear-canal signals change with the DRR: this is one of the distance cues
It is not well known which modification of the signal produces the perception
One possibility is the sensitivity of the ear to phase

[Figure: response of the cochlea to a 100 Hz sawtooth with phase modifications: a zero-phase cosine signal vs. a version with the phase scrambled by the room effect (flat magnitude response).]

Hypothesis: the more "buzzy" the signal is, the closer it is perceived (in a room)

Adapted from Laitinen et al. 2013

Distance cue 3: Effect of air

Air absorbs sound through shear viscosity, thermal conductivity (heat dissipation), and molecular relaxation of the vibrational, rotational, and translational energy of oxygen, nitrogen, and water vapor
The absorption depends on temperature, humidity, and static pressure
The resulting low-pass effect can, for instance, take values of 1–100 dB/km
Known sounds are perceived as closer if they have more energy at high frequencies
  hissing is perceived as closer than normal speech

Distance cue 4: short-range binaural cues

With plane waves, the ILD at low frequencies is negligible
When the source is very close, e.g., 3 cm from one ear and 30 cm from the other, the 1/r law alone makes the ILD large (20 log10(30/3) = 20 dB)
An excess ILD relative to the ITD brings the perceived source closer

Accuracy of distance perception

In many studies
  the source is perceived as too far at short distances
  the source is perceived as too close at long distances
Acoustic horizon: the maximum perceived distance of the auditory event
Perception depends on the signal, the listening conditions, and the source directivity
Absolute accuracy is low
Relative accuracy is good ("Is the source approaching?")

References

These slides follow the corresponding chapter in: Pulkki, V. and Karjalainen, M. (2015) Communication Acoustics: An Introduction to Speech, Audio and Psychoacoustics. John Wiley & Sons, where a more complete list of references can also be found.

References used in figures:
Blauert, J. (1997) Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press.
Damaske, P. and Wagener, B. (1969) Directional hearing tests by the aid of a dummy head. Acta Acustica United with Acustica, 21(1), 30–35.
Middlebrooks, J.C. (1992) Narrow-band sound localization related to external ear acoustics. J. Acoust. Soc. Am., 92, 2607–2624.
Santala, O. and Pulkki, V. (2011) Directional perception of distributed sound sources. J. Acoust. Soc. Am., 129, 1522.
Sayers, B.M. (1964) Acoustic-image lateralization judgments with binaural tones. J. Acoust. Soc. Am., 36(5), 923–926.
Toole, F. and Sayers, B.M. (1965) Lateralization judgments and the nature of binaural acoustic images. J. Acoust. Soc. Am., 37(2), 319–324.
von Békésy, G. (1949) The moon illusion and similar auditory phenomena. Am. J. Psychol., 62(4), 540–552.

Spatial sound reproduction

Target: relay the perception of sound!

Spatial sound reproduction

Could we do the same with sound as a video camera does with light?
  Create a narrow beam for each loudspeaker
Audible sound includes wavelengths from 2 cm to about 30 m
It is impossible to build a microphone with a constant narrow beam width without coloration and noise problems
Higher-order Ambisonics tries to do it

Spatial sound reproduction

Holography then, perhaps?
  Lots of spaced microphones
  Lots of loudspeakers
  Wave field synthesis
Problems
  High price
  The requirements for the microphones and loudspeakers are high

Human spatial hearing

Signals from different sources are summed in the ear canals
Sound localization is based on signal analysis in the brain
Within one auditory frequency band, directions can be perceived only in a limited manner

[Figure © J. Blauert]

Human spatial hearing

Directional cues
  Interaural time difference (ITD)
  Interaural level difference (ILD)
  Pinna- and torso-related spectral cues
  Dynamic change of the cues with head movements
Suppression of the effect of early reflections on the directional cues: the precedence effect

Spatial sound reproduction

Can we exploit knowledge of human spatial hearing?
Hearing can be fooled easily
  E.g., two coherent sources produce one virtual source in the middle
  Compare with vision: coherent sources do not produce virtual sources
Assumption: within one frequency band, humans perceive only one direction and one coherence cue

Parametric spatial sound reproduction

Capture the sound
Analyze the spatial parameters
Reproduce the sound in a way that recreates the spatial parameters
Design the system to avoid any timbral artifacts; sacrifice spatial properties first

[Figure: processing chain: microphone or loudspeaker signals → time-frequency analysis → spatial analysis producing spatial metadata in the TF domain → spatial synthesis of signals in the TF domain → loudspeaker or headphone signals.]

Directional audio coding (DirAC)

Basic flow diagram

[Figure: ANALYSIS – TRANSMISSION – SYNTHESIS. One or M microphone channels are taken to the time-frequency domain (short-time Fourier analysis or a filter bank); energetic analysis estimates the direction (azimuth θ(t,f), elevation ϕ(t,f)) and diffuseness ψ(t,f); in synthesis these measured properties steer a cross-fade between point-like virtual sources and diffuse reproduction over the output channels.]
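A rough sketch of the analysis stage in the spirit of the diagram above, for first-order (B-format) input: per time-frequency tile, the direction follows the intensity-like vector and the diffuseness the ratio of its temporally averaged magnitude to the energy. The STFT settings, the W-channel scaling convention, and the averaging length are assumptions; see the DirAC literature for the exact definitions.

```python
import numpy as np
from scipy.signal import stft
from scipy.ndimage import uniform_filter1d

def dirac_analysis(w, x, y, z, fs, nperseg=512, avg_frames=8):
    """Estimate azimuth, elevation and diffuseness per time-frequency tile."""
    # STFTs of the B-format channels (assumes W = p / sqrt(2) scaling)
    W = stft(w, fs, nperseg=nperseg)[2]
    X = stft(x, fs, nperseg=nperseg)[2]
    Y = stft(y, fs, nperseg=nperseg)[2]
    Z = stft(z, fs, nperseg=nperseg)[2]

    # Intensity-like vector; with the B-format sign convention it points toward the source
    I = np.sqrt(2.0) * np.stack([np.real(np.conj(W) * X),
                                 np.real(np.conj(W) * Y),
                                 np.real(np.conj(W) * Z)])
    # Energy density, up to the same constant as the intensity
    E = np.abs(W) ** 2 + 0.5 * (np.abs(X) ** 2 + np.abs(Y) ** 2 + np.abs(Z) ** 2)

    azimuth = np.arctan2(I[1], I[0])
    elevation = np.arctan2(I[2], np.hypot(I[0], I[1]))

    # Diffuseness: 1 - |<I>| / <E>, with a short moving average over time frames
    I_avg = uniform_filter1d(I, size=avg_frames, axis=-1)
    E_avg = uniform_filter1d(E, size=avg_frames, axis=-1)
    psi = 1.0 - np.linalg.norm(I_avg, axis=0) / (E_avg + 1e-12)
    return azimuth, elevation, np.clip(psi, 0.0, 1.0)
```

For a single plane wave this yields diffuseness close to 0; for an ideal diffuse field the averaged intensity vanishes and the diffuseness approaches 1.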

"Teleconference" implementation

Properties of teleconference-DirAC

Perfect quality with a few sources in the free field
Diffuse reverberation is subject to spatial and timbral artifacts
With several simultaneous sources sharing the same time-frequency position
  Timbral artifacts: "added room effect", "smearing of transients"
  Spatial artifacts: "sources pull each other"
  Sources near the noise masking threshold: impossible to localize
Why these artifacts?

"HQ" implementation

[Figure: HQ-DirAC flow. B-format microphone channels are taken through a filter bank; directional analysis gives the direction (azimuth, elevation) and diffuseness Ψ per frequency channel. Virtual cardioid microphone signals are weighted by gain factors derived from 1-Ψ and Ψ: the non-diffuse part is panned with VBAP to the loudspeaker setup, and the diffuse part is decorrelated and distributed to the loudspeakers.]

This works better, but we do not exactly know why (2007).
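The diagram above pans the non-diffuse stream with VBAP (vector base amplitude panning). A minimal 2-D sketch of the gain computation; the horizontal loudspeaker layout and the energy normalization are illustrative choices, not the slide's setup.

```python
import numpy as np

def vbap_2d(pan_deg, ls_deg=(30, -30, 110, -110, 180)):
    """Return per-loudspeaker gains panning a source to pan_deg (degrees)."""
    ls = np.asarray(sorted(ls_deg))
    # unit vectors of the target direction and the loudspeakers in the horizontal plane
    p = np.array([np.cos(np.deg2rad(pan_deg)), np.sin(np.deg2rad(pan_deg))])
    L = np.stack([np.cos(np.deg2rad(ls)), np.sin(np.deg2rad(ls))], axis=1)
    gains = np.zeros(len(ls))
    # try every adjacent loudspeaker pair; keep the one giving non-negative gains
    for i in range(len(ls)):
        j = (i + 1) % len(ls)
        g = np.linalg.solve(L[[i, j]].T, p)     # solve [l_i l_j] g = p
        if np.all(g >= -1e-9):
            gains[[i, j]] = g
            break
    return gains / np.linalg.norm(gains)        # energy-normalize

print(np.round(vbap_2d(10.0), 3))  # source between the +30 and -30 degree loudspeakers
```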

Properties of HQ-DirAC

All teleconference-DirAC artifacts are largely mitigated
Some artifacts are still present
Challenging acoustical conditions
  Surrounding applause signals
  Small rooms with strong early reflections
  Strong unwanted sources
Potential loss of energy

What did we find out?

In challenging acoustical conditions
  sound arrives from multiple directions in one time-frequency position
  the physical properties of the field resemble a diffuse field, so the diffuseness will be analyzed to be high
  the original sound scene is nevertheless not perceived to be similar to diffuse reverberation
  sound that should be routed to the non-diffuse stream ends up being decorrelated
Decorrelation of free-field components causes artifacts, mainly timbral ones
Avoid decorrelation!

How to avoid decorrelation

Process transients separately
  Recognize transients
  Use better time resolution / bypass decorrelation
Covariance-domain processing
  Minimize the decorrelated energy
Divide the sound field into sectors from a higher-order recording
  Perform a separate analysis for each sector
Perform a more complex analysis of the sound field (multiple DOA values etc.)

Processing transients

Processing transients separately

Remove transients and reproduce them using simple matrixing
Mitigates "transient smearing" effects
"Added room" effects are still present with some signals
Needs high temporal resolution in the processing
Solves the "surrounding applause" cases
[Laitinen et al.: JAES 59:1/2]
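The slides do not specify the transient detector itself; as one plausible sketch, a spectral-flux onset measure can flag the frames that should bypass decorrelation or be processed with finer time resolution. The threshold and STFT settings below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def transient_frames(x, fs, nperseg=256, threshold=2.0):
    """Return a boolean per-frame mask marking transient (onset-like) frames."""
    _, _, X = stft(x, fs, nperseg=nperseg)
    mag = np.abs(X)
    # spectral flux: summed positive magnitude increase between adjacent frames
    flux = np.sum(np.maximum(np.diff(mag, axis=1), 0.0), axis=0)
    flux = np.concatenate([[0.0], flux])
    # adaptive threshold relative to a running mean of the flux
    mean_flux = np.maximum(np.convolve(flux, np.ones(9) / 9, mode='same'), 1e-12)
    return flux > threshold * mean_flux
```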

Covariance-domain processing

A least-squares optimized solution for the synthesis
  The covariance matrix of the output is dictated by the directional parameters
  The optimized mixing solution leads to minimization of the decorrelated energy
Applicable to many other spatial sound rendering tasks
[Vilkamo et al.: JAES 61:6]
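A minimal sketch of the covariance-domain idea (simplified from the framework of Vilkamo et al., JAES 61:6): given the input covariance Cx, a target covariance Cy dictated by the directional parameters, and a prototype mixing matrix Q, find a mixing matrix M with M Cx M^H = Cy that stays as close to Q as possible, so that little residual decorrelated energy is needed. Regularization and the handling of unreachable targets are omitted here.

```python
import numpy as np

def covariance_mixing(Cx, Cy, Q):
    """Return M with M Cx M^H = Cy, closest to the prototype Q (orthogonal Procrustes)."""
    Kx = np.linalg.cholesky(Cx + 1e-9 * np.eye(len(Cx)))   # Cx = Kx Kx^H
    Ky = np.linalg.cholesky(Cy + 1e-9 * np.eye(len(Cy)))   # Cy = Ky Ky^H
    # closest (semi-)unitary P minimizing ||Ky P - Q Kx||_F
    U, _, Vh = np.linalg.svd(Ky.conj().T @ Q @ Kx)
    P = U @ Vh
    return Ky @ P @ np.linalg.inv(Kx)

# toy example: two input channels, two output channels
Cx = np.array([[1.0, 0.2], [0.2, 0.8]])       # correlated, unequal-energy input
Cy = np.eye(2)                                 # target: incoherent, equal energy
Q = np.eye(2)                                  # prototype: pass-through
M = covariance_mixing(Cx, Cy, Q)
print(np.round(M @ Cx @ M.conj().T, 3))        # approximately equals Cy
```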

Covariance-domain processing

Solutions with a different model of the sound field

A higher number of microphones gives more information about the sound field
How to use that information in sound reproduction?
  Divide the sound field into sectors (Pulkki, Politis), perform lower-order reproduction for each
  Analyze multiple DOAs, and then reproduce (FhG, FORTH, Harpex)

Sector-based parametric spatial sound reproduction

[Pulkki et al: AES 134], [Politis et al: IEEE manuscript]

Sector-based parametric spatial sound reproduction

"Higher-order DirAC" Challenging acoustical conditions occur rarely within sectors Parameters computed with N:th -order input Audio signals used in synthesis obtained with (N-1):th -order input Self-noise issue of higher-order microphones are also avoided System does not lose acoustic energy in any case
