TIMBRE ANALYSIS

Tor Halmrast
Signal Workshop 4Ms UiO/IMV 2017

Things Take Time
TIMBRE takes Time

$y_n = a_0 x_n + a_1 x_{n-1}$

Mean of two samples = LOW PASS FILTER: $a_0 = 0.5$, $a_1 = 0.5$

Difference between two samples = HIGH PASS FILTER: $a_0 = 0.5$, $a_1 = -0.5$

Mean of 3 samples??

$y_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2}$

$a_0 = 0.33$, $a_1 = 0.33$, $a_2 = 0.33$
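The three filters above can be compared by evaluating their magnitude responses. The following is a minimal sketch, assuming NumPy is available; the helper name `fir_magnitude` is hypothetical, and the three-sample mean uses exactly 1/3 per coefficient instead of the rounded 0.33 on the slide.

```python
import numpy as np

def fir_magnitude(coeffs, n_points=513):
    """Return (omega, |H|) for an FIR filter, omega from 0 to pi (Nyquist)."""
    a = np.asarray(coeffs, dtype=float)
    omega = np.linspace(0.0, np.pi, n_points)
    k = np.arange(len(a))
    # H(e^{j*omega}) = sum_k a_k * e^{-j*omega*k}
    h = np.exp(-1j * np.outer(omega, k)) @ a
    return omega, np.abs(h)

filters = {
    "mean of 2 samples": [0.5, 0.5],
    "difference of 2 samples": [0.5, -0.5],
    "mean of 3 samples": [1 / 3, 1 / 3, 1 / 3],
}

for name, coeffs in filters.items():
    omega, mag = fir_magnitude(coeffs)
    print(f"{name}: gain at DC = {mag[0]:.2f}, gain at Nyquist = {mag[-1]:.2f}")
```

The two-sample mean reaches zero gain at Nyquist, while the three-sample mean still passes about one third of the amplitude there, which is one way to read the "more HF" remark on the slides.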

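[Figure: waveforms. Orig. / Mean 2 samples / Mean 3 samples]

Mean 2 samples: Low Pass
Mean 3 samples: Low Pass, but more HF for «edges»?

[Figure: complex plane, real axis and imaginary axis]

$e^{j\omega t} = \cos\omega t + j\sin\omega t = \cos 2\pi f t + j\sin 2\pi f t$

Delay $\tau$ [seconds]:

$e^{j\omega(t-\tau)} = e^{j\omega t}\, e^{j\omega(-\tau)}$

$y_n = a_0 x_n + a_1 x_{n-1} = x_n (a_0 + a_1 e^{-j\omega})$

$z^{-1} = e^{-j\omega} = e^{-j2\pi f}$

[Figure: amplitude (linear) and phase versus frequency up to Nyquist, for the sum $x_n a_0 + x_{n-1} a_1$]
To avoid overload/clipping: 0.3, 0.3
Up-sampling 4x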

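Delay = 100 ms

Feed Forward = Finite Impulse Response (FIR)

$y_n = a_0 x_n + a_1 x_{n-1} = x_n (a_0 + a_1 e^{-j\omega})$

$z^{-1} = e^{-j\omega} = e^{-j2\pi f}$

Transfer Function: $H(z) = a_0 + a_1 z^{-1}$

Zero for $H(z) = a_0 + a_1 z^{-1} = 0$: $z = -a_1/a_0$ ($= -1$ for $a_0 = a_1 = 0.5$)

Magnitude Response: $|H(z)|$ on the unit circle, where $|e^{-j\omega}| = 1$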

Z-plane

http://www.micromodeler.com/dsp/

$0.5 + 0.5z^{-1}$
Linear Phase
Group Delay = 0.5 samples

Higher order filter: $0.5 + 0.5z^{-7}$
[Figure: Transfer Function, Z-plane, Phase]
Linear Phase

Group Delay (time delay of the amplitude envelopes of the various sinusoidal components):

$\tau_g(\omega) = -\dfrac{d\phi}{d\omega}$, where $\phi$ = phase shift

FIR filter up to $z^{-7}$: Group Delay (envelope) delayed 7/2 = 3.5 samples

Lower (relative) strength of the delayed term: $0.5 + 0.1z^{-7}$
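The group delay values quoted above can be checked numerically from the definition (negative derivative of the unwrapped phase with respect to frequency). This is a sketch under stated assumptions: NumPy is available, the helper names `group_delay_samples` and `delayed_pair` are hypothetical, and the median is used so that the phase jumps at the spectral zeros of the symmetric filter do not disturb the estimate.

```python
import numpy as np

def group_delay_samples(coeffs, n_points=4096):
    """Estimate group delay (in samples) as -d(phase)/d(omega)."""
    a = np.asarray(coeffs, dtype=float)
    # Stay slightly inside (0, pi) to avoid the exact band edges.
    omega = np.linspace(0.01, np.pi - 0.01, n_points)
    k = np.arange(len(a))
    h = np.exp(-1j * np.outer(omega, k)) @ a
    phase = np.unwrap(np.angle(h))
    return omega, -np.gradient(phase, omega)

def delayed_pair(a0, a7):
    """Coefficients of a0 + a7 * z^-7 (eight taps, six of them zero)."""
    c = np.zeros(8)
    c[0], c[7] = a0, a7
    return c

_, gd1 = group_delay_samples([0.5, 0.5])
_, gd7 = group_delay_samples(delayed_pair(0.5, 0.5))
_, gd7_weak = group_delay_samples(delayed_pair(0.5, 0.1))

print("0.5 + 0.5 z^-1: median group delay", round(float(np.median(gd1)), 3))
print("0.5 + 0.5 z^-7: median group delay", round(float(np.median(gd7)), 3))
print("0.5 + 0.1 z^-7: group delay range",
      round(float(gd7_weak.min()), 3), "to", round(float(gd7_weak.max()), 3))
```

With equal coefficients the filter is symmetric and the group delay is constant (linear phase). With the weaker delayed term the symmetry is lost and the group delay varies with frequency.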

[Figure: dB, Phase]

DELAY
Speech: perceived as a distinct echo if a distinct reflection arrives more than 50 ms after the direct sound.

COMB FILTERS
Frequency spectrum of Direct Sound + Delay

CBTB=CombBetweenTeethBandwidth

PS! For 50 ms: CBTB=1000/50=20 Hz
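The relation CBTB = 1/delay can be verified by building the impulse response "direct sound + one delayed copy" and locating the dips of its spectrum. This is a minimal sketch, assuming NumPy, a hypothetical sample rate of 48 kHz, and a reflection as strong as the direct sound (an idealisation); `comb_dips` is a hypothetical helper name.

```python
import numpy as np

def comb_dips(delay_ms, fs=48000, fmax=1000.0):
    """Dip frequencies (Hz) of direct sound plus one equal-strength delayed copy."""
    delay_samples = int(round(fs * delay_ms / 1000.0))
    h = np.zeros(fs)            # 1 s long impulse response, so bins are 1 Hz apart
    h[0] = 1.0                  # direct sound
    h[delay_samples] = 1.0      # reflection
    mag = np.abs(np.fft.rfft(h))
    freqs = np.arange(len(mag)) * fs / len(h)
    keep = (mag < 1e-6) & (freqs <= fmax)
    return freqs[keep]

for delay_ms in (10.0, 50.0):
    dips = comb_dips(delay_ms)
    spacing = np.diff(dips)
    print(f"delay {delay_ms} ms: first dip {dips[0]} Hz, spacing {spacing[0]} Hz")
```

The spacing between neighbouring dips is the distance between the comb "teeth", so it should come out as 1000 divided by the delay in milliseconds.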

Oslo Concert Hall w/test reflectors: single, distinct reflection from reflectors. Comb Filter coloration.
Musikverein, Vienna: many reflections from balconies etc. No Comb Coloration perceived.

Comb Filter Coloration («Box-Klangfarbe») when CBTB is in the order of the critical bandwidth of the basilar membrane of the cochlea.

HUMAN ORAL CLICK w/REFLECTION 10 ms

[Figure: FREQUENCY SPECTRUM of the click with a 10 ms reflection (sound file; MIR Toolbox)]

Cepstrum: find «rhythmic» behavior in the frequency response.

[Figure: Cepstrum]

COMB FILTERS
A click for echolocation, electronically delayed 10 ms = deltaT
Comb Filter CBTB (CombBetweenTeethBandwidth) = 1/deltaT = 1000/10 = 100 Hz

[Figure: PRAAT spectrum, TH Blindeklikk + 10 ms refl (Edirol); sound pressure level (dB/Hz) vs. frequency, 0-1000 Hz]
[Figure: Autocorrelation, TH Blindeklikk + 10 ms refl (Edirol); time 0-0.1 s]
[Figure: Power cepstrum, TH Blindeklikk + 10 ms refl (Edirol); amplitude (dB) vs. quefrency, 0-0.1 s]

"Cepstrum" = reversing the first four letters of "spectrum".
Cepstrum = the inverse Fourier transform of the log magnitude spectrum.
There is a complex cepstrum, a real cepstrum, a power cepstrum and a phase cepstrum.
The complex cepstrum of the convolution of two signals = addition of their complex cepstra.

Vocal excitation (pitch) and vocal tract (formants) are additive in the logarithm of the power spectrum and thus clearly separate.

The cepstrum projects all the slowly varying components in the log magnitude spectrum to the low-quefrency region and the fast varying components to the high-quefrency region.

In the log magnitude spectrum, the slowly varying components represent the envelope, which corresponds to the vocal tract, and the fast varying components correspond to the excitation source.

As a result, the vocal tract and excitation source components are represented separately in the cepstrum of speech.

The initial few values in the cepstrum, typically 13-15 cepstral values, represent the vocal tract information.

The power cepstrum = squared magnitude of the inverse Fourier transform of the logarithm of the squared magnitude of the Fourier transform of a signal.

The complex cepstrum is defined as the inverse Fourier transform of the logarithm (with unwrapped phase) of the Fourier transform of the signal. This is sometimes called the spectrum of a spectrum.

complex cepstrum of signal = IFT(log(FT(the signal)) + j2πm)
(where m is the integer required to properly unwrap the angle or imaginary part of the complex log function)

Many texts define the process as FT → abs() → log → IFT, i.e., that the cepstrum is the "inverse Fourier transform of the log-magnitude Fourier spectrum" (the difference between squaring and taking the absolute value amounts to an overall factor of 2).
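The power cepstrum definition above can be tried on a synthetic click with a 10 ms reflection. This is only a sketch under stated assumptions: NumPy is available, the "click" is a hypothetical decaying exponential rather than a recorded oral click, the reflection strength 0.6 and the sample rate 44.1 kHz are arbitrary choices, and `power_cepstrum` is a hypothetical helper name.

```python
import numpy as np

def power_cepstrum(x):
    """Squared magnitude of the inverse FT of the log of the squared magnitude of the FT."""
    spectrum = np.fft.fft(x)
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)  # small offset avoids log(0)
    return np.abs(np.fft.ifft(log_power)) ** 2

fs = 44100                       # assumed sample rate (Hz)
n = 8192                         # analysis length in samples
delay = int(round(0.010 * fs))   # 10 ms reflection = 441 samples

click = np.exp(-np.arange(200) / 20.0)   # synthetic stand-in for a click
x = np.zeros(n)
x[:200] += click                          # direct sound
x[delay:delay + 200] += 0.6 * click       # delayed, weaker reflection

cep = power_cepstrum(x)
lo = int(0.002 * fs)                      # skip the low quefrencies of the click itself
peak = lo + int(np.argmax(cep[lo:n // 2]))
print("cepstral peak at quefrency", peak / fs, "s")
```

The peak appears at the quefrency of the reflection delay, which is the «rhythmic» behavior of the comb-filtered spectrum that the cepstrum is meant to reveal.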

SIFT: Simplified Inverse Filter Tracking algorithm (hereafter referred to as the SIFT algorithm).

Encompasses the desirable properties of both autocorrelation and cepstral pitch analysis techniques.

CELLO VIBRATO

PS! 220 Hz in the recording consists of:
1. Overtone of the played 110 Hz (with vibrato)
2. Open string «tail» (non vibrato) (long decay)

PITCH DETECTION

[Figure: POWER CEPSTRUM CELLO; amplitude (dB) vs. quefrency, 0-0.1 s; dotted lines at intervals of 0.009 s (0.009, 0.018, 0.027 s)]
Peak at ca. 0.009 s

[Figure: AUTOCORRELATION CELLO; time 0-0.1 s; distance between dotted lines 0.009 s]
9 ms? f = 1/0.009 = 111 Hz? A = 110 Hz
Fo = 1/0.009 = 111.11 Hz

[Figure: SPECTRUM CELLO; sound pressure level (dB/Hz) vs. frequency, 0-2000 Hz; dotted lines at intervals of 100 Hz]

[Figure: PITCH CELLO; pitch (Hz), 109-112, vs. time, 0-6.002 s; 110 Hz = A]

[Figure: CELLO VIBRATO original]
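The autocorrelation reading used for the cello (fundamental = 1 / lag of the strongest peak) can be sketched on a synthetic tone. Assumptions, all hypothetical: NumPy is available, the test tone is a 110 Hz sum of five harmonics without vibrato, the sample rate is 44.1 kHz, and the search range 50-500 Hz is an arbitrary choice. `autocorr_pitch` is a hypothetical helper name.

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=50.0, fmax=500.0):
    """Return (f0 in Hz, period in s) from the strongest autocorrelation peak."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Autocorrelation via FFT; zero-padding to 2n avoids circular wrap-around.
    spec = np.fft.rfft(x, 2 * n)
    ac = np.fft.irfft(np.abs(spec) ** 2)[:n]
    ac /= ac[0]
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag, lag / fs

fs = 44100
t = np.arange(int(0.2 * fs)) / fs
# Synthetic stand-in for the cello A: harmonics 1..5 with 1/h amplitudes.
tone = sum(np.sin(2 * np.pi * 110.0 * h * t) / h for h in range(1, 6))

f0, period = autocorr_pitch(tone, fs)
print(f"period ca. {period * 1000:.2f} ms, f0 ca. {f0:.1f} Hz")
```

The period found is close to 9 ms, in line with the 0.009 s spacing read from the autocorrelation and cepstrum plots.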

CELLO VIBRATO w/ Delay 2.27 ms = 0.8 m
2 MICROPHONES, OR A REFLECTING WALL

PS! The open string «tail» appears to have shifted 1 octave up.
The comb filter (due to the reflection) removes the 1st overtone at 220 Hz.
[Figure: spectra with partials at 0, 110, 220, 330 Hz; frequency axis 0-400 Hz]

We could of course also cut this «open string» tail with resynthesis (SPEAR).

Another delay time of the reflection:
Cello Original w/delay 7.5 ms (Comb Dip at 330 Hz, 5th of A110)
«Removed» 330 Hz due to 7.5 ms reflection
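Why these two delays remove 220 Hz and (roughly) 330 Hz follows from the dip positions of a comb filter: for a reflection added with the same sign, dips fall at odd multiples of 1/(2 × delay). A minimal sketch, assuming NumPy and an assumed speed of sound of 343 m/s; `comb_null_frequencies` is a hypothetical helper name.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def comb_null_frequencies(delay_s, fmax):
    """Dip frequencies (Hz) up to fmax for direct sound + same-sign delayed copy."""
    # Dips at (2k + 1) / (2 * delay) for k = 0, 1, 2, ...
    k = np.arange(int(np.floor(fmax * delay_s - 0.5)) + 1)
    return (2 * k + 1) / (2.0 * delay_s)

for delay_s in (2.27e-3, 7.5e-3):
    nulls = comb_null_frequencies(delay_s, fmax=400.0)
    path_m = SPEED_OF_SOUND * delay_s
    print(f"delay {delay_s * 1000:.2f} ms (path difference ca. {path_m:.2f} m): "
          f"dips at {np.round(nulls, 1)} Hz")
```

For 2.27 ms the first dip lands at about 220 Hz; for 7.5 ms the third dip lands at about 333 Hz, close to the 330 Hz partial.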

PS! «Beats» between vibrato for the octave of the played A110 + non-vibrato for the open A-string

VIBRATO frequency ??

[Figure: CEPSTRUM CELLO VIBRATO; amplitude (dB) vs. quefrency, 0-2 s]
[Figure: AUTOCORR CELLO VIBRATO; time 0-2 s]
Vibrato ca. 1/0.2 = 5 Hz? Vibrato ca. 1/0.5 = 2 Hz ????

Fundamental (f0) vibrates around 110 Hz: 1/110 = 0.009 s = 9 ms
PROBLEMATIC ANALYSIS of vibrato (mixes with the fundamental f0 + …)

Orig. VIBRATO analysed w/LPC (Linear Predictive Coding)
Cello LPC noise

[Figure: CelloVibratoLPCnoise Power Spectrum; amplitude (dB) vs. quefrency, 0-1 s]
Period ca. 0.2 s: vibrato freq.: 1/0.2 = 5 Hz

Alternative: Vibrato from wave: Cello
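One way to sidestep the mixing with the fundamental is to estimate the vibrato rate from a pitch track instead of from the waveform. The following is a sketch only: it assumes NumPy, a hypothetical pitch track sampled at 100 frames per second, and a synthetic 5.5 Hz vibrato around 110 Hz (no recorded data is used). `vibrato_rate` is a hypothetical helper name.

```python
import numpy as np

def vibrato_rate(pitch_hz, frame_rate, n_fft=8192):
    """Dominant modulation frequency (Hz) of a pitch contour."""
    p = np.asarray(pitch_hz, dtype=float)
    p = p - p.mean()                      # remove the mean pitch (ca. 110 Hz)
    p = p * np.hanning(len(p))            # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(p, n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / frame_rate)
    band = (freqs >= 2.0) & (freqs <= 12.0)   # assumed plausible vibrato range
    return float(freqs[band][np.argmax(spectrum[band])])

frame_rate = 100.0                        # pitch frames per second (assumed)
t = np.arange(300) / frame_rate           # 3 s of pitch track
pitch = 110.0 + 0.8 * np.sin(2 * np.pi * 5.5 * t)   # synthetic vibrato

rate = vibrato_rate(pitch, frame_rate)
print(f"vibrato ca. {rate:.2f} Hz, period ca. {1.0 / rate:.3f} s")
```

On a real recording the vibrato period varies over time, so a single spectral peak is only an average over the analysed segment.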

[Figure: pitch (Hz), 109-111, vs. time, 1-2 s, for Cello and Cello LPC noise]

Alternative: Zoom in on pitch, CelloVibrato
[Figure: pitch (Hz), 109-111, vs. time, 1-2 s and 2-3 s]
0.2 s: vibrato 1/0.2 = 5 Hz ??
Vibrato varying, perhaps 0.18 s: 1/0.18 = 5.6 Hz

PERCEIVED «Pitch» of COMB FILTERS

Delay      CBTB       Path
0 ms
2.5 ms     400 Hz     2 x 0.43 m
5 ms       200 Hz     2 x 0.86 m
7.5 ms     133 Hz     2 x 1.3 m
10 ms      100 Hz     2 x 1.7 m
12.5 ms    80 Hz      2 x 2.14 m
15 ms      66.7 Hz    2 x 2.6 m
17.5 ms    57 Hz      2 x 3 m
20 ms      50 Hz      2 x 3.4 m
50 ms      20 Hz      2 x 8.6 m

Constant white noise approximation: water fountain, walking on gravel etc.

Composition made by abruptly moving a reflecting wall:
The Rational Anthem of Norway

BACKGROUND
REPETITION PITCH (Huygens/Bilsen)
Stairs:
[Figure: reflections from successive steps, spaced Δt in time]

Oslo Konserthus: sawtooth wall and reflecting section above the seat backs
Pitch (without people)

REPETITION TONALITY
fo = 1/Δt [Hz], 2fo, 3fo
NB! Here Δt is very small!!!

(also in Greek «amphitheatres»... unoccupied)

BACKGROUND
REPETITION PITCH
STEEP/HIGH STAIRS
Chichen Itza Pyramid, Mexico

Height increases for each step, effective step depth increases: GLISSANDO downwards!

2 Quetzal bird chirps (recorded in a rain forest) + 2 chirped echoes stimulated by handclaps at the pyramid
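The downward glissando can be illustrated with a simple geometric model. This is a sketch under loud assumptions, none of them from the slides: the clap and the listener share one point at ground level, each step edge returns one echo, the speed of sound is 343 m/s, and the staircase dimensions (10 m standing distance, 0.26 m tread and rise, 91 steps) are illustrative values only. `stair_repetition_pitch` is a hypothetical helper name.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def stair_repetition_pitch(distance, tread, rise, n_steps):
    """Repetition pitch (Hz) between echoes from successive step edges."""
    n = np.arange(n_steps)
    x = distance + n * tread          # horizontal distance to each step edge
    y = n * rise                      # height of each step edge
    arrival = 2.0 * np.hypot(x, y) / SPEED_OF_SOUND   # round-trip time per echo
    delta_t = np.diff(arrival)        # time between successive echoes
    return 1.0 / delta_t              # repetition pitch fo = 1 / delta_t

pitch = stair_repetition_pitch(distance=10.0, tread=0.26, rise=0.26, n_steps=91)
print(f"first echoes: ca. {pitch[0]:.0f} Hz, last echoes: ca. {pitch[-1]:.0f} Hz")
```

Because the higher steps are reached along an increasingly oblique path, the time between successive echoes grows, so the repetition pitch falls during the echo.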

FLUTTER-ECHO

Repetition Frequency → Tone?

[Figure: recording of FLUTTER, wavelet analysis (frequency vs. time), with the annotations listed below]
Air absorption of high frequencies
Freq. 2 kHz: spherical wave → plane wave
Bass absorption due to finite size of surfaces (diffraction)
Repetition Pitch between the surfaces, f = 1/∆T: resonance between surfaces, "standing waves"
Background noise
«Tail» due to diffraction, non-infinite surfaces

Spherical-Plane waves
[Figure: spherical wave (from point source) between two reflecting walls, with absorbers (free air) at the open sides and diffraction from the edges]
Gradually towards PLANE WAVE
Almost PLANE WAVE («no» sound is lost)

Kuhl's equation(s)

"Sum" of three reverberation "asymptotes" for the reverberation time versus frequency:
[Figure: Rev. Time (log. scale) vs. frequency (Hz)]

1. Low Frequency damping due to finite surface areas: as the mirror source goes further and further away, the surface "seems" smaller and smaller compared to the wavelength.
   $T_1 = 0.041 \times 2\,[\ldots] / [\ldots]$
2. (Possibly) Damping, absorption on the surfaces:
   $T_2 = 0.0041 \times [\ldots] / [\ldots]$
3. High Freq. Damping in the air (dissipation):
   $T_3 = 0.0041 / [\ldots]$

The total reverberation time can be re-written as:

$\dfrac{1}{T_{tot}} = \dfrac{1}{T_1} + \dfrac{1}{T_2} + \dfrac{1}{T_3}$

Max/Msp-SIMULATION
OdeClapSkarpMono Dr+Basss

Mozart->Flutt stereo
RomeoFoster
Sabine-clear-FluttStereo
U-SchzBrum clean-Flutt
Sabine HP8va
Sabine HP
[Figure: Spectrogram Left/Right; frequency (Hz) vs. time (s)]

Flutter Echo as Moving Average Filter (Comb)

http://www.micromodeler.com/dsp/

IMPORTANT: SPECTR. CENTROID

[Figure: spectral centroid, un-dampened room vs. dampened room]
Mean spectral centroid reduced by 200 Hz when dampening the room
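The spectral centroid is the magnitude-weighted mean frequency of a spectrum. A minimal sketch, assuming NumPy and the common magnitude-weighted definition (other toolboxes may weight by power instead); the signals are synthetic, and the "dampened" version is simply white noise passed through a two-sample mean, not a room measurement. `spectral_centroid` is a hypothetical helper name.

```python
import numpy as np

def spectral_centroid(x, fs):
    """Magnitude-weighted mean frequency (Hz) of the spectrum of x."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(np.sum(freqs * mag) / np.sum(mag))

fs = 8000
t = np.arange(fs) / fs
sine = np.sin(2 * np.pi * 1000.0 * t)            # single partial at 1000 Hz

rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)                  # stand-in for a bright sound
damped = 0.5 * (noise[1:] + noise[:-1])          # two-sample mean = low-pass

print(f"sine:   {spectral_centroid(sine, fs):.1f} Hz")
print(f"noise:  {spectral_centroid(noise, fs):.1f} Hz")
print(f"damped: {spectral_centroid(damped, fs):.1f} Hz")
```

Removing high-frequency energy pulls the centroid down, which is the direction of the change reported for the dampened room.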