Essentials of Introduction of musical instruments Timbre features

Timbre

Li Su

February 13, 2017

Li Su Timbre Essentials of Sounds Pitch Introduction of musical instruments Loudness Timbre features Timbre

Three essences of audio signals

- Pitch

- Loudness

- Timbre


Frequency and pitch

- The higher the frequency of a sinusoidal wave, the higher it sounds

- Human audible frequency range: 20 Hz – 20,000 Hz (20 kHz)

- Upper limit for dogs: ∼ 45 kHz; for cats: ∼ 64 kHz

- Ultrasound: > 20 kHz; infrasound: < 20 Hz


Scientific pitch notation and MIDI number

- Musical Instrument Digital Interface (MIDI) note numbers: 21 – 108 for the piano

- Concert pitch: A4 = 440 Hz

- Reference

From: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015


Pitch

- Octave equivalence: two pitches whose frequencies differ by a power of 2 sound similar

- Semitone: two frequencies $f_1$ and $f_2$ ($f_1 > f_2$) differ by one semitone when their ratio is $f_1/f_2 = 2^{1/12} \approx 1.059463$

- One octave contains 12 semitones

- The center frequency $F_{\text{pitch}}(p)$ of the pitch with MIDI number $p$ is

$$F_{\text{pitch}}(p) = 440 \times 2^{(p-69)/12} \tag{1}$$

- Example: $F_{\text{pitch}}(p+12) = 2\,F_{\text{pitch}}(p)$ and $F_{\text{pitch}}(p+1) = 2^{1/12}\,F_{\text{pitch}}(p) \approx 1.059463\,F_{\text{pitch}}(p)$
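A minimal Python sketch of Eq. (1), mapping a MIDI note number to its center frequency. The function name `midi_to_frequency` and the `a4` parameter are illustrative choices, not part of the slides.

```python
def midi_to_frequency(p, a4=440.0):
    """Center frequency in Hz of MIDI pitch p, following Eq. (1)."""
    return a4 * 2.0 ** ((p - 69) / 12.0)


# Piano range limits (21, 108), middle C (60), and concert pitch A4 (69).
for p in (21, 60, 69, 108):
    print(p, round(midi_to_frequency(p), 2))
```

Raising `p` by 12 doubles the returned frequency, which is the octave relation stated in the example above.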


Dynamic, loudness, and intensity

- Dynamic: a term referring to the musical symbols that indicate the volume, such as forte (f) or piano (p)

- Loudness: a perceptual, subjective property that depends on intensity, duration, and frequency, by which sounds can be ordered from quiet to loud

- Intensity: a physical property, defined as the sound power per unit area (e.g., W/m²)

- Threshold of hearing (TOH): the minimal sound intensity of a pure tone (i.e., a sinusoid) that a human can hear, $I_{\text{TOH}} := 10^{-12}\ \text{W/m}^2$

- Threshold of pain (TOP): $I_{\text{TOP}} := 10\ \text{W/m}^2$

- dB-scaled sound intensity: $\mathrm{dB}(I) = 10 \log_{10}\left(I / I_{\text{TOH}}\right)$
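As a quick check of the dB definition, the following sketch evaluates it at the two thresholds. The names `intensity_db`, `I_TOH`, and `I_TOP` are chosen here for illustration.

```python
import math

I_TOH = 1e-12  # threshold of hearing in W/m^2
I_TOP = 10.0   # threshold of pain in W/m^2


def intensity_db(intensity, reference=I_TOH):
    """Sound intensity in dB relative to the threshold of hearing."""
    return 10.0 * math.log10(intensity / reference)


print(intensity_db(I_TOH))  # level at the threshold of hearing
print(intensity_db(I_TOP))  # level at the threshold of pain
```

With these definitions the threshold of hearing sits at 0 dB and the threshold of pain at about 130 dB.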


Sound intensity

From: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015


Equal loudness curve

- Loudness is highly correlated with intensity

- Human ears are most sensitive to sounds around 2–4 kHz

- Frequency-dependent unit: phon

From: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015

Timbre

- Timbre is the attribute whereby a listener can judge two sounds as dissimilar using any criterion other than pitch and loudness

- Timbre information allows us to tell apart the sounds of a violin, an oboe, and a trumpet, even when their pitch and loudness are the same

- Words describing timbre: bright, dark, warm, harsh, cold, ...


Musical instrument families

There is no unified categorization of musical instrument families. In common usage:

- Strings: violin, cello, guitar, ...

- Brass: trumpet, trombone, horn, ...

- Woodwind: clarinet, oboe, bassoon, ...

- Percussion: drum, cymbal, hi-hat, xylophone, ...

The Hornbostel-Sachs system:

- Idiophone: produces sound by vibrating itself

- Membranophone: produces sound by a vibrating membrane

- Chordophone: produces sound by vibrating strings

- Aerophone: produces sound by vibrating air

- Electrophone (a later addition): produces sound by electronic signals


Digital audio effects: filter

- Suppress or remove specific components in a given frequency band

- Example: what will happen if we apply a high-pass filter (i.e., suppress the low-frequency components) to a signal?

Original

Cut-off frequency = 100 Hz

Cut-off frequency = 200 Hz

Cut-off frequency = 500 Hz

Cut-off frequency = 1000 Hz
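To make the high-pass idea concrete, here is a minimal sketch of a first-order high-pass filter (a discrete RC approximation). This is an assumed, deliberately simple design for illustration; it is not necessarily the filter used to produce the examples listed above, and the name `high_pass` is hypothetical.

```python
import math


def high_pass(x, cutoff_hz, fs):
    """First-order high-pass filter (discrete RC approximation).

    x is a non-empty list of samples, fs the sampling rate in Hz.
    """
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    y = [x[0]]
    for i in range(1, len(x)):
        # The output follows changes in the input and forgets slow trends.
        y.append(alpha * (y[-1] + x[i] - x[i - 1]))
    return y


fs = 8000
dc = [1.0] * 2000                          # 0 Hz component
fast = [(-1.0) ** i for i in range(2000)]  # component at fs / 2
dc_out = high_pass(dc, 100.0, fs)
fast_out = high_pass(fast, 100.0, fs)
print(abs(dc_out[-1]), abs(fast_out[-1]))
```

The constant (0 Hz) input decays toward zero while the rapidly alternating input passes almost unchanged. Raising the cut-off removes progressively more of the low end of the signal.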


Digital audio effects: flanging

- Flanging: combining two identical signals with a small time difference between them (around 20 ms)

- Behaves like a comb filter (link: About comb filter)

- The history of flanging (link)

- Other audio effects (e.g., phasing, chorus effect, etc.): visit Wikipedia for resources

- “Infinite flanging”: the Shepard tone effect (the sonic barber pole) (audio)
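The comb-filter behaviour can be illustrated with a fixed-delay feedforward comb, y[n] = x[n] + x[n − D]. This is only a sketch under simplifying assumptions: a real flanger typically varies the delay over time, and the short 1 ms delay used here was picked so that the first notch falls on a round frequency. The name `feedforward_comb` is hypothetical.

```python
import math


def feedforward_comb(x, delay, gain=1.0):
    """y[n] = x[n] + gain * x[n - delay], with silence before the signal starts."""
    return [x[n] + (gain * x[n - delay] if n >= delay else 0.0)
            for n in range(len(x))]


fs = 8000
delay = 8                      # samples, i.e. 1 ms at fs = 8000 Hz
notch_hz = fs / (2 * delay)    # first notch of the comb
tone = [math.sin(2 * math.pi * notch_hz * n / fs) for n in range(400)]
out = feedforward_comb(tone, delay)

# After the first `delay` samples, the delayed copy cancels the tone.
residual = max(abs(v) for v in out[delay:])
print(notch_hz, residual)
```

A sinusoid at the notch frequency arrives half a period late in the delayed copy, so the two copies cancel. Frequencies at multiples of fs / D are instead reinforced, which gives the comb-shaped response.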

Energy features

The instantaneous RMS energy:

$$E(n) = \sqrt{\frac{1}{2N+1} \sum_{i=-N}^{N} x(n+i)^2} \tag{2}$$
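A direct, unoptimized sketch of the RMS energy over a window of 2N + 1 samples centred on n. The function name `rms_energy` and the choice to raise an error at the signal boundaries (rather than zero-pad) are assumptions made for this example.

```python
import math


def rms_energy(x, n, half_width):
    """RMS energy over the 2N + 1 samples centred on n, with N = half_width."""
    if n - half_width < 0 or n + half_width >= len(x):
        raise ValueError("window exceeds the signal boundaries")
    window = x[n - half_width: n + half_width + 1]
    return math.sqrt(sum(v * v for v in window) / (2 * half_width + 1))


signal = [3.0, -3.0, 3.0, -3.0, 3.0]
print(rms_energy(signal, 2, 2))
```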

The ADSR curve

- The temporal dynamics of a sound are critical to the perception of timbre

- A general model of the temporal amplitude envelope: Attack-Decay-Sustain-Release (ADSR)

- RMS amplitude envelope: low-pass filter E(n) with a cut-off frequency around 30 Hz

- Other methods?

From: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015

Attack time

- “Rise time”: no strict definition

- One definition: the time interval between the points at which the audio signal reaches 20% and 80% of its maximum value

- Log attack time (LAT):

$$\mathrm{LAT} = \log_{10}(t_{80} - t_{20}) \tag{3}$$

- Temporal centroid of a note:

$$C_t = \frac{\sum_{n \in \Omega} n\,E(n)}{\sum_{n \in \Omega} E(n)}, \quad \Omega := \{n : \text{onset time} < n < \text{offset time}\} \tag{4}$$
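The two temporal features above can be sketched as follows, assuming the amplitude envelope of a single note is already available as a list of samples. The helper names, and the convention of taking the first sample that reaches each threshold, are illustrative assumptions.

```python
import math


def first_index_reaching(envelope, level):
    """Index of the first envelope sample that is at least `level`."""
    for n, value in enumerate(envelope):
        if value >= level:
            return n
    raise ValueError("envelope never reaches the requested level")


def log_attack_time(envelope, fs, low=0.2, high=0.8):
    """log10 of the time (in seconds) between the 20% and 80% points."""
    peak = max(envelope)
    t_low = first_index_reaching(envelope, low * peak) / fs
    t_high = first_index_reaching(envelope, high * peak) / fs
    return math.log10(t_high - t_low)


def temporal_centroid(envelope):
    """Energy-weighted mean sample index within the note segment."""
    return sum(n * e for n, e in enumerate(envelope)) / sum(envelope)


attack = [0.0, 1.0, 3.0, 5.0, 7.0, 9.0, 10.0]
print(log_attack_time(attack, fs=100))
print(temporal_centroid([1.0, 2.0, 1.0]))
```

Note that `log_attack_time` fails when both thresholds are reached at the same sample, since the logarithm of zero is undefined. A practical implementation would need to handle that case.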

Temporal features: piano and violin

From: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015

Vibrato and tremolo

- Tremolo: periodic variations in amplitude (i.e., amplitude modulation), in some cases called shimmer

- Vibrato: periodic variations in frequency (i.e., frequency modulation), in some cases called jitter
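A small synthesis sketch may help contrast the two modulations. It assumes sinusoidal modulators; the function and parameter names (`depth`, `deviation_hz`) are chosen for this example.

```python
import math


def tremolo(fc, fm, depth, fs, n_samples):
    """Carrier at fc Hz whose amplitude varies periodically at fm Hz."""
    return [(1.0 + depth * math.sin(2 * math.pi * fm * n / fs))
            * math.sin(2 * math.pi * fc * n / fs)
            for n in range(n_samples)]


def vibrato(fc, fm, deviation_hz, fs, n_samples):
    """Carrier whose instantaneous frequency is fc + deviation_hz * cos(2*pi*fm*t)."""
    return [math.sin(2 * math.pi * fc * n / fs
                     + (deviation_hz / fm) * math.sin(2 * math.pi * fm * n / fs))
            for n in range(n_samples)]


fs = 8000
trem = tremolo(fc=440.0, fm=5.0, depth=0.3, fs=fs, n_samples=fs)
vib = vibrato(fc=440.0, fm=5.0, deviation_hz=10.0, fs=fs, n_samples=fs)
print(max(abs(v) for v in trem), max(abs(v) for v in vib))
```

The tremolo signal changes its envelope but keeps its frequency, while the vibrato signal keeps a constant envelope and changes its frequency.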

Log-scale spectrum

- Sampling rate $f_s$, window size $N$, hop size $H$

$$X(n,k) = \sum_{m=0}^{N-1} x(m+nH)\,h(m)\,e^{-j2\pi km/N} \tag{5}$$

$$\mathcal{X}(n,k) = |X(n,k)|^2 \tag{6}$$

- The index $k$ corresponds to the frequency $f(k) := k f_s / N$

- The index $n$ corresponds to the time $t(n) := nH / f_s$

- Human perception of loudness is logarithmic: use $\log \mathcal{X}(n,k)$

- Human perception of pitch is also logarithmic: define for each pitch $p$

$$P(p) := \{k : F_{\text{pitch}}(p-0.5) \le f(k) < F_{\text{pitch}}(p+0.5)\} \tag{7}$$

- The log-frequency spectrogram: $\mathcal{Y}(n,p) := \sum_{k \in P(p)} \mathcal{X}(n,k)$
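The pooling of DFT bins into pitch bands can be sketched as below. The sketch assumes that a bin k is assigned to pitch p when its frequency k·fs/N falls inside the band of p, and that one frame of the power spectrogram is given as a list of length N/2 + 1. All function names are hypothetical.

```python
def f_pitch(p):
    """Center frequency in Hz of MIDI pitch p (fractional p allowed)."""
    return 440.0 * 2.0 ** ((p - 69) / 12.0)


def f_coef(k, fs, n_fft):
    """Frequency in Hz of DFT bin k."""
    return k * fs / n_fft


def pitch_bins(p, fs, n_fft):
    """DFT bins whose frequency lies in the band of pitch p."""
    lo, hi = f_pitch(p - 0.5), f_pitch(p + 0.5)
    return [k for k in range(n_fft // 2 + 1) if lo <= f_coef(k, fs, n_fft) < hi]


def log_frequency_frame(power_frame, fs, n_fft, pitches=range(21, 109)):
    """Sum the power-spectrogram bins of one frame over each pitch band."""
    return [sum(power_frame[k] for k in pitch_bins(p, fs, n_fft)) for p in pitches]


print(pitch_bins(69, fs=22050, n_fft=4096))
print(pitch_bins(21, fs=22050, n_fft=4096))
```

For low pitches the band can be narrower than the bin spacing fs/N, so the bin set may be empty. This motivates the later discussion of window size and frequency resolution.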

Pitch name, MIDI, and frequency

From: M. Mueller, Fundamentals of Music Processing, Chapter 3, Springer 2015

The chromatic scale of piano: log X

From: M. Mueller, Fundamentals of Music Processing, Chapter 3, Springer 2015

The chromatic scale of piano: log Y

From: M. Mueller, Fundamentals of Music Processing, Chapter 3, Springer 2015

More examples

From: M. Mueller, Fundamentals of Music Processing, Chapter 3, Springer 2015

Mel-scale spectrogram

- The mel scale models human perception of pitch

$$m = 2595 \log_{10}\left(\frac{f}{700} + 1\right) \tag{8}$$

- Example: a bank of 8 mel-spaced triangular filters covering 65 – 1000 Hz
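The mel mapping of Eq. (8) and one common way to place triangular filters can be sketched as follows. Spacing the band edges uniformly on the mel scale is an assumption made here; the exact filter bank behind the slide's example is not specified in the text. Function names are illustrative.

```python
import math


def hz_to_mel(f):
    """Eq. (8): frequency in Hz to mel."""
    return 2595.0 * math.log10(f / 700.0 + 1.0)


def mel_to_hz(m):
    """Inverse of Eq. (8)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_filters):
    """n_filters + 2 edge frequencies in Hz, equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_filters + 2)]


def triangular_weight(f, left, center, right):
    """Weight of frequency f in a triangle rising from left to center, falling to right."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0


edges = mel_band_edges(65.0, 1000.0, 8)
print([round(e, 1) for e in edges])
```

Filter i would then use `edges[i]`, `edges[i + 1]`, and `edges[i + 2]` as its left, center, and right frequencies, so neighbouring triangles overlap by half.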

MFCC

- Cepstrum: the inverse FFT of the log-magnitude spectrum

- Mel-frequency cepstral coefficients (MFCC): a cepstral feature derived from the mel-frequency spectrum

- Common usage: 13-, 20-, or 40-term MFCC

- The 1st and 2nd temporal differences of the MFCC are also important features

- Building blocks:
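Since the building-block diagram is not reproduced in the text, the sketch below assumes the commonly used chain: mel filter-bank energies, then logarithm, then a DCT-II in place of the inverse FFT. It starts from precomputed mel-band energies, and all names and defaults are hypothetical.

```python
import math


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II of `values`, keeping the first n_coeffs terms."""
    m = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / m)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc_from_mel_energies(mel_energies, n_coeffs=13, floor=1e-10):
    """Log-compress the mel-band energies, then decorrelate them with a DCT."""
    log_energies = [math.log(max(e, floor)) for e in mel_energies]
    return dct_ii(log_energies, n_coeffs)


def delta(frames):
    """First temporal difference between consecutive MFCC frames."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]


flat_bands = [math.e] * 20
coeffs = mfcc_from_mel_energies(flat_bands)
print(len(coeffs), round(coeffs[0], 6))
```

A flat mel spectrum puts everything into the zeroth coefficient. The higher coefficients describe progressively finer structure of the spectral envelope.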

Window size, pitch and bandwidth

- Recall spectral leakage: every spectral peak (of a sinusoidal component) has finite width

- Recall the chromatic scale: low pitches are spaced more densely in frequency than high pitches

- Recall the Heisenberg uncertainty principle: a longer window gives sharper spectral peaks, and vice versa

- Q: What do we mean by “long”? A: Long relative to the wavelength (i.e., the frequency) of the signal!

- Main idea: use a long window for the low-frequency parts and a short window for the high-frequency parts

Q factor

- 3-dB bandwidth: $f_1 - f_2$

- Q-factor:

$$Q = \frac{f_0}{f_1 - f_2} = \frac{N f_0}{\Delta f_s} \tag{9}$$

- $\Delta$: the 3-dB bandwidth of the window function (in DFT bins)

- For the Hann window, $\Delta \approx 1.50$ DFT bins

- To achieve a constant Q factor, we need $N \propto 1/f_0$
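Solving Eq. (9) for N gives the window length needed at each center frequency. The sketch below returns that length as a real number; rounding to an integer sample count is left out, and the function name is illustrative.

```python
def window_length(f0, q, fs, delta_bins=1.5):
    """Window size N in samples from Eq. (9): N = Q * delta * fs / f0.

    delta_bins is the 3-dB bandwidth of the window in DFT bins
    (about 1.5 for the Hann window).
    """
    return q * delta_bins * fs / f0


for f0 in (55.0, 110.0, 220.0, 440.0, 880.0):
    print(f0, round(window_length(f0, q=22.75, fs=44100)))
```

Each octave up halves the required window, which is the "long window for low frequencies, short window for high frequencies" idea in numeric form.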

Constant-Q transform

- 24 bins per octave (Q = 22.75)

- Reference: C. Schörkhuber and A. Klapuri, “Constant-Q transform toolbox for music processing”, in Proceedings of the 7th Sound and Music Computing Conference, Barcelona, Spain, 2010.


Miscellaneous: derived from $\mathcal{X}(n,k)$

Here $\mathcal{X}(n,k)$ denotes the spectrogram of Eq. (6).

- Spectral centroid: $\mu_1 = \sum_{k=1}^{N} k\,\mathcal{X}(n,k) \Big/ \sum_{k=1}^{N} \mathcal{X}(n,k)$

- Spectral spread: $\mu_2 = \sum_{k=1}^{N} (k-\mu_1)^2\,\mathcal{X}(n,k) \Big/ \sum_{k=1}^{N} \mathcal{X}(n,k)$

- Spectral skewness: $\mu_3 = \sum_{k=1}^{N} (k-\mu_1)^3\,\mathcal{X}(n,k) \Big/ \sum_{k=1}^{N} \mathcal{X}(n,k)$

- Spectral kurtosis: $\mu_4 = \sum_{k=1}^{N} (k-\mu_1)^4\,\mathcal{X}(n,k) \Big/ \sum_{k=1}^{N} \mathcal{X}(n,k)$

- Entropy: $H(n) = -\sum_{k=1}^{N} \mathcal{X}(n,k) \log \mathcal{X}(n,k) \Big/ \log N$, with $\mathcal{X}(n,k)$ normalized to sum to one over $k$

- Spectral flatness: the ratio between the geometric mean and the arithmetic mean

$$\mathrm{SFM}(n) = 10 \log_{10} \left( \frac{\left(\prod_{k=1}^{N} \mathcal{X}(n,k)\right)^{1/N}}{\frac{1}{N} \sum_{k=1}^{N} \mathcal{X}(n,k)} \right) \tag{10}$$

- Spectral irregularity:

$$\mathrm{SI}(n) = \frac{\sum_{k=1}^{N-1} \left(\mathcal{X}(n,k+1) - \mathcal{X}(n,k)\right)^2}{\sum_{k=1}^{N} \mathcal{X}(n,k)} \tag{11}$$

- Spectral roll-off $R(n)$: the smallest $R(n)$ such that $\sum_{k=1}^{R(n)} \mathcal{X}(n,k) \ge \gamma \sum_{k=1}^{N} \mathcal{X}(n,k)$, where $\gamma = 0.95$ or $0.85$
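A few of these descriptors are easy to sketch for a single frame, assuming the frame is a list of non-negative spectrogram values indexed by bin k = 1..N. The function names are illustrative, and the flatness function additionally assumes strictly positive values.

```python
import math


def spectral_centroid_and_spread(spec):
    """Return (centroid, spread); spec[k - 1] is the value at bin k = 1..N."""
    total = sum(spec)
    bins = range(1, len(spec) + 1)
    centroid = sum(k * v for k, v in zip(bins, spec)) / total
    spread = sum((k - centroid) ** 2 * v for k, v in zip(bins, spec)) / total
    return centroid, spread


def spectral_rolloff(spec, gamma=0.85):
    """Smallest bin R whose cumulative sum reaches gamma times the total."""
    target = gamma * sum(spec)
    running = 0.0
    for k, v in enumerate(spec, start=1):
        running += v
        if running >= target:
            return k
    return len(spec)


def spectral_flatness_db(spec):
    """10 log10 of geometric mean over arithmetic mean (all values must be > 0)."""
    n = len(spec)
    log10_geometric = sum(math.log10(v) for v in spec) / n
    arithmetic = sum(spec) / n
    return 10.0 * (log10_geometric - math.log10(arithmetic))


print(spectral_centroid_and_spread([0.0, 0.0, 1.0, 0.0]))
print(spectral_rolloff([1.0, 1.0, 1.0, 1.0]))
print(spectral_flatness_db([1.0, 100.0]))
```

The geometric mean is computed in the log domain to avoid overflow or underflow from multiplying many bins together.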

Miscellaneous

- Zero-crossing rate (ZCR):

$$\mathrm{ZCR}(n) = \frac{1}{2N} \sum_{i=1}^{N} \left|\operatorname{sign}(x(n+i)) - \operatorname{sign}(x(n+i-1))\right| \tag{12}$$

$$\operatorname{sign}(x) = \begin{cases} 1, & x > 0, \\ 0, & x = 0, \\ -1, & x < 0. \end{cases} \tag{13}$$
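The zero-crossing rate translates almost literally into code. The names `sign` and `zero_crossing_rate` are chosen for this sketch, and the frame is assumed to lie fully inside the signal.

```python
def sign(v):
    """Return 1, 0, or -1 for positive, zero, or negative v."""
    return (v > 0) - (v < 0)


def zero_crossing_rate(x, n, frame_len):
    """Fraction of sign changes over the frame_len sample pairs starting at n."""
    changes = sum(abs(sign(x[n + i]) - sign(x[n + i - 1]))
                  for i in range(1, frame_len + 1))
    return changes / (2 * frame_len)


print(zero_crossing_rate([1, -1, 1, -1, 1], 0, 4))
print(zero_crossing_rate([1, 2, 3, 4, 5], 0, 4))
```

Each full sign change contributes 2 to the sum, so dividing by 2N yields a rate between 0 (no crossings) and 1 (a crossing at every sample).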

- For some pitched instruments, the overtones of the sound are not exact integer multiples of the fundamental frequency

- Examples: piano, guitar, harp, chimes, glockenspiel

- Free oscillation vs. forced oscillation!

- Non-ideal string: Hooke’s law is no longer valid because of the finite mass, cross-section area, gyration, etc. of the string

$$f_n = n f_0 \sqrt{1 + \beta n^2} \tag{14}$$

Harmonic features

- Denote the amplitude of the $i$-th harmonic peak as $a(i)$ and its frequency as $f_i$, for $i = 1, \dots, Q$, where $Q$ is the number of harmonic peaks (including the fundamental)

- Inharmonicity

$$\mathrm{IH} = \frac{2}{f_0} \times \frac{\sum_{i=1}^{Q} |f_i - i f_0|\,a^2(i)}{\sum_{i=1}^{Q} a^2(i)} \tag{15}$$

- Odd-to-even ratio

$$\mathrm{OER} = \frac{\sum_{i\ \text{odd}} a^2(i)}{\sum_{i\ \text{even}} a^2(i)} \tag{16}$$
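The two harmonic features can be sketched as below, with partial frequencies generated from the stiff-string model of Eq. (14) as test input. The sketch assumes the first list entry is the fundamental (i = 1); the function names and the value of β are illustrative.

```python
import math


def stiff_string_partials(f0, beta, q):
    """Partial frequencies n * f0 * sqrt(1 + beta * n^2) for n = 1..q, as in Eq. (14)."""
    return [n * f0 * math.sqrt(1.0 + beta * n * n) for n in range(1, q + 1)]


def inharmonicity(freqs, amps, f0):
    """Energy-weighted deviation of the partials from exact multiples of f0."""
    num = sum(abs(f - i * f0) * a * a
              for i, (f, a) in enumerate(zip(freqs, amps), start=1))
    den = sum(a * a for a in amps)
    return 2.0 / f0 * num / den


def odd_to_even_ratio(amps):
    """Energy of odd-numbered harmonics over energy of even-numbered ones."""
    odd = sum(a * a for i, a in enumerate(amps, start=1) if i % 2 == 1)
    even = sum(a * a for i, a in enumerate(amps, start=1) if i % 2 == 0)
    return odd / even


amps = [1.0, 0.5, 1.0, 0.5]
ideal = stiff_string_partials(100.0, 0.0, 4)
stiff = stiff_string_partials(100.0, 1e-3, 4)
print(inharmonicity(ideal, amps, 100.0), inharmonicity(stiff, amps, 100.0))
print(odd_to_even_ratio(amps))
```

With β = 0 the partials are exactly harmonic and the inharmonicity is zero. Any positive β stretches the upper partials and makes it positive.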

Example: piano

What do you see?

Applications of timbre features

- Classification

- Similarity estimation

- Sound quality assessment

- Transcription

- Voice conversion

- Sound synthesis

- And more...

Useful tools for timbre feature extraction

- MIRtoolbox (website)

- Essentia (website)

- CQT toolbox (website)

- And others...
