Timbre Features

Timbre
Li Su
February 13, 2017

Outline: Essentials of Sounds (Pitch, Loudness, Timbre); Introduction of musical instruments (Musical instrument families, Properties of some musical instruments); Timbre features

Three essences of audio signals
- Pitch
- Loudness
- Timbre

Frequency and pitch
- The higher the frequency of a sinusoidal wave, the higher it sounds
- Human audible frequency range: 20 Hz to 20,000 Hz (20 kHz)
- Dogs: up to ∼ 45 kHz; cats: up to ∼ 64 kHz
- Ultrasound: > 20 kHz; infrasound: < 20 Hz

Scientific pitch notation and MIDI number
- Musical Instrument Digital Interface (MIDI) note numbers: 21 to 108 for the piano
- Concert pitch: A4 = 440 Hz
- [Figure and reference from: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015]

Pitch
- Octave equivalence: two frequencies whose ratio is a power of 2 sound similar
- Semitone: two frequencies f1 and f2 (f1 > f2) differ by one semitone when their ratio is
  $$ f_1 / f_2 = 2^{1/12} \approx 1.059463 $$
- One octave contains 12 semitones
- The center frequency F_pitch(p) of the pitch with MIDI number p is
  $$ F_{\mathrm{pitch}}(p) = 440 \times 2^{(p-69)/12} \tag{1} $$
- Example: we have
  $$ F_{\mathrm{pitch}}(p+12) = 2\,F_{\mathrm{pitch}}(p), \qquad \frac{F_{\mathrm{pitch}}(p+1)}{F_{\mathrm{pitch}}(p)} = 2^{1/12} \approx 1.059463 $$

Dynamic, loudness, and intensity
- Dynamic: a term referring to the musical symbols that indicate the volume, like forte (f) or piano (p)
- Loudness: a perceptual, subjective property, depending on sound intensity, duration and frequency, by which sounds can be ordered from quiet to loud
- Intensity: a physical property, defined as the sound power per unit area (e.g., W/m^2)
- Threshold of hearing (TOH): the minimal sound intensity of a pure tone (i.e., a sinusoid) a human can hear,
  $$ I_{\mathrm{TOH}} := 10^{-12}\ \mathrm{W/m^2} $$
- Threshold of pain (TOP):
  $$ I_{\mathrm{TOP}} := 10\ \mathrm{W/m^2} $$
- dB-scaled sound intensity:
  $$ \mathrm{dB}(I) = 10 \log_{10} \frac{I}{I_{\mathrm{TOH}}} $$

Sound intensity
- [Figure from: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015]

Equal loudness curve
- Loudness is highly correlated with intensity
- Human ears are most sensitive to sounds around 2 to 4 kHz
- Frequency-dependent unit: phon
- [Figure from: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015]

Timbre
- Timbre is the attribute whereby a listener can judge two sounds as dissimilar using any criterion other than pitch and loudness
- Timbre information allows us to tell apart the sounds of a violin, an oboe and a trumpet, even when their pitch and loudness are the same
- Words describing timbre: bright, dark, warm, harsh, cold, ...

Musical instrument families
There is no unified categorization of musical instrument families. Commonly used families:
- Strings: violin, cello, guitar, ...
- Brass: trumpet, trombone, horn, ...
- Woodwind: clarinet, oboe, bassoon, ...
- Percussion: drum, cymbal, hi-hat, xylophone, ...
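As a short worked aside on the formulas from the Pitch and intensity slides above, Eq. (1) and the dB definition can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `f_pitch` and `intensity_db` are hypothetical and not from the slides; the constants 440 Hz, MIDI number 69 and I_TOH = 10^-12 W/m^2 are the ones stated above.

```python
import math

A4_MIDI = 69      # MIDI number of A4, as used in Eq. (1)
A4_HZ = 440.0     # concert pitch
I_TOH = 1e-12     # threshold of hearing in W/m^2


def f_pitch(p):
    """Center frequency in Hz of MIDI pitch p, following Eq. (1)."""
    return A4_HZ * 2.0 ** ((p - A4_MIDI) / 12.0)


def intensity_db(intensity, ref=I_TOH):
    """dB-scaled sound intensity relative to the threshold of hearing."""
    return 10.0 * math.log10(intensity / ref)


if __name__ == "__main__":
    # Piano range (MIDI 21 to 108) and the two thresholds from the slides.
    print(f_pitch(21), f_pitch(69), f_pitch(108))
    print(intensity_db(I_TOH), intensity_db(10.0))
```

With these definitions, moving up 12 MIDI numbers doubles the frequency, and the threshold of pain (10 W/m^2) sits about 130 dB above the threshold of hearing.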
The Hornbostel-Sachs system
- Idiophone: produces sound by the vibration of the instrument body itself
- Membranophone: produces sound by a vibrating membrane
- Chordophone: produces sound by vibrating strings
- Aerophone: produces sound by vibrating air
- (New) electrophone: produces sound by an electronic signal

Digital audio effects: filter
- Suppress or remove specific components in a given frequency band
- Example: what happens if we apply a high-pass filter (i.e., suppress low-frequency components) to a signal?
- [Audio examples: original; cut-off frequency = 100 Hz, 200 Hz, 500 Hz, 1000 Hz]

Digital audio effects: flanging
- Flanging: combining two identical signals, with a small time difference (around 20 ms)
- Behaves like a comb filter [link in the original slides: "About comb filter"]
- The history of flanging [link in the original slides]
- Other audio effects (e.g., phasing, chorus effect, etc.): visit Wikipedia for resources
- "Infinite flanging": the Shepard tone effect (the sonic barber pole) [audio example]

Timbre features: Energy features, Temporal features, Spectral features, Harmonic features

Energy features
The instantaneous RMS energy:
  $$ E(n) = \sqrt{\frac{1}{2N+1} \sum_{i=-N}^{N} x(n+i)^2} \tag{2} $$

The ADSR curve
- The temporal dynamics of a sound are critical to the perception of timbre
- A general model of the temporal amplitude envelope: Attack-Decay-Sustain-Release (ADSR)
- RMS amplitude envelope: low-pass filtering E(n) with a cut-off frequency around 30 Hz
- Other methods?
- [Figure from: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015]

Attack time
- "Rise time": no strict definition
- One definition: the time interval between the points at which the audio signal reaches 20% and 80% of its maximum value
- Log attack time (LAT):
  $$ \mathrm{LAT} = \log_{10}(t_{80} - t_{20}) \tag{3} $$
- Temporal centroid of a note:
  $$ C_t = \frac{\sum_{n \in \Omega} n\,E(n)}{\sum_{n \in \Omega} E(n)}, \qquad \Omega := \{ n : \text{onset time} < n < \text{offset time} \} \tag{4} $$

Temporal features: piano and violin
- [Figure from: M. Mueller, Fundamentals of Music Processing, Chapter 1, Springer 2015]

Vibrato and tremolo
- Tremolo: periodic variations in amplitude (i.e., amplitude modulation), in some cases called shimmer
- Vibrato: periodic variations in frequency (i.e., frequency modulation), in some cases called jitter

Log-scale spectrum
- Sampling rate f_s, window size N, hop size H, window function h:
  $$ X(n,k) = \sum_{m=0}^{N-1} x(m + nH)\,h(m)\,e^{-j 2\pi k m / N} \tag{5} $$
  $$ \mathcal{X}(n,k) = |X(n,k)|^2 \tag{6} $$
- The index k corresponds to the frequency $f(k) := k f_s / N$
- The index n corresponds to the time $t(n) := nH / f_s$
- Human perception of loudness is logarithmic: use $\log \mathcal{X}(n,k)$
- Human perception of pitch is also logarithmic: define for each pitch p the set of frequency bins
  $$ P(p) := \{ k : F_{\mathrm{pitch}}(p - 0.5) \le f(k) < F_{\mathrm{pitch}}(p + 0.5) \} \tag{7} $$
- The log-frequency spectrogram:
  $$ \mathcal{Y}(n,p) := \sum_{k \in P(p)} \mathcal{X}(n,k) $$
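The temporal features defined above (Eqs. 2 to 4) can be sketched as follows. This is a minimal sketch under stated assumptions, not the lecturer's implementation: the function names are hypothetical, samples outside the signal are treated as zero in Eq. (2), the 20% and 80% points are taken as the first samples of an amplitude envelope that reach those fractions of its maximum, and the temporal centroid is returned in samples.

```python
import math


def rms_energy(x, n, half_width):
    """E(n) of Eq. (2): RMS of x over n-N..n+N (assumption: zeros outside x)."""
    total = 0.0
    for i in range(-half_width, half_width + 1):
        if 0 <= n + i < len(x):
            total += x[n + i] ** 2
    return math.sqrt(total / (2 * half_width + 1))


def log_attack_time(envelope, fs, lo=0.2, hi=0.8):
    """LAT of Eq. (3), from the first crossings of 20% and 80% of the peak.

    Raises ValueError if both thresholds are reached at the same sample,
    because log10(0) is undefined.
    """
    peak = max(envelope)
    t_lo = next(i for i, e in enumerate(envelope) if e >= lo * peak) / fs
    t_hi = next(i for i, e in enumerate(envelope) if e >= hi * peak) / fs
    return math.log10(t_hi - t_lo)


def temporal_centroid(envelope, onset, offset):
    """C_t of Eq. (4), in samples, over the open interval onset < n < offset."""
    idx = range(onset + 1, offset)
    return sum(n * envelope[n] for n in idx) / sum(envelope[n] for n in idx)


if __name__ == "__main__":
    fs = 100                                  # hypothetical envelope rate in Hz
    ramp = [i / 100 for i in range(101)]      # linear 1-second attack
    print(log_attack_time(ramp, fs))          # log10 of roughly 0.6 s
    print(temporal_centroid([0, 1, 1, 1, 0], 0, 4))
    print(rms_energy([1, -1, 1, -1, 1], 2, 2))
```

A sound with a fast attack (e.g., piano) gives a small t80 − t20 and hence a strongly negative LAT, while a slowly bowed note gives a larger value.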
Pitch name, MIDI, and frequency
- [Figure from: M. Mueller, Fundamentals of Music Processing, Chapter 3, Springer 2015]

The chromatic scale of piano: $\log \mathcal{X}$
- [Figure from: M. Mueller, Fundamentals of Music Processing, Chapter 3, Springer 2015]

The chromatic scale of piano: $\log \mathcal{Y}$
- [Figure from: M. Mueller, Fundamentals of Music Processing, Chapter 3, Springer 2015]

More examples
- [Figure from: M. Mueller, Fundamentals of Music Processing, Chapter 3, Springer 2015]

Mel-scale spectrogram
- The mel scale approximates human perception of pitch: a frequency f in Hz maps to
  $$ m = 2595 \log_{10}\left( \frac{f}{700} + 1 \right) \tag{8} $$
- Example: 8 mel-scale triangular filters covering 65 to 1000 Hz [figure in the original slides]

MFCC
- Cepstrum: the inverse FFT of the log-magnitude spectrum
- Mel-frequency cepstral coefficients (MFCC): a cepstral feature derived from the mel-frequency spectrum
- Common usage: 13-, 20-, or 40-term MFCC
- The 1st and 2nd temporal differences of the MFCC are also important features
- Building blocks: [diagram in the original slides]

Window size, pitch and bandwidth
- Recall spectral leakage: every spectral peak (of a sinusoidal component) has finite width
- Recall the chromatic scale: low pitches are more densely spaced in frequency than high pitches
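The mel mapping of Eq. (8) and the pitch bin sets P(p) of Eq. (7) can be illustrated with a short sketch. Everything beyond the two formulas is an assumption made for illustration: the function names are hypothetical, the inverse mapping is simply Eq. (8) solved for f, the filter edges are taken as equally spaced on the mel axis (a common construction, not confirmed by the slides), and P(p) is restricted to bins 0..N/2.

```python
import math


def hz_to_mel(f):
    """Eq. (8): frequency in Hz to mel."""
    return 2595.0 * math.log10(f / 700.0 + 1.0)


def mel_to_hz(m):
    """Inverse of Eq. (8), obtained by solving it for f."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_filters):
    """n_filters + 2 edge frequencies in Hz, equally spaced in mel (assumption)."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_filters + 2)]


def f_pitch(p):
    """Eq. (1): center frequency in Hz of MIDI pitch p."""
    return 440.0 * 2.0 ** ((p - 69) / 12.0)


def pitch_bins(p, fs, n_fft):
    """P(p) of Eq. (7): bins k in 0..N/2 with f(k) = k*fs/N in the band of p."""
    lo, hi = f_pitch(p - 0.5), f_pitch(p + 0.5)
    return [k for k in range(n_fft // 2 + 1) if lo <= k * fs / n_fft < hi]


if __name__ == "__main__":
    # Edges for the slide's example: 8 filters between 65 and 1000 Hz.
    print([round(f, 1) for f in mel_band_edges(65.0, 1000.0, 8)])
    # Hypothetical analysis settings: fs = 44100 Hz, N = 4096.
    print(pitch_bins(69, 44100, 4096), pitch_bins(21, 44100, 4096))
```

With the hypothetical settings fs = 44100 Hz and N = 4096 (bin spacing of about 10.8 Hz), A4 (p = 69) is covered by bins 40 to 42, whereas the lowest piano note (p = 21) has an empty P(p). This is the window-size issue raised on the last slide: resolving densely spaced low pitches requires a longer window.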
