Pitch Tracking

Pitch Tracking

Lecture 7 Pitch Analysis What is pitch? • (ANSI) That attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high. Pitch depends mainly on the frequency content of the sound stimulus, but also depends on the sound pressure and waveform of the stimulus. • (operational) A sound has a certain pitch if it can be reliably matched to a sine tone of a given frequency at 40 dB SPL. ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 2 What is pitch? • A perceptual attribute, so subjective • Only defined for (quasi) harmonic sounds – Harmonic sounds are periodic, and the period is 1/F0. • Can be reliably matched to fundamental frequency (F0) – In audio signal processing, people do not often discriminate pitch from F0 • F0 is a physical attribute, so objective ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 3 Harmonic Sound • A complex sound with strong sinusoidal components at integer multiples of a fundamental frequency. These components are called harmonics or overtones. • Sine waves and harmonic sounds are the sounds that may give a perception of “pitch”. ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 4 Classify Sounds by Harmonicity • Sine wave • Strongly harmonic Oboe Clarinet ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 5 Classify Sounds by Harmonicity • Somewhat harmonic (quasi-harmonic) Marimba 0.3 40 0.2 20 Human 0.1 0 voice 0 Amplitude -20 -0.1 (dB) Magnitude -0.2 -40 0 5 10 15 20 25 0 1000 2000 3000 4000 5000 Time (ms) Frequency (Hz) ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 6 Classify Sounds by Harmonicity • Inharmonic Gong ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 7 How do we perceive pitch? • Complex tones – Strongest frequency? – Lowest frequency? – Greatest common divisor of the harmonics? Oboe ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 8 The Missing Fundamental Frequency Frequency (linear) Time ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 9 Another Example 300:200:2100 Hz 200Hz 300Hz Amp 300 500 700 900 1100 1300 1500 1700 1900 2100 Freq (Hz) ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 10 Pitch perception is affected by • The loudest frequency component • The greatest common divisor of partials • The regular frequency space between partials ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 11 Shepard Tones Barber’s pole Continuous Risset scale ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 12 Pitch Detection • Well defined for harmonic and quasi-harmonic sounds • Estimate the fundamental frequency in each frame of the signal • Quick facts – Human speech: from 40 Hz for low-pitched male to 600 Hz for children or high-pitched female – Piano: 27 Hz – 4,186 Hz – Human hearing range: 20 Hz – 20,000 Hz ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 13 Why is pitch detection important? • Harmonic sounds are ubiquitous – Music, speech, bird singing • Pitch (F0) is an important attribute of harmonic sounds, and it relates to other properties – Music melody key, scale (e.g., chromatic, diatonic, pentatonic), style, emotion, etc. – Speech intonation word disambiguation (for tonal language), statement/question, emotion, etc. What scales are used? What emotion? ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 14 General Process of Pitch Detection • Segment audio into time frames – Pitch changes over time • Detect pitch (if any) in each frame – Need to detect whether the frame contains pitch or not • Post processing to consider context info – Pitch contours are often continuous ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 15 An Example ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 16 How long should the frame be? • Too long: – Contains multiple pitches (low time resolution) • Too short – Can’t obtain reliable detection (low freq resolution) – Should be longer than 3 periods of the signal waveform 0.2 0.1 0 Amplitude -0.1 -0.2 0.74 0.745 0.75 0.755 0.76 0.765 0.77 0.775 0.78 Time (s) 3 periods – For speech or music, how long should the frame be? ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 17 Pitch-related Properties • Time domain signal is periodic. – F0 = 1/period • Spectral peaks have harmonic relations. – F0 is the greatest common divisor. • Spectral peaks are equally spaced. – F0 is the frequency gap. ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 18 Pitch Detection Methods • Time domain signal is • Time domain periodic. – Detect period – F0 = 1/period • Spectral peaks have • Frequency domain harmonic relations. – Detect the divisor – F0 is the greatest common divisor. • Spectral peaks are • Cepstrum domain equally spaced. – Detect the gap – F0 is the frequency gap. ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 19 Pitch detection in time domain • Autocorrelation – Basis: the time-domain signal is periodic – A periodic signal correlates strongly with itself when offset by the fundamental period. – Autocorrelation shows peaks at multiples of pitch period. ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 20 Pitch detection in frequency domain • Calculate the magnitude spectrum • For each pitch hypothesis, calculate salience by – Counting the number of peaks located at its harmonic positions, or – Summing spectral energy at harmonic positions, or – …… • Choose the hypothesis with the highest salience ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 21 Pitch detection in cepstrum domain • Idea: find the frequency gap between adjacent spectral peaks – The log-amplitude spectrum looks pretty periodic – The gap can be viewed as the period of the spectrum – How to find the period then? – Cepstrum’s idea: Fourier transform! ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 22 Pitch detection in cepstrum domain • Cepstrum = |IFT{log|FT(X)|}| ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 23 How to evaluate pitch detection? • Choose some recordings (speech, music) • Get ground-truth – Listen to the signal and inspect the spectrum to manually annotate (time consuming!) – Automatic annotation using simultaneously recorded laryngograph signals for speech (not quite reliable!) • Pitched/non-pitched classification error • Calculate the difference between estimated pitch and ground-truth – Threshold for speech: 10% or 20% in Hz – Threshold for music: 1 quarter-tone (about 3% in Hz) ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 24 Different Methods vs. Ground-truth frame 25 frame 65 ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 25 Frame 65 – Pitched (Voiced) • Has clear harmonic patterns • Different methods give close results, and consistent to the ground-truth 196 Hz. 40 30 20 10 0 LogMagnitude (dB) -10 -20 0 500 1000 1500 2000 2500 3000 Frequency (Hz) ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 26 Frame 25 – Non-pitched (Unvoiced) • No clear harmonic patterns • Different methods give inconsistent results. 40 30 20 10 0 LogMagnitude (dB) -10 -20 0 500 1000 1500 2000 2500 3000 Frequency (Hz) ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 27 Pitch Detection with Noise • Can we still hear pitch if there is some background noise, say in a restaurant? Violin + babble noise • Will pitch detection algorithms still work? • Which domain is less sensitive to what kind of noise? • How to improve pitch detection in noisy environments? ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 28 Summary • Pitch detection is important for many tasks – Time domain: find the period of waveform – Frequency domain: find the divisor of peaks – Cepstrum domain: find the frequency gap between spectral peaks • Single pitch detection is mature in noiseless conditions. • Pitch detection in noisy environments (also called robust pitch detection, noise-resilient pitch detection) is an active research topic. – BaNa [Yang et al., 2014]; PEFAC [Gonzales & Brookes, 2014]; • Multi-pitch Estimation is extremely challenging! ECE 272/472 (AME 272, TEE 272) – Audio Signal Processing, Zhiyao Duan, 2018 29.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    29 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us