Signal Analysis Amplitude Envelope Follower Peak Detection

Signal Analysis Amplitude Envelope Follower Peak Detection

Signal Analysis Music 270a: Signal Analysis Tamara Smyth, [email protected] • Some tools we may want to use to automate analysis Department of Music, are: University of California, San Diego (UCSD) 1. Amplitude Envelope Follower December 2, 2019 2. Peak Detection (attacks or harmonics), surfboard method 3. Pitch Detection, harmonics vs. chaotic signal 4. Frequency/Spectral Envelope (formant tracking, mccs or lpc) 5. Constant overlap-add (COLA) 1 Music 270a: Signal Analysis 2 Amplitude Envelope Follower Peak Detection • An envelope follower will essentially determine the • It is often desirable to automatically detect peaks, amplitude envelope without dipping down into the particularly in a spectral envelope. valleys/zero crossings. • The peak is the location where the slope changes • It is a reduction of the information, representing the directions. overal shape of the signal’s amplitude (i.e. amplidtude spectrum magnitude (linear) envelpe) without the higher frequency information. 14 12 • The amplitude envelope y(n) is given by 10 8 6 y(n)=(1 − ν)|x(n)| + νy(n − 1), 4 magnitude (linear) 2 0 where ν determines how quickly changes in x(n) are 0 500 1000 1500 2000 2500 3000 3500 4000 tracked: magnitude zoomed on first peak 14 – 12 if ν is close to one, changes are tracked slowly 10 – 8 if ν is close to zero, x(n) has an immediate 6 influence on y(n). 4 magnitude (linear) 2 0 600 620 640 660 680 700 720 740 760 780 800 • In order to capture attacks in the signal, the value for frequency (Hz) ν is usually smaller for an increasing signal and large Figure 1: Peak Detection for one that is decreasing. • See envfollower.m % threshhold Music 270a: Signal Analysis 3 Music 270a: Signal Analysis 4 th = 0; Quadratic interpolation % filter descending values uslope = Ymag > [Ymag(1); Ymag(1:end-1)]; % filter ascending values dslope = Ymag >= [Ymag(2:end); 1+Ymag(end)]; • The position of the peak is limited by the resolution % only indeces at maxima retain non-zero value Ymax = Ymag .* (Ymag > th) .* uslope .* dslope; of the DFT/FFT and its estimation can be improved using quadratic interpolation. % peak indeces maxixs = find(Ymax); • The general equation for a parabola is given by y(x) , a(x − p)2 + b, where p is the peak location and b = y(p). • Considering the parabola at points x =0, 1, −1 yields 3 equations, y(0) = ap2 + b = β y(1) = a(1 + p2 − 2p)+ b = γ y(−1) = a(1 + p2 +2p)+ b = α, and 3 unknowns a,p,b. • Solve for a: α − γ α − γ =4ap −→ a = . 4p Music 270a: Signal Analysis 5 Music 270a: Signal Analysis 6 • Solve for p using a: Pitch Detection γ − β = a − 2ap α − γ α − γ = − 2 p 4p 4p • Considerations for Computer Music applications: 4p(γ − β) = α − γ − 2(α − γ)p 1. Signals are often noisy, eg: poor soundcards, other 4p(γ − β)+2p(α − γ) = α − γ instruments/voices, 2p(γ − 2β + α) = α − γ α − γ 2. How much frequency resolution is needed? Correct p = . octave a must, but will a semitone suffice? 2(γ − 2β + α) 3. What latency can be tolerated (what framesize should be used for analysis?) • Finally, the height at peak p is given by 4. Does the instrument have well-defined/behaved y(p) = b harmonics? = β − ap2 α − γ • In the paper by de la Cuadra et al., “Efficient Pitch = β − p2. 4p Detection Techniques for Interactive Music”, four (4) pitch detection algorithms are summarized: • Another way (Dan Ellis) 1. Harmonic Product Spectrum 2. Maximum Likelihood 3. Cepstrum-Biased HPS 4. Weighted Autocorrelation Function Music 270a: Signal Analysis 7 Music 270a: Signal Analysis 8 Harmonic Product Spectrum (HPS) • Due to noise, frequencies below about 50 Hz should not be searched for a pitch. • Pros: HPS is simple to implement, does well under a • HPS (Noll 1969) measures the maximum coincidence wide range of conditions, and runs in real-time. for harmonics for each spectral frame according to • Cons: low frequency resolution must be enhanced by R zero-padding, so that the spectrum can be Y (ω)= Y |X(ωr)|, (1) interpolated to the nearest semitone. This means that r=1 high frequencies are also being unecessarily where R is the number of harmonics being considered. interpolated. • The resulting periodic correlation array Y (ω) is then • See hps.m searched for a maximum value of a range of possible fundamental frequencies ωi X(ω ) 0 Yˆ = max Y (ωi) (2) ωi X(ω ) 1 to obtain the fundamental frequency estimate. X(ω ) • Octave errors are common (detection is sometimes an 2 octave too high). X(ω ) 3 • To correct, apply this rule: if the second peak X(ω ) amplitude below initially chosent pitch is 4 approximately 1/2 of the chosen pitch AND the ratio of amplitudes is above a threshold (e.g., 0.2 for 5 Figure 2: Harmonic Product Spectrum harmonics), THEN select the lower octave peak as the pitch for the current frame. Music 270a: Signal Analysis 9 Music 270a: Signal Analysis 10 Uses of Linear Predictive Coding Concept of LPC (LPC) • Given a digital system, can the value of any sample • LPC, a statistical method for predicting future values be predicted by taking a linear combination of the of a waveform on the basis of its past values1, is often previous N samples? used to obtain a spectral envelope. • Stated mathematically, can a set of coefficients, ak, • LPC differs from formant tracking in that: be determined such that – the waveform remains in the time domain; y(n)= a1y(n − 1)+ a2y(n − 2)+ ... + aN y(n − N). resonances are described by the coefficients of an all-pole filter. • That is, can a signal be represented as coefficients of an all-pole (only feedback terms) IIR filter. – altering resonances is difficult since editing IIR filter coefficients can result in an unstable filter. • If yes, then the coefficients and the first N samples – analysis may be applied to a wide range of sounds. would completely determine the remainder of the signal, because the rest of the samples can be • LPC is often used to determine the filter in a calculated by the above equation. source-filter model of speech2 which: – characterizes the response of the vocal tract. – reconstitutes the speech waveform when driven by the correct source. 1Markel, J. D., and Gray, A. H., Hr. Linear Prediction of Speech. New York: Apringer-Verlag, 1976. 2In a source-filter model of speech, there is assumed to be no feedback dependency between the vibrating vocal folds and the vocal track Music 270a: Signal Analysis 11 Music 270a: Signal Analysis 12 LPC residual LPC Order • The answer is actually “not precisely” for a finite N. • The accuracy of the predictor improves with an • Rather, we determine the cofficients that give the increase of order N: best prediction by minimizing the difference, or error • The smallest value of N that will yield sufficiently e(n), between the actual sample values of the input quality in the speech representation is related to waveform y(n) and the waveform re-created using the 1. the highest frequency in the speech, derived predictors yˆ(n). 2. the number of formant peaks expected and min{e(n)} = min{yˆ(n) − y(n)}. n n 3. the sampling rate. • The smaller the average value of the error, also called • There is no exact relationship for determining what the residual, the better the set of predictors. value of N should be used, but in general it’s between • The residual may be used to exactly reconstruct the 10 and 20 or the sampling rate in kHz plus 4 (for original signal y(n) by using it as an input to our fs = 15kHz, you might use a N = 19). all-pole filter, that is • Schemes for determining predictors by minimizing the y(n)= b0e(n)+a1y(n−1)+a2y(n−2)+...+aNy(n−N) residual work either explicitly or implicitly. where b0 is a scaling factor that gives the correct • A more rigorous treatment of the subject can be amplitude. found in J. Makhoul’s tutorial3 • When the speech is voiced, the residual is essentially a periodic pulse waveform with the same fundamental frequency as the speech. When unvoiced, the residual 3Makhoul, J. “Linear Prediction, a Tutorial Review.” Proceedinds of the institute of Electrical and is similar to white noise. Electronics Engineers, 63, 1975, 561-580. Music 270a: Signal Analysis 13 Music 270a: Signal Analysis 14 LPC Analysis in Matlab • Matlab for generating original, lpc, and residual spectra. LPC Analysis of the vowel "aah" 25 %lpc original spectrum lpc spectrum order = fs/1000 + 5; % order residual spectrum xlpc = lpc(xw, order)’; % coefficients 20 %windowed speech frequency response 15 zpf = 3; Nfft = 2^nextpow2(N*zpf); XW = fft(xw, Nfft); 10 % lpc spectrum [H,w] = freqz(1, xlpc, Nfft, ’whole’); 5 0 %inverse filter to obtain residual 0 500 1000 1500 2000 2500 E = XW./H; Figure 3: LPC Analysis Results for vowel ’a’. e = real(ifft(E)); Music 270a: Signal Analysis 15 Music 270a: Signal Analysis 16 Constant Overlap Add (COLA) signal X(ω), that is: ∞ ∞ ∞ , −jωn X Xm(ω) X X x(n)w(n − mR)e m=−∞ m=−∞ n=−∞ • Mathematical definition of the Short-time Fourier ∞ ∞ −jωn Transform (STFT) is given by = X x(n)e X w(n − mR) ∞ n=−∞ m=−∞ X ω x n w n mR e−jωn, ∞ m( )= X ( ) ( − ) −jωn n=−∞ = X x(n)e = X(ω), if COLA. n=−∞ where R is the hopsize, and m is the length of the window.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    5 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us