
Real-Time Time-Domain Pitch Tracking Using

Eric Larson
Departments of Mathematics, Physics, and Philosophy, Kalamazoo College

Ross Maddox
Center for Performing Arts Technology, University of Michigan School of Music

ABSTRACT

A pitch tracker based on the fast lifting wavelet transform (FLWT) is developed in MATLAB and C++. Emphasis is placed on low latency (~25 ms), high time resolution, and accuracy. The pitch tracking algorithm implements the FLWT using the Haar wavelet, a transform which is shown to be mathematically equivalent to splitting a signal into low-pass and high-pass components and downsampling (generating approximations and details, respectively). Approximations are used in combination with intelligent peak detection to determine the pitch of vocal samples. An averaging scheme is employed to provide enhanced resolution. This algorithm was tested on natural and synthetic signals, including a female vocal sample and two sets of generated sinusoidal signals, to determine its performance characteristics. Overall performance exceeded expectations, with most errors made in voiced/unvoiced detection.

I. Background

Natural sounds are a composition of a fundamental frequency with a set of harmonics which occur at near integer multiples of that fundamental. The frequency that the human ear interprets as the pitch of a sound is this fundamental frequency, even if it is absent in the sound. The pitch (or fundamental frequency)¹ of natural sounds is important in many contexts. Pitch is largely responsible for inflections in human speech that carry conversational cues (e.g. discriminating between the spoken phrases “you are going to the store.” and “you are going to the store?”). These inflections also play a role in allowing us to consistently identify a given speaker. In addition to speech, musical compositions are typically sets of organized pitches. These pitches, in combination with their timing, enable us to distinguish Beethoven’s third symphony from Led Zeppelin’s “Stairway to Heaven”.

Pitch tracking, then, is useful for musical analysis as well as for speech analysis. General pitch tracking is used in speech recognition and identification schemes as well as in the automated transcription of music. Real-time pitch tracking can be used as an aid for (or judge of) musical performers, revealing intricacies of performance not immediately revealed by the ear.

Tracking the pitch of natural sounds (here the focus is primarily on speech) is not a trivial task. The topic of pitch tracking has been well explored, and there are several established pitch tracking methods available today, alongside several unconventional ones. All pitch trackers have certain advantages and disadvantages.

¹ Although there are classical distinctions between pitch and fundamental frequency, the two will be considered equivalent for the purposes of this paper.

Any real-time pitch tracker relies on processing consecutive small portions of a signal to produce pitch values. This process is called windowing. With a sample rate of 44100 Hz, it is not uncommon to use a window length of 1024 or 2048 samples, which correspond to minimum pitch-determination latencies of about 25 and 50 ms, respectively. Processing time following the windowing adds to the latency. Although shorter windows can lead to lower latency and higher time resolution, they can compromise resolution in the frequency domain. In the time domain, shorter windows limit the lowest detectable frequency.

Pitch trackers fall into two general categories: time-domain and frequency-domain. The former examines the original signal, often applying filters and/or convolution to analyze the signal in its original state, amplitude vs. time. The latter uses a transform (usually the Fast Fourier Transform, FFT) to break the signal down into its frequency components, yielding information about its amplitude vs. frequency. It then analyzes this to determine the fundamental frequency. Both of these have advantages and disadvantages when it comes to frequency resolution and processing time.

A good summary of established real-time pitch detection methods is provided in [2]. However, two of the most popular methods will be described briefly here.

Autocorrelation is a primary time-domain method. Autocorrelation selects a portion of the windowed signal and computes the correlation (sum of the products divided by the length of the portion, a measure of similarity between two signals) between that portion and an equal-length sliding segment of the windowed signal, at every point in that window. It then uses the locations of spikes in the correlation to determine the period and, hence, the pitch of the signal.

The primary frequency-domain analysis method is the cepstrum (whose name is a rearrangement of the word spectrum), which involves taking the magnitude spectrum of the log of the magnitude spectrum of the windowed signal. The peaks of this double-FFT can be examined to determine the frequency of the original signal by relying on the “periodicity” of its harmonics, which in natural sounds are integer multiples of the fundamental.

All real-time pitch trackers can be evaluated in terms of three main performance characteristics. First is computation time, which should be minimized in order to minimize latency. The cepstrum performs two FFTs, and autocorrelation requires many convolutions, each of which consists of many multiplication operations; both methods therefore require large computation time, and hence larger latencies. Second is determination of voiced segments from unvoiced (or sounded from unsounded for instrumental samples). A pitch tracker should be able to distinguish unpitched consonants from pitched vowels, as well as ignore areas of silence. Third is pitch accuracy: how closely the pitch tracker’s estimate matches the true pitch. A pitch tracker should have good resolution as well as avoid making gross errors, such as doublings, halvings, and large spikes, which can occur due to noise, transients, strong upper harmonics, and changing overall amplitude within a window.

The currently established methods of pitch tracking all have drawbacks in some of these categories. In this paper, we present a pitch tracker with enhanced performance in all of these respects in an attempt to offer a viable, fast real-time alternative to other pitch trackers.
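To make the autocorrelation method summarized in this section concrete, the following is a minimal Python sketch of it, not code from the paper. The function name `autocorr_period`, the lag bounds, and the 441 Hz test tone are illustrative assumptions; the correlation is the sum of products divided by the length of the overlapping portion, as described above.

```python
import math


def autocorr_period(window, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation.

    Hypothetical helper for illustration; not part of the paper's code.
    """
    n = len(window)
    best_lag, best_score = 0, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the window with a copy of itself shifted by `lag` samples:
        # sum of products divided by the length of the overlapping portion.
        total = sum(window[k] * window[k + lag] for k in range(n - lag))
        score = total / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


fs = 44100                      # sample rate in Hz
f0 = 441.0                      # assumed test tone: period of exactly 100 samples
window = [math.sin(2 * math.pi * f0 * k / fs) for k in range(1024)]

period = autocorr_period(window, min_lag=20, max_lag=150)
print(fs / period)              # 441.0
```

Each lag costs on the order of a window's worth of multiplications, which illustrates why the text counts computation time against this method.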

II. Method

When visually examining a periodic signal, it is almost always easy to see the periodicity. In designing this pitch tracker, we set out to model the process of visually determining the period of a windowed signal using simple logic and computationally inexpensive transforms to find the extrema which correctly reveal the periodicity of waveforms.

A. Fast Lifting Wavelet Transform

The algorithm uses an implementation of the Fast Lifting Wavelet Transform (FLWT). A wavelet transform splits a signal into an approximation and a detail. Successive transforms can be applied to the approximations to reduce noise and reveal underlying periodicity which is normally more difficult to extract from the original signal. Any of several different mother wavelets can be used to perform the transform. This pitch-tracking algorithm implements an FLWT using the Haar wavelet. For details on wavelets and the FLWT in a general setting, refer to [1]. Figure 1 shows the Haar wavelet, the mother wavelet used to generate the FLWT of this algorithm.

[Figure 1: plot titled "Haar Wavelet"; axes: Amplitude vs. Samples.]

FIG. 1. The Haar wavelet, which serves as the basis of the FLWT (fast lifting wavelet transform) used in this algorithm.

The FLWT using a Haar wavelet is mathematically equivalent to running a low-pass filter and downsampling to produce the approximation component and running a high-pass filter and downsampling to produce the detail component. The equations derived previously by Daubechies and Sweldens for the FLWT with the Haar wavelet are

\[
\begin{aligned}
d_0(n) &= x(2n+1) \\
a_0(n) &= x(2n) \\
d_1(n) &= d_0(n) - a_0(n) \\
a_1(n) &= a_0(n) + \frac{d_1(n)}{2},
\end{aligned}
\tag{1}
\]

where $x(n)$ is the original signal, $a_1(n)$ the first approximation, and $d_1(n)$ the first detail. With some algebra, it follows that these are equivalent to

\[
\begin{aligned}
d_1(n) &= x(2n+1) - x(2n) \\
a_1(n) &= \frac{x(2n+1) + x(2n)}{2}.
\end{aligned}
\tag{2}
\]

From these equations it is clear that the approximation component is simply an application of an averaging filter (a first-order low-pass) with downsampling (taking every other result); and the detail component is an application of a first-difference filter (a high-pass) with downsampling (taking every other result). In this way, the Haar wavelet FLWT provides a quick method of splitting a signal into high-pass and low-pass components.

This splitting scheme can help reveal underlying periodicity. By discarding the detail component and performing another FLWT on the approximation, additional levels of the wavelet transform are generated. The repeated (but limited) application of the FLWT in this manner ideally yields a pitch-tracker that is robust to noise and able to distinguish pitched sounds from unvoiced consonants. Using

this low-pass technique allows the computer to “see” the underlying periodic waveform that is already apparent to the human observer. Since the desired frequency of operation is from about 80 Hz to 3000 Hz, the use of several repeated low-pass operations and downsamplings does not compromise the frequency components that allow pitch determination. Since at least two wavelet approximations must be performed, there is an inherent ideal frequency ceiling of Nyquist/4, which for CD sample rate is about 5500 Hz. See Figure 2 for an image of the wavelet transformation, containing an original signal and the first, second, and third wavelet approximations. Note that with each successive wavelet transform, the number of samples in the approximation is halved as the signal is split into approximation and detail; this provides a practical operational limit on the number of wavelet transform levels that can be run on any given window of a signal.

[Figure 2: four stacked panels titled "Original Signal", "Approximation Level 1", "Approximation Level 2", and "Approximation Level 3"; axes: Amplitude vs. Samples, with the sample axis halving in length at each level.]

FIG. 2. Amplitude vs. Time for an original signal (top) and the first/second/third wavelet approximations (second, third, and fourth from the top). Note the reduction of noise and simplification of the signal with each successive approximation.

B. Algorithm

The steps of the pitch tracking algorithm for each current window of the signal are as follows:

1) Perform the FLWT, discard the detail component.
2) Find the first local maximum/minimum after each zero-crossing of the current approximation.
3) Determine the distance between the maxima/minima.
4) Check to see if the mode distance is equivalent to that of the previous level. If so, take that to be the period and move to the next window; if not, go to the next level (not to exceed a specified wavelet transform level limit) and return to step 1; if the level limit is exceeded, assume that the signal is unpitched and move to the next window.

One of the first uses of the wavelet transform in pitch detection came from Kadambe in [4], and proved successful. In

the same vein, Erçelebi in [3] used the FLWT to speed up the implementation. Our work is similar to his in many respects. We build on his work, employing several improvements: 1) a more robust peak-detection algorithm (accompanied by a more explicit description of method); 2) a different wavelet level decision scheme; 3) a transient detector that marks the windowed signal as unpitched if the RMS of the first third is more than four times that of the last third (or vice versa); as well as 4) a mode-averaging method that increases frequency resolution, especially in the high frequencies.

C. Peak Detection/Differences

The maximum-finding process² involves several steps. The average amplitude is calculated and subtracted from every element of the windowed signal, removing the DC component.³ A lower threshold for maxima is set which is a percentage of the window global maximum. Next, the location of the first local maximum (following the first derivative test, i.e. a change from positive to negative slope) between every set of zero crossings with an amplitude greater than the threshold is recorded as a location of a maximum. This procedure sets an upper limit on the number of maxima equal to one fewer than the number of zero crossings, providing an upper limit on the frequency. Also, no maximum index $m_i$ can be recorded if it is within some minimum distance $\delta$ of the previous maximum index $m_{i-1}$. The minimum distance $\delta$ is a controllable parameter related to the specified maximum returnable frequency $F$ and the current wavelet level $i$ in the following way (where $F_s$ is the sample rate):

\[
\delta = \max\left( \left\lfloor \frac{F_s}{2^{i} F} \right\rfloor,\ 1 \right).
\tag{3}
\]

Next, the distances between these peaks are calculated. For each maximum index $m_i$, the distances between $m_i$ and subsequent maximum indices $m_{i+1}$, $m_{i+2}$, $m_{i+3}$, …, $m_{i+N}$ are calculated and stored (the number $N$ of subsequent peaks taken into consideration is a controllable parameter). Taking several distance levels helps to ensure that a waveform which yields more than one maximum value per period is still analyzed correctly. While this may appear prima facie to also induce halvings or thirdings of the frequency, the finitude of the window size ensures that there are more of the correct distance than of its integer multiples resulting from taking several levels of differences.

These differences calculated using the maxima are combined with the differences calculated using the minima to determine a mode distance. For each distance in the set, the number of distances close to it (i.e. within a specified tolerance $\delta$ of it) is counted; the distance with the most other distances close to it in value is taken to be the center mode. In the case of a tie between two such modes, the larger mode is taken if it is twice as long as the smaller mode (biasing toward frequency halvings). The mode from the previous window's pitch detection is also passed to the function, assisting in the case of a near-tie. If the number of occurrences of a mode candidate is within one of that of the actual mode, it biases mode selection toward the previous window’s period.

² The minimum-finding process is equivalent to the maximum-finding process, with appropriate sign and inequality adjustments.

³ In practice, the value is not truly subtracted from every element, but the method implemented generates equivalent results with less computation.

D. Mode Averaging
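As a bridge between the center-mode selection of Section II.C and the averaging step of this subsection, the following is a minimal Python sketch of the distance bookkeeping. It is a simplification under stated assumptions, not the authors' implementation: the function names are hypothetical, ties are broken by taking the first candidate, and the previous-window bias and octave tie rule from the text are omitted. The peak indices are made-up values mimicking a tone whose period is 16.8 samples at wavelet level 1.

```python
def min_distance(fs, f_max, level):
    """Eq. (3): minimum peak spacing (in samples) at a given wavelet level."""
    return max(int(fs // (2 ** level * f_max)), 1)


def peak_distances(peak_indices, n_levels):
    """Distances from each peak to its next n_levels neighbouring peaks."""
    dists = []
    for step in range(1, n_levels + 1):
        for k in range(len(peak_indices) - step):
            dists.append(peak_indices[k + step] - peak_indices[k])
    return dists


def center_mode(dists, tol):
    """Distance with the most distances within tol of it (first wins ties)."""
    best, best_count = 0, 0
    for d in dists:
        count = sum(1 for other in dists if abs(other - d) <= tol)
        if count > best_count:
            best, best_count = d, count
    return best


def averaged_period(dists, tol):
    """Mean of all distances within tol of the center mode."""
    mode = center_mode(dists, tol)
    close = [d for d in dists if abs(d - mode) <= tol]
    return sum(close) / len(close)


fs, f_max, level = 44100, 3000, 1          # values from Table I, level 1
peaks = [0, 17, 34, 50, 67, 84]            # assumed peak indices (period 16.8)

tol = min_distance(fs, f_max, level)       # 7 samples
dists = peak_distances(peaks, n_levels=3)
period = averaged_period(dists, tol)       # 16.8 samples at this level
freq = fs / (period * 2 ** level)          # rescale for the downsampling
```

In this example an integer mode of 17 samples would give about 1297 Hz, whereas the averaged period of 16.8 samples gives 1312.5 Hz, which is the kind of high-frequency resolution gain the averaging is meant to provide.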

Once this center mode is selected, an averaging scheme is employed to increase frequency resolution. The mean of the distances within $\delta$ of the center mode is taken to be the period of the signal (with appropriate scaling by a power of two to compensate for the downsampling of the FLWT). In the low frequencies, this averaging occurs over only a limited number of periods and has only a small effect on resolution; in the high frequencies, where a simple integer mode would yield a high pitch detection error (due to discretization and sample rate limitations), the averaging function enhances the performance by providing a more accurate divisor when calculating frequency from period. See Figure 3 for an illustration of this increase in frequency resolution in a four-octave test from 90 to 1440 Hz.

[Figure 3: two panels titled "Measured vs. Actual Frequency" and "Measured vs. Actual Frequency (no averaging)"; axes: Measured Frequency (Hz) vs. Actual Frequency (Hz), 200 to 1400 Hz.]

FIG. 3. The response of the pitch tracker to a 1-Hz-stepped sinusoidal signal from 100 to 1500 Hz. Averaging method (upper plot) improves accuracy over the non-averaging method (bottom).

This algorithm has been implemented in MATLAB and C++; see Appendix A for the MATLAB implementation.

III. Results and Discussion

The controllable parameters of the algorithm are the maximum frequency $F$ from which $\delta$ is derived; the number of subsequent peaks $N$ to consider when calculating distances; the global threshold percentage $M$ of the window maximum that a local max must exceed to be counted; and $L$, the maximum number of wavelet transforms to perform before a window is deemed pitchless. There was much experimentation with all of these numbers, and pitch detection performance was highest using the values listed in Table I.

TABLE I. Pitch Tracker Parameters

Parameter            Symbol   Value
Maximum Frequency    F        3000 Hz
Difference Levels    N        3
Maxima Threshold     M        0.75
FLWT Levels          L        6

Table I. List of algorithm parameters experimentally determined to yield best performance.

The criteria used to evaluate the pitch tracker are computation time (latency), voiced/unvoiced determination, and overall pitch detection accuracy. Each of these performance characteristics is discussed below. Performance was tested using recorded or synthesized samples, rather than live input, to allow for greater control and repeatability of tests. In general, the pitch tracker was calibrated to function over a frequency range of about 100-1500 Hz, so most tests were run on samples from 90-1440 Hz to allow testing over a large range and easy octave-based bracketing.

A. Computation Time

The latency comprises two parts: the buffering period, in which the windowed signal is acquired, and the pitch computation on that window. For the tests done on this pitch tracker, the sample rate was 44100 Hz and the window size was 1024 samples, yielding a buffering time of 23 ms. Using a 3 GHz Pentium 4 processor, the computation time for each window in MATLAB was 4 ms, yielding a total latency of 27 ms. The C++ implementation has an even shorter computation time, with overall latency even closer to the 23 ms buffering time. Because buffering dominates the latency, if the minimum frequency present in the signal were closer to 200 Hz, the window size could be decreased to 512 samples, reducing both window computation time and buffering time, nearly cutting the latency in half.

B. Voiced/Unvoiced Detection

The voiced/unvoiced detection in the pitch tracker was its weakest performance category. There were specific windows which the pitch tracker failed to recognize as unvoiced, yielding a spike in the frequency where there should have been a zero to denote unvoiced. See Figure 4 for an illustration of this on a 26-second female vocal recording. In over 1000 sample windows, three unvoiced-to-voiced errors and one voiced-to-unvoiced error occurred, yielding an error rate of less than 0.4%. Although this is not an exhaustive analysis, it is representative of the algorithm’s general behavior; voiced/unvoiced error rates among other test samples were less than 1%.

[Figure 4: plot titled "Frequency vs. Time"; axes: Frequency (Hz), 0 to 400, vs. Time (s), 0 to 25.]

FIG. 4. Frequency vs. Time for a 26-second sample of female vocals. One voiced-to-unvoiced and three unvoiced-to-voiced errors occurred (~0.4 % error rate). Good time and frequency resolution of the pitch tracker reveal vibrato toward the end of sustained notes, and vocal trills.

C. Pitch Detection Accuracy

The first pitch detection accuracy measure was an error-per-octave test on sinusoidal waveforms of constant frequency within a 1024-sample window. The accuracy of the pitch tracker was very good throughout the tested frequency range of 90 – 1440 Hz, in increments of 1 Hz. See Figure 5 for values at each octave of the RMS error in Hz, RMS error in cents (defined as hundredths of a musical semitone), and mean error in Hz. The maximum RMS error was about 0.6 Hz in the 720-1440 Hz range. Mean errors were negligible and centered about zero. This test indicates extremely good performance on sinusoidal waveforms, with the RMS error increasing approximately exponentially with frequency, doubling with every octave band. This leads to a nearly uniformly small (1 ± 0.5) RMS error in cents across the octave bands tested. This indicates that the pitch tracker has near-uniform accuracy in terms of musical intervals across its frequency range. Additionally, this accuracy is below the just noticeable difference (JND) of pitches. According to [5], although the JND varies from person to person and across frequencies, a good estimate is that the JND for pitch is about 5 cents. Since the maximum error across all frequency bands was about 5 cents, the algorithm provides accuracy within the JND an overwhelming proportion of the time.

[Figure 5: four panels titled "RMS Error (Hz) vs. Octave", "RMS Error (cents) vs. Octave", "Maximum Error Magnitude (cents) vs. Octave", and "Mean Error (Hz) vs. Octave"; horizontal axis: Frequency Range (Hz) in the octave bands 90-180, 180-360, 360-720, and 720-1440.]

FIG. 5. RMS Error (Hz), RMS Error (cents), Maximum Error Magnitude (cents), and Mean Error (Hz) vs. Frequency Range (Hz) over four octaves. Mean error across all octaves is about 0 ± 0.01 Hz. RMS error increases exponentially, doubling with every octave, achieving a maximum value (worst performance) of about 0.6 Hz in the 720-1440 octave.

A second, qualitative analysis of accuracy was performed on a 26-second female vocal sample. See Figure 4 for the pitch tracker’s analysis of this sample. This analysis reveals the details of the singer’s vibrato at the end of words, and trills. No doublings or halvings occurred during the analysis. Although rare, other tested samples did have halvings, at a rate less than one half of one percent. No doublings were found in any of the analyses of test samples.

The third test was a missing-fundamental test. Signals were synthesized using upper harmonics (with at least one odd harmonic), omitting the fundamental, for fundamental frequencies spanning 90 – 720 Hz by half-octaves. For each frequency, nine 100-window samples were constructed by adding the adjacent harmonic pairs (2nd and 3rd, then 3rd and 4th, and so on up to the 10th and 11th). Each of these nine samples was analyzed, and Table II shows the threshold of errors for each missing fundamental $F_0$.

TABLE II. Missing Fundamental First Failures

F_0 (Hz)    90    127    180    255    360    509    720
F_n (Hz)   720   1018   1620   1527   1440   1527   1440
n            8      8      9      6      4      3      2

Table II: Missing Fundamental test first failures. For each fundamental frequency $F_0$ (in Hz), the lowest frequency of the first adjacent upper harmonic pair to fail, $F_n$ (in Hz), is listed, as well as its harmonic number $n$. Note that the missing fundamental detection fails (except for the 90 Hz case) when at least one of $F_n$ and $F_{n+1}$ is above the frequency range of the pitch tracker.

With the exception of the 90 Hz case, the pitch tracker accurately estimated the pitch of the missing fundamental up to the point at which the higher of the two summed harmonics was above the maximum detectable frequency. So long as both harmonics were below around 2500 Hz, the pitch tracker detected the missing fundamental. In the 90 Hz case, the pitch tracker began failing when the upper harmonics were 720 and 810 Hz; the reason for this failure was not determined. However, a signal containing only the 8th and 9th harmonics is unlikely to be encountered in practical use. Overall, the missing fundamental performance was very good, limited mainly by the highest detectable frequency of the pitch tracker.

D. Discussion

This pitch detection method has very good resolution in the low frequencies (due to its use of time-domain methods) and in the high frequencies (due to the averaging method). This resolution enables accurate pitch detection as outlined above. The use of a 1024-sample analysis window with sounds sampled at 44100 Hz yields very good time resolution as well.

When dealing with vocal samples, this algorithm is relatively immune to pitch doublings and halvings, which are a common problem for pitch trackers. For other types of sound, such as distorted electric guitar, limited testing revealed doublings due to strong upper harmonics. Halvings were rare in all samples tested, and did not appear systematically.

An absolute threshold could be introduced to reduce the voiced/unvoiced errors seen with this algorithm. A threshold reduces spikes in unvoiced sections, but also trims very low-level voiced sections, which was considered a poor tradeoff. If a high enough input signal level could be guaranteed, an absolute threshold could be a valuable and computationally cheap improvement for real-time applications. For non-real-time applications, a threshold based on the global maximum/minimum value proves useful, and could be easily added to the existing code.

In non-real-time applications, a smoothing function could also be implemented. Most pitch trackers that deal with signals in non-real-time implement some form of a smoothing algorithm to deal with voiced/unvoiced errors and pitch doublings and halvings. This algorithm would benefit similarly from such a method, but would lose its real-time functionality.

The pitch tracker compared favorably to the established methods of cepstrum and autocorrelation in all three categories. Computationally this algorithm was cheaper, voiced/unvoiced errors were similarly frequent across methods, and pitch accuracy was as good or better.

IV. Conclusions

In this paper a real-time pitch tracking algorithm was presented. The algorithm exceeded expectations for measurement error (pitch detection accuracy) owing to its good frequency resolution in both low and high frequencies. The short analysis window length led to good time resolution. Error rates (voiced/unvoiced and frequency halvings) were low, but further improvements could be made. Computational costs were even lower than expected, yielding very low latency. This algorithm compares favorably to established methods and could be useful in real-time applications where latency and pitch detection accuracy are emphasized.

V. Acknowledgments

We would like to thank Prof. S. Errede for the opportunity to work for him this summer; J. Boparai for his help in the lab; M. Winkler for working with us and helping us in the lab; J. Beauchamp and M. Bay for advice on pitch tracking; the Physics Department at the University of Illinois at Urbana-Champaign for hosting the REU program; and the National Science Foundation for support via grant PHY-0243675.

VI. References

[1] Daubechies, Ingrid and Wim Sweldens. “Factoring Wavelet Transforms into Lifting Steps.” J. Fourier Anal. Appl., 1998.

[2] de la Cuadra, Patricio et al. “Efficient Pitch Detection Techniques for Interactive Music.” Proceedings of the 2001 International Computer Music …, 2001.

[3] Erçelebi, Ergun. “Second generation wavelet transform-based pitch period estimation and voiced/unvoiced decision for speech signals.” Applied Acoustics 64 (2003) 25–41.

[4] Kadambe, S. and G. Faye Boudreaux-Bartels. “Application of the wavelet transform for pitch detection of speech signals.” IEEE Trans. on Information Theory 38(2) (1992) 917–24.

[5] Rossing, Thomas D. The Science of Sound, 2nd Ed. Addison-Wesley, 1990.

Appendix A: wavePitch.m

function freq = wavePitch(data,fs,oldFreq)
%
% WAVEPITCH Determine the pitch of a given short portion of data.
%
% WAVEPITCH(data, fs, oldFreq) data is a [ 1 x samples ] array. Optional
% inputs are fs (the sample rate of the signal, defaulting to 44100 if
% unspecified), and oldFreq (the frequency from the previous window).
%
% N.B. Data input should be at least 256 samples long. 1024 is recommended.
% It also must be a multiple of 64.
%
% Copyright 2005 Ross Maddox (University of Michigan)
% and Eric Larson (Kalamazoo College)

freq = 0;                   % The freq to return (0 denotes unpitched)
if (nargin < 1) return; end
if (nargin < 2) fs = 44100; end
if (nargin < 3) oldFreq = 0; end

oldMode = 0;
if(oldFreq) oldMode = fs/oldFreq; end

dataLen = length(data);
lev = 6;                    % Six levels of analysis
globalMaxThresh = .75;      % Thresholding of maximum values to consider
maxFreq = 3000;             % Yields minimum distance to consider valid
diffLevs = 3;               % Number of differences to go through (3 is diff @ third neighbor)

maxCount(1) = 0;
minCount(1) = 0;

a(1,:) = data;
aver = mean(a(1,:));
globalMax = max(a(1,:));
globalMin = min(a(1,:));
maxThresh = globalMaxThresh*(globalMax-aver) + aver; % Adjust for DC Offset
minThresh = globalMaxThresh*(globalMin-aver) + aver; % Adjust for DC Offset

%% Begin pitch detection %%
for (i = 2:lev)
    newWidth = dataLen/2^(i - 1);

    %% Perform the FLWT %%
    j = 1:newWidth;
    d(i,j) = a(i-1,2*j) - a(i-1,2*j-1);
    a(i,j) = a(i-1,2*j-1) + d(i,j)/2;

    %% Find the maxes of the current approximation %%
    minDist = max(floor(fs/maxFreq/2^(i-1)),1);
    maxCount(i) = 0;
    minCount(i) = 0;

    climber = 0; % 1 if pos, -1 if neg
    if (a(i,2) - a(i,1) > 0)
        climber = 1;
    else
        climber = -1;
    end

    canExt = 1;   % Tracks whether an extreme can be found (based on zero crossings)
    tooClose = 0; % Tracks how many more samples must be moved before another extreme

    for (j = 2:newWidth-1)
        test = a(i,j) - a(i,j - 1);

        if (climber >= 0 && test < 0)
            if(a(i,j - 1) >= maxThresh && canExt && ~tooClose)
                maxCount(i) = maxCount(i) + 1;
                maxIndices(i,maxCount(i)) = j - 1;
                canExt = 0;
                tooClose = minDist;
            end
            climber = -1;
        elseif (climber <= 0 && test > 0)
            if(a(i,j - 1) <= minThresh && canExt && ~tooClose)
                minCount(i) = minCount(i) + 1;
                minIndices(i,minCount(i)) = j - 1;
                canExt = 0;
                tooClose = minDist;
            end
            climber = 1;
        end

        if (a(i,j) <= aver && a(i,j - 1) > aver) || (a(i,j) >= aver && a(i,j - 1) < aver)
            canExt = 1;
        end

        if(tooClose)
            tooClose = tooClose - 1;
        end
    end

    %% Calculate the mode distance between peaks at each level %%
    if (maxCount(i) >= 2 && minCount(i) >=2)

        % Calculate the differences at diffLevs distances
        differs = [];
        for (j = 1:diffLevs)          % Interval of differences (neighbor, next-neighbor...)
            k = 1:maxCount(i) - j;    % Starting point of each run
            differs = [differs abs(maxIndices(i,k+j) - maxIndices(i,k))];
            k = 1:minCount(i) - j;    % Starting point of each run
            differs = [differs abs(minIndices(i,k+j) - minIndices(i,k))];
        end

        dCount = length(differs);

        % Find the center mode of the differences
        numer = 1;   % Require at least two agreeing differs to yield a mode
        mode(i) = 0; % If none is found, leave as zero

        for (j = 1:dCount)

            % Find the # of times that distance j is within minDist samples of another distance
            numerJ = length(find( abs(differs(1:dCount) - differs(j)) <= minDist));

            % If there are more, set the new standard
            if (numerJ >= numer && numerJ > floor(newWidth/differs(j))/4)
                if (numerJ == numer)
                    if oldMode && abs(differs(j) - oldMode/(2^(i-1)) ) < minDist
                        mode(i) = differs(j);
                    elseif ~oldMode && (differs(j) > 1.95*mode(i) && differs(j) < 2.05*mode(i))
                        mode(i) = differs(j);
                    end
                else
                    numer = numerJ;
                    mode(i) = differs(j);
                end
            elseif numerJ == numer-1 && oldMode && abs(differs(j)-oldMode/(2^(i-1))) < minDist
                mode(i) = differs(j);
            end
        end

        %% Set the mode via averaging %%
        if (mode(i))
            mode(i) = mean(differs(find( abs(mode(i) - differs(1:dCount)) <= minDist) ));
        end

        %% Determine if the modes are shared %%
        if(mode(i-1) && maxCount(i - 1) >= 2 && minCount(i - 1) >= 2)

            % If the modes are within a sample of one another, return the calculated frequency
            if (abs(mode(i-1) - 2*mode(i)) <= minDist )
                freq = fs/mode(i-1)/2^(i-2);
                return;
            end
        end
    end
end
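For readers without MATLAB, the FLWT step at the top of the main loop can be reproduced in a few lines of Python. This is an illustrative sketch, not a port of the authors' code: the function names are hypothetical, plain lists stand in for MATLAB arrays, and indexing is 0-based. It writes the same level once as the lifting steps of Eq. (1) and once as the average/difference filters of Eq. (2), so the two forms can be compared directly.

```python
def haar_lifting_step(x):
    """One Haar FLWT level via the lifting steps of Eq. (1)."""
    d0 = x[1::2]                                  # odd samples,  x(2n+1)
    a0 = x[0::2]                                  # even samples, x(2n)
    d1 = [o - e for o, e in zip(d0, a0)]          # detail:        d1 = d0 - a0
    a1 = [e + d / 2 for e, d in zip(a0, d1)]      # approximation: a1 = a0 + d1/2
    return a1, d1


def haar_filter_step(x):
    """The same level written as the filters of Eq. (2)."""
    half = len(x) // 2
    a1 = [(x[2 * n + 1] + x[2 * n]) / 2 for n in range(half)]   # averaging
    d1 = [x[2 * n + 1] - x[2 * n] for n in range(half)]         # first difference
    return a1, d1


def approximations(x, levels):
    """Original signal followed by `levels` successive approximations."""
    out = [list(x)]
    for _ in range(levels):
        approx, _detail = haar_lifting_step(out[-1])   # detail is discarded
        out.append(approx)
    return out


signal = [1, 3, 2, 6, 4, 4, 0, 2]                 # arbitrary example values
a1, d1 = haar_lifting_step(signal)                # a1 = [2.0, 4.0, 4.0, 1.0]
pyramid = approximations(signal, 2)               # lengths 8, 4, 2
```

Each level halves the number of samples, which is why the MATLAB loop computes `newWidth = dataLen/2^(i - 1)` and why the number of usable levels is bounded by the window length.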
