Time-Frequency Analysis of Musical Rhythm
Xiaowen Cheng, Jarod V. Hart, and James S. Walker

We shall use the mathematical techniques of Gabor transforms and continuous wavelet transforms to analyze the rhythmic structure of music and its interaction with melodic structure. This analysis reveals the hierarchical structure of rhythm. Hierarchical structure is common to rhythmic performances throughout the world's music. The work described here is interdisciplinary and experimental. We use mathematics to aid in the understanding of the structure of music, and have developed mathematical tools that (while not completely finished) have shown themselves to be useful for this musical analysis. We aim to explore ideas with this paper, to provoke thought, not to present completely finished work.

The paper is organized as follows. We first summarize the mathematical method of Gabor transforms (also known as short-time Fourier transforms, or spectrograms). Spectrograms provide a tool for visualizing the patterns of time-frequency structures within a musical passage. We then review the method of percussion scalograms, a new technique for analyzing rhythm introduced in [34]. After that, we show how percussion scalograms are used to analyze percussion passages and rhythm. We carry out four analyses of percussion passages from a variety of music styles (rock drumming, African drumming, and jazz drumming). We also explore three examples of the connection between rhythm and melody (a jazz piano piece, a Bach piano transcription, and a jazz orchestration). These examples provide empirical justification of our method. Finally, we explain how the parameters for percussion scalograms are chosen in order to provide a satisfactory display of the pulse trains that characterize a percussion passage (a key component of our method). A brief concluding section provides some ideas for future research.

Xiaowen Cheng is a student of mathematics at the University of Minnesota–Twin Cities. Her email address is [email protected].
Jarod V. Hart is a student of mathematics at the University of Kansas, Lawrence. His email address is jhart@math.ku.edu.
James S. Walker is professor of mathematics at the University of Wisconsin–Eau Claire. His email address is [email protected].

Gabor Transforms and Music

We briefly review the widely employed method of Gabor transforms [17], also known as short-time Fourier transforms, or spectrograms, or sonograms. The first comprehensive effort in employing spectrograms in musical analysis was Robert Cogan's masterpiece, New Images of Musical Sound [9], a book that still deserves close study. In [12, 13], Dörfler describes the fundamental mathematical aspects of using Gabor transforms for musical analysis. Two other sources for applications of short-time Fourier transforms are [31, 25]. There is also considerable mathematical background in [15, 16, 19], with musical applications in [14]. Using sonograms or spectrograms for analyzing the music of bird song is described in [21, 30, 26]. The theory of Gabor transforms is discussed in complete detail in [15, 16, 19], with focus on its discrete aspects in [1, 34]. However, to fix our notations for subsequent work, we briefly describe this theory.

The sound signals that we analyze are all digital, hence discrete, so we assume that a sound signal has the form $\{f(t_k)\}$, for uniformly spaced

356 Notices of the AMS Volume 56, Number 3

[Figure 1: three panels, (a), (b), (c).]
Figure 1. (a) Signal. (b) Succession of shifted window functions. (c) Signal multiplied by middle window in (b); an FFT can now be applied to this windowed signal.

values $t_k = k\Delta t$ in a finite interval $[0, T]$. A Gabor transform of $f$, with window function $w$, is defined as follows. First, multiply $\{f(t_k)\}$ by a sequence of shifted window functions $\{w(t_k - \tau_\ell)\}_{\ell=0}^{M}$, producing time localized subsignals, $\{f(t_k)\,w(t_k - \tau_\ell)\}_{\ell=0}^{M}$. Uniformly spaced time values, $\{\tau_\ell = t_{j\ell}\}_{\ell=0}^{M}$, are used for the shifts ($j$ being a positive integer greater than 1). The windows $\{w(t_k - \tau_\ell)\}_{\ell=0}^{M}$ are all compactly supported and overlap each other. See Figure 1. The value of $M$ is determined by the minimum number of windows needed to cover $[0, T]$, as illustrated in Figure 1(b).

Second, because $w$ is compactly supported, we treat each subsignal $\{f(t_k)\,w(t_k - \tau_\ell)\}$ as a finite sequence and apply an FFT $\mathcal{F}$ to it. (A good, brief explanation of how FFTs are used for frequency analysis can be found in [1].) This yields the Gabor transform of $\{f(t_k)\}$:

$$\{\mathcal{F}\{f(t_k)\,w(t_k - \tau_\ell)\}\}_{\ell=0}^{M}. \tag{1}$$

Note that because the values $t_k$ belong to the finite interval $[0, T]$, we always extend our signal values beyond the interval's endpoints by appending zeroes, hence the full supports of all windows are included.

The Gabor transform that we employ uses a Blackman window defined by

$$w(t) = \begin{cases} 0.42 + 0.5\cos(2\pi t/\lambda) + 0.08\cos(4\pi t/\lambda) & \text{for } |t| \le \lambda/2 \\ 0 & \text{for } |t| > \lambda/2 \end{cases}$$

for a positive parameter $\lambda$ equaling the width of the window where the FFT is performed. The Fourier transform of the Blackman window is very nearly positive (negative values less than $10^{-4}$ in size), thus providing an effective substitute for a Gaussian function (which is well known to have minimum time-frequency support). See Figure 2. Further evidence of the advantages of Blackman-windowing is provided in [3, Table II]. In Figure 2(b) we illustrate that for each windowing by $w(t_k - \tau_m)$ we finely partition the frequency axis into thin rectangular bands lying above the support of the window. This provides a thin rectangular partition of the (slightly smeared) spectrum of $f$ over the support of $w(t_k - \tau_m)$ for each $m$. The efficacy of these Gabor transforms is shown by how well the time-frequency portraits they produce accord with our auditory perception, which is described in the vast literature on Gabor transforms that we briefly summarized above.

[Figure 2: two panels, (a) and (b).]
Figure 2. (a) Blackman window, $\lambda = 1$. Notice that it closely resembles the classic Gabor window (a bell curve described by a Gaussian exponential) but it has the advantage of compact support. (b) Time-frequency representation (the units along the horizontal are in seconds, along the vertical are in Hz) of three Blackman windows multiplied by the real part of the kernel $e^{i2\pi nk/N}$ of the FFT used in a Gabor transform, for three different frequency values $n$. Each horizontal bar accounts for 99.99% of the energy of the cosine-modulated Blackman window (Gabor atom) graphed below it.

It is interesting to listen to the sound created by the three Gabor atoms in Figure 2(b). You can watch a video of the spectrogram being traced out while the sound is played by going to the following webpage:

(2) http://www.uwec.edu/walkerjs/TFAMRVideos/

and selecting the video for Gabor Atoms. The sound of the atoms is of three successive pure

March 2009 Notices of the AMS 357

[Figure 3: three panels. (a) Drum Clip. (b) Piano scale notes. (c) Bach melody.]
Figure 3. Three spectrograms. (a) Spectrogram of a drum solo from a rock song. (b) Notes along a piano scale. (c) Spectrogram of a piano solo from a Bach melody.
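The two steps behind Equation (1), windowing and then transforming each windowed subsignal, can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the software used for the figures: the names `blackman`, `dft`, and `gabor_transform` are hypothetical, a naive DFT from the standard library stands in for the FFT (it returns the same values, only more slowly), each window is centered on its shift, and the sampling rate, window width, and shift in the usage lines are arbitrary choices made so that the example runs quickly.

```python
import cmath
import math

def blackman(t, lam):
    """Blackman window of width lam centered at t = 0 (formula from the text)."""
    if abs(t) > lam / 2:
        return 0.0
    return (0.42 + 0.5 * math.cos(2 * math.pi * t / lam)
            + 0.08 * math.cos(4 * math.pi * t / lam))

def dft(x):
    """Naive O(N^2) discrete Fourier transform; an FFT gives the same values."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def gabor_transform(f, dt, lam, hop):
    """Transforms of the windowed subsignals f(t_k) w(t_k - tau_l).

    f   : samples f(t_k) with t_k = k * dt
    lam : window width in seconds
    hop : shift between successive windows in samples (the integer j)
    """
    half = int(round(lam / (2 * dt)))                 # samples on each side of a window center
    padded = [0.0] * half + list(f) + [0.0] * half    # append zeroes beyond [0, T]
    frames = []
    for center in range(0, len(f), hop):
        segment = [padded[center + i] * blackman((i - half) * dt, lam)
                   for i in range(2 * half)]
        frames.append(dft(segment))
    return frames

# Usage (assumed values): a 32 Hz tone sampled at 256 Hz, with 1/4-second
# windows, so that neighboring frequency bins are 4 Hz apart.
dt = 1 / 256
tone = [math.sin(2 * math.pi * 32 * k * dt) for k in range(256)]
frames = gabor_transform(tone, dt, lam=0.25, hop=16)
middle = [abs(c) for c in frames[8][:33]]             # magnitudes for 0 to 128 Hz
peak_bin = middle.index(max(middle))                  # strongest bin; 32 Hz / 4 Hz = bin 8
```

Plotting the magnitudes of `frames` with time along the horizontal axis and frequency along the vertical axis gives a spectrogram of the kind shown in Figure 3.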

tones, on an ascending scale. The sound occurs precisely when the cursor crosses the thin dark bands in the spectrogram, and our aural perception of a constant pitch matches perfectly with the constant darkness of the thin bands. These Gabor atoms are, in fact, good examples of individual notes. They are much better examples of notes than the infinitely extending (both in past and future) sines and cosines used in classical Fourier analysis. Because they are good examples of pure tone notes, these Gabor atoms are excellent building blocks for music.

We shall provide some new examples that further illustrate the effectiveness of these Gabor transforms. For all of our examples, we used 1024-point FFTs, based on windows of support ≈ 1/8 sec with a shift of $\Delta\tau \approx 0.008$ sec. These time-values are usually short enough to capture the essential features of musical frequency change.

In Figure 3 we show three basic examples of spectrograms of music. Part (a) of the figure shows a spectrogram of a clip from a rock drum solo. Notice that the spectrogram consists of dark vertical swatches; these swatches correspond to the striking of the drum, which can be verified by watching a video of the spectrogram (go to the website in (2) and select the video Rock Drum Solo). As the cursor traces over the spectrogram in the video, you will hear the sound of the drum strikes during the times when the cursor is crossing a vertical swatch. The reason why the spectrogram consists of these vertical swatches will be explained in the next section.

Part (b) of Figure 3 shows a spectrogram of a recording of four notes played on a piano scale. Here the spectrogram shows two features. Its main feature is a set of four sections consisting of groups of horizontal line segments placed vertically above each other. These vertical series of short horizontal segments are the fundamentals and overtones of the piano notes. There are also thin vertical swatches located at the beginning of each note. They are the percussive attacks of the notes (the piano is, in fact, classed as a percussive instrument).

Part (c) of Figure 3 shows a spectrogram of a clip from a piano version of a famous Bach melody. This spectrogram is much more complex, rhythmically and melodically, than the first two passages. Its melodic complexity consists in its polyphonic nature: the vertical series of horizontal segments are due to three-note chords being played on the treble scale and also individual notes played as counterpoint on the bass scale.¹ (This contrasts with the single notes in the monophonic passage in (b).) We will analyze the rhythm of this Bach melody in Example 5 below.

¹ The chord structure and counterpoint can be determined either by careful listening or by examining the score [2].

Scalograms, Percussion Scalograms, and Rhythm

In this section we briefly review the method of scalograms (continuous wavelet transforms) and then discuss the method of percussion scalograms.

Scalograms

The theory of continuous wavelet transforms is well-established [10, 8, 27]. A CWT differs from a spectrogram in that it does not use translations of a window of fixed width; instead it uses translations of differently sized dilations of a window. These dilations induce a logarithmic division of the frequency axis. The discrete calculation of a CWT that we use is described in [1, Section 4]. We shall only briefly review the definition of the CWT in

order to fix our notation. We then use it to analyze percussion.

Given a function $\Psi$, called the wavelet, the continuous wavelet transform $W_\Psi[f]$ of a sound signal $f$ is defined as

$$W_\Psi[f](\tau, s) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} f(t)\,\Psi\!\left(\frac{t - \tau}{s}\right) dt \tag{3}$$

for scale $s > 0$ and time-translation $\tau$. For the function $\Psi$ in the integrand of (3), the variable $s$ produces a dilation and the variable $\tau$ produces a translation.

We omit various technicalities concerning the types of functions $\Psi$ that are suitable as wavelets; see [8, 10, 27]. In [8, 11], Equation (3) is derived from a simple analogy with the logarithmically structured response of our ear's basilar membrane to a sound stimulus $f$.

We now discretize Equation (3). First, we assume that the sound signal $f(t)$ is non-zero only over the time interval $[0, T]$. Hence (3) becomes

$$W_\Psi[f](\tau, s) = \frac{1}{\sqrt{s}} \int_{0}^{T} f(t)\,\Psi\!\left(\frac{t - \tau}{s}\right) dt.$$

We then make a Riemann sum approximation to this last integral using $t_m = m\Delta t$, with uniform spacing $\Delta t = T/N$, and discretize the time variable $\tau$, using $\tau_k = k\Delta t$. This yields

$$W_\Psi[f](\tau_k, s) \approx \frac{T}{N} \frac{1}{\sqrt{s}} \sum_{m=0}^{N-1} f(t_m)\,\Psi\big([t_m - \tau_k]\,s^{-1}\big). \tag{4}$$

The sum in (4) is a correlation of two discrete sequences. Given two $N$-point discrete sequences $\{f_k\}$ and $\{\Psi_k\}$, their correlation $\{(f : \Psi)_k\}$ is defined by

$$(f : \Psi)_k = \sum_{m=0}^{N-1} f_m \Psi_{m-k}. \tag{5}$$

(Note: For the sum in (5) to make sense, the sequence $\{\Psi_k\}$ is periodically extended, via $\Psi_{-k} := \Psi_{N-k}$.)

Thus, Equations (4) and (5) show that the CWT, at each scale $s$, is approximated by a multiple of a discrete correlation of $\{f_k = f(t_k)\}$ and $\{\Psi^s_k = s^{-1/2}\,\Psi(t_k s^{-1})\}$. These discrete correlations are computed over a range of discrete values of $s$, typically

$$s = 2^{-r/J}, \quad r = 0, 1, 2, \ldots, I \cdot J, \tag{6}$$

where the positive integer $I$ is called the number of octaves and the positive integer $J$ is called the number of voices per octave. For example, the choice of 6 octaves and 12 voices corresponds (based on the relationship between scales and frequencies described below) to the equal-tempered scale used for pianos.

The CWTs that we use are based on Gabor wavelets. A Gabor wavelet, with width parameter $\omega$ and frequency parameter $\nu$, is defined as follows:

$$\Psi(t) = \omega^{-1/2}\, e^{-\pi (t/\omega)^2}\, e^{i 2\pi \nu t/\omega}. \tag{7}$$

Notice that the complex exponential $e^{i2\pi\nu t/\omega}$ has frequency $\nu/\omega$. We call $\nu/\omega$ the base frequency. It corresponds to the largest scale $s = 1$. The bell-shaped factor $\omega^{-1/2} e^{-\pi(t/\omega)^2}$ in (7) damps down the oscillations of $\Psi$, so that their amplitude is significant only within a finite region centered at $t = 0$. See Figures 13 and 14. Since the scale parameter $s$ is used in a reciprocal fashion in Equation (3), it follows that the reciprocal scale $1/s$ will control the frequency of oscillations of the function $s^{-1/2}\Psi(t/s)$ used in Equation (3). Thus, frequency is described in terms of the parameter $1/s$, which Equation (6) shows is logarithmically scaled. This point is carefully discussed in [1] and [34, Chap. 6], where Gabor scalograms are shown to provide a method of zooming in on selected regions of a spectrogram.

Pulse Trains and Percussion Scalograms

The method of using Gabor scalograms for analyzing percussion sequences was introduced by Smith in [32] and described empirically in considerable detail in [33]. The method described by Smith involved pulse trains generated from the sound signal itself. Our method is based on the spectrogram of the signal, which reduces the number of samples and hence speeds up the computation, making it fast enough for real-time applications. (An alternative method based on an FFT of the whole signal, the phase vocoder, is described in [31].)

Our discussion will focus on a particular percussion sequence. This sequence is a passage from the song, Dance Around. Go to the URL in (2) and select the video, Dance Around percussion, to hear this passage. Listening to this passage you will hear several groups of drum beats, along with some shifts in tempo. This passage illustrates the basic principles underlying our approach.

In Figure 4(a) we show the spectrogram of the Dance Around clip. This spectrogram is composed of a sequence of thick vertical segments, which we will call vertical swatches. Each vertical swatch corresponds to a percussive strike on a drum. These sharp strikes on drum heads excite a continuum of frequencies rather than a discrete tonal sequence of fundamentals and overtones. Because the rapid onset and decay of these sharp strikes produce approximate delta function pulses, and a delta function pulse has an FFT that consists of a constant value for all frequencies, it follows that these strike sounds produce vertical swatches in the time-frequency plane.

Our percussion scalogram method has the following two parts:

I. Pulse train generation. We generate a "pulse train", a sequence of subintervals of 1-values and 0-values (see the graph at

[Figure 4: two panels, (a) and (b). Horizontal axes: time from 0 to 4.096 sec. Vertical axis of (b): reciprocal scale $s^{-1}$, from $2^0$ (about 0.56 strikes/sec) at the bottom through $2^2$ (about 2.23 strikes/sec) to $2^4$ (about 8.93 strikes/sec) at the top. The region G is marked in (b).]
Figure 4. Calculating a percussion scalogram for the Dance Around sound clip. (a) Spectrogram of sound waveform with its pulse train graphed below it. (b) Percussion scalogram and the pulse train graphed above it. The dark region labeled by G corresponds to a collection of drum strikes that we hear as a group, and within that group are other subgroups over shorter time scales that are indicated by the splitting of group G into smaller dark blobs as one goes upwards in the percussion scalogram (those subgroups are also aurally perceptible). See Figure 5 for a better view of G.
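A scalogram such as the one in Figure 4(b) is built from the Riemann sum in Equation (4), evaluated at the scales of Equation (6) with the Gabor wavelet of Equation (7). The following is a minimal sketch of that computation, assuming direct evaluation of the sum rather than an FFT-based correlation. The names `gabor_wavelet`, `gabor_cwt`, and `strongest_row`, the two synthetic pulse trains, and the parameter values are illustrative assumptions and are not taken from the paper. The wavelet is used as written in Equation (7); some references conjugate it, which does not change the magnitudes for a real-valued signal.

```python
import cmath
import math

def gabor_wavelet(t, omega, nu):
    """Gabor wavelet of Equation (7): width parameter omega, frequency parameter nu."""
    envelope = omega ** -0.5 * math.exp(-math.pi * (t / omega) ** 2)
    return envelope * cmath.exp(2j * math.pi * nu * t / omega)

def gabor_cwt(f, dt, omega, nu, octaves, voices):
    """Rows of the Riemann sum (4), one row per scale s = 2**(-r/voices)."""
    n = len(f)
    rows = []
    for r in range(octaves * voices + 1):
        s = 2.0 ** (-r / voices)
        # Samples of s^(-1/2) Psi((t_m - tau_k)/s) for every lag m - k.
        psi = [gabor_wavelet(lag * dt / s, omega, nu) / math.sqrt(s)
               for lag in range(-(n - 1), n)]
        rows.append([dt * sum(f[m] * psi[m - k + n - 1] for m in range(n))
                     for k in range(n)])
    return rows

def strongest_row(rows, lo, hi):
    """Index r of the scale with the largest mean magnitude over columns lo..hi-1."""
    means = [sum(abs(v) for v in row[lo:hi]) / (hi - lo) for row in rows]
    return means.index(max(means))

# Usage (assumed values): two synthetic pulse trains sampled at 64 Hz,
# one with 4 pulses per second and one with 8 pulses per second.
dt = 1 / 64
slow = [1 if (k // 8) % 2 == 0 else 0 for k in range(128)]
fast = [1 if (k // 4) % 2 == 0 else 0 for k in range(128)]
r_slow = strongest_row(gabor_cwt(slow, dt, 0.5, 1.0, 3, 4), 32, 96)
r_fast = strongest_row(gabor_cwt(fast, dt, 0.5, 1.0, 3, 4), 32, 96)
# The faster train responds most strongly at a smaller scale s, that is,
# at a larger index r, which corresponds to a higher position in the scalogram.
```

With `omega = 0.5` and `nu = 1.0` the base frequency is 2 per second at `s = 1`, so the two trains are expected to respond most strongly roughly one and two octaves above it. The exact row indices depend on the discretization, so the sketch only compares them.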

the bottom of Figure 4(a)). The rectangular-shaped pulses in this pulse train correspond to sharp onset and decay of transient bursts in the percussion signal graphed just above the pulse train. The widths of these pulses are approximately equal to the widths of the vertical swatches shown in the spectrogram. Most importantly, the location and duration of the intervals of 1-values correspond to our hearing of the drum strikes, while the location and duration of the intervals of 0-values correspond to the silences between the strikes. In Step 1 of the method below we describe how this pulse train is generated.

II. Gabor CWT. We use a Gabor CWT to analyze the pulse train. This CWT calculation is performed in Step 2 of the method. The rationale for performing a CWT is that the pulse train is a step function analog of a sinusoid of varying frequency. Because of this analogy between tempo of the pulses and frequency in sinusoidal curves, we employ a Gabor CWT for analysis. As an example, see the scalogram plotted in Figure 4(b). The thick vertical line segments at the top half of the scalogram correspond to the drum strikes, and these segments flow downward and connect together. Within the middle of the time-interval for the scalogram, these drum strike groups join together over four levels of hierarchy (see Figure 5). Listening to this passage, you can perceive each level of this hierarchy.

Now that we have outlined the basis for the percussion scalogram method, we can list it in detail. The percussion scalogram method for analyzing percussive rhythm consists of the following two steps.

Percussion Scalogram Method

Step 1. Let $\{g(\tau_m, y_k)\}$ be the spectrogram image, like in Figure 4(a). Calculate the average $\bar{g}$ over all frequencies at each time-value $\tau_m$:

$$\bar{g}(\tau_m) = \frac{1}{P} \sum_{k=0}^{P-1} g(\tau_m, y_k), \tag{8}$$

(where $P$ is the total number of frequencies $y_k$),

and denote the average of $\bar{g}$ by $A$:

$$A = \frac{1}{M+1} \sum_{m=0}^{M} \bar{g}(\tau_m). \tag{9}$$

Then the pulse train $\{P(\tau_m)\}$ is defined by

$$P(\tau_m) = \mathbf{1}_{\{\tau_k \,:\, \bar{g}(\tau_k) > A\}}(\tau_m), \tag{10}$$

where $\mathbf{1}$ is the indicator function.² The values $\{P(\tau_m)\}$ describe a pulse train whose intervals of 1-values mark off the position and duration of the vertical swatches (hence of the drum strikes). See Figure 6.

² The indicator function $\mathbf{1}_S$ for a set $S$ is defined by $\mathbf{1}_S(t) = 1$ when $t \in S$ and $\mathbf{1}_S(t) = 0$ when $t \notin S$.

Step 2. Compute a Gabor CWT of the pulse train signal $\{P(\tau_m)\}$ from Step 1. This Gabor CWT provides an objective picture of the varying rhythms within a percussion performance.

Remarks. (a) For the time intervals corresponding to vertical swatches, equations (8) and (9) produce values of $\bar{g}$ that lie above the average $A$ (because $A$ is pulled down by the intervals of silence). See Figure 6(a). For some signals, where the volume level is not relatively constant (louder passages interspersed with quieter passages) the total average $A$ will be too high (the quieter passages will not contribute to the pulse train). We should instead be computing local averages over several (but not all) time-values. We leave this as a goal for subsequent research. In a large number of cases, such as those discussed in this article, we have found that the method described above is adequate.

(b) For the Dance Around passage, the entire frequency range was used, as it consists entirely of vertical swatches corresponding to the percussive strikes. When analyzing other percussive passages, we may have to isolate a particular frequency range that contains just the vertical swatches of the drum strikes. We illustrate this later in the musical examples we describe (see the next section, "Examples of Rhythmic Analysis").

(c) We leave it as an exercise for the reader to show that the calculation of $\bar{g}(\tau_m)$ can actually be done in the time-domain using the data from the windowed signal values. (Hint: Use Parseval's theorem.) We chose to use the spectrogram values because of their ease of interpretation, especially when processing needs to be done, such as using only a particular frequency range. The spectrogram provides a lot of information to aid in the processing.

(d) Some readers may wonder why we have computed a Gabor CWT in Step 2. Why not compute, say, a Haar CWT (which is based on a step function as wavelet)? We have found that a Haar CWT does provide essentially the same information as the magnitudes of the Gabor CWT (which is all we use in this article; using the phases of the complex-valued Gabor CWT is left for future research). However, the Haar CWT is more difficult to interpret, as shown in Figure 7.

[Figure 5: hierarchy diagram with two parts labeled α and β; the levels are labeled Level 1 through Level 4.]
Figure 5. A rhythm hierarchy, obtained from the region corresponding to G in Figure 4. The hierarchy has two parts, labeled α and β. In each part the top level, Level 1, comprises the individual strikes. These strikes merge at Level 2 into regions which correspond to double strikes and which are aurally perceptible as groupings of double strikes. Notice that the Level 2 regions for β lie at positions of slightly increasing then decreasing strike-frequency as time proceeds; this is aurally perceptible when listening to the passage. There is also a Level 3 region for α that merges with the Level 2 regions for β to comprise the largest group G.

We have already discussed the percussion scalogram in Figure 4(b). We shall continue this discussion and provide several more examples of our method in the next section. In each case, we find that a percussion scalogram allows us to finely analyze the rhythmic structure of percussion sequences.

Examples of Rhythmic Analysis

As discussed in the previous section, a percussion scalogram allows us to perceive a hierarchical organization of the strikes in a percussion sequence. Hierarchical structures within music, especially within rhythmic passages and melodic contours, are a well-known phenomenon. For example, in an entertaining and thought-provoking book [24] with an excellent bibliography, This Is Your Brain On Music, Daniel Levitin says in regard to musical production (p. 154):

[Figure 6: two panels. Left: graph of $\bar{g}$, vertical axis from −2 to 6. Right: pulse train, vertical axis from −0.5 to 1.5. Horizontal axes in both panels run from 0 to 4.096.]
Figure 6. Creation of a pulse train. On the left we show the graph of $\bar{g}$ from equation (8) for the spectrogram of the Dance Around sound clip [see Figure 4(a)], which we have normalized to have an average of $A = 1$. The horizontal line is the graph of the constant function 1. The pulse train, shown on the right, is then created by assigning the value 1 when the graph of $\bar{g}$ is larger than $A$, and 0 otherwise.
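The thresholding shown in Figure 6 follows Equations (8) through (10) directly. Below is a minimal sketch, assuming the spectrogram is stored as a list of time slices, each holding non-negative magnitudes for the frequencies $y_k$. The function name `pulse_train`, the optional band limits `k_lo` and `k_hi` (one possible way to restrict the frequency range as mentioned in Remark (b)), and the toy data are all illustrative assumptions.

```python
def pulse_train(spectrogram, k_lo=0, k_hi=None):
    """Return the 0/1 pulse train P(tau_m) for spectrogram[m][k] = g(tau_m, y_k).

    k_lo, k_hi : optional index range of the frequencies y_k to average over
                 (hypothetical parameters; by default all frequencies are used).
    """
    g_bar = []
    for time_slice in spectrogram:
        band = time_slice[k_lo:k_hi]
        g_bar.append(sum(band) / len(band))            # Equation (8)
    a = sum(g_bar) / len(g_bar)                        # Equation (9)
    return [1 if value > a else 0 for value in g_bar]  # Equation (10)

# Usage with a toy spectrogram of 7 time values and 2 frequencies:
# three loud time slices (strikes) separated by silence.
toy = [[0, 0], [4, 6], [5, 5], [0, 0], [0, 0], [9, 1], [0, 0]]
pulses = pulse_train(toy)                  # averages 0,5,5,0,0,5,0 against A = 15/7
upper_only = pulse_train(toy, k_lo=1)      # same data, using only the upper frequency
```

Restricting the band changes both the averages and the threshold, which is why the last strike in the toy data survives when all frequencies are used but drops out when only the upper frequency is kept.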

  Our memory for music involves hierarchical encoding—not all words are equally salient, and not all parts of a musical piece hold equal status. We have certain entry points and exit points that correspond to specific phrases in the music…Experiments with musicians have confirmed this notion of hierarchical encoding in other ways. Most musicians cannot start playing a piece of music they know at any arbitrary location; musicians learn music according to a hierarchical phrase structure. Groups of notes form units of practice, these smaller units are combined into larger units, and ultimately into phrases; phrases are combined into structures such as verses and choruses of movements, and ultimately everything is strung together as a musical piece.

In a similar vein, related to musical theory, Steven Pinker summarizes the famous hierarchical theory of Jackendoff and Lerdahl [23, 22] in his fascinating book, How The Mind Works [28, pp. 532–533]:

  Jackendoff and Lerdahl show how melodies are formed by sequences of pitches that are organized in three different ways, all at the same time…The first representation is a grouping structure. The listener feels that groups of notes hang together in motifs, which in turn are grouped into lines or sections, which are grouped into stanzas, movements, and pieces. This hierarchical tree is similar to a phrase structure of a sentence, and when the music has lyrics the two partly line up...The second representation is a metrical structure, the repeating sequence of strong and weak beats that we count off as "ONE-two-THREE-four." The overall pattern is summed up in musical notation as the time signature…The third representation is a reductional structure. It dissects the melody into essential parts and ornaments. The ornaments are stripped off and the essential parts further dissected into even more essential parts and ornaments on them…we sense it when we recognize variations of a piece in classical music or jazz. The skeleton of the melody is conserved while the ornaments differ from variation to variation.

In regard to the strong and weak beats referred to by Pinker, we observe that these are reflected by the relative thickness and darkness of the vertical segments in a percussion scalogram. For example, when listening to the Dance Around passage, the darker groups of strikes in the percussion scalograms seem to correlate with loudness of the striking. This seems counterintuitive, since the pulse train consists only of 0's and 1's, which would not seem to reflect varying loudness. This phenomenon can be explained as follows. When a pulse is very long, that requires a more energetic striking of the drum, and this more energetic playing translates into a louder sound. The longer pulses correspond to darker spots lower down on the scalogram, and we hear these as louder sounds. (The other way that darker spots appear lower down is in grouping of several strikes. We do not hear them necessarily as louder individual sounds, but taken together they account for more energy than single, narrow pulses individually.)

[Figure 7: two panels, (a) and (b).]
Figure 7. (a) Magnitudes of Gabor CWT of a pulse sequence. (b) Haar CWT of the same pulse sequence.
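For readers who want to experiment with the comparison in Figure 7 and Remark (d), here is a minimal sketch of a Haar CWT at a single scale, computed with the same kind of Riemann sum as in Equation (4). The function names `haar` and `haar_cwt_row`, the toy pulse, and the choice of scale are illustrative assumptions. The sketch shows only that the Haar transform of a single pulse is real-valued and changes sign between the onset and the end of the pulse, whereas the Gabor scalograms in this article display non-negative magnitudes; that this sign change is what makes Figure 7(b) harder to read is our assumption and is not stated in the text.

```python
import math

def haar(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), and 0 elsewhere."""
    if 0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1:
        return -1.0
    return 0.0

def haar_cwt_row(p, dt, s):
    """Riemann-sum approximation of the Haar CWT of the samples p at one scale s."""
    n = len(p)
    return [dt / math.sqrt(s) * sum(p[m] * haar((m - k) * dt / s) for m in range(n))
            for k in range(n)]

# Usage: one pulse of four samples, analyzed with a wavelet four samples wide.
pulse = [0] * 6 + [1] * 4 + [0] * 6
row = haar_cwt_row(pulse, dt=1.0, s=4.0)
# row is negative while the wavelet straddles the onset of the pulse and
# positive while it straddles the end of the pulse.

# A constant signal gives zero wherever the whole wavelet fits inside it.
flat = haar_cwt_row([1] * 16, dt=1.0, s=4.0)
```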

With these descriptions of the hierarchical structure of music in mind, we now turn to representations of them within four different percussion sequences.

Example 1: Rock Drumming

In Figure 8 we show a percussion scalogram for a clip from a rock drum solo, which we have partially analyzed in the previous section. Here we complete our analysis by describing the hierarchy shown in the scalogram in a more formal, mathematical way, and then introducing the notion of production rules for the performance of the percussion sequence. We can see that there are five separate groupings of drum strikes in the scalogram in Figure 8:

A          B           C           B′          C′
(1-level)  (2-levels)  (4-levels)  (2-levels)  (4-levels)

The separate hierarchies within these groupings can be symbolized in the following way. We use the notation ∗| to symbolize a "whole note", a longer duration, more emphasized strike. And the notation ∗| to symbolize a "half note", a shorter duration, less emphasized strike. This allows us to symbolize the different emphases in the rhythm. Furthermore, the underscore symbol _ will be used to denote a rest between strikes (notes). For example, ∗| _ ∗| symbolizes a half note followed by a rest followed by a whole note. Using this notation, the strikes in Figure 8 are symbolized by

∗| ∗| ∗| ∗| ∗|∗|∗| ∗|∗| ∗|∗|∗|∗| ∗| ∗| ∗|∗| ∗|∗|∗| ∗|∗|

This notation is essentially equivalent to the standard notation for drumming used in musical scores (for examples of this notation, see [29] and [5]). We have thus shown that percussion scalograms can be used to read off a musical score for the drumming from its recorded sound. This is important because percussion playing is often extemporaneous, hence there is a need for notating particularly important extemporaneous passages as an aid to their repetition by other performers.

There is, however, much more information in a percussion scalogram. We can also use parentheses to mark off the groupings of the notes into their hierarchies, as follows:

∗| (∗| ∗| ∗|) ((∗|(∗|∗| ))(∗|∗| )(∗|∗|)(∗|∗|))
(∗| ∗|) ((∗|(∗| ∗|)(∗|∗| ))(∗|∗|))

The advantage of this notation over the previous one is that the hierarchical groupings of notes are indicated. We believe this enhanced notation, along with the videos that we create of sound with percussion scalograms, provides an important tool for analyzing the performance of percussion sequences. For example, they may be useful in teaching performance technique (recall Levitin's discussion of how musicians learn to play musical passages) by adding two adjuncts, notation plus video, to aid the ear in perceiving subtle differences in performance technique.

In addition to this symbolic notation for percussion passages, there is an even deeper (and somewhat controversial) notion of production rules for the generation of these percussion sequences (analogous to Chomsky's notion of "deep structure" in linguistics that generates, via production rules, the syntactical hierarchy of sentences: [22, Section 11.4] and [7, Sections 5.2, 5.3]). Examples of these rules in music are described in [22, pp. 283–285, and p. 280]. Rather than giving a complete mathematical description at this time, we will

simply give a couple of examples. For instance, the grouping B′ in the passage is produced from the grouping B, by clipping two strikes off the end:

B′ ← End(B)

As another example, if we look at the starting notes in the groups C and C′ defined by

Start(C) := ∗|(∗|∗| ) and Start(C′) := ∗|(∗| ∗|)

then Start(C′) is produced from Start(C) by a modulation of emphases:

Start(C′) ← Modulation(Start(C))

In this paper we are only giving these two examples of production rules, in order to give a flavor of the idea. A more complete discussion is a topic for a future paper.

[Figure 8: percussion scalogram with group labels A, B, C, B′, C′.]
Figure 8. Percussion scalogram for rock drumming. The labels are explained in Example 1. To view a video of the percussion scalogram being traced out along with the drumming sound, go to the URL in (2) and select the video for Dance Around percussion.

Example 2: African Drumming

Our second example is a passage of African drumming, clipped from the beginning of the song Welela from an album by Miriam Makeba. In this case, the spectrogram of the passage, shown in Figure 9(a), has some horizontal banding at lower frequencies that adversely affects the percussion scalogram by raising the mean value of the spectrogram averages. Consequently, we used only values from the spectrogram that are above 1000 Hz to compute the percussion scalogram shown in Figure 9(b). By listening to the video referenced in the caption of Figure 9, you should find that this percussion scalogram does accurately capture the timing and grouping of the drum strikes in the passage.

This passage is quite interesting in that it is comprised of only 20 drum strikes, yet we shall see that it contains a wealth of complexity. First, we can see that there are seven separate groupings of drum strikes in the scalogram in Figure 9:

A          B          C          A          B          C          D
(1-level)  (3-level)  (3-level)  (1-level)  (3-level)  (3-level)  (2-level)

Notice the interweaving of different numbers of levels within this sequence of groups. Second, the drum strikes can be notated with hierarchical grouping as follows:

∗| ((∗|∗|) (∗| ∗| )) ((∗|∗|)(∗| )(∗|))
∗| ((∗|∗|) (∗| ∗| )) ((∗|∗|)(∗| )(∗|)) (∗|∗|)

This passage is interesting not only in terms of the complex hierarchical grouping of notes, but also because of the arrangement of the time intervals between notes. It is a well-known fact among musicians that the silences between notes are at least as important as the notes themselves. In this passage we have the following sequence of time-intervals between notes (1 representing a short rest, 2 representing a long rest, and 0 representing no rest):

2011100022011100010

which quantitatively describes the "staggered" sound of the drum passage. (The reader might find it interesting to compute the sequence of rests for Example 1, and verify that it is less staggered, with longer sequences of either 1's or 0's.)

Example 3: Jazz Drumming

In this example we consider a couple of cases of jazz drumming. In Figure 10(a) we show a percussion scalogram created from the drum solo at the beginning of the jazz classic, Sing Sing Sing. The tempo of this drumming is very fast. Our notation for this sequence was obtained from examining the percussion scalogram both as a picture and as the video sequence (referred to in the caption of the figure) is played. Here is the notated sequence:

Very fast
∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗|
∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗|

Our notation differs considerably from the notation given for this beginning drum solo in the original score [29], the first several notes being:

Jungle Drum Swing
∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗| ∗|

Setting aside its racist overtones, we observe that this tempo instruction is not terribly precise. We can see from comparing these two scores that the drummer (Gene Krupa) is improvising the percussion (as is typical with jazz). Our percussion

364 Notices of the AMS Volume 56, Number 3

scalogram method allows us to derive a precise notation for Krupa’s improvisation. We leave it as an exercise for the reader to notate the hierarchical structure of this drum passage, based on the percussion scalogram. From our notation above, we find that the pattern of rests in Krupa’s playing has this structure:

20111012211212110001201

Here, as with the African drumming, we see a staggered pattern of rests.

Our second example of jazz drumming is a clip of the beginning percussive passage from another jazz classic, Unsquare Dance. In the score for the piece [5], the following pattern of strikes (indicated as hand clapping)

∗| ∗| ∗| ∗|

is repeated in each measure (consistent with the 7/4 time signature). Listening to the passage as the video is played, we can hear this repeated series of “strikes” as groups of very fast individual strikings of drumsticks. The drummer (Joe Morello) is improvising on the notated score by replacing individual hand claps by these very rapid strikings of his drumsticks. It is noteworthy that, in many instances, the percussion scalogram is sensitive enough to record the timings of the individual drumstick strikings. The scalogram is thus able to reveal, in a visual representation, the double aspect to the rhythm: individual drum strikings within the larger groupings notated as hand claps in the original score.

These examples are meant to illustrate that the percussion scalogram method can provide useful musical analyses of drumming rhythms. Several more examples are given at the Pictures of Music website [6]. We now provide some examples of using both spectrograms and percussion scalograms to analyze both the melodic and rhythmic aspects of music. Because they are based on an assumption of intense pulsing in the musical signal due to percussion, which is only satisfied for some tonal instruments, percussion scalograms do not always provide accurate results for tonal instruments. However, when they do provide accurate results (a precise description of the timings of the notes), they reveal the rhythmic structure of the music (which is our goal). We now provide three examples of successful analyses of melody and rhythm.

Figure 9. (a) Spectrogram for African drumming. Between 0 and 1000 Hz, as marked on the right side of (a), there are a considerable number of horizontal line segments. Those segments adversely affect the percussion scalogram. Consequently only frequencies above 1000 Hz are used to create the percussion scalogram. (b) Percussion scalogram for African drumming, using frequencies above 1000 Hz. The labels are explained in Example 2. To view a video of the percussion scalogram being traced out along with the drumming sound, go to the URL in (2) and select the video for Welela percussion.

Example 4: A Jazz Piano Melody

In Figure 11(a) we show a percussion scalogram of a recording of a jazz piano improvisation by Erroll Garner. It was captured from a live recording [18]. Since this is an improvisation, there is no musical score for the passage. Several aspects of the scalogram are clearly evident. First, we can see a staggered spacing of rests as in the African drumming in Example 2 and the jazz drumming in the Sing Sing Sing passage in Example 3. There

March 2009 Notices of the AMS 365

Figure 10. (a) Percussion scalogram for drum solo in Sing Sing Sing using frequencies above 1000 Hz. (b) Percussion scalogram for complex drum stick percussion in Unsquare Dance using frequencies above 2000 Hz. To view videos of these percussion scalograms being traced out along with the drumming sound, go to the URL in (2) and select the videos for Sing Sing Sing percussion or Unsquare Dance percussion.

Figure 11. (a) Percussion scalogram of a clip from an Erroll Garner jazz piano passage (using frequencies above 1800 Hz). To view a video of the percussion scalogram being traced out along with the piano playing, go to the URL in (2) and select the video for Erroll Garner piano recording. The label S indicates a syncopation in the melody. (b) Percussion scalogram from a clip of a piano interpretation of a Bach melody (using frequencies above 3000 Hz). To view a video of the percussion scalogram being traced out along with the piano sound, go to the URL in (2) and select the video for Bach piano piece (scalogram).

is also a syncopation in the melody, indicated by the interval marked S in Figure 11(a). By syncopation we mean an altered rhythm, “ONE-two-three-FOUR,” rather than the more common “ONE-two-THREE-four.” The percussion scalogram provides us with a visual representation of these effects, which is an aid to our listening comprehension. Although the percussion scalogram does not perform perfectly here (for example the last note in the sequence marked S is split in two at the top; the scalogram has detected the attack and the decay of the note), when viewed as a video the percussion scalogram does enable us to quickly identify the timing and hierarchical grouping of the notes (which would be much more difficult using only our ears).
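The "staggered" quality of the rest sequences quoted in Examples 2 and 3 can be examined numerically. The sketch below is only one informal possibility: it collapses each sequence into runs of identical symbols and reports how many runs there are and how long the longest one is. Treating short, frequently changing runs as "staggered" is our assumption for illustration, not a measure defined in the paper, and the function names are hypothetical.

```python
from itertools import groupby


def run_lengths(rests: str) -> list[tuple[str, int]]:
    """Collapse a rest sequence into (symbol, run length) pairs."""
    return [(symbol, sum(1 for _ in group)) for symbol, group in groupby(rests)]


def summarize(rests: str) -> dict[str, float]:
    """Report simple run statistics for a rest sequence (0, 1, 2 symbols)."""
    runs = [length for _, length in run_lengths(rests)]
    return {
        "intervals": len(rests),             # number of rests between strikes
        "runs": len(runs),                   # how often the rest type changes, plus one
        "longest_run": max(runs),            # longest stretch of one rest type
        "mean_run": len(rests) / len(runs),  # average stretch length
    }


# Rest sequences quoted in Example 2 (African drumming) and Example 3 (Krupa).
AFRICAN = "2011100022011100010"
KRUPA = "20111012211212110001201"

if __name__ == "__main__":
    for name, rests in (("African drumming", AFRICAN), ("Sing Sing Sing", KRUPA)):
        print(name, summarize(rests))
```

Under this informal measure, a less staggered passage would show fewer runs and a larger mean run length for the same number of intervals.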

Example 5: A Bach Piano Transcription

As a simple contrast to the previous example, we briefly discuss the percussion scalogram shown in Figure 11(b), obtained from a piano interpretation of a Bach melody, Jesu, Joy of Man’s Desiring. The sound recording used was created from a MIDI sequence. In contrast to the previous jazz piece, this classical piece shows no staggering of rests, and no syncopation. The hierarchy of groupings of notes is also more symmetrical than for the jazz piece. This hierarchy of notes, the rhythm of the passage, is easily discernible from this percussion scalogram, while it is not clearly evident from the score [2] (at least to untrained musicians).

Example 6: A Jazz Orchestral Passage

For our final example, we analyze the spectrogram and percussion scalogram shown in Figure 12. They were obtained from a passage from a recording of the jazz orchestral classic, Harlem Air Shaft, by Duke Ellington. This passage is quite interesting in that it is composed of only about 15 notes, yet we shall see that it contains a wealth of complexity. (We saw this in the African drum passage as well; perhaps we have an aspect of aesthetic theory here.) We now describe some of the elements comprising the rhythm and melody within this passage. It should be noted that, although there is a score for Harlem Air Shaft, that score is a complex orchestration that requires a large amount of musical expertise to interpret. Our spectrogram/percussion scalogram approach provides a more easily studied description of the melody and rhythm, including visual depictions of length and intensity of notes from several instruments playing simultaneously. Most importantly, the spectrogram provides an objective description of recorded performances. It can be used to compare different performances in an objective way. Our percussion scalograms facilitate the same kind of objective comparison of the rhythm in performances.

Figure 12. Top: Spectrogram of a passage from a recording of Harlem Air Shaft. Bottom: Percussion scalogram of the same passage (using all frequencies). The boxed regions and labels are explained in the text. To view videos of this spectrogram and scalogram, go to the URL in (2) and select the links indicated by Harlem Air Shaft.

(1) Reflection of Notes. The passage contains a sequence of high pitched notes played by a slide trombone (wielded by the legendary “Tricky Sam” Nanton). This sequence divides into two groups of three, enclosed in the rectangles labeled T and RT in the spectrogram shown at the top of Figure 12. The three notes within T are located at frequencies of approximately 855, 855, and 845 Hz. They are then reflected about the frequency 850, indicated by the line segment M between the two rectangles, to produce the three notes within RT at frequencies of approximately 845, 845, and 855 Hz. The operation of reflection R about a specific pitch is a common, group-theoretical, operation employed in classical music [20].

(2) Micro-Tones. The pitch interval described by going down in frequency from 855 to 845 Hz is

\[ \log_2(855/845) = 0.017 \approx 1/48, \]

which is about 1/4 of the (logarithmic) half-tone change of 1/12 of an octave (on the 12-tone chromatic scale). Thus within T the pitch descends by an eighth-tone. Similarly, in RT the pitch ascends at the end by an eighth-tone. These are micro-tone intervals, intervals that are not representable on the standard 12-tone scoring used in Western classical music. They are, in fact, half the interval of the quarter-tones that are a characteristic

of jazz music (based on its roots in African tonal scales [4]). Here Ellington synthesizes a melodic characteristic of jazz (micro-tones) with one of classical music (reflection about a pitch level).

(3) Staggered Syncopation. Viewing the video of the percussion scalogram shown in Figure 12, we note that there is a staggering of rests between notes, and we also observe a syncopation. This syncopation occurs when the slide trombone slides into and between the emphasized notes in the structures T and RT. The pattern lying above the segments marked S in the figure is

one – TWO – three – FOUR – (rest) – FIVE

an unusual, staggered syncopation, of rhythm. (One thing we observe when viewing the video is that the percussion scalogram does capture the timings of most of the note attacks, although it does not perfectly reflect some muted horn notes, because the attacks of those notes are obscured by the higher volume trombone notes. Nevertheless, the scalogram provides an adequate description of the driving rhythm of the trombone notes, and the muted trumpet notes at the end of the passage.)

(4) Hierarchies of Melody. Within the passage there are several different types of melody, over different length time-scales. First, we note that there is the hierarchy of T and RT at one level along with their combination into one long passage, linked melodically by a sequence of muted horn notes H. Notice that the pitch levels in H exhibit, over a shorter time-scale, the pitch pattern shown in the notes within T together with RT. That is a type of hierarchical organization of melodic contour. There is a further hierarchical level (in terms of a longer time-scale) exhibited by the melodic contour of the bass notes (shown as a long sinusoidal arc within the region marked by B). Here the bassist, Jimmy Blanton, is using the bass as a plucked melodic instrument as well as providing a regular tempo for the other players. This is one of his major innovations for the bass violin in jazz instrumentation.

We can see from this analysis that this passage within just 6 seconds reveals a wealth of structure, including many features that are unique to jazz. Such mastery illustrates why Duke Ellington was one of the greatest composers of the twentieth century.

Justification of the Percussion Scalogram Method

Choosing the Width and Frequency Parameters

In this section we discuss how the parameters are chosen to provide a satisfactory display of a scalogram for a pulse train, which is the second step of the percussion scalogram method. The term satisfactory means that both the average number of pulses/sec (beats/sec) is displayed and the individual beats are resolved.

To state our result we need to define several parameters. The number T will stand for the time duration of the signal, while B will denote the total number of pulses in the pulse train signal. We will use the positive parameter p to scale the width ω and frequency ν defined by

\[ \omega = \frac{pT}{2B}, \qquad \nu = \frac{B}{pT}. \tag{11} \]

Notice that ω and ν are in a reciprocal relationship; this is in line with the reciprocal relation between time-scale and frequency that is used in wavelet analysis. Notice also that the quantity B/T in ν is equal to the average number of pulses/sec. The best choice for the parameter p in these formulas will be described below. Two further parameters are the number of octaves I and voices M used in the percussion scalogram. We shall see that these two parameters will depend on the value of δ, the minimum length of a 0-interval (minimum space between two successive pulses).

Now that we have defined our parameters, we can state our main result:

Given the constraints of using positive integers for the octaves I and voices M and using 256 total correlations, satisfactory choices for the parameters of a percussion scalogram are:

\[ \omega = \frac{pT}{B}, \qquad \nu = \frac{B}{pT}, \qquad p = 4\sqrt{\pi}, \qquad I = \left\lfloor \log_2\!\left(\frac{p^2T^2}{\delta B^2}\right) - \frac{3}{2} \right\rfloor, \qquad M = \frac{256}{I}. \tag{12} \]

The remainder of this section provides the rationale for this result (notice that the value of ω in (12) is twice the value given in (11); we shall explain why below). While this rationale may not be a completely rigorous proof, it does provide useful insights into how a Gabor CWT works with pulse trains, and it does provide us with the method used to produce the scalograms for the pulse trains shown in the section “Examples of Rhythmic Analysis” (and for the other examples given at the Pictures of Music website [6]). In fact, we have found that in every case, the method provides a satisfactory display of the scalogram for a pulse train from a musical passage (whether the pulse train is accurate is a separate question; we are still

working on extending its capabilities as described previously).

We shall denote a given pulse train by {P(t_m)}. (Here we use {t_m = τ_m} as the time-values; this notational change is clarified by looking at equation (16).) This signal satisfies P(t_m) = 1 during the duration of a beat and P(t_m) = 0 when there is no beat. We use the Gabor wavelet in (7) to analyze the pulse signal. Since we are using the complex Gabor wavelet, we have both a real and imaginary part:

\[ \Psi_{\Re}(t) = \omega^{-1/2} e^{-\pi (t/\omega)^2} \cos\frac{2\pi\nu t}{\omega} \tag{13} \]

\[ \Psi_{\Im}(t) = \omega^{-1/2} e^{-\pi (t/\omega)^2} \sin\frac{2\pi\nu t}{\omega}. \tag{14} \]

These real and imaginary parts have the same envelope function:

\[ \Psi_E(t) = \omega^{-1/2} e^{-\pi (t/\omega)^2}. \tag{15} \]

The width parameter ω controls how quickly Ψ_E is dampened. For smaller values of ω the function dampens more quickly. Also, ω controls the magnitude of the wavelet function at t = 0. In fact, we have Ψ_ℜ(0) = ω^{−1/2}. The width parameter also affects the frequency of oscillations of the wavelet. As ω is increased, the frequency of oscillations of the wavelet is decreased. See Figure 13.

The frequency parameter ν is used to control the frequency of the wavelet within the envelope function. This parameter has no effect on the envelope function, as shown in Equation (15). As ν is increased, the Gabor wavelet oscillates much more quickly. See Figure 14.

When using the Gabor wavelet to analyze music, correlations are computed using the Gabor wavelet with a scaling parameter s. For our pulse train P these correlations are denoted (P : Ψ_s) for s = 2^{−r/M}, r = 0, 1, ..., IM, and are defined by

\[ (P : \Psi_s)(\tau_k) = \sum_{m=0}^{M} P(t_m)\, s^{-1/2}\, \Psi\!\left([t_m - \tau_k] s^{-1}\right). \tag{16} \]

Since {P(t_m)} is a binary signal, the terms of this sum will equal s^{−1/2} Ψ([t_m − τ_k]s^{−1}) if P(t_m) = 1, and 0 if P(t_m) = 0. The values of τ_k represent the center of the Gabor wavelet being translated along the time axis. So for values of t_m closer to τ_k, s^{−1/2} Ψ([t_m − τ_k]s^{−1}) will be larger in magnitude. Then, at values for t_m where P(t_m) = 1 and t_m = τ_k, the corresponding term in the correlation sum will be

\[ P(t_m)\, s^{-1/2}\, \Psi\!\left([t_m - \tau_k] s^{-1}\right) = s^{-1/2}\, \Psi(0) = \frac{1}{\sqrt{s}} \sqrt{\frac{2B}{pT}} \]

which will represent the striking of an instrument. So as s reaches its smallest values, near s = 2^{−I}, the correlations will have large magnitude values near τ_k, and where P(t_m) = 1, i.e., at the beat of the instrument. This happens because small values of s result in the function s^{−1/2} Ψ([t_m − τ_k]s^{−1}) being dampened very quickly, so very little other than the actual beats are detected by the Gabor CWT.

Detection of the rhythm and grouping of the percussion signal is accomplished by the larger values of s that result in a slowly dampened Gabor wavelet. As the correlation sum moves to values such that t_m ≠ τ_k, the function s^{−1/2} Ψ([t_m − τ_k]s^{−1}) is being dampened. But with the wavelet being dampened more slowly now, the values of s^{−1/2} Ψ([t_m − τ_k]s^{−1}) are larger near t_m = τ_k than they were before. Hence the t_m values where P(t_m) = 1 will result in summing more values of the wavelet that are significantly large. Therefore, any beat that is close to another beat will result in larger correlation values for larger values of s. Notice also that those values of t_m where P(t_m) = 0 that are close to t_m values where P(t_m) = 1 will result in summing across the lesser dampened Gabor wavelet values; our scalogram will thus be registering the grouping of closely spaced beats.

Now we need to choose the parameters ω and ν based on {P(t_m)} to obtain the desired shape for the Gabor wavelet. To choose these parameters for a specific percussion signal we will use B/T as our measure of the average beats per second. The average time between beats will then be the reciprocal of the average beats per second: T/B. Then we let the width parameter ω and the frequency parameter ν be defined by (11), with parameter p > 0 used as a scaling factor. With these width and frequency parameters, the Gabor wavelet is

\[ \Psi(t) = \sqrt{\frac{2B}{pT}}\; e^{-\pi (2Bt/pT)^2}\, e^{i 4\pi B^2 t/(p^2T^2)}. \tag{17} \]

We want to detect beats that are within T/B, the average time between beats, of each other. Likewise, we want separation of the beats that are not within T/B of each other. We accomplish this by inspecting the envelope function evaluated at t = T/B,

\[ \Psi_E\!\left(\frac{T}{B}\right) = \sqrt{\frac{2B}{pT}}\; e^{-4\pi/p^2}. \tag{18} \]

The value of the enveloping function Ψ_E(T/B) can be written as a function of the parameter p, call it M(p):

\[ M(p) = \sqrt{\frac{2B}{pT}}\; e^{-4\pi/p^2}. \tag{19} \]

Remembering that T and B are constants determined by the percussion sound signal, the only maximization of the magnitude of the wavelet

Figure 13. Real parts of Gabor wavelet with frequency parameter ν = 1 and width parameter (a) ω = 0.5, (b) ω = 1, and (c) ω = 2. For each graph, the horizontal range is [−5, 5] and the vertical range is [−2, 2].

Figure 14. Real parts of Gabor wavelet with width parameter ω = 1 and frequency parameter (a) ν = 0.5, (b) ν = 1, and (c) ν = 2. For each graph, the horizontal range is [−5, 5] and the vertical range is [−2, 2].
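The wavelets plotted in Figures 13 and 14, and the correlation sum of Equation (16), can be tried out with a few lines of code. The following is a minimal sketch, not the authors' software: it evaluates the complex Gabor wavelet whose real and imaginary parts are Equations (13) and (14), and computes the magnitude of the correlation sum for a toy pulse train. The pulse train (4 seconds, 0.01 s sampling, three beats), the choice of scale, and all function names are hypothetical values chosen for illustration.

```python
import cmath
import math


def envelope(t: float, omega: float) -> float:
    """Envelope of Eq. (15): omega^(-1/2) * exp(-pi * (t / omega)^2)."""
    return omega ** -0.5 * math.exp(-math.pi * (t / omega) ** 2)


def gabor_wavelet(t: float, omega: float, nu: float) -> complex:
    """Complex Gabor wavelet; its real and imaginary parts are Eqs. (13)-(14)."""
    return envelope(t, omega) * cmath.exp(2j * math.pi * nu * t / omega)


def correlation(pulse, times, tau, s, omega, nu) -> float:
    """Magnitude of the correlation sum of Eq. (16) at translation tau, scale s."""
    total = sum(
        p * s ** -0.5 * gabor_wavelet((t - tau) / s, omega, nu)
        for p, t in zip(pulse, times)
        if p  # the pulse train is binary, so zero samples contribute nothing
    )
    return abs(total)


# Hypothetical toy pulse train: 400 samples at 0.01 s, three beats of 5 samples.
DT = 0.01
BEAT_STARTS = (50, 100, 300)  # onset sample indices (assumed, i.e. 0.5 s, 1 s, 3 s)
BEAT_SAMPLES = 5
TIMES = [i * DT for i in range(400)]
PULSE = [int(any(b <= i < b + BEAT_SAMPLES for b in BEAT_STARTS)) for i in range(400)]

T, B = 4.0, 3                # duration in seconds and number of pulses
P = 4 * math.sqrt(math.pi)   # scaling parameter p
OMEGA = P * T / (2 * B)      # width, Eq. (11)
NU = B / (P * T)             # frequency, Eq. (11)

if __name__ == "__main__":
    s = 2.0 ** -4  # a small scale, where individual beats should stand out
    print("at a beat :", correlation(PULSE, TIMES, 0.5, s, OMEGA, NU))
    print("in a gap  :", correlation(PULSE, TIMES, 2.0, s, OMEGA, NU))
```

At a small scale the correlation magnitude at a beat onset should be far larger than in the silent gap, which is the behavior the text attributes to small values of s.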

at t = T/B becomes a simple one variable optimization problem. The first derivative of M(p) is

\[ M'(p) = \frac{16\pi - p^2}{2p^3\, e^{4\pi/p^2} \sqrt{pT/2B}}. \]

Hence p = 4√π maximizes the value of the envelope function of the wavelet at t = T/B, thus allowing us to detect beats within T/B of each other.

With the wavelet function dampened sufficiently slowly, we know that the envelope function is sufficiently wide. But the correlations are computed by taking the magnitude of the sum of the complex Gabor wavelet samples. Since the real and imaginary parts involve products with sines and cosines, there are intervals where the functions are negative. It is these adjacent negative regions, on each side of the main lobe of Ψ_ℜ, that allow for the separation of beats that are greater than T/B apart but less than 2T/B apart (if they are more than 2T/B apart, the dampening of Ψ_E produces low-magnitude correlations).

Width and Frequency for Better Display

There is one wrinkle to the analysis above. If the width and frequency parameters are set according to Equation (11), then at the lowest reciprocal-scale value 1/s = 1 the display of the percussion scalogram cuts off at the bottom, and it is difficult to perceive the scalogram’s features at this scale. To remedy that defect, when we display a percussion scalogram we double the width in order to push down the lowest reciprocal-scale by one octave. Hence we use the following formulas

\[ \omega = \frac{pT}{B}, \qquad \nu = \frac{B}{pT}, \qquad p = 4\sqrt{\pi} \tag{20} \]

for displaying our percussion scalograms.

Choosing Octaves and Voices

The variable 1/s along the vertical axis of a percussion scalogram (see Figure 4(b), for example) is related to frequency, but on a logarithmic scale. To find the actual frequency at any point along the vertical axis we compute the base frequency ν/ω multiplied by the value of 1/s. The value of I determines the range of the vertical axis in a scalogram, i.e., how large 1/s is, and the value of M determines how many correlations per octave we are computing for our scalogram.

In order to have a satisfactory percussion scalogram, we need the maximum wavelet frequency equal to the maximum pulse frequency. The scale variable s satisfies s = 2^{−k/M}, where k = 0, 1, ..., IM. Hence the maximum 1/s we can use is calculated as follows:

\[ \frac{1}{s} = 2^{IM/M} = 2^I. \]

Now let δ be the minimum distance between pulses in a pulse train. By analogy of our pulse trains with sinusoidal curves, we postulate that the

maximum pulse frequency should be one-half of 1/δ. Setting this maximum pulse frequency equal to the maximum wavelet frequency, we have

\[ \frac{1}{2\delta} = \frac{\nu}{\omega}\, 2^I. \tag{21} \]

Notice that both sides of (21) have units of beats/sec.

Using the equations for ν and ω in (20), we rewrite Equation (21) as

\[ \frac{1}{2\delta} = \frac{\nu}{\omega}\, 2^I = \left(\frac{B}{pT}\right)^2 2^I. \]

Solving for I yields

\[ I = \log_2\!\left(\frac{p^2T^2}{\delta B^2}\right) - 1. \tag{22} \]

Because I is required to be a positive integer, we shall round down this exact value for I. Thus we set

\[ I = \left\lfloor \log_2\!\left(\frac{p^2T^2}{\delta B^2}\right) - \frac{3}{2} \right\rfloor. \tag{23} \]

To illustrate the value of selecting I per Equation (23), in Figure 15 we show three different scalograms for the Dance Around percussion sequence. For this example, Equation (23) yields the value I = 4. Using this value, we find that the scalogram plotted in Figure 15(b) is able to detect the individual drum strikes and their groupings. If, however, we set I too low, say I = 3 in Figure 15(a), then the scalogram does not display the timings of the individual drum beats very well. On the other hand, if I is set too high, say I = 5 in Figure 15(c), then the scalogram is too finely resolved. In particular, at the top of the scalogram, for 1/s = 2^5, we find that the scalogram is detecting the beginning and ending of each drum strike as separate events, which overestimates by a factor of 2 the number of strikes.

Figure 15. Examples of percussion scalograms using different values of I, the number of octaves, for the Dance Around percussion passage: (a) I = 3, (b) I = 4, (c) I = 5. Graph (b) uses the value of I = 4 calculated from Equation (23).

Having set the value of I, the value for M can then be expressed as a simple inverse proportion, depending on the program’s capacity. For example, with Fawav [35] the number of correlations used in a scalogram is constrained to be no more than 256, in which case we set

\[ M = \frac{256}{I} \tag{24} \]

and that concludes our rationale for satisfactorily choosing the parameters for percussion scalograms.

Conclusion

In this paper we have described the way in which spectrograms and percussion scalograms can be used for analyzing musical rhythm and melody. While percussion scalograms work fairly effectively on brief percussion passages, more research is needed to improve their performance on a wider variety of music (especially when the volume is highly variable). We only briefly introduced the use of spectrograms for analyzing melody and its hierarchical structure; more examples are discussed in [34] and at the website [6]. Our discussion

showed how percussion scalograms could be used to distinguish some styles of drumming, but much more work remains to be done. Further research is also needed on using local averages, instead of the global average A that we employed, and on determining what additional information can be gleaned from the phases of the Gabor CWTs.

Acknowledgment

This research was supported by the National Science Foundation Grant 144-2974/92974 (REU site SUREPAM program). We also thank William Sethares for his helpful comments on an earlier version of this paper.

References

[1] J. F. Alm and J. S. Walker, Time-frequency analysis of musical instruments, SIAM Review 44 (2002), 457–476. Available at http://www.uwec.edu/walkerjs/media/38228[1].pdf.
[2] J. S. Bach (arranged and adapted by M. Scott), Jesu, Joy of Man’s Desiring. A free version of the part of the score used for the clip analyzed here can be obtained from http://www.musicnotes.com/sheetmusic/mtd.asp?ppn=MN0026859
[3] P. Balazs et al., Double preconditioning for Gabor frames, IEEE Trans. on Signal Processing 54 (2006), 4597–4610.
[4] L. Bernstein, The world of jazz, The Joy of Music, Amadeus Press, Pompton Plains, NJ, 2004, pp. 106–131.
[5] D. Brubeck, Unsquare Dance. The beginning of the score, containing the hand-clapping introduction, can be viewed for free at http://www.musicnotes.com/sheetmusic/mtd.asp?ppn=MN0042913.
[6] X. Cheng, et al., Making Pictures of Music: New Images and Videos of Musical Sound. Available at http://www.uwec.edu/walkerjs/PicturesOfMusic/. Examples of more analyses of rhythmic percussion can be found at http://www.uwec.edu/walkerjs/PicturesOfMusic/MusicalRhythm.htm.
[7] N. Chomsky, Syntactic Structures, Mouton de Gruyter, New York, 2002.
[8] C. K. Chui, Wavelets: A Mathematical Tool for Signal Analysis, SIAM, Philadelphia, PA, 1997.
[9] R. Cogan, New Images of Musical Sound, Harvard University Press, Cambridge, MA, 1984.
[10] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, PA, 1992.
[11] I. Daubechies and S. Maes, A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models, in Wavelets in Medicine and Biology, CRC Press, Boca Raton, FL, 1996, pp. 527–546.
[12] M. Dörfler, Gabor analysis for a class of signals called music, dissertation, University of Vienna, 2002.
[13] M. Dörfler, Time-frequency analysis for music signals: A mathematical approach, Journal of New Music Research 30, No. 1, March 2001.
[14] M. Dörfler and H. G. Feichtinger, Quantitative description of expression in performance of music, using Gabor representations, Proceedings of the Diderot Forum on Mathematics and Music, 1999, Vienna.
[15] H. Feichtinger and T. Strohmer, eds., Gabor Analysis and Algorithms, Birkhäuser, Boston, MA, 1998.
[16] H. Feichtinger and T. Strohmer, eds., Advances in Gabor Analysis, Birkhäuser, Boston, MA, 2002.
[17] D. Gabor, Theory of communication, Journal of the Institute for Electrical Engineers 93 (1946), 873–880.
[18] Erroll Garner recording from http://www.youtube.com/watch?v=H0-ukGgAEUo.
[19] K. Gröchenig, Foundations of Time-Frequency Analysis, Birkhäuser, Boston, MA, 2001.
[20] L. Harkleroad, The Math Behind the Music, Cambridge University Press, Cambridge, UK, 2006.
[21] D. Kroodsma, The Singing Life of Birds, Houghton-Mifflin, NY, 2005.
[22] F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, Cambridge, MA, 1983.
[23] F. Lerdahl and R. Jackendoff, An overview of hierarchical structure in music, in Machine Models of Music (1992), S. Schwanauer and D. Levitt, eds., MIT Press, Cambridge, MA, 289–312.
[24] D. Levitin, This Is Your Brain On Music, Dutton, New York, 2006.
[25] G. Loy, Musimathics: The Mathematical Foundations of Music, Vol. 2, MIT Press, Cambridge, MA, 2007.
[26] F. B. Mache, Music, Myth and Nature, Contemporary Music Studies, Vol. 6, Taylor & Francis, London, 1993.
[27] S. Mallat, A Wavelet Tour of Signal Processing, Second Edition, Academic Press, San Diego, CA, 1999.
[28] S. Pinker, How the Mind Works, Norton, NY, 1997.
[29] L. Prima, Sing Sing Sing. The initial part of the score, containing the drum solo introduction, can be viewed for free at http://www.musicnotes.com/sheetmusic/scorch.asp?ppn=SC0005700.
[30] D. Rothenberg, Why Birds Sing: A Journey into the Mystery of Bird Song, Basic Books, NY, 2005.
[31] W. Sethares, Rhythm and Transforms, Springer, New York, NY, 2007.
[32] L. M. Smith, Modelling rhythm perception by continuous time-frequency analysis, Proceedings of the 1996 International Computer Music Conference, Hong Kong, 392–395.
[33] L. M. Smith, A multiresolution time-frequency analysis and interpretation of musical rhythm, thesis, University of Western Australia, 2000.
[34] J. S. Walker, A Primer on Wavelets and their Scientific Applications, Second Edition, Chapman & Hall/CRC Press, Boca Raton, FL, 2008. Material referenced from Chapters 5 and 6 is available at http://www.uwec.edu/walkerjs/primer/Ch5extract.pdf; http://www.uwec.edu/walkerjs/primer/Ch6extract.pdf.
[35] J. S. Walker, Document on using Audacity and Fawav. Available at http://www.uwec.edu/walkerjs/tfamr/DocAudFW.pdf.
[36] J. S. Walker and G. W. Don, Music: A time-frequency approach, preprint, available at http://www.uwec.edu/walkerjs/media/TFAM.pdf.
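As a closing illustration, the parameter choices of Equations (20), (23), and (24) and the maximization of M(p) in Equation (19) can be checked numerically. The sketch below is an assumption-laden illustration rather than the Fawav implementation: the function names are hypothetical, the sample values T = 10 s, B = 30 pulses, and δ = 0.1 s are invented for the example, and the use of integer (floor) division for the voices M is our assumption, made so that M is a positive integer as the text requires.

```python
import math


def percussion_scalogram_parameters(T, B, delta, total_correlations=256):
    """Return (omega, nu, p, octaves I, voices M) following Eqs. (20), (23), (24).

    T: signal duration in seconds; B: number of pulses; delta: minimum gap
    between successive pulses in seconds.
    """
    p = 4 * math.sqrt(math.pi)
    omega = p * T / B   # doubled display width, Eq. (20)
    nu = B / (p * T)    # frequency parameter, Eq. (20)
    octaves = math.floor(math.log2(p ** 2 * T ** 2 / (delta * B ** 2)) - 1.5)  # Eq. (23)
    if octaves < 1:
        raise ValueError("Eq. (23) did not yield a positive number of octaves")
    voices = total_correlations // octaves  # Eq. (24); floor division is an assumption
    return omega, nu, p, octaves, voices


def envelope_at_mean_gap(p, T, B):
    """M(p) of Eq. (19): the envelope of the wavelet evaluated at t = T/B."""
    return math.sqrt(2 * B / (p * T)) * math.exp(-4 * math.pi / p ** 2)


if __name__ == "__main__":
    # Hypothetical passage: 10 seconds, 30 strikes, closest strikes 0.1 s apart.
    omega, nu, p, octaves, voices = percussion_scalogram_parameters(10.0, 30, 0.1)
    print(f"omega={omega:.3f}  nu={nu:.4f}  p={p:.4f}  I={octaves}  M={voices}")
    # M(p) should be largest at p = 4*sqrt(pi) compared with nearby values.
    for q in (0.9 * p, p, 1.1 * p):
        print(f"M({q:.3f}) = {envelope_at_mean_gap(q, 10.0, 30):.5f}")
```

Comparing M(p) at p = 4√π with values slightly above and below gives a quick sanity check on the derivative computation, without replacing the analytic argument.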
