BISTATE REDUCTION AND COMPARISON OF PATTERNS

Olivier Lartillot Fred Bruford RITMO Centre for Interdisciplinary Studies in Centre for Digital Rhythm, Time and Motion, University of Oslo Queen Mary University, United Kingdom [email protected] [email protected]

ABSTRACT patterns specifically, rhythmic interaction and instrumental configuration has been shown to have a significant affect This paper develops the hypothesis that symbolic drum on the perception of syncopation, a fundamental rhythmic patterns can be represented in a reduced form as a sim- property [3]. However, in much existing work on drum pat- ple oscillation between two states, a Low state (commonly tern similarity modelling, similarity is modelled as a linear associated with kick drum events) and a High state (often combination of monophonic rhythm similarity models cal- associated with either or high hat). Both an culated on each instrument part, without consideration to onset time and an accent time is associated to each state. rhythmic interactions or integration. The systematic inference of the reduced form is formal- Our aim is to design new models of similarity for drum ized. This enables the specification of a rhythmic struc- patterns that take into account both rhythmic interaction tural similarity measure on drum patterns, where reduced and rhythmic integration. The bistate reduction algorithm patterns are compared through alignment. The two-state proposed in this paper attempts to extract the core struc- representation allows a low computational cost alignment, ture of drum pattern as an interaction between two states, once the complex topological formalization is fully taken ‘Low’ and ‘High’, and to use this reduced representation into account. A comparison with the Hamming distance, as to compare drum patterns through alignment. well as similarity ratings collected from listeners on a drum The drum patterns used in this paper are part of a varied loop dataset, indicates that the bistate reduction enables to set of 160 stylistically accurate symbolic patterns recorded convey subtle aspects that goes beyond surface-level com- by human drummers on an electronic , taken from parison of rhythmic textures. the Groove library of BFD3 [4], a commercially available virtual drum kit plugin. Originally collected by [5], the pat- 1. INTRODUCTION tern dataset is drawn from a wide range of genres and sub- One of the most fundamental areas of both Music Infor- genres, with 8 genre groups used: Hiphop/Dance, Funk, mation Retrieval (MIR) and music cognition research con- Blues/Country, Pop, Reggae/Latin, Rock, Metal and Jazz. cerns the modelling of musical similarity, due to the es- In addition to multiple genres, they are of varied complex- sential importance of similarity in music perception and ity and function, with some containing fills. the practical utility of similarity models in music retrieval In Section 2, we discuss in greater detail approaches applications such as recommendation systems. Modelling to modelling monophonic rhythm similarity and assess the similarity for drum patterns is a particular challenging vari- state-of-the-art in drum pattern similarity modelling and its ant of this research, though one with potential applica- limitations. In Section 3 and Section 4, we begin discus- tion in a wide range of intelligent music production tools sion of the bistate reduction algorithm and bistate sequence such as drum pattern recommendation systems or auto- alignment technique respectively as a novel means of rep- matic drum pattern generation systems. resenting complex drum patterns and estimating the per- The challenges of drum pattern similarity modelling ceived similarity between them. In Section 5, the sequence lies in the complex nature of polyphonic rhythm percep- alignment algorithm is evaluated in its ability to predict tion. The integration of coincident rhythms into a resultant human similarity ratings for pairs on drum patterns in our ‘multirhythm’ has been argued to be an essential feature of dataset. polyphonic rhythm perception, especially in the context of drumming music [1]. Rhythmic interactions may also be 2. RELATED WORK a significant aspect of polyphonic rhythm perception, with complex rhythmic structures said to emerge from the inter- Models of rhythmic similarity are significant in both music action between rhythmic levels [2]. In the context of drum perception and music informatics research due to rhythm’s fundamental importance in music. For drum patterns in particular, models of rhythmic similarity have various c Olivier Lartillot, Fred Bruford. Licensed under a Cre- practical applications such as automatic generation of vari- ative Commons Attribution 4.0 International License (CC BY 4.0). At- tribution: Olivier Lartillot, Fred Bruford, “Bistate reduction and com- ations on drum patterns [6,7], or in visually mapping drum parison of drum patterns”, in Proc. of the 21st Int. Society for Music patterns by similarity to facilitate exploration of drum pat- Information Retrieval Conf., Montréal, Canada, 2020. tern libraries [8, 9].

318 Proceedings of the 21st ISMIR Conference, Montreal,´ Canada, October 11-16, 2020

closed open high hat high hat

closed snare high hat

snare kick

12345678 kick

12345678 Figure 1. Drum pattern Early RnB 33. Velocity is repre- sented in grey level, from 0 in white to maximum value in Figure 3. Drum pattern Reggae Grooves Fill 3. black.

high high low low 12345678 12345678 Figure 4. Reduction of Reggae Grooves Fill 3. Figure 2. Reduction of Early RnB 33. State accents are shown in black and state onsets (if different from accents) in grey. ‘Low’ corresponds to and ‘high’ to . sultant ‘multirhythms’ due to the perceptual integration of ↓ ↑ simultaneous rhythmic streams. Additionally, while fixed weighting schemes have been used to weight more percep- Many models of rhythmic similarity for drum patterns tually important kit instruments [6], these may not be flexi- are based on monophonic rhythm similarity models. These ble enough to account for the possibly changing contextual methods are based on multiple possible representations of importance of different instruments in a drum pattern. rhythms. The most common method is as a vector consist- ing of either an onset or silence at each metrical position of the rhythm. The Hamming distance, a simple and popular 3. BISTATE REDUCTION model, works by counting the number of metrical positions Basic drum patterns are usually defined by the alterna- in two rhythm vectors that differ in value (one onset, the tion of, typically, bass (or “kick”) drum and snare drum other rest) [10]. The swap or directed-swap distance can strokes, with a further subdivision on the ride or also be calculated between rhythm vectors by counting the hi-hat 1 . The ordering in this definition implicitly indicates number of swap operations (exchanges between adjacent that the bass and snare alternation is of higher level metrical positions) that need to be performed to turn one of salience, while the cymbal or hi-hat subdivision is of rhythm into the other [11]. Other possible rhythm similar- lesser importance. ity measures use inter-onset intervals (IOIs), vectors of the distance between each successive onset as a multiple of the 3.1 Dominant Drum Selection smallest possible metrical position. An example of this is the chronotonic distance [10]. A thorough overview of nu- The first hypothesis guiding the approach presented in this merous approaches to rhythm similarity modelling can be paper is that drum patterns can be reduced by only focusing found in [10]. on these two most dominant classes of drum strokes. In the Typically, drum pattern similarity is modelled by stack- definition of basic drum patterns above, the two dominant ing these monophonic rhythm similarity models calculated drums are said to be bass and snare drums. But sometimes on each part separately. This has been found to be success- the snare drums can be replaced by, for instance, closed ful in a few cases; [7] found that the Hamming distance hi-hat. and directed-swap distance correlated strongly with human For a given drum pattern, we propose a simple method ratings of similarity for drum patterns, with the Hamming for selecting the two dominant drums. It consists in or- distance outperforming the directed-swap. In [8], the au- dering the drums in a decreasing hierarchical order, and thors found a 2D projection of the Hamming distance could selecting the two first drums, in this ordered list, that are partially cluster drum patterns in a way that matched their active in the given drum pattern. The first selected drum genre tags for a small dataset of patterns. In a pilot study, will be called the Low drum, as it often relates to the bass a model similar to the Hamming distance was also found drum and to the use of lower frequencies, while the second to correlate to listeners’ similarity ratings for drum pat- selected drum will be called the High drum. terns [12]. This model counts whether the number of met- For the drum patterns discussed throughout the paper, rical positions where rhythms differ, but adds a weighting the reduction was performed by using the following hier- metric based on metrical awareness. archical ordering: 1. kick drum, 2. snare drum, 3. closed The limitation of these models is that by stacking mono- hi-hat. It was not necessary to consider other drums, since phonic rhythm similarity measures calculated on each part at least two of these three drums were active in each drum separately, they fail to account for effects of rhythmic in- 1 Cf. for instance https://en.wikipedia.org/wiki/Drum_ teraction between instrument parts, or the perception of re- beat

319 up

down

12345678 up 12345678 down Figure 4. Reduction of Reggae Grooves Fill 3. When 12345678 . Drum pattern N Country Intro with kick drum a state is repeated twice successively, the first square Figure 5 12345678(lower row), closed hi-hat (middle row). Velocity is repre- up (lighter) correspondsup to the onset time and the second Figure 4. Reduction of Reggae Grooves Fill 3. When sented in grey level, from 0 in white to maximum value in down square to the accentdown time, withFigure its velocity 5. Drum indicated pattern inN Country Intro with kick drum a state is repeated twice successively, the first square black. 12345678gray-scale. ‘down’12345678 corresponds(lower to and row), ‘up’ closed to . hi-hat (middle row). Velocity is repre- (lighter) corresponds to the onset time and the second # " 1234567812345678 sented in grey level, from 0 in whiteup to maximum value in square to the accent time, with its velocity indicated in Figure 4. ReductionFigure of Reggae 4. Reduction Groovesblack. Fill of 3Reggae. When Grooves Filldown 3. When Figure 5.12345678 Drum pattern N Country. Drum Intro patternwithN kick Country drum Intro with kick drum gray-scale. ‘down’ correspondsa170 state to3.2.1 isand repeated Handling ‘up’ to twicea of. state Simultaneous successively, is repeated Strokes the twice first successively, square the first square Figure 5 # " (lower row), closed hi-hat (middle row). Velocity is repre- up 2 (lower row), closed hi-hat (middle row). Velocity is repre- (lighter)171 At corresponds first sight, to(lighter) to detect the the onset correspondsand timetypes and to the the, we second onset would time sim- and the second # down" sentedFigure in grey 6 level,. Reduction fromsented 0 in of in whiteN grey Country tolevel, maximum fromIntro. 0 value inSee white Figure in to maximum 4 value in square172 to the accent time,square with to the its accentvelocity time, indicated with its in velocity indicated in 170 3.2.1 Handling of Simultaneousply Strokes need to look for the location of12345678 alternation between theblack. gray-scale. ‘down’ corresponds to and ‘up’ to . caption for an explanation.black. 173 Down and2 Up drums.gray-scale. The ‘down’ difficulty corresponds comes from to theand fact ‘up’ to . 171 At first sight, to detect the and types , we would sim- # " # " #174 that" the two drums are often playedFigure together. 6. Reduction In such occa- of Nup Country Intro. See Figureup 4 172 ply need to look for the location of alternation between the down M M 175 sions, inference of or type reliescaption on thefor ancontext explanation. defined 213 2. If t = anddown v >v + .2, the new event is and 170 3.2.1 Handling of Simultaneous Strokes 1234567812345678" 173 Down and Up drums. The difficulty comes170 from3.2.1 the# factHandling" of Simultaneous Strokes # " M " 176 by the recent past. 214 the mental model is updated to t = and its profile 174 2 that the two drums are171 oftenAt played first sight, together. to detect In such the occa-and types , we would sim- 2 " 177 For instance,171 At a Down first sight, event to followed detect the byM aand simultaneoustypes ,215M we would sim-updated according to equation 1. 175 sions, inference of or type relies on the context defined# 213" 2. If t #= and" v >vFigure+ .2 6,. the Reduction new eventFigure of is Nand Country 6. Reduction Intro. See of N Figure Country 4 Intro. See Figure 4 172 ply178 needDown to look + Up for172 eventtheply location needwould to of be look alternation generally for the locationbetween reduced of theto alternation(", ). between the # " # caption" for anM explanation." M 176 by the recent past. 214 the mental model# " is updated216 to3.t If =v =0and, itscaption the profile new for event an explanation. is . If t = as well, its 173 Down179 Similarly, and Up drums. an173 UpDown The event difficulty and followed Up drums. comes by a simultaneous from The difficultythe fact Down comes from the fact #" " " M 177 For instance, a Down event followed by a simultaneous 215 updated according to equation 1. M v 174 that180 the+ two Up drums event174 would are often be playedgenerally together. reduced In to such( , occa-). But this " that the two drums are often played together.217 In such occa-profile is updated if v >v , or if v > 2 and 178 Down + Up event would be generally reduced to ( , ). "3 # M M M " " M" 175 sions,181 inference of or type relies on the context defined 213 2. If t =M and213Mv >v + .2, the new event is and would not175 be necessarilysions,# inference" true216 if of the3. velocityorIf vtype=0 relies,is the not onnew con- the event218 context is . defined Ifvt v + .2, the new event is and 179 Similarly, an Up event followed by a simultaneous# " Down # " " # " " " " 176 # " # " # "# M M M by182 thestant. recent Forpast.176 instance,by the although recent past. a loud Down followed214 by a theM mental model214 isv updatedthe mental to t model= and is updated its profile to t = and its profile 180 + Up event would be generally reduced to ( , ). But this 217 v >v v > " M " M 177 For instance, a Down event followed by aprofile simultaneous is updated if 219 , or if 2 and t = v >v +.2 " 183 loud Up followed177"3 # byFor ainstance, loud Down a Down + Up would event followed be reduced215 by a" simultaneousupdated"4. Similar according215" to point to equation2updated above, if 1.accordingand to equation 1. , 181 would not be necessarily true if the velocity is not con- M " # # 178 Down + Up event would be generally218 reducedv v +.2, # " " vM " " M 183 loud Up followed by a loud Down + Up would be reduced # " " # M M v 180 +186 Up eventFor would instance180 be generally+ in Up Figure event reduced would3, kick to be and( generally, snare). But drumsreduced this 217 are to 222( , )profile. But"1. this is updated217# if# vprofile>v is,updated or if v if>v " >vand , or if v > " and 184 220 the new event is and the mental model is updated" to " 2 2 to ( , , ), a loud Down followed by a soft Up followed by "3 # "3 # M " " " " # " # 181 would187 played not be necessarilytogether181 would at thetruenot beginning if the be necessarilyvelocity of barsM is 5,true not 6 ifand con- the 8.218 velocity# Be- isv notv if+t.2, = and v >v +.2, 187 # # played together at the beginning of bars 5, 6 and 8. Be- M " M " # # 184 to190( , ing, ), gently a loud184 (from Downto 4.2 followed( , to, 4.4).),223 aby loud We a soft Down haven’t5.UpIf thefollowed followed heard current any by by velocity220 a clearsoft Up(v followed,vthe) newis closeby event220 to isv theand profilev thethe<. mental new1, event modelv is isandv updated<. the1 mental, to (2) model is updated to 188 cause the kick drum plays a# clear" # at the beginning# " of# # M" | # #M| | " # "| 185 a loud191 Downstate.# + Therefore,Up185 woulda loud instead when Down snare224 be + reduced Up drum wouldof and to the( instead kick, mental, drums). be model221 reduced are to (t , =, )and. 221 its profile updatedt = and according its profile to equation updated according to equation 189 bar 4 (i.e., 4.1), which continues" with snare drum play- # " " 225 # " "# # 186 192Forsuperposed instance in186 on Figure theFor next 3, instance beat kick (5.1), and in thissnare Figure is drumsperceived 3, kick are as and222 a snare. drums1. the are new222 event’s1. type corresponds to the mental 190 ing gently (from 4.2 to 4.4). We haven’t heard any clear M " M 187 v v <.2261, v model’sv <. one,1, and(2) the profile is updated according to played193 But together because at187 the theplayed beginning snare drumtogether of is bars played at the 5, 6 beginningloudly and 8.| on# Be- ofthe bars next#| 5, 6 and| " 8. Be-"| 191 state. Therefore, when snare drum and kick drums are 223 5. If the current223 velocity5. If(v the,v current) is close velocity to the( profilev ,v ) is close to the profile 188 cause194 thetick kick (5.2), drum the next plays beat a clear (6.1) isat rather the beginning perceived of as a . 227 equation 1. # " " 188 cause the kick225 drumthe plays new a clear event’sat type thebeginning corresponds of to the mental # " 192 superposed on the next beat (5.1), this is perceived as a . # 224 # of the mental224 model 189 195 # of the mental model bar 4 (i.e.,Same 4.1), for 8.1.189 whichbar continues 4 (i.e.," 4.1), with which snare drumcontinues play- with snare228 drum play- 193 But because the snare drum is played loudly on the next 226 model’s one, and the profile isThe updated reasoning according behind to these heuristics cannot be de- 190 196 We propose to explain this phenomenon by hypothesis- M M M M ing gently (from190 4.2 toing 4.4). gently We (from haven’t 4.2 heard to 4.4). any We clear haven’t heard229 any clearv v <.1, v v <.1, 194 227 equation 1. tailed in this paper due to spacev constraint.v <.1, (2)v v <.1, (2) tick (5.2), the next beat (6.1) is rather perceived as a . | # #| | " "| # " 191 197state.ing Therefore, that the191 categorisation whenstate. snare# Therefore, drumvs. andis when kick based drums snare on comparing drum are and230 kick drumsInterestingly are enough, this methodology| # | also enables| " to | 195 Same for 8.1. " # " 198 " 228 The reasoning behind225 thesethe heuristics new event’s cannot type be de- corresponds to the mental 192 superposedeach on new the drum192 next stroke beat (5.1), (or superposed this is perceived strokes) as a to. mental 231 analyse a more225 problematicthe new example event’s as the type one corresponds shown in to the mental 196 We propose to explain this phenomenon by hypothesis-superposed on the next beat (5.1), this is perceived as a . 199 229 tailed in this paper" due226 to space constraint.model’s one," and the profile is updated according to Proceedings of the 21st ISMIR Conference, Montreal,´ Canada,193 But becausemodels October the of snare193and 11-16,But drumtypes. because is And played 2020 the in our loudlysnare modelling, drum on the is thisnext played canloudly be 232 onFigures the next 5 and226 6. Themodel’s difficulty one, here and lies the in profile the fact is updated that according to 197 ing that the categorisation vs. is based# on comparing" 230 200 Interestingly enough,227 this methodologyequation 1. also enables to 194 tick# (5.2),simplified" the next194 by beatcomparingtick (6.1) (5.2), is to the rather only next the perceived beat mental (6.1) model as is a rather. related perceived233 the as Up a drum. 227 does notequation accentuate 1. but instead events, 198 each new drum stroke (or superposed strokes) to mental 231 analyse a more problematic# example as the one shown in " # 195 201 to one of the types and , namely the type considered as # Same for 8.1. 195 Same for 8.1. 228 234Theexcept reasoning when behindit is played these alone, heuristics in such cannot case it emphasises be de- 199 models of and types. And in our modelling, this# can be"232 Figures 5 and 6. The difficulty here lies in228 the factThe that reasoning behind these heuristics" cannot be de- 1 #1 " 196 202Wecurrently propose to active.196 explain1 this phenomenon by hypothesis- 235 200 We propose to explain this phenomenon229 tailed by hypothesis-events! in this paper The method due to space is able constraint. to avoid using this deceptively simplified by comparing to only the mental model related 233 the Up drum does not accentuate but instead229 tailedevents, in this paper due to space constraint. pattern. In other words, a drum pattern featuring kick drum 197 ing203 that theMore categorisation precisely, thevs. mentalis basedmodel on is comparingrepresented by a 197 ing that the categorisation vs. is230 based236 onInterestingly comparingobvious" heuristics enough,230 # and thisInterestingly base methodology the detection enough, also thisof enables the methodologystate to on also enables to 201 to one of the types and , namely theM type considered# as 234" except when# it is played" alone, in such case it emphasises # #198 "each204 newtype drumt stroke, (oraltogether superposed with strokes) a series to of mental two veloci- 198 each new drum stroke (or superposed231 strokes)analyse237 the to a subtlemental moreproblematic high231 hatanalyse events. example" a more as problematic the one shown example in as the one shown in and closed hi-hat, but no snare drum, will use kick drum as 202 currently active. M 2M{# "} 235 events! The method is able to avoid using this deceptively 199 models205 ties of (vand,v types.), called Andprofile in ourof modelling, the mental this model, can be and first 203 More precisely, the mental model is represented199 models by of a and types. And in our modelling,232 Figures this 5 can and be 6.232 TheFigures difficulty 5 and here 6. lies The in difficultythe fact that here lies in the fact that +0.2 # # "" +0.2#236 obvious" heuristics and base238 the detection3.2.2 State of Onsetthe state and Accent on Times M 200 simplified206 initiated by comparing to ( , to only). At the any mental given model time, withrelated a Down Low and closed hi-hat as High. A drum pattern featuring 204 type t , altogether with a series200 of1 twosimplified1 veloci- by comparing to only the233 mentalthe model Up drum related does233 not#the accentuate Up drum doesbut instead not accentuateevents, but instead events, 2 {# "} 207 event of velocity v and an237 Upthe event subtle of high velocity hat events.v , we 239 The approach presented in the" previous section# leads" to # M M 201 to one of the types201 andto one, namely of the types the typeand considered, namely as the234 typeexcept considered when it as is played alone, in such case it emphasises 205 ties (v ,v ), called profile of the mental model,# and"# first # " " 234 except when it is played alone, in such case it emphasises snare drum and closed hi-hat but no kick drum will use # " 202 currently208 consider active. the following alternatives:238 3.2.2 State Onset and Accent240 Timesthe generation of a sequence of events of types "and . " 206 initiated to ( , ). At any given time,202 withcurrently a Down active. 235 events! The method235 isevents! able to Theavoid method using this is able deceptively to avoid" using# this deceptively 0.2 0.2 0.2 241 1 1203 More precisely,203 the mentalMore modelprecisely, is represented the mental bymodel a is representedSuccessive by a repetitions of events of same type are reduced snare drum as Low and closed hi-hat as High. 207 event of velocity v and an209 Up event1. If ofv velocity>v v+,.2 we, the239 newThe event approach is . presented Besides,236 if inobvious the previous heuristics section236 andobvious base leads the to heuristics detection and of the basestate the detection on of the state on # M " M 242 204 type t , #204altogetherMtype" t with a, seriesaltogether of two# veloci- with aM series of twointo veloci- one single state. This practically means# in particular # 208 consider the following alternatives:210 v >v .2, the240 mentalthe model generation is set of to at sequence237= the subtle of events high of hat types237 events.theand subtle. high hat events. 1 0 M 2M{1# "}0 M 2M{# "} 1 243 0 0.2 0.2205 ties (v ,v ),# called205 # profileties0.2(v of,v the mental), called model,profile andof the first mental# model,that and successive first repetition" # of a given drum (while the other For the example shown in Figures 1 and 3, the closed 211 as well, and its profile241 isSuccessive identified with repetitions the current of events of same type are reduced # " # " 238 3.2.2244 State Onset238 and3.2.2 Accent State Times Onset and Accent Times 209 1. If v >v +206.2, theinitiated new eventto ( is,206. Besides,). At any if given time, with a Down drum remains silent) are collapsed into one single event. 212 initiated to242( into, one). single At any state. given This time, practically with a Down means in particular # M " 1event:#1 M 1 1 245 However, if the most accented stroke is not the first one, hi-hat sequence is considered as of less importance and 210 v >v .2207, theevent mental of modelvelocity isv setand to t an= Up event of velocity v , we 239 The approach presented in the previous section leads to 207 event of velocity243 thatv successiveand an Up repetition event of of velocity a givenv drum, we (while239 The the other approach presented in the previous section leads to # # # # # " 246 this needs" to be specifically indicated in the pattern de- 211 as well, and its208 profileconsider is identified the following with the alternatives: current 240 the generation of a sequence of events of types and . 208 considerM the244M followingdrum remains alternatives: silent) are collapsed into one single240 the event. generation of a sequence of events of types and . therefore ignored. Figure 5. State transition diagrams, where the current state 247 scription, as it plays an important role in the" pattern# char- 212 event: (v ,v )=(v ,v ) 241(1) " # # 245" However,# " if the most accentedSuccessive stroke repetitions is not241 theSuccessive offirst events one, of repetitions same type of are events reduced of same type are reduced 209 1. If v2 >v209+ .2, the1. If newv event>v is+ ..2, Besides, the new if event is248. Besides,acteristics. if 0 #We will avoid" using the term#state246 here,this" as needs it# will be to used be more specifically242 specif-into indicated one single in state. the242 pattern Thisinto one practically de- single state. means This in particular practically means in particular is respectively (left diagram), (middle)M and (right).M M 249# ForM that reason, to each state are associated three at- 210 M M vically>v in section210. 3.2.2.2, the mentalv >v model is.2, set the to mentalt = model is set to t = ↓ # ↑ 247 243 that successive repetition243 of a given drum (while the other (v ,v )=(3 v ,v# ) (1)# # scription, as it plays# an important250 role in the# patternthat char- successive repetition of a given drum (while the other 211 as well,In# this and" paper,211 itsvelocity profiledesignates is identified the dynamics with theof drum current strokes. tributes: 3.2 Two-State Alternation For given values for v (X axis)# and" v (Y axis) isas indicated well, and its profile is identified244 drum with theremains current silent) are collapsed into one single event. 2 We will avoid↓ using the term state here, as↑ it will be used more specif- 248 acteristics. 244 drum remains silent) are collapsed into one single event. 212 event: 212 event: 245 However, if the most accented stroke is not the first one, the next state:ically(red) in section 3.2.2.(blue), 0 (white). See the249 text forFor that reason, to each state are associated245 threeHowever, at- if the most accented stroke is not the first one, 3 The second hypothesis developed in the proposed ap- ↓In this paper,↑velocity designates the dynamics of drum strokes. 250 tributes: 246 this needs to be246 specificallythis needs indicated to be specifically in the pattern indicated de- in the pattern de- M M M M more explanation. (v ,v )=(v ,v ) (v ,v )=((1)v247,v scription,) as it(1) plays247 anscription, important as role it plays in the an pattern important char- role in the pattern char- # " # " # " # " proach is that drum patterns can be reduced as an alter- 2 248 acteristics. We will avoid using the term2 Westate willhere, avoid as itusing will thebe used term morestate specif-here, as it will be used more specif- 248 acteristics. ically in section 3.2.2. ically in section 3.2.2. 249 For that reason,249 to eachFor state that arereason, associated to each three state at- are associated three at- nation between the Low and High drums. This can be for- 3 In this paper, velocity designates3 In this the paper, dynamicsvelocity of drumdesignates strokes. the dynamics250 oftributes: drum strokes. 250 tributes: malised as an alternation between two states, respectively 1. If v > v + .2, the next state is . This corresponds ↓ ↑ ↓ designated and . Figures 2 and 4 shows the reduction to the region in dark red in each diagram in Figure ↓ ↑ corresponding to the sequences in Figures 1 and 3. 5. The profile is updated. In other words, when the Low stroke is significantly stronger than the High 3.2.1 Handling of Simultaneous Strokes stroke, the new state is clearly . ↓ At first sight, to detect the and states, we would sim- ↓ ↑ 2. Similarly, if v > v + .2, the next state is (dark ply need to look for the location of alternation between the ↑ ↓ ↑ blue region in Figure 5) and its profile updated. Low and High drums. The difficulty comes from the fact that the two drums are often played together. In such occa- 3. If the current state is 0 (i.e, undefined), the next state sions, inference of or state relies on the context defined is if v > .2 (light red region in the left diagram of ↓ ↑ ↓ ↓ by the recent past. Figure 5). In other words, at the very beginning of For instance in Figure 3, kick and snare drums are a drum pattern, an ambiguous mixture of Low and played together at the beginning of beats 5, 6 and 8. Be- High drums is associated with , because it is con- ↓ cause the kick drum plays a clear at the beginning of sidered as the default state (while is a departure ↓ ↑ beat 4 (i.e., 4.1), which continues with snare drum play- from the default state). If on the contrary v .2 ↓ ≤ ing gently (from 4.2 to 4.4). We haven’t heard any clear the next state is if v > .2 or if v = 0 (light blue ↑ ↑ ↓ state. Therefore, when snare drum and kick drums are region). ↑ superposed on the next beat (5.1), this is perceived as a . ↑ 4. If v = 0 and if the current state is , the next state But because the snare drum is played loudly on the next ↓ vM ↓ tick (5.2), the next beat (6.1) is rather perceived as a . is if v > ↑ . If on the contrary the current state ↓ ↑ ↑ 2 Same for 8.1. is already , the profile is updated only if v > vM . We propose to explain this phenomenon by hypothesis- ↑ ↑ ↑ M ing that the categorisation vs. is based on comparing 5. If the current state is and v > v + .2, the next ↓ ↑ ↑ ↓ ↑ state is (light blue region in the middle diagram each new drum stroke (or superposed strokes) to mental ↑ models of and states. And in our modelling, this can be of Figure 5). In other words, a significant increase ↓ ↑ simplified by comparing to only the mental model related in velocity of the High strike (even if the velocity to one of the states and , namely the state considered as remains lower than the Low strike’s one) triggers a ↓ ↑ transition to the state. currently active. ↑ More precisely, the mental model is represented by a 6. Similarly, if the current state is and v > vM + state s , , 0 (where state 0 is used solely when start- ↑ ↓ ↓ ∈ {↓ ↑ } .2, the next state is (light red region in the right ing the analysis, before the first actual state or has ↓ ↓ ↑ diagram of Figure 5). been detected) altogether with a series of two velocities (vM , vM ) [0, 1]2, called profile of the mental model, ↓ ↑ ∈ 3.2.2 State Onset and Accent Times and first initiated to (0, 0). The profile is updated with the current configuration The approach presented in the previous section leads to the generation of a sequence of states and . Successive rep- ↑ ↓ (vM , vM ) = (v , v ) (1) etitions of a same state are reduced into one single state. ↓ ↑ ↓ ↑ This practically means in particular that successive repeti- every time there is a state transition. tions of a given drum (while the other drum remains silent) At any given successive point in the drum pattern, fea- are collapsed into one single state. turing at that instant a Low drum stroke of velocity v However, if the most accented stroke is not the first one, ↓ ∈ [0, 1] (with 0 indicating no stroke) and a High drum stroke this needs to be specifically indicated in the pattern de- of velocity v [0, 1], we consider the following alterna- scription, as it plays an important role in the pattern char- ↑ ∈ tives, which are tested in the indicated order until a suitable acteristics. For that reason, with each state (excepted the condition is found: initial 0 state) are associated three attributes:

320 Proceedings of the 21st ISMIR Conference, Montreal,´ Canada, October 11-16, 2020

!( ) !( ) ↓ ↓ ↑ ↑ ! S S OA ↓ ↓ ↓ ↓ ( ) S B ↓ ↓ ↑ ! OA S S ↑ ↑ ↑ ↑ ( ) B S ↑ ↓ ↑

Table 1. Alignment configurations, triggered by state tran- sitions, in the first sequence (left column) or/and in the sec- ond sequence (top row). In the absence of transition in any Figure 6. Alignment between two reduced drum patterns of the two sequences, no alignment configuration needs to A and B. Penalties are given for each individual state, in- be defined. dicated with a number or ’P’. See the text for more expla- nation. whether it is or • ↓ ↑ From configuration A or B, if only one sequence the onset time, i.e., the temporal position of the first • • transitions, this leads to S, while if both sequences stroke in that successive repetition of strokes transition at the same time, this leads to opposite the accent time, i.e., the temporal position of the states O. • most accented stroke. From O, another double-transition leads to another • O, while a single transition leads to either A or B. 4. BISTATE SEQUENCE ALIGNMENT The third hypothesis developed in this paper is that the per- 4.2 Numerical distance estimation ceived distance between two drum patterns can be approxi- The proposed distance model is based on an evaluation of mated by aligning the two corresponding bistate sequences how well each of the two drum patterns aligns with the and estimating the alignment mismatches. other. For each successive state of each drum pattern, a penalty is given if that state is not found in the other se- 4.1 Alignment principle quence around the same temporal position. Offset in the To allow the formalisation of this alignment, the set of two temporal positions of the state on the two sequences also states , is further decomposed into four states !, contributes to the penalty. {↓ ↑} {↓ ↑ !, ( ), ( ) in order to take into account all the temporal The basic mechanisms are illustrated below using the ↓ ↑ } relationships: toy example shown in Figure 6. In this first example, we assume that both sequences start at the same state ( ). ! and ! corresponds to a new transition to, respec- ↓ •↓ ↑ tively, and . The first state starts at the same time (t = 1) on ↓ ↑ • both sequences, leading to a S configuration with ( ) and ( ) indicates a continuation of the corre- ↓ • ↓ ↑ 0 penalty (shown in green in the Figure). sponding states and . ↓ ↑ The next event appears first in sequence A (A at This leads to a 4 4 matrix of possible alignment con- • ↑ × t = 2). The same transition appears next in sequence figurations, shown in Table 1. We can distinguish three B(S at t = 3). It is thus a quasi-synchronised tran- types of alignment configurations: ↑ sition, with 1 tick delay between the two sequences. The two sequences are in same state (“S”), both low A penalty of 1 is therefore given to the new state in • (“S ”) or both high (“S ”). each sequence. (The onset of each state is shown ↓ ↑ in orange while the position of the opposite state is One sequence introduces the opposite state ahead shown in yellow.) • of the other sequence: “A” if it the first sequence ahead, “B” if it the second sequence. A transition B at t = 4 is followed by another • ↓ transition back to S . This state is deleted by The two sequences are in opposite states (“O”). ↑ ↓ • giving the maximum penalty P (in red) and the new state at t = 5 (in blue) is ignored. The possible transitions between configurations are the ↑ following, as described in Table 1: A transition A at t = 6 is followed by a double • ↓ Starting from the same state (for instance S ), then transition O at t = 8, which is considered as both • ↓ a 2-tick-delayed transition of B to state and as the both sequences transition at the same time (for in- ↓ start of a new transition A . Same for the subse- stance S ). ↑ ↑ quent double transition O at t = 9. The new tran- Starting from the same state (for instance S ), then sition A being followed by a transition S , that • ↓ ↓ ↑ one sequence transitions first (for instance A ). state is removed. ↑ ↓ 321 Proceedings of the 21st ISMIR Conference, Montreal,´ Canada, October 11-16, 2020

Similarity Model r p Hamming Distance 0.604 2.97e-9 Hamming Distance (2 channels) 0.539 2.58e-7 Bistate Sequence Alignment 0.556 8.49e-8 min(Hamming (2 chan), Alignment) 0.606 2.65e-9 min(Hamming, Alignment) 0.692 1.21e-12

Table 2. Pearson correlation coefficient r and p-value be- tween mean similarity ratings and distance models.

Figure 7. Alignment between two reduced drum patterns distributed. More information on the collection of this A and B. Boxed series of cells show states containing re- dataset may be found in [5]. peated strokes, where the accent is the rightmost cell. To test the overall extent to which the bistate sequence alignment algorithm corresponds to perceived similarity, we calculated the Pearson correlation between the distance Sequence A continuing after the end of sequence B, • calculated by the bistate sequence alignment algorithm and any transition to the opposite state A (here only at ↓ the mean of human-supplied similarity ratings. We used t = 11) is deleted. the established Hamming distance as a reference for eval- In Figure 7, the sequences starting in opposite states, an uation, since, as indicated in Section 2, it gave the best additional state is added before the start of sequence B, results in previous works. The first step of our approach, ↓ to which is given a penalty of 4. In this example, since the presented in section 3.1, can be studied separately by also next event is a in sequence B, the previous is removed. computing the Hamming distance on the two main chan- ↓ ↑ The rest of Figure 7 illustrates the handling of states nels of the drum patterns. containing repeated strokes with the accentuated stroke ap- pearing after the initial onset. Concerning the transition to 5.2 Results and Discussion first by B at t = 3 and then by A at t = 5, the penalty ↑ The results can be seen in Table 2. Various values for the is only 1 because it is computed between the accent in se- parameter P were tried and the best results were obtained quence B (at t = 4) and the onset of sequence A. For the with P = 8. The Hamming distance’s correlation on the same reason, the next transition to at t = 7 is considered ↓ complete drum patterns is slightly stronger than the bistate without penalty. While B remains in , with its attack at ↓ sequence alignment, which is itself very slightly better than t = 9, A transitions to with an attack at t = 8, which is ↑ the Hamming distance computed on the two main drum therefore deleted. channels. Each penalty is weighted with respect to the corre- When combining the Hamming distance (computed on sponding accent velocity. For instance in the case of a the whole drum loops) with the alignment by taking the drum stroke of very low intensity (such as a ), minimum of them both, the correlation is better than the the contribution of this penalty to the overall distance will Hamming distance alone, with a correlation of 0.692 vs be low. Finally the distance between the two sequences is 0.604. This increase in correlation was found to be statisti- defined as the maximum between the two weighted sums. cally significant (t=1.748, p=0.04). This could indicate that both these two algorithms capture fundamentally different aspects of similarity, with the Hamming distance capturing 5. EVALUATION low-level similarities between rhythms, and the bistate se- To investigate the potential use of the bistate sequence quence alignment capturing qualities relating to rhythmic alignment algorithm for drum pattern similarity modelling, interaction and structure. we examined its calculated distances between 80 pairs of The difference between the distances can be seen in Fig- drum patterns in our dataset in relation to human-provided ure 8 where the similarity ratings of all 80 pairs are plotted perceptual similarity ratings. alongside distances calculated by both the Hamming dis- tance and the bistate sequence alignment algorithm. The differences between the bistate sequence alignment 5.1 Methodology algorithm and Hamming distance can be further demon- We collected similarity ratings for the drum patterns via strated through viewing of some particular examples. Fig- an online listening test. The symbolic patterns were first ure 9 shows a pair of Soul Grooves patterns that share very synthesized into audio via a generic sampled drum kit. 21 similar kick and snare drum patterns while differing on listeners rated similarity for the 80 pairs of patterns using a other drum tracks. Since the similarity was judged by the continuous scale. Median internal consistency for all par- listeners as high, this demonstrates the interest, in partic- ticipants calculated using the ICC (2,1) for the repeated ular cases such as this one, in focusing the comparison of pairs was 0.85, equaling good agreement. The inter-rater drum patterns on the main kick and snare drums. And in- agreement for the 21 listeners calculated using the same deed, computing the Hamming distance on those two main ICC (2,1) was 0.73, equally moderate-to-good agreement. channels leads to a similarly low distance value (cf. red The spreads of ratings for each comparison were normally arrow in Figure 8).

322 Proceedings of the 21st ISMIR Conference, Montreal,´ Canada, October 11-16, 2020

Figure 8. Comparison of the Z-score of the new proposed distance (in blue), the Hamming distance (in yellow), the Hamming distance on the two main channels (in purple) and the listeners’ ratings (in red), for each of the 80 pairs of drum patterns. The red and blue arrows indicate the examples shown in, respectively, Figures 9 and 10.

crash crash high tom high tom

mid tom mid tom ride ride low tom low tom open open crash crash high hat high hat closed closed ride ride high hat high hat closed closed high hat high hat snare snare snare snare

kick kick kick kick 12345678 12345678 12345678 12345678

high high

low low

12345678 12345678 high high

low low

12345678 12345678

Figure 9. Alignment between drum patterns Soul Grooves Figure 10. Alignment between drum patterns Batucara 3 2 (top left and center) and Soul Grooves 31 (top right and and Cascara 5 using the same conventions as in Figure 9. bottom). The only penalties contributing to the distance is a 1-tick offset (in orange) and a deleted state (in red). As it could have been expected, the comparison be- tween the proposed distance and a simple surface-level Figure 10 illustrates the interest of the method beyond Hamming distance concerning their abilities to mimic lis- the simple selection of two main channels. In this example, teners similarity judgments shows that for the most part, the two sequences differ significantly at the surface level, listeners rely on surface characteristics of the overall rhyth- even if we restrict the scope on the kick and snare drums. mic texture to compare drum patterns. However, in par- On the other hand, the corresponding reduced bistate se- ticular cases the underlying core structure can be of im- quences are similar and show a clear alignment, leading portance as well, and this is where the proposed distance to a relatively low distance on par with the listeners’ juge- can be of interest. Combining surface-level and deeper- ment. level representations seems to improve the overall similar- ity modelling. We may hypothesise that further progress in this endeavour could be made possible through the inte- 6. CONCLUSIONS gration of more refined cognitive models. This article proposed an attempt to reduce drum patterns into an underlying core structure with the aim to compare 7. ACKNOWLEDGEMENTS drum patterns by aligning their reduced representation one with each other. Clearly the problem of rhythmic reduction This work was partially supported by the Research Coun- is of high complexity, and is addressed here solely as an in- cil of Norway through its Centers of Excellence scheme, termediary step for the establishment of a distance measure project number 262762, the MIRAGE project, grant num- between drum patterns. These mechanisms of selection of ber 287152 and the TIME project, grant number 249817. dominant drums and of inference of Low and High states The work was also funded by the UK Engineering and could be studied further in order to provide more advanced Physical Sciences Research Council (EPSRC) and by a description of drum patterns. Short-Term Scientific Mission grant provided by Nord-

323 Proceedings of the 21st ISMIR Conference, Montreal,´ Canada, October 11-16, 2020

Forsk’s Nordic University Hub “Nordic Sound and Mu- sic Computing Network—NordicSMC”, project number 86892.

8. REFERENCES [1] W. Anku, “Principles of rhythm integration in african drumming,” Black Music Research Journal, pp. 211– 238, 1997. [2] S. Handel and G. R. Lawson, “The contextual nature of rhythmic interpretation,” Perception & Psychophysics, vol. 34, no. 2, pp. 103–120, 1983. [3] M. A. Witek, E. F. Clarke, M. L. Kringelbach, and P. Vuust, “Effects of polyphonic context, instrumen- tation, and metrical location on syncopation in mu- sic,” Music Perception: An Interdisciplinary Journal, vol. 32, no. 2, pp. 201–217, 2014. [4] FXpansion, “BFD3 https://www.fxpansion.com/ products/bfd3/,” Accessed May 2020. [5] F. Bruford, M. Barthet, S. McDonald, and M. San- dler, “Modelling musical similarity for drum patterns: A perceptual evaluation,” in Proceedings of the 14th International Audio Mostly Conference: A Journey in Sound, 2019, pp. 131–138. [6] R. Vogl, M. Leimeister, C. Ó. Nuanáin, M. Hlatky, S. Jordà Puig, and P. Knees, “An intelligent interface for drum pattern variation and comparative evaluation of algorithms,” Journal of the audio engineering Soci- ety. 2016; 65 (7/8): 503-13., 2016. [7] C. O. Nuanáin, P. Herrera, and S. Jorda, “Target-based rhythmic pattern generation and variation with genetic algorithms,” in Sound and Music Computing Confer- ence, 2015. [8] F. Bruford, M. Barthet, S. McDonald, and M. Sandler, “Groove Explorer: An Intelligent Visual Interface for Drum Loop Library Navigation,” in IUI Workshops, 2019. [9] D. Gómez-Marín, S. Jorda, and P. Hererra, “Drum rhythm spaces: from global models to style-specific maps,” in International Symposium on Computer Mu- sic Multidisciplinary Research (CMMR), 2017. [10] G. T. Toussaint, “A comparison of rhythmic similarity measures.” in Proc. of the International Society for Mu- sic Information Retrieval Conference (ISMIR), 2004. [11] C. Guastavino, F. Gómez, G. Toussaint, F. Maran- dola, and E. Gómez, “Measuring similarity between flamenco rhythmic patterns,” Journal of New Music Research, vol. 38, no. 2, pp. 129–138, 2009. [12] D. Gómez-Marín, S. Jordà, and P. Herrera, “Pad and Sad: Two awareness-Weighted rhythmic similarity dis- tances,” in Proceedings of the 16th International So- ciety for Music Information Retrieval Conference (IS- MIR), 2015.

324