A Distance Model for Rhythms
Total Page:16
File Type:pdf, Size:1020Kb
A Distance Model for Rhythms Jean-Fran¸cois Paiement [email protected] Yves Grandvalet [email protected] IDIAP Research Institute, Case Postale 592, CH-1920 Martigny, Switzerland Samy Bengio [email protected] Google, 1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA Douglas Eck [email protected] Universit´ede Montr´eal, Department of Computer Science, CP 6128, Succ. Centre-Ville, Montr´eal, Qu´ebec H3C 3J7, Canada Abstract nested periodic components. Such a hierarchy is im- Modeling long-term dependencies in time se- plied in western music notation, where different levels ries has proved very difficult to achieve with are indicated by kinds of notes (whole notes, half notes, traditional machine learning methods. This quarter notes, etc.) and where bars establish measures problem occurs when considering music data. of an equal number of beats. Meter and rhythm pro- In this paper, we introduce a model for vide a framework for developing musical melody. For rhythms based on the distributions of dis- example, a long melody is often composed by repeating tances between subsequences. A specific im- with variation shorter sequences that fit into the met- plementation of the model when consider- rical hierarchy (e.g. sequences of 4, 8 or 16 measures). ing Hamming distances over a simple rhythm It is well know in music theory that distance patterns representation is described. The proposed are more important than the actual choice of notes in model consistently outperforms a standard order to create coherent music (Handel, 1993). In this Hidden Markov Model in terms of conditional work, distance patterns refer to distances between sub- prediction accuracy on two different music sequences of equal length in particular positions. For databases. instance, measure 1 may be always similar to measure 5 in a particular musical genre. In fact, even random music can sound structured and melodic if it is built by 1. Introduction repeating random subsequences with slight variation. Many algorithms have been proposed for audio beat Reliable models for music would be useful in a broad tracking (Dixon, 2007; Scheirer, 1998). Probabilistic range of applications, from contextual music genera- models have also been proposed for tempo tracking tion to on-line music recommendation and retrieval. and inference of rhythmic structure in musical audio However, modeling music involves capturing long-term (Whiteley et al., 2007; Cemgil & Kappen, 2002). The dependencies in time series, which has proved very dif- goal of these models is to align rhythm events with ficult to achieve with traditional statistical methods. the metrical structure. However, simple Markovian as- Note that the problem of long-term dependencies is sumptions are used to model the transitions between not limited to music, nor to one particular probabilis- rhythms themselves. Hence, these models do not take tic model (Bengio et al., 1994). into account long-term dependencies. A few generative Music is characterized by strong hierarchical depen- models have already been proposed for music in gen- dencies determined in large part by meter, the sense eral (Pachet, 2003; Dubnov et al., 2003). While these of strong and weak beats that arises from the inter- models generate impressive musical results, we are not action among hierarchical levels of sequences having aware of quantitative comparisons between models of th music with machine learning standards, as it is done Appearing in Proceedings of the 25 International Confer- in Section 3 in terms of out-of-sample prediction ac- ence on Machine Learning, Helsinki, Finland, 2008. Copy- right 2008 by the author(s)/owner(s). curacy. In this paper, we focus on modeling rhyth- A Distance Model for Rhythms mic sequences, ignoring for the moment other aspects a learning process requiring a prohibitive amount of of music such as pitch, timbre and dynamics. How- data: in order to learn long range interactions, the ever, by capturing aspects of global temporal struc- training set should be representative of the joint dis- ture in music, this model should be valuable for full tribution of subsequences. To overcome this problem, melodic prediction and generation: combined with an we summarize the joint distribution of subsequences audio transcription algorithm, it should help improve by the distribution of distances between these sub- the poor performance of state-of-the-art transcription sequences. This summary is clearly not a sufficient systems; it could as well be included in genre classifiers statistics for the distribution of subsequences, but its or automatic composition systems (Eck & Schmidhu- distribution can be learned from a limited number of ber, 2002); used to generate rhythms, the model could examples. The resulting model, which generates dis- act as a drum machine or automatic accompaniment tances, is then used to recover subsequences. system which learns by example. Our main contribution is to propose a generative 2.2. Decomposition of Distances model for distance patterns, specifically designed for l l Let D(x ) = (di,j)ρ×ρ be the distance matrix asso- capturing long-term dependencies in rhythms. In Sec- l l l l ciated with each sequence x , where di,j = d(yi, yj). tion 2, we describe the model, detail its implemen- Since D(xl) is symmetric and contains only zeros on tation and present an algorithm using this model for the diagonal, it is completely characterized by the up- rhythm prediction. The algorithm solves a constrained per triangular matrix of distances without the diago- optimization problem, where the distance model is nal. Hence, used to filter out rhythms that do not comply with the inferred structure. The proposed model is evalu- ρ−1 ρ l Y Y l ated in terms of conditional prediction error on two p(D(x )) = p(di,j|Sl,i,j) (1) distinct databases in Section 3 and a discussion fol- i=1 j=i+1 lows. where S = {dl | (1 < s < j and 1 ≤ r < s) 2. Distance Model l,i,j r,s (2) or (s = j and 1 ≤ r < i)} . In this Section, we present a generative model for In words, we order the elements column-wise and do distance patterns and its application to rhythm se- a standard factorization, where each random variable quences. Such a model is appropriate for most music depends on the previous elements in the ordering. data, where distances between subsequences of data Hence, we do not assume any conditional indepen- exhibit strong regularities. dence between the distances. 2.1. Motivation Since d(yi, yj) is a metric, we have that d(yi, yj) ≤ d(y , y ) + d(y , y ) for all i, j, k ∈ {1, . , ρ}. This in- l l l m i k k j Let x = (x1, . , xm) ∈ R be the l-th rhythm se- equality is usually referred to as the triangle inequality. 1 n quence in a dataset X = {x ,..., x } where all the Defining sequences contain m elements. Suppose that we con- l l l struct a partition of this sequence by dividing it into αi,j = min (dk,j + di,k) and l l l k∈{1,...,(i−1)} ρ parts defined by yi = (x1+(i−1)m/ρ, . , xim/ρ) with l l l (3) βi,j = max (|dk,j − di,k|) , i ∈ {1, . , ρ}. We are interested in modeling the dis- k∈{1,...,(i−1)} tances between these subsequences, given a suitable we know that given previously observed (or sampled) metric d(y , y ): m/ρ × m/ρ → . As was pointed i j R R R distances, constraints imposed by the triangle inequal- out in Section 1, the distribution of d(y , y ) for each i j ity on dl are simply specific choice of i and j may be more important when i,j modeling rhythms (and music in general) than the ac- l l l βi,j ≤ di,j ≤ αi,j . (4) tual choice of subsequences yi. One may observe that the boundaries given in Eq. (3) Hidden Markov Models (HMM) (Rabiner, 1989) are contain a subset of the distances that are on the con- commonly used to model temporal data. In princi- ditioning side of each factor in Eq. (1) for each indexes ple, an HMM is able to capture complex regularities i and j. Thus, constraints imposed by the triangle in- in patterns between subsequences of data, provided equality can be taken into account when modeling each its number of hidden states is large enough. However, factor of p(D(xl)): each dl must lie in the interval im- when dealing with music, such a model would lead to i,j posed by previously observed/sampled distances given A Distance Model for Rhythms in Eq. (4). Figure 1 shows an example where ρ = 4. eral edit distance such as the Levenshtein distance. l Using Eq. (1), the distribution of d2,4 would be condi- However, this approach would not make sense psycho- l l l l tioned on d1,2, d1,3, d2,3, and d1,4, and Eq. (4) reads acoustically: doing an insertion or a deletion in a l l l l l rhythm produces a translation that alters dramatically |d1,2 − d1,4| ≤ d2,4 ≤ d1,2 + d1,4. Then, if subsequences l l l l the nature of the sequence. Putting it another way, y1 and y2 are close and y1 and y4 are also close, we l l rhythm perception heavily depends on the position on know that y2 and y4 cannot be far. Conversely, if sub- sequences yl and yl are far and yl and yl are close, which rhythmic events occur. In the remainder of this 1 2 1 4 l l l paper, di,j is the Hamming distance between subse- we know that y2 and y4 cannot be close.