
A Topic Model for Melodic Sequences

Athina Spiliopoulou [email protected]
Amos Storkey [email protected]
School of Informatics, University of Edinburgh

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

Abstract

We examine the problem of learning a probabilistic model for melody directly from musical sequences belonging to the same genre. This is a challenging task, as one needs to capture not only the rich temporal structure evident in music, but also the complex statistical dependencies among different music components. To address this problem we introduce the Variable-gram Topic Model, which couples the latent topic formalism with a systematic model for contextual information. We evaluate the model on next-step prediction. Additionally, we present a novel way of model evaluation, where we directly compare model samples with data sequences using the Maximum Mean Discrepancy of string kernels, to assess how close the model distribution is to the data distribution. We show that the model has the highest performance under both evaluation measures when compared to LDA, the Topic Bigram and related non-topic models.

1. Introduction

Modelling the real-world complexity of music is an interesting problem for machine learning. In Western music, pieces are typically composed according to a system of musical organization, rendering musical structure one of the fundamentals of music. Nevertheless, characterizing this structure is particularly difficult, as it depends not only on the realization of several musical elements, such as scale, rhythm and meter, but also on the relation of these elements both within single time frames and across time. This results in an infinite number of possible variations, even within pieces from the same musical genre, which are typically built according to a single musical form.

To tackle the problem of melody modelling we propose the Variable-gram Topic model, which employs a Dirichlet Variable-Length Markov Model (Dirichlet-VMM) (Spiliopoulou & Storkey, 2011) for the parametrisation of the topic distributions over words. The Dirichlet-VMM models the temporal structure by learning contexts of variable length that are indicative of the future. At the same time, the latent topics represent different music regimes, thus allowing us to model the different styles, tonalities and dynamics that occur in music. The model does not make any assumptions specific to music, but it is particularly suitable in the music context, as it is able to model temporal dependencies of considerable complexity without enforcing a stationarity assumption on the data. Each sequence is modelled as a mixture of latent components (topics), and each component models Markov dependencies of different order according to the statistics of the data that are assigned to it.

To evaluate the performance of the model we perform a comparative analysis with related models, using two metrics. The first is the average next-step prediction log-likelihood of test sequences under each model. The second is the Maximum Mean Discrepancy (MMD) (Gretton et al., 2006) of string kernels computed between model samples and test-data sequences. In both evaluations, we find that using topics improves performance, but it does not overcome the need for a systematic temporal model. The Variable-gram Topic model, which couples these two strategies, has the highest performance under both evaluation objectives.

The contributions of this paper are: (a) We introduce the Variable-gram Topic model, which extends the topic modelling methodology by considering conditional distributions that model contextual information of considerable complexity. (b) We introduce a novel way of evaluating generative models for discrete data, which employs the MMD of string kernels to directly compare model samples with data sequences.
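To fix ideas about contribution (b) before developing the model, the following is a minimal sketch of an unbiased MMD² estimate between model samples and data sequences, assuming a simple n-gram spectrum kernel over note sequences; the kernel, its parameters and the toy sequences are illustrative assumptions, not necessarily the choices made in the experiments below.

```python
from collections import Counter
from itertools import combinations

def spectrum_kernel(s, t, n=3):
    """k(s, t) = sum over shared n-grams u of count_s(u) * count_t(u)."""
    cs = Counter(tuple(s[i:i + n]) for i in range(len(s) - n + 1))
    ct = Counter(tuple(t[i:i + n]) for i in range(len(t) - n + 1))
    return sum(cs[u] * ct[u] for u in cs.keys() & ct.keys())

def mmd2_unbiased(X, Y, kernel=spectrum_kernel):
    """Unbiased estimate of squared MMD between sample sets X and Y."""
    m, n = len(X), len(Y)
    kxx = 2.0 * sum(kernel(a, b) for a, b in combinations(X, 2)) / (m * (m - 1))
    kyy = 2.0 * sum(kernel(a, b) for a, b in combinations(Y, 2)) / (n * (n - 1))
    kxy = sum(kernel(a, b) for a in X for b in Y) / (m * n)
    return kxx + kyy - 2.0 * kxy

# A small MMD^2 suggests the model samples are distributed like the data.
model_samples = [[60, 62, 64, 62, 60], [67, 65, 64, 62, 60]]  # hypothetical
test_data = [[60, 62, 64, 65, 64], [64, 62, 60, 62, 64]]      # hypothetical
print(mmd2_unbiased(model_samples, test_data))
```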
2. Background

A number of machine learning and statistical approaches have been suggested for music-related problems. Here we discuss methods that take discrete music sequences as input and attempt to model the melodic structure. Lavrenko & Pickens (2003) propose Markov Random Fields (MRFs) for modelling polyphonic music. The model is very general, but in order to remain tractable much information is discarded, making it less suitable for realistic music. Weiland et al. (2005) propose a Hierarchical Hidden Markov Model (HHMM) for pitch. The model has three internal states that are predefined according to the structure of the music genre examined. Eck & Lapalme (2008) propose an LSTM Recurrent Neural Network for modelling melody. The network is conditioned on the chord and on certain previous time-steps, chosen according to the metrical boundaries. Paiement et al. (2009) provide an interesting approach that incorporates musical knowledge in the melody modelling task. They define a graphical model for melodies given chords, rhythms and a sequence of Narmour features, which are extracted from an Input-Output HMM conditioned on the rhythm.

A very successful line of research examines the transfer of methodologies from the fields of statistical language modelling and text compression to the modelling of music. Dubnov et al. (2003) propose two dictionary-based prediction methods, Incremental Parsing (IP) and Prediction Suffix Trees (PSTs), for modelling melodies with a Variable-Length Markov model (VMM). Despite its fairly simple nature, the VMM is able to capture both large and small order Markov dependencies and achieves impressive musical generations. Begleiter et al. (2004) study six different algorithms for training a VMM. These differ in the way they handle the counting of occurrences, the smoothing of unobserved events and the variable-length modelling. Spiliopoulou & Storkey (2011) propose a Bayesian formulation of the VMM, the Dirichlet-VMM, for the problem of melody modelling. The model is shown to significantly outperform a VMM trained using the PST algorithm. Finally, an interesting application of dictionary-based predictors in the music context is presented in Pearce & Wiggins (2004). They describe a multiple viewpoint system comprising a cross-product of Prediction by Partial Match (PPM) models.

3. The Variable-gram Topic Model

In this section we introduce the Variable-gram Topic model, which we later apply to melodic sequences. In the context of music modelling, documents correspond to music pieces and words correspond to notes. The Variable-gram Topic model extends Latent Dirichlet Allocation (LDA) by employing the Dirichlet Variable-Length Markov model (Dirichlet-VMM) (Spiliopoulou & Storkey, 2011) for the parametrisation of the topic distributions over words. We begin with a description of the Dirichlet-VMM.

3.1. The Dirichlet-VMM

The Dirichlet-VMM is a Bayesian hierarchical model for discrete sequential data defined over a finite alphabet. It models the conditional probability distribution of the next symbol given a context, where the length of the context varies according to what we actually observe. Long contexts that occur frequently in the data are used during prediction, while for infrequent ones, their shorter counterparts are used.

Similarly to a VMM, the model is represented by a suffix tree that stores contexts as paths starting at the root node; the deeper a node in the tree, the longer the corresponding context. The depth of the tree is upper bounded by $L$, the maximum allowed length for a context. The tree is not complete; only contexts that occur frequently enough in the data and convey useful information for predicting the next symbol are stored. The Probabilistic Suffix Tree algorithm for constructing a VMM tree is detailed in Ron et al. (1994).

[Figure 1. An example Dirichlet-VMM tree for a binary sequence. Contexts 01 and 11 are only observed once and thus are not included in the tree. Note that for readability, contexts in this figure are denoted in chronological order.]
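To make the prediction rule concrete, the following toy sketch stands in for the full suffix-tree machinery: stored contexts are kept in a dictionary keyed by tuples of past symbols (written in chronological order, as in Figure 1), and prediction returns the distribution of the deepest stored context that matches a suffix of the history. The tree contents and probabilities are illustrative assumptions only.

```python
# Toy Dirichlet-VMM-style lookup: keys are contexts (most recent symbol last),
# values are next-symbol distributions.
tree = {
    (): {0: 0.5, 1: 0.5},       # root: empty context, always matches
    (0,): {0: 0.7, 1: 0.3},
    (1,): {0: 0.4, 1: 0.6},
    (1, 0): {0: 0.9, 1: 0.1},   # a longer context refining (0,)
}

def predictive_distribution(history, tree, L=2):
    """Return the next-symbol distribution of the deepest stored context
    matching a suffix of the history, falling back to shorter contexts."""
    for length in range(min(L, len(history)), -1, -1):
        context = tuple(history[len(history) - length:])
        if context in tree:
            return tree[context]
    return tree[()]

print(predictive_distribution([1, 1, 0], tree))  # matches stored context (1, 0)
print(predictive_distribution([1, 1], tree))     # (1, 1) unseen: falls back to (1,)
```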
In contrast to the VMM, parameter estimation in the Dirichlet-VMM is driven by Bayesian inference. Let $w$ denote a symbol from the alphabet and $j$ index the nodes in the tree, with $c_j = w_1 \dots w_\ell$, $\ell \in \{1, \dots, L\}$, denoting the context of node $j$. Each node $j$ is identified by the conditional probability distribution of the next symbol given context $c_j$, which we denote by $\phi_{i|j} \equiv P(w = i \mid c_j)$. In the Dirichlet-VMM this distribution is modelled through a Dirichlet prior centred at the parent node, $\boldsymbol{\phi}_j \sim \mathrm{Dirichlet}(\beta \boldsymbol{\phi}_{\mathrm{pa}(j)})$, where $\beta$ denotes the concentration parameter of the Dirichlet distribution, $\mathrm{pa}(j)$ denotes the parent of node $j$, with corresponding context $c_{\mathrm{pa}(j)} = w_1 \dots w_{\ell-1}$, and we have used the bold notation $\boldsymbol{\phi}_j$ to denote the parameter vector $\phi_{\cdot|j}$. An example Dirichlet-VMM is depicted in Figure 1.

Due to the conjugacy of the Dirichlet distribution to the multinomial, posterior inference in this model is exact. Let $\hat{\phi}_{i|j} \equiv P(w = i \mid c_j, D)$ denote the estimate for $\phi_{i|j}$ after observing data $D$. We have $\hat{\boldsymbol{\phi}}_j \sim \mathrm{Dirichlet}(\beta \, \mathrm{E}[\hat{\boldsymbol{\phi}}_{\mathrm{pa}(j)}] + \boldsymbol{N}_{\cdot|j})$, where $\boldsymbol{N}_{\cdot|j}$ denotes the counts associated with context $j$ in the data and $\mathrm{E}[\cdot]$ denotes expectation.

Algorithm 1 Generative Process for the Variable-gram Topic model.
  Input: Dirichlet-VMM $T$, $K$, $\Theta$, $\Phi$
  for each document $d$ in the corpus do
    for each time-step $t \in \{1, \dots, T_d\}$ in $d$ do
      Choose a topic $z_{t,d} \sim \mathrm{Multinomial}(\boldsymbol{\theta}_d)$
      Choose a word $w_{t,d} \sim \mathrm{Multinomial}(\boldsymbol{\phi}_{c_{w_{t,d}}, z_{t,d}})$
    end for
  end for

[Figure 2. Graphical models: (a) LDA without the plate notation for words; (b) Variable-gram Topic model.]

Let $j$ index the leaf nodes of a Dirichlet-VMM, i.e. the contexts that can be used during prediction, and $c_{w_t} = w_{t-1} \dots w_{t-\ell}$, $\ell \in \{1, \dots, L\}$, denote the context of word $w_t$. The parameters $\boldsymbol{\phi}_k$ characterising word generation within topic $k$ are defined by $\phi_{i|j,k} \equiv P(w_t = i \mid c_{w_t} = c_j, z_t = k)$.
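Since the posterior at each node is again Dirichlet, the posterior mean estimate can be computed by a single recursion up the tree, each node's counts being smoothed towards its parent's estimate. A minimal sketch of that recursion follows; the uniform root prior and the toy counts are assumptions of this illustration, not taken from the paper.

```python
import numpy as np

class Node:
    """Suffix-tree node holding next-symbol counts N_{.|j} and a parent link."""
    def __init__(self, counts, parent=None):
        self.counts = np.asarray(counts, dtype=float)
        self.parent = parent

def posterior_mean(node, beta):
    """E[phi_{.|j} | D] = (N_{.|j} + beta * E[phi_{.|pa(j)} | D]) / (N_j + beta),
    applied recursively up the tree (the parent's prior sums to one)."""
    if node.parent is None:
        # Root prior: assumed uniform in this sketch.
        prior = np.full(len(node.counts), 1.0 / len(node.counts))
    else:
        prior = posterior_mean(node.parent, beta)
    return (node.counts + beta * prior) / (node.counts.sum() + beta)

# Binary alphabet: counts at the root and at the node for context (0,).
root = Node(counts=[3, 5])
child = Node(counts=[4, 1], parent=root)
print(posterior_mean(child, beta=2.0))  # child estimate, smoothed towards root
```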
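Algorithm 1 translates almost line for line into code. The following is a minimal sketch for generating a single document, with a small dictionary-based context lookup (as in the earlier sketch) standing in for the topic-specific Dirichlet-VMMs; the topic proportions, trees and binary alphabet are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def longest_context(history, tree, L):
    """Deepest stored context matching a suffix of the history."""
    for length in range(min(L, len(history)), -1, -1):
        c = tuple(history[len(history) - length:])
        if c in tree:
            return tree[c]
    return tree[()]

def generate_document(theta, phi_trees, T_d, L=2, alphabet=(0, 1)):
    """Algorithm 1 for one document: at each step choose a topic from theta,
    then a word from that topic's context-conditional distribution."""
    words = []
    for _ in range(T_d):
        z = rng.choice(len(theta), p=theta)             # z_{t,d} ~ Mult(theta_d)
        dist = longest_context(words, phi_trees[z], L)  # phi_{c_{w_t}, z_{t,d}}
        w = rng.choice(alphabet, p=[dist[a] for a in alphabet])
        words.append(int(w))                            # w_{t,d} ~ Mult(phi)
    return words

# Two topics over a binary alphabet, each with its own (toy) context tree.
phi_trees = [
    {(): {0: 0.9, 1: 0.1}, (1,): {0: 0.5, 1: 0.5}},
    {(): {0: 0.2, 1: 0.8}},
]
print(generate_document(theta=[0.3, 0.7], phi_trees=phi_trees, T_d=10))
```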