ARTICLE TYPE ¯ Identification using Repetitive Note Patterns from prescriptive notations of H. G. and Thippur V. Sreenivas Dept of ECE, Indian Institute of Science, Bangalore-12, India [email protected], [email protected]

Abstract Carnatic music, a form of Indian Art Music, has relied on an oral tradition for transferring knowledge across several generations. Over the last two hundred years, the use of prescrip- tive notations has been adopted for learning, sight-playing and sight-singing. Prescriptive notations offer generic guidelines for a raga¯ rendition and do not include information about the ornamentations or the gamakas, which are considered to be critical for characterizing a raga¯ . In this paper, we show that prescriptive notations contain raga¯ attributes and can reliably identify a raga¯ of Carnatic music from its octave-folded prescriptive notations. We restrict the notations to 7 notes and suppress the finer note position information. A dictio- nary based approach captures the statistics of repetitive note patterns within a raga¯ notation. The proposed stochastic models of repetitive note patterns (or SMRNP in short) obtained from raga¯ notations of known compositions, outperforms the state of the art melody based raga¯ identification technique on an equivalent melodic data corresponding to the same com- positions. This in turn shows that for Carnatic music, the note transitions and movements have a greater role in defining the raga¯ structure than the exact note positions.

Keywords: Prescriptive notation, N-multigram, variable length pattern, raga¯ identification

1. Introduction other ornamentations) while the notations of compo- Carnatic music is a form of Indian Art music which has sitions offer just basic skeletal structure. Some related been traditionally passed down through generations. socio-political and musicological history is chronicled It has largely followed an aural tradition. Early ma- in (Weidman, 2006), (Sambamoorthy, 1998). jor works of 9th century such as Matanga Muni’s Bhri- The role of prescriptive notations in conveying the haddesi contain solf`ege notation, akin to western mu- essence of a raga¯ within a composition has been de- arXiv:1711.11357v1 [eess.AS] 30 Nov 2017 sic (Doraiswamy, 2014). However, this notation has bated. Many Carnatic musicians are of the view that not been widely adopted. In the 17th century, Car- Carnatic music cannot be notated as it follows an aural natic music became codified the way we know it today tradition, and even if notated would not serve much through the work Chaturdandi Prakashika by Venkata- purpose (Vijayakrishnan, 2007) 2. In his detailed mu- makhin (Sriram, 2004). It is in this work, the melodic sicological study, Vijayakrishnan (2007) presents anal- framework referred to as ¯ , has been mathemat- ogy between Carnatic music and grammar. The work ically defined. While raga¯ renditions are associated expresses that notation is not indicative of many nu- with florid dynamism across notes, the notations used ances of ragas¯ , but motivates towards need for a de- are prescriptive in nature. These notations form the tailed (descriptive) notation. Contrarily, in (Subra- basic skeletal structure of a composition and are usu- manian, 2012), it is argued that the skeletal notation ally regarded as generic guidelines for a learner 1. allows a performer to interpret the composition ac- Thus, the renditions of compositions are replete with cording to their own music tradition. This can result gamakas (which involve pitch modulations, glides and in multiple ways of rendering a compositional nota- tion. There have been inceptive semi-automated sys- 1Descriptive notations (followed in western classical music) offer a precise rule, wherein the rendition exactly follows the note se- 2cf. http://www.kgvijayakrishnan.com/notation. quences in the notation. html 2 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

tems such as the 2 component stage-dance synthesis model (Subramanian et al., 2012) and the “Gayaka” system (Subramanian, 2012) to generate acceptable rendition(s) given the notations. Thus, the significance of skeletal structure of prescriptive notations in defin- ing ragas¯ in Carnatic music has been irresolute and inconclusive. This paper addresses a scientific curios- ity - ‘Do prescriptive notations contain any information about raga¯ structures?’3. We present a brief overview of notations used in Figure 1: [Color online] An illustration of musical no- Carnatic music in Section 1.1 before motivating the tation of a composition in Sriranjani raga¯ com- reader towards Carnatic music analysis from skeletal posed by Thyagaraja (Source: http://ananthp. notations (Section 1.2). github.io/carnatic_scores/). Three metric 1.1 An Overview of Carnatic Music Notation cycles of the transcription are shown. The text in Similar to the notes in solf`ege notation of west- black denotes the notes, and those in blue denotes ern music, the notes in Carnatic music are re- the composition’s lyrics. We highlight that the note ferred to as the swaras. They are represented as sequence does not include finer pitch position in- Sa, Ri1, Ri2, Ri3, Ga1, Ga2, Ga3, Ma1, Ma2, P a, formation. {Da , Da , Da , Ni , Ni , Ni . They correspond 1 2 3 1 2 3} to 12 pitch positions in an octave with Sa (re- Indian music notations Explanation ferred to as shadja) representing the tonic. The Swaras Sa, Ri, Ga, Whole-notes similar to relative frequency ratios of these notes are pre- Ma, P a, Da, Ni solf`ege notation in Western music End of a metric cycle sented in Appendix A. A raga¯ can use atleast 4 k A sub-measure of a metric cycle notes from the set Sa, Ri, Ga, Ma, P a, Da, Ni with, | { } tala¯ Musical meter Ri (Ri1, Ri2, Ri3), Ga (Ga1, Ga2, Ga3), Ma avarthana¯ Repeated cycle of tala¯ (Ma∈, Ma ), Da (Da∈ , Da , Da ) and N ∈ 1 2 ∈ 1 2 3 ∈ X˙ or X A note X in higher or lower octave (Ni , Ni , Ni ) (Raman,¯ 2008). Table 2 shows the ˙ 1 2 3 } X or X The note X is a semi-note notes within an octave used in some ragas.¯ or quarter-note Ragas¯ contain structures at 3 layers - (i) note posi- X,X; X, ; & X;; , Increase duration of note X by tions (ii) gamakas(iii) note movement. A raga¯ rendi- one, two, three, & six beats tion involves usage of selected notes along with those Table 1: Carnatic music notation alongside its expla- gamakas, which are characteristic transitions between nation notes in a melody. The skeletal notations of a compo- sition contain only information pertaining to note po- sitions and note movement. approaches for computational analysis of ragas¯ has A typical notation for a particular composition in mainly involved music recordings (Gulati et al., 2014, raga¯ Sriranjani is shown in Figure 1. Table 1 details 2015; Dutta and Murthy, 2014; Gulati et al., 2016; the current notations used in Carnatic music 4. A typ- Dutta et al., 2015). The large abundance of audio ical notation contains a sequence of notes and the as- recordings aid towards such approaches. However, sociated lyrics. It can also be viewed as a sequence there has been little work to analyze ragas¯ from of notes (or lyrics) set within multiple metric cycles their notations (Ross et al., 2017). This is more pro- (avarthanas¯ ). We emphasize that Carnatic music no- nounced in Carnatic music due to usage of gamakas in tation comprises of note sequences with coarse dura- renditions of compositions but seldom find repre- tional representation, while gamakas are not repre- sentations in notations. It is a popular belief that sented (Subramanian, 2013). Most notations contain gamakas can be used for distinguishing ragas¯ (Vi- a couple of commencement lines indicating the finer jayakrishnan, 2007; Krishna and Ishwar, 2012) 5.A note positions involved in the chosen raga¯ . However, study and analysis of gamakas shows that they con- the finer note positions are not indicated for the rest tain short-term structural information (Krishnaswamy, of the composition. Such notations have been success- 2003; Krishna and Ishwar, 2012). We hypothesize fully used for learning, sight-singing and sight-playing. longer structures of ragas¯ are reflected in the note 1.2 Motivation towards analysis of notations movements which are present in prescriptive nota- Computational analysis of ragas¯ involves identifica- tions. Albeit musical scores/notations could be asso- tion, synthesis or motif spotting for ragas¯ . Existing ciated with raga¯ information, there are pedagogical, music exercises recommended in acknowledged music 3Throughout this work, we use the term structure to refer to ar- rangement of notes within a raga¯ . references such as (Sambamoorthy, 1998) which chal- 4We use these notations consistently throughout this work. The Indian and Western terminologies here are analogous and not exact 5for example, raga¯ Thodi¯ and Sindhu are said to have counterparts. similar note sequences, but are distinguished by gamakas 3 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

Raga¯ (Abbreviations) Swaras cally model (fixed length as well as variable length) 1. Hari-Kambh¯ oji¯ (Hk) Sa, Ri2, Ga3, Ma1, P a, Da2, Ni2 recurrent note patterns (RNP) from notations using 2. Bhairavi (Bh) Sa, Ri2, Ga2, Ma1, P a, Da1, Da2, Ni3 7 basic notes (Sa, Ri, Ga, Ma, P a, Da, Ni) (ii) quan- 3. Shankarabharana¯ (Sb) Sa, Ri2, Ga3, Ma1, P a, Da2, Ni3 titatively measure agreements of these patterns with 4. Thodi¯ (Td) Sa, Ri1, Ga2, Ma1, P a, Da1, Ni2 musicologically valid reference phrases (MVRP) (iii) 5. Natta¯ (Na) Sa, Ri , Ga , Ma , P a, Da , Ni 3 3 1 3 3 test the impact of RNP as a useful representation for 6. Panthuvarali¯ (Pv) Sa, Ri1, Ga3, Ma2, P a, Da1, Ni3 raga¯ identification (RID) task (iv) compare the perfor- 7. Madhyamavathi¯ (Md) Sa, Ri2, Ma1, P a, Ni2 8. (Kh) Sa, Ri2, Ga3, Ma1, P a, Da2, Ni2 mance of state-of-the-art melody-based RID algorithm 9. B¯egada (Bg) Sa, Ri2, Ga3, Ma1, P a, Da2, Ni2, Ni3 on an analogous audio rendition dataset. 10. (Ka) Sa, Ri2, Ga3, Ma2, P a, Da2, Ni3 The paper is organized as follows: The mathemat- 11. ¯ (Sh) Sa, Ri , Ga , Ma , P a, Da , Ni 2 3 1 2 2 ical formulation required for computing the statistics 12. Reetigowla (Rg) Sa, Ri , Ga , Ma , P a, Da , Ni 2 2 2 2 2 of sub-sequences in a raga¯ is presented in Section 2. Table 2: Various ragas¯ and their set of notes within an The stochastic models used for computing the statistics octave. These do not convey any sequencing infor- of sub-sequences are described in Section 3. Using the mation. Entries 3 and 9 have same set of notes, but sub-sequence statistics we propose a dictionary based are melodically different. approach for raga¯ identification in Section 4. The ex- perimental setup with details on the database and the performance measures are detailed in Section 5. The lenge learners/enthusiasts to identify a raga¯ from only performance analysis of the proposed raga¯ identifica- 7 notes and prescriptive note movements; this is a non- tion methods based on notations and their compari- trivial task (such as identifying topic/author from text) son with audio based raga¯ identification is presented and suggests that it necessitates good knowledge of the in Section 6. The relation between the raga¯ identifica- semantics of a raga¯ . While grammar has been strongly tion algorithms is presented in Section 7. The discus- associated with Carnatic music, there is possibilities of sion on the results and the algorithms considered are presence of partial grammatical structures even within presented in Section 8. the prescriptive notations. With most of the compo- 1.4 Proximity to symbolic processing in Western sitions being associated with prescriptive notations, and folk music we present a computational analysis that is possible with such (prescriptive) information; the approach as There have been computational approaches proposed well as data is complementary to that in melodic con- to discover repeated music patterns from symbolic mu- tours. In this paper, we propose approaches to analyze sic representations. These have been used to find mo- ragas¯ from structures present in notations (which are tifs, themes, chorus and other musical structures on devoid of gamakas) thus addressing a gap in the lit- data ranging from Jazz music to Western art music; an erature. Further, this analysis presents an alternative overview of the approaches, challenges in attempting schema for similarity analysis of raga¯ structures using musical pattern discovery, using either approximate or notational/score based information. exact pattern matches, and in evaluating them in con- trast to an expert musician’s view point has been ad- 1.3 This paper: Raga¯ Identification based on Sub- dressed in detail in (Janssen et al., 2013). Geometric sequence Statistics approaches involving using notion of sequences as a In order to analyse ragas¯ and capture structures in no- shape have been found to be advantageous for poly- tations, we resort to chunking theory (Gobet et al., phonic music (Meredith et al., 2002). String based ap- 2001; Janata and Grafton, 2003; Rohrmeier and Re- proaches (involving string of notes/ pitch class/ MIDI buschat, 2012). A chunk is often defined as a collec- note numbers or XML data) generally involve finding P tion of elements having strong associations with each length pattern in a melody M by sliding all possible P other, but weaker associations with elements within patterns over M and tracking the matches and filtering other chunks (Gobet et al., 2001). Chunks can be of it using either rules motivated by Generative Theory fixed size or variable sizes. Notes in music can be seen of Tonal Music theorey (Nieto and Farbood, 2012) or to be analogous to words in natural language process- its frequency of occurrance/length/duration/rhythmic ing. In the latter, words combine (grammatically) to consistency (Meek and Birmingham, 2001). Other form meaningful phrases, sentences to convey an idea approaches involve finding long repeating patterns or topic; while in music, note sequences form musico- (Cambouropoulos, 2006), partitioning methods (Cam- logical phrases to impart a raga¯ bhava¯ (emotion) to the bouropoulos, 2006),(Karydis et al., 2007). We observe listener, and thus a raga¯ can be compared to a language that most approaches involve trees to represent the (Vijayakrishnan, 2007). Thus, analogous to comput- patterns/ phrases along with the associated frequency ing the word statistics to characterize a language (Bie- of occurrence information (Janata and Grafton, 2003). mann and Mehler, 2014), we compute the statistics of While many of these approaches use frequency of oc- sub-sequences of notes to characterize the structure currence to discover patterns, we need to compare the of ragas¯ . The focus of this work is to (i) stochasti- frequency of patterns across ragas¯ in order to compare 4 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

and contrast the structures present. Hence, we choose N-gram to use stochastic models. 2. Mathematical formulation Let the set of all possible notes for any raga¯ be denoted by the basic 7 notes, V = (i) (i) (i) (i) (i) … ut-2 ut-1 ut ut+1 ut+2 … Sa, Ri, Ga, Ma, P a, Da, Ni . Let a notation {of a raga¯ composition, r be A}. It comprises of a sequence of notes within metric cycles such that A = [A1,A2,A3,...,AI ], where Ai is a se- Figure 2: A graphical model depicting tri-gram lan- th quence of notes in i metric cycle denoted by guage model. (i) (i) (i) (i) Ai , [ut=1, ut=2, . . . , ut , . . . , ut=T ] i 1, 2,...,I i ∀ ∈ { } 6 (i) and I is the number of metric cycles . ut is the (i) The first term of (2) is the joint probability of an tth note in the ith metric cycle. u V and T is t ∈ i N-length sequence while the second term is the condi- the number of notes in Ai i.e., Ti = Ai where . tional probability of a note, given its N 1 predeces- denotes cardinality of a set. The vocabulary| | is the| | − V sors. When TA < N, the joint probability is approxi- same for all ragas¯ . Since we are interested in com- mated using the marginal probabilities. puting the frequency of repetition of sub-sequences Consider the set of all possible N- of notes for characterizing a raga¯ , we refer to any length sub-sequences, S = sm where note sub-sequence as “Repetitive Note Patterns” or (m) (m) (m) N { (m}) sm = [y , y , . . . , y ] V , yn V, RNP in short. We propose to discover RNP and their 1 2 N ∈ ∈ m [1, 2,..., VN ] and VN is the N-fold Cartesian statistics from A using stochastic models. Therefore ∈ | | product. Let θm , Pr(sm) be the joint probability of we define the probability of a raga¯ transcript as (m) (m) (m) Pr(A) = Pr([A ,A ,...,A ]). Assuming statistical the sequences and ζm , Pr(yN [y1 , . . . , yN 1]) be 1 2 I | − independence across metric cycles 7, we have, the conditional probability of the sequence. θm can be estimated from A normalized counts of sm in A. ζm I I can be found by normalizing across sequences having (i) (i) (i) (m) (m) Pr(A) = Pr(Ai) = Pr(u1 , u2 , . . . , u ) (1) Ti the same context [y1 , . . . , yN 1]. i=1 i=1 − Y Y We refer to collection of sm, θm, ζm , m N { } ∀ ∈ A sub-sequence a of a sequence A is denoted by a A. 1, 2,..., V as the N-gram dictionary, N gram. { | |} D − The data models for estimating RNP from A are pre- Now, S represents the RNP with their probability N V sented in the following section. of occurrence denoted by θN gram , θm m| =1| N − { } V and ζN gram , ζm m| =1| . To account for unseen 3. Discovering RNP from music notation − { } 3.1 N-gram model sequences, Good-Turing smoothing technique (Gale, A typical stochastic model used in music for modeling 1995) is used. This results in assigning non-zero fixed length subsequences is the N-gram and its var- (and miniscule) probabilities for unencountered sub- ied smoothing techniques (Downie, 1999; Scholz et al., sequences. 2009; Unal et al., 2012; Hillewaere et al., 2012; Gillick 3.2 M-skip-N-gram model et al., 2010; Senturk,¨ 2011). Consider the classical N- Analogous to dropping adjectives and descriptives in a gram model as applied to note transcripts of a raga¯ , A. sentence without changing the meaning of a sentence, A graphical model depicting the Markov dependency in music also one can skip over or add a few notes for tri-gram (N = 3) is shown in Figure 2. The N- without significant perceptual difference in its melodic gram model presumes the statistical dependency of a effect. For example, in a composition, Swaminatha of given note on the N 1 previous notes. This will lead raga¯ Na¯¯tai, the notes - Sa, SaRiGaMa can be impro- − to modeling of the dependency of note sequences up vised as SaNiSaRiGaMa. A statistical model to ac- to a fixed length of N. The probability of A as for this count for such skips in temporal data is the “M-skip-N- model (from (1)): gram model” (Goodman, 2001),(Guthrie et al., 2006). We propose an alternative formulation of M-skip- I Ti (i) (i) (i) (i) (i) N-gram model to that in (Guthrie et al., 2006), using Pr(A) = Pr(u1 , . . . , uN ) Pr(ut ut N+1 . . . ut 1) | − −the concept of partially ordered sets (posets) (Rudin, i=1 t=N+1 Y Y (2) 1991). Consider the binary relation, , on a set of (i) ≤ (i) temporal indices of Ai, t = 1, 2,...,Ti . t , is 6 Superscripts are within parenthesis to differentiate from expo- (i) { } (i)≤ nent operator. a poset. Let τk be any N-element subset of t , de- 7 (i) (i) (i) (i) (i) (i) The statistical independence across metric cycles results in fined as τ , t , t , . . . , t , with t t , j chunking synchronized to metric cycles. It offers computational ad- k { k1 k2 kN } kj ∈ ∀ ∈ vantages and incorporates the possible boundaries of musical pat- (i) 1, 2,...,N , such that τk , is also a poset (Rudin, terns to coincide with (either beginning or end of) the metric cycle. { } ≤ T However, not all phrase patterns need be bound to metric cycles. 1991). Hence, k 1, 2,..., i . Let τ (i) be a collec- ∈ { N }  5 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

tion of N-element posets such that: τ (i) = τ (i) t(i) t(i) (N + M 1) k kN − k1 ≤ − n   o Given A, we define collection of temporal in- dices, M,N , for M-skip-N-gram model as M,N , I T(i) T i=1 τ . Let U be the set of N-length RNP ob- tained∪ from collecting the notes in A at the temporal indices, . The cardinality of is dependent TM,N TM,N on Ti, I, M and N. (Guthrie et al., 2006) discusses in Figure 3: A graphical model of N-multigram model detail the coverage possible with M-skip-N-gram. N- for a raga¯ r. gram is a special case of M-skip-N-gram with M = 0. Let M-skip-N-gram dictionary, be a D(M,N) skip − N where B denotes the set of all possible boundaries of collection of sm, θm, ζm , m 1, 2,..., V . The { } ∀ ∈ { | |} A, and b = I b is one such partition. Such a parti- joint probability θm is estimated from counts of the se- i=1 i tion b, includes the metric cycle boundaries. We refer quence sm in U and conditional probabilities are ob- S tained by suitably normalizing probabilities. to this as N-multigram model, where N is indicative of (i) There still can exist unseen sequences which are maximum length of any xk . Now, the problem is to assigned a small non-zero probabilities through Good obtain the boundaries b, given A. This can be solved using explicit search of all possible partitions. The sen- Turing smoothing. Again, the dictionary (M,N) skip N D − tence boundary information is discarded in (Deligne contains V number of N-length RNP and their asso- ciated joint| and| conditional probabilities. and Bimbot, 1995). In contrast, in this work, by includ- ing the metric cycle boundaries within b, the boundary 3.3 N-multigram model search space is significantly reduced. As in (Deligne The use of N-gram or M-skip-N-gram results in fixed and Bimbot, 1995), we employ segmental-K-means to length RNP. Since we do not know apriori the length of ( ) obtain optimal partition b ∗ . RNP that can characterize a raga¯ , we would like to ex- In order to define objective function in segmental- plore the flexibility of discovering variable length RNP. K-means framework, we require the notion of an N- N-gram based approaches estimate the dependency of multigram dictionary for a raga¯ r, N multi which a note only based on notes preceding it and does not contains the set of all possible note sequencesD − with a consider notes succeeding it. However, we want to maximum length of N and their associated probabili- consider the entire notation into account to discover ties i.e., varied length RNP. We assume that varied length RNP are the building blocks of a notation. This leads to N Y segmentation approach for discovering variable length n N multi = (sm, θm): sm V ; θm , Pr(sm) D − ∈ RNP. A model with similar assumption as addressed in ( n=1 )m=1 language and text modeling in (Deligne and Bimbot, [ N n Y 1995; Jian-Tao Sun, 2006) is the multigram model. where Y = n=1 V . Defining, Θ , θm m=1, we Adopting a similar approach, we segment A to Ω num- can now rewrite (3)| which| is parameterized{ by} Θ as: ber of consecutive varied length segments, x Ω , P { j}j=1 Y I ω(bi) such that maximum length of any segment is N i.e., (i) δ(x ,sm) x N. The notations are naturally segmented at Pr(A B = b; Θ) = [θm] k (4) j | | | ≤ m=1 i=1 the level of avarthana¯ (similar to the sentence bound- Y Y kY=1 ary information in language models). Including this I (i) (i) where δ(x , sm) = 1 if x = sm and 0 otherwise. A information, we can now define Ω = i=1 ωi, where k k ω is the number of variable length segments of A . graphical model of the same is shown in Figure 3. i P i Let Bi denote the set of all possible partitions of With Pr(B) assumed to be uniform, the ob- A(i) resulting in sub-sequences with a maximum length jective function of maximizing complete likelihood, T (i) 1 T (i) 1 Pr(B, A;Θ) in segmental-K-means framework is equiv- N. Thus, there can be a total of − − k=κ k alent to: partition possibilities where κ = T (i)/N . A spe- P  cific partition b = b = 0, b , . . .b , b c= T (i) i 0 1 ω(bi) ∈ max log Pr(A B = b;Θ) B results in ω (b ) sub-sequences. Let kth sub- b,Θ | i i   (i) (i) sequence be xk = ubk 1+1, . . . , ubk . Thus, A = Y − subject to m=1 θm = 1, so that Θ is a valid dis- x(i), x(i),..., x(i) . Assuming that x(i) are inde- 1 2 ω(bi)  k tribution. The unknowns to be estimated are b and { } P hpendent, we have: i Θ. The optimal b and Θ can be estimated by alterna- tive maximization hill-climbing approach (Van Trees, I ω(bi) Pr(A B = b) = Pr(x(i)) (3) 2002). The optimal segmentation given the parame- | k i=1 ters is estimated, followed by optimal parameter esti- Y kY=1 6 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

mation for the given segmentation, iteratively as: 4.1 N-gram and M-skip-N-gram models (r) Let N gram be the dictionary of a raga¯ r obtained I D − (l+1) (l) using N-gram approach which contains N-length RNP b = arg max log Pr(Ai bi;Θ ) (5) (r) b B | and their associated joint (θN gram) and conditional i=1 i∈ i [ (r) − probabilities (ζN gram). The probability of Atest, given (r) − N gram, can be expanded using chain rule as in (2) D − Θ(l+1) = arg max log Pr(A b(l+1);Θ) for J metric cycles. While discovering RNP from train- Θ | (r) (r) n o ing data, the parameters θN gram and ζN gram are es- c (b(l+1)) − − = m (6) timated from the data, whereas for the RID task, these Ω(b(l+1)) (r) parameters are used from N gram for computing the objective function in (7).D For−M-skip-N-gram model, (l+1) (r) where l indicates iteration index, cm(b ) and Pr(Atest; ) can be estimated similarly. (l+1) (M,N) skip Ω(b ) indicates number of times sequence sm re- D − peats and total number of segments obtained, respec- 4.2 N-multigram model tively, given b(l+1). Θ obtained from (6) is a multino- Given Atest, we hypothesize that segmenting it using (r) mial distribution. RNP from N multi would be optimal in an ML sense, if raga¯ correspondingD − to A is indeed r. Therefore, Suppose sm = [a b] and its sub-sequence sk = [a] test we obtain optimally segment A , w.r.t each r as : have same occurrence counts. This implies θm and θk test will be same. The subsequent segmentation step will ( ,r) (r) b ∗ = arg max Pr(Atest b;ΘN multi) result in a boundary favoring sm over sk in order to b | − maximize (5). Thus, the formulation favors segmenta- J tion having longer recurrent sub-sequences. (test) (r) = arg max log Pr(Aj bj;ΘN multi) b B | − The algorithm iterates between (5) and (6). The j=1 j j [ ∈ stopping criteria is chosen as the rate of change in like- lihood between successive iterations below a thresh- While discovering RNP using N-multigram, we use al- ternate maximization method for optimal segmenta- old, . Since RNP not in A will necessarily be as- Y (r) tion and optimal parameter estimation; whereas, for signed zero probability and θm = 1 constraint (r) m=1 RID, optimal segmentation of Atest using Θ alone is (r) (r) is valid for every iteration, initial N multi can con- done. Pr(A b( ,r); ) can be calculated us- P D − test ∗ N multi tain all N-length sub-sequences of A. After con- | D − ( ,r) ≤ (r) ing (4) for J metric cycles. The superscript r in b ∗ vergence, RNP of N multi which are associated with indicates raga¯ index and not iteration index as in (5). D − zero probabilities are discarded. If Yeff is the total number of RNP associated with non-zero probabilities, 5. Experimental Setup (r) N n 5.1 Carnatic music notation database post-convergence, Yeff < n=1 V . No smoothing technique is utilized. The role of| dictionaries| in RID There are very few choices for large corpora of Carnatic task is explored in Section 4.P music notation that are publicly available. One such choice as noted in (Srinivasamurthy et al., 2014) is the 4. RNP based Raga¯ Identification “Carnatic Music Archive”, which is a steadily growing In this section, we propose a method to utilize the dis- database compiled by Dr. Shivkumar Kalyanaraman, covered RNP and their associated probabilities for RID. a practicing Carnatic musician (Shivkumar, 1998). It Let be a set of ragas¯ considered. For each raga¯ r , contains notations along with audio recordings. We R ∈ R we obtain a dictionary, (r) using the techniques de- choose 12 diverse ragas¯ as listed in Table (II) in this D tailed in Section 3 on training data of each raga¯ , where work. The chosen ragas¯ cover almost all note positions (r) (r) (r) (r) and are diverse in the use of gamakas during rendi- N gram, (M,N) skip, N multi . D ∈ D − D − D − tion. The chosen ragas¯ are also popular. The number of n (test) (test) (test)o Let Atest = A1 ,A2 ,...,AJ denote the compositions for each of the chosen ragas¯ varies from notation of an unknownh raga¯ , with J metrici cycles. We 4 to 10. The chosen compositions range from basic propose closed-set raga¯ identification, for a given Atest, raga¯ exercises such as geethe to significant ones such in ML framework using (r) as: as varna and krithi; each of these can comprise any- D where between 10 to 100 avarthanas¯ of skeletal notes.

(r) The dataset is pre-processed before training/testing as rˆ = arg max Pr(Atest ; ) (7) r D detailed below. ∈R 5.2 Pre-processing where rˆ is the detected raga¯ identity. We collectively In this work, all the (seven) notes are constrained to a refer to the three stochastic models based approaches single octave by mapping upper as well as lower octave for discovering RNP as “SMRNP”. The details of esti- to middle octave. To investigate the role of note se- mating Pr(A ; (r)) for each of the 3 techniques, test quences alone for raga¯ identification, we assume each are as below. D 7 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

(r) note to be of unit duration. Elongated notes are re- N multi. In testing stage, for any Atest sequence of D − peated by the number of durational units; for example, Ttest elements from V (finite and closed set), there ex- a notation such as “Ma;” is replaced with Ma Ma Ma ists atleast one b which partitions Atest into Ttest sub- (cf. Table 1). Some metrical cycles comprise of just sequences of length one; thus unseen data is handled ; - these imply an elongated note present in the pre- without smoothing. The performance is reported for N || || vious metrical cycle is continued; these are discarded, varied from 5 to 8. to incorporate metric cycle independence assumption 5.4.2 Overview of TDMS algorithm and audio database (cf. 1). The performance of melody-based TDMS for RID al- 5.3 Training and testing setup gorithm is evaluated alongside the notation-based RID We use 4-fold cross-validation setup; the set of com- algorithms, for the considered list of ragas¯ and its com- positions of a raga¯ are partitioned into 4 sub-samples. positions. In this a 2-dimensional feature is obtained The experiment is repeated 4-folds, with each sub- from melody contours as follows: sample as test set, while the remaining 3 form the Consider melodic sequence ct, where ct is tonic training set. The compositions corresponding to a • normalized melody contour (in cent scale) and training fold are concatenated resulting in A(r) .A t 1, 2,...,T . train ∈ { } training fold of each raga¯ comprises of 200 metric An octave wrapping quantizer with η number • cycles, and consisting of more than 2000∼ notes. Test of levels is constructed; resolution of each level set comprises of 30 to 100 metric cycles and 300 to is 1200/η. Mathematically, vt = B(ct) = ηct 1500 notes. The partitions are maintained across algo- 1200 mod η rithms to ensure similar training and testing data. The A surface S is created such that each element of •   performance of the algorithms is evaluated on each of the surface si,j is populated using the quantized T the 4-folds of the data and the average is reported. melody as, si,j = t=τ I(vt, i)I(vt τ , j) where 0 < i, j < η, I is an indicator operator.− 5.4 Algorithms considered for comparison of RID The surface post-processedP : compressed, Alongside the performance of SMRNP algorithms for • smoothed across bins and normalized. RID, we also investigate role of musicologically ac- The original work reports raga¯ recognition accu- cepted phrases as characterizing a raga¯ for RID. We re- racy of 86.7% using TDMS technique on a dataset fer to the characteristic phrases of each raga¯ as agreed of 480 full-length high quality recordings of Carnatic upon by three trained musicians as the “musicologi- music spread over 40 ragas¯ with pitch extracted us- cally valid reference phrases” or MVRP in short. We ing Essentia library (https://github.com/MTG/ perform RID on the equivalent melody dataset (which essentia). The parameters used in the original pa- contains the gamakas and fine duration information) per are: time delay of τ = 0.3s to generate the time- using the state of art TDMS (Time Delay Melodic Sur- delayed melody surface, compression parameter α = face) technique proposed in (Gulati et al., 2016). The 0.75, Gaussian smoothing parameter σ = 2, symmetric implementation details of the proposed SMRNP algo- KL Divergence as distance measure and k = 1 for 1-NN rithms, MVRP on notation database and TDMS are classifier. given in the following sections. In this work, we use an audio database obtained 5.4.1 Details of SMRNP algorithms from (Shivkumar, 1998) and corresponding to the no- (r) tations used for evaluating SMRNP algorithms - we Since each fold of Atrain results in a dictionary rep- resentation using SMRNP algorithms, each raga¯ has 4 refer to it as equivalent audio database. Pitch is ex- dictionary representations. tracted using Praat software (Boersma and Weenink, The performance is evaluated for N-gram model 2016; Meer and Rao, 2006). The tonic for each com- for varied N (up to N = 8); it is chosen to represent position is manually determined. All melody contours length of typical MVRP. Since there is no prior work on are tonic normalized to 146.83 Hz. The same 4-fold Carnatic music transcripts, we consider performance of training/testing partitions as used with the notation bi-gram, N = 2 as baseline performance. dataset is used. The parameters of TDMS technique For M-skip-N-gram model, the number of possi- in this work are: τ = 0.3 s, with symmetric KL Diver- ble N-length sub-sequences considered increases with gence as the distance measure, σ = 2, α = 0.8 and avarthana¯ length T and with increasing skips, M. 1-Nearest Neighbour classifier (k = 1). However, increasing M is seen to skew the distribution 5.4.3 MVRP for Carnatic music towards unlikely sequences, while decreasing impor- The set of MVRP of a raga¯ is consistent across musi- tance of adjacent notes. Hence, we report performance cians but not closed (Srikantan, 2004), (Vasanthamad- for M = 1, 2 for bigram and trigram. havi, 2005). For each of the 12 ragas¯ considered, For N-multigram model, the stopping criteria in the the MVRP as agreed upon three performing musi- training stage is chosen as rate of change in likelihood cians is tabulated and can be accessed at https:// to be less than 0.1%. A flat initialization of uniform sites.google.com/site/carnaticmvrp/ .The probability distribution is used as Θinit,r for obtaining musicians were asked to list what they consider as im- 8 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

portant swara sancharas (note transitions/movements) Models N P R for the ragas¯ listed. Thus, MVRP listed by the musicians MVRP - 0.39 0.35 is not alongside the knowledge of the dataset consid- τ = 0.3 s 0.85 0.79 ered, but independent of it. Hence, it is possible that τ = 0.01 s (bigram) 0.52 0.60 the listed phrases may not appear in the specific com- TDMS positions of the ragas¯ considered in the database. τ = 1 s 0.69 0.73 Let (r) represent the set of all MVRP for a raga¯ r. τ = 6 s 0.67 0.71 Nref For RID using the MVRP, we assign a raga¯ label, rˆ to 2 0.83 0.77 test data as: 3 0.90 0.85 (r) 4 0.68 0.73 rˆ = arg max C(Atest ref ) (8) r | N ∈R N-gram 5 0.67 0.75 6 0.65 0.75 where C(A (r) ) represents the count of occur- test| Nref 7 0.69 0.68 rences of elements of (r) in A . However, unlike Nref test 8 0.73 0.64 dictionary based approaches, A(r) is not required to train 2 (M = 1) 0.66 0.70 obtain set. Nref 2 (M = 2) 0.66 0.75 M-skip-N-gram 5.5 Performance metric 3 (M = 1) 0.845 0.875 5.5.1 RID as detection problem 3 (M = 2) 0.82 0.81 Viewing RID as a detection problem, precision and re- call measures are used to gauge the performance of the 5 0.92 0.83 algorithms. Let TP (r), FP (r) and FN (r) denotes the 6 0.93 0.85 N-multigram total number of true positive, false positive and false 7 0.77 0.79 negatives of each raga¯ r, obtained by testing all the 8 0.87 0.79 (r) TP (r) 4-folds; then, precision is P = TP (r)+FP (r) and re- (r) Table 3: Avg. precision and recall rates for RID using call is R(r) = TP . The former is a measure TP (r)+FN (r) different models of quality of correct detections and latter, a measure of quantity of correct detections. 5.5.2 Perplexity of the models for the considered ragas¯ . We analyze the performance Perplexity (or 2cross entropy) measures goodness of of each model for RID task. model’s distribution for given data (whose actual dis- tribution is unknown). Maximizing probability implies 6.1.1 Analysis of MVRP for RID minimizing perplexity. Lower perplexity implies a bet- Some reasons accounting low P and R for RID using ter model, and a perplexity of 1 on training data in- listed transcribed MVRP are: dicates over-fitting, as the model exactly represents The MVRP may / may not occur in the dataset • the seen data and may not generalize to unseen test considered due to the approach adopted in listing data. Drawing on the analogy between ragas¯ and lan- MVRP i.e., the transcripts are based on record- guages, we quantify the performance of SMRNP tech- ings and adheres to it, while MVRP are listed niques using perplexity, for both training and test data without any such restraint. If k of ref phrases N corpora. As true distribution of RNP (p(x)) is un- occur in the corpus (exact match atleast once), known, empirical distribution is used (Jurafsky and an objective assessment of agreement of these Martin, 2009). Perplexity of a model (r) w.r.t. data MVRP w.r.t. the considered dataset is reported D sequence A of T notes is: with average recall measure Rec = k/ ref (in 4-fold validation setting) for both training|N | and T p(x) log Pr(x; (r)) 1 (r) test corpus, for each raga¯ in first two columns 2 T log2 Pr(xt; ) (r) − x A D − t=1 D (A) = 2 ∈ 2 (9) of Table 4. Rec on training corpus is higher than P P ≈ P test corpus (due to training data containing more 6. Results and Analysis data). Some ragas¯ such as Shankarabharana 6.1 RID: Performance comparison show extremely low Rec in training as well as Table 3 summarizes the performance accuracy of RID test data set. averaged across ragas¯ and 4-fold test set, using MVRP, Within the listed MVRP, it is observed that some TDMS, N-gram, M-skip-N-gram and N-multigrams • of these elements are common across different with bigram used as a baseline performance. N- ragas.¯ The renditions are different due to specific multigram model shows better recognition perfor- intonation, gamakas and a sample such scenario mance than N-gram model or M-skip-N-gram model; is depicted in Figure 4 for 2 ragas¯ Shankarab- specifically, N-multigram model with N = 6 shows harana and Thodi. The MVRP listed may not higher precision and recall compared to other models be distinctive without the finer note position and 9 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

500 . . (a) Bigram (b) Trigram (c) 4-gram Bh Hk Md Na Pv Sb Td Bg Ka Kh Rg Sh Bh Hk Md Na Pv Sb Td Bg Ka Kh Rg Sh 450 P D N S R Bh Hk Md Na Pv Sb Td Bg Ka Kh Rg Sh Bh 0.75 0.25 Bh 1.00 Bh 1.00 400 Hk 0.50 0.50 Hk 0.25 0.250.50 Hk 0.25 0.250.50 350 Md 1.00 Md 1.00 Md 0.750.25 Na 1.00 Na 1.00 Na 1.00 300 Pv 1.00 Pv 0.75 0.25 Pv 0.50 0.50 250 Sb 0.750.25 Sb 1.00 Sb 1.00 Td 1.00 Td 1.00 Td 1.00

0.5 1 1.5 2 2.5 3 Bg 0.25 0.50 0.25 Bg 0.50 0.50 Bg 0.50 0.250.25 ! Time (s) Ka 0.50 0.25 0.25 Ka 1.00 Ka 1.00 ! Kh 1.00 Kh 1.00 Kh 1.00 500 . . Rg 0.250.75 Rg 1.00 Rg 1.00 450 S R Sh 0.25 0.75 Sh 0.25 0.75 Sh 0.25 0.75 Pitch (Hz) (Hz) Pitch P D N 400 (d) 5-multigram (e) 6-multigram (f) 7-multigram Bh Hk Md Na Pv Sb Td Bg Ka Kh Rg Sh Bh Hk Md Na Pv Sb Td Bg Ka Kh Rg Sh Bh Hk Md Na Pv Sb Td Bg Ka Kh Rg Sh 350 Bh 1.00 Bh 1.00 Bh 1.00 300 Hk 0.250.50 0.25 Hk 0.250.50 0.25 Hk 0.250.50 0.25 Md 1.00 Md 1.00 Md 1.00 250 Na 1.00 Na 1.00 Na 1.00 Pv 1.00 0 0.5 1 1.5 2 2.5 3 3.5 Pv 1.00 Pv 0.75 0.25 Sb 1.00 Sb 1.00 Sb 1.00 Td 1.00 Td 1.00 Td 1.00 Bg 0.75 0.25 Bg 0.75 0.25 Bg 0.75 0.25 Figure 4: [Color Online] Pitch contour (in Ka 0.50 0.25 0.25 Ka 0.50 0.50 Ka 0.50 0.25 0.25 ˙ ˙ Kh 1.00 Kh 1.00 Kh 1.00 blue) of phrase ‘P aDaNiSaRi’ rendered in Rg 1.00 Rg 1.00 Rg 1.00 Sh 1.00 raga¯ Shankarabharana and Thodi depicting the in- Sh 1.00 Sh 1.00 terplay of notes, intonation, gamakas and duration. The horizontal dashed lines indicate position of P a Figure 5: Confusion matrices for N-gram and N- and Sa˙ with tonic at 200 Hz. multigram for varied N. Raga¯ abbreviations cor- respond to raga¯ names of Table 2.

the associated gamakas; these lead to confusion amongst ragas¯ for the test data. trix changes markedly with varying N, resulting in var- The presence/absence of augmented/reduced ied precision rates. The same can be observed in Fig- • notes in either test corpus or MVRP poses a prob- ure 5(a,b,c). lem for exact pattern match. 6.1.4 RID using M-skip-N-gram models 6.1.2 RID using TDMS Skip-grams are N-grams formed from M + N length The RID performance on equivalent audio data using contexts. In case of M-skip-2-gram models, increasing TDMS technique is at 79%. The change in perfor- M results in poorer performance, implying skipping mance w.r.t. that reported in (Gulati et al., 2016) is more notes can alter statistics in an adverse way. 1- due to changes in evaluation set (quality of record- skip-trigram shows good P and R for RID; but for with ings in the database, pitch extraction tool and length M 2, the distribution has reshaped in a degenerative ≥ of recordings). TDMS algorithm essentially calculates way. Thus, on increasing analysis context (M +N), dis- bi-gram between the finely quantized melody contour tribution of N-length sub-sequences is altered, but may and its τ-delayed version. A post processed version of not necessarily result in good RID performance. We hy- this τ-delayed-bigram (cf. Section 7.2) is observed to pothesize that in attempting to accommodate for im- result in an average of 25% improvement in recall provisations, the statistics of adjacent notes is affected. ∼ rate compared to that in the absence of post-processing 6.1.5 RID using N-multigram models ( 52%) for τ = 0.3 s. The performance of the al- ∼ P and R of N-multigram models shows more promise gorithm w.r.t. delay parameter τ is tabulated. The than N-grams or Mskip-N-gram models. A gradual τ-delay-bigram model performs better than bi-gram performance degradation of R as N increases is ob- model since a note would either have to stabilize or served. Within N-multigram models, RID accuracy is transitioned to the subsequent one after some delay. least for raga¯ Begada and Kalyani. Also the confusion Similarly, increasing delay can affect delayed bi-gram matrices does not vary erratically with changing N, in- statistics and hence reduce performance. dicating that dictionaries across N are consistent rep- 6.1.3 RID using N-gram models resentations (cf. Fig 5(d,e,f)). The baseline performance of bigram is promising (and From confusion matrix of Figure 5(e), it can be ob- is close to that of TDMS at τ = 0.3 s). Performance served raga¯ Begada and Kalyani are found confused of trigram is better than that of bigram model. In- with raga¯ Bhairavi, and raga¯ is confused creasing N from trigram to 4-gram and further, shows with Bhairavi and Khamas. Thus, though Begada and a rapid degradation in performance. The average re- Bhairavi are rendered with different note positions (cf. call rate across ragas¯ decreases with increasing N, but Table 2), the dictionary representation obtained from precision does not show any consistent trend. For some 7-note data representation shows that the grammati- ragas¯ such as Begada and Panthuvarali, detection of the cal structure might be more similar than the rest (such said ragas¯ is not achieved for N > 4. This is due to un- as both ragas¯ having frequent usage of ‘SaGaRiGa’ seen test sequences reducing likelihood of data w.r.t. grammatical structure in forming a phrase). model. This also accounts for trigram performing bet- From RID performance of SMRNP models, we can ter than higher N-gram models. It can be safely said infer that RNP can characterize a raga¯ . Also, we infer that trigram shows more promise than higher N-gram that transcripts contain higher than 1st order Markov for RID task. We have also observed the confusion ma- dependencies. While trigram and skip-trigram show 10 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

Figure 6: (a) Similarity matrix in inverted gray scale depicting the KL Divergence between 2 dictionaries Figure 7: [Color online] Raga¯ -wise trends of test for P obtained from N-multigram (N=6). Each raga¯ has SMRNP algorithms. The corresponding RID accu- 4 possible dictionaries (due to 4 fold CV). racy is indicated by the colorbar. competitive performance, N-multigram being a union dow; N-gram considers consecutive N notes, while of N-grams gives a slight edge over the others and skip-gram allows for M possible skips for consid- ≤ is less≤ sensitive to choice of N. We look at perplexity ering N-length sequences and N-multigram considers measure in order to compare the models, and to see if all N sequences at initial iteration. We relate the ≤ it is indicative of RID performance. algorithms considered. The relation between N-gram and -skip- -grams has been explicitly discussed in 6.1.6 Similarity measure for raga¯ dictionaries M N Section 3.2. Relation between other algorithms is re- We analyze the distance between the N-multigram viewed here. models to understand the similarities between the dic- tionaries. We define the similarity between ith and jth 7.1 N-gram, N-multigram and Segment model dictionary as the symmetric KL Divergence between the Pr(A) in N-gram model is obtained by applying chain two i.e., rule (cf. (2)), where as in N-multigram model, it is obtained by a segmentation approach by formulation Y θ(i) Y θ(j) (cf. (3)). Consider (3) of N-multigram model; the (i, j) = θ(i)log m + θ(j)log m (10) S m (j) m (i) note sequences within a metric cycle are a concatena- m=1 θm m=1 θm X X tion of varied length RNP. The independence of RNP Figure 6 shows a similarity matrix obtained using sequences results in formulation as in (3). Clearly, it is symmetric KL divergence between all the folds of the a unigram of variable length RNP. If each RNP (xo) is a dictionaries of all ragas¯ . Each dictionary of a raga¯ is unit, a N-gram model applied to this sequence results compared with all other possible dictionaries. A block in: diagonal structure implies the dictionaries across the ω 4-fold training-sets are close to each other. Along the Pr(A) = Pr(x1, x2,..., xN 1) Pr(xo xo 1 ..., xo N+1) − | − − non-diagonal entries, we find that the dictionaries of o=N Begada has least distance with those of Bhairavi, where Y as the Begada is not necessarily the closest to Bhairavi. If N=2, then this results in first order Markov mod- This asymmetric nature of closeness can be observed eling of the RNP sequence. This is the basic segment by comparing any 2 rows of the KL divergence matrix. model, also referred to as Hidden Semi-Markov mod- els (HSMM), which are an extension to HMM (Hid- 6.2 Correlation of Perplexity with RID den Markov Models), but with latent process being Figure 7 depicts bar plot of raga¯ -wise perplexity, for a semi-Markov chain. One way of introducing semi- different algorithms and with color-bar representing Markov property is by explicitly modeling the duration the RID recall rates for each raga¯ . While raga¯ Be- of the states by a discrete distribution (Ostendorf et al., gada (Bg) and Kalyani (Ka) consistently show high per- 1996); it is referred to as explicit duration HMM. An- plexity trends, and their corresponding RID score is other approach to introduce semi-Markovian property low for all SMRNP algorithms; however, when test has been the variable duration HMM, whose gener- is less, it does not imply better RID, as can be seenP for alization is the segment model. Further, Hierarchical raga¯ Harikambhoji (Hk). This implies that a SMRNP HMMs have been proposed as a generalization of seg- model may model unseen test data of a raga¯ well, how- ment models (Murphy, 2002a). Many more variations ever, it may not yet best represent a raga¯ w.r.t. other of HSMMs have been proposed (Yu, 2010; and raga¯ models. Willsky, 2012). The basic segment model is the closest 7. Contrasting the RID models to multigram model. In segment models, each state of The SMRNP algorithms find repetitive temporal struc- latent Markov chain emits a sequence of observations tures in note sequences within a local temporal win- (while in HMM, each state emits a single observation 11 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

Raga¯ Rec onRec on Rec on N gram Rec on N multi D − D − MVRP is present in training set. Trivially, the sum of Atrain Atest N = 5N = 6N = 7N = 8 N = 5N = 6N = 7N = 8 the columns of Rec on N gram must be less than Bh 0.48 0.33 0.19 0.13 0 0 0.15 0.21 0.13 0.19 D − Hk 0.59 0.34 0.14 0.07 0.05 0.05 0.14 0 0.07 0.11 or equal to that of the training set; N gram contains D − Md 0.42 0.23 0.22 0.03 0.03 0 0.08 0.03 0.03 0.03 all N length RNP, each assigned a non-zero probabil- Na 0.43 0.27 0.18 0 0.07 0 0.18 0.05 0.07 0.07 ity either due to presence in training data or due to Pv 0.48 0.27 0.15 0.12 0.12 0 0.17 0.05 0.17 0.12 Sb 0.30 0.22 0.11 0 0.03 0 0.11 0.05 0.04 0.07 smoothing. Amongst, N-grams, 5-gram shows highest Td 0.32 0.2 0.15 0.12 0 0 0.11 0.07 0.03 0 non-zero recall rates, indicating many ragas¯ have ele- Bg 0.43 0.21 0.23 0.05 0 0 0.18 0.18 0.11 0 ments in MVRP of length 5. The average Rec across Ka 0.52 0.26 0.20 0.10 0.05 0 0.15 0.05 0.07 0.02 ragas¯ drops drastically for N = 6-gram and beyond. Kh 0.38 0.18 0.1 0.05 0 0 0.05 0.05 0 0.05 Average Rec using N multi degrades gradually with Rg 0.52 0.37 0.33 0.05 0 0 0.33 0.05 0.05 0.05 D − Sh 0.38 0.22 0.27 0 0 0 0.18 0 0 0 increasing N. For some ragas¯ , Rec is sometimes lesser Avg. 0.44 0.26 0.19 0.06 0.03 0.004 0.15 0.07 0.06 0.06 than that of N-gram as RNP of N multi must neces- sarily occur atleast once in the datasetD − (and there is no Table 4: The average recall rate of MVRP in train- smoothing). The segmentation step in N-multigram ing set, testing set and with N gram and D − models weigh higher length sequences (N = 6, 7, 8) N multigram. In the latter two cases, N is varied − over lower length ones (N <= 5). This also con- Dfrom 5 to 8. tributes to lower recall rates. However, in some ragas¯ , Rec on N multi is higher than N-gram as N multi D − D − vector). The difference between segment model and includes RNP of length <= N. The same also reflects as lack of monotonic behavior in Rec on N multi with multigram models is that HSMM models have an ad- D − ditional Markov probability at the latent variable level, increasing N. Due to validity of both possibilities, we while multigram assumes independence of latent vari- can infer that there is not much overlap between dis- ables. Hence, multigram is a simpler, special case of covered RNP and MVRP (the latter is more dependent segment model (Murphy, 2002b). on musicological knowledge than the former). 7.2 TDMS technique and bi-gram 7.4 Analysis of Dictionary To relate TDMS algorithm with the bigram, we inter- In N-multigram models, effective size of dictionaries pret the final 2 steps of TDMS algorithm for extract- can be different for various ragas¯ , and in each valida- ing features as follows: each element of the surface tion fold of a raga¯ ; whereas the dictionary size of N- indicates the count of number of joint occurences of gram and M-skip-N-gram (M plays a role in construct- sequences (quantized levels) (i, j) for all possible in- ing more N-grams) is same for all ragas¯ and is equal dices, but these quantized melodic points are not adja- to VN . For example, size of 5 gram dictionary will be cent but separated by a time delay τ - this can be seen 75 = 16807 and exponential− growth is expected with as τ delayed bigram (Huang et al., 1992) containing increasing N. The dictionaries of N-gram (and skip- joint probabilities. The post processing is equivalent to grams) are sparsely populated, but all unencountered a smoothing operation applied to N-grams to account entries are assigned non-zero (and miniscule) values for unseen sequences. Hence, we can refer to the algo- as a result of Good-Turing smoothing. In contrast, dic- rithm is τ-delayed-bi-gram. tionary size in N-multigram is obtained from optimal segmentation. A plot of average dictionary sizes (from 7.3 Correlation of dictionaries of SMRNP with the dictionaries of 4-fold training set) for each raga¯ is MVRP In order to measure correlation of dictionary RNP (ob- shown in Figure (8). As expected, dictionary size of N- tained from training data) with that of MVRP 8, we multigram for any raga¯ increases with increasing N. use standard precision, recall metrics (Meredith, 2016) Comparing with the N-gram dictionary size, the num- used in retrieval applications to quantify the extent the ber of entries in N-multigram does not exceed 5000 even for parameter. Thereupon, -multigram dictionaries represent a set of MVRP of each raga.¯ . N = 8 N model can be said to be a more compact representation If k of ref MVRP are detected in a raga¯ dictionary N (r) than N-gram (or skip-gram). of size model , then precision, P rc , k/ and |D | (r) |D | A sample clip of dictionaries, and recall, Rec = k/ . Owing to large size of dic- N gram ref for Begada raga¯ is shown inD Table− 5 for tionaries in either|NN-gram| (of order of > 10, 000) or N multi ND =− 6. Some of the successive entries of in N-multigram (of order of 3000 4000), the preci- N gram (Ex: Entries 4,5,7 or 14-16) are part of a longerD − se- sion measure is bound to be low and− hence we report quence. Many of the RNP have same count resulting in only recall for each raga¯ , averaged over the dictionar- them having same probability values. ies obtained in the 4-fold validation setting. The aver- age recall rate for all ragas¯ and for varied values of N N multi contains RNP of length ranging from 1 D − is summarized in Table 4. to N. The probability values in N multi is higher than that of for the sameD − RNP; for exam- Rec in N gram is non-zero if a listed N length N gram D − ple, the RNP DMaMaGaRiRiP− a is found by both the 8 cf. https://sites.google.com/site/carnaticmvrp/ models, but N gram assigns a probability of 0.004 D − 12 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

4000 N=5 Ni Ni Da Ni Sa Ri Pa Ma Ga Ma Ri Sa Ma Da Pa Ma Da Ni Sa Ni N=6 (a) Sa Sa Ri Ga Sa Da Ni Pa Ma Da Ri Ri Pa Ma Pa Ni Ga Ma Ri Ri 3000 N=7 N=8 Ga Ma Ma Da Ri Ri Pa Ma

2000 Ri Ga Ri Sa Sa Pa Ma Da Da Ni Sa Ri Ga Ri Sa Da Ni Sa Sa Sa Sa (b) Sa Sa Pa Pa Pa Pa Pa Pa Pa Ma Ma Ga Ma Ri Sa Sa Sa Sa Ri Ga 1000 Ma Pa Ma Ma Pa Da Pa Da

0 Bh Hk Md Na Pv Sb Td Bg Ka Kh Rg Sh Pa Ma Ga Ma Ma Ga Ma Ri Sa Ni Sa Ri Ga Da Pa Pa Pa Ma Da Da (c) Pa Ma Ma Ga Ri Sa Da Ni Sa Da Pa Ma Pa Da Ni Pa Ma Ga Ma Ri Figure 8: [Color Online] Average dictionary size using Ga Ma Pa Ma Ni Ni Ni N-multigram for various ragas¯ for N varying from Figure 9: [Color Online] Music notations generated 5 to 8. Contrastingly N-gram (M-skip-N-gram) for Sahana raga¯ using (a) with N = 6 (b) Dmulti dictionaries are of size 16807, 117649, 823543 and N gram with N = 3 (c) skip with (M,N) = 5764801 for N varying from 5 to 8, but sparsely pop- D(3, 1)− . MVRP that can be manuallyD identified from ulated. the generated sequences are highlighted in red.

Sl.no RNP of N gram Pr(s) RNP of N multi Pr(s) D − D − 1 Pa Pa Pa Pa Pa Pa 0.022 Pa Pa Pa Pa Pa Pa 0.031 will be common. For example, the number of common 2 Sa Sa Sa Sa Sa Sa 0.010 Sa Sa Sa Sa Sa Sa 0.020 RNP in all the 4-fold dictionaries of raga¯ Begada are 3 Pa Ma Pa Da Pa Sa 0.004 Ma Ma Ma 0.017 4 Ni Ni Ni Da Pa Pa 0.004 Ma Ma Ga Ri Ri Pa 0.017 much lesser (< 100) compared to the size of dictionary 5 Pa Pa Ma Pa Da Pa 0.004 Ma Pa Da Pa Sa Ni 0.017 depicted in Figure 8. 6 Ni Ni Ni Ni Da Pa 0.004 Sa Ni Da Pa Sa 0.011 A few MVRP present in (r) are identified (high- 7 Ni Ni Ni Ni Ni Da 0.004 Da Pa Ma Pa Pa 0.011 D 8 Ri Sa Sa Sa Sa Sa 0.004 Pa Pa Pa Pa Pa 0.011 lighted using blue font in Table 5. A detailed analysis 9 Ni Sa Ri Sa Ri Sa 0.004 Ma Ni Da Pa Pa Pa 0.011 of correlation between dictionaries and MVRP is pre- 10 Ni Sa Ni Sa Ri Sa 0.004 Ni 0.008 sented in Section 7.3. 11 Sa Ni Sa Ri Sa Ri 0.004 Da Pa 0.008 12 Ma Ma Ma Ga Ri Ri 0.004 Sa Ri Sa 0.008 7.5 Generating sequences 13 Ma Ma Ma Ma Ga Ri 0.004 Ni Sa Ni 0.008 14 Ma Ma Ma Ma Ma Ga 0.004 Ni Sa Ni Sa 0.008 As the SMRNP considered are generative models, we 15 Ma Ma Ga Ri Ri Pa 0.004 Ni Ni Da Pa 0.008 generate music notations. A sample sequence gener- 16 Da Pa Pa Pa Pa Pa 0.004 Pa Da Pa Ri Sa 0.008 ated by the models for Sahana raga¯ is shown in Figure 17 Sa Ri Sa Ri Sa Ni 0.004 Ni Da Pa Ni Ri Sa 0.008 18 Ma Pa Da Pa Sa Ni 0.004 Ni Ni Ni Ni Da Pa 0.008 9. 19 Pa Sa Sa Sa Sa Sa 0.003 Ri Sa Ri Sa Ni Da 0.008 8. Discussion and Conclusions 20 Pa Ri Sa Sa Sa Sa 0.003 Ni Ni Ni Ni Ni Da 0.008 Inspite of low match with MVRP, RID results demon- Table 5: [Color Online] Dictionaries of Begada strate that RNP obtained from using just 7 notes of raga¯ for N=6 sorted in decreasing order of proba- skeletal transcripts are a better feature representation bilities. Some MVRP are highlighted in blue color. than MVRP itself and can identify a raga¯ . The RID re- sults of SMRNP algorithms are an indicator that skele- tal note transcripts contain considerable amount of raga¯ information. Also, the sequencing of notes itself (entry 15), while the same RNP is assigned a higher can aid in raga¯ identification task. Higher order(> 1) value, 0.017, in N-multigram (entry 4). Thus, in- Markov dependencies are present in the data as can be seen from performance of trigram, skip-gram and spite of presence of some common RNP in N multi D − N-multigram techniques. We hypothesize that inclu- and N gram, probabilities of RNP are different. In D − N-gram models, the count is normalized by the total sion of finer note interval (such as Ri1; Ri2; Ri3) will number of valid N length sequences, whereas in N- automatically boost RID accuracy. Such grouping of ( ,r) finer note sequences complemented with knowledge of multigram, the count is normalized by Ω(b ∗ ). The (r) gamakas can result in better raga¯ models. This would final number of segments determined by the N multi D − be akin to the effect of adding grammar models to - Ω( ( ,r)), is unknown initially and corpus dependent, b ∗ acoustic models in speech recognition. (r) N while YN gram = V , r is corpus independent. This The melody contour based TDMS algorithm essen- accounts− for differences| | ∀ in the estimated distributions. (r) tially utilizes very fine note positions, while smooth- On analyzing N multi, w.r.t. the training data, some D − ing out effects of durations in obtaining transition ma- RNP are observed to be truncated due to maximum trix from the melody signal (Gulati et al., 2016). It RNP length limited by N. Increasing N will increase does not utilize sequential information. Melodic con- computational complexity on one end and also leads tours are a florid representation of prescriptive nota- to forcefully looking for longer sequences, resulting in tion replete with note intonations, durational informa- over-training. tions and gamakas. Yet, this large amount of infor- The 4-fold training results in 4 (different) dictio- mation can pose difficulties in computationally repre- naries, for a given raga.¯ Owing to data dependency, senting/identifying a raga¯ . This is reflected in the re- (r) only a few RNP of N multi across the 4 dictionaries call rates using TDMS technique. There is potential D − 13 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

promise in using N-grams/N-multigrams with N > 2 sic form in this work; Carnatic music contains more or- in TDMS algorithm. This is not trivial as it will involve namentations than Hindustani music form. As we have more than 2-dimensional surfaces. While we have ex- been able to capture raga¯ structure with use of only 7 plored the advantages of utilizing (prescriptive) mu- notes of the prescriptive notations, we hypothesize the sic notations for characterizing a raga¯ , the problem of analysis will hold for other music forms such as Hin- mapping melodic contours to symbolic notation and dustani and Turkish classical music where phrases form vice versa is still in its nascent stage 9. It essential for a a major basis for melody description. While N-gram detailed understanding and representation of the mu- models are employed in analysis of folk music, West- sic form. ern music and segmentation algorithms for structural In this work, the notes across metric cycles are segmentations (Sargent et al., 2017; Levy and Sandler, assumed to be independent. While MVRP can ex- 2006), we believe the usage of segment model in mod- ist across metric cycles, the assumption permits us to eling grammar and in evaluating melodic similarity in leverage the computational advantage in the segmen- other forms of music such as Jazz will be analogous. tation stage of the multigram model. The results (with Summarizing, the major contributions of this work the independence assumption) show the importance are: of considering long variable length sub-sequences for 1. Starting from the octave-folded notations of characterizing a raga¯ . ragas¯ in Carnatic music, using only 7 basic notes The effects of fine durational aspects of notation and no gamaka information, we show that it is (such as reduced and augmented note durations) are possible to identify ragas¯ based only on the statis- not considered in this work. Also, many of the compo- tics of sub-sequences. This demonstrates the va- sitions contain morphed MVRP with some notes aug- lidity of RNP as a representatives of ragas¯ . mented or reduced in duration (the notes that allow 2. For the set of raga¯ renditions containing both duration reduction and/or augmentation are based on audio recordings and notations, we show that grammar of a raga¯ ), so as to fit well in the metric cycle. raga¯ identification using the notations can out- Future work must incorporate these to result in a better perform the state of the art audio-only based dictionary representation and its correlate with MVRP. technique for raga¯ identification. This signifies Segmenting such patterns and incorporating beat in- result is that we can see that much of raga¯ struc- formation into its unique representation will result in tures are present as discrete elements while ex- sophisticated models for the music form. We have pressions are present as continuous variabilities. examined and evaluated the importance of temporal 3. Experimental evidence to show that the proposed structure in Carnatic music notations although it is con- approach (based on repetitive patterns) outper- sidered only as a skeleton of a rendition. The study forms identifying ragas¯ that based on musico- shows that it is possible to automatically identify a logically valid reference phrases, as agreed upon raga¯ based on recurrent note patterns. Although, few by three practicing musicians. This shows that MVRP are found in the stochastic dictionaries, these repetitive patterns are more powerful than char- alone do not suffice to characterize raga¯ for RID task. acteristic patterns for computational analysis. Experiments show that N-multigram model provides advantages over N-gram models as well as skip grams A. Ratio of notes within an octave The tonic frequency in Indian art music is the refer- in capturing RNP of varied lengths; in terms of perplex- ence base frequency. The rest of the notes are fixed ity, they are a good fit to test data and result in better with respect to this anchor. The ratios between the performance on RID task. Also, we have shown that notes within an octave are maintained irrespective it is not necessarily beneficial to increase the length of the starting tonic pitch frequency (Krishnaswamy, of RNP (N). Owing to the performance, N-multigram 2003). Table (6) tabulates the ratios and frequen- dictionary can be viewed as an viable symbolic feature cies of notes w.r.t. a fixed tonic f = 146.83 Hz representation of a raga¯ . 0 (corresponding to note in Western music, D ). A Grammatical structures are present internally in the 3 melodic notation/transcription cannot have the fol- prescriptive notation and seem to play a stronger role lowing pairs as they map to the same frequency: than the fine pitch values themselves as can be con- (Ri , Ga ), (Ri , Ga ), (Da , Ni ) or (Da , Ni ). trasted in the performance of N-multigram with that of 2 1 3 2 2 1 3 2 TDMS. The stochastic language models explored in this Acknowledgments work are generic but analyzed specific to Carnatic mu- We thank musicians Vid. Deepak Paramashivan, Vid. Raman Shankaran (violinist) and Vid. Deepika Janaki- 9 The mapping of pitch contours to its swara notation can be (i) raman (vocalist) for helping create MVRP list. We also manual - involves a trained musician transcribing the audio clip to thank authors of (Gulati et al., 2016) for providing the its corresponding notes. (ii) automated - note sequences can be ob- tained by the process of quantizing pitch contours of the lead melody original implementation of their algorithm. extracted from the audio signal of a Carnatic music rendition (Ran- References jani et al., 2011, 2017) (Koduri et al., 2014). The challenges in automatic transcription is detailed in (Krishnaswamy, 2003; Koduri, Biemann, C. and Mehler, A. (2014). Text Mining: From 2016). Ontology Learning to Automated Text Processing Ap- 14 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

Notes f0 in Hz Notes f0 in Hz Signal-Image Technology and Internet-Based Sys- (ratio with tonic) (ratio with tonic) 1 Sa 146.83 (1) 8. P a 220.24 (1.5) tems (SITIS), pages 264–271. IEEE. 2. Ri1 156.66 (1.067) 9. Da1 234.92 (1.6) Gulati, S., Serra, J., and Serra, X. (2015). An Eval- 3. Ri or Ga 165.18(1.125) 10. Da or Ni 244.76 (1.667) 2 1 2 1 uation of Methodologies for Melodic Similarity 4. Ri3 or Ga2 176.19 (1.20) 11. Da3 or Ni2 264.29 (1.8) 5. Ga3 183.53 (1.25) 12. Ni3 275.30 (1.875) in Audio Recordings of indian Art Music. In ˙ 6. Ma1 195.28 (1.33) 13. S 293.66 (2) Proc. IEEE Intl. Conf. Acoust. Speech Signal Process., 7. Ma 207.9 (1.416) 2 pages 678–682. IEEE. Table 6: Relative pitch positions, frequencies and ratio Guthrie, D., Allison, B., Liu, W., Guthrie, L., and Wilks, of notes within an octave. Y. (2006). A closer look at skip-gram modelling. In Proc. Int. Conf. Language Resources and Evaluation (LREC), pages 1–4. plications. Theory and Applications of Natural Hillewaere, R., Manderick, B., and Conklin, D. (2012). Language Processing. Springer International Pub- String Methods for Folk Tune Genre Classification. lishing. In Proc. Int. Soc. on Music Info. Retr. Conf (ISMIR), Boersma, P. and Weenink, D. (2016). Praat: doing pho- pages 217–222. netics by computer (version 6.0.14). Huang, X., Alleva, F., wuen Hon, H., yuh Hwang, Cambouropoulos, E. (2006). Musical parallelism and M., and Rosenfeld, R. (1992). The SPHINX-II melodic segmentation. Music Perception: An Inter- Speech Recognition System: An Overview. Com- disciplinary Journal, 23(3):249–268. put., Speech and Language, 7:137–148. Deligne, S. and Bimbot, F. (1995). Language Modeling Janata, P. and Grafton, S. T. (2003). Swinging in the by Variable Length Sequences: Theoretical formu- Brain: Shared Neural Substrates for Behaviors re- lation and Evaluation of Multigrams. In Proc. IEEE lated to Sequencing and Music. Nature neurosci. Intl. Conf. Acoust. Speech Signal Process., volume 1, J., 6(7):682–687. pages 169–172. IEEE. Janssen, B., De Haas, W. B., Volk, A., and Van Kranen- Doraiswamy, G. (2014). Sound of Indian Music. Berklee burg, P. (2013). Finding Repeated Patterns in Mu- College of Music. sic: State of knowledge, challenges, perspectives. Downie, J. S. (1999). Evaluating a simple approach to In Proc. Int. Symp. on Comput. Music Modeling and music information retrieval: Conceiving melodic N- Retrieval (CMMR), pages 277–297. Springer. grams as text. PhD thesis, The University of West- Jian-Tao Sun, Z. C. (2006). Text Classification im- ern Ontario. proved through Multigram Models. In ACM. Dutta, S. and Murthy, H. A. (2014). Discovering Typ- Proc. Int. Conf. on Inform. and Knowledge Manage ical Motifs of a Raga from One-Liners of Songs in (CIKM), page 672681, New York, NY, USA. ACM. Carnatic Music. In Proc. Int. Soc. on Music Info. Johnson, M. J. and Willsky, A. (2012). The Hierarchi- Retr. Conf (ISMIR), pages 397–402. cal Dirichlet Process Hidden Semi-Markov Model. Dutta, S., Sekhar, P. K., and Murthy, H. A. (2015). Raga arXiv preprint arXiv:1203.3485. Verification in Carnatic Music using Longest Com- Jurafsky, D. and Martin, J. H. (2009). Speech and Lan- mon Segment Set. In Proc. Int. Soc. on Music Info. guage Processing. Prentice-Hall, Inc., 2 edition. Retr. Conf (ISMIR), pages 605–611. Karydis, I., Nanopoulos, A., and Manolopoulos, Y. Gale, W. A. (1995). Good-turing smoothing without (2007). Finding maximum-length repeating pat- tears. Journal of Quantitative Linguistics, 2. terns in music databases. Multimedia Tools and Gillick, J., Tang, K., and Keller, R. M. (2010). Ma- Applications, 32(1):49–71. chine Learning of Jazz Grammars. Comput. Music Koduri, G. K. (2016). Towards a multimodal knowledge J., 34(3):56–66. base for Indian art music: A case study with melodic Gobet, F., Lane, P. C., Croker, S., Cheng, P. C., Jones, intonation. PhD thesis, Universitat Pompeu Fabra, G., Oliver, I., and Pine, J. M. (2001). Chunking Barcelona. Mechanisms in Human Learning. Trends in Cogni- Koduri, G. K., Ishwar, V., Serra,` J., and Serra, X. tive Sci., 5(6):236–243. (2014). Intonation analysis of ragas¯ in carnatic Goodman, J. (2001). A bit of progress in language music. Journal of New Music Research, 43(1):72– modeling. CoRR, cs.CL/0108005. 93. Gulati, S., Serra,` J., Ganguli, K. K., S¸enturk,¨ S., and Krishna, T. M. and Ishwar, V. (2012). Carnatic Music: Serra, X. (2016). Time-Delayed Melody Surfaces , Gamaka, Motif and Raga Identity. In Proc. for raga¯ Recognition. In Proc. Int. Soc. on Music of 2nd CompMusic Workshop, Istanbul, pages 12– Info. Retr. Conf (ISMIR), pages 751–757, New York 18. (USA). Krishnaswamy, A. (2003). Application of Pitch Track- Gulati, S., Serra, J., Ishwar, V., and Serra, X. (2014). ing to South Indian Classical Music. In Proc. IEEE Mining Melodic Patterns in Large Audio Collec- Intl. Conf. Acoust. Speech Signal Process., volume 5, tions of Indian Art Music. In IEEE Int. Conf. pages V–557. IEEE. 15 Raga¯ Identification using Repetitive Note Patterns from prescriptive notations of Carnatic Music

Levy, M. and Sandler, M. (2006). New methods in Series in Pure and Applied Mathematics. McGraw- structural segmentation of musical audio. In Sig- Hill Inc., New York, second edition. nal Proc. Conf., 14th European, pages 1–5. IEEE. Sambamoorthy, P. (1998). South Indian Music. Number Meek, C. and Birmingham, W. P. (2001). Thematic bk. 4. Indian Music Publishing House. extractor. Sargent, G., Bimbot, F., and Vincent, E. (2017). Es- Meer, W. v. d. and Rao, S. (2006). What you hear isn’t timating the structural segmentation of popular what you see: The representation and cognition of music pieces under regularity constraints. IEEE fast movements in Hindustani music. In Frontiers Trans. Audio Speech Lang. Process., 25(2):344– in Research in Speech and Music (FRSM), pages 1– 358. 8, Lucknow, India. Scholz, R., Vincent, E., and Bimbot, F. (2009). Robust Meredith, D., editor (2016). Computational Music Modeling of Musical Chord Sequences using prob- Analysis. Springer. abilistic N-grams. In Proc. IEEE Intl. Conf. Acoust. Meredith, D., Lemstrom,¨ K., and Wiggins, G. A. (2002). Speech Signal Process., pages 53–56. IEEE. Algorithms for discovering repeated patterns in Senturk,¨ S. (2011). Computational Modeling of Im- multidimensional representations of polyphonic provisation in Turkish Folk Music using Variable- music. Journal of New Music Research, 31(4):321– Length Markov Models. Master’s thesis, Georgia 345. Institute of Technology. Shivkumar, K. (1998). Carnatic Music Krithi Au- Murphy, K. P. (2002a). Dynamic Bayesian Networks: dio Archive. [Online]. Available: http://www. Representation, Inference and Learning. PhD thesis, shivkumar.org/music/. University of California, Berkeley. Srikantan, R. K. (2004). Karnataka Sangeetha, Senior Murphy, K. P. (2002b). Hidden Semi-Markov Models Grade. Karnataka Secondary Education Examina- (HSMMs). tion Board, Govt. of Karnataka. Nieto, O. and Farbood, M. M. (2012). Perceptual eval- Srinivasamurthy, A., Koduri, G. K., Gulati, S., Ishwar, uation of automatically extracted musical motives. V., and Serra, X. (2014). Corpora for Music In- In Proceedings of the 12th International Conference formation Research in Indian Art Music. In smc, on Music Perception and Cognition, pages 723–727. pages 1029–1036, Athens, Greece. Ostendorf, M., Digalakis, V. V., and Kimball, O. A. Sriram, V. (2004). Carnatic Summer: Lives of Twenty (1996). From HMM’s to segment models: A uni- Great Exponents. EastWest Books (Madras). fied view of stochastic modeling for speech recog- Subramanian, M. (2012). Generating Computer Music nition. IEEE Trans. Audio Speech Lang. Process., from Skeletal Notations for Carnatic Music Com- 4(5):360–378. positions. In Proc. of 2nd CompMusic Workshop, Raman,¯ Y. (2008). Essentials of Musicology in South Istanbul, pages 107–112. Indian Music. Indian Music Publishing House. Subramanian, S. K. (2013). Modeling Gamakas¯ of Car- Ranjani, H. G., Arthi, S., and Sreenivas, T. V. (2011). natic Music as a Synthesizer for Sparse Prescrip- Carnatic music analysis: Shadja, Swara identi- tive Notation. Master’s thesis, National University fication and Raga verification in using of Singapore. stochastic models. In Proc. Workshop on Applicat. Subramanian, S. K., Wyse, L., and McGee, K. (2012). of Signal Process. to Audio and Acoust., pages 29– A two-component representation for modeling 32. gamakas of carnatic music. In Proc. of 2nd Comp- Ranjani, H. G., Paramashivan, D., and Sreenivas, T. V. Music Workshop. (2017). Quantized melodic contours in indian Unal, E., Bozkurt, B., and Karaosmanoglu, M. K. art music perception: Application to transcrip- (2012). N-gram based Statistical Makam Detec- tion. In Proceedings of the 18th International Soci- tion on Makam Music in Turkey Using Symbolic ety for Music Information Retrieval Conference, IS- Data. In Proc. Int. Soc. on Music Info. Retr. Conf MIR 2017, Suzhou, China, October 23-27, 2017, (ISMIR), pages 43–48. pages 174–180. Van Trees, H. L. (2002). Detection, estimation, and Rohrmeier, M. and Rebuschat, P. (2012). Implicit modulation theory. Part IV. , Optimum array pro- Learning and Acquisition of Music. Topics in Cog- cessing. Wiley-Interscience, New York. nitive Sci. J., 4(4):525–553. Vasanthamadhavi, V. (2005). Theory of Music. Prism Ross, J. C., Mishra, A., Ganguli, K. K., Bhattacharyya, Books. P., and Rao, P. (2017). Identifying raga similarity Vijayakrishnan, K. (2007). The Grammar of Carnatic through embeddings learned from compositions’ Music, volume 8. Walter de Gruyter. notation. In Proceedings of the 18th International Weidman, A. (2006). Singing the Classical, Voicing the Society for Music Information Retrieval Conference, Modern: The Postcolonial Politics of Music in South ISMIR 2017, Suzhou, China, October 23-27, 2017, India. Duke University Press. pages 515–522. Yu, S.-Z. (2010). Hidden Semi-Markov Models. Artifi- Rudin, W. (1991). Functional analysis. International cial Intell., 174(2):215–243.