Comparative Pattern Analysis of Cretan Folk Songs ∗
Darrell Conklin
Department of Computer Science and Artificial Intelligence
Universidad del País Vasco, San Sebastián, Spain
IKERBASQUE: Basque Foundation for Science, Bilbao, Spain
[email protected]

Christina Anagnostopoulou
Department of Music Studies
University of Athens, Athens, Greece
[email protected]

∗ This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. Please cite the definitive version which was published at MML'10, 3rd International Workshop on Machine Learning and Music, October 25, 2010, Florence, Italy, pages 33-36.

Abstract

This paper reports on data mining of Cretan folk songs for distinctive patterns. A pattern is distinctive if it occurs with higher probability in a corpus than in an anticorpus. A small set of Cretan folk songs was collected, organized using a small knowledge base of classes, and mined using distinctive pattern discovery methods. Several highly distinctive and confident patterns emerge.

keywords: ethnomusicology, data mining, folk song analysis, pattern discovery, classification

1 Introduction

In recent years there has been renewed interest in folk song analysis, driven partly by growing interest in cultural heritage and partly by advances in music informatics methods. The ability to predict, from music content, song properties such as region, dance type, tune family, instrumentation, modality, and social function is an important part of the management of large corpora. Music data mining methods play a key role in building predictive models for folk song classification.

The folk music of Crete, like all traditional Greek music, can be divided primarily into dance and non-dance songs (the latter further subdivided into "Tavla", or "songs of the table", and "road" songs). There are various types of songs belonging to each of these categories.
For dance songs, the most common types found in Crete are Kontylies, Pentozalis, Malevisiotis, Syrtos, and Sousta. The non-dance songs include Rizitika, Tambahaniotika, and other table songs, as well as songs tied to a social function, such as laments, wedding songs, and lullabies.

Although most of these song types are found throughout the island, each area has its own characteristic musical style. The music of the West of Crete is quite distinct from the music of the East, although many songs are common to the whole island. Therefore, in stylistic analysis of this music, it is necessary to look not only for patterns that distinguish between song types, but also for patterns that show how the music of each region differs.

This paper reports initial results obtained with a small collection of folk tunes from various villages across Crete. It explores the hypothesis that the origin, function, and type of a song are correlated with the melodic patterns it contains. As an analysis method we have chosen the MGDP method [4, 5], which discovers patterns that are distinctive and maximally general in a corpus. The paper discusses the encoding of a collection of pieces into a knowledge base for analysis, describes the application of the MGDP method to the corpus, reports on some interesting patterns found, and finally presents some ideas for future work.

2 Method

This section describes the data preparation and data mining methods employed. The data mining method finds patterns that are over-represented in a positive set (called the corpus) with respect to a background set of pieces (called the anticorpus).

2.1 Corpus

Three main sources were used for the Cretan folk songs. The first was the collection of transcriptions made by Samuel Baud-Bovy in 1953-54 [3], published by the Laographic Music Archives of Melpo Merlie in Athens.
It contains old and unknown songs which were recorded and transcribed in various villages around the island. The second source was by Peristeris [8], as found in the Amargiannakis archives at the Academy of Athens and the University of Athens. The third was a collection of well-known songs, transcribed for lyra playing [2]. Finally, a few pieces were added from the transcriptions made by Theodossopoulou [9]. Songs from printed scores were encoded using the Sibelius and Finale software, and exported to symbolic MIDI format for computational analysis.

Following the encoding of the pieces, a knowledge base was developed to describe the classes of all pieces. Four broad categories were chosen: song type, song hypertype, area, and hyperarea (see Figure 1). Encoded songs were assigned to the song types specified by Baud-Bovy [3]. Hyperarea classification was according to a map of Crete, and hypertype according to the classification of Greek folk music specified by Amargiannakis [1] and Baud-Bovy [3] and inspection of song lyrics.

While preparing the corpus and classifying the songs, several points required discussion and a decision:

• as mentioned above, some song types are found throughout the island, such as the new year song (kalanda). These songs were therefore placed in all classes of area (and hyperarea);
• some songs were recorded at a specific village (e.g., a small village of Chania) but are sung throughout the Western part of the island. These songs were therefore added to both the rethymno and chania areas;
• for a few songs, although the song type was clear, there was no indication in the sources regarding its area.
Therefore, in our classification, these pieces are not considered part of any class for either area or hyperarea mining;
• finally, some songs were noted simply as Tavla songs, without any further song type classification, and it was not possible to deduce a more specific description from the music. We therefore treated them as unknown for the song type category, although they still belong to the song hypertype class tavla.

The final collection contains 106 songs. Figure 1 depicts the categories along with their classes and the number of pieces in each class.

Figure 1: The categories and classes used in this study, along with the number of pieces in each class.

hyperarea: west (64), east (47)
hypertype: tavla (61), dance (51)
song type: sousta (5), wedding (14), city song (7), pentozalis (11), kontylies (17), lament (8), nanourisma (12), epic song (1), malevisiotis (1), rizitiko (8), syrtos (17), kalanda (1)
area: chania (30), rethymno (29), herakleion (12), lassithi (35), syria (4)

2.2 MGDP discovery

A pattern is a sequence of event features. A rich set of features, including rhythmic ones, can be ascribed to events [4], but for this study only melodic intervals are used due to the small corpus size. A piece x instantiates a pattern P, written P(x), if the pattern occurs (possibly multiple times) in the piece, that is, if the components of the pattern are instantiated by successive events in the piece.

A pattern P subsumes (is more general than) a pattern Q if all instances of Q are also instances of P, that is, if ∀x Q(x) ⇒ P(x) is valid (true in all possible corpora). For example, the pattern [+3] subsumes the pattern [+3, +1], which in turn subsumes, for example, [+2, +3, +1] and [+3, +1, −4].

To rank discovered patterns, a collection of pieces can be partitioned into an explicit anticorpus (denoted ⊖), which is contrasted with the analysis corpus (denoted ⊕).
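The definitions of instantiation and subsumption for melodic interval patterns can be illustrated with a minimal sketch. It assumes that pieces are given as lists of MIDI pitch numbers and that intervals are measured in semitones; neither detail is stated in the text, and the function names `melodic_intervals`, `instantiates`, and `subsumes` are hypothetical. Under these assumptions, a pattern of intervals subsumes another exactly when it appears as a contiguous run inside it.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[int, ...]


def melodic_intervals(pitches: Sequence[int]) -> List[int]:
    """Intervals between successive pitches (assumed: MIDI numbers, semitones)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def instantiates(intervals: Sequence[int], pattern: Pattern) -> bool:
    """P(x): the pattern matches a run of successive intervals in the piece."""
    n = len(pattern)
    return any(tuple(intervals[i:i + n]) == pattern
               for i in range(len(intervals) - n + 1))


def subsumes(p: Pattern, q: Pattern) -> bool:
    """P subsumes Q: every piece containing Q must also contain P.

    For plain interval sequences this holds exactly when P occurs as a
    contiguous run inside Q.
    """
    return instantiates(q, p)


# Toy melody, invented for illustration only.
toy_intervals = melodic_intervals([60, 62, 65, 66, 62])  # [2, 3, 1, -4]
```

With the examples from the text, `subsumes((3,), (3, 1))`, `subsumes((3, 1), (2, 3, 1))`, and `subsumes((3, 1), (3, 1, -4))` all hold, while the reverse directions do not.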
A distinctive pattern is one that is frequent and sufficiently over-represented in the corpus as compared to the anticorpus. An intuitive way to measure this is the relative empirical probability of a pattern in the corpus and anticorpus:

\[
\Delta(P) \overset{\mathrm{def}}{=} \frac{p(P \mid \oplus)}{p(P \mid \ominus)} \tag{1}
\]

where

\[
p(P \mid \oplus) \overset{\mathrm{def}}{=} \frac{c_{\oplus}(P)}{n_{\oplus}}, \qquad
p(P \mid \ominus) \overset{\mathrm{def}}{=} \frac{c_{\ominus}(P)}{n_{\ominus}},
\]

where c⊕(P) (c⊖(P)) is the total count of a pattern P in the corpus (anticorpus), and n⊕ (n⊖) is the number of pieces in the corpus (anticorpus).

A pattern P is called a maximally general distinctive pattern (MGDP) if it is distinctive (with at least a specified minimum ∆(P)), and if there does not exist a more general (subsuming) pattern that is also distinctive. Thus the set of all MGDPs is the top border of the virtual pattern subsumption taxonomy, such that all patterns above the border are not distinctive.

Patterns are ranked primarily by distinctiveness (Equation 1). Additionally, for every distinctive pattern the confidence of the association rule P ⇒ ⊕ is computed. This is the probability of the class given the pattern:

\[
p(\oplus \mid P) \overset{\mathrm{def}}{=} \frac{c_{\oplus}(P)}{c(P)}, \tag{2}
\]

where c(P) is the piece count of pattern P in the entire piece collection, which may differ from the quantity c⊕(P) + c⊖(P) due to overlap between classes.

3 Results

For each of the categories hyperarea, area, song hypertype, and song type, the MGDP set is found by considering each class in turn as the corpus ⊕ and all remaining classes as the anticorpus ⊖. For example, for the area category, considering the class chania, the classes rethymno to syria (see Figure 1) are taken together as the anticorpus.

For all experiments, melodic interval patterns with ∆(P) ≥ 3 were sought. The minimum p(P|⊕) used was 20% of the corpus. The very sparse classes syria, kalanda, malevisiotis, and epic song were not considered as positive corpora, because in such small classes even the low 20% threshold can be satisfied by a single piece.
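The measures of Equations 1 and 2, the one-against-the-rest construction of corpus and anticorpus, and the maximal generality filter can be sketched as follows. This is an illustrative reading rather than the authors' implementation: counts are taken as piece counts, the anticorpus is taken as the union of the pieces in all other classes, a pattern absent from the anticorpus is given infinite distinctiveness, and every function name and all toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[int, ...]


def occurs(pattern: Pattern, intervals: List[int]) -> bool:
    """True if the pattern matches a run of successive intervals."""
    n = len(pattern)
    return any(tuple(intervals[i:i + n]) == pattern
               for i in range(len(intervals) - n + 1))


def piece_count(pattern: Pattern, pieces: Dict[str, List[int]], ids: Set[str]) -> int:
    """Number of pieces among `ids` containing the pattern at least once."""
    return sum(1 for pid in ids if occurs(pattern, pieces[pid]))


def one_vs_rest(classes: Dict[str, Set[str]], positive: str):
    """Corpus = the chosen class; anticorpus = union of all other classes (assumed)."""
    rest = [ids for name, ids in classes.items() if name != positive]
    return classes[positive], set().union(*rest)


def distinctiveness(pattern, pieces, corpus, anticorpus) -> float:
    """Equation 1: p(P | corpus) / p(P | anticorpus)."""
    p_pos = piece_count(pattern, pieces, corpus) / len(corpus)
    p_neg = piece_count(pattern, pieces, anticorpus) / len(anticorpus)
    if p_neg == 0.0:
        # Assumption: never seen in the anticorpus means infinitely distinctive.
        return float("inf") if p_pos > 0.0 else 0.0
    return p_pos / p_neg


def confidence(pattern, pieces, corpus) -> float:
    """Equation 2: corpus piece count over piece count in the whole collection."""
    total = piece_count(pattern, pieces, set(pieces))
    return piece_count(pattern, pieces, corpus) / total if total else 0.0


def is_distinctive(pattern, pieces, corpus, anticorpus,
                   min_delta: float = 3.0, min_support: float = 0.2) -> bool:
    """Thresholds from the text: delta >= 3 and p(P | corpus) >= 20%."""
    support = piece_count(pattern, pieces, corpus) / len(corpus)
    return (support >= min_support
            and distinctiveness(pattern, pieces, corpus, anticorpus) >= min_delta)


def maximally_general(distinctive: Set[Pattern]) -> Set[Pattern]:
    """Keep patterns not subsumed by another distinctive pattern.

    Valid only if `distinctive` holds every distinctive pattern up to the
    longest length considered.
    """
    return {p for p in distinctive
            if not any(q != p and occurs(q, list(p)) for q in distinctive)}


# Toy data, invented for illustration: piece id -> melodic intervals.
toy_pieces = {"a": [3, 1, -4], "b": [2, 3, 1], "c": [3, 1, 2],
              "d": [1, 1, 3], "e": [-2, -1, 3, 2], "f": [2, 2, -2]}
toy_classes = {"west": {"a", "b", "c"}, "east": {"c", "d", "e", "f"}}  # "c" is in both
toy_corpus, toy_anticorpus = one_vs_rest(toy_classes, "west")
toy_delta = distinctiveness((3, 1), toy_pieces, toy_corpus, toy_anticorpus)
toy_conf = confidence((3, 1), toy_pieces, toy_corpus)
```

In the toy data, piece "c" belongs to both classes, so the pattern (3, 1) is counted three times in the corpus and once in the anticorpus, yet only three times in the whole collection. This is the overlap effect noted for Equation 2.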
Table 1 shows a subset of the patterns found, presenting 3 patterns for each category that are highly distinctive, with ∆(P) ≥ 5 and piece count c⊕(P) ≥ 5. Several patterns also have high confidence (Equation 2).
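The selection criteria for the reported patterns can be sketched as a filter over already scored patterns. The record type `ScoredPattern`, the function `select_for_report`, and all values below are hypothetical and are not results from the study. The text only states that a subset of 3 patterns per category is shown, so taking the top 3 by distinctiveness and breaking ties by confidence is an assumption.

```python
from typing import List, NamedTuple, Tuple


class ScoredPattern(NamedTuple):
    pattern: Tuple[int, ...]   # melodic interval pattern
    delta: float               # distinctiveness (Equation 1)
    corpus_count: int          # piece count in the corpus
    confidence: float          # confidence (Equation 2)


def select_for_report(scored: List[ScoredPattern],
                      min_delta: float = 5.0,
                      min_count: int = 5,
                      top_k: int = 3) -> List[ScoredPattern]:
    """Keep highly distinctive, well-supported patterns; rank by delta."""
    kept = [s for s in scored
            if s.delta >= min_delta and s.corpus_count >= min_count]
    # Assumed tie-break: higher confidence first among equal delta.
    kept.sort(key=lambda s: (-s.delta, -s.confidence))
    return kept[:top_k]


# Invented scores, for illustration only.
toy_scored = [
    ScoredPattern((3, 1), 6.0, 7, 0.9),
    ScoredPattern((-2, -2), 5.0, 4, 0.8),   # dropped: corpus count below 5
    ScoredPattern((1, -1, 2), 4.0, 9, 0.5),  # dropped: delta below 5
    ScoredPattern((2, 2), 8.5, 5, 0.7),
]
toy_selected = select_for_report(toy_scored)
```

Here `toy_selected` holds the patterns (2, 2) and (3, 1), in that order.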