Raga Identification by Using Swara Intonation

Journal of ITC Sangeet Research Academy, vol. 23, December, 2009 Raga Identification by using Swara Intonation Shreyas Belle, Rushikesh Joshi and Preeti Rao As a result, automatic raga identification can Abstract—In this paper we investigate provide a basis for searching for similar songs and information pertaining to the intonation of swaras generating automated play-lists that are suited for (scale-degrees) in Hindustani Classical Music for a certain aesthetic theme. It can also be used by automatically identifying ragas. We briefly explain novice musicians who find it difficult to why raga identification is an interesting problem and the various attributes that characterize a raga. distinguish ragas which are very similar to each We look at two approaches by other authors that other. It might also evolve into a system which exploit some of these characteristics. Then we checks how accurately a person is performing a review musicological studies that mention certain raga. The distinguishing characteristics of intonation variability of swaras across ragas, ragas are typically the scale (set of notes/swaras) providing us a basis for using swara intonation that is used, the order and hierarchy of its swaras, information in raga recognition. We describe an their manner or intonation and ornamentation, experiment that compares the intonation characteristics for distinct ragas with the same set their relative strength, duration and frequency of of swaras. Features derived from swara intonation occurrence. The present work addresses the are used in a statistical classification framework to problem of raga identification from an audio classify audio segments corresponding to different recording of a Hindustani classical vocal ragas with the same swaras. performance. In particular, we extract information about how the swaras of the performance are Index Terms— Hindustani Music, Raga intoned to achieve this. Identification, Swara, Intonation. II. PREVIOUS WORK I. INTRODUCTION Previously reported work on raga detection has agas form a very important concept in been limited to using information regarding the Hindustani classical music and capture the R probability distribution of the swaras and, to some mood and emotion of performances. A raga is a extent, their temporal sequences. Pitch-class tonal framework for composition and Distributions (PCDs) and Pitch-class Dyad improvisation. It embodies a unique musical idea. Distributions (PCDDs) which represent the probabilities of the dyads, have been used as Manuscript received January 22, 2010. Shreyas Belle, was with the Department of Computer features for raga recognition [1]. The database Science and Engineering, Indian Institute of Technology consisted of 20 hours of unaccompanied ragas Bombay, Mumbai, 400076, India (phone: +91-99308- along with some commercial recordings which 45442; e-mail: [email protected]). were split into 30 s and 60 s segments. There were Preeti Rao, is with the Department of Electrical Engineering, Indian Institute of Technology Bombay, a total of 31 distinct ragas. With an SVM Mumbai, 400076, India (phone: +91-22-2576-7695; e- classifier and 10-fold cross-validation, 75.2% and mail: [email protected]). 57.1% accuracy was achieved with PCDs and Rushikesh Joshi is with the Department of Computer PCDDs respectively. In [2], an automatic raga Science and Engineering, Indian Institute of Technology identification system is described that combines Bombay, Mumbai, 400076, India (phone: +91-22-2576- 7730; e-mail: [email protected]). the use of Hidden Markov Models (HMMs) and Pakad matching to identify ragas. The idea behind using HMMs was that the sequence of notes for a raga is very well defined. Given a certain note, the III. DETAILS OF DATASET transition to another note would have a well For the purpose of our experiments we selected defined probability. Generally each raga has a vocal performances by various artists in four pakad which is a characteristic sequence of notes ragas, namely Desh, Tilak Kamod, Bihag, and that is usually played while performing a raga. Kedar. Desh and Tilak Kamod make use of the Detection of these sequences facilitated the same scale. Similarly Bihag and Kedar have the identification of the raga. The dataset consisted of same scale. For each raga we chose multiple 31 samples from 2 ragas. An overall accuracy was performances each by a different artist. All 87% was achieved. performances were converted to mono channel The authors of [3] have observed that with a sampling rate of 22050Hz, 16 bits per Hindustani vocal music artists are particular about sample. From all these performances, segments in the specific position in which they intone a certain which the artist lingered on notes for some time swara within its pitch interval. They have also without much ornamentation were chosen to be seen that these positions are such that their analyzed. The exact details of the ragas, artists, frequencies are in ratios of small integers. This segment length have been provided in Table I. results in consonance of the swaras. Depending on the sequence in which notes are allowed to be performed in the raga, the artist may have to IV. EXPERIMENTAL METHODOLOGY choose a certain position of a note to ensure Each selected segment was heard for its entire consonance with the previous or next note. This length by a trained musician to confirm that it would also result in different intonations of contained enough information to make it possible certain swaras for ragas that have the same scale to detect the raga that was being performed. but are otherwise distinct. We can safely say that The trained musician pointed out that the raga professional performers would closely adhere to was, in fact, recognised by her within the first 30 s these ratios. In [4], the variation in the frequencies of the segment. We used the entire segment as a of each swara for many ragas has been shown. single token for the purpose of automatic This motivates us to explore information about the identification however. positioning of the pitch of each swara in For each of these segments, the vocal pitch was performances for raga recognition. While the extracted at regular intervals and written to a pitch previous work made use of probability of contour file. These pitch values were used in occurrence of pitches, dyads, sequences of notes conjunction with the tonics (which were manually and the occurrence of pakads, they did not make detected) of the performances to create Folded use of intonation information of each swara. It is Pitch Distributions (FPDs). From these, PCDs our hypothesis that two ragas with the same scale were generated. will differ in the way their notes are intoned. This A. Pitch Extraction will help in classifying ragas which are easily confused while using methods mentioned in The raw audio waveforms of the selected previous studies. segments were passed to the polyphonic melody Most of the quoted previous studies were extractor which detected the pitch of the singing restricted to unaccompanied vocal performances voice. The details of how the pitch was detected specially recorded for the investigations. This was are available in [5]. Pitches were extracted every necessary due to the difficulty of pitch tracking in 20 ms from the range of 100 Hz to 1000 Hz with a polyphonic music (i.e. with accompanying tabla, resolution of 0.01 Hz. The obtained pitch contour tanpura or harmonium as is typical in vocal music was validated by listening to the re-synthesised performances). In the present work, we use a pitch contour. Any vocal detection errors or pitch recently available semi-automatic polyphonic tracking errors were corrected by selecting the melody extraction interface on commercial specific segments of the input audio and running recordings of vocal classical music [5]. the melody extractor with manually adjusted parameters. Accurate pitch contours about the corresponding swara centre assuming an corresponding to the vocal melody were thus equally tempered scale. This means that the first extracted for all the segments in the study. bin was at 0 cents, second at 100 cents, third at We tried further to extract steady note 200 cents and so on. The boundary between two sequences of at least 200 ms duration from the bins was defined as the arithmetic mean of the pitch contour such that the difference between the centre of the two bins in cents. maximum and minimum pitch values of the The PCDs were constructed from tonic aligned continuous sequence within than 50 cents. FPDs as follows. After the bin boundaries were Unfortunately the number of steady sequences defined for the PCD, all the FPD bins which fell extracted was too few for further analysis. A within the boundaries of a PCD bin contributed to larger database along with an experimentally that PCD bin. For example, the bins from 50 to tuned set of parameters (minimum acceptable 149 of the FPD were added to give the value of duration, maximum pitch variation permitted) the 2nd bin of the PCD. Though PCDs give a could help us with an investigation restricted to good summary of the probability of usage of the steady notes. 12 swaras they loose out the finer details about B. Folded Pitch Distributions how they are intoned. A pitch distribution gives the probability of D. Swara Features occurrence of a pitch value over the segment In order to exploit information about the duration. The distribution that we used had bins specific intonation of the swaras, we returned to corresponding to pitches ranging from, 100 Hz to the tonic aligned FPD. First the FPD was divided 1000 Hz with 1 Hz intervals. While generating a into 12 partitions of 100 cents each, such that the pitch distribution for a pitch contour, the first partition was centred about 0 cents. Each probability for a bin corresponding to frequency f partition corresponded to one swara with the first was given by the number of pitch values with the one corresponding to shadj.

Load more